Wikisource:OCR

See also: Wikisource:ProofreadPage#Text layer extraction from djvu/pdf file

This observatory lists known optical character recognition (OCR) systems that could be useful to Wikimedians. All systems, whether open, free, or paid, are welcome to be listed and documented below. If you have used an effective OCR system, please add it below, optionally with some comments.

Commons.js

Wikisource:Google OCR (old)
Wikisource:Tesseract OCR (new)

Extension

Section to expand.

mw:Help:Extension:Wikisource/Wikimedia OCR. Based on Wikimedia's Google OCR and Tesseract OCR cited above.

Free

Online and free

Section to expand.

Wikimedia OCR: https://ocr.wmcloud.org. Based on Wikimedia's Google OCR and Tesseract OCR, both listed above. Accepts image input only (no PDF).
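
Because the tool takes an image rather than a PDF, a request can be described as an image URL plus a choice of engine and language. The sketch below only builds such a request URL and does not contact the service. The `/api.php` path and the `engine`, `image` and `langs[]` parameter names are assumptions not stated on this page, and `build_ocr_url` is a hypothetical helper name; check the tool's own documentation before relying on any of them.

```python
from urllib.parse import urlencode

# Assumption: the tool exposes an HTTP endpoint at /api.php that accepts
# `engine`, `image` and repeated `langs[]` query parameters.
OCR_API = "https://ocr.wmcloud.org/api.php"


def build_ocr_url(image_url, engine="tesseract", langs=("en",)):
    """Return a request URL for one image (hypothetical helper)."""
    # The page above notes that only image input is accepted, so reject
    # anything that looks like a PDF before building the request.
    path = image_url.split("?", 1)[0].lower()
    if path.endswith(".pdf"):
        raise ValueError("image input only (no PDF)")
    params = [("engine", engine), ("image", image_url)]
    params += [("langs[]", lang) for lang in langs]
    return OCR_API + "?" + urlencode(params)


if __name__ == "__main__":
    # example.org is a placeholder, not a real scan.
    print(build_ocr_url("https://example.org/page.png", langs=("fr",)))
```

Keeping the URL construction separate from any network call makes it easy to check the request by eye before sending it.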

Other free systems

Kraken https://kraken.re/master/ ("optimized for historical and non-Latin script material")
- models for 17th century French, see [1]
- catalog of several training sets for various languages and types of documents: https://htr-united.github.io/catalog.html

Paid systems

Section to expand.
