User:BrolloBot/Projects


Project 1

Aim: to upload OCR text into new pages (nsPage) from a djvu file that has no text layer. This needs tesseract hOCR output produced from images that have been converted (ddjvu, imagemagick) and dewarped (scantailor-cli), followed by text refinement in python: paragraphs, running headers (RH), categorization, scanno fixing and so on, something like the "mise en page" on fr.source.
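The steps above could be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not BrolloBot's actual code: the function names, the `SCANNOS` table and the sample text are hypothetical, the scantailor-cli and upload steps are left out, and the refinement covers only de-hyphenation, paragraph joining and a word-level scanno table. The two command builders only assemble argument lists (for use with `subprocess.run`), so the sketch runs without ddjvu or tesseract installed.

```python
import re


def ddjvu_command(djvu_path, page, image_path):
    """Build the ddjvu call that renders one page of the djvu file to TIFF."""
    return ["ddjvu", "-format=tiff", f"-page={page}", djvu_path, image_path]


def tesseract_command(image_path, output_base, lang="eng"):
    """Build the tesseract call that writes <output_base>.hocr."""
    return ["tesseract", image_path, output_base, "-l", lang, "hocr"]


# Hypothetical scanno table: OCR misreading -> intended word.
SCANNOS = {"tbe": "the", "aud": "and"}


def refine(text, scannos=SCANNOS):
    """Refine raw OCR text: de-hyphenate, rebuild paragraphs, fix scannos."""
    # Rejoin words split by a hyphen at a line end. Assumption: every such
    # hyphen is a line-break hyphen, which is wrong for real compounds.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)

    # Blank lines separate paragraphs; lines inside a paragraph are unwrapped.
    blocks = re.split(r"\n\s*\n", text)
    paragraphs = [" ".join(block.split()) for block in blocks if block.strip()]

    # Replace whole words that appear in the scanno table (case-sensitive).
    def fix(match):
        word = match.group(0)
        return scannos.get(word, word)

    paragraphs = [re.sub(r"\b\w+\b", fix, p) for p in paragraphs]
    return "\n\n".join(paragraphs)


if __name__ == "__main__":
    raw = "tbe quick brown fox jum-\nped over\naud ran.\n\nSecond para-\ngraph."
    print(refine(raw))
```

In a real run the hOCR file would still have to be parsed into plain text before `refine` is called, and running headers would probably be better detected from the hOCR layout than from the text alone.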

Project 2

Aim: the same as Project 1, but with a "human" step: the refined text is opened in an external text editor for a final review just before it is uploaded into nsPage. This review could add a lot if the text editor is an advanced one (dictionary facilities and more).
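The "human" step could look something like the sketch below. It is an assumption about how such a step might be wired, not the bot's real code: the function name `review_in_editor` is hypothetical, the editor is taken from the `EDITOR` environment variable with `vi` as an arbitrary fallback, and the editor is assumed to block until the file is closed.

```python
import os
import shlex
import subprocess
import tempfile


def review_in_editor(text, editor_cmd=None):
    """Open text in an external editor and return the reviewed version."""
    if editor_cmd is None:
        # Assumption: the editor command is given by $EDITOR.
        editor_cmd = shlex.split(os.environ.get("EDITOR", "vi"))

    # Write the refined text to a temporary file the editor can open.
    with tempfile.NamedTemporaryFile(
        "w", suffix=".txt", delete=False, encoding="utf-8"
    ) as handle:
        handle.write(text)
        path = handle.name

    try:
        # Blocks until the editor process exits; raises if it fails.
        subprocess.run([*editor_cmd, path], check=True)
        with open(path, encoding="utf-8") as handle:
            return handle.read()
    finally:
        os.remove(path)
```

The returned string is what would then be uploaded into nsPage. Editors that return immediately and keep running in the background would need a "wait" option of their own for this to work.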
