User:BrolloBot/Projects

Project 1

Aim
To upload OCR text into new pages in the Page namespace (nsPage) from a DjVu file that has no text layer. This needs tesseract hOCR output produced from images that have been converted (ddjvu, imagemagick) and dewarped (scantailor-cli), followed by text refinement in Python: paragraphs, running headers (RH), categorization, scanno fixing and so on, something like the French Wikisource "mise en page".
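The conversion, OCR and refinement steps could be sketched roughly as below. This is only an illustration under stated assumptions, not the bot's actual code: the function names (`ddjvu_command`, `tesseract_command`, `refine_text`), the TIFF intermediate format and the sample scanno table are all hypothetical. The scantailor-cli dewarping step is left out because its options differ between builds, and the upload itself is not shown.

```python
import re
import subprocess
from pathlib import Path
from typing import Dict, List, Optional


def ddjvu_command(djvu: Path, page: int, out_image: Path) -> List[str]:
    """Build the ddjvu call that renders one DjVu page to a TIFF image."""
    return ["ddjvu", "-format=tiff", f"-page={page}", str(djvu), str(out_image)]


def tesseract_command(image: Path, out_base: Path, lang: str = "eng") -> List[str]:
    """Build the tesseract call; the 'hocr' config writes <out_base>.hocr."""
    return ["tesseract", str(image), str(out_base), "-l", lang, "hocr"]


def ocr_page(djvu: Path, page: int, workdir: Path, lang: str = "eng") -> Path:
    """Run both tools for one page (requires ddjvu and tesseract installed)."""
    image = workdir / f"page{page:04d}.tif"
    base = workdir / f"page{page:04d}"
    subprocess.run(ddjvu_command(djvu, page, image), check=True)
    # A dewarping pass (e.g. scantailor-cli) would go here; omitted on purpose.
    subprocess.run(tesseract_command(image, base, lang), check=True)
    return base.with_suffix(".hocr")


def refine_text(raw: str, scannos: Dict[str, str],
                running_header: Optional[str] = None) -> str:
    """Hypothetical refinement: drop RH, rebuild paragraphs, fix scannos."""
    lines = [ln.strip() for ln in raw.strip().splitlines()]
    # Drop the running header if it is the first line of the page.
    if running_header and lines and lines[0] == running_header:
        lines = lines[1:]
    text = "\n".join(lines)
    # Rejoin words hyphenated across a line break.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Blank lines separate paragraphs; single line breaks become spaces.
    paragraphs = [" ".join(p.split()) for p in re.split(r"\n\s*\n", text)]
    text = "\n\n".join(p for p in paragraphs if p)
    # Replace known scannos as whole words only.
    for wrong, right in scannos.items():
        text = re.sub(rf"\b{re.escape(wrong)}\b", right, text)
    return text
```

Keeping the command construction separate from `subprocess.run` makes it possible to check the commands without having the external tools installed, and `refine_text` works on plain text extracted from the hOCR output.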

Project 2

Aim
The same as Project 1, but with a "human" step: the refined text is opened in an external text editor for a final review just before it is uploaded into nsPage. A great deal could be done at this stage if the text editor is an advanced one (dictionary facilities and more).
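The "human" step could look something like the sketch below: write the refined text to a temporary file, launch an external editor on it, and read the saved result back before uploading. The function name `review_in_editor`, the use of the `EDITOR` environment variable and the `nano` fallback are assumptions made for illustration; the upload itself is not shown.

```python
import os
import shlex
import subprocess
import tempfile
from typing import List, Optional


def review_in_editor(text: str, editor: Optional[List[str]] = None) -> str:
    """Open text in an external editor and return what the user saved."""
    # Assumption: the editor command comes from $EDITOR, falling back to nano.
    cmd = editor or shlex.split(os.environ.get("EDITOR", "nano"))
    with tempfile.NamedTemporaryFile(
        "w", suffix=".txt", delete=False, encoding="utf-8"
    ) as tmp:
        tmp.write(text)
        path = tmp.name
    try:
        # Blocks until the editor process exits.
        subprocess.run([*cmd, path], check=True)
        with open(path, encoding="utf-8") as fh:
            return fh.read()
    finally:
        # Remove the temporary file whether or not the editor succeeded.
        os.remove(path)
```

The editor must block until the file is closed for this to work; some graphical editors return immediately unless started with a "wait" option of their own.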