User:BrolloBot/Projects

Project 1

Aim: to upload OCR text into new pages (nsPage) from a djvu file that has no text layer. This needs tesseract hOCR output generated from images that have been converted (ddjvu, imagemagick) and dewarped (scantailor-cli), followed by text refinement in python (paragraphs, running headers (RH), categorization, scanno fixing ... something like the fr.source "mise en page").
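The text refinement step could look roughly like the sketch below. It is only an illustration under stated assumptions: the function names, the running-header pattern and the small scanno table are all hypothetical and not taken from an existing BrolloBot script. It works on plain text already extracted from the hOCR output.

```python
import re

# Hypothetical scanno table: frequent OCR misreadings and their corrections.
SCANNOS = {"tbe": "the", "aud": "and"}

# Assumed running-header shape: page number plus an all-caps title, in either order.
HEADER_RE = re.compile(r"^\d+\s+[A-Z .]+$|^[A-Z .]+\s+\d+$")


def strip_running_header(lines, header_re=HEADER_RE):
    """Drop the first non-empty line if it looks like a running header."""
    for i, line in enumerate(lines):
        if line.strip():
            if header_re.match(line.strip()):
                return lines[:i] + lines[i + 1:]
            break
    return lines


def join_paragraphs(lines):
    """Merge physical lines into paragraphs; blank lines separate paragraphs.

    A trailing hyphen is treated as a word break and removed. This is naive:
    genuinely hyphenated compounds split at a line end are joined as well.
    """
    paragraphs, current = [], []
    for line in lines:
        stripped = line.strip()
        if not stripped:
            if current:
                paragraphs.append(" ".join(current))
                current = []
            continue
        if current and current[-1].endswith("-"):
            current[-1] = current[-1][:-1] + stripped
        else:
            current.append(stripped)
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


def fix_scannos(text, table=SCANNOS):
    """Replace whole-word scannos listed in the table."""
    if not table:
        return text
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, table)) + r")\b")
    return pattern.sub(lambda m: table[m.group(1)], text)


def refine_page(raw):
    """Run the three refinement passes on the raw OCR text of one page."""
    lines = strip_running_header(raw.splitlines())
    return "\n\n".join(fix_scannos(p) for p in join_paragraphs(lines))
```

Calling `refine_page` on the raw text of a page returns paragraphs separated by blank lines, ready for further wiki formatting. Categorization is left out of the sketch.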

Project 2

Aim: the same as Project 1, but with a "human" step: the refined text is opened in an external text editor for a final review just before it is uploaded into nsPage. A lot could be gained here if the text editor is an advanced one (dictionary facilities ... and more).
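The "human" step could be sketched as below. This is a minimal illustration, not an existing script: the function name `review_in_editor` is hypothetical, and the choice of the `EDITOR` environment variable (with `nano` as fallback) is an assumption. The text is written to a temporary file, the editor is started on it, and the reviewed text is read back once the editor exits.

```python
import os
import shlex
import subprocess
import tempfile


def review_in_editor(text, editor_cmd=None):
    """Open text in an external editor and return the edited version.

    editor_cmd is a list such as ["gedit", "--wait"]; if omitted, the
    EDITOR environment variable is used, falling back to nano (assumption).
    """
    if editor_cmd is None:
        editor_cmd = shlex.split(os.environ.get("EDITOR", "nano"))
    fd, path = tempfile.mkstemp(suffix=".txt", text=True)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
        # Blocks until the editor process exits; raises if it fails.
        subprocess.run(editor_cmd + [path], check=True)
        with open(path, encoding="utf-8") as handle:
            return handle.read()
    finally:
        os.remove(path)
```

Some graphical editors return immediately after handing the file to an already running instance; with such an editor the command needs whatever "wait" option that editor offers, otherwise the unedited text is read back.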