User:BrolloBot/Projects
Project 1
- Aim
- To upload into new pages (nsPage) the OCR of DjVu files that have no text layer. This needs tesseract hOCR output from page images that have been converted (ddjvu, ImageMagick) and dewarped (scantailor-cli), plus text refinement in Python (paragraphs, running headers (RH), categorization, scanno fixing and so on), something like the "mise en page" (page layout) step on fr.source.
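The refinement step can be sketched in Python with the standard library alone. This is a minimal sketch, not BrolloBot's actual code. It assumes the hOCR comes from something like `tesseract page.tif page hocr` and uses tesseract's hOCR class names (`ocr_par`, `ocr_line`, `ocrx_word`). It treats a leading one-line, short paragraph as the running header, rejoins words hyphenated at line ends, and applies a caller-supplied scanno table. The names `hocr_paragraphs`, `refine`, `LINE_CLASSES`, the 60-character header threshold and the scanno table are all hypothetical.

```python
from __future__ import annotations

import re
import xml.etree.ElementTree as ET

# hOCR classes that tesseract uses for text lines (assumed list; headers and
# captions are emitted as lines with their own class)
LINE_CLASSES = {"ocr_line", "ocr_header", "ocr_caption", "ocr_textfloat"}


def hocr_paragraphs(hocr: str) -> list[list[str]]:
    """Return the page text as a list of paragraphs, each a list of lines."""
    root = ET.fromstring(hocr)
    paragraphs = []
    for par in root.iter():
        if par.get("class") != "ocr_par":
            continue
        lines = []
        for line in par.iter():
            if line.get("class") not in LINE_CLASSES:
                continue
            # a word's text may sit inside <strong>/<em>, so use itertext()
            words = ["".join(w.itertext()).strip()
                     for w in line.iter() if w.get("class") == "ocrx_word"]
            text = " ".join(w for w in words if w)
            if text:
                lines.append(text)
        if lines:
            paragraphs.append(lines)
    return paragraphs


def refine(paragraphs: list[list[str]], scannos: dict[str, str],
           max_header_len: int = 60) -> tuple[str, str]:
    """Split off a running header (RH) and join lines into wikitext paragraphs."""
    header = ""
    # heuristic (assumption): a leading short one-line paragraph is the RH
    if (paragraphs and len(paragraphs[0]) == 1
            and len(paragraphs[0][0]) <= max_header_len):
        header, paragraphs = paragraphs[0][0], paragraphs[1:]
    body = []
    for lines in paragraphs:
        text = lines[0]
        for line in lines[1:]:
            if text.endswith("-"):
                text = text[:-1] + line  # rejoin a word split across lines
            else:
                text += " " + line
        # scannos are assumed to be plain whole words, anchored with \b
        for wrong, right in scannos.items():
            text = re.sub(rf"\b{re.escape(wrong)}\b", right, text)
        body.append(text)
    # a blank line separates paragraphs in wikitext
    return header, "\n\n".join(body)
```

The line-end dehyphenation is deliberately naive: a genuine compound split at a line end (e.g. "well-" / "known") would be wrongly joined, so a dictionary check is the natural next refinement.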
Project 2
- Aim
- The same as Project 1, but with a "human" step: just before uploading into nsPage, the refined text is opened in an external text editor for a final review. A lot could be gained here if the editor is an advanced one (dictionary facilities and more).
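The review step can be sketched with the standard library. The refined text is written to a temporary file, the editor named by `$VISUAL` or `$EDITOR` is opened on it, and whatever the reviewer saves is read back for upload. This is a minimal sketch under stated assumptions: `review_in_editor` and `default_editor` are hypothetical names, the `nano`/`notepad` fallbacks are arbitrary choices, and the editor command string is split with POSIX shell quoting.

```python
import os
import shlex
import subprocess
import sys
import tempfile
from typing import Optional


def default_editor() -> str:
    # $VISUAL, then $EDITOR, then a platform fallback (fallbacks are assumptions)
    return (os.environ.get("VISUAL") or os.environ.get("EDITOR")
            or ("notepad" if sys.platform == "win32" else "nano"))


def review_in_editor(text: str, editor: Optional[str] = None) -> str:
    """Write text to a temporary file, open it in an editor, return the result.

    Blocks until the editor process exits.
    """
    cmd = shlex.split(editor or default_editor())
    fd, path = tempfile.mkstemp(suffix=".txt")
    try:
        # close the file before the editor opens it (matters on Windows)
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(text)
        subprocess.run(cmd + [path], check=True)  # raises if the editor fails
        with open(path, encoding="utf-8") as f:
            return f.read()
    finally:
        os.remove(path)
```

Because `subprocess.run` waits for the editor to exit, a GUI editor has to be started in a blocking mode (VS Code, for instance, needs `code --wait`). The caller could also adopt a convention such as "an emptied file means skip this page" before uploading; that convention is hypothetical.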