User talk:Putnik/TesseractOCR.js


OCRed text is in Latin script

Hi @Putnik:, I was testing the script in my common.js on Bengali Wikisource. The OCR produces output, but in Latin script rather than in Bengali script. -- Bodhisattwa (talk) 13:57, 20 May 2019 (UTC)

  • Hi @Bodhisattwa:. I tried it myself, and there really is a problem with Bengali: unfortunately, something goes wrong when loading the Tesseract language data for Bengali. I described my observations in a GitHub issue, and I hope the authors of the library will be able to do something about it. putnik (talk) 22:04, 20 May 2019 (UTC)

TesseractOCR not working for Punjabi language

Hi @Putnik:, I tried using TesseractOCR in my Common.js, but it's not working. Can you help me with that? --Benipal hardarshan (talk) 07:54, 27 October 2019 (UTC)

@putnik: No need for apologies. Thanks for adding Punjabi to the list. However, it takes a lot of time to recognize text, and I want to use TesseractOCR for books with multiple columns. Can you do something about that? --Benipal hardarshan (talk) 03:57, 29 October 2019 (UTC)
@putnik: Just a reminder. --Benipal hardarshan (talk) 07:59, 3 November 2019 (UTC)

Tesseract OCR for vertical texts

@Putnik:, I am testing TesseractOCR.js on Japanese texts. Because most of the works we handle on ja.wikisource.org are written vertically, I would like to pass the option -psm 5 (page segmentation mode 5, a single block of vertically aligned text) to the command and use jpn_vert.traineddata.gz instead of jpn.traineddata.gz in such cases. Can we make such a modification to TesseractOCR.js? --CES1596 (talk) 01:03, 21 June 2020 (UTC)
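
To make the request above concrete, here is a minimal sketch of how the script might choose the language data and page segmentation mode for vertical text. The helper name ocrSettings and the VERTICAL_MODELS table are hypothetical and are not taken from TesseractOCR.js. The only parts assumed from Tesseract itself are the jpn_vert language data mentioned above, the tessedit_pageseg_mode parameter, and mode values 3 (automatic segmentation, the default) and 5 (single block of vertically aligned text).

```javascript
// Hypothetical helper, not part of TesseractOCR.js.
// Maps a base language code to its vertical-text language data, where one is known.
const VERTICAL_MODELS = { jpn: 'jpn_vert' };

// Returns the language data name and Tesseract parameters to use.
// `vertical` is an assumed flag that the user would set for vertically written pages.
function ocrSettings(lang, vertical) {
  const hasVerticalModel = Object.prototype.hasOwnProperty.call(VERTICAL_MODELS, lang);
  if (vertical && hasVerticalModel) {
    return {
      lang: VERTICAL_MODELS[lang],
      // Mode 5: a single uniform block of vertically aligned text.
      params: { tessedit_pageseg_mode: '5' },
    };
  }
  // Fall back to the ordinary language data and automatic segmentation (mode 3).
  return { lang: lang, params: { tessedit_pageseg_mode: '3' } };
}

const settings = ocrSettings('jpn', true);
console.log(settings.lang, settings.params.tessedit_pageseg_mode);
```

The returned object would then need to be handed to whatever OCR call the script makes. How TesseractOCR.js actually loads language data and sets parameters is not shown on this page, so that wiring is left open here.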