User talk:Putnik/TesseractOCR.js
OCRed text is in Latin script
Hi @Putnik:, I was testing the script in my common.js on Bengali Wikisource. The OCR produces output, but in the Latin alphabet rather than the Bengali alphabet. -- Bodhisattwa (talk) 13:57, 20 May 2019 (UTC)
- Hi @Bodhisattwa:. I tried it myself, and there really is a problem with Bengali: something goes wrong when loading the Tesseract language data for Bengali. I described my observations in a GitHub issue, and I hope that the authors of the library will be able to do something about it. putnik (talk) 22:04, 20 May 2019 (UTC)
- Hi @Putnik:, thanks for getting into this. -- Bodhisattwa (talk) 05:01, 21 May 2019 (UTC)
- @Bodhisattwa: Could you try again? I copied all the script and data files to Toolforge, and everything seems to be working now. putnik (talk) 14:53, 15 June 2019 (UTC)
- @Putnik:, sorry for the late reply. I was preoccupied elsewhere in real life. It is working, but the "Recognizing text" step takes some time; otherwise it is completely OK. Thanks a lot for working on this. I have added this script to the common.js. By the way, what will happen when new trained data is released? Should I ping you to update it, or will it be updated automatically? -- Bodhisattwa (talk) 18:28, 17 June 2019 (UTC)
TesseractOCR not working for Punjabi language
Hi @Putnik:, I tried using TesseractOCR in my Common.js, but it's not working. Can you help me with that? --Benipal hardarshan (talk) 07:54, 27 October 2019 (UTC)
- @Benipal hardarshan: Please try again. I added Punjabi to the list of available languages. For some reason, I missed it before. Sorry. putnik (talk) 23:13, 28 October 2019 (UTC)
- @putnik: No need for apologies. Thanks for adding Punjabi to the list. However, it takes a lot of time to recognize text, and I also want to use TesseractOCR for books with multiple columns. Can you do something about that? --Benipal hardarshan (talk) 03:57, 29 October 2019 (UTC)
- @putnik: Just a reminder.--Benipal hardarshan (talk) 07:59, 3 November 2019 (UTC)
Tesseract OCR for vertical texts
@Putnik:, I am testing TesseractOCR.js for Japanese texts. Because most of the works we handle on ja.wikisource.org are written vertically, I would like to pass the option -psm 5 (page segmentation mode 5, a single block of vertically aligned text) to the command and use jpn_vert.traineddata.gz instead of jpn.traineddata.gz in such cases. Can we make such a modification to TesseractOCR.js? --CES1596 (talk) 01:03, 21 June 2020 (UTC)
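The request above amounts to choosing a different traineddata file and page segmentation mode when the page is vertical. As a rough sketch only: the helper `ocrOptions`, the `VERTICAL_MODELS` table, and the shape of the returned object are hypothetical and are not taken from TesseractOCR.js, whose internals are not shown on this page. The sketch also assumes that PSM 3 (fully automatic segmentation) is an acceptable fallback for horizontal text. The `psm` value would presumably be passed on as Tesseract's `tessedit_pageseg_mode` parameter.

```javascript
// Hypothetical helper, not part of TesseractOCR.js.
// Picks the traineddata name and page segmentation mode (PSM)
// for a given language code and text direction.

const PSM_AUTO = '3';                   // fully automatic page segmentation
const PSM_SINGLE_BLOCK_VERT_TEXT = '5'; // single uniform block of vertically aligned text

// Assumption: only languages listed here have a separate *_vert model available.
const VERTICAL_MODELS = { jpn: 'jpn_vert' };

function ocrOptions(lang, vertical) {
  const hasVerticalModel =
    Object.prototype.hasOwnProperty.call(VERTICAL_MODELS, lang);
  if (vertical && hasVerticalModel) {
    // Vertical page: switch both the model and the segmentation mode.
    return { lang: VERTICAL_MODELS[lang], psm: PSM_SINGLE_BLOCK_VERT_TEXT };
  }
  // Horizontal page, or no vertical model known: keep the plain model.
  return { lang: lang, psm: PSM_AUTO };
}

console.log(ocrOptions('jpn', true));
console.log(ocrOptions('jpn', false));
```

Keeping the choice in one small pure function would let the script expose a single "vertical text" toggle to the user without touching the rest of the recognition code.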