Wikisource talk:Google OCR
Where were the Google terms cleared for this usage? There was some discussion on Wikisource-l about the generic Google API terms and the situation was not so clear. Nemo 17:42, 16 September 2016 (UTC)
- @Nemo_bis: That's a good question. I assume this usage was cleared, because Google have donated some number of Vision API requests for this purpose. Whether that extends to other APIs I don't know. It's sounding like it'd be good to be able to get a similar thing going for the Google Drive API (to use its OCR system as well, which appears to be a bit different to the Cloud Vision one). —SWilson (WMF) (talk) 11:20, 18 September 2016 (UTC)
- @Ananth subray: Unfortunately, it doesn't yet support Kannada. :-( The list of available languages is at https://cloud.google.com/vision/docs/languages (there's a note at the top of this wiki page about it). Unsupported languages are: Malayalam, Telugu, Oriya, Gujarati, and Kannada. We're still trying to get Google to use the same OCR engine for this as they use for Google Drive; we'll be sure to post here when we get some news! Sam Wilson 05:25, 19 October 2017 (UTC)
Commons user script
I've been experimenting with an OCR button on Commons that uses this same service: commons:User:Samwilson/GoogleOCR.js. It can be used to populate the inscription template there. Sam Wilson 04:41, 2 August 2018 (UTC)
Raw OCR Results?
I'm working on a project to extract genes and other biological entities from scientific figures/diagrams like this WNT Pathway, and I'd like to feed the OCR results from your tool into my code that identifies bioentities.
Would it be possible to get the raw OCR results from your tool? There are two reasons the raw results would be useful: 1) positional information and 2) better ability to extract bioentities. (Because the text on the diagrams doesn't have the typical page/paragraph/sentence/word structure, the list of words will sometimes split a single bioentity or merge multiple bioentities.) Ariutta (talk) 19:23, 25 June 2019 (UTC)
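To illustrate why positional information matters here, the following is a minimal sketch of how word boxes could be used to re-join labels that a plain word list splits apart. It assumes each OCR word comes with an axis-aligned bounding box; the `Word` class, the `group_labels` function, and the `max_gap` threshold are hypothetical names chosen for this example, and none of it reflects the actual output format of the tool discussed on this page.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One OCR word with an axis-aligned box (hypothetical structure)."""
    text: str
    x0: float  # left
    y0: float  # top
    x1: float  # right
    y1: float  # bottom


def same_line(a: Word, b: Word) -> bool:
    """True if the boxes overlap vertically by more than half the shorter height."""
    overlap = min(a.y1, b.y1) - max(a.y0, b.y0)
    return overlap > 0.5 * min(a.y1 - a.y0, b.y1 - b.y0)


def group_labels(words: list[Word], max_gap: float = 10.0) -> list[str]:
    """Greedily join words that sit on the same line within max_gap pixels.

    max_gap is an assumed tuning parameter, not a value from any real tool.
    """
    groups: list[list[Word]] = []
    for word in sorted(words, key=lambda w: w.x0):  # left to right
        for group in groups:
            last = group[-1]
            if same_line(last, word) and word.x0 - last.x1 <= max_gap:
                group.append(word)
                break
        else:
            # No nearby neighbour on the same line: start a new label.
            groups.append([word])
    return [" ".join(w.text for w in g) for g in groups]


# Made-up coordinates for a pathway-style figure.
words = [
    Word("casein", 0, 0, 30, 10),
    Word("kinase", 34, 1, 64, 11),
    Word("AXIN", 200, 0, 230, 10),
    Word("LRP6", 0, 40, 40, 50),
]
labels = group_labels(words)
print(labels)
```

With these made-up boxes, "casein" and "kinase" are joined into one label because they are close together on the same line, while "AXIN" (far to the right) and "LRP6" (on a lower line) stay separate. A flat word list has no way to make that distinction.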
Using Google OCR for old English text
Hi, I'm running a project to upload 3,000 chapbooks from the National Library of Scotland's digitised collections, and we're interested in using the Google OCR function instead of Tesseract because it identifies the long s (ſ) really well. Do you think this would be an acceptable use? https://en.wikisource.org/wiki/Wikisource:WikiProject_NLS Gweduni (talk) 06:57, 30 April 2020 (UTC)