Index talk:Koht jemne mil̦t vьjet.pdf

From Wikisource

Notes

This book uses an alphabet that is mostly Latin, but with some Cyrillic letters mixed in. There is also a letter that looks like a small 8, which I have not been able to find in Unicode. I will transcribe this with ŝ for now, until a better solution is found. Update: a better solution has been found, namely S with oblique stroke (Ꞩ U+A7A8, ꞩ U+A7A9).

The following Cyrillic letters are used mixed in with Latin:

  • в
  • є
  • з
  • ь

Some of them possibly have better-suited Latin script variants. I will research this later. Jon Harald Søby (talk) 12:31, 26 February 2019 (UTC)
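
For anyone proofreading pages of this index, here is a minimal Python sketch of how the transcription notes above could be checked mechanically. It assumes the earlier placeholder ŝ should now become ꞩ (and Ŝ become Ꞩ), and it lists any Cyrillic letters present in a string so they can be compared with the four listed above. The function names `replace_placeholders` and `cyrillic_letters` are made up for illustration and are not part of any Wikisource tool.

```python
import unicodedata

# Assumed mapping: the temporary placeholder (s with circumflex)
# is replaced by S with oblique stroke, in both cases.
PLACEHOLDER_MAP = {
    "\u015D": "\uA7A9",  # ŝ -> ꞩ
    "\u015C": "\uA7A8",  # Ŝ -> Ꞩ
}


def replace_placeholders(text: str) -> str:
    """Swap the placeholder letters for S with oblique stroke."""
    # NFC first, so that "s" + combining circumflex is also caught.
    text = unicodedata.normalize("NFC", text)
    return text.translate(str.maketrans(PLACEHOLDER_MAP))


def cyrillic_letters(text: str) -> list[str]:
    """Return the distinct Cyrillic characters in text, sorted by code point."""
    return sorted(
        {ch for ch in text if unicodedata.name(ch, "").startswith("CYRILLIC")}
    )


if __name__ == "__main__":
    title = "Koht jemne mil̦t vьjet"
    print(cyrillic_letters(title))
    print(replace_placeholders("\u015Da"))
```

Running `cyrillic_letters` over a transcribed page gives a quick way to spot a Cyrillic letter outside the expected set, which would usually indicate a typing slip.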

@Jon Harald Søby: This is the so-called Unified Northern Alphabet. I also have a relevant publication on the topic (An OCR system for the Unified Northern Alphabet). Michael.riessler (talk) 12:25, 13 November 2019 (UTC)
@Michael.riessler: Oh, that is awesome. Do you have any idea how we can apply this OCR in Wikisource? I'm afraid I have no experience with OCR at all, really. Jon Harald Søby (talk) 12:34, 13 November 2019 (UTC)