User:Benwbrum/Cuneiform Perl Scripts

From Wikisource

Goals

I'm trying to develop some Perl or sed scripts for processing transliterated cuneiform, as found in the Old Hittite and Codex Hammurabi articles.

The constraints are as follows:

  1. Take Wikisource wikitext as input and produce Wikisource wikitext as output.
  2. Render all output as 7-bit ASCII, with special characters HTML-encoded.
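
The second constraint can be sketched as a single substitution. This is only a minimal illustration, assuming the input has already been decoded into Perl characters and that decimal numeric character references are acceptable output; the helper name to_ascii_entities is hypothetical and does not come from any of the scripts listed on this page.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every character outside 7-bit ASCII
# with a decimal HTML numeric character reference.
sub to_ascii_entities {
    my ($text) = @_;
    $text =~ s/([^\x00-\x7F])/sprintf('&#%d;', ord($1))/ge;
    return $text;
}

# U+0161 is s-caron, decimal 353.
print to_ascii_entities("\x{0161}a-ru-um"), "\n";
```

Run as written, this prints `&#353;a-ru-um`. Text that is already pure ASCII passes through unchanged, so the substitution is safe to apply as a final step.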

The goals are as follows:

  1. Convert bad source encodings such as "0xab" to good source encodings like &laquo; («)
  2. Convert ASCII encodings like $ to standard ANE representation like &scaron; (š)
  3. Convert signs with index 2 or 3 to the accented forms (e.g. u3 becomes ù)
  4. Add subscripts to other numbered signs (e.g. ma4 becomes ma<sub>4</sub>)
  5. Add superscripts to determinatives (e.g. DINGIR becomes <sup>DINGIR</sup> or <sup>d</sup>)
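
The five goals above might be combined along the following lines. This is a rough sketch under several assumptions that this page does not confirm: the literal text "0xab" stands for the left guillemet, index 2 takes an acute accent and index 3 a grave accent on the first lowercase vowel of the sign, determinatives are written in curly braces in the input (as in {d}), and subscripts and superscripts are emitted as sub and sup tags. The names %accent, index_sign and convert_line are hypothetical.

```perl
use strict;
use warnings;

# Assumed mapping: index 2 = acute accent, index 3 = grave accent.
my %accent = (
    a => { 2 => '&aacute;', 3 => '&agrave;' },
    e => { 2 => '&eacute;', 3 => '&egrave;' },
    i => { 2 => '&iacute;', 3 => '&igrave;' },
    u => { 2 => '&uacute;', 3 => '&ugrave;' },
);

# Goals 3 and 4: accent the first lowercase vowel for index 2 or 3,
# otherwise keep the index as a subscript. Signs with no lowercase
# vowel (e.g. uppercase Sumerograms) also fall through to a subscript.
sub index_sign {
    my ($base, $index) = @_;
    if (($index eq '2' || $index eq '3') && $base =~ /[aeiu]/) {
        $base =~ s/([aeiu])/$accent{$1}{$index}/;
        return $base;
    }
    return "$base<sub>$index</sub>";
}

sub convert_line {
    my ($line) = @_;
    $line =~ s/0xab/&laquo;/g;                           # goal 1 (assumed literal text)
    $line =~ s/([A-Za-z\$]+)(\d+)/index_sign($1, $2)/ge; # goals 3 and 4
    $line =~ s/\{([^}]+)\}/<sup>$1<\/sup>/g;             # goal 5 (assumed {…} input)
    $line =~ s/\$/&scaron;/g;                            # goal 2, lowercase only
    return $line;
}

print convert_line('{d}UTU $a-ru-um u3 ma4'), "\n";
```

Run as written, this prints `<sup>d</sup>UTU &scaron;a-ru-um &ugrave; ma<sub>4</sub>`. The order of the substitutions matters: indices are handled before the $ replacement so that the vowels inside the inserted entity names are never mistaken for part of a sign.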


Tests

o User:Benwbrum/Cuneiform Perl Scripts/Hittite Test 1

Scripts

o User:Benwbrum/Cuneiform Perl Scripts/Hittite Cleaning Script
o User:Benwbrum/Cuneiform Perl Scripts/Akkadian Consonant Script

Problems