User:Benwbrum/Cuneiform Perl Scripts

From Wikisource


I'm trying to develop some Perl or sed scripts for processing transliterated cuneiform, as found in the Old Hittite and Codex Hammurabi articles.

The constraints are as follows:

  1. Convert input Wikisource to output Wikisource
  2. Render all output as 7-bit ASCII, with special characters HTML-encoded.
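The second constraint can be sketched in Perl roughly as follows. This is a minimal illustration, not one of the actual scripts: the helper name `encode_non_ascii` is hypothetical, it assumes the input has already been decoded into Perl character strings, and the choice of decimal numeric references (rather than named entities) is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every character above U+007F with a
# decimal numeric character reference, leaving plain ASCII untouched.
# Assumes $text is a decoded character string, not raw bytes.
sub encode_non_ascii {
    my ($text) = @_;
    $text =~ s/([^\x00-\x7F])/sprintf('&#%d;', ord($1))/ge;
    return $text;
}

print encode_non_ascii("\x{0161}a"), "\n";   # prints &#353;a
```

Because the result contains only ASCII characters, it can be pasted back into a wiki page regardless of the encoding used by the editor or terminal.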

The goals are as follows:

  1. Convert bad source encodings such as "0xab" to good source encodings like «
  2. Convert ASCII encodings like $ to the standard ANE representation, such as š (HTML-encoded in the output)
  3. Convert 2 and 3 signs to the accented forms (e.g. u3 becomes ù)
  4. Add subscripts to other numbered signs (e.g. ma4 becomes ma<sub>4</sub>)
  5. Add superscripts to determinatives (e.g. DINGIR becomes <sup>DINGIR</sup> or <sup>d</sup>)
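Goals 2 through 4 might be approached along the following lines. This is only a sketch under stated assumptions: the helper name `convert_signs` is hypothetical, only lowercase signs are handled, and placing the accent on the first vowel of a sign is an assumption (the list above only gives the u3 example). Goals 1 and 5 are not covered here.

```perl
use strict;
use warnings;

# Decimal character references for accented vowels.
my %acute = (a => '&#225;', e => '&#233;', i => '&#237;', u => '&#250;');
my %grave = (a => '&#224;', e => '&#232;', i => '&#236;', u => '&#249;');

# Hypothetical helper: apply goals 2-4 to one line of transliteration.
sub convert_signs {
    my ($line) = @_;

    # Goals 3 and 4: index 2 gives an acute accent, index 3 a grave accent
    # (on the first vowel, by assumption); any other index, or a sign with
    # no vowel to accent, gets a subscript instead.
    $line =~ s{\b([a-z\$]+)(\d+)\b}{
        my ($sign, $index) = ($1, $2);
        my $table = $index == 2 ? \%acute : $index == 3 ? \%grave : undef;
        unless ($table && $sign =~ s/([aeiu])/$table->{$1}/) {
            $sign .= "<sub>$index</sub>";
        }
        $sign;
    }ge;

    # Goal 2: ASCII "$" stands for s-caron, emitted as a character reference.
    $line =~ s/\$/&#353;/g;
    return $line;
}

print convert_signs('$u2-ma4 u3'), "\n";
# prints &#353;&#250;-ma<sub>4</sub> &#249;
```

The numbered signs are processed before the `$` replacement so that the digits inside `&#353;` are never mistaken for a sign index.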


o User:Benwbrum/Cuneiform Perl Scripts/Hittite Test 1


o User:Benwbrum/Cuneiform Perl Scripts/Hittite Cleaning Script
o User:Benwbrum/Cuneiform Perl Scripts/Akkadian Consonant Script