Wikisource:TEI


TEI stands for Text Encoding Initiative, an initiative (now organised as a consortium) that has maintained the TEI guidelines since 1987. The guidelines are used by the Perseus Project, the British National Corpus, FreeDict, and many other projects to encode text in an XML-based format. As part of developing the Wikisource vision, we are evaluating whether it would make sense to support this format with an export feature, and to adapt the guidelines to wikitext in the form of templates where that is appropriate. Initially we would aim for TEI Lite, and eventually we would add a function to export ebooks as TEI XML.

How

This can be done by using HTML5 data-* attributes to carry the TEI tags, implemented either in existing templates or in new ones where necessary. An exporter can later read these attributes to produce TEI XML.
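As a rough illustration of the idea, the following Python sketch walks rendered HTML and builds a TEI fragment from the elements that carry a TEI annotation. The attribute names `data-tei` and `data-tei-rend`, the class `TeiExporter`, and the function `html_to_tei` are hypothetical; this page does not define a naming scheme. The sketch assumes well-formed HTML and omits the `teiHeader` that a valid TEI document requires.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical attribute names; no naming scheme has been agreed on.
TEI_ATTR = "data-tei"            # names the TEI element, e.g. data-tei="head"
TEI_REND_ATTR = "data-tei-rend"  # optional value for the TEI rend attribute
VOID_TAGS = {"br", "hr", "img"}  # HTML tags that never have a closing tag


class TeiExporter(HTMLParser):
    """Collect HTML elements annotated with data-tei into a TEI <body> tree."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.body = ET.Element("body")
        self.current = self.body  # innermost open TEI element
        # One entry per open HTML tag: the parent to restore, or None if untagged.
        self.stack = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        name = attrs.get(TEI_ATTR)
        if name is None:
            self.stack.append(None)  # plain HTML wrapper, no TEI element
            return
        elem = ET.SubElement(self.current, name)
        if attrs.get(TEI_REND_ATTR):
            elem.set("rend", attrs[TEI_REND_ATTR])
        self.stack.append(self.current)
        self.current = elem

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.stack:
            return
        parent = self.stack.pop()
        if parent is not None:
            self.current = parent

    def handle_data(self, data):
        children = list(self.current)
        if children:
            # Text after a closed child belongs to that child's tail.
            children[-1].tail = (children[-1].tail or "") + data
        else:
            self.current.text = (self.current.text or "") + data


def html_to_tei(html: str) -> str:
    """Return a TEI XML fragment (no teiHeader) for the annotated HTML."""
    parser = TeiExporter()
    parser.feed(html)
    parser.close()
    tei = ET.Element("TEI", {"xmlns": "http://www.tei-c.org/ns/1.0"})
    ET.SubElement(tei, "text").append(parser.body)
    return ET.tostring(tei, encoding="unicode")


sample = (
    '<div><h2 data-tei="head">Chapter I</h2>'
    '<p data-tei="p">It was a <i data-tei="hi" data-tei-rend="italic">dark</i>'
    ' night.</p></div>'
)
print(html_to_tei(sample))
```

Untagged wrappers such as the outer `div` are skipped while their text and annotated descendants are kept, so templates could be annotated incrementally without breaking the export.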

Current situation

No XML export format is currently supported. Tpt has drafted an XSLT stylesheet, inspired by the DocBook stylesheet of the TEI project: https://github.com/Tpt/tei2wikitext/blob/master/tei2wikitext.xsl The stylesheet performs only basic conversions and needs to be improved. At one time, it could be tested here: https://tei2wikitext.toolforge.org/
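To give a sense of what a "basic conversion" from TEI to wikitext might look like, here is a minimal Python sketch. It is not a port of the stylesheet above and does not reflect its actual rules; the function names and the three mappings chosen (`head` to a level-2 heading, `p` to a paragraph, `hi` with `rend="italic"` or `rend="bold"` to wiki emphasis) are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET


def local(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.split("}")[-1]


def inline(elem: ET.Element) -> str:
    """Render an element's text and children as inline wikitext."""
    parts = [elem.text or ""]
    for child in elem:
        inner = inline(child)
        if local(child.tag) == "hi":
            rend = child.get("rend")
            if rend == "italic":
                inner = "''" + inner + "''"
            elif rend == "bold":
                inner = "'''" + inner + "'''"
        parts.append(inner)
        parts.append(child.tail or "")  # text following the child element
    return "".join(parts)


def tei_to_wikitext(xml: str) -> str:
    """Convert headings and paragraphs of a TEI document to wikitext."""
    root = ET.fromstring(xml)
    blocks = []
    for elem in root.iter():  # document order
        name = local(elem.tag)
        if name == "head":
            blocks.append("== " + inline(elem).strip() + " ==")
        elif name == "p":
            blocks.append(inline(elem).strip())
    return "\n\n".join(blocks)


sample = (
    '<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>'
    '<head>Chapter I</head>'
    '<p>It was a <hi rend="italic">dark</hi> night.</p>'
    '</body></text></TEI>'
)
print(tei_to_wikitext(sample))
```

Elements outside these three mappings contribute only their text, which is roughly what one would expect from a first draft that still needs improvement.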

Difficulties

The TEI guidelines are huge, so the first step would be to implement TEI Lite, a reduced subset of the guidelines.

The other problem we face is that each Wikisource has its own set of templates, which means that any change or correction would have to be synchronized across all Wikisources. To minimize the effort, the following approach is suggested: Wikisource:Wikisource common template set.

Moreover, using TEI tags directly inside the wikitext is problematic. The best approach would be to provide a separate layer or namespace dedicated to TEI, which would avoid confusing ordinary users (who are not familiar with TEI) and avoid problems with TEI/XML parsers.

Discuss here