Wikisource:Wikisource vision development/Wikisource development
Below you will find a list of suggested projects to transform the core values into a reality. Feel free to create subpages, add more ideas, or discuss existing ideas in the corresponding subpages.
High quality digital books
- Book Manager is a proposed feature to bundle pages into a single unit, instead of a scattered and loosely-organized set of pages.
BookManager is a project proposed by User:GorillaWarfare to stabilize and expand Extension:BookManager, so that the pages of a book can be handled as a single unit.
From the proposal page: "This project aims to improve Extension:BookManager so that it can be used to collect a book (from Wikisource, Wikibooks, or similar) into a single unit, instead of a scattered and loosely-organized set of pages.
"There are a number of wikis (for example, Wikisource and Wikibooks) that consist of content that is structured as a book. MediaWiki does not currently have much in the way of support for this structure, and so these wikis are forced to try to adapt the article structure to suit their needs. Wikisource has adapted by using an Index namespace for individual pages, then collecting each chapter (or an entire work, if the work is short) in the main namespace. If the work spans several subpages (in the case of chapters), the pages are linked together using header templates, and an index page of sorts is typically used for a landing page. Wikibooks uses a similar structure, using subpages of an article for each chapter. These adaptations work, but are severely limited and unstandardized.
"Extension:BookManager was created as an attempt to address this issue. The extension currently needs to be stabilized, but it is a good starting point for this project. It can be modified to use a JSON representation of the book, which will neatly collect all the necessary metadata and organizational information. I have created an example at mw:User:GorillaWarfare/Proposal/JSON. This will be editable via a form (see right); users will not need to manipulate the raw JSON. Each book will have a single main page that can be used to interact with the book as a whole—these interactions will include the ability to watchlist an entire book or print/export a book. There are quite a few enhancements that depend on this organizational structure (see Bugzilla), and I hope to tackle some of these as a part of the project."
- Visual index: One of the biggest hurdles for newcomers is getting familiar with the formatting templates that we use on Wikisource. A visual index could ease the learning process.
- Even if the text is standard for all Wikisources, the links to the templates would have to be localized. See also Wikisource:Wikisource common template set.
- If the Visual Editor is implemented, the situation may change and some templates would no longer be necessary.
- Text encoding: It is suggested to add TEI XML markers to complement wikitext/HTML. TEI XML is widely used by the digital humanities community.
This can be done using data-* HTML5 attributes for the TEI tags, either implemented in existing templates or in new ones if necessary. Later on the tags can be used to export as TEI XML.
No XML export format is currently supported. Tpt has drafted an XSLT stylesheet, inspired by the DocBook one of the TEI project: https://github.com/Tpt/tei2wikitext/blob/master/tei2wikitext.xsl The stylesheet only does the basic conversions and needs to be improved. At one time, it could be tested here: https://tei2wikitext.toolforge.org/
The TEI guidelines are huge, so the first step would be to implement a reduced set of the guidelines called TEI Lite.
The other problem that we face is that each Wikisource has its own set of templates, which means that any change or correction has to be synchronized across Wikisources. To minimize the effort, the approach described in Wikisource:Wikisource common template set is suggested.
Moreover, the idea of using TEI tags inside the wikitext is problematic. A better approach would be a separate layer or namespace dedicated to TEI, which would avoid confusing users who are not familiar with it and prevent problems with TEI/XML parsers.
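As an illustration of the data-* approach described above, the following Python sketch turns HTML whose elements carry a TEI element name in an attribute into TEI-style markup. The attribute name data-tei, the class TeiExporter and the function html_to_tei are assumptions made for this example; they are not existing Wikisource templates or tools. The element names lg (line group) and l (verse line) are taken from TEI.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape

# HTML elements that never have a closing tag.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link", "wbr"}


class TeiExporter(HTMLParser):
    """Rewrite elements that carry a data-tei attribute as TEI elements.

    Elements without the attribute are dropped, but their text is kept.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.stack = []  # TEI name for each open HTML element, or None

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        tei = dict(attrs).get("data-tei")
        # Only accept plain alphanumeric names, so that attribute values
        # cannot inject arbitrary markup into the output.
        if tei is not None and not tei.isalnum():
            tei = None
        self.stack.append(tei)
        if tei:
            self.out.append(f"<{tei}>")

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.stack:
            return
        tei = self.stack.pop()
        if tei:
            self.out.append(f"</{tei}>")

    def handle_data(self, data):
        self.out.append(escape(data))


def html_to_tei(html):
    """Return a TEI-style fragment for well-nested HTML input."""
    parser = TeiExporter()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


sample = '<div data-tei="lg"><span data-tei="l">Rough winds do shake</span> <i>aside</i></div>'
tei_fragment = html_to_tei(sample)
```

The sketch assumes well-nested input and produces only a fragment; a real export would also need a TEI header and a mapping for templates that have no data-* attribute.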
- Text exporting: EPUB (already supported), ODT (already supported), TEI (not supported yet). There are XSLT 2.0 specifications to transform TEI XML documents to XHTML, to LaTeX, to XSL Formatting Objects, to OOXML (docx), and to ePub format.
- DjVu viewer: both DjVu and PDF formats are supported, but we don't have a web DjVu viewer. It is suggested to adopt the Internet Archive Book reader.
- Annotations: Wikisource could support an annotation system that would allow readers to comment on texts, serve book quotes on demand, or act as an anchor for Wikipedia citations.
If successful, it could be used as a basis for the Wikisource Annotation system. Afterwards, annotations in Wikisource could be linked/referenced/transcluded in Wikipedia.
- Book authority control: Wikidata will allow us to make our book database available to the world. So that other servers can recognize that a given book is the one they want, we need to use common book identifiers. For modern books that will be the ISBN; for older books we will have to add the OCLC or LCC number to our book data. See Wikidata:Books task force for more information.
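Matching books by identifier only works if the stored identifiers are well formed. As a small illustration, the following Python sketch checks the check digit of ISBN-10 and ISBN-13 values before they are used as matching keys. The function names are made up for this example; the check-digit rules are the standard ISBN ones.

```python
def normalize_isbn(raw):
    """Strip hyphens and spaces, and upper-case a trailing 'x'."""
    return raw.replace("-", "").replace(" ", "").upper()


def is_valid_isbn13(raw):
    """ISBN-13: digits weighted 1, 3, 1, 3, ... must sum to a multiple of 10."""
    s = normalize_isbn(raw)
    if len(s) != 13 or not (s.isascii() and s.isdigit()):
        return False
    total = sum(int(ch) * (1 if i % 2 == 0 else 3) for i, ch in enumerate(s))
    return total % 10 == 0


def is_valid_isbn10(raw):
    """ISBN-10: digits weighted 10, 9, ..., 1 must sum to a multiple of 11.

    The last character may be 'X', which stands for the value 10.
    """
    s = normalize_isbn(raw)
    if len(s) != 10 or not s.isascii():
        return False
    body, check = s[:9], s[9]
    if not body.isdigit() or not (check.isdigit() or check == "X"):
        return False
    values = [int(ch) for ch in body] + [10 if check == "X" else int(check)]
    total = sum((10 - i) * v for i, v in enumerate(values))
    return total % 11 == 0
```

A check like this only catches typing errors; it does not confirm that the identifier belongs to the book in question.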
- Metadata exporting: Wikisource has an OAI-PMH beta export tool which uses the data in the index pages. It will have to be modified to be able to export metadata from Wikidata once the system is ready.
- Catalog browsing: by creating an OPDS catalog (http://opds-spec.org), readers will be able to browse our book collection using apps like MegaReader, PageTurner, etc. Status: drafting stage.
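OPDS catalogs are Atom feeds, so a first draft of the catalog could be generated with standard XML tools. The following Python sketch builds a minimal acquisition feed with one entry per book. The function name build_opds_feed, the dictionary keys and the example.org URLs are placeholders invented for this example, not an existing Wikisource service.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM)  # serialize Atom as the default namespace


def atom(name):
    """Return the namespace-qualified tag name for an Atom element."""
    return f"{{{ATOM}}}{name}"


def build_opds_feed(feed_id, title, updated, books):
    """Build a minimal OPDS acquisition feed and return it as a string.

    `books` is a list of dicts with the hypothetical keys
    'id', 'title', 'author', 'updated' and 'epub_url'.
    """
    feed = ET.Element(atom("feed"))
    ET.SubElement(feed, atom("id")).text = feed_id
    ET.SubElement(feed, atom("title")).text = title
    ET.SubElement(feed, atom("updated")).text = updated

    for book in books:
        entry = ET.SubElement(feed, atom("entry"))
        ET.SubElement(entry, atom("id")).text = book["id"]
        ET.SubElement(entry, atom("title")).text = book["title"]
        ET.SubElement(entry, atom("updated")).text = book["updated"]
        author = ET.SubElement(entry, atom("author"))
        ET.SubElement(author, atom("name")).text = book["author"]
        # The acquisition link tells reading apps where to download the book.
        ET.SubElement(entry, atom("link"), {
            "rel": "http://opds-spec.org/acquisition",
            "type": "application/epub+zip",
            "href": book["epub_url"],
        })

    return ET.tostring(feed, encoding="unicode")


xml_text = build_opds_feed(
    "urn:example:wikisource-catalog",   # placeholder feed identifier
    "Example Wikisource catalog",
    "2013-01-01T00:00:00Z",
    [{
        "id": "urn:example:book:1",
        "title": "Example Book",
        "author": "Example Author",
        "updated": "2013-01-01T00:00:00Z",
        "epub_url": "https://example.org/books/1.epub",
    }],
)
```

A complete catalog would also need navigation feeds and self/start links as described in the OPDS specification; this sketch only covers the per-book entries.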
Increase organization and volunteer participation
More efficient way of working
- Customized Book Uploader: It is not possible to undertake big modifications, but the current Commons uploader allows for some scripting that doesn't need code review.
When users want to upload a book, they have to go through steps designed for uploading pictures, so there are no specific fields for book data. Moreover, the user has to create an "Index:" page for the file with exactly the same data as in Commons, since the data is not propagated.
This project can be deployed as a modification to the UploadWizard:
- On the "Upload" screen there would be another button labeled "Import book", which would offer functionality similar to what the "Internet Archive Import tool" provides now.
- On the "Describe" screen there would be an additional option to import metadata from external sources in MARC, BibJson, or other formats. Currently MARCsman exists as a standalone tool.
- Extra data fields for books
- Store all book metadata in the Commons Template:Book
- Automatically create an "Index:" page on Wikisource with the relevant data.
- FUTURE: When Wikidata is deployed on Commons and Wikisource, create an item in Wikidata with the book metadata and link the fields both in Commons and in Wikisource.
- The changes needed on the UploadWizard need code review.
- Wikidata is still not deployed on sister projects
- Wikisource common template set: If TEI support is wanted, it would be useful to use the same templates in all projects to keep the maintenance effort to a minimum.
Each Wikisource maintains its own templates. Some of them are common, but each one is developed independently. Copying and pasting templates is not always easy, since they depend on other (sometimes commons) templates that again follow different development routes.
Define a set of templates that can be used for all Wikisource projects. The documentation and parameter names would be localized using Extension:Translate. The common set of templates residing in wikisource.org would be synchronized across language wikisources using a bot. Custom-developed templates and common templates could coexist, or the presence of this set could be opted in or out.
Open questions include template naming, migration, etc. Also, instead of modifying these templates locally, changes would have to be made on wikisource.org, so that the other Wikisources would benefit from them too.
- Author creation form (for linking/creating authors in Wikidata): It would be a form to create an Author page and a Wikidata item simultaneously. See WS-IT Author pages (edition mode) for some ideas about that kind of form.
Work with other entities to the same goal
- Measuring tools: after contacting some organizations, some typical questions that arise are: "How do volunteers know which books are most in demand for transcription?" "How do you measure the number of times a transcription has been downloaded or viewed?" We don't have tools to collect that kind of information (yet).
- Open Library partnership: Ongoing conversations about a possible partnership between OL and WS.
Which partnership could be possible
From WS towards OL/IA:
- WS could link OL works to proofread WS editions
- Wikisource could upload the proofread editions back into the Internet Archive. Hurdle: WS has no means (yet) of embedding proofread text back into the DjVu files.
- Books in Commons that are missing in the Internet Archive could be linked from OL or uploaded via Archive-It
From OL/IA towards WS:
- An indication when there is a proofread work in Wikisource.
- An invitation to proofread texts in WS when that book is present.
OL is still building a working community and does not have many resources available, but there is openness to collaboration.
- Google Books dialogue: there has been some dialogue with a Google engineer about ways of collaborating. The biggest stumbling block at the moment for running some experiments is that we don't have a proper way of synchronizing databases. For that it is necessary to have book authority control in place (see above) and to use OCLC or LCC identifiers as matching codes. Wikidata and the Books task force are the key to solving this.
- OCLC connection: Wikipedian in residence Max Klein has offered support to increase interaction with OCLC.
- GLAM partnerships: what do GLAMs seek in Wikisource? What features and improvements do they want? How can we ease collaboration?
- Open Access: Wikisource can host any document published under a free license. But it is not easy to deal with born-digital documents, as the software (Proofread extension) is made to transcribe digitized paper books. Can Wikisource become a more integrated digital library that also stores freely licensed scholarly articles, beyond PDFs? How should HTML or PDF articles be handled? Daniel Mietchen has written a script to convert PubMed articles in HTML, and we could customize it for importing articles into Wikisource.
Write your own
- Alignment tools for aligning different versions of a work in a given language. Imagine having 3 versions of a Shakespeare work in English, each paginated according to its original publication style, and all connected to one another. Alignment is useful both for scanning the same section of different works, and for eventually aligning comments or annotations across editions.
- Translation tools for generating free translations of original works, or aligning existing (partial and complete) translations into a wikisource text. This involves both tools for translating, and tools for aligning translations -- similar to alignment tools within a single language.