Wikisource:Wikisource vision development/Wikisource development



Below you will find a list of suggested projects to transform the core values into a reality. Feel free to create subpages, add more ideas, or discuss existing ideas in the corresponding subpages.

High quality digital books

  • Book Manager is a proposed feature to bundle pages into a single unit, instead of a scattered and loosely-organized set of pages.
Mockup of the form that will be used to organize the book

The BookManager is a project proposed by User:GorillaWarfare to stabilize and expand the Extension:BookManager. This extension would bundle pages into a single unit, instead of a scattered and loosely-organized set of pages.

Project summary

From the proposal page: "This project aims to improve Extension:BookManager so that it can be used to collect a book (from Wikisource, Wikibooks, or similar) into a single unit, instead of a scattered and loosely-organized set of pages.

"There are a number of wikis (for example, Wikisource and Wikibooks) that consist of content that is structured as a book. MediaWiki does not currently have much in the way of support for this structure, and so these wikis are forced to try to adapt the article structure to suit their needs. Wikisource has adapted by using an Index namespace for individual pages, then collecting each chapter (or an entire work, if the work is short) in the main namespace. If the work spans several subpages (in the case of chapters), the pages are linked together using header templates, and an index page of sorts is typically used for a landing page. Wikibooks uses a similar structure, using subpages of an article for each chapter. These adaptations work, but are severely limited and unstandardized.

"Extension:BookManager was created as an attempt to address this issue. The extension currently needs to be stabilized, but it is a good starting point for this project. It can be modified to use a JSON representation of the book, which will neatly collect all the necessary metadata and organizational information. I have created an example at mw:User:GorillaWarfare/Proposal/JSON. This will be editable via a form (see right); users will not need to manipulate the raw JSON. Each book will have a single main page that can be used to interact with the book as a whole—these interactions will include the ability to watchlist an entire book or print/export a book. There are quite a few enhancements that depend on this organizational structure (see Bugzilla), and I hope to tackle some of these as a part of the project."

Read more about this proposal on the Google Summer of Code Proposal.
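
As a rough illustration of the JSON representation mentioned in the proposal, here is a minimal Python sketch. Every field name (title, author, sections, page) and both helper functions are hypothetical, chosen only for this example; the schema actually proposed is the one at mw:User:GorillaWarfare/Proposal/JSON.

```python
import json

# All field names below are hypothetical, for illustration only; the
# proposal's real schema is at mw:User:GorillaWarfare/Proposal/JSON.
def build_book(title, author, chapters):
    """Collect a book's metadata and chapter order into one dictionary."""
    return {
        "title": title,
        "author": author,
        "sections": [
            {"number": i, "title": name, "page": f"{title}/{name}"}
            for i, name in enumerate(chapters, start=1)
        ],
    }


def pages_of(book):
    """List every wiki page belonging to the book, main page first."""
    return [book["title"]] + [s["page"] for s in book["sections"]]


book = build_book("Example Book", "Example Author", ["Chapter 1", "Chapter 2"])
serialized = json.dumps(book, indent=2)  # what would be stored for the book
restored = json.loads(serialized)        # what a form or exporter would read
print(pages_of(restored))
```

With the whole book held in one structure like this, operations such as watchlisting or exporting could iterate over a single page list instead of following header templates from page to page.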
  • Visual index: One of the biggest hurdles for newcomers is to get familiar with the formatting templates that we use on Wikisource. A way to ease the learning process could be to create a visual index.
This is a proposal to create a visual index for Wikisource's formatting templates. It would feature a page of lorem ipsum text showing as many formatting features as possible. On mouseover it would highlight the feature, and on click it would redirect to the corresponding template. One existing example of this kind of visual index is the one used by a piece of music-writing software.

How

The simplest way would be a static image with links at certain positions. A more complex way could be an animated SVG (examples on Commons) or a Flash animation that would react on mouseover.

Challenges

  • Even if the text is standard for all wikisources, the links to the templates would have to be localized. See also Wikisource:Wikisource common template set.
  • If the Visual Editor is implemented, the situation may change and some templates might no longer be necessary.


Discuss here

Embrace standards

TEI stands for Text Encoding Initiative, a consortium that maintains the TEI guidelines, first developed in the late 1980s and used by the Perseus Project, the British National Corpus, FreeDict, and many other projects to encode text in an XML-based format. As part of the Wikisource vision development, we are evaluating whether it would make sense to support this format with an export feature and to adapt the guidelines to wikitext in the form of templates where appropriate. Initially we would aim for TEI Lite, and eventually we would add the ability to export ebooks as TEI XML.

How

This can be done using data-* HTML5 attributes for the TEI tags, either implemented in existing templates or in new ones if necessary. Later on the tags can be used to export as TEI XML.
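
A minimal Python sketch of how such an export could work, using only the standard library. The attribute name data-tei and the class name TeiExporter are assumptions made for this example, not an agreed convention; the sketch also assumes well-formed HTML in which every opening tag has a matching closing tag.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class TeiExporter(HTMLParser):
    """Rebuild a TEI element tree from HTML elements carrying data-tei.

    "data-tei" is a hypothetical attribute name. Assumes every opening
    tag has a matching closing tag (no bare void elements such as <br>).
    """

    def __init__(self):
        super().__init__()
        self.root = ET.Element("body")
        self.stack = [self.root]  # currently open TEI elements
        self.marks = []           # per open HTML tag: did it open a TEI element?

    def handle_starttag(self, tag, attrs):
        name = dict(attrs).get("data-tei")
        if name:
            self.stack.append(ET.SubElement(self.stack[-1], name))
        self.marks.append(bool(name))

    def handle_endtag(self, tag):
        if self.marks and self.marks.pop():
            self.stack.pop()

    def handle_data(self, data):
        current = self.stack[-1]
        if len(current):  # text after a closed child goes into its tail
            last = current[-1]
            last.tail = (last.tail or "") + data
        else:
            current.text = (current.text or "") + data


html = '<div data-tei="p">Call me <span data-tei="hi">Ishmael</span>.</div>'
exporter = TeiExporter()
exporter.feed(html)
exporter.close()
tei = ET.tostring(exporter.root, encoding="unicode")
print(tei)  # <body><p>Call me <hi>Ishmael</hi>.</p></body>
```

HTML elements without the attribute are dropped from the output while their text is kept, so presentational wrappers added by templates would not leak into the TEI.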

Current situation

There is no XML export format supported. Tpt has made a draft of an XSLT stylesheet, inspired by the DocBook one from the TEI project: https://github.com/Tpt/tei2wikitext/blob/master/tei2wikitext.xsl The stylesheet performs only the basic conversions and needs to be improved. At one time, it could be tested here: http://tools.wmflabs.org/tei2wikitext/

Difficulties

The TEI guidelines are huge, therefore the first step would be to implement a reduced set of the guidelines called TEI Lite.

The other problem that we face is that each Wikisource has its own set of templates, which means that any change or correction would have to be synchronized across Wikisources. To minimize the effort, the suggested approach is: Wikisource:Wikisource common template set.

Moreover, the idea of using TEI tags inside the wikitext is problematic. The best approach would be to provide a separate layer or namespace dedicated to TEI, to avoid confusing ordinary users (who are not familiar with TEI) and to avoid problems with TEI/XML parsers.

Discuss here
  • Text exporting: EPUB (already supported), ODT (already supported), TEI (not supported yet). There are XSLT 2.0 specifications to transform TEI XML documents to XHTML, to LaTeX, to XSL Formatting Objects, to OOXML (docx), and to ePub format.
  • DjVu viewer: both DjVu and PDF formats are supported; however, we don't have a web-based DjVu viewer. It is suggested to adopt the Internet Archive BookReader.
The Internet Archive BookReader is an open-source book viewer developed in HTML/JavaScript by the Internet Archive. It currently serves as the front end for millions of books on archive.org and is also used by other organizations. It has been suggested to support it on either Wikisource or Commons to provide a better experience when reading DjVu files, which it might support (though not natively, according to the 2011 docs). As of 2016, the Internet Archive is no longer interested in using DjVu files, at least for itself; it is not clear whether other users of the BookReader have adapted it to use DjVu files directly. As of 2015, there are no plans for PDF.js to support DjVu. It may be easier to convert Magnus' book2scroll into a MediaWiki extension.
  • Annotations: Wikisource could support an annotation system that would allow users to comment on texts, serve book quotes on demand, or act as an anchor for Wikipedia citations.
An annotation is a note that is made while reading any form of text. This may be as simple as underlining or highlighting passages. There are no standards in this field, but there have been some developments already (Open Annotation Data Model). WS could implement a solution like hypothes.is, Textus, or develop its own.

Related to this, there is a GSoC proposal by User:Rjan that aims to integrate the OKFN Annotator as a MediaWiki extension: http://annotator.wmflabs.org/wiki/Main_Page

If successful, it could be used as a basis for the Wikisource Annotation system. Afterwards, annotations in Wikisource could be linked/referenced/transcluded in Wikipedia.

Discuss here
  • Book authority control: Wikidata will allow us to make our book database available to the world. So that other servers can recognize that one of our books is the same one they are looking for, we need to use common book identifiers. For modern books that will be the ISBN; for older books we will have to add an OCLC or LCC number to our book data. See Wikidata:Books task force for more information.
  • Metadata exporting: Wikisource has an OAI-PMH beta export tool which uses the data in the index pages. It will have to be modified to be able to export metadata from Wikidata once the system is ready.
  • Catalog browsing: by creating an OPDS catalog (http://opds-spec.org), readers will be able to browse our book collection using apps like MegaReader, PageTurner, etc. Drafting stage.
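
To give an idea of what an OPDS catalog involves: it is an Atom feed in which each entry links to a downloadable file. The Python sketch below builds such a feed with the standard library. The function name opds_feed, the dictionary keys, and the example.org URLs are placeholders invented for this example; a real catalog would also need the remaining elements required by the OPDS specification, such as self and start links and per-entry update times.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM)  # serialize Atom as the default namespace


def opds_feed(feed_id, title, updated, books):
    """Build a bare-bones acquisition feed; a sketch, not a complete catalog."""
    feed = ET.Element(f"{{{ATOM}}}feed")
    ET.SubElement(feed, f"{{{ATOM}}}id").text = feed_id
    ET.SubElement(feed, f"{{{ATOM}}}title").text = title
    ET.SubElement(feed, f"{{{ATOM}}}updated").text = updated
    for book in books:
        entry = ET.SubElement(feed, f"{{{ATOM}}}entry")
        ET.SubElement(entry, f"{{{ATOM}}}id").text = book["id"]
        ET.SubElement(entry, f"{{{ATOM}}}title").text = book["title"]
        author = ET.SubElement(entry, f"{{{ATOM}}}author")
        ET.SubElement(author, f"{{{ATOM}}}name").text = book["author"]
        # The acquisition link tells reading apps where to fetch the file.
        ET.SubElement(entry, f"{{{ATOM}}}link", {
            "rel": "http://opds-spec.org/acquisition",
            "type": "application/epub+zip",
            "href": book["epub"],
        })
    return feed


# Placeholder data; the URLs and identifiers are not real endpoints.
books = [{
    "id": "urn:example:book:1",
    "title": "Example Book",
    "author": "Example Author",
    "epub": "https://example.org/books/1.epub",
}]
feed = opds_feed("urn:example:catalog", "Example catalog",
                 "2013-01-01T00:00:00Z", books)
xml_text = ET.tostring(feed, encoding="unicode")
print(xml_text)
```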

Increase organization and volunteer participation

What do you think about creating a Wikisource User Group?

More efficient way of working

  • Customized Book Uploader: It is not possible to undertake big modifications, but the current Commons uploader allows for some scripting that doesn't need code review.
The book uploader is a proposed customization of Extension:UploadWizard intended to meet the needs of the Wikisource community. This proposal is part of the Wikisource vision development.

Current situation

When a user wants to upload a book, he or she has to go through steps designed for uploading pictures, so there are no specific fields for book data. Moreover, the user has to create an "Index:" page for the file with exactly the same data as on Commons, since the data is not propagated.

Proposed project

This project could be deployed as a modification to the UploadWizard:

  • On the "Upload" screen there would be another button labeled "Import book", which would provide functionality similar to that of the current "Internet Archive Import tool".
  • On the "Describe" screen there would be an additional option to import metadata from external sources in MARC, BibJSON, or other formats. Currently MARCsman exists as a standalone tool.
  • Add extra data fields for books.
  • Store all book metadata in the Commons Template:Book.
  • Automatically create an "Index:" page on Wikisource with the relevant data.
  • FUTURE: When Wikidata is deployed on Commons and Wikisource, create an item in Wikidata with the book metadata and link the fields in both Commons and Wikisource.

Risks

  • The changes needed in the UploadWizard require code review.
  • Wikidata is not yet deployed on sister projects.


Discuss here
  • Wikisource common template set: If TEI is a desired option, it would make sense to use the same templates in all projects to keep the maintenance effort to a minimum.
As part of the Wikisource vision development, we are considering a set of templates common to all Wikisources. This would make it possible to keep all templates up to date and bug-free with minimal effort. It can be combined with a migration to Lua scripting and to TEI. Compare with Wikisource:Shared Scripts.

Current situation

Each Wikisource maintains its own templates. Some of them are common to several projects, but each one is developed independently. Copying and pasting templates is not always easy, since they depend on other (sometimes common) templates, which in turn follow different development routes.

Proposal

Define a set of templates that can be used for all Wikisource projects. The documentation and parameter names would be localized using Extension:Translate. The common set of templates residing in wikisource.org would be synchronized across language wikisources using a bot. Custom-developed templates and common templates could coexist, or the presence of this set could be opted in or out.

Difficulties

Template naming, migration, etc. In addition, instead of being modified locally, these templates would have to be changed on wikisource.org so that the other Wikisources would benefit from the changes too.

Discuss here
  • Author creation form (for linking/creating authors in Wikidata): This would be a form to create an Author page and a Wikidata item simultaneously. See WS-IT Author pages (edition mode) for some ideas about that kind of form.

Work with other entities to the same goal

  • Measuring tools: after contacting some organizations, some typical questions that arise are: "How do volunteers know which books are most in demand for transcription?" "How do you measure the number of times a transcription has been downloaded or viewed?" We don't have tools to collect that kind of information (yet).
  • Open Library partnership: Ongoing conversations about a possible partnership between OL and WS.
Open Library is a project of the non-profit Internet Archive. Open Library claims to have 6 million authors and 25 million edition records, with about three million readable books available as scanned or born-digital e-books.

Possible forms of partnership

From WS towards IA:

  • WS could link OL works to proofread WS editions.
  • Wikisource could upload the proofread editions back into the Internet Archive. Hurdle: WS has no means (yet) of embedding proofread text back into the DjVu files.
  • Books in Commons that are missing from the Internet Archive could be linked from OL or uploaded via Archive-It.

From OL/IA towards WS:

  • An indication when there is a proofread work in Wikisource.
  • An invitation to proofread texts in WS when that book is present.

Status

OL is building a working community. Not many resources are available, but there is openness to collaboration.

Discuss here
  • Google Books dialogue: there has been some dialogue with a Google engineer about ways of collaborating. The biggest stumbling block for running experiments at the moment is that we have no proper way of synchronizing databases; for that it is necessary to have book authority control in place (see above) and to use OCLC or LCC identifiers as matching codes. Wikidata and the Books task force are the key to solving this.
  • OCLC connection: Wikipedian in residence Max Klein has offered support to increase interaction with OCLC.
  • GLAM partnerships: what do GLAMs seek in Wikisource? What features and improvements do they want? How can we ease collaboration?
  • Open Access: Wikisource can host any document published under a free license. However, it is not easy to deal with born-digital documents, as the software (the Proofread Page extension) is made for transcribing digitized paper books. Can Wikisource become a more integrated digital library that also stores freely licensed scholarly articles, beyond PDFs? How should HTML or PDF articles be handled? Daniel Mietchen has written a script to convert PubMed articles in HTML, and we could customize it for importing articles into Wikisource.

Write your own

  • Alignment tools for aligning different versions of a work in a given language. Imagine having three versions of a Shakespeare work in English, each paginated according to its original publication and all connected to one another. Alignment is useful both for comparing the same section across different editions and for eventually aligning comments or annotations across editions.
  • Translation tools for generating free translations of original works, or for aligning existing (partial and complete) translations with a Wikisource text. This involves both tools for translating and tools for aligning translations, similar to alignment tools within a single language.
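
As a very small illustration of the alignment idea, the following Python sketch pairs up the lines of two editions using the standard library's difflib module. The function name align and the sample lines are made up for this example; a real tool would need fuzzier matching and would work on paginated texts rather than plain lists of lines.

```python
from difflib import SequenceMatcher


def align(lines_a, lines_b):
    """Pair up lines of two editions; None marks a line missing on one side."""
    pairs = []
    matcher = SequenceMatcher(a=lines_a, b=lines_b, autojunk=False)
    for tag, a1, a2, b1, b2 in matcher.get_opcodes():
        left, right = lines_a[a1:a2], lines_b[b1:b2]
        if tag == "equal":
            pairs.extend(zip(left, right))
        else:
            # Changed, inserted, or deleted lines: pair them by position
            # and pad the shorter side with None.
            for i in range(max(len(left), len(right))):
                pairs.append((left[i] if i < len(left) else None,
                              right[i] if i < len(right) else None))
    return pairs


# Made-up sample text standing in for two editions of the same passage.
edition_a = ["line one", "line two", "line three"]
edition_b = ["line one", "line 2", "line three", "line four"]
aligned = align(edition_a, edition_b)
for a, b in aligned:
    print(repr(a), "<->", repr(b))
```

Once lines are paired in this way, an annotation attached to a line in one edition could be carried over to its counterpart in another.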