Wikisource:Changing the main page


Layout

Here is a proposal concerning the layout: Main Page alt.

It is independent of the choice of a ranking criterion.

The idea is to simplify the layout and improve legibility. In addition, only the top 10 languages are sorted.

Ranking

Currently the main page shows the top ten Wikisource subdomains by article count. Some members of the community have expressed concern about whether that is the most suitable way to sort projects.

This page lists the pros and cons of some options. Edit it only if you have new options to propose or additional pros and cons to add. For comments, questions etc., please see the talk page. For the vote page, see Wikisource:Votes.


Question: For simplicity's sake, shall we say that this concerns only the top ten around the logo? Dovi 14:32, 10 May 2009 (UTC)

In my opinion it makes no sense to have two counting systems at the same time. ThomasV 12:44, 2 June 2009 (UTC)

"Article" count

  • Pros:
    • Automatic update within the wiki.
  • Cons:
    • Some subdomains have custom namespaces for information about authors, portals etc.; others have none and host those pages in the main namespace. Subdomains without custom namespaces can therefore have more "article" pages than those with custom namespaces.
    • It is easy to inflate the "article" count:
      • by splitting large works over more than one "article" page; all Wikisources do this. A book with 30 chapters will result in at least 31 "articles". 30 books with 30 chapters each will result in at least 930 "articles";
      • by following the page breaks of modern editions of public domain works (the Spanish Wikisource does this for some works). A book with 300 pages will result in at least 301 "articles". 30 books of 300 pages each will result in at least 9030 "articles";
      • by mass bot additions of content (the Portuguese Wikisource is importing a public domain dictionary with more than 100-200k entries, one entry per "article").
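
The arithmetic behind these figures can be sketched as follows. This is only an illustration: the function name is hypothetical, and the extra page per book (for example a title or index page) is an assumption used to reproduce the "at least 31" figure above.

```python
def min_article_count(books: int, subpages_per_book: int) -> int:
    """Lower bound on "articles" when every book is split into subpages.

    Assumption: each book contributes one page per chapter (or per
    printed page) plus one top-level page, hence the "+ 1".
    """
    return books * (subpages_per_book + 1)


# 30 books split by chapter, 30 chapters each
by_chapter = min_article_count(30, 30)   # 930
# 30 books split by printed page, 300 pages each
by_page = min_article_count(30, 300)     # 9030
# The same 30 books kept on one page each
unsplit = min_article_count(30, 0)       # 30

print(by_chapter, by_page, unsplit)
```

The same 30 books thus count as 30, 930 or 9030 "articles" depending only on how they are split.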

Number of words [1]

  • Pros:
    • In theory, it may really reflect the content of a Wikisource. But are the figures good, and could the number of words in "Proofread" pages be added? Enmerkar 18:55, 11 March 2009 (UTC)
  • Cons:
    • The length and number of words used for an equivalent sentence vary between languages.
    • It is difficult to separate readable words from undecipherable words in Proofread pages; only the readable words ought to be added.
    • If transcluded words are accounted for, adding Proofread pages' words would count them twice.

ProofreadPage Statistics - All pages

  • Pros:
    • Readers can know how many scans they will find, but they must be told that scans are images, not texts.
  • Cons:
    • It is easy to inflate the ProofreadPage namespace by using a bot to extract text from .djvu files (like the ones provided by the Internet Archive).
    • Not all documents come from scanned printed books. Not all projects use the built-in proofread extension for most of their documents.
    • Readers cannot know whether they will find readable texts or undecipherable texts.
    • Uncorrected pages that can't be read ought not to be taken into account since they failed to achieve their purpose.

ProofreadPage Statistics - Proofread pages

  • Pros:
    • Reflects the real work of contributors.
  • Cons:
    • RTL bug; lack of OCR in many languages; plenty of other kinds of quality work not counted.
    • As above: not all documents come from scanned printed books, and not all projects use the built-in proofread extension for most of their documents. Therefore this statistic does not reflect the real work of contributors.

ProofreadPage Statistics - Validated pages

  • Pros:
    • Reflects the real work of contributors.
  • Cons:
    • As above: RTL bug; lack of OCR in many languages; plenty of other kinds of quality work not counted.
    • As above: not all documents come from scanned printed books, and not all projects use the built-in proofread extension for most of their documents. Therefore this statistic does not reflect the real work of contributors.

Page views

  • Pros:
    • Simple and objective.
    • Already used at Wikipedia.
    • Does not depend on the particularities of each language (such as word size, compactness, etc.).
    • Does not depend on the technical tools used or not used by subdomains (ProofreadPage).
    • Page views yield results that are much more stable than criteria based on counting content, which are subject to an "arms race".
    • Traffic on a subdomain reflects its success and impact among the general public. This success depends on both content and quality.
  • Cons:
    • Does not necessarily represent community and content for smaller languages.
    • I suppose it isn't possible to measure how long visitors stay on a page, but isn't there a difference between an entire novel on one page and a two-line poem? Ought this not to be accounted for, if some means were found to measure it?
    • Would user pages and discussion pages be counted too, or only the main namespace?

MB's in the database

  • Pros:
    • Can be found easily in [1] (or measured as the size of the DB dump).
  • Cons:
    • Includes all (old) revisions of all pages, incl. talk pages etc.

Total size of articles

Possibly not only in the main namespace, but also in specified namespaces such as Author or Page.

  • Pros:
    • Size of all important content.
  • Cons:
    • How can it be found simply?
    • If the value of a domain is measured by its megabytes only, any text will count, without discrimination. Accumulating old phone books will suffice to fill the library!

Activity

(How could it be measured?)

  • Pros:
  • Cons:
    • Our visitors might be more interested in knowing what they will find in the library than in knowing how much effort has gone into building it.

Complex criteria

The top ten Wikisources on the Main Page would be selected by combining several rankings into one final rank. A separate ranking would be made for each measure, and each project would receive from 0 to 10 points per ranking (1st place = 10 points, 11th place and below = 0 points). Finally, the points would be summed to give the final rank, and the first ten projects would be placed around the Wikisource logo on the main page. If two or more projects had the same total, one of the statistics (or the higher number of 1st, 2nd... places, as in Formula 1) could decide.

This calculation would be performed 2–4 times a year. The measures could be:

  1. article count,
  2. average size of article (this ranking effectively lowers the influence of mass storage of very short articles, as the Portuguese Wikisource now does),
  3. one of proofread statistics,
  4. number of active (or very active) users, e.g. in last year,
  5. page views.
  • Pros:
    • Includes not only size (through article count), but also quality (through proofread statistics), activity (through number of active users) and significance (through page views).
  • Cons:
    • Gives weight to things that have zero or close to zero value at some wikis: Proofread page, page views.
    • "Significance" is not a real "pro" because a highly significant collection of literature for a small language will still have far fewer views than an insignificant collection for a large language.
    • Complex criteria are difficult to understand and to accept. There will be fights over the right balance, and suspicion that the choices are ad hoc. ThomasV 20:44, 16 March 2009 (UTC)
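
The point scheme described above could be sketched roughly as follows. This is not an agreed implementation: the function names and the sample figures are hypothetical, higher values are assumed to be better for every measure, and the points for 2nd to 10th place (9 down to 1) are an assumption, since the proposal only fixes 1st place at 10 points and 11th place and below at 0.

```python
def points_for_place(place: int) -> int:
    """1st place = 10 points, 11th and below = 0 (linear in between, assumed)."""
    return max(0, 11 - place)


def combined_ranking(measures: dict) -> list:
    """Return up to ten projects ordered by summed points.

    `measures` maps a measure name to {project: value}, higher is better.
    Equal totals are decided Formula 1 style: more 1st places wins,
    then more 2nd places, and so on. Ties inside a single measure are
    not handled specially (a simplification).
    """
    totals = {}
    places = {}
    for values in measures.values():
        ordered = sorted(values, key=values.get, reverse=True)
        for place, project in enumerate(ordered, start=1):
            totals[project] = totals.get(project, 0) + points_for_place(place)
            places.setdefault(project, []).append(place)

    max_place = max((len(v) for v in measures.values()), default=0)

    def sort_key(project):
        countback = tuple(
            places[project].count(p) for p in range(1, max_place + 1)
        )
        return (totals[project], countback)

    return sorted(totals, key=sort_key, reverse=True)[:10]


# Hypothetical figures for three imaginary subdomains
example = {
    "article_count": {"aa": 500, "bb": 300, "cc": 100},
    "page_views": {"aa": 10, "bb": 90, "cc": 50},
}
print(combined_ranking(example))  # ['bb', 'aa', 'cc']
```

In this made-up example "aa" leads on article count but comes last on page views, so "bb" (9 + 10 = 19 points) ends up ahead of "aa" (10 + 8 = 18 points).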

Moderately complex criteria

The top ten Wikisources on the Main Page would be selected by combining several rankings into one final rank. A separate ranking would be made for each measure, and each project would receive from 0 to 10 points per ranking (1st place = 10 points, 11th place and below = 0 points). Finally, the points would be summed to give the final rank, and the first ten projects would be placed around the Wikisource logo on the main page. If two or more projects had the same total, one of the statistics (or the higher number of 1st, 2nd... places, as in Formula 1) could decide.

This calculation would be performed 2–4 times a year. The measures could be:

  1. article count,
  2. average size of article (this ranking effectively lowers the influence of mass storage of very short articles, as the Portuguese Wikisource now does),
  3. number of active (or very active) users, e.g. in last year,
  4. variety of articles: whether wide or narrow (centuries, genres, target readers). Would the number of categories do for this measure?
  • Pros:
    • Takes account of both content and community activity, without giving undue weight to any single factor that could be misleading (e.g. not only the number of articles).
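
As a rough illustration of why measure 2 counteracts mass imports of very short articles, consider the sketch below. All figures are invented for the example, and the function name is hypothetical.

```python
def average_article_size(total_bytes: int, article_count: int) -> float:
    """Mean article size in bytes; 0.0 for an empty project."""
    return total_bytes / article_count if article_count else 0.0


# Invented starting point: 10,000 articles totalling 50 MB
before = average_article_size(50_000_000, 10_000)

# Invented import: 100,000 dictionary entries of about 200 bytes each
after = average_article_size(
    50_000_000 + 100_000 * 200,
    10_000 + 100_000,
)

print(before)        # 5000.0
print(round(after))  # 636
```

Under these assumptions the article count grows elevenfold while the average size falls from 5000 to about 636 bytes, so a gain in the article-count ranking would be offset by a loss in the average-size ranking.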

Notes

  1. See comments here