Wikisource:ProofreadPage


This page provides information about the ProofreadPage extension and how it is used at Wikisource. It is aimed mainly at admins configuring their wiki and at users who already know how to use the extension. If you are a beginner looking for a general introduction, see en:Help:Proofread.

Use the discussion page to report bugs, request features you would like to see, and take part in the discussions.


Todolist

  • <div class="pagetext"> should not contain the header and footer fields, only the transcluded part of pages. This is needed for two-column books. It requires updating all pages in the database; for the moment, a JavaScript workaround is used.
  • <div class="pagetext"> should be followed by \n
  • float images in paragraph: do not use \n to glue pages. Use ' ' in some cases?
    • This breaks template:nop on en, which protects the last linefeed on a page from being removed. Replacing '\n' with ' ' generates the sequence "\n<space>first line on the next page" in this case, which MediaWiki handles as a <pre>. See [1], where &#32; is proposed to work around this problem.
  • deprecate pagenum template
  • layout integration: remove containers
  • two columns : use magic templates instead of JavaScript classes
  • The code generated by transclusion through <pages/> is wrapped in a <div></div>. This causes problems when several <pages/> tags are needed on one page without enclosing each part in an HTML block tag. The code shows that adding the div is intentional:
// wrap the output in a div, to prevent the parser from inserting paragraphs
$out = "<div>\n$out\n</div>";
$out = $parser->recursiveTagParse( $out );
return $out;

There are two ways to fix it: remove the outer <div> at the return point, or find out whether there is a better way to avoid this <p> insertion.

  • Empty pages are not displayed in the ribbon displaying the proofread status of transcluded pages. This misleads the reader into thinking that the status of the current text is good, even if it's incomplete. In toc mode, empty pages are correctly displayed.
Possible solution: fix the way the total page count is computed in non-toc mode
array( 'templatelinks', 'page' ),
array( 'COUNT(page_id) AS count' ),
array( 'tl_from' => $id, 'tl_namespace' => $page_ns_index ),
__METHOD__,
null,
array( 'page' => array( 'LEFT JOIN', 'page_title=tl_title AND page_namespace=tl_namespace' ) )
The LEFT JOIN part filters out non-existing pages. The query without the LEFT JOIN was tried on the Toolserver and returned the right number of pages.
  • (low priority and probably not necessary) add a {{{source}}} parameter for the MediaWiki:Proofreadpage_header_template template containing the name of the index page. This could be useful to insert links to the index page in the template, but may be redundant with the source tab. Zaran 13:00, 31 August 2011 (UTC)
  • page numbering: in some cases the page number is missing for the last page (an example on la: [2]). The problem appears to be caused when pages are skipped.
  • This appears to be caused by the last page number (which is processed first): the difference between its y-position and the initial y-position is computed as negative. You can solve this by adding a dummy page number at the end of the page, processing the page numbers, and removing it again. An example replacement for init_page_numbers() can be found here. If the initial y-position could be computed correctly in the first place, that would be better. Inductiveload 05:04, 9 September 2011 (UTC)
I see, you add a dummy #pagenum span acting as a guard when calculating the offset of page_numbers(). I see the initial call to refresh_pagenumbers() is protected, but refresh_pagenumbers() is also called when the layout changes. Is it bugged in that case too, or is the initial setup sufficient? Phe 08:57, 9 September 2011 (UTC)
It was bugged: when changing the layout, the last page number was lost again. I applied what you did, but directly in refresh_pagenumbers() rather than in init_page_numbers(). Thanks for figuring out a fix for this longstanding problem. Phe 17:22, 9 September 2011 (UTC)
But layout 3 is still broken on la: Phe 18:32, 9 September 2011 (UTC)
Fixed now [3]. Phe 19:22, 9 September 2011 (UTC)
  • Yet another problem with page numbers [4]: on pages 63 to 66, the page numbers overlap in Chrome on Mac and Linux (14.0.835.186, 64-bit, on Linux). Not yet checked whether the last change described above is related. Phe 13:00, 1 October 2011 (UTC)

MediaWiki update 18/04/2012

New options to improve the transclusion of multi-page books (with a .djvu or .pdf file):
step
Transclude only one page in every n. For example, <pages from=1 to=10 step=2 /> shows the 1st, 3rd, 5th, 7th and 9th pages.
exclude
Do not include the listed pages. For example, <pages from=1 to=10 exclude="2-5,9" /> shows the 1st, 6th, 7th, 8th and 10th pages.
include
Include the listed pages. For example, <pages include="2-5,9" /> shows the 2nd, 3rd, 4th, 5th and 9th pages.

All of these attributes can, of course, be combined on the same tag. For example, <pages from=1 to=10 include="31" exclude="2-4" step="2" /> shows the 1st, 5th, 7th, 9th and 31st pages.

MediaWiki update 03/10/2011

MediaWiki update 16/02/2011

  • Special:PagesWithoutScans (not deployed yet)
  • New syntax for DoubleWiki: matching phrases can be added inside the "align" template. see example
  • Index pages can have a CSS field[1]. It contains classes to be added to the pages. example: book with two columns. The <div class="pagetext"> is hidden.
  • TOC pages : when <pages/> is used without "from" and "to" a proofreading status indicator for the whole book is displayed, and the table of contents of the book is transcluded from the index page. example1, example2
  • The <ref> tag accepts a "follow" parameter: <ref follow="blah">. Use it for footnotes spread over multiple pages: use <ref name="blah"> for the first part of the footnote and <ref follow="blah"> for the subsequent parts. There is no change to how footnotes are collected on the target page.
  • To optimize speed and bandwidth, the default width of the image in edit mode is set to 1000 pixels. For books that need more resolution, add a pixel value to the "width" field in Proofreadpage_index_template. Here is an example
  • Pages that have quality=0 are transcluded without page number by the <pages/> command. This allows you to transclude pages of a multilingual book without the line breaks induced by q0 pages.
  • Wheel zoom is fixed.

MediaWiki update 09/04/2010

  • New zoom : click in the image to activate the zoom. Then drag the mouse to select a rectangular area to be zoomed into.
  • The Vector skin is now supported
  • Pagecounts in Special:Indexpages are now correctly updated. Index pages that have a wrong pagecount may still need to be purged.
  • A new bug appeared that causes the toolbar to disappear. It is caused by a change in skins/common/edit.js, which resulted in an incompatibility. The bug is fixed in svn, but the fix still needs to be activated. In the meantime, the bug can be worked around by choosing to display headers and footers by default (set proofreadpage_default_headers to true in your local JavaScript, or check the corresponding option in your preferences). Status: fixed.
  • The <pages/> command now requires "from" or "to" parameters in order to transclude pages. Without one of those parameters, it can be used to display a header.

MediaWiki update 24/09/2009 - summary of changes

The code was updated today. The following new features are available:

  • A coloured proofreading status indicator is displayed in the main namespace, under the title of pages that use transclusion. It shows the proofreading status of transcluded pages from the "Page" namespace. It also contains a (hidden) backlink to the index page, that can be captured by local javascript.
  • A new special page is available at Special:IndexPages. It displays a similar proofreading indicator for index pages. Indexes are sorted using the number of pages proofread and validated. Most advanced projects are displayed first. Index pages will need to be purged in order to appear in the special page.
  • The <pages> command can display a header, with up-to-date citation information about the transcluded book. This information is extracted from the fields of the index page. The header itself is a template in the MediaWiki namespace that needs to be set up by sysops. It is possible to pass a local parameter to this template, so that users can define several headers with different styles.
  • Headers generated by the <pages> command can also display navigation links (that is, 'next chapter', 'previous chapter'). The navigation links are found using the list of links to ns-main that are on the index page. The pages may be renamed without updating the navigation links.
  • The <pages> command can perform section transclusion, for the first and last page of the displayed interval. It uses parameters "fromsection" and "tosection".
  • The "Related changes" links now work correctly with index pages that use <pagelist/>
  • The edit window of the "page" namespace sends a form with three textboxes (header, text, footer) to the server, instead of relying on JavaScript to concatenate them. This should fix some browser compatibility issues. The PageQuality template was replaced with a parser hook.

The following features were postponed :

  • Text layer extraction from pdf files.
  • Configurable page headers and footers are further delayed.


MediaWiki update 15/06/2009 - summary of changes

The code was successfully updated. To update your browser, type Ctrl-Shift-R.

The following new features are active :

  • New page status for pages that do not require double proofreading. (see below)
  • Text layer extraction from djvu/pdf file (see below)
  • The old zoom was restored for view mode
  • New zoom for the edit window : use mouse wheel to zoom, and mouse drag to move the image.
  • Configurable Headers and Footers
  • The "pagelist" command now accepts "from ... to" parameters. (see below)
  • New pages command for easy transclusion of a series of pages (see below)
  • Edit options can be set as gadgets


Text layer extraction from djvu/pdf file

DjVu and PDF files may contain a text layer, typically for the OCR text. This text is extracted when a page is edited for the first time, and added to the edit window.

Examples
Configuration

The file description page might need to be purged if the djvu/pdf file was uploaded before the feature was added.

Configurable Headers and Footers

The default content of page headers and footers can be configured in Mediawiki:Proofreadpage_default_header and Mediawiki:Proofreadpage_default_footer.

In addition, this default value can be adapted to each book. For this, admins need to add 'header' and 'footer' fields to the index pages.

Proofreading path

ProofreadPage has five quality levels:

  • Main path: not yet created page → Not proofread → Proofread → Validated
  • Other levels: Without text, Problematic

The <pagelist/> tag

Used on index pages, to display links to pages. The name of the index page must match the name of the djvu/pdf file.

Syntax
<pagelist from="X" to="Y" Z="foo" AtoB="bar" />

where X, Y, Z, A, B are page numbers

The from=... to=... parameters define an interval of pages. Example:

<pagelist from=10 to=100 />

The AtoB=... parameter applies a style to an interval of pages. Style parameters may also be applied to a single page.

It is also possible to apply different styles to odd and even pages with AtoBeven=... and AtoBodd=.... For example the following code:

<pagelist 1to7odd="normal" 1to7even="-" />

will produce: 1 - 3 - 5 - 7

Available styles are:

  • normal: Arabic numerals (1, 2, 3 etc.)
  • roman: lower-case Roman numerals (i, ii, iii etc.)
  • highroman: upper-case Roman numerals (I, II, III etc.)
  • folio: counts every leaf (folio) instead of page. The front side of the leaf is labelled r for "recto", the back side v for "verso". So 1to4=folio gives: 1r 1v 2r 2v
  • folioroman: as the above, but with lower-case Roman numerals (ir iv iir iiv)
  • foliohighroman: as the above, but with upper-case Roman numerals (Ir Iv IIr IIv)
  • empty: the current numeral is kept, but the hyperlink to the page is removed, thus appearing as plain text.
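
The numbering styles above can be illustrated with a small Python sketch. It is not the extension's code: the names to_roman and format_label are hypothetical, and the folio rule (two pages per leaf, odd counter values are the recto) is inferred from the 1to4=folio example. The empty style is left out because it changes only the link, not the numeral.

```python
def to_roman(n: int) -> str:
    """Convert a positive integer to lower-case Roman numerals."""
    pairs = [
        (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
        (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
        (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i"),
    ]
    out = []
    for value, numeral in pairs:
        while n >= value:
            out.append(numeral)
            n -= value
    return "".join(out)


def format_label(counter: int, style: str) -> str:
    """Format a page counter according to one of the pagelist styles."""
    if style == "normal":
        return str(counter)
    if style == "roman":
        return to_roman(counter)
    if style == "highroman":
        return to_roman(counter).upper()
    if style in ("folio", "folioroman", "foliohighroman"):
        # Assumption: counters 1 and 2 share leaf 1, counters 3 and 4 leaf 2, etc.
        leaf = (counter + 1) // 2
        side = "r" if counter % 2 == 1 else "v"
        if style == "folio":
            base = str(leaf)
        elif style == "folioroman":
            base = to_roman(leaf)
        else:
            base = to_roman(leaf).upper()
        return base + side
    raise ValueError(f"unknown style: {style}")


print([format_label(n, "folio") for n in range(1, 5)])
```

Run as written, the last line prints the labels for 1to4=folio from the list above.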

Other strings are passed to the link. Non-numeric page numbers included in this way should always be enclosed in " " quotation marks, and must not conflict with the style names.

Certain characters cannot be included, for example "[", "]", "[[", "]]", "{{", "}}", "{", "}", "(", ")" and ".", amongst others, as these can be confused either with wiki markup or with the internal syntax used to construct the page numbering displayed next to transcluded text.

Using long non-numeric strings (or strings containing spaces) as page numbers is strongly discouraged.

Example:

<pagelist 1to10="roman" 11="Foreword"/>

In this example, '1to10' is an interval, and 11 is a single page.

It is possible to define overlapping intervals, or to modify a single page within an interval. Example :

<pagelist 1to5="empty" 3to10="roman"  />

Counters : if a numeric parameter is applied to a page number, it resets the page counter. Example :

<pagelist 1to10="roman" 11=1 />
Examples


If a series of A=B values (or ranges) is provided, all definitions, including the ranges that define styles, must be given in ascending order. Out-of-order definitions are not reliably processed.
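
The counter reset described in this section can be read as shifting the displayed number by a fixed offset from the reset page onward. The following Python sketch illustrates that reading only; the helper name page_numbers is hypothetical, styles are ignored, and the assumption that later pages continue counting from the reset value is not confirmed by the extension's source.

```python
def page_numbers(total_pages: int, resets: dict[int, int]) -> list[int]:
    """Return the displayed number for each physical page 1..total_pages.

    `resets` maps a physical page to the number the counter restarts at,
    as in <pagelist 11=1 />, which corresponds to {11: 1}.
    """
    numbers = []
    offset = 0
    for page in range(1, total_pages + 1):
        if page in resets:
            # The counter restarts here; later pages keep the same offset.
            offset = resets[page] - page
        numbers.append(page + offset)
    return numbers


# Physical pages 1-10 keep their own numbers, page 11 is shown as 1.
print(page_numbers(13, {11: 1}))
```

Processing the pages strictly in ascending order, as this loop does, also mirrors the advice above to give definitions in ascending order.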

The <pages/> tag

This command transcludes a series of pages from an index. It also inserts links between pages, with the page numbers taken from the index page.

Syntax

With djvu/pdf indexes, parameters should be integers :

<pages index="foo.djvu" from=100 to=200 />

With other indexes, parameters should be page names:

<pages index=foo from=foo_page1.jpg to=foo_page15.jpg />

Section transclusion is possible for the first and last page:

<pages index="foo.djvu" from=100 to=200 fromsection=section2 tosection=section1 />

Section transclusion can be applied to all pages too (cannot be used with fromsection and tosection):

<pages index="foo.djvu" from=100 to=200 onlysection=english />

Options to improve the transclusion of multi-page books (with a .djvu or .pdf file):
step
Transclude only one page in every n. For example, <pages from=1 to=10 step=2 /> shows the 1st, 3rd, 5th, 7th and 9th pages.
exclude
Do not include the listed pages. For example, <pages from=1 to=10 exclude="2-5,9" /> shows the 1st, 6th, 7th, 8th and 10th pages.
include
Include the listed pages. For example, <pages include="2-5,9" /> shows the 2nd, 3rd, 4th, 5th and 9th pages.

All of these attributes can, of course, be combined on the same tag. For example, <pages from=1 to=10 include="31" exclude="2-4" step="2" /> shows the 1st, 5th, 7th, 9th and 31st pages.
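
As a rough illustration of how step, exclude and include combine, here is a Python sketch that reproduces the four examples above. The function names are hypothetical, and the order of operations (step over the from/to range, then exclude, then include) is an assumption chosen to match those examples; the extension itself may proceed differently.

```python
from typing import Optional


def parse_num_list(spec: str) -> list[int]:
    """Parse a list such as "2-5,9" into [2, 3, 4, 5, 9]."""
    pages = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-", 1)
            pages.extend(range(int(start), int(end) + 1))
        else:
            pages.append(int(part))
    return pages


def select_pages(from_page: Optional[int] = None, to_page: Optional[int] = None,
                 step: int = 1, exclude: str = "", include: str = "") -> list[int]:
    """Return the sorted page numbers selected by the given attributes."""
    selected = set()
    if from_page is not None and to_page is not None:
        # Assumed order: apply step to the from/to range first.
        selected.update(range(from_page, to_page + 1, step))
    selected.difference_update(parse_num_list(exclude))
    selected.update(parse_num_list(include))
    return sorted(selected)


print(select_pages(1, 10, step=2, exclude="2-4", include="31"))
```

The final call corresponds to the combined example and yields pages 1, 5, 7, 9 and 31 under the assumed order.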

Note: Filename components need to be wrapped in "quotation marks" if they contain spaces, or else the spaces in the filenames need to be replaced with underscores (_). Quotation marks also must be used if the filename contains a non-ASCII character.

Configuration

The template Mediawiki:Proofreadpage_pagenum_template is inserted before each transcluded page. It is used to display page numbers, in the text or in the margin. It accepts 3 parameters : 'page' for the page, 'num' for the page number (given as text only), 'formatted' for a "nice-to-show version" of the page number (which may include html tags). Example

Note: This transclusion method by default inserts a space between each page and the next. It is possible to change this configuration by setting the "page separator" variable (example of such request). If the page separator is a space, it is not (yet) possible to divide a word across two pages and have it displayed correctly. The recommendation is not to divide words.
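
The effect of the page separator on a word divided across two pages can be seen in a short Python sketch. It only models the joining of page texts; join_pages is a hypothetical helper, not part of the extension.

```python
def join_pages(pages: list[str], separator: str = " ") -> str:
    """Glue the text of consecutive pages with the given separator."""
    return separator.join(pages)


pages = ["the word trans-", "clusion is split"]

# With a space as separator, the divided word keeps a gap.
print(join_pages(pages))

# With an empty separator, the halves meet, but so would two whole
# words at any other page boundary.
print(join_pages(pages, separator=""))
```

This is why the recommendation is not to divide words: no single separator suits both cases.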

User options

The following options can be made available in the user's preferences, as gadgets:

The following option is available in the user's preferences:

  • Show the headers/footers in the edit window (in Preferences/Editing). Name in software : proofreadpage-showheaders

Configuring index pages

Index pages can be configured by modifying two templates :

In addition, some fields of the index page can be passed to the headers/footers. They must be indicated in

For language interwiki

About journal issues and partial publication

It is not a good idea to create an index page for a few pages of a book, or for a few pages of a journal issue. Another person might create another index with other pages from the same journal issue, and might not know that another index already exists for the same book.

If you want to publish pages from a journal issue, please name the index after the journal, not after the author of the article you are publishing.

If you create a djvu/pdf file, try to create a djvu/pdf of the whole book/issue, even if you are planning to publish only a few pages from that issue. You should not worry that the index pages will look unfinished. Centralizing all the pages of a given book/journal issue will help users who publish excerpts from the same book/issue.

Headers and Navigation

The 'pages' command can generate headers automatically. For this the command must include a "header" parameter.

Example

fr:La Petite Dorrit/Tome 2/Chapitre 5

The header is defined in MediaWiki:Proofreadpage_header_template. It is a template that reads parameters extracted from the index page. In addition, it can provide navigation links, with the following parameters:

{{{prev}}}, {{{current}}}, {{{next}}}

In order to find the previous and next chapters, the index page is used as a table of contents. All links from the index page to the main namespace are interpreted as 'chapters', except the first one, which is expected to belong to the "title" field. (Note: if your wiki does not have an author namespace, this will not work, because the links to author/translator pages will thus be wrongly interpreted as chapters.)
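
The chapter lookup described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the extension's code: chapter_navigation is a hypothetical name, the index page's main-namespace links are given as a plain list in page order, and the page titles are made up.

```python
from typing import Optional


def chapter_navigation(index_links: list[str], current: str) -> dict[str, Optional[str]]:
    """Find the previous and next chapters of `current`.

    `index_links` holds the links from the index page to the main
    namespace; the first one is taken to belong to the "title" field.
    """
    chapters = index_links[1:]
    if current not in chapters:
        return {"prev": None, "current": current, "next": None}
    i = chapters.index(current)
    prev_chapter = chapters[i - 1] if i > 0 else None
    next_chapter = chapters[i + 1] if i + 1 < len(chapters) else None
    return {"prev": prev_chapter, "current": current, "next": next_chapter}


links = ["Book", "Book/Chapter 1", "Book/Chapter 2", "Book/Chapter 3"]
print(chapter_navigation(links, "Book/Chapter 2"))
```

The sketch also shows the failure mode mentioned in the note: any extra main-namespace link, such as an author page, would be counted as a chapter.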

All parameters defined in MediaWiki:Proofreadpage js attributes are passed to the header template. Additionally, you can pass any named parameter to this template with <pages index="..." my_parameter=value />; such parameters need to be handled by the template. The same mechanism can be used to override a parameter value, e.g. <pages index="..." Author=value /> replaces the default value that the extension gets from the Index page.

Page numbers are also available:

{{{from}}}, {{{to}}}

Finally, the value assigned to the "header" parameter is available as :

{{{value}}}

This can be combined with parser functions, in order to define several styles of headers.

The extension treats a call to <pages index="" /> without from and to parameters as a special case: {{{value}}} is set to toc and the TOC is transcluded from the Index: page.

Proofreading status indicator

A coloured proofreading status indicator is displayed in the main namespace, under the title of pages that use transclusion. It shows the proofreading status of transcluded pages from the "Page" namespace. Here is what it looks like:

In this example the text is 40% validated, 30% proofread, 25% raw, and 5% of the transcluded pages are problematic. It also contains a (hidden) backlink to the index page, that can be captured by local javascript.
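
The percentages in such an indicator are simply the share of transcluded pages at each quality level. A minimal Python sketch, assuming the status of each transcluded page is already known; the function name and the status strings are hypothetical:

```python
from collections import Counter


def status_percentages(statuses: list[str]) -> dict[str, float]:
    """Return the percentage of pages at each proofreading status."""
    total = len(statuses)
    if total == 0:
        return {}
    counts = Counter(statuses)
    return {status: 100.0 * n / total for status, n in counts.items()}


# 20 transcluded pages matching the example above
# ("not_proofread" is what the text calls raw).
statuses = (["validated"] * 8 + ["proofread"] * 6
            + ["not_proofread"] * 5 + ["problematic"] * 1)
print(status_percentages(statuses))
```

Note that the result depends on which pages are counted in the total, which is the issue raised in the todo list about empty pages.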

This indicator is defined by a system message, Mediawiki:Proofreadpage_quality_template, and it can be configured by admins.

In the Swedish Wikisource, similar bar graphs are also generated by the template Statusstapel, for example on sv:Wikisource:Statistik.

Special:IndexPages

This special page lists index pages and their proofreading status. Index pages that were created before the introduction of this feature need to be purged in order to be displayed in the list.

Pages are ordered using the following criterion : 2*(#validated) + (#proofread). This is intended to reflect the number of proofreading actions. In the future more options will be available.
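
The ordering criterion can be written out as a short Python sketch. The index names and counts below are invented for illustration, and index_score is a hypothetical helper; only the formula 2*(#validated) + (#proofread) comes from the text.

```python
def index_score(validated: int, proofread: int) -> int:
    """Score an index page: a validated page counts as two proofreading actions."""
    return 2 * validated + proofread


# (name, validated pages, proofread pages), made-up sample data.
indexes = [
    ("Index:A.djvu", 10, 50),   # score 70
    ("Index:B.djvu", 40, 0),    # score 80
    ("Index:C.djvu", 0, 60),    # score 60
]

# Highest score first, so the most advanced projects come first.
ranked = sorted(indexes, key=lambda row: index_score(row[1], row[2]), reverse=True)
print([name for name, _, _ in ranked])
```

Because a validated page has been read twice, an index with fewer touched pages can still rank above one with more pages that are only proofread.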

  1. add it to Mediawiki:Proofreadpage_index_template, Mediawiki:Proofreadpage_js_template