Wikisource talk:ProofreadPage

From Wikisource

Feature requests

Q: Do we want another pagequality level for pages that are incomplete[1]?

  1. incomplete = text not completely transcribed

I think it would be useful. --Zyephyrus 14:42, 10 May 2009 (UTC)
Incomplete or incompletely proofed? Or either/both? In any case, I'd use it. --Spangineer 23:22, 12 May 2009 (UTC)
It might mean either one or both: it would mean that the page requires some action from the user, whatever that action is. --Zyephyrus 06:03, 13 May 2009 (UTC)
Incomplete means that the text is only partially transcribed. It seems to me that something such as "incompletely proofed" would do more harm than good, because it is useless for other users to know that you have incompletely proofed a page if they do not know which part you proofed. ThomasV 08:33, 13 May 2009 (UTC)
OK. In the long run, part of a page might be marked as not proofread and another part as proofread, so the page would be incomplete too; but I understand that this is not a feature for now. Sorry if I have distorted what you meant. --Zyephyrus 09:16, 13 May 2009 (UTC)
I want this feature and think it would give new transcribers a great deal of guidance as to where they can quickly begin improving a page. Aharonium (talk) 21:20, 16 June 2014 (UTC)

"Without text"

I've been playing around with bulk validation of blank pages these last few days. The idea is that you don't need to read a blank page to validate it; in most cases it suffices to glance at a small thumbnail of the page image to confirm that, yes, it really is blank. So I wrote a script to make me a gallery of images of unvalidated, allegedly blank pages.

One thing I have learned from this is that there are a lot of allegedly blank pages that are in fact not blank. The most common cause is pages with a picture but no text—since there is no text to extract from the djvu, the bots mark those pages as blank. It is also not uncommon for our bots to mark text pages as blank, because the OCR failed for that page. And I found one case where every page of an entire book is tagged as blank, because a bot tried to upload OCR text from a djvu that had no text layer.

My conclusion is that allegedly blank pages do need to go through a validation process just like every other page. It worries me a little that the new proofreading system shunts blank pages off into a separate class, making it impossible to distinguish between allegedly blank pages and definitely blank pages. Could we think a little more about this change before we go live with it?

Hesperian 01:31, 28 May 2009 (UTC)

Robots are not supposed to modify the status of a page.
The 'without text' status will have to be set manually, and the corresponding button will be accessible from any other state.
If robots become a problem, I might block robot edits that attempt to modify page status.
ThomasV 23:05, 29 May 2009 (UTC)
I think you are saying that "Without text" will be like "Validated as blank". That is okay. Hesperian 03:55, 3 June 2009 (UTC)

Sorting out pages that I can validate per project

I have been working on the enWS project of the month, doing lots of the first proofreading and some of the validation. From the proofreading status at en:s:Index:Omnibuses and Cabs.djvu, I am finding it hard to determine which pages I can validate and which ones I proofread myself. It would be nice to have a way to highlight the pages that I can go and validate. At the moment, the only way I can tell is to open the page in the Page: namespace and see whether the option to validate exists.

To expand that thought a little: it would be nice to have a way to monitor work on an Index: namespace project, to see what is happening to the subpages of a project, and to see at a glance how many pages of a work are validated, how many still need to be done, etc. This would give us a reasonable basis for a completion schedule. I have some good underlying thoughts on what would be useful, but am unsure what is technically feasible, and especially what is easily and quickly feasible. Thx. --Billinghurst 00:39, 18 June 2009 (UTC) (PS. ThomasV. Your extension is bloody marvellous!)

These are interesting ideas.
I agree, it would be very useful to see which pages one has proofread oneself. This is not possible currently, because the identity of the proofreader is not stored in the database, but only in the text of the pages. In the future I plan to store it in the database, and what you describe will become possible. This modification, however, will require a schema change (a modification of the database), and it is not likely to happen soon. In the meantime, I can only suggest proceeding in reading order, so that you remember which pages you proofread :-) [ThomasV, 05:37, 18 June 2009 (UTC)]
Boo, you are no fun whatsoever! Order smorder :-P [Billinghurst 11:58, 18 June 2009 (UTC)]
Concerning your second suggestion, Zyephyrus added an 'rc' link to some indexes, where you can see the changes made to the pages. However, it requires a complicated template that has to be built manually. I was also planning to create a special page where index pages are listed with detailed page counts. This too will require a schema change, though.
ThomasV 05:37, 18 June 2009 (UTC)
Even a page with a count of the total number of pages in a work and a summary of their status would help:
Work's name | Total pages | Validated | Proofread | Problematic | Blank

followed by a list of works (all or some). It wouldn't need to be a dynamic list; it could be updated daily by a cron job whenever a total changes. [Billinghurst 11:58, 18 June 2009 (UTC)]
The rc link is in all the recent PotMs: look here, on the right, above the Contents box. Does this meet your needs? --Zyephyrus 09:37, 18 June 2009 (UTC)
Better than nothing, and I think that I scrambled through them. -- Billinghurst 11:58, 18 June 2009 (UTC)
Billinghurst, do you mean something like this one made by Kipmaster on --Zyephyrus 12:27, 19 June 2009 (UTC)
Yes, Zeph. Something along those lines. However, I was thinking automated rather than manual. -- Billinghurst 15:40, 19 June 2009 (UTC)
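The daily per-work report requested above could be produced by a single aggregate query once page statuses live in the database (the schema change ThomasV mentions). A minimal sketch in Python with sqlite3, assuming a hypothetical `page_status` table with one row per Page: page; the table, its columns and the status labels are illustrative, not the extension's actual schema.

```python
import sqlite3

# Hypothetical status labels, matching the columns requested above.
STATUSES = ("Validated", "Proofread", "Problematic", "Blank")


def status_summary(conn):
    """Return (work, total, validated, proofread, problematic, blank) per work."""
    # One conditional SUM per status; the status values are bound as parameters.
    per_status = ", ".join(
        "SUM(CASE WHEN status = ? THEN 1 ELSE 0 END)" for _ in STATUSES
    )
    query = (
        f"SELECT index_name, COUNT(*), {per_status} "
        "FROM page_status GROUP BY index_name ORDER BY index_name"
    )
    return conn.execute(query, STATUSES).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Hypothetical table: one row per Page: page of a work.
    conn.execute("CREATE TABLE page_status (index_name TEXT, page INTEGER, status TEXT)")
    conn.executemany(
        "INSERT INTO page_status VALUES (?, ?, ?)",
        [
            ("Omnibuses and Cabs.djvu", 1, "Validated"),
            ("Omnibuses and Cabs.djvu", 2, "Proofread"),
            ("Omnibuses and Cabs.djvu", 3, "Blank"),
        ],
    )
    for row in status_summary(conn):
        print(row)  # ('Omnibuses and Cabs.djvu', 3, 1, 1, 0, 1)
```

A cron job could run such a query once a day and write the rows out as the table sketched above.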

Assorted wishes

From bawolff:

Off the top of my head: handle namespaces sanely; it should work out of the box (no hidden steps like importing userland JS or setting up namespaces); some of the page status data should be in either the page_props table or somewhere else in the DB so it's queryable; most of the JS needs a thorough security review.
(That's from my vague memory of looking at the extension once, about a year ago, so things could have changed.)

Preview in Third Column

To help with "Show preview" before saving the page, it would be helpful for editors to see the formatted, unsaved text as a third column next to the scanned image. Currently, I see the preview above the ProofreadPage gadget before saving. There are many times when I miss a formatting error that only becomes clear after saving, when the formatted version appears next to the scanned image.

Given the number of wide-screen monitors in use, this should be feasible, and a nice way to use all of those extra pixels "in the gutter". Add a gadget option to show the edit pane, scan pane and a new preview pane side by side by side. The content need only be updated after pressing "Show preview", not in real time. Is there a way to get the same result with CSS and div tags, without changing the code of ProofreadPage (if the above is difficult to do inside the code)? - DutchTreat (talk) 12:08, 1 August 2014 (UTC)

Sorting wheat and chaff

Is there a ready means to find out which pages have been proofread and validated but not transcluded into the main namespace? Here I am thinking of a way to check, work by work, that all pages have been transcluded, or at least a sanity check that any pages not transcluded were not omitted by accident.

I see this being approached in two ways:

  1. From a work's Index page, when we are working on the work and want to check its transclusion into the main namespace. Especially relevant for those pulling a work together.
  2. From the perspective of random pages that have been proofread or validated and should be transcluded. It is often the case that works are checked casually from Recent Changes, and one can never be certain of their status.

I am wondering whether this could be a report via Special:IndexPages, or a subset/drill-down from that page. There are a number of ways I can think of interrogating things when wearing the janitorial hat.

-- Billinghurst 07:39, 19 October 2009 (UTC)

There's currently no way to do this on the wiki; you could get this information from the toolserver, though. ThomasV 09:03, 19 October 2009 (UTC)
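Once the two lists are available (each page's status, and which pages are transcluded into the main namespace), the sanity check itself is a set difference. A minimal sketch, assuming both inputs have already been fetched into plain Python data; the function name and the status labels are illustrative.

```python
# Statuses that mean a page is finished and should appear in the main namespace.
# "Without text" pages are deliberately excluded: blank pages need not be transcluded.
DONE = {"Proofread", "Validated"}


def untranscluded(statuses: dict[int, str], transcluded: set[int]) -> list[int]:
    """Return page numbers that are proofread or validated but not transcluded."""
    return sorted(
        page
        for page, status in statuses.items()
        if status in DONE and page not in transcluded
    )


if __name__ == "__main__":
    statuses = {1: "Without text", 2: "Validated", 3: "Proofread", 4: "Not proofread"}
    print(untranscluded(statuses, transcluded={3}))  # [2]
```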

Bug reports

HTML comments ruin Index: page layout

In normal wiki code, you can insert HTML comments within a table. On Index: pages, however, such comments ruin the table layout. I guess the problem is being caused by the "!" in "<!--". For an example, see here. Hesperian 23:50, 20 May 2009 (UTC)

You cannot insert HTML comments inside a tag. By the way, pagelist does not output a table. ThomasV 23:29, 29 May 2009 (UTC)
Template:MediaWiki:Proofreadpage index template outputs a table, so the HTML comment is indeed within a table. But, as you say, in my diff it is also within a tag. Sorry to have bothered you. Hesperian 03:58, 3 June 2009 (UTC)

Error: index expected

Hello. I've uploaded File:Un_polític_desgraciat_(1911).djvu to Commons, made the transcriptions on the s:ca:Llibre:Un_polític_desgraciat_(1911).djvu pages, and also created s:ca:Un polític desgraciat to transclude all of its pages, but "Error: index expected" appears. Could it be a lag problem? Perhaps it will disappear in a few hours (if so, I'll comment later)? -Aleator (talk) 13:25, 24 June 2009 (UTC)

I fixed it. You should use quotes instead of underscores. ThomasV 15:21, 24 June 2009 (UTC)
Thank you, Thomas. (But s:ca:Atheneo de grandesa works with underscores.) -Aleator (talk) 18:15, 25 June 2009 (UTC)

Cache problem?

Hello. One strange thing: at s:es:Página:Sesiones de los Cuerpos Lejislativos de Chile - Tomo VII (1823).djvu/339 I see today page number 342, which begins with "En esta ocasión...". But when I zoom in, or edit the page, the image is a different one, also page number 342, but beginning with "Boticarios. Don A.N. Coox pide...". What can this be? The book/index (File:Sesiones de los Cuerpos Lejislativos de Chile - Tomo VII (1823).djvu) was uploaded a month ago. I cannot see anything wrong in the HTML source. I purged the page and the index, but no luck. Any ideas? -Aleator (talk) 18:53, 1 October 2010 (UTC)

It seems to work now. Delete the page and edit it again. ThomasV 07:26, 2 October 2010 (UTC)
Another one: at s:ca:Pàgina:Les Tragedies de Séneca (1914).djvu/50 we can see page 6, beginning with no fonch paccionat.... If we edit it, the image is page 5, beginning with amagaren, E acabat.... The index was re-uploaded to Commons a few days ago. I've purged the Index and the page at Commons and Wikisource, and deleted and undeleted them, but it is still wrong. Any ideas? Thanks! -Aleator 16:47, 21 September 2011 (UTC)
The solution: in those cases, we have to purge the full URL of the specific image, e.g. we had to add ?action=purge at the end of . -Aleator 12:46, 23 September 2011 (UTC)
A gadget adding a purge tab on Page: pages that does this would be nice, and perhaps the same purge tab on Index: pages to purge all the thumbnails of a djvu, if possible. Besides that, does anyone have an idea where this bug comes from? Namely, is it a ProofreadPage problem or a MediaWiki cache problem with multi-page file formats? -- Phe 13:39, 23 September 2011 (UTC)
This cache problem appears again in commons:File:El virgo de Visanteta y el alcalde de Favara ó El parlar be no costa un pacho (1845).pdf. Any ideas? --KRLS (talk) 10:22, 26 October 2018 (UTC)

Planned features

These features are still in development.

  • Store page metadata in a table, not in the page's text field
  1. This is the next big thing.
  2. It will require a schema change: a new table for pages where metadata (status, username) will be stored.

  • A special page for indexes, showing proofreading status. (this too will require a schema change).

Usage questions

Index functionality

I must be having a blond moment ...

Configurable Headers and Footers

The default content of page headers and footers can be configured in Mediawiki:Proofreadpage_default_header and Mediawiki:Proofreadpage_default_footer.

In addition, this default value can be adapted to each book. For this, admins need to add 'header' and 'footer' fields to the index pages.

How does one insert headers/footers into an Index: page? -- Billinghurst 15:53, 19 June 2009 (UTC)

It will insert headers and footers in Page: pages, but it often malfunctions. --Joergens.mi 22:36, 26 June 2009 (UTC)

FYI, this function was temporarily enabled at . I disabled it because it needs a fix. The fix was committed last week [1]. I will reactivate it once the fix goes live. ThomasV 08:52, 27 June 2009 (UTC)

A project that was started with the previous version and worked fine there is now malfunctioning; this page, for example. The version active at 2009-06-16T07:18:57 (versions) (diff) of Seite:Schreiber Bundschuh zu Lehen 073 still worked. I'm sorry that I can't tell from that which version was active, but there isn't any information about the changes. I'm hoping it is one of the usual silly mistakes in the JS scripts; if not, I'll have to open a Bugzilla issue. --Joergens.mi 22:47, 26 June 2009 (UTC)

This problem is specific to . It is caused by a wrong regexp in s:de:MediaWiki:ProofreadPage.js. ThomasV 07:36, 27 June 2009 (UTC)
Fixed. ThomasV 15:48, 28 June 2009 (UTC)

IP not allowed to proofread

What do you think about the idea that IPs aren't allowed to do proofreading?

I know at least 3 IPs on the German-language Wikisource, with academic backgrounds (degrees in history, etc.), who support our work by proofreading. They are now excluded from finalizing pages, and they are not very happy about it. Is that the right approach? They don't want to use nicknames, for one reason or another (some of them are past middle age). --Joergens.mi 22:47, 26 June 2009 (UTC)

The goal of the ProofreadPage reviewing system is to ensure that a page has been reviewed by two different users. For this reason, the usernames of the user who declares the page proofread and of the user who declares the page validated must be different. This system allows Wikisource to give readers a guarantee that their e-text has been proofread twice. Note that the same kind of constraint also exists at Project Gutenberg. This type of guarantee is fundamental if we want to be taken seriously.
IP addresses do not provide a stable identity. If IP addresses were allowed to mark pages as proofread, there would be no way to ensure that the first proofreader and the second one are different, because the IP address of a computer can change. A page could be proofread and validated by the same person, not logged in, on two different days, or by the same person once not logged in and then once logged in. This is why we do not allow IPs to change the proofreading status of a page (even though they are fully allowed to actually proofread and correct pages).
This is not just a question of fighting vandalism. Even if anonymous proofreaders were all of good faith, they would certainly not remember which pages they have already proofread, especially when a single person proofreads hundreds or thousands of pages during a year.
Please note that there are many other things IPs are not allowed to do. While Wikimedia does allow IPs to participate at some level, IPs are generally encouraged to create an account. IPs are excluded from the following tasks at Wikimedia:
  1. IPs are not allowed to move pages
  2. IPs are not allowed to upload images, or any type of file
  3. IPs cannot have preferences
  4. IPs cannot have a watchlist
  5. IPs cannot become sysops
In addition, there are a few other extensions used at Wikimedia whose goal is to improve the reliability of content, just like ProofreadPage. These extensions also exclude the participation of IPs:
  1. IPs are not allowed to do edit patrolling
  2. IPs are not allowed to flag articles in FlaggedRevs
The reason why these extensions exclude IPs is similar to the one I gave above.
I hope that these 3 German IPs (are they 3 different users?) are aware of the many things they cannot do without an account. I can only suggest that they create an account.
ThomasV 08:24, 27 June 2009 (UTC)

I think such behaviour has to be discussed with the communities where the extension is used, and not changed in the middle of use without prior public notice and discussion. The previous version was free of this humbling paternalism; why should it be introduced at the personal wish of the extension's developer? It should be easy to make this behaviour configurable, so that every project can set it to its own preference, just as you did with the default setting of horizontal or vertical layout. --Joergens.mi 08:53, 27 June 2009 (UTC)
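The rule ThomasV describes (proofreader and validator must be two different logged-in users; IPs may edit but not change a page's status) fits in a small predicate, and filtering a work's pages with it would also answer the earlier question of which pages one can still validate. A minimal sketch, assuming each page records the username that marked it proofread and that a logged-out editor is identified by an IP address string; the function names are hypothetical, not the extension's code.

```python
import ipaddress
from typing import Optional


def is_anonymous(user: str) -> bool:
    """Treat a name that parses as an IPv4/IPv6 address as a logged-out editor."""
    try:
        ipaddress.ip_address(user)
    except ValueError:
        return False
    return True


def can_validate(user: str, status: str, proofread_by: Optional[str]) -> bool:
    """A page may be validated only by a logged-in user other than its proofreader."""
    if is_anonymous(user):
        return False  # IPs may correct the text but not change its status
    return status == "Proofread" and proofread_by is not None and proofread_by != user


if __name__ == "__main__":
    print(can_validate("Billinghurst", "Proofread", "Zyephyrus"))     # True
    print(can_validate("Billinghurst", "Proofread", "Billinghurst"))  # False
    print(can_validate("192.0.2.1", "Proofread", "Zyephyrus"))        # False
```

A per-project switch, as Joergens.mi suggests, would amount to making the `is_anonymous` check optional.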

<pages/> command and partial transclusion

Is there a way to get rid of the "Index" link at the top of the transcluded pages?

Does the command allow partial transclusion by section labelling or something similar? If not, is it planned, and would it be possible to add it as a feature? Is this command planned to eventually replace the "Page" template currently used for transclusion?

Many thanks for your wonderful extension! -- Teak 04:27, 2 July 2009 (UTC)

A hidden "index" link is prepended to the pages, so that a JavaScript function can pick it up and create an "index" tab (such a function is active at ; you can see it in the local monobook.js). There's currently no way to get rid of it, and unfortunately it shows up in printable versions. In future versions, however, this link will be removed: detection of the index page will be performed independently, and on any page, not just on pages that use this command.
Support for partial transclusion with the "pages" command is planned, but not implemented yet.
ThomasV 08:38, 2 July 2009 (UTC)

Usability updates

Since the first usability release was deployed to all Wikisource projects, the extension (more precisely, its toolbar buttons) does not work with the new edit toolbar. The Vector skin seems to work correctly with the ProofreadPage extension; there is no known error. --enomil 13:28, 2 July 2009 (UTC)

It also does not add the tabs for moving forward/backward or the index tab. --enomil 17:07, 6 July 2009 (UTC)

The index form does not initialize with the new edit toolbar. --enomil 10:18, 7 July 2009 (UTC)

Unfortunately it is difficult to avoid incompatibilities between JavaScript extensions, especially when developers are not aware of other projects.
In the future I hope to progressively replace JavaScript code with PHP; this should make it more robust.
ThomasV 11:09, 7 July 2009 (UTC)
You are not alone: bugzilla:19527. --enomil 21:22, 7 July 2009 (UTC)

Scanned image not shown in edit mode

Hi, since yesterday a few images in the "Page" namespace are not shown in edit mode; a black background is shown instead, for example [2], [3]. I tried purging the page, and even deleting and recreating it, without success. This behaviour is seen by several members of the community and seems to be independent of the OS/browser used. Is this due to a code update, or is it a bug? Thanks. -- Teak 19:02, 7 July 2009 (UTC)

You need to purge the page of the djvu file. I tried, but unfortunately it failed due to a timeout. Maybe retry later. ThomasV 09:04, 8 July 2009 (UTC)
Thanks, it works now... -- Teak 13:30, 8 July 2009 (UTC)

Categorising Index pages

At the moment, all index pages are in a single bulk Category:Index pages, which is not very useful. As texts become 100% validated, it would be useful if they could be moved to a Category:Validated works (or similar), so that users don't open an index page only to find out that there is nothing for them to do there. This might be done automatically, or there could be some sort of tick boxes on the index page, similar to those on the proofreading page. V85 16:18, 8 September 2009 (UTC)

Perhaps the new special page at Special:IndexPages is what you need? ThomasV 13:34, 30 October 2009 (UTC)

Special:IndexPages sort order

Regarding Special:IndexPages and the statement "Most advanced projects are displayed first", a completely validated text containing 99 pages will appear after a 1000-page text which has been 10% validated. This seems a little counterintuitive to me. Hesperian 01:27, 25 September 2009 (UTC)

Following through on this, a project can be considered finished when every page is either Validated or WithoutText. Depending on whether you want to count or ignore the WithoutText pages, your sort keys should be either {(Validated+WithoutText)/Total, Proofread/Total, NotProofread/Total} or {Validated/(Total-WithoutText), Proofread/(Total-WithoutText), NotProofread/(Total-WithoutText)}. Hesperian 02:29, 25 September 2009 (UTC)
They are ordered by the following criterion: 2*(#validated) + (#proofread).
This is intended to reflect the number of proofreading actions.
In the future there will be more options on this page.
ThomasV 10:56, 25 September 2009 (UTC)
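The difference between the two orderings can be checked on Hesperian's example. A minimal sketch, assuming simple per-index counts: `actions_score` is the criterion ThomasV gives, while `completion_key` uses the first two of Hesperian's keys (the variant that counts WithoutText pages as done), leaving out the NotProofread key for brevity. The names are illustrative.

```python
from typing import NamedTuple


class IndexCounts(NamedTuple):
    name: str
    total: int
    validated: int
    proofread: int
    without_text: int = 0


def actions_score(c: IndexCounts) -> int:
    """Current Special:IndexPages criterion: 2*(#validated) + (#proofread)."""
    return 2 * c.validated + c.proofread


def completion_key(c: IndexCounts) -> tuple[float, float]:
    """(Validated+WithoutText)/Total, then Proofread/Total."""
    return ((c.validated + c.without_text) / c.total, c.proofread / c.total)


if __name__ == "__main__":
    small = IndexCounts("99 pages, all validated", total=99, validated=99, proofread=0)
    large = IndexCounts("1000 pages, 10% validated", total=1000, validated=100, proofread=0)
    works = [small, large]
    # Scores are 198 and 200, so the larger, mostly unfinished work comes first.
    print([w.name for w in sorted(works, key=actions_score, reverse=True)])
    # Completion ratios are 1.0 and 0.1, so the finished work comes first.
    print([w.name for w in sorted(works, key=completion_key, reverse=True)])
```

The first ordering rewards the amount of work done; the second rewards closeness to completion, which is what "most advanced" suggests to a reader.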

Newline character added at the bottom of Page: text

It seems that a \n character is automatically added at the bottom of the text of Pagina: pages when saving, just before the footer. I tried repeatedly to delete it, without success. IMHO, this character wasn't added in previous versions (it was a typical mistake that spoils the transclusion of a paragraph broken across pages). Am I right?

See: it:Pagina:Zibaldone di pensieri.djvu/122 and its transclusion, it:Pensieri di varia filosofia e di bella letteratura/16. --Alex brollo 10:07, 6 October 2009 (UTC)

Yes, it's a bug; it has already been reported in the Scriptorium.
I fixed that bug soon after the initial report, but the deployment of the fix has been delayed due to other problems. I guess it should be deployed this week.
ThomasV 12:19, 6 October 2009 (UTC)
It is fixed now. ThomasV 20:26, 7 October 2009 (UTC)

Special:IndexPages tweak required

When I filter on , it only shows me the top 50 whether I choose 50 or 100, and the NEXT links don't apply the filter key to the next page of results; they just give me the top 20/50/100 overall, so I am unable to see all of the DNB volumes. Billinghurst 11:36, 27 October 2009 (UTC)

Thanks for pointing this out. I just fixed it in SVN. ThomasV 13:33, 30 October 2009 (UTC)

Transclude and section discrepancy

I found that if one has a section tag like <section begin= Whitaker, Sir Frederick /> (note the space between "=" and "W") and then transcludes it with

<pages index="Dictionary of National Biography volume 61.djvu" from="21" to="22"
fromsection="Whitaker, Sir Frederick" tosection="Whitaker, Sir Frederick" />

then the tag (and page) with the space is missed. So I changed my section tag to <section begin=Whitaker, Sir Frederick /> and there are no problems.

If it is easy, can we have leading spaces in a section tag ignored? Thx. -- Billinghurst 00:31, 5 November 2009 (UTC)
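The requested behaviour amounts to comparing section names after trimming surrounding whitespace. A minimal sketch in Python, assuming section tags of the simple forms shown above; the regular expression and the function names are illustrative and are not the parser MediaWiki actually uses.

```python
import re

# Matches <section begin=NAME /> or <section end=NAME />, with NAME optionally quoted.
SECTION_TAG = re.compile(r'<section\s+(begin|end)\s*=\s*"?([^"/>]*?)"?\s*/>')


def section_names(wikitext: str) -> list[tuple[str, str]]:
    """Return (kind, name) pairs, ignoring whitespace around each name."""
    return [(kind, name.strip()) for kind, name in SECTION_TAG.findall(wikitext)]


def has_section(wikitext: str, wanted: str) -> bool:
    """True if a section named `wanted` (also trimmed) begins on this page."""
    return ("begin", wanted.strip()) in section_names(wikitext)


if __name__ == "__main__":
    page = "<section begin= Whitaker, Sir Frederick />...<section end=Whitaker, Sir Frederick />"
    print(has_section(page, "Whitaker, Sir Frederick"))  # True
```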

To move a group of pages

This has nothing directly to do with the proofreading tools, but you need it because of them.

It should be possible to move a group of pages (without leaving redirects), because sometimes a page is removed from or added to a djvu file, and then you suddenly have to move a large number of pages.

This should be limited to pages in ns-104 and to groups of pages within a single index. -- Lavallen 19:48, 5 November 2009 (UTC)

We currently have no solution for that, except robots. When an error is detected in a djvu file, I encourage users to display a warning on the index page, so that others know they should wait before creating the pages. ThomasV 11:29, 20 December 2009 (UTC)
I found a solution. Admins can install the Twinkle tools and get the capability to batch move and batch delete. It works very well. For instructions on how I currently have it set up, have a look at my monobook.js file on enWS. Note that the move tool does create a redirect; however, the batch delete fixes that problem pretty quickly. billinghurst 22:56, 4 January 2010 (UTC)
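Whatever tool performs the moves (a bot, or a batch-move tool as described above), the move list for pages shifted by an inserted or removed djvu page has to be ordered so that no move targets a title that still exists. A minimal sketch in Python, assuming Page: titles of the form Page:<index>/<n>; the function name and the example index are hypothetical, and the sketch only builds the list of moves rather than performing them.

```python
def renumber_moves(index: str, first: int, last: int, offset: int) -> list[tuple[str, str]]:
    """Return (old, new) Page: titles that shift pages first..last by `offset`.

    Shifting up (a page was inserted in the djvu) starts from the highest page so
    each target title has already been vacated; shifting down (a page was removed)
    starts from the lowest page, assuming the removed page's title was deleted first.
    """
    pages = range(first, last + 1)
    order = reversed(pages) if offset > 0 else pages
    return [(f"Page:{index}/{n}", f"Page:{index}/{n + offset}") for n in order]


if __name__ == "__main__":
    # A page was inserted before page 10 of a 12-page scan: move 12->13, 11->12, 10->11.
    for old, new in renumber_moves("Example.djvu", 10, 12, 1):
        print(old, "->", new)
```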

Any plans for Special:IndexPages?

What plans exist for being able to manipulate the data on Index: pages and integrate it with s:Special:IndexPages? I see on frWS that they have a link to a search, though it only seems to filter on whether the search term is found, which is okay for an Author search if the author's name is in the work, but not if it isn't. Is there capacity to use the Author: field on the Index: page and combine it with results from IndexPages? From the perspective of enWS, this would allow better linking from Author: pages to scanned pages using this extension. Thx billinghurst 12:52, 25 November 2009 (UTC)

Well, that was the original idea when I created the search button, but then I realized it was much simpler to implement a global search. ThomasV 11:31, 20 December 2009 (UTC)

Including <ref> output as part of the <pages> tag?

Could an optional field within the tag be made available? I was thinking it could automatically output <ref> content, either as references or in a smaller format like s:Template:Smallrefs. billinghurst 23:39, 19 December 2009 (UTC)

Yes, but the pages tag is becoming fairly complicated, partly because it is now also used to display headers and the TOC...
Maybe it is time to think about a better syntax. I thought about using an XML-like syntax:
<pagegroup index="name_of_the_index">
 <header title=blah />
 <pages from=xx to=yy/>
</pagegroup>
ThomasV 10:55, 20 December 2009 (UTC)
That sounds like a brilliant idea (I think, presuming that I know what XML-like means!); it allows for additions and improvements, and gives better capacity for running bot fixes if needed. Could a class from common.css be incorporated if that is the case? billinghurst 11:09, 20 December 2009 (UTC)

If I understand the meaning of title correctly, <header name=xxx > is better. Phe 11:14, 20 December 2009 (UTC)
I guess "name" would be the name of the template? I was thinking of "title" as an override parameter. ThomasV 11:32, 20 December 2009 (UTC)
We will need something flexible to let people customize the header as they want. With names, we would end up implementing a different template for each header name, with the risk of getting as many implementations as there are different headers people want. So what about <header feature="1+2+4+8"> and, in the proofread template, code à la {{#ifexpr: (({{{feature}}}) / 4) mod 2|code to add translator}} ; {{#ifexpr: (({{{feature}}}) / 8) mod 2|code to add navigation}}. That way we can get a footer (only navigation) with <header feature=8>. As far as I can see, this can be done without modifying the extension. Phe 12:11, 20 December 2009 (UTC)
Another possible syntax, more readable: <header translator=1 navigation=1 page_nr=1 />. I think we should define the features we want rather than use a name expanding to a given template; it will be easier to maintain. Phe 12:30, 20 December 2009 (UTC)
OK, forget about embedding things in a "pagegroup" tag. We can use a parser function to define a default index for the whole page:
<header title=blah />
<pages from=xx to=yy/>
This is compatible with the current syntax.
It is possible to override the index:
<pages from=xx to=yy/>
<pages index=index2 from=xxx to=yyy/>
We might still decide to use something other than the "pages" command to create headers.
ThomasV 19:28, 4 January 2010 (UTC)
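Phe's feature="1+2+4+8" idea relies on each feature being a distinct power of two, so that any sum can be decoded unambiguously by testing individual bits, and each sum maps directly onto the named-attribute form <header translator=1 navigation=1 />. A minimal sketch in Python; only translator (4) and navigation (8) come from the proposal, the names for 1 and 2 are hypothetical placeholders, and the function names are illustrative.

```python
# Bit values from the proposal: 4 = translator, 8 = navigation.
# The names for 1 and 2 were not specified; these are placeholders.
FEATURES = {1: "feature1", 2: "feature2", 4: "translator", 8: "navigation"}


def enabled_features(feature: int) -> list[str]:
    """Decode a sum of distinct powers of two (e.g. 1+2+4+8) into feature names."""
    return [name for bit, name in sorted(FEATURES.items()) if feature & bit]


def header_tag(feature: int) -> str:
    """Render the same selection in the named-attribute syntax."""
    attrs = " ".join(f"{name}=1" for name in enabled_features(feature))
    return f"<header {attrs} />"


if __name__ == "__main__":
    print(header_tag(8))      # <header navigation=1 />
    print(header_tag(4 + 8))  # <header translator=1 navigation=1 />
```

The named-attribute form avoids the decoding step entirely, which is part of why it is easier to maintain.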

<pages> seems to break TOC creation

I am currently converting some pages from {{s:Page}} to <pages>, and found that the conversion breaks the TOC that is normally built from the heading lines. Examples:

billinghurst 04:46, 25 December 2009 (UTC)

I can confirm that using <pages> makes the TOC disappear at svws too. The magic word __NOEDITSECTION__ also doesn't work as intended.
On this page, I have made my own TOC. -- Lavallen 15:53, 27 December 2009 (UTC)

OCLC field?

As we have been slowly developing the fields on the Index: page, I would like us to consider adding w:OCLC. It would be interesting to see whether we can start to get our files listed in other collections, now that we have quite a mass of documents, and this is potentially one piece of excellent metadata. billinghurst 00:14, 31 December 2009 (UTC)

Multiple authors

What do I do for a book which has multiple authors, i.e. where a book contains several chapters, each written by a single, identified author? I cannot put all the authors in the author field, as this would give the impression that all the authors collectively wrote all the chapters. Nor do I feel that I should leave the author field blank, because, after all, the authors are identified and, as such, should be credited. V85 16:47, 16 January 2010 (UTC)

You can write "collective work" in the author field.
If you are using headers generated from the index page, you can override the author in the <pages/> command:
<pages index=foo.djvu header=1 author=xx /> (or "forfatter" at )
In the future I plan to add support for overriding parameters in the "pagelist" command.
ThomasV 21:04, 16 January 2010 (UTC)
Generally, books like that would have an editor listed, or be listed as "first listed author, et al." The individual authors can be listed on the individual pages of the work. That is what we are doing for works like s:en:DNB (63 vols), which has 100+ contributors of articles. billinghurst 11:18, 17 January 2010 (UTC)
For only three authors, I put all three in the Author field (see here), but for many authors this solution won't do. I'd rather choose the "et al." solution then. --Zyephyrus 11:37, 17 January 2010 (UTC)
Thanks for your tips. The particular book I was working on, s:no:Lysets seier, did not have a named editor, so instead I included author tags where appropriate. V85 22:15, 20 January 2010 (UTC)

Transclusion of poems

Poems cause particular problems when transcluded, as the first line of a paragraph is automatically indented. This also happens in poems, with a most unattractive result. Would it be possible to remove the first-line indent for text within <poem> tags? V85 22:13, 20 January 2010 (UTC)

Put this in your CSS:
/* Remove vertical spacing and the first-line indent from <poem> output */
.poem {
	margin-bottom: 0em;
	margin-top: 0em;
	line-height: 1.6em;
	margin-left: 2.5em;
	text-indent: 0em;
}
.poem p {
	margin-top: 0em !important;
	margin-bottom: 0em !important;
	text-indent: 0em !important;
}

ThomasV 22:25, 20 January 2010 (UTC)



In the beta GUI, a.k.a Vector, is it possible to make the Previous, Next and Index buttons appear immediately and not in the dropdown menu? --Amire80 13:32, 23 January 2010 (UTC)

Also, when editing, no proofread options are available (zoom options, ), nor the 2 noinclude edit boxes displayable! -Aleator (talk) 18:34, 28 February 2010 (UTC)
vector skin is supported now ThomasV 16:22, 9 April 2010 (UTC)

Linkback to transcluding Main-space page.

I think it would be good if a page in the Page: namespace had a link at the top, next to the index (^) link, that linked to the page that transcludes it into the Main: namespace. For example en:Page:The Craftsmanship of Writing.djvu/257 would have a link to en:The Craftsmanship of Writing/The Technique of Translating.

I know this can be done via the "what links here" button, but this would be much more convenient. After all, the index page gets a link, and that can be accessed by "what links here". This would provide a useful reverse link to the one next to pages transcluded with <pages>.

Obviously, you can check the links present there, ditch the ones like the index page link, and keep only the main-space one. If there is more than one transclusion (unusual for a normal page), then you can link to them all, or none, or to the "what links here" page as disambiguation.

Cheers, Inductiveload 02:16, 13 February 2010 (UTC)
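The filtering described above could be sketched roughly as follows. This is a hypothetical helper, not part of ProofreadPage: it assumes the "what links here" results are already available as objects with a `title` and a namespace number `ns` (0 being the main namespace), and the `maxLinks` cut-off for falling back to "what links here" is an illustrative choice.

```javascript
// Hypothetical helper: pick the main-namespace page(s) that transclude a Page: page.
// `links` is assumed to look like [{ title: "...", ns: 0 }, ...], e.g. built from
// "what links here" results; ns 0 is the main namespace.
function mainspaceBacklinks(links, maxLinks = 3) {
  // Drop the Index: page and every other non-main-namespace page.
  const mainPages = links.filter((link) => link.ns === 0).map((link) => link.title);
  if (mainPages.length === 0 || mainPages.length > maxLinks) {
    // Nothing to show, or too many to list: the caller falls back to "what links here".
    return null;
  }
  return mainPages;
}
```

A `null` result would mean "show the what-links-here link instead", which covers the disambiguation case mentioned above.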

Tab order

I would expect that the tab would jump from the editing field to the Summary field. It works this way in the main namespace, but in the Page: namespace the edit summary field is very far from the main editing field in the tab order. --Amir E. Aharoni 09:08, 28 February 2010 (UTC)

See also the discussion at en:Wikisource:Scriptorium. I thought this was unusual, using shift-tab instead of tab, but I adapted. I generally favour anything that helps the addition of edit summaries. Cygnis insignis 07:12, 2 March 2010 (UTC)
this is fixed now ThomasV 16:23, 9 April 2010 (UTC)

Text disappeared

Thanks for the april improvements :)

One rare case reported at it.source Scriptorium: when editing s:it:Pagina:Dalla Terra alla Luna - 021.jpg, no text is present. I purged the page and the index, but it stays the same. Other pages and indexes are OK, I think. Rare... -Aleator (talk) 20:18, 9 April 2010 (UTC)

When Wikisource isn't Wikisource...

I was surprised to find that Wikisource:ProofreadPage was a red link when adding some things to the help file. I ended up linking to this page (from en.wikisource) as an external link. Some questions that spring to mind...

  • Is there a way to link to the plain no-language Wikisource without the "external link" icon?
  • Is there a way to make a redirect from there to this page?
  • Should this page be moved there?
  • Is it possible to arrange for pages in this wiki to automatically appear as search results in a search of en.wikisource?
  • Would you consider automatically redirecting when a page title exists here but not there?

  • Why is there a second English Wikisource? How much stuff is here? And what is the site formally known as?

(From the edit page I see that this is formally the multilingual Wikisource and that English-language pages shouldn't be started here. I wonder how many useful pages like this one may still remain here.)

Just curious... Mike Serfas 17:00, 24 April 2010 (UTC)

For linking to this multilingual Wikisource from any other project (say from en.source), try typing [[oldwikisource:Main page|Main page]]. For linking from multilingual Wikisource to any other Wikimedia wiki, try typing for example [[w:eo:Main Page|Esperanto Wikipedia]] (that is, Esperanto Wikipedia).
Redirects between different projects still don't exist.
Moving between projects could be import or transwiki functions, just for sysops and just when it has been activated on bugzilla.
Arranging page content between projects needs Mediawiki:InterWikiTransclusion.js and certain templates on both wikis. This ca.source test page transcludes Wikisource:ProofreadPage and one page from en.source.
Hope this is useful. :) -Aleator (talk) 17:56, 24 April 2010 (UTC)

Strange bug

See s:ru:Участник:ChVA/Песочница. As you can see, some parameters are shown as if they did not exist in the index (though they really do). For example, the {{{Источник}}} (Russian for "source") parameter is shown in triple braces... I don't understand why it happens. ChVA 19:15, 24 April 2010 (UTC)

I'm not sure, but it might be because the page is not in the main namespace. did you try ? ThomasV 04:58, 25 April 2010 (UTC)
Oh, thanks. I've found the reason: I had overlooked that passed parameters should be in MediaWiki:Proofreadpage js attributes! By the way, do you know how to make a drop-down list in the index page like in en.wikisource (fields 'type' and 'progress')? ChVA 05:49, 25 April 2010 (UTC)
see enhanced index form in Wikisource:Shared Scripts ThomasV 06:57, 25 April 2010 (UTC)
I added importScriptURI(''); to s:ru:MediaWiki:Common.js, what else should I do? ChVA 15:49, 9 May 2010 (UTC)
You talk about OCR.js; I can see the OCR button in the "Страница" namespace, and OCR works (but for the Latin alphabet, not Cyrillic). I think the drop-down list will appear once the translations you've written are added. Regards. -Aleator (talk) 16:17, 9 May 2010 (UTC)
OK, thanks, I will wait. I'll also delete the OCR button, because there is no need for it if it doesn't work with Russian. ChVA 16:22, 9 May 2010 (UTC)

Also, please add localization for Russian to IndexForm.js:

var m_author     = { 'en':'Author', 'fr':'Auteur', 'ru':'Автор' }
var m_translator = { 'en':'Translator', 'fr':'Traducteur', 'ru':'Переводчик' }
var m_editor     = { 'en':'Editor', 'fr':'Éditeur scientifique', 'ru':'Редактор' }
var m_publisher  = { 'en':'Publisher', 'fr':'Éditeur', 'ru':'Издатель' }
var m_place      = { 'en':'Place', 'fr':'Lieu', 'ru':'Место' }
var m_volume     = { 'en':'Volume', 'fr':'Volume', 'ru':'Том' }
var m_school     = { 'en':'School', 'fr':'School', 'ru':'Школа' }

var m_book       = { 'en':'Book', 'fr':'Livre', 'ru':'Книга' }
var m_collection = { 'en':'Collection', 'fr':'Recueil', 'ru':'Сборник' }
var m_journal    = { 'en':'Journal or magazine', 'fr':'Journal ou magazine', 'ru':'Журнал' }
var m_phdthesis  = { 'en':'Thesis, report', 'fr':'Thèse, rapport', 'ru':'Диссертация, отчёт' }
var m_dictionary = { 'en':'Dictionary', 'fr':'Dictionnaire, encyclopédie, ouvrage de référence', 'ru':'Словарь, энциклопедия' }
var m_T   = { 'en':'Done', 'fr':'Terminé', 'ru':'Закончено' }
var m_V   = { 'en':'To be validated', 'fr':'À valider', 'ru':'Нужно проверить' }
var m_C   = { 'en':'To be proofread', 'fr':'À corriger', 'ru':'Нужно вычитать' }
var m_MS  = { 'en':'Ready for Match & Split', 'fr':'Texte prêt à être découpé (match & split)', 'ru':'Готово для согласования и разделения' }
var m_OCR = { 'en':'Needs an OCR text layer', 'fr':'Ajouter une couche texte OCR', 'ru':'Нужно распознать' }
var m_X   = { 'en':'Source file is an excerpt of a larger volume, or a mixture of several sources', 'fr':'Source incomplète (extrait) ou compilation de sources différentes', 'ru':'Исходный файл — часть большего текста или смесь нескольких источников' }
var m_L   = { 'en':'Source file is incorrect (missing pages, unordered pages, etc)', 'fr':'Fichier défectueux (lacunes, pages dans le désordre, etc)', 'ru':'Исходный файл содержит ошибки (отсутствуют страницы, перепутан порядок страниц и т.п.)' }
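How IndexForm.js actually chooses a language is not shown in this thread. As a minimal sketch, assuming the language code is passed in (on MediaWiki it would typically come from `wgUserLanguage` or `wgContentLanguage`), a lookup over tables shaped like `m_author` above, with an English fallback, could look like this. The function name `localize` is hypothetical.

```javascript
// Hypothetical lookup helper for message tables shaped like m_author above.
// Returns the string for `lang`, falls back to English, then to `fallbackKey`.
function localize(table, lang, fallbackKey) {
  if (Object.prototype.hasOwnProperty.call(table, lang)) {
    return table[lang];
  }
  if (Object.prototype.hasOwnProperty.call(table, "en")) {
    return table.en;
  }
  return fallbackKey;
}

const m_author = { en: "Author", fr: "Auteur", ru: "Автор" };
console.log(localize(m_author, "ru", "author")); // Автор
console.log(localize(m_author, "de", "author")); // Author (no German entry)
```

With a fallback like this, a missing translation shows the English label instead of an empty field.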

Index in Russian for example: s:ru:Индекс:Дешевая юмористическая библиотека Нового Сатирикона, Выпуск 23.djvu

done ThomasV 17:54, 9 May 2010 (UTC)
Thank you Thomas, but there is still no drop-down list in Index editing page ([4]). What did I miss? ChVA 18:15, 9 May 2010 (UTC)
Could be missing the Russian index name on MediaWiki:IndexForm.js?:

if(wgCanonicalNamespace=="Livre"||wgCanonicalNamespace=="Index"||wgCanonicalNamespace=="Индекс") {

-Aleator (talk) 18:54, 9 May 2010 (UTC)
I think so. ChVA 19:16, 9 May 2010 (UTC)
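Instead of chaining `||` comparisons each time a wiki is added, the namespace condition quoted above could be written against a list of names. This is only a sketch: the names come from this thread ("Livre" on fr, "Индекс" on ru), and the constant and function names are hypothetical.

```javascript
// Hypothetical list of the Index namespace names used on different wikis.
const INDEX_NAMESPACE_NAMES = ["Index", "Livre", "Индекс"];

// Returns true when the given namespace name is one of the Index namespaces.
function isIndexNamespace(namespaceName) {
  return INDEX_NAMESPACE_NAMES.includes(namespaceName);
}

// With the legacy globals of the time this would be called as:
// if (isIndexNamespace(wgCanonicalNamespace)) { ... }
```

Adding a new wiki then means adding one string to the list.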

Hm... that's strange. The drop-down list for 'type' is working, but not the one for 'progress' (though ThomasV added the 'Состояние' variable in MediaWiki:IndexForm.js). See for example [5]: the 2nd field is for Progress (Состояние). ChVA 06:46, 11 May 2010 (UTC)

sorry for the delay; I think that I have found why you do not have the drop-down list. I recommend you do not use Cyrillic characters for the internal name of this field: call it 'Progress' in Proofreadpage_index_attributes (just like you did with 'Type'). ThomasV 10:12, 15 August 2010 (UTC)
OK, I did it, but without result. Maybe some changes are needed in IndexForm.js ('ru':'Progress' maybe, or something)? ChVA 19:03, 17 August 2010 (UTC)
ok, it is fixed now; I modified the localization of messages so that you can define them on your wiki. Please copy the definition of self.ws_messages from MediaWiki:Base.js to, so that I can delete it from here. ThomasV 12:22, 26 August 2010 (UTC)
What page should I insert it into? 06:35, 9 October 2010 (UTC)
I think that it has to be copied to ru:s:Mediawiki:Common.js. Look for "self.ws_messages" at other Wikisources, e.g. en:s:MediaWiki:Common.js. -Aleator 16:54, 9 October 2010 (UTC)
Thanks. ChVA 18:49, 11 October 2010 (UTC)

One more bug

The <pages /> command sometimes stops working. See [6] for example. Last week it showed page numbers; now it shows only the command itself. Any minor edit (for example, adding or deleting a space) restores the page numbers. ChVA 12:03, 16 May 2010 (UTC)

This seems to be happening frequently lately. Often pagelists disappear, only to reappear after a minor edit. V85 15:10, 20 June 2010 (UTC)
I am aware of the issue; I have not found the cause though ThomasV 20:39, 20 June 2010 (UTC)
Filed as bug 24168. --LA2 02:22, 29 June 2010 (UTC)

Missing "text-align:left;"


The Wikisource user billinghurst suggested that I post my request here. So, I'm copying it below:

As can be seen in this image, it is missing something like "text-align:left;" for the first span of the element "prp_header" added by this script. Could anybody fix it? Helder 12:16, 27 May 2010 (UTC)

Just some hints: the message is en:s:MediaWiki:Proofreadpage header. The extension script is identical in other Wikisources and left-aligns well at ca, de, es, et, it, ru, sv, zh and "old" WS, but badly at en, fr, la and no WS. So those last 4 have something in common that disrupts the alignment. -Aleator (talk) 22:15, 27 May 2010 (UTC)
For fr, la and no, the common thing is #content { text-align: justify; }; for en it's body.ns-104 { text-align: justify; }. What about wrapping the sys msg with a <span style="text-align:left;">...</span> ? — Phe 00:32, 30 July 2011 (UTC)

Bugzilla 21526 — ") breaking pages

A conversation has taken place in enScriptorium as probably has been done elsewhere about Bugzilla:21526 about how the character combination ") is breaking the rendering of djvu pages. It doesn't seem to have been added to this list which would seem to be appropriate, so I am adding it now.

From my personal perspective, having had the bugzilla report identified, to me it resolved some problems that I had seen and had not been able to reconcile. So can I ask that where bugs are identified and have a wide effect through the WS community that they are also brought here. Helps us all to understand, and I believe it is where ThomasV has requested that we bring issues. billinghurst sDrewth 01:47, 6 July 2010 (UTC)

Digging in the mediawiki svn, I saw this is fixed now Phe 00:02, 30 July 2011 (UTC)

proposal for footnotes that span several pages

I think that the only way to correctly handle footnotes is to improve the "ref" tag (Cite extension), so that it accepts footnotes that cross page breaks.

As an example, assume that two transcluded pages yield the following wikitext :

blah blah blah
blah blah blah<ref name=note1 >beginning of note 1</ref> 
blah blah blah
blah blah blah
blah blah blah<ref follow="note1">end of note</ref>
blah blah blah

We assume that the first reference is on page 1, and that the second one is on page 2.

In this proposal, the above wikitext would be rendered as text with a single footnote. The footnote would be located in the text at the position of the first occurrence of "ref".

In the "Page" namespace, the footnote on page 2 is "orphaned", because the parent footnote is missing:

blah blah blah
blah blah blah<ref follow="note1">end of note</ref>
blah blah blah

An orphaned footnote should be rendered at the beginning of the "references" box, without a number. The location of the ref tag in the text should be irrelevant.

ThomasV 14:20, 10 August 2010 (UTC)
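The merging rule in this proposal can be sketched as a small function. It is only an illustration of the proposal, not Cite extension code: it assumes the refs of the transcluded text are available in document order as plain objects, and it joins a continuation to its parent with a single space (an assumption).

```javascript
// Sketch of the proposed rule for <ref follow="...">.
// `refs` is assumed to be in document order, e.g.
//   { name: "note1", text: "beginning of note 1" } or { follow: "note1", text: "end of note" }.
function buildFootnotes(refs) {
  const numbered = [];      // numbered footnotes, in order of first occurrence
  const byName = new Map(); // ref name -> footnote, so continuations can find their parent
  const orphans = [];       // continuations whose parent is not in this text

  for (const ref of refs) {
    if (ref.follow !== undefined) {
      const parent = byName.get(ref.follow);
      if (parent) {
        parent.text += " " + ref.text; // append the continuation to its parent
      } else {
        orphans.push(ref.text);        // e.g. page 2 viewed alone in the Page: namespace
      }
      continue;
    }
    const note = { number: numbered.length + 1, text: ref.text };
    numbered.push(note);
    if (ref.name !== undefined) {
      byName.set(ref.name, note);
    }
  }
  // Orphans go first in the references box, without a number.
  return { orphans, numbered };
}
```

Transcluding both pages yields one numbered footnote; viewing page 2 alone yields one unnumbered orphan.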

Could this be an opportunity to fix two problems in one? Sometimes I'd like to name references, to align their format with the original text. E.g., sometimes references are written as (1), (2)...; sometimes their numbering is progressive across the whole book, so that the first reference in a chapter could be (45). So I'd like to have an alternative syntax for the ref tag, something like:
  • <ref display="(1)">....</ref>

If this were possible, it would also be possible to add a rule "merge references with the same name/display", and to add two parameters from=... and to=... to the <references/> tag, to render only a subset of references at a given location in the text. --Alex brollo 06:31, 11 August 2010 (UTC)
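A rough sketch of the display= and from=/to= ideas follows. Neither attribute exists in the Cite extension. The input shape, the 1-based positions for from/to and the default "N." label are all assumptions for illustration, and the merging rule is not covered.

```javascript
// Sketch of the proposed display="(1)" label and <references from=... to=... />.
// `notes` is assumed to be [{ display?: "(45)", text: "..." }, ...] in document order;
// from/to are 1-based positions in that list (hypothetical semantics).
function renderReferences(notes, from = 1, to = notes.length) {
  return notes
    .map((note, i) => ({ label: note.display ?? `${i + 1}.`, text: note.text, pos: i + 1 }))
    .filter((note) => note.pos >= from && note.pos <= to)
    .map((note) => `${note.label} ${note.text}`);
}
```

For example, with a book-wide numbering a chapter could render only its own slice, keeping the printed labels such as "(45)".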

PDF and proofread

Hi. I'm from the Polish Wikisource and I want to report a strange bug with PDF. When I want to download a PDF of a book which was created with the <pages> tag, I can only see the code:

<pages index="Illustrowany przewodnik do Tatr, Pienin i Szczawnic (Walery Eljasz-Radzikowski)" from="PL Eljasz-Radzikowski-Illustrowany przewodnik do Tatr, Pienin i Szczawnic.djvu/005" to="PL Eljasz-Radzikowski-Illustrowany przewodnik do Tatr, Pienin i Szczawnic.djvu/012" />

(example: page).

I have also tried to create a PDF from a single page (example), and then the proofread code appears in the header:

<pagequality level="4" user="Viatoro" />

This is a serious problem when someone wants to create his own book using "create book" from the left menu and gets only the proofread code. Viatoro 22:26, 4 September 2010 (UTC)

Pages difference

In Gesenius' Hebrew Grammar, and probably in many other DjVu and PDF files the technical page number and the logical page number in the book are not the same. On Index pages it is marked using <pagelist>. The <pages> tag displays the logical page numbers, but the tag itself in the source must have the technical page numbers.

Is there a way to have the <pages> tag understand this automatically without having to calculate the page difference every time? --Amir E. Aharoni 21:00, 5 November 2010 (UTC)
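The calculation being asked for can be sketched as follows, assuming the <pagelist> is reduced to entries of the form "scan page S carries printed number P, and numbering continues by one until the next entry" (roughly what an entry like 5=1 expresses). The function and the entry format are hypothetical, not part of ProofreadPage.

```javascript
// Hypothetical model of a <pagelist>: each entry says "scan page `scan` carries printed
// number `printed`, and numbering continues by +1 until the next entry's scan page".
function scanPageFor(printed, entries) {
  const sorted = [...entries].sort((a, b) => a.scan - b.scan);
  for (let i = 0; i < sorted.length; i++) {
    const start = sorted[i];
    const end = i + 1 < sorted.length ? sorted[i + 1].scan : Infinity;
    const candidate = start.scan + (printed - start.printed);
    if (candidate >= start.scan && candidate < end) {
      return candidate; // first range that contains this printed number
    }
  }
  return null; // printed number not covered by this pagelist
}
```

For example, if scan page 13 is printed page 1, then printed page 119 is scan page 131, the same 12-page offset seen in the DNB report further down. If a book restarts its numbering, the first matching range wins.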

fromsection / tosection - shortcut

The fromsection and tosection attributes often have an identical value. If I understand correctly, there's no shortcut for that now and I always have to write 'fromsection="chapter20" tosection="chapter20"'. It would make sense to just write something like 'section="chapter20"'. --Amir E. Aharoni 07:56, 6 November 2010 (UTC)
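The proposed shortcut amounts to a simple attribute expansion before the usual handling. A sketch, treating the tag's attributes as a plain object; `section=` is the proposal, not an existing attribute, and letting explicit values win is an assumed design choice.

```javascript
// Sketch of the proposed shortcut: section="x" expands to fromsection="x" tosection="x".
// Explicit fromsection/tosection values, if also given, take precedence.
function expandSectionShortcut(attrs) {
  const { section, ...rest } = attrs;
  if (section === undefined) {
    return { ...attrs };
  }
  return { fromsection: section, tosection: section, ...rest };
}
```

Because the remaining attributes are spread last, an explicit fromsection or tosection overrides the shortcut.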

noinclude tag

I've opened bugzilla:26881 because under Internet Explorer, noinclude tags make the tagged text move to the footer section, and the end text disappear. -Aleator 12:20, 23 January 2011 (UTC)

The same happens with the Opera browser. Mozilla works fine with it. ChVA 17:56, 21 February 2011 (UTC)
I attached a proposed patch on bugzilla, only tested with Opera, which showed the same trouble; an IE test is needed. — Phe 01:12, 31 July 2011 (UTC)

Problem with fromsection and tosection

This feature doesn't work after the recent update. See [7] for example (the beginning of the 3rd chapter now appears at the end). The whole page is included even though the tosection parameter is set. ChVA 14:15, 19 February 2011 (UTC)

Problem with page numbers in transcluded pages

Problem with <pages>: it now shows the real page numbers in the Index, not the page numbers set by the <pagelist> command. For example, at,_James_(DNB01) page number 131 is shown, though it should be 119. Apparently, the template Mediawiki:Proofreadpage_pagenum_template receives wrong page numbers from the <pages> command. ChVA 18:01, 21 February 2011 (UTC)

ProofreadPage eats \n

Hello! For example, after loading, the code <section begin="name">\nText is translated into ## name ##\n\nText. After "Show preview" it is translated to <section begin="name">Text. Therefore, if the page begins with text inside the section, that text is not wrapped in a paragraph by <p> and </p> tags. Of course, you can restore the \n before saving, but you have to do it every time you edit the page. -- Crower 16:25, 13 March 2011 (UTC)

I have not discovered this together with the section parser, but I have seen something that looks like this on ordinary pages. I think Template:Nop on enws is used for such cases. -- Lavallen 19:50, 15 March 2011 (UTC)
I found this situation is caused by the code var search = /##[\s]*(.*?)[\s]*##[\s]*\n/; in the function restore_lst in Base.js. To avoid losing the \n you can also use a simple comment: <section begin="name"/>\n<!--- --->\nText.... Thank you. -- Crower 13:20, 20 March 2011 (UTC)
Yes, we have also noticed a similar problem at French Wikisource. Using any no-effect string, like the nop template from, or <nowiki/>, or an HTML comment, is a temporary solution. The problem comes from the code mentioned by Crower; it will have to be fixed. Zaran 20:01, 26 September 2011 (UTC)
When I reported this on bug #26028 (after this discussion) it was closed as a duplicate of bug #12130. You can vote for it if you also find this to be annoying. Helder 11:03, 27 September 2011 (UTC)
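The newline loss can be reproduced with the pattern Crower quotes. When the page is loaded, the section tag plus its newline becomes "## name ##" followed by two newlines. On restore, the trailing `[\s]*` greedily swallows the first newline and `\n` takes the second, so both disappear. The replacement string below is a simplified stand-in (the real restore_lst code is not reproduced here), and the "fixed" pattern is only one possible correction.

```javascript
// Pattern quoted from Base.js (restore_lst).
const original = /##[\s]*(.*?)[\s]*##[\s]*\n/;
// Possible fix: allow only spaces/tabs around the marker, so exactly one
// newline (the one added when the page was loaded) is consumed.
const fixed = /##[ \t]*(.*?)[ \t]*##[ \t]*\n/;

// Simplified stand-in for the restore step: only the begin tag is rebuilt.
function restore(text, pattern) {
  return text.replace(pattern, '<section begin="$1"/>');
}

const edited = "## name ##\n\nText";
console.log(JSON.stringify(restore(edited, original))); // "<section begin=\"name\"/>Text"
console.log(JSON.stringify(restore(edited, fixed)));    // "<section begin=\"name\"/>\nText"
```

With the narrower pattern, the newline the user typed survives, so the following text is again wrapped in a paragraph.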

Transclusion of images

I have noticed that there is an issue with transclusions that contain images. It seems to me that images are often placed in books "where they look nice", for example in the middle of a page, with an equal amount of text above and below, regardless of where paragraphs begin and end. As long as the text is presented on printed pages, that is acceptable. However, when we transclude text, so that what is presented as multiple pages in a book becomes a single text on screen, a problem occurs: where a picture is inserted (in code), it creates a break and what seems to be a new paragraph, sometimes mid-sentence.

Take for example en:s:CORSETS: An Analysis. This article has several illustrations. If we look at the picture on page 8, you will see that the insertion of the picture has created a line break, so it looks like one paragraph ends with the words "the body against" (and no full stop) and the next paragraph begins "the downward pull" (without capitalisation). These are, however, both part of the same sentence. Is there a way to avoid this problem in a text that has illustrations? Where a new paragraph starts on the same page as the illustration, the code for that picture could be moved to between the paragraphs. However, on some pages the image is so large, or the paragraphs so long, that no new paragraph starts on the same page as the image. What to do in these cases? V85 07:46, 5 September 2011 (UTC)

[8] talks about this trouble, see comment 6: the problem comes from the fact that images are included in a <div> wrapper, so the code is <p><div>... image </div></p>. This is not valid HTML, since a div can't be inside a p, so the parser closes the paragraph, adds the image and reopens the paragraph after the div. There is no known workaround except the one you pointed out, moving the image outside a paragraph. — Phe 16:40, 17 September 2011 (UTC)

We ran into the same problem on it.source. So far the best solution we found is to wrap the entire paragraph in a <div></div> (in our case, setting its style to make it look like a regular <p></p>). This way, the MediaWiki software does not automatically wrap the paragraph in a <p></p> as usual, and the text is not interrupted by the image's presence. See my edits at pages 7, 8. See also this page for an example. Candalua 19:58, 17 September 2011 (UTC)

Nice, thanks, I'll try to propagate it to fr:. By the way, on fr: we are searching for a trick to solve the trouble "a table cell can't cross a page boundary"; the only solution we have uses includeonly/noinclude tags to put the whole text in the same cell when transcluding pages... — Phe 18:15, 18 September 2011 (UTC)

Moved to MediaWiki talk:Dictionary.js. — Phe 19:42, 25 September 2011 (UTC)

Talk moved to MediaWiki talk:Modernisation.js#MediaWiki:Modernisation.js. — Phe 19:07, 25 September 2011 (UTC)

Add an attribute to the pages tag to transclude only odd or even pages

Marc suggested at French Wikisource that there should be a way to transclude only even or odd pages when using the <pages /> tag. This is necessary with multilingual books, where you have a page in language A facing a page in language B and only want to transclude language A. I suggest adding an attribute to the tag (it could be named only), where you could specify which pages to transclude. For example:

<pages index="file.djvu" from=12 to=17 only=even/>

would transclude pages 12, 14 and 16 of the index file.djvu. This shouldn't be too hard to code. Zaran 20:42, 26 September 2011 (UTC)

  • I support this idea, it can be a big issue when working on bilingual works. Unfortunately, nobody is developing this extension at the moment.--Doug.(talk contribs) 01:15, 11 December 2011 (UTC)
I support it too, but perhaps we need something more general, something à la step=: step=2 would allow skipping odd or even pages, and exclude="12-15,21,33" would exclude pages 12 to 15, page 21 and page 33. Besides that, an include= with the same syntax as exclude would allow not using from= to= in some cases. — Phe 01:29, 11 December 2011 (UTC)
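The selection logic behind only=, step= and exclude= is short enough to sketch. None of these attributes exist in ProofreadPage; the function names, the inclusive from/to and counting step from `from` are assumptions that follow the examples in this thread.

```javascript
// Parses a proposed exclude="12-15,21,33" value into a set of page numbers.
function parseRanges(spec) {
  const pages = new Set();
  for (const part of spec.split(",")) {
    const trimmed = part.trim();
    if (trimmed === "") continue;
    const [startText, endText = startText] = trimmed.split("-");
    for (let p = Number(startText); p <= Number(endText); p++) pages.add(p);
  }
  return pages;
}

// from/to are inclusive; only is "even" or "odd"; step counts from `from`.
function selectPages({ from, to, only, step = 1, exclude = "" }) {
  if (step < 1) throw new RangeError("step must be at least 1");
  const excluded = parseRanges(exclude);
  const result = [];
  for (let p = from; p <= to; p += step) {
    if (only === "even" && p % 2 !== 0) continue;
    if (only === "odd" && p % 2 === 0) continue;
    if (!excluded.has(p)) result.push(p);
  }
  return result;
}
```

Zaran's example (from=12 to=17 only=even) gives pages 12, 14 and 16, and step=2 from page 12 gives the same result.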

Automatic OCR layer extraction, pdf and russian letters

Automatic OCR layer extraction from PDF works incorrectly with Russian letters. For example, in ru:s:Индекс:Цыбиков Г.Ц. том 2 О Центральном Тибете, Монголии и Бурятии.pdf it extracts only Latin letters and numbers, but Russian letters are turned into odd symbols like "��". Is it possible to solve this problem? The original PDF file is at commons:File:Цыбиков Г.Ц. том 2 О Центральном Тибете, Монголии и Бурятии.pdf --Вантус 14:43, 29 October 2011 (UTC)

Hi, I was wondering if you had an example of a wiki that has automatic extraction of PDF text. In particular, I'm trying to search within PDFs, so that for any OCR-readable PDF, searching for words that appear in it returns that PDF in the results. Please let me know if you had any luck. Thanks in advance,

Tosection ignored

Hello! Can anybody solve this problem, please? Page 89 has 2 sections, but in the transcluded page the "tosection" is ignored and the whole page appears (including the section of the next chapter). Any idea? Meanwhile, I'm using {{Page}}. Thanks! -Aleator 15:16, 26 October 2012 (UTC)

Oh... the name of the section was not OK. :S Solved! -Aleator 15:27, 26 October 2012 (UTC)

No Image

I installed the extension on my wiki, but when I create a Page:xyz.jpg there is no image on the right side. Via Scan I can see the image. What's wrong? -- 13:10, 4 November 2012 (UTC)

Importing djvu image_metadata fails to create bytea


  • Postgres 9.3
  • Mediawiki 1.21.2
  • ProofreadPage 1.21
  • Ubuntu 13.04

Problem: When I refresh a page in the "Page" namespace, I am presented with an SQL error (I turned all debugging on). It is trying to assign XML image metadata to a Postgres 'bytea' column and failing, as the string is not in the proper format.

The DJVU file was created by 'pdf2djvu' (0.7.12-2ubuntu6). Sample PDF and DJVU files can be given if needed.

Error Message:

UPDATE "image" SET img_size = '4890564',img_width = '2758',img_height = '4142',img_bits = '0',img_media_type = 'BITMAP',img_major_mime = 'image',img_minor_mime = 'vnd.djvu',img_metadata = '<?xml version="1.0" ?> <!DOCTYPE DjVuXML PUBLIC "-//W3C//DTD DjVuXML 1.1//EN" "pubtext/DjVuXML-s.dtd"> <mw-djvu><DjVuXML> <HEAD></HEAD> <BODY><OBJECT height="4142" width="2758"> <PARAM name="DPI" value="300" /> <PARAM name="GAMMA" value="2.2" /> </OBJECT> <OBJECT height="4113" width="2708"> <PARAM name="DPI" value="300" /> <PARAM name="GAMMA" value="2.2" /> </OBJECT> <OBJECT height="4133" width="2746"> <PARAM name="DPI" value="300" /> <PARAM name="GAMMA" value="2.2" /> </OBJECT> <OBJECT height="4108" width="2704"> <PARAM name="DPI" value="300" /> <PARAM name="GAMMA" value="2.2" /> </OBJECT> <OBJECT height="4154" width="2771"> <PARAM name="DPI" value="300" /> <PARAM name="GAMMA" value="2.2" /> </OBJECT> <OBJECT height="4108" width="2704"> <PARAM name="DPI" value="300" /> <PARAM name="GAMMA" value="2.2" /> </OBJECT> <OBJECT height="4129" width="2729"> <PARAM name="DPI" value="300" /> <PARAM name="GAMMA" value="2.2" /> </OBJECT> <OBJECT height="4133" width="2738"> <PARAM name="DPI" value="300" /> <PARAM name="GAMMA" value="2.2" /> </OBJECT> </BODY> </DjVuXML> <DjVuTxt> <HEAD></HEAD> <BODY> <PAGE value="This shadowy assembly-is the... TRUNCATED TO NOT BE TOO BIG ... " /> </BODY> </DjVuTxt> </mw-djvu>',img_sha1 = 'i7pmpn6xx8rl8fit5dg9djbhcq196oo' WHERE img_name = 'My_Test_DJVU.djvu'

Stack Trace:

  1. /usr/local/mediawiki-1.21.2/includes/db/DatabasePostgres.php(482): DatabaseBase->reportQueryError('ERROR: invalid...', '22P02', 'UPDATE "image"...', 'LocalFile::upgr...', false)
  2. /usr/local/mediawiki-1.21.2/includes/db/Database.php(983): DatabasePostgres->reportQueryError('ERROR: invalid...', '22P02', 'UPDATE "image"...', 'LocalFile::upgr...', false)
  3. /usr/local/mediawiki-1.21.2/includes/db/Database.php(1840): DatabaseBase->query('UPDATE "image"...', 'LocalFile::upgr...')
  4. /usr/local/mediawiki-1.21.2/includes/filerepo/file/LocalFile.php(546): DatabaseBase->update('image', Array, Array, 'LocalFile::upgr...')
  5. /usr/local/mediawiki-1.21.2/includes/filerepo/file/LocalFile.php(495): LocalFile->upgradeRow()
  6. /usr/local/mediawiki-1.21.2/includes/filerepo/file/LocalFile.php(454): LocalFile->maybeUpgradeRow()
  7. /usr/local/mediawiki-1.21.2/includes/filerepo/file/LocalFile.php(349): LocalFile->loadFromRow(Object(stdClass))
  8. /usr/local/mediawiki-1.21.2/includes/filerepo/file/LocalFile.php(464): LocalFile->loadFromDB()
  9. /usr/local/mediawiki-1.21.2/includes/filerepo/file/LocalFile.php(716): LocalFile->load()
  10. /usr/local/mediawiki-1.21.2/includes/filerepo/FileRepo.php(366): LocalFile->exists()
  11. /usr/local/mediawiki-1.21.2/includes/filerepo/RepoGroup.php(146): FileRepo->findFile(Object(Title), Array)
  12. /usr/local/mediawiki-1.21.2/includes/GlobalFunctions.php(3542): RepoGroup->findFile(Object(Title), Array)
  13. /usr/local/mediawiki-1.21.2/extensions/ProofreadPage/ProofreadPage.body.php(195): wfFindFile(Object(Title))
  14. /usr/local/mediawiki-1.21.2/extensions/ProofreadPage/ProofreadPage.body.php(429): ProofreadPage::load_index(Object(Title))
  15. /usr/local/mediawiki-1.21.2/extensions/ProofreadPage/ProofreadPage.body.php(397): ProofreadPage::preparePage(Object(OutputPage), Array, false)
  16. [internal function]: ProofreadPage::onBeforePageDisplay(Object(OutputPage), Object(SkinVector))
  17. /usr/local/mediawiki-1.21.2/includes/Hooks.php(255): call_user_func_array('ProofreadPage::...', Array)
  18. /usr/local/mediawiki-1.21.2/includes/GlobalFunctions.php(3883): Hooks::run('BeforePageDispl...', Array)
  19. /usr/local/mediawiki-1.21.2/includes/OutputPage.php(2031): wfRunHooks('BeforePageDispl...', Array)
  20. /usr/local/mediawiki-1.21.2/includes/Wiki.php(572): OutputPage->output()
  21. /usr/local/mediawiki-1.21.2/includes/Wiki.php(458): MediaWiki->main()
  22. /usr/local/mediawiki-1.21.2/index.php(62): MediaWiki->run()
  23. {main}

unsigned comment by (talk) 00:49, 3 October 2013.

Recto/verso numbering through <pagelist/>?

A lot of early books were numbered by leaf instead of by page-side, meaning when you open the book only the right-side page has a number. The page numbering is usually expressed in the same way as manuscripts, i.e. page number and recto or verso (see the image to the right). At the moment, it seems like there's no support for this style of numbering in the <pagelist/> function and the numbering has to be entered by hand for each page, like this:

<pagelist 1="1" 2="1v" 3="2" 4="2v" 5="3" 6="3v" ... />

Is there some way to achieve this that I'm missing? Or can "folio" (and "folioroman") styles be added so we can just mark them like this?

<pagelist from=1 to=50 1to50="folio" />

Cross-posted with Extension talk:Proofread Page because I'm not sure where the best place to request this is.

Michael Chidester (talk) 19:20, 2 October 2014 (UTC)
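Until a "folio" style exists, the hand-written attributes can at least be generated. This is a sketch only: "folio" is not an existing <pagelist> style, and the function names are hypothetical. It produces the same 1, 1v, 2, 2v... sequence as the example above.

```javascript
// Labels for leaf-numbered books: "N" on the recto, "Nv" on the verso.
function folioLabels(pageCount, firstLeaf = 1) {
  const labels = [];
  for (let i = 0; i < pageCount; i++) {
    const leaf = firstLeaf + Math.floor(i / 2);
    labels.push(i % 2 === 0 ? String(leaf) : `${leaf}v`);
  }
  return labels;
}

// Builds the explicit attributes that currently have to be typed by hand.
function folioPagelist(fromPage, toPage, firstLeaf = 1) {
  const labels = folioLabels(toPage - fromPage + 1, firstLeaf);
  const attrs = labels.map((label, i) => `${fromPage + i}="${label}"`);
  return `<pagelist ${attrs.join(" ")} />`;
}

console.log(folioPagelist(1, 4)); // <pagelist 1="1" 2="1v" 3="2" 4="2v" />
```

The output can be pasted into the Index page as-is; a "folioroman" variant would only need a different label function.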

The best place for this request is on Bugzilla. Feel free to open a bug on it. Tpt (talk) 08:54, 6 October 2014 (UTC)

I need help using ProofreadPage at Korean Wikisource

Hello, there. I need help using ProofreadPage at Korean Wikisource. I tried to contact a local admin, but nobody answered me. First, please see this page ko:색인:殺人書秘話 박문 제1집, 1938.10, 12-13 (2 pages).pdf, which doesn't work well. I think something is wrong with it. What should I do to use ProofreadPage at Korean Wikisource? HappyMidnight (talk) 02:31, 11 January 2015 (UTC)

Could you remove space characters between pages in Thai, Chinese, Japanese etc.?

Hi. This extension inserts space characters between pages. Although many European languages put spaces between words, some Asian languages, such as Thai, Mandarin, Cantonese and Japanese, don't. Could you create a parameter to turn the insertion of space characters between pages on or off for these languages? Thank you for your maintenance. --Akaniji (talk) 10:02, 15 October 2015 (UTC)

@Akaniji: There is an old Phabricator bug concerning this problem. IMO, the only thing we can do is to ping Tpt to set a higher priority for this problem in his long queue. Or, maybe, prepare a ProofreadPage patch for this ourselves and send it to Phe as a ready-to-implement solution? Ankry (talk) 11:46, 18 October 2015 (UTC)
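The requested behaviour boils down to choosing the separator per language when consecutive pages are joined. A minimal sketch follows; the language list is an illustrative assumption, not a complete or authoritative one, and the function is not ProofreadPage code.

```javascript
// Assumed example list of language codes whose scripts don't put spaces between words.
const NO_SPACE_LANGUAGES = new Set(["th", "zh", "ja"]);

// Joins the text of consecutive pages, with a space only where the language uses one.
function joinPages(pageTexts, lang) {
  const separator = NO_SPACE_LANGUAGES.has(lang) ? "" : " ";
  return pageTexts.join(separator);
}
```

For example, joining the pages "ab" and "cd" gives "ab cd" for English and "abcd" for Japanese.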

Ignore: Test edit

Testing of the new anti-spam filter. Varlaam (talk) 18:08, 11 December 2015 (UTC)

Building an index page for multiple files (jpg/png/...)

It has been indicated that we do not cover situations like where we have a string of individual images, eg. jpg, and we need to demonstrate the building of Page:...jpg links that string together to match the individual images. We talk about how they string together in <pages> but not how to build the page. Thoughts? billinghurst sDrewth 03:50, 22 July 2016 (UTC)
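For an index built from individual images, the Page: links have to be listed one by one. Here is a sketch that generates them, assuming the files follow a zero-padded "Name - NNN.jpg" pattern like the it.source example "Dalla Terra alla Luna - 021.jpg" mentioned earlier on this page. The naming pattern, the padding, the "|N" link label and the function name are all assumptions. The namespace prefix is configurable because local wikis translate it (e.g. "Pagina" on it.source).

```javascript
// Hypothetical generator of Page: links for an index made of separate image files
// named "<base> - NNN.<ext>". Each link is labelled with its plain page number.
function imagePageLinks(base, first, last, { pad = 3, ext = "jpg", ns = "Page" } = {}) {
  const links = [];
  for (let n = first; n <= last; n++) {
    const file = `${base} - ${String(n).padStart(pad, "0")}.${ext}`;
    links.push(`[[${ns}:${file}|${n}]]`);
  }
  return links.join("\n");
}
```

The generated lines could then be pasted into the Index page's Pages field and adjusted by hand where files deviate from the pattern.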

Arabic alphabet

Hi. Is there any possibility that the Arabic alphabet could be OCRed? We really need it for the Persian wikisource. --Yoosef Pooranvary (talk) 09:24, 3 June 2017 (UTC)

Pages cannot be listed more than once on an index page

I create a 'Table of Contents' with links to the corresponding pages. If I have two headings on one page, then when I write the index I get the error 'Pages cannot be listed more than once on an index page'. Why? This is not logical: one page can have two headings, for example a title and subtitle, or two (or more) works. Karaby (talk) 12:30, 1 June 2020 (UTC)

Transcluded page not showing as expected

Is there support for this extension? Installed on MediaWiki 1.38.2, PHP 7.4.30 (fpm-fcgi), ProofreadPage – (05f73cd) 05:02, 26 October 2022, PDF Handler, LabeledSectionTransclusion – (67e3ec4) 05:08, 6 July 2022. All looks fine in the Page namespace, except when using the pages tag <pages index="pdf file.pdf" from=1 to=1 /> to transclude from Page to Main: the main-namespace page does not show the expected transcluded content, only a thin stripe. Hovering the mouse over it shows: 1 validated page, 0 only proofread pages and 0 not proofread pages. The template used for MediaWiki:Proofreadpage pagenum is a template from Wikisource. Is something missing somewhere? Pspviwki (talk) 16:39, 2 November 2022 (UTC)