Wikisource talk:ProofreadPage
Feature requests
Q: Do we want another pagequality level for pages that are incomplete (that is, pages whose text is not completely transcribed)?
- I think it would be useful. --Zyephyrus 14:42, 10 May 2009 (UTC)
- Incomplete or incompletely proofed? Or either/both? In any case, I'd use it. --Spangineer 23:22, 12 May 2009 (UTC)
- It might mean both, either one or the other: it would mean that the page requires an action from the user whatever the action. --Zyephyrus 06:03, 13 May 2009 (UTC)
- Incomplete means that the text is only partially transcribed. It seems to me that something such as incompletely proofed would do more harm than good, because it is useless for other users to know that you have incompletely proofed a page, if they do not know which part it is. ThomasV 08:33, 13 May 2009 (UTC)
- Ok. In the long run, a part of the page might be colored as not proofread and another part as proofread, so the page would be incomplete too; but I understand that it is not a feature for now. Sorry if I have distorted what you meant. --Zyephyrus 09:16, 13 May 2009 (UTC)
"Without text"
I've been playing around with bulk validation of blank pages these last few days. The idea is that you don't need to read a blank page to validate it; in most cases it suffices to glance at a small thumb of the page image to determine that, yes, it really is blank. So I wrote a script to make me a gallery of images of unvalidated allegedly blank pages.
One thing I have learned from this is that there are a lot of allegedly blank pages that are in fact not blank. The most common cause is pages with a picture but no text—since there is no text to extract from the djvu, the djvutext.py bots mark those pages as blank. It is also not uncommon for our bots to mark text pages as blank, because the OCR failed for that page. And I found one case where every page of an entire book is tagged as blank, because a bot tried to upload OCR text from a djvu that had no text layer.
My conclusion is that allegedly blank pages do need to go through a validation process just like every other page. It worries me a little that the new proofreading system shunts blank pages off into a separate class, making it impossible to distinguish between allegedly blank pages and definitely blank pages. Could we think a little more about this change before we go live with it?
Hesperian 01:31, 28 May 2009 (UTC)
- Robots are not supposed to modify the status of a page.
- the 'without text' status will have to be set manually, and the corresponding button will be accessible from any other state.
- if robots become a problem, I might block robot edits that attempt to modify page status.
- ThomasV 23:05, 29 May 2009 (UTC)
- I think you are saying that "Without text" will be like "Validated as blank". That is okay. Hesperian 03:55, 3 June 2009 (UTC)
Sorting out pages that I can validate per project
I have been working on the enWS project of the month doing lots of the first proof, and some of the validation. From the proofread status en:s:Index:Omnibuses and Cabs.djvu, I am finding it hard to determine which I can validate, and which I proofread. It would be nice to have a means where I could highlight those that I can go and validate. At the moment, the only way that I can determine is to enter the Page: namespace and see whether the option to validate exists.
To expand that thought a little: it would be nice to have a means to monitor work on an Index: namespace project, to see what is happening to the subpages of a project, and to see at a glance how many pages are validated for a work, how many need to be done, etc. This would give us a reasonable means to have a completion schedule. I have some good underlying thoughts on what would be useful, but am unsure of what is technically feasible, and especially easily and quickly feasible. Thx. --Billinghurst 00:39, 18 June 2009 (UTC) (PS. ThomasV. Your extension is bloody marvellous!)
- these are interesting ideas.
- I agree, it would be very useful to visualize which pages were proofed by oneself. This is not possible currently, because the identity of the proofreader is not stored in the database, but only in the text of the pages. In the future I plan to have this stored in the database, and what you describe will be possible. This modification, however, will require a schema change (a modification of the database), and it is not likely to happen soon. In the meantime, I can only suggest proceeding in reading order, so that you remember which pages you proofed :-) [ThomasV, 05:37, 18 June 2009 (UTC)]
- Boo, you are no fun whatsoever! Order smorder :-P [Billinghurst 11:58, 18 June 2009 (UTC)]
- concerning your second suggestion, Zyephyrus added an 'rc' link to some indexes, where you can visualize the modifications of the pages. However, it requires a complicated template that has to be built manually. I was also planning to create a 'special' page, where index pages are listed with detailed page counts. This too will require a schema change, though.
- ThomasV 05:37, 18 June 2009 (UTC)
- Even a page where there was a count of the total number of pages in a work, and a summary of their status
| Work's name | Total pages | Count of Validated | Count of Proofread | Count of Problematic | Count of Blank |
|---|---|---|---|---|---|
- then a list of works (all or some). It wouldn't need to be a dynamic list, it may be something that is updated daily by a cron if there are changes to a total. [Billinghurst 11:58, 18 June 2009 (UTC)]
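The summary row proposed above could be produced by a periodic job along these lines. This is only a minimal sketch: the status names, the `summary_row` helper, and the idea of receiving the statuses as a plain list are assumptions made for illustration, not part of the extension.

```python
from collections import Counter

# Hypothetical status labels, one per page of a work (assumed names).
COLUMNS = ("validated", "proofread", "problematic", "blank")


def summary_row(work, page_statuses):
    """Build one table row: work name, total pages, then a count per status."""
    counts = Counter(page_statuses)
    cells = [work, str(len(page_statuses))]
    cells += [str(counts[status]) for status in COLUMNS]
    return "| " + " | ".join(cells) + " |"


if __name__ == "__main__":
    statuses = ["validated", "validated", "proofread", "blank"]
    print(summary_row("Omnibuses and Cabs", statuses))
```

A daily cron job would only need to recompute the row for works whose totals changed.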
- The rc link is in all the last PotM : Look here on the right, higher than the Contents box. Does this answer your needs? --Zyephyrus 09:37, 18 June 2009 (UTC)
- Better than nothing, and I think that I scrambled through them. -- Billinghurst 11:58, 18 June 2009 (UTC)
- Billinghurst, do you mean a thing like this one made by Kipmaster on fr.ws? --Zyephyrus 12:27, 19 June 2009 (UTC)
- Yes, Zeph. Something along those lines. However, I was thinking automated rather than manual.-- Billinghurst 15:40, 19 June 2009 (UTC)
Sorting wheat and chaff
Is there a ready means to find out which pages have been proofread and validated but not transcluded into the main namespace? Here I am thinking of a means of checking, by work, that all pages have been transcluded, or at least a sanity check that pages not transcluded were not omitted by accident.
I see this being looked at in two ways.
- From a work's Index page where we are working upon the work, and want to have a check of the transclusion to the main namespace. Especially relevant for those pulling a work together.
- From the perspective of random pages that have been proofed or validated and should be transcluded. It is often the case that works are casually checked from Recent Changes, and one can never be certain of the status.
I am wondering whether this could be a report via Special:IndexPages or as a subset/drill down from that page. There are a number of ways that I can think of interrogating things when generally having the janitorial hat on one's head.
-- Billinghurst 07:39, 19 October 2009 (UTC)
- there's currently no way to do this on the wiki; you could get this information from the toolserver, though ThomasV 09:03, 19 October 2009 (UTC)
Bug reports
HTML comments ruin Index: page layout
In normal wiki code, you can insert HTML comments within a table. On Index: pages, however, such comments ruin the table layout. I guess the problem is being caused by the "!" in "<!--". For an example, see here. Hesperian 23:50, 20 May 2009 (UTC)
- you cannot insert html comments inside a tag. btw, pagelist does not output a table. ThomasV 23:29, 29 May 2009 (UTC)
- Template:MediaWiki:Proofreadpage index template outputs a table, so the HTML comment is indeed within a table. But, as you say, in my diff it is also within a tag. Sorry to have bothered you. Hesperian 03:58, 3 June 2009 (UTC)
Error: index expected
Hello. I've uploaded File:Un_polític_desgraciat_(1911).djvu to Commons, made the transcriptions at the s:ca:Llibre:Un_polític_desgraciat_(1911).djvu pages, and also created s:ca:Un polític desgraciat for transclusion of all of its pages, but "Error: index expected" appears. Could it be a lag problem? Perhaps it will disappear in a few hours (if so, I'll comment later)? -Aleator (talk) 13:25, 24 June 2009 (UTC)
- I fixed it. you should use quotes instead of underscores ThomasV 15:21, 24 June 2009 (UTC)
- Thank you, Thomas. (But s:ca:Atheneo de grandesa works with underscores). -Aleator (talk) 18:15, 25 June 2009 (UTC)
Planned features
These features are still in development.
- Store page metadata in a table, not in the page's text field
- this is the next big thing
- it will require a schema change : a new table for pages where metadata (status, username) will be stored
- A special page for indexes, showing proofreading status. (this too will require a schema change).
Usage questions
Index functionality
I must be having a blond moment ...
“Configurable Headers and Footers: The default content of page headers and footers can be configured in Mediawiki:Proofreadpage_default_header and Mediawiki:Proofreadpage_default_footer. In addition, this default value can be adapted to each book. For this, admins need to add 'header' and 'footer' fields to the index pages.”
How does one insert headers/footers into an Index: page? -- Billinghurst 15:53, 19 June 2009 (UTC)
It will insert headers and footers in Page: pages, but it often malfunctions. --Joergens.mi 22:36, 26 June 2009 (UTC)
- fyi, this function was temporarily enabled at en.ws. I disabled it because it needs a fix. The fix was committed last week [1]. I will reactivate it once the fix goes live. ThomasV 08:52, 27 June 2009 (UTC)
Failure
A project that was started with the previous version, and worked fine there, is now malfunctioning. The page http://de.wikisource.org/wiki/Seite:Schreiber_Bundschuh_zu_Lehen_095 is an example. The version active at 2009-06-16T07:18:57 (history) (diff) Seite:Schreiber Bundschuh zu Lehen 073 still worked. I'm sorry that I can't tell which version was active then, but there isn't any information about the changes. I'm hoping it is one of the usual glitches in the JS scripts; if not, I'll have to open a Bugzilla issue. --Joergens.mi 22:47, 26 June 2009 (UTC)
- This problem is specific to de.ws. it is caused by a wrong regexp in de:MediaWiki:ProofreadPage.js. ThomasV 07:36, 27 June 2009 (UTC)
- fixed ThomasV 15:48, 28 June 2009 (UTC)
IP not allowed to proofread
What do you think about the idea that IPs aren't allowed to do proofreading?
I know at least 3 IPs on the German-language Wikisource with an academic background (degrees in history, etc.) who support our work by proofreading. They are now excluded from finalizing pages, and are not very happy about it. Is that the correct way? They don't want to use nicknames for one reason or another (some of them are past middle age). --Joergens.mi 22:47, 26 June 2009 (UTC)
- The goal of the ProofreadPage reviewing system is to ensure that a page has been reviewed by two different users. For this reason, the username of the user who declares the page proofread and that of the user who declares the page validated must be different. This system allows Wikisource to give readers a guarantee that their e-text has been proofread twice. Note that the same kind of constraint also exists at Project Gutenberg. This type of guarantee is fundamental if we want to be taken seriously.
- IP addresses do not provide a stable identity. If IP addresses were allowed to mark pages as proofread, there would be no way to ensure that the first proofreader and the second one are different, because the IP address of a computer changes. A page could be proofread and validated by the same person, not logged in, on two different days, or by the same person, once not logged in and then once logged in. This is why we do not allow IPs to change the proofreading status of a page (even though they are fully allowed to actually proofread and correct pages).
- This is not just a question of fighting vandalism. Even if anonymous proofreaders were all of good faith, they would certainly not remember which pages they have already proofread, especially when a single person proofreads hundreds or thousands of pages during a year.
- Please note that there are many other things IPs are not allowed to do. While Wikimedia does allow IPs to participate at some level, IPs are generally encouraged to create an account. IPs are excluded from the following tasks at Wikimedia:
- IPs are not allowed to move pages
- IPs are not allowed to upload images, or any type of file
- IPs cannot have preferences
- IPs cannot have a watch list
- IPs cannot become sysops
- ...etc.
- In addition, there are a few other extensions used at Wikimedia, whose goal is to improve the reliability of content, just like ProofreadPage. These extensions also exclude the participation of IPs:
- IPs are not allowed to do edit patrolling
- IPs are not allowed to flag articles in FlaggedRevs
- The reason why these extensions exclude IPs is similar to the one I gave above.
- I hope that these 3 German IPs (are they 3 different users?) are aware of the many things they cannot do without an account. I can only suggest that they create an account.
- ThomasV 08:24, 27 June 2009 (UTC)
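The two-reviewer rule described above can be sketched as a small check. This is an illustrative sketch only, not the extension's code: the function name `can_validate` and the use of `None` to stand for a contributor who is not logged in are assumptions.

```python
from typing import Optional


def can_validate(proofreader: Optional[str], validator: Optional[str]) -> bool:
    """Return True if `validator` may mark a page as validated.

    `None` stands for a contributor who is not logged in (an IP).
    """
    if validator is None:
        # IPs may correct the text but may not change the page status.
        return False
    if proofreader is None:
        # No account is recorded as first proofreader, so there is no
        # stable identity to compare against.
        return False
    # The two reviews must come from two different accounts.
    return validator != proofreader


if __name__ == "__main__":
    print(can_validate("Alice", "Bob"))    # two different accounts
    print(can_validate("Alice", "Alice"))  # same account twice
    print(can_validate("Alice", None))     # validator not logged in
```

The comparison is only meaningful because account names are stable; two edits from IP addresses cannot be shown to come from different people.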
I think such a behaviour has to be discussed with the community in which the extension will be used, and not changed in the middle of usage without previous public notification and discussion. The previous versions were without this humbling paternalism; why should it be introduced by the personal wish of the developer of this extension? It should be easy to make this behaviour switchable, so that every project can set it to its own preference; it is simply the same as what you did with the default setting of horizontal and vertical layout. --Joergens.mi 08:53, 27 June 2009 (UTC)
<pages/> command and partial transclusion
Is there a way to get rid of the "Index" link on the top of the transcluded pages?
Does the command allow partial transclusion by section labeling or some such, and if not is this planned as a feature, would it be possible to add this as a feature? Is this command planned to eventually replace the use of the "Page" template currently used for transclusion?
Many thanks for your wonderful extension! -- Teak 04:27, 2 July 2009 (UTC)
- A hidden "index" link is prepended to the pages, so that a javascript function can pick it up and create an "index" tab (such a function is active at fr.ws, you can see it in the local monobook.js). There's currently no way to get rid of it, and unfortunately it shows up in printable versions. In future versions, however, this link will be removed : detection of the index page will be performed independently, and on any page, not just on pages that use this command.
- Support of partial transclusion with the "pages" command is planned, but not implemented yet.
- ThomasV 08:38, 2 July 2009 (UTC)
Usability updates
Since the first usability release was deployed to all Wikisource projects, the extension (more precisely, its toolbar buttons) does not work with the new edit toolbar. The Vector skin seems to work correctly with the ProofreadPage extension; there is no known error. --enomil 13:28, 2 July 2009 (UTC)
It also does not add the tabs for moving forward/backward or the index tab. --enomil 17:07, 6 July 2009 (UTC)
The index form does not initialize with the new edit toolbar. --enomil 10:18, 7 July 2009 (UTC)
- unfortunately it is difficult to avoid incompatibilities between javascript extensions, especially when developers are not aware of other projects.
- in the future I hope to progressively replace javascript code with php; this should make it more robust.
- ThomasV 11:09, 7 July 2009 (UTC)
- You are not alone bugzilla:19527. --enomil 21:22, 7 July 2009 (UTC)
Scanned image not shown in edit mode
Hi, in hy.ws a few images in the "page" namespace have not been shown in edit mode since yesterday; a black background is shown instead, for example [2], [3]. I tried purging the page, and even deleting and recreating it, without success. This behavior is seen by a few members of the community, and seems to be independent of the OS/browser used. Is this due to a code update, or is it a bug? Thanks. -- Teak 19:02, 7 July 2009 (UTC)
Categorising Index pages
At the moment, all index pages are in one bulk Category:Index pages, which is not very useful. As texts become 100% validated, it would be useful if they could be moved to a Category:Validated works (or similar), so that users don't open an index page, only to find out that there is nothing for them to do there. This might be done automatically, or there could be some sort of tick boxes on the index page, similar to the proofreading page. V85 16:18, 8 September 2009 (UTC)
- perhaps the new special page at Special:IndexPages is what you need ? ThomasV 13:34, 30 October 2009 (UTC)
Special:IndexPages sort order
Regarding Special:IndexPages and the statement "Most advanced projects are displayed first", a completely validated text containing 99 pages will appear after a 1000-page text which has been 10% validated. This seems a little counterintuitive to me. Hesperian 01:27, 25 September 2009 (UTC)
- Following through on this, a project can be considered finished when every page is either Validated or WithoutText. Depending on whether you want to count or ignore the WithoutText pages, your sort keys should be either {(Validated+WithoutText)/Total, Proofread/Total, NotProofread/Total} or {Validated/(Total-WithoutText), Proofread/(Total-WithoutText), NotProofread/(Total-WithoutText)}. Hesperian 02:29, 25 September 2009 (UTC)
- they are ordered by the following criterion : 2*(#validated) + (#proofread).
- this is intended to reflect the number of proofreading actions.
- in the future there will be more options on this page
- ThomasV 10:56, 25 September 2009 (UTC)
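The difference between the two orderings discussed in this thread can be seen with the example from the first post. The sketch below is illustrative only: the `IndexStats` class and function names are invented here, the page counts are assumed to be disjoint, and the 1000-page text is assumed to have no pages in the proofread state.

```python
from dataclasses import dataclass


@dataclass
class IndexStats:
    # Hypothetical per-index page counts; the categories are assumed disjoint.
    name: str
    total: int
    validated: int
    proofread: int
    not_proofread: int
    without_text: int = 0


def action_score(ix: IndexStats) -> int:
    """Criterion quoted above: 2*(#validated) + (#proofread)."""
    return 2 * ix.validated + ix.proofread


def completion_key(ix: IndexStats) -> tuple:
    """Ratio-based keys from the second proposal, ignoring WithoutText pages."""
    denom = ix.total - ix.without_text
    if denom <= 0:
        return (1.0, 0.0, 0.0)  # nothing to proofread: treat as finished
    return (ix.validated / denom, ix.proofread / denom, ix.not_proofread / denom)


small = IndexStats("99-page text, fully validated", 99, 99, 0, 0)
large = IndexStats("1000-page text, 10% validated", 1000, 100, 0, 900)

by_actions = sorted([small, large], key=action_score, reverse=True)
by_completion = sorted([small, large], key=completion_key, reverse=True)

if __name__ == "__main__":
    print([ix.name for ix in by_actions])
    print([ix.name for ix in by_completion])
```

Under these assumptions the action count gives 198 for the small text and 200 for the large one, so the large text is listed first; the ratio-based key lists the finished text first.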
New line character added at the bottom of Page: text
It seems that a \n character is automatically added at the bottom of the text of Pagina: pages when saving, just before the footer. I tried and tried to delete it with no success. IMHO, this character wasn't added in previous versions (a trailing newline is a typical mistake that spoils the transclusion of a paragraph broken across pages). Am I right?
See: it:Pagina:Zibaldone di pensieri.djvu/122 and its transclusion, it:Pensieri di varia filosofia e di bella letteratura/16. --Alex brollo 10:07, 6 October 2009 (UTC)
- yes, it's a bug; it has already been reported in the en.ws scriptorium.
- I fixed that bug soon after the initial report, but the deployment of the fix has been delayed due to other problems. I guess it should be deployed this week.
- ThomasV 12:19, 6 October 2009 (UTC)
- it is fixed now ThomasV 20:26, 7 October 2009 (UTC)
Special:IndexPages tweak required
When I filter on http://en.wikisource.org/wiki/Special:IndexPages?key=dictionary+of+national+biography it will only show me the top 50 whether I choose 50 or 100, and the NEXT links don't apply the key filter to the next page of results; they just give me the general TOP 20/50/100, so I am unable to see all of the DNB volumes. Billinghurst 11:36, 27 October 2009 (UTC)
- thanks for pointing this out. I just fixed it in svn. ThomasV 13:33, 30 October 2009 (UTC)
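The behaviour asked for above amounts to carrying the filter into every paging link. A minimal sketch, assuming the paging parameters are called `limit` and `offset` (only `key` appears in the URL quoted above) and using a hypothetical helper name:

```python
from urllib.parse import urlencode


def next_page_url(base: str, key: str, limit: int, offset: int) -> str:
    """Build the NEXT link, keeping both the chosen limit and the key filter."""
    params = {"limit": limit, "offset": offset + limit}
    if key:
        # Dropping this parameter is what makes NEXT fall back to the
        # unfiltered list.
        params["key"] = key
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    print(next_page_url(
        "http://en.wikisource.org/wiki/Special:IndexPages",
        "dictionary of national biography",
        limit=100,
        offset=0,
    ))
```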
Transclude and section discrepancy
I found that if one has a section tag like <section begin= Whitaker, Sir Frederick /> (note the leading space between = and W) and then transcludes it with
<pages index="Dictionary of National Biography volume 61.djvu" from="21" to="22" fromsection="Whitaker, Sir Frederick" tosection="Whitaker, Sir Frederick" />
the tag (and page) with the space is missed. So I have modified my section tag to <section begin=Whitaker, Sir Frederick /> and there are no problems.
If it is easy, can we have it so that leading spaces in a section tag are ignored? Thx. -- Billinghurst 00:31, 5 November 2009 (UTC)
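What is being requested is that section names be trimmed before they are compared. The sketch below shows the idea on raw wikitext; the regular expression and the `section_names` helper are assumptions for illustration and say nothing about how the extension actually parses the tag.

```python
import re

# Capture whatever follows "begin=", without surrounding whitespace.
SECTION_BEGIN = re.compile(r"<section\s+begin=\s*(.*?)\s*/>")


def section_names(wikitext: str) -> list:
    """Return section names with stray spaces and optional quotes removed."""
    return [raw.strip().strip('"').strip() for raw in SECTION_BEGIN.findall(wikitext)]


if __name__ == "__main__":
    with_space = "<section begin= Whitaker, Sir Frederick />"
    without_space = "<section begin=Whitaker, Sir Frederick />"
    print(section_names(with_space) == section_names(without_space))
```

With this normalization both spellings of the tag yield the name used in fromsection and tosection.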
To move a group of pages
This has nothing directly to do with the proofread tools, but you need it because of the tools.
It should be possible to move a group of pages [without redirect]. This is because a page in a djvu file is sometimes removed or added, and then you suddenly have to move a large number of pages.
This should be limited to pages in ns-104 and to groups of pages within one single index. -- Lavallen 19:48, 5 November 2009 (UTC)
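The renumbering involved can be sketched as follows. This only computes a list of (old title, new title) pairs and performs no edits; the `plan_moves` helper, the "Page:" prefix, and the "file/number" title pattern are assumptions for illustration.

```python
def plan_moves(index: str, first: int, last: int, shift: int) -> list:
    """List (old, new) titles for pages first..last shifted by `shift`.

    The order is chosen so that each target title is already free when its
    move is carried out, assuming moves leave no redirect behind.
    """
    if shift == 0:
        return []
    numbers = range(first, last + 1)
    # Shifting up: start from the last page. Shifting down: start from the first.
    ordered = reversed(numbers) if shift > 0 else numbers
    return [(f"Page:{index}/{n}", f"Page:{index}/{n + shift}") for n in ordered]


if __name__ == "__main__":
    # A scan page was inserted before page 5 of a 7-page file.
    for old, new in plan_moves("Example.djvu", 5, 7, 1):
        print(old, "->", new)
```

Restricting the input to one index name and one page range matches the limitation suggested above.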