Wikisource talk:Index


Cataloging...

Discussion originated in Wikisource: Scriptorium. Moved to here by User:Maio 01:21, 20 Feb 2004 (UTC).


What we're facing here in Wikisource will soon be a cataloging crisis... it's already difficult and unclear as to how to locate documents and where to add them to lists when importing them into the Wiki.

I feel like the multitude of categories is liable to generate massive confusion, and that it will become very difficult to maintain all of the various lists in short order. Instead of the massive numbers of document classes and indices that we currently have, I propose reducing the number of document types to just a handful: say, fiction, non-fiction, data tables, and source code, and indexing by title within each type. I also suggest that we use the author namespace for all authors and generate a single unified master list for authors.

I hope I'm not being too presumptuous in making these suggestions... Project Gutenberg still suffers from tremendous difficulties in indexing and cataloging; we have the opportunity to head in a drastically different and much more useful direction. -- Jehanne 17:21, 18 Feb 2004 (UTC)

I mostly, but not completely, agree with this. One of the things that I have been recently working on has been resolving orphan pages; the problem you cite has become very evident there. It is perhaps because of this that I made such a big fuss about the layout of the English Main Page. When that was resolved I was not ready for a fight over what should be included in "Primary Categories", despite my belief that a number of unnecessary items were added.
What I might consider as "primary" differs from what you propose, but I believe there is still plenty of negotiating room over that. For example I would not use a primary "fiction/non-fiction" distinction because that requires a judgement that the cataloguer may not be in a position to make. The primary categories should be distinguishable with as little outside information as possible. By keeping "Indexes" as a separate part of a language main page we allow for a number of special indexes which are additional to the primary categories.
I agree with the "Author:" name space. I've been gradually setting some of these up, but attaching low priority to those authors for whom we thus far have only one work. If you have any suggestions about improving the format of these pages I would be glad to hear them. For now they may be recognized by the "See" reference from the alphabetical author listings.
I strongly support an alphabetically subdividable author master list, and have been working on that from the beginning.
The cataloguing problem that you describe cannot be overemphasized. Dealing with it, including the complexities arising from our multilingual environment, will be key to having a usable Wikisource. Eclecticology 21:24, 18 Feb 2004 (UTC)
The alphabetic author master list is a great thing, and is in (relatively) good shape right now as far as I'm concerned. The author namespace is the logical extension of that.
You're right, there's lots of room for discussion on primary categories (and I, too, feel that there are too many at present). I attempted to implement my scheme earlier by going through a significant portion of the article list. My results are here. It didn't turn out nearly as neatly as I hoped -- I ended up with oodles of subcategories, because... well, they just made *sense*.
The fiction/non-fiction divide and judgment call wasn't something I'd really thought about. I'll have to ponder that.
I've worked in libraries before, and -- intellectually -- I knew that cataloging was a tricky issue, but, well, I never expected this.
I don't know if it's yet been proposed that we would probably benefit most from a more database-like setup (with fields for all sorts of meta-data) that could automatically generate catalogs and allow for more logical searching, and have an interface adaptable to many languages. That's a developer job, and something I don't have the technical background to program. By the time Wikisource is an order of magnitude larger, I think it will become an absolute necessity. -- Jehanne 22:36, 18 Feb 2004 (UTC)
In practical terms, it's important to build on what works. The author lists, and the author namespaces are showing signs of becoming stable, so that a disciplined development of these pages and a correlation of relevant articles with these should have high priority.
One of the concepts that I have tried to implement with the primary categories is a logical question and answer approach. At any stage in the process one asks the question, "Does it belong here?" Then, if yes, put it here; if no, go on to the next question. With this some of the categories that have been suggested will become meaningless. All the poetry and all speeches, for example, will likely have nothing in them, because most poetry and speeches will have had an author.
Yes, work in an established library was never like this! Serendipitously, I just yesterday encountered an interesting question in the book How Would You Move Mt. Fuji? "How would you find a book in a library" if you don't know what the book is, and the library (which is at least a five-storey building) has no classification system? Eclecticology 00:47, 19 Feb 2004 (UTC)
Ah! Now the system makes a little more sense. I was looking for the "best" category out of the whole list, when it turns out that it's a dichotomous key... perhaps we can make this clear somehow on the main page. -- Jehanne 02:38, 19 Feb 2004 (UTC)
Such an explanation would be important. Putting it directly on the Main Page would create clutter. Despite the negative attitude that some Wikipedians have had for sub-pages, I would support a series of .../Help sub-pages designed to provide help for that specific page. Eclecticology 19:45, 19 Feb 2004 (UTC)
I don't know if you are familiar with Borders (a commercial library) but they have a pretty good system. You can easily find, for example, a book about angels by moving from Spiritual/Religious > Supernatural beings, blah blah. We can use the Open Directory Project as a base also. --Maio 02:58, 19 Feb 2004 (UTC)
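
The question-and-answer approach described above (a dichotomous key: ask "Does it belong here?", and move to the next question on "no") can be sketched roughly as follows. This is only an illustration; the `Work` fields, the questions, and the category names are hypothetical and are not the actual Primary Categories.

```python
from typing import Callable, List, NamedTuple, Optional, Tuple


class Work(NamedTuple):
    """Hypothetical minimal description of a submitted text."""
    title: str
    author: Optional[str] = None
    is_data_table: bool = False


# Ordered key: each entry is (question, test, category).
# The order matters: the first question answered "yes" wins.
Key = List[Tuple[str, Callable[[Work], bool], str]]

EXAMPLE_KEY: Key = [
    ("Does it have a known author?", lambda w: w.author is not None, "Authors"),
    ("Is it a data table?", lambda w: w.is_data_table, "Data tables"),
]


def classify(work: Work, key: Key = EXAMPLE_KEY,
             fallback: str = "Miscellaneous") -> str:
    """Walk the key top to bottom and return the first matching category."""
    for _question, test, category in key:
        if test(work):
            return category
    return fallback
```

Because the questions are asked in a fixed order, two cataloguers who answer them the same way end up in the same category, which is the point of the key. It also shows why a late category such as "all poetry" would stay nearly empty if an author question comes first.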

I believe that the main confusion here is that the "Primary Categories" should really be called "Quick Reference Help" or some type of FAQ. The real main index categories right now are Fiction, Non-Fiction, Poetry, Short Stories, Texts, and Miscellaneous material. As soon as someone submits an essay, a new index should be created called Wikisource:Essays.
Your idea in principle is good, keeping the number of categories to a minimum. However, your scheme fails when categorizing poetry: poetry can be fictitious and non-fictitious; because of that, it should be a category of its own. The same thing with essays. Short stories could be easily located inside Fiction. And texts could be located inside Non-Fiction along with Legal Documents and Speeches. On the other hand, Letters and Debates should go inside Miscellaneous, as you could have fictitious debates and letters. Data tables could be a category of its own.
About indexing Authors... I completely disagree. Storage is not a problem; we should provide all sorts of author lists: a full list of authors, a list of authors stored in Wikisource that have written poetry, essays, short stories, etc.
--Maio 02:58, 19 Feb 2004 (UTC)
Maio, in principle, I understand the desire to have "all sorts" of lists; but there's a very practical barrier -- not storing, but maintaining all of them. We have less than a thousand items now, and most of our lists are next to useless -- a problem that's only bound to get worse... Which is why I would opt for as few lists as necessary. Even on the very active Wikipedia, lists (especially those meant to be "exhaustive") maintain themselves poorly. Reduce the complexity, and you increase the ease of use for both readers and contributors. I'd be much more inclined to support something very simple and straightforward like this. -- Jehanne 03:20, 19 Feb 2004 (UTC)
There is no need for maintenance when that stuff is done automatically, ie: by the software. :p --Maio 04:14, 19 Feb 2004 (UTC)
That'd be great! :) Who's going to write the software? -- Jehanne 14:05, 19 Feb 2004 (UTC)
That brings it down to the nitty-gritty! Until a developer falls from heaven into our laps we'll have to do with the software that we have. Eclecticology 19:45, 19 Feb 2004 (UTC)
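
As a rough sketch of what "done automatically by the software" might mean in this thread: if each text carried a small metadata record, the various lists (by author, by form, and so on) could be generated from one source instead of being maintained by hand. The record fields and the function name `build_index` below are assumptions made for illustration; no such feature exists in the software discussed here.

```python
from collections import defaultdict
from typing import Dict, List


def build_index(records: List[dict], field: str) -> Dict[str, List[str]]:
    """Group work titles by the value(s) of one metadata field.

    A field may hold a single string or a list of strings; records
    that lack the field are simply left out of that index.
    """
    index = defaultdict(list)
    for record in records:
        value = record.get(field)
        if value is None:
            continue
        values = value if isinstance(value, list) else [value]
        for v in values:
            index[v].append(record["title"])
    # Sort headings and the titles under each heading alphabetically.
    return {heading: sorted(titles) for heading, titles in sorted(index.items())}


# Hypothetical records; the field names are not an agreed standard.
records = [
    {"title": "The Raven", "author": "Edgar Allan Poe", "form": "poetry"},
    {"title": "The Gold-Bug", "author": "Edgar Allan Poe", "form": "short story"},
    {"title": "Ozymandias", "author": "Percy Bysshe Shelley", "form": "poetry"},
]

by_author = build_index(records, "author")
by_form = build_index(records, "form")
```

Every list is derived from the same records, so adding a text once updates all of the lists; the maintenance burden moves from the lists to the records.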

It's not out of whim that library catalogues have their primary listings under their authors. To maximize compliance, one needs to minimize ambiguity. If you ask 100 contributors to classify the same work, you want to ensure that they all do it the same way, without your needing to know personally what that work is. If "essay" is to be a category, how do you distinguish between that and non-fiction? Can you provide a simple but strict set of easily understood criteria that everyone will follow?

The issue of categorization systems comes up periodically on the Wikipedia mailing lists. Everybody gets into the argument. Not everybody supports them, and there's usually one person that complains that such a scheme could facilitate the censorship of sexually explicit material, and another who looks at that as an opportunity for making Wikipedia more acceptable to school boards. After a while the argument fades away, until six months later someone suggests that it would be a good idea to put things into categories.

What that all means to us is that we need to find our own way of categorizing and cataloguing that works. If it does work well, maybe Wikipedia can learn from us. I see that as one strictly defined primary key, and an indefinite number of multiple additional and optional categories designed to suit the needs of varied users. Eclecticology 19:45, 19 Feb 2004 (UTC)

If "essay" is to be a category, how do you distinguish between that and non-fiction?
Pretty easy, everything that is an essay goes into Wikisource:Essays.. anything else that is NOT an essay, nor a data table, nor a short story, nor poetry, nor a novel, nor a speech, nor a letter, blah blah blah.. goes into Fiction or Non-Fiction. Basically, before we start designing the catalogue standard, we have to check first which are the primary categories of works/literature. --Maio 01:01, 20 Feb 2004 (UTC)
Maio, I'm afraid your reasoning feels quite weak to me. We also need to design the catalog with the first-time user in mind -- and that means, at least in part, making our system conform to people's expectations when it comes to library catalogs. Prior to the digital catalog age, most card catalogs were organized three primary ways: author, title, and subject (usually using the Library of Congress (LC) subject headings). I'm absolutely positive that we're not ready to deal with subject classification yet -- professional librarians have been doing that for over 150 years and have basically been stalled in poor compromises (Dewey Decimal, LoC) for more than 50.
Europeans have their own different system. (LC is too American. :-)) India has something else, etc. -Ec
Maybe the Europeans have got it right. I'll have to look into that. -- Jehanne 13:50, 20 Feb 2004 (UTC)
My contention is that people just don't go to the "essays" shelf, or the "letters" shelf in a library -- so why ask people to change their habits for Wikisource? They also aren't going to turn to Wikisource for current knowledge as in their public library, so that diminishes the importance of subject classification. I'd strongly urge that we focus on our main index: author, at first -- until we can get a developer to write us catalog software that will allow searching in a multitude of fields and automatic generation of index lists. -- Jehanne 02:04, 20 Feb 2004 (UTC)
"Essays" is a form rather than a classification. Cataloguing systems tend to use the term only for anthologies and the like. -Ec
It is kind of pointless to further discuss this when in reality we should be using our efforts to request the feature. I would gladly write the code, but unfortunately I do not have the time right now to sit down and do so. I'm sorry for not including a link to the "request a feature" page, but I'm on a text-based browser right now. I'm pretty sure that you can find it on Wikipedia:Wikipedia:Help or Wikipedia:Wikipedia:HOWTOs. --Maio 06:01, 20 Feb 2004 (UTC)
That's fine, I don't mind waiting until you have the time. As for requesting the feature, I'm sure that you will be best able to phrase it the way you want it.
Meanwhile, can the info box be developed in such a way as to include classifications? That can be done without software changes. If classifications were according to some code, in addition to anything in plain text, simply plugging that code into a properly working search function would give all articles that have that code. Searching on codes would avoid the false positives that you would get in plain text search. Eclecticology 10:14, 20 Feb 2004 (UTC)
Jehanne has done a great job about it, talk with her about that. I just re-formatted it. Discussion at Help talk:Infobox/Feb 2004 archive. About waiting: you will have to wait up until xmas probably! x_x Btw, here is the link to the request a feature page. --Maio 14:16, 20 Feb 2004 (UTC)
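
The idea raised above of putting classification codes in the info box and searching on them can be illustrated with a small sketch. It assumes, purely for illustration, that each page carries a line of the form `Classification: CODE1, CODE2`; that line format, the codes `POE` and `ESS`, and the function names are all hypothetical and not part of the actual info box.

```python
import re
from typing import Dict, List, Set

# Hypothetical info-box line, e.g. "Classification: POE, ENG"
CODE_LINE = re.compile(r"^Classification:\s*(?P<codes>.+)$", re.MULTILINE)


def codes_of(text: str) -> Set[str]:
    """Return the set of classification codes declared on a page."""
    match = CODE_LINE.search(text)
    if match is None:
        return set()
    return {c.strip().upper() for c in match.group("codes").split(",") if c.strip()}


def find_by_code(pages: Dict[str, str], code: str) -> List[str]:
    """Titles of pages whose info box declares exactly this code."""
    return sorted(t for t, text in pages.items() if code.upper() in codes_of(text))


def find_by_text(pages: Dict[str, str], word: str) -> List[str]:
    """Naive plain-text search, shown only for comparison."""
    return sorted(t for t, text in pages.items() if word.lower() in text.lower())


pages = {
    "The Raven": "Classification: POE\nOnce upon a midnight dreary...",
    "A Defence of Poetry": "Classification: ESS\nAn essay on the nature of poetry.",
}
```

Here `find_by_code(pages, "POE")` returns only the poem, while `find_by_text(pages, "poetry")` returns the essay that merely mentions poetry and misses the poem itself, which is the kind of false positive that searching on codes is meant to avoid.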