On cataloging invisible things …

I’ve been kicking around thoughts on how best to organise the LibriVox catalogue in the future, especially when the new design is implemented. LibriVox is a collection of people who speak many languages, and who record public domain audiobooks in most of them.

Currently, we organise by Category (out of Fiction, Poetry, Non-fiction, Dramatic Works — one per book) and by Genre (list here — multiple selections possible).

However, as our catalogue grows, I think it’s going to get harder and harder to manage this very fixed structure. As we add more books, we’ll need to add more Genres, and it will be a lot of work to retrospectively look over our books when a new Genre is added. For example, Art was recently added to the list, and older books about Art were added to the Genre by people who remembered them. But even with a relatively small number of relevant books involved, some can get missed, like Ruskin’s Lectures on Landscapes. Adding a Genre which would cover a lot of existing books becomes a bit of a nightmare, and there’s little incentive to add new Genres of this type. My point here is definitely not to second-guess the labellers, but to note that it’s going to be very hard to keep using this system when we have 5,000 books. By the time we get to 10,000, we’ll have literally thousands of books in some Genres and it’ll be very hard for listeners to find books that interest them, using the Genre system.

Although we have a lot of extremely altruistic people involved at LibriVox, very few can be recording in the belief that their book will never be downloaded. So I think it encourages readers to have a good catalogue system which makes it easier for listeners to find books they may like. Genres are an important part of this. (I also think some kind of ‘if you liked that, you may enjoy…’ recommendation system will be helpful, but that’s a different discussion. 😉)

Also, our current Genres don’t really work for some classic library areas that people often expect to find here, such as Detective Fiction or True Crime. And are autobiographies in Biography or Memoir? How well would a non-native English speaker differentiate Humor and Comedy (since I struggle myself)? Or Instruction and Advice, for that matter? And would you put a book on learning English into Instruction or Advice, or just Languages? Is the Genre ‘Children’ for books about children, or books for children? (Books on child-rearing are very different to Mother Goose.) Is Literature of any use at all, since it’s rather “in the eye of the beholder”? Ditto the Genre “Fiction”, which ought to be already covered by the Category. We have Epistolary Fiction, but where do we put books of non-fictional letters? What definition of Romance is being used? (We do, after all, have a Romance of Rubber in the catalogue!) Are all catalogers using all these terms in consistent ways?

I don’t have a definite answer to any of this; it needs some major discussion! But I think a piece of the puzzle is given in an essay by Clay Shirky, who talks about how to sensibly arrange virtual objects, avoiding the perils of real-world organisation. (Hence, some “there is no shelf” musings on Twitter. We might get some support from thinking about the problem in terms of physical library cataloging, but it’s not the whole answer, as we can see from Shirky’s list of popular library system pitfalls.)

I’m wondering about tagging books: having the reader and prooflistener do the initial tagging, which might look a lot like our current genres (or the LoC categories at Gutenberg), and then opening the system up to the general public to tag as well, but with the major limiter that no tag appears until it’s been entered a certain number of times by different IP addresses. This would avoid things getting tagged “rubbish narration” or “horrid background noise”, or random things like “vote for X”, or spam (you don’t need examples of this one).
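
To make the “no tag appears until enough different IP addresses have entered it” idea a bit more concrete, here is a rough sketch in Python. Everything in it is hypothetical: the `TagBoard` class, its method names, the book identifier and the threshold of 3 are placeholders for illustration, not anything LibriVox actually runs. It also leaves out the initial tagging by the reader and prooflistener, which would presumably be shown straight away.

```python
from collections import defaultdict


class TagBoard:
    """Hypothetical sketch: a public tag becomes visible only after
    enough distinct IP addresses have submitted it for the same book."""

    def __init__(self, min_distinct_sources=3):
        # The threshold is an assumed, tunable number.
        self.min_distinct_sources = min_distinct_sources
        # (book_id, tag) -> set of IP addresses that entered that tag
        self._sources = defaultdict(set)

    def submit(self, book_id, tag, ip_address):
        # Normalise case and spacing so "Art" and " art " count as one tag.
        normalized = " ".join(tag.lower().split())
        if normalized:
            # A set ignores repeat submissions from the same address.
            self._sources[(book_id, normalized)].add(ip_address)

    def visible_tags(self, book_id):
        # Only tags that reached the threshold are shown to listeners.
        return sorted(
            tag
            for (bid, tag), ips in self._sources.items()
            if bid == book_id and len(ips) >= self.min_distinct_sources
        )


board = TagBoard(min_distinct_sources=2)
board.submit("lectures-on-landscape", "Art", "10.0.0.1")
board.submit("lectures-on-landscape", "art ", "10.0.0.1")  # same address again
hidden = board.visible_tags("lectures-on-landscape")   # still below threshold
board.submit("lectures-on-landscape", "art", "10.0.0.2")
shown = board.visible_tags("lectures-on-landscape")    # now two addresses
```

Counting addresses rather than submissions is the point of the design: one person entering the same tag over and over never gets it past the threshold. It is not spam-proof (several people can share an address, and one person can use several), so it would be a first filter rather than the whole answer.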

Finally, I’d like to chuck into the mix, the representation of other languages. It’s very important to me that a book in a particular language has its catalogue summary in that language (with an English translation if wanted.) It would be lovely to be able to use Genre in one’s own language too. I realise our current hard-skeleton of Genres lends itself to this better than a big flexible system, but anyway. We have an increasing number of books in languages other than English, and encouraging listeners for those languages will result in more readers for those languages.


  • Cori,

    Thanks for your tremendous contribution! I think tagging the books is the way to go. That way, listeners can enter keywords even if the genre is unknown or hasn’t been created. Also, there have been times when I have remembered a book that was read to me when I was little–or I could remember what the book was about, character names, and such–and couldn’t remember the title nor who wrote it. It helps to be able to search for a “key phrase” along with key words. I don’t know how hard it’ll be to set something like that up. Thanks again for all of your work.

  • It’s obviously a complicated subject and you probably know more about it than I do, but for my money, any kind of cataloguing system apart from an alphabetical or chronological one is doomed to fail because (as you’ve already pointed out) you can’t pigeonhole books any more than you can people or beliefs.

    I sometimes order cable pay-per-view movies and am annoyed to see them categorized as Drama, Horror, Science Fiction, Western, etc. (Is “Billy the Kid vs. Dracula” horror or Western?) You can never be sure that you’ve seen every title until you’ve gone through all of the lists, most of which are bound to contain redundant entries.

    I think that, apart from an alphabetical catalogue, a simple arrangement by the author’s date of birth, and then perhaps by his or her nationality, might be the best way to go. Every century of ancient times, say, and every decade of modern ones, could have their own pages. And as in the Classical Music Archive – another fine internet resource – prominent artists could be highlighted and perhaps even have their own pages.

    People searching for a particular book shouldn’t rely on Librivox to find it. Instead, they should turn to a little resource called the internet. It’s not perfect, but it certainly contains more information than any single web site possibly could.

    Likewise, I don’t know about tagging. What I’d most like to see is a system of voting, so that listeners could easily and effectively opine about the thing which should matter most to Librivox, namely the quality of its recordings. I can’t tell you how many times I’ve wanted to praise a great reader, or to excoriate an awful one – but where? Only a handful of interested persons will see my remarks on the Librivox forum.

    I second the previous comment: thanks as always for your hard work, Cori. Like you, Librivox is a wonderful invention. Who knew that Thomas Edison was, among many, many other things, the father of modern astronomy?

  • I think the most super-ideal tagging, CC, might come in the nearish-future, from processing the online text of whatever’s recorded – that would allow people to search for character names, familiar places, historical dates, and so on. In the meantime, I strongly hope that the search feature will be able to include the summary of all books, and that’ll be a half-way house to tagging (though it depends on a good summary, of course.)

  • *grins* Mm, well, Walter – not having a way to rate / rank / vote on LibriVox recordings is a very central pillar of the LibriVox concept, simply because it’s as likely to scare off the best readers, as to stop less fluid ones. No-one’s perfect for all readers — even I’ve had some negative comments here and there. If they’d come at the beginning of my recording career, it wouldn’t have lasted very long at all. Listeners have different criteria, and the best that LibriVox can do is to offer a choice of voice, and hope that at least one version will suit.

    That said, there’s a busy whirl of reviews at archive.org, and not all of those are encouraging or constructive. (And the negative / derogatory ones have chilled some readers, which is a loss to the public domain overall.) I don’t think it’s a choice between Quality and Quantity, and I’m sure that a focus on the former would disproportionately affect the latter. But anyway, sorry – this is a favourite rant topic!

    As for the cataloguing, I think it’s worth having something within our catalogue which helps people choose what to listen to next, because it seems clunky to make people go outside LibriVox, find book recommendations, and then come back in to see which ones we’ve recorded. Also, that tends to become either a popularity contest (lists which start on Pride & Prejudice and end with Jane Eyre) or else reverse snobbery (dullsville, mile-long novels that LibriVox only records because the mission is “ALL books in the public domain”). There are some really wonderful old tomes out there that deserve a new revival, and that’s best targeted to listeners who might appreciate them most. Not that I’ve really worked out how that might look, but I think it’ll be increasingly important as the catalogue grows.

  • Listen to a few minutes of “15 – Introduction Of The Edison Electric Light” from “Edison, His Life and Inventions”, as read by Nelly, and you’ll know why I think that Librivox needs a voting system to warn listeners away from clunkers.

    Reading from China, Nelly takes one of the book’s best and crucial chapters and turns it into an hour of garbled torture. English is her second language – she says, for example, “Addison” instead of Edison and “lightening” instead of lighting – the recording quality is awful and there are background traffic noises.

    This is a person who desperately needs to be told that she needs a lot of improvement, fast, and that she spoiled what could have been a pleasant hour of entertainment and enlightenment.

    I don’t ask for much – just clear, distinct reading. If a great reader, like you or Kara or Graham Redman, comes along, so much the better. But I can live with the lesser lights, many of whom are quite good indeed. My only peeve is that too often a perfectly fine collaboration will be spoiled by someone who had no business intruding himself or herself into it.

    As for the catalogue issue, I suppose that you’re the expert on the subject. (You were too deep for me, I must admit, when you said: “Also, that tends to become either a popularity contest, etc.”) And I agree with you that one of Librivox’s main attractions is its ability to breathe new life into “really wonderful old tomes” which most readers wouldn’t have stumbled across no matter how long they surfed the internet.

    Anyway, no matter how you slice it, Librivox is great stuff and is sure to gain in popularity. I’m reminded of a passage from Edison’s biography – not, thankfully, one of the chapters read by Nelly – in which Edison takes his newly invented gramophone to President Garfield’s White House. So many people crowded into the president’s sitting room that the Secret Service feared that the floor would collapse under him.

  • Please please PLEASE get a decent rating system. Yes, people can rate on web.archive, but not enough people know about it to get a decent amount of ratings. The only people who don’t want ratings are the poor readers, which hardly encourages them to improve. As the comment above says, why have lots of people’s efforts ruined by one chapter read by a duff reader? As Wikipedia says:
    “A frequent concern of listeners is the site’s policy of allowing any recording to be published as long as it is basically understandable and faithful to the source text”.

    While Librivox persists in this crazy “quantity over quality” attitude, it’s the only way for people to sort the wheat from the chaff.

    I don’t know how many times I’ve started clicking through some really good works, sampling a few seconds of each before downloading the whole thing, to find that the hard and conscientious work of many good readers is ruined by… well, Walter above has a classic example of one of the worst, and I’m in 100% agreement.

    This is about the LISTENER, Cori. Librivox is seen from the outside by many as a therapeutic outlet for the kind of people, who, vocally, are the equivalent of the kind of people who they show on X-Factor because they’re just…so…bad!

    And people WILL say what they think. Doesn’t take much googling to find some pretty frustrated listeners giving some pretty harsh opinions, having had their ears melted and their book ruined yet again by someone attacking a $1 store headset mic with reading that would embarrass most primary school children. I think at least a basic rule should be that the reader knows what punctuation is and does!

    Anyway, point made. Catalogue is OK as it is, it just NEEDS a rating system.
