Archive for the 'ebooks' Category

Pat the Elephant

Friday, July 23rd, 2010 by Constance

There is a well-known fable about blind men with contrasting views on the anatomy of an elephant. Each examined a separate piece of the beast and independently concluded that it is very like a spear, or a fan, or a snake, and so on. Even in combination, their observations fail to provide a very good picture of what an elephant looks like as a whole. The story was popularized in a poem by John Godfrey Saxe, which is cited in a surprisingly wide variety of publications: early childhood education manuals, scientific and medical reports, vocational guides and, more predictably, collections of 19th-century verse. I know this because a search on a distinctive phrase from the poem’s conclusion, “prate about an elephant not one of them has seen,” in the HathiTrust digital library finds more than 140 matches in these places.

Blind searching in large digital text repositories like the HathiTrust or Google Books provides an intriguing but incomplete view of the mass-digitized book corpus. Frequently cited statistics like “12 million books” in Google Books (GBS), or “5 million books” and “one million public domain books” in Hathi, don’t really tell us much about the anatomy of the mammoth. Pat the elephant… what do you find? A lot of curious sensory experiences that don’t add up.

When it comes to anatomizing elephants, not all parts are created equal. Georges Cuvier, who famously reconstructed skeletons on the basis of a tooth or a toe, knew this. Cuvier confidently and correctly distinguished the Indian and African elephant species based on characteristic differences in their jawbones; he ‘discovered’ the woolly mammoth through close examination of incomplete fossil remains.

I’m inclined to think that counting books (or volumes) is about as useful in characterizing the mass-digitized corpus as counting vertebrae in the catacombs.  It tells us something about how much is there, but not much about who, or what, is there.

Happily, there is an abundance of bibliographic metadata describing the content from which the mass-digitized corpus was sourced, and it can be used (like a fossilized tooth or toe) to assign some generic, or I suppose specific, characteristics to the elephant in the room. Over the past year, OCLC Research has been working on a project with Hathi and some other interested libraries to begin characterizing the enormous, vaguely familiar (snake? spear? tree?) yet altogether revolutionary (woolly!) mammoth created through the digitization of legacy print collections.

We’ve posted some empirical data on the subject and library distribution of titles in the Hathi digital repository here.

I think it provides a useful complement to the enchanting and progressively revealing fan-dance of class numbers here.

More to come.

Emphasis on Ebooks

Friday, October 9th, 2009 by Jim

Along with some colleagues, I attended the O’Reilly Emphasis on Ebooks online conference today. It’s part of the ongoing O’Reilly Tools of Change for Publishing conference.

In Research, we’re investing significant time over the next several months in thinking about services built around books and how those services will change as the migration from physical books to ebooks progresses. This half-day conference seemed relevant.

It had three panels, organized around these topics:

Ebook Pricing: Is $9.99 the new price for ebooks? How can publishers add value and increase margins with ebooks?
What Do Readers Want? How are readers responding to ebooks and the plethora of new devices? What do they think of our efforts to date?
The Future of Electronic Reading: Ebooks, Ereaders, and Beyond: This presentation will cover the current state of the art in eBooks and eReaders, discussing the technologies currently at play and those coming in the near future.

The first panel was the most interesting. Its members included the founders or CEOs of Scribd, Lexcycle (the folks who produce the Stanza ereader), Bookoven and Librivox: innovators and successful early pioneers. Here are some of the things they said.

I can’t attribute these remarks to specific individuals, given the limitations of Webex conferencing and the rapid-fire talk that was going on.

All the things you can’t do with an ebook (sell it, lend it, annotate it, etc.) can explain the price difference. Take a dollar off for every one of those and you get from $15 to $10.

What range of opportunities does e provide that is unavailable in print? We don’t know yet what that richer object is.

What’s really wanted is the ‘everything’ edition: you get all the formats, including print, for a small price increment. What constitutes the ‘everything’ edition will take shape by audience, segment and genre, which will lead to differential pricing even for the ‘everything’ edition.

Offering new free titles raises interest across a publisher’s entire list. (An old public domain offering doesn’t do it.) What’s the conversion rate after the free title? Lots of consumers think e is a platform for consuming free content and don’t go any further. [The speaker referred to these consumers as Freegans, a designation I’ll use in the future.]
