Facts are chiels that winna ding

Friday, May 23rd, 2008 by John

Perhaps one reason Scotland has produced more than its proportionate share of scientists and lawyers [1] is the hard-headedness of a nation which gives great respect to factual knowledge. The lines ‘But facts are chiels that winna ding,/An downa be disputed’ appear in Burns’ poem A Dream from 1786. The English translation is ‘But facts are fellows that will not be overturned,/And cannot be disputed’. The world of research libraries is a world of respecters of facts, and I have bumped up against this in a few different contexts in the last few days.

Peter Murray-Rust, ebullient Reader in Molecular Informatics at the University of Cambridge, talks in his blog about the conservatism of the chemistry publishing industry, which he says is harming science because it is moving so slowly to Open Access. As a consequence, factual data is being held behind toll barriers, when it should belong to the commons. ‘… we cannot get the facts … The melting point of X is Y (temperature) at Z (pressure) is a fact. I hope at least we can agree on that, and that it isn’t a “creative work”.’ The use of copyright to deny access to these facts harms science, and is not defensible. That too, says Peter in his most recent post, is a fact.

Scientists want facts, and research libraries must do everything in their power to help them to get them readily and persistently. In the realm of biodiversity, the facts of life have become clearer recently. Natural historians are classifiers, of course, and the classification system in use in the Library at the Royal Botanic Gardens in Kew, London (one of our Partners), is based on that used to classify the plants of its Herbarium – the Bentham-Hooker classification system. But that classification needs to change, because botanists now use DNA analysis to determine the degrees of relatedness of plant species, and this has the effect of forcing revisions to existing classifications based, inevitably, on the intuitions of taxonomists and classifiers from previous generations. The facts of plant taxonomy are now incontrovertible. Nonetheless, in the library at Kew, with its hundreds of thousands of items, the idea of revising the classification system to follow changes in the plant taxonomy is daunting, and as digitisation gathers pace, enhanced keyword access seems a more efficient direction to take – even if we can now produce a factually correct classification system.

And a very good reason for this is that web search, as the fundamental and universal starting point for research, is now also a known fact. That is why it makes sense to point our common bibliographic metadata heritage at the web, as OCLC is doing in working with Google on its Book Search programme, by providing WorldCat metadata to Google, as announced a couple of days ago. The OCLC Perceptions of libraries and information resources report of 2005 produced the evidence: 84% of searchers begin with an Internet search engine; 1% begin with a library website. Here is a new fact of life for custodians of research collections. At a meeting of the Europeana Contacts Working Group in The Hague on Tuesday of this week, as we considered how best to make Europe’s cultural heritage available in digital form, we pondered that fact, which is a large fellow who will not be overturned any time soon.

[1] This may be a fact, but I have not assembled the evidence for it.

Libraries, Archive and Museum convergence

Wednesday, May 21st, 2008 by Günter

[Image: LAM Workshop @ Princeton]

We’re still on the LAM! At this point, we have received guidance from thought leaders, conducted phone conversations with interested RLG program partners, and visited 5 sites to hold a comprehensive library, archive and museum workshop with participants from all constituencies.

[Image: LAM Workshop @ V&A]

Our site visits were (in chronological order) at the Smithsonian and Yale (workshop blog), the Victoria & Albert Museum (image to the left), U of Edinburgh (image at the bottom) and Princeton (top image). These campus (or campus-like) organizations all harbor various libraries, archives and museums, and are at various stages of collaboration (all the way to administrative integration). The workshops were aimed at both surfacing existing models and deepening the working relationships among the different units. We did not confine ourselves to digital issues, but allowed participants to take the discussion of collaboration in whatever direction they felt was most fruitful, including brick-and-mortar considerations.

While we are in the very beginning stages of work on the final report, here are some random exemplary findings which I think you may find reflected in it:

  • We’ve found that the presence of a catalyst is crucial in fostering the discussions between libraries, archives and museums. For the workshop day, RLG Programs played that role. For the long term, institutions where the LAMs have a strong partner in an interested and responsive IT department seemed particularly well poised to make progress.
  • We’ve found that in particular when institutional mandates for integration (top down) meet with grassroots enthusiasm for working together (bottom up), a powerful dynamic for change unfolds. If the administrative mandate and staff enthusiasm aren’t properly synced up, progress becomes more difficult to achieve.
  • We’ve found that incentive structures often reward competitive rather than collaborative behavior. Staff evaluations often don’t include metrics to measure collaboration across institutions; in institutions where tenure shapes the behavior of individuals, collaborative endeavors usually don’t play a role when the review comes up.
  • We’ve also seen that organizational charts can play an important part in making collaboration more difficult, especially in a campus setting: when the reporting lines of different collections terminate in different administrative offices, setting a joint goal might require more negotiation.

    [Image: LAM Workshop @ U of Edinburgh]

    We’ll also report out on the projects the sites have committed themselves to as an outcome of the workshop. I don’t want to let the cat out of the bag too much (although an earlier LAM posting contains some project details), so you’ll have to wait for the report to hear all about them! However, it’s interesting to note that there was a remarkable similarity in the overall ambitions articulated at the workshop sites.

    At most institutions, we found that a single search across all institutional resources, both for the benefit of the public and the staff, is a major aspiration (and inspired Ricky’s recent post on cross-collection searching), closely followed by a sense that a more compelling body of digitized material needs to be provided, as well as the means of managing those materials for the long term in a pan-institutional trusted digital repository. Most of the sites also grappled with questions of how to better harness user knowledge and contributions, as well as the place of LAM collections in an information landscape dominated by online search engines and social networking sites.

    I hope this little teaser posting gives you a good idea of what sorts of insights you can expect to glean from the forthcoming report, which will be written by our consultant Diane Zorich, and should be posted here as part of the PAR report series in early August. We’ll also soon make available the agenda from the day-long workshop as well as the scene-setting PowerPoint presentation we used. We hope other institutions may be tempted to hold their own workshops, inspired by the successful template we’ve developed.

    Making progress on orphaned works

    Tuesday, May 20th, 2008 by Merrilee

    John Wilkin wrote a long and interesting post on the University of Michigan’s efforts to identify works out of copyright from the pool of works published in the US between 1923 and 1963. In this post, he talks about the research and investigation process as being amenable to “group sourcing” (his term) and as being work suitable for libraries. I agree with him on both counts. It’s interesting to see the data coming out of this project (60% of books are out of copyright, less than the 85% cited elsewhere). I wonder if the difference between 60% and 85% isn’t the difference between what was collected by academic libraries versus overall production. We’ll know more with more data.

    Michigan contributed their knowledge and experiences to our Copyright Investigation Summary Report, and I’m happy to see them again leading the way.

    Metadata Tools Forum: All came together

    Friday, May 16th, 2008 by Karen

    This was our first try at a forum like this, bringing tool developers and their intended user communities together, with most of the forum devoted to tool developers showcasing their work in demonstrations that people could attend as their interests led them. I blogged earlier on the inspirations for the forum.

    So people spent most of the day looking at tools in groups around large 34” monitors and asking questions of the tool developers directly. One metric of success: almost everyone saw at least one tool that could be used in their own environments. Terry Reese from Oregon State University, who demonstrated his MarcEdit tool, blogged about the forum on his way home. Notes Terry:

    …interacting with users is really one of the most important ways that I can get a better idea of what people are waiting out of the program. And for that, the RLG tools forum was very useful. As an attendee, this meeting exposed me to a number of tools that I certainly wasn’t aware of before. How we will be able to make use of some of these tools, well, that’s still in the air, but I think that forums like this are important.

    Terry (pictured above) lists the tools that were demonstrated with URLs for some of them and his own reactions. Each of the tool developers created “summary sheets” that you can access on the RLG Programs Metadata Tools Forum Web page. We will be adding a summary of the forum discussion soon.

    Wan Wong (pictured above) came all the way from the National Library of Australia to show off its Subject Selector, inspiring attendees to think of other authority files or databases they would want to target if implemented locally. Brad Westbrook’s demonstration of the Archivists’ Toolkit attracted large groups (picture below). He is looking for others to help with coding, developing functional requirements, refining software specifications, and testing.

    Among the needs identified by forum attendees:

    • Tools should be easily configurable and easily modifiable.
    • Identify gaps between tool output and external requirements.
    • Programmers are often not available, so “shrink-wrapped” tools requiring little technical expertise in installation and configuration are needed. However, some tool users also are equipped to tinker.
    • Develop closer ties between developers and user communities. Provide more opportunities for catalogers and coders to get together.
    • Provide hooks between different tools.
    • Provide opportunities for co-development and for user communities to specify functional requirements and beta-testing.
    • Institutions or organizations need to commit to ongoing support once a tool is released.

    We are grateful to the Boston Public Library for their wonderful support for this forum!

    Have you noticed? Many more controlled headings in WorldCat

    Tuesday, May 13th, 2008 by Karen

    Three weeks ago my colleague Thom Hickey blogged about a project he started to control names in WorldCat. This is another example of leveraging the work done in WorldCat Identities to bring more headings under control in WorldCat itself. Rather than individual catalogers updating headings in a record manually, Thom describes a process where 8 headings a second are linked to their associated LC-NACO authority record.

    At the time, he wrote about the 63,489 Johann Sebastian Bach headings in WorldCat that were brought under control; last night it was 47,000+ Beethoven headings. Since the start of the project, about 5.5 million headings have been controlled, just over 20% of the 26 million “fairly easy” headings: personal names that match an authority record on multiple subfields.
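    For readers who like to check the arithmetic, here is a small sketch in Python using only the figures quoted above. The projection of the time remaining is my own back-of-the-envelope assumption (a constant 8 headings a second, running around the clock), not something Thom has stated.

```python
# Figures quoted in the post.
controlled = 5_500_000      # headings controlled so far
easy_total = 26_000_000     # "fairly easy" headings in total
rate_per_second = 8         # headings linked per second

# Share of the "fairly easy" headings already under control.
share = controlled / easy_total

# ASSUMPTION (not from the post): the process runs continuously
# at a constant rate until all "fairly easy" headings are done.
remaining = easy_total - controlled
seconds_left = remaining / rate_per_second
days_left = seconds_left / 86_400  # seconds in a day

print(f"{share:.1%} controlled; about {days_left:.0f} days of processing left")
```

    Under that assumption the share works out to a little over 21%, and the remaining headings would take on the order of a month of uninterrupted processing.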

    So, how about it? Have any of you out there noticed?

    Collection Analysis of Art Libraries

    Tuesday, May 13th, 2008 by Günter

    Just in time for the annual ARLIS conference, we’ve published the study An Art Resource in New York: The Collective Collection of the NYARC Art Museum Libraries (.pdf: 136K/18 pp.), which characterizes the overlap and uniqueness of the Frick Art Reference Library, the Metropolitan Museum of Art’s Thomas J. Watson Library, and the libraries of the Brooklyn Museum and the Museum of Modern Art (the New York Art Resources Consortium, or NYARC for short). Together with Milan Hughston from MoMA, I presented on the study to an audience whose interest was clearly piqued by the results: whether you compare the NYARC institutions to each other, to other local research libraries (Columbia, NYPL, NYU) or a peer institution (the Getty), to the RLG Union Catalog or WorldCat, what emerges is an intriguing degree of uniqueness in this aggregate collection.

    Some of the questions in the Q&A were about our level of confidence in the numbers. While we did not sample NYARC collection items to establish a margin of error for the analysis (which is based on RLG Union Catalog clusters), Milan reported that current clustering work by the Frick, Brooklyn and MoMA to integrate their respective catalogs bears out the high rate of uniqueness found in our study. Constance’s recent webinar on Assessing Uniqueness in the System-wide Book Collection (.wmv: 71.3MB/54min.) also provides a useful context for the findings of the NYARC study: in a 250-item sample of records with a single holding in WorldCat, only a little over 10% were unique due to, let’s say, differences of opinion in cataloging.

    While there’s a significant amount of uniqueness in the NYARC collection, there’s still overlap to be exploited as well. Interestingly enough, both uniqueness and overlap make for great fodder for collaboration: a highly unique collection adds more value to collaborations around resource sharing, for example. On the other hand, overlap can be exploited in collaborative projects around off-site storage and deaccessioning. As Milan’s remarks made clear, the NYARC libraries are currently investigating a wide range of options.

    Many thanks again to my colleague Brian Lavoie, who crunched the numbers for the NYARC study and wrote the report. I’ll say publicly that he is a very generous man for giving me co-authorship for whatever little editing I contributed to the piece.

    Fore-edge books

    Monday, May 12th, 2008 by Karen

    During Merrilee’s and my visit to the Boston Public Library last Friday, Tom Blake and Maura Marx introduced us to the results of the BPL’s digitization of its fore-edge books: books with paintings on their edges that can be viewed only by looking at the sides of the book. Some are “double fore-edge” books: one painting is visible when the leaves are fanned one way, and another painting appears when they are fanned the other way. The landscapes we saw date from the 19th century. The one pictured here is from an 1808 first edition of Latin and Italian poems of Milton translated into English verse, with a painting of the inn at Edmonton, from which John Gilpin, Cowper’s immortal hero, started his famous ride. You can just make out the covers of the book at the top and bottom.

    Tom explained that the digitization process is labor-intensive, requiring six times the effort needed to digitize an ordinary book. The BPL’s Fore-Edge Paintings set is available on Flickr, as part of its “Art of the Book” collection, and is well worth looking at!

    Unicode milestone

    Monday, May 5th, 2008 by Karen

    Mark Davis blogged today on the Official Google Blog that Unicode passed a new milestone last December: For the first time Unicode became the most frequent encoding found on Web pages “overtaking both ASCII and Western European encodings—and by coincidence, within 10 days of each other.” The accompanying graph shows the speed of the Unicode uptake.

    “You can see a long-term decline in pages encoded in ASCII (unaccented letters A through Z). More recently, there’s been a significant drop in the use of encodings covering only Western European letters (ASCII and a few accented letters like Ä, Ç, and Ø). We’re seeing similar declines in other language-specific encodings. Unicode, on the other hand, is showing a sharp increase in usage.”

    RLG was a founding member of the Unicode Consortium, and I have had the pleasure of seeing its uptake in the library community. With the widespread adoption of Unicode, I’ve seen far fewer instances of bibliographic records or Web pages in Chinese and Japanese scripts that are garbled because of incompatible encoding. It’s gratifying to see all the investment in making the world’s languages accessible to all paying off. So easy to take it for granted…
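    For anyone who has not seen the garbling first-hand, here is a minimal sketch in Python of how it happens. The sample title is my own invention, and the choice of Latin-1 and Shift_JIS as the "incompatible" encodings is an assumption for illustration; real records were garbled by many different pairings.

```python
# A hypothetical Japanese title, as it might appear in a bibliographic record.
title = "日本語の本"

# Stored as Unicode (UTF-8), the title is just a sequence of bytes.
utf8_bytes = title.encode("utf-8")

# Reading those bytes with a Western European decoder does not fail,
# but it turns every byte into a separate, meaningless character.
garbled = utf8_bytes.decode("latin-1")

# Reading them with the matching decoder recovers the title.
restored = utf8_bytes.decode("utf-8")

# The reverse mismatch: bytes in a legacy Japanese encoding
# are rejected outright when treated as UTF-8.
sjis_bytes = title.encode("shift_jis")
try:
    sjis_bytes.decode("utf-8")
    sjis_reads_as_utf8 = True
except UnicodeDecodeError:
    sjis_reads_as_utf8 = False

print(repr(garbled))
print(restored)
print(sjis_reads_as_utf8)
```

    When every system along the way agrees on Unicode, the mismatch in the middle of this sketch simply never arises, which is why the garbled records have become so much rarer.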