Archive for March, 2011

See us at CNI next week!

Wednesday, March 30th, 2011 by Merrilee

A few of us will be attending the Coalition for Networked Information Membership Meeting in San Diego next Monday and Tuesday.

Jennifer will be giving a briefing on Managing Research Information for Researchers and Universities, which I think will nicely tie together some recent work and reports (ours and others). Ricky and I will be co-presenting on Out of the Eddies and into the Mainstream: Making Special Collections Less Special and More Accessible — like Jen, we’ll be summarizing a raft of our own work, and asking, “where do we go from here.” Unfortunately, both presentations are scheduled at the same time, from 10:30-11:30 Pacific on Tuesday April 5th. Jim Michalko will also be attending the meeting, so OCLC Research will be well represented.

If you’ll be in San Diego, I hope you’ll take some time to come to our presentations or otherwise track us down. If you aren’t able to attend in person, follow along on Twitter at #cni11s.

What We’re Reading — Week of March 21, 2011

Friday, March 25th, 2011 by Bruce

The imperfectionists : a novel (Book, 2010)

I thought this was terrific. The central character is an English-language newspaper published in Italy, with a cleverly assembled set of short stories telling its history, from its curious founding through periods of energized attention to a long twilight and abrupt end. Resonated with me.

Pandora’s Facebook Box – John Battelle’s Searchblog

I yam who I yam (but tell that to Facebook or Pandora)

The Internet: For Better or for Worse by Steve Coll | The New York Review of Books

Morozov’s “The Net Delusion” sounds like a counterbalance to Clay Shirky’s “Here Comes Everybody”, detailing the malevolent uses of social media. Wu’s “The Master Switch: The Rise and Fall of Information Empires” is praised as a “tour de force” on the rise and fall of information technologies, and it expresses his concern about the contested future of the Internet. Wu (a law professor at Columbia) coined the phrase “net neutrality”, and this is one book I plan to read.

The NY Times: Un-Free At Last! | Monday Note

One of the clearer commentaries on the NYT’s new paywall and why “Keep it Simple, Stupid” is so important. I’m just glad I still have my print subscription to the Sunday Times.

Which cities produce worldwide more excellent papers than can be expected? A new mapping approach–using Google Maps–based on statistical significance testing

“Based on Web of Science data, field-specific excellence can be identified in cities where highly-cited papers were published. Compared to the mapping approaches published hitherto, our approach is more analytically oriented by allowing the assessment of an observed number of excellent papers for a city against the expected number.”
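To make the "observed against expected" idea concrete, here is a minimal sketch. It assumes that "excellent" means a paper in the top 10% most cited for its field, and it uses a one-sided exact binomial test; both are illustrative assumptions, not necessarily the test the paper's authors use. The function names and the sample numbers are hypothetical.

```python
from math import comb


def binomial_upper_tail(n, k, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def assess_city(total_papers, excellent_papers, expected_share=0.10):
    """Compare a city's observed count of excellent papers with the expected count.

    expected_share is an assumption: the share of papers that would be
    "excellent" if the city performed exactly at the field-wide average.
    Returns (expected_count, one_sided_p_value).
    """
    expected = total_papers * expected_share
    p_value = binomial_upper_tail(total_papers, excellent_papers, expected_share)
    return expected, p_value


if __name__ == "__main__":
    # Hypothetical city: 200 papers in a field, 35 of them highly cited.
    expected, p_value = assess_city(200, 35)
    print(f"expected {expected:.1f}, observed 35, p = {p_value:.4f}")
```

A small p-value suggests the city produced more excellent papers than chance alone would explain, which is the kind of "unexpected" result a map like this is meant to surface.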

We’re seeing more of these geo-data visualizations, like the NY Times’ Mapping America site based on US census data, and for me they often confirm expectations rather than reveal surprising findings. So it’s interesting to me that the aim of this effort was to reveal the unexpected. Example map for Chemistry here:

Resource Sharing in Australia: Find and Get in Trove – Making “Getting” Better

Great description of the challenges in providing a complete array of “get” options in the NLA’s Trove: buy, borrow (national loans), copy, digital view, print on demand, and digitize on demand.

National Library of Finland’s Digitalkoot, Europe’s first national crowdsourced digitization program

This nifty example of combining gaming with OCR text correction reminded me of the Waisda? games for tagging videos and the huge crowdsourcing success of the National Library of Australia’s historic newspapers (31 million lines of text corrected). Still early stages, but 25K volunteers in one month is pretty good.

This Title is Metadata

Friday, March 25th, 2011 by Roy

This week I spoke at an OCLC Member Services event called “Good Practices for Great Outcomes: Cataloging Efficiencies that Make a Difference.” It was held at The Huntington, one of our Research Library Partnership partner institutions, and is one of a series of such events.

In putting the talk together, I shamelessly stole from Karen Calhoun’s slide decks, as she gives high-level talks about cataloging and metadata issues all the time, and her insights are always interesting, useful, and informative. To that I added other issues, largely from some of the recent work we have done in Research. Chief among these was the work headed up by my colleague behind my cubicle wall, Karen Smith-Yoshimura. Her working group to “gather evidence to inform changes needed in MARC metadata practices” produced a report, “Implications of MARC Tag Usage on Library Metadata Practices,” that I commend to your attention.

I also drew upon some ongoing work we are doing to try to detect and unambiguously mark items that are online in full — particularly open access content. In doing this work, we are delving into such esoterica as exactly how 856 fields are coded and to what the URLs in those fields lead. We’re finding some messes along the way, which in the cases that appear to be amenable to bulk clean-up we are forwarding to the WorldCat quality control team to fix.

The next event in this series is next week, on March 29 in San José, but it is unfortunately already full. If you’re interested, keep your eyes peeled for the next one. Or, better yet, contact RJ Pettersen if you’d like to host it.

The Scottish Presence in the Global Library Resource

Wednesday, March 23rd, 2011 by Brian

“Collective collections” are the combined library collections of multiple institutions. They may exist as a physical aggregation of materials at a single location; they may exist through a service layer that integrates distinct collections into a single resource. Or they may be only notional: a hypothetical combination that can be mined for intelligence to inform institutional and collaborative decision-making. Collective collections can be assembled at any scale, from two institutions to the global library system as a whole. In the latter case, of course, we can only approximate – no single data source represents the holdings of all libraries everywhere. Fortunately, the more than 200 million records and 1.7 billion holdings contained in the WorldCat database provide us with an approximation of the global library resource that is sufficient to explore many interesting questions.

A project currently underway in OCLC Research is exploring the concept of a national presence in the global library resource. A national presence can be characterized from a number of perspectives, including the distinctive features of the country’s library collections; the output of the country’s publishing houses; works authored by the country’s citizens; and the corpus of materials that, regardless of origin, are “about” some aspect of that country. All of these facets, taken together, form a picture of how a country’s profile is manifested in library collections around the world. The project uses Scotland as a case study to illustrate the concept of a national presence in the global library resource, but the goal is to develop patterns of analysis that can be applied without significant modification to any country.

The project is investigating three major themes:

  • National Research Collection: the project examines the notion of a “national research collection” – the combined library collections of a nation’s higher education and research-oriented institutions – and how it aligns with the global library resource. The analysis focuses on the collective collection of the four ancient Scottish universities (Aberdeen, Edinburgh, Glasgow, and St. Andrews) as well as the National Library of Scotland. The purpose is to uncover the distinctive features of this collective research collection vis-à-vis groups of peer institutions in the library system (e.g., the collective holdings of ARL institutions), as well as the library system as a whole (as represented by WorldCat).
  • National Presence in the Global Library Resource: this aspect of the project shifts focus from the contents of Scottish library collections to the presence of Scotland-related materials in library collections around the world. The analysis focuses on materials published in Scotland, created by Scottish authors, or primarily about some aspect of Scotland. Key questions include the size of the Scottish national presence in the global library resource, as well as the characteristics of the materials comprising this presence.
  • Diffusion of a National Presence within the Global Library Resource: given the materials identified as comprising a national presence, WorldCat holdings data can be used to track their pattern of diffusion throughout the global library resource. From this, many interesting questions can be explored: the locations of extensive collections of Scotland-related materials outside Scotland; comparisons of the diffusion of Scotland-related materials with the diffusion of the “Scottish diaspora”; and global collecting activity as a means of identifying “core” (i.e., particularly influential) Scottish works.
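The three themes above can be sketched in a few lines of code. This is a toy illustration only: the holdings data, the work identifiers, and the function names are all hypothetical, and real WorldCat analysis involves far messier data (work clustering, institution metadata, and so on) than this assumes.

```python
from collections import Counter

# Hypothetical holdings: work identifier -> set of (library, country) pairs.
HOLDINGS = {
    "work-a": {("Aberdeen", "GB"), ("Harvard", "US"), ("Toronto", "CA")},
    "work-b": {("Edinburgh", "GB"), ("Glasgow", "GB")},
    "work-c": {("St Andrews", "GB"), ("Melbourne", "AU"), ("Otago", "NZ"), ("Yale", "US")},
}


def collective_collection(holdings, libraries):
    """Works held by at least one library in the group (a notional union)."""
    return {
        work
        for work, holders in holdings.items()
        if any(library in libraries for library, _ in holders)
    }


def diffusion_by_country(holdings, works, home_country="GB"):
    """Count holdings of the given works outside the home country."""
    counts = Counter()
    for work in works:
        for _, country in holdings[work]:
            if country != home_country:
                counts[country] += 1
    return counts


def rank_by_reach(holdings, works):
    """Order works by number of holding libraries, a crude proxy for 'core' works."""
    return sorted(works, key=lambda work: (-len(holdings[work]), work))


if __name__ == "__main__":
    print(collective_collection(HOLDINGS, {"Aberdeen", "Edinburgh"}))
    print(diffusion_by_country(HOLDINGS, HOLDINGS))
    print(rank_by_reach(HOLDINGS, HOLDINGS))
```

Counting holdings is only one possible measure of influence; the point of the sketch is that each theme reduces to a different grouping of the same holdings data.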

Analysis of a national presence in the global library resource is relevant to a range of library decision-making needs, including collection development strategies, prioritization of digitization activities, “gap analysis” for national library collections, as well as other applications. A key source of value in all of these potential uses is the ability to consider the features of a national presence against the broader context of the global library resource. The capacity to frame collections and services within a system-wide perspective is a tool of growing importance for library-related analysis and decision-making.

Note: Thanks to colleagues at the Universities of Aberdeen, Edinburgh, Glasgow, and St. Andrews, and the National Library of Scotland, for their ongoing participation in this work. Special thanks to our much-missed colleague John MacColl, who, in his former role as an OCLC Research program officer, was instrumental in designing and guiding this project. We look forward to his continued participation in the project in his new role as University Librarian and Director of Library Services at the University of St. Andrews!

A crowdsourcing success story

Monday, March 21st, 2011 by Karen

I’m a great fan of the National Library of Australia’s Trove, a single search interface to 122 million resources—books, journals, photos, digitized newspapers, archives, maps, music, videos, Web sites—focused on Australia and Australians. You can search the OCR’d text of over 45 million newspaper articles that have been digitized.

OCR is not perfect. The original document is juxtaposed with the OCR transcription so errors are immediately apparent. Since the Australian Historic newspapers public launch in July 2008*, people have been correcting errors in the OCR’d text. Both the corrected text and the original text are indexed and searchable.

The enthusiasm of these public text correctors is amazing! The 15 March 2011 Trove newsletter notes:

Text correctors are still doing an outstanding job of improving the electronically translated text, and the number of corrections each month continues to increase. In January we had over 2 million lines of text corrected in a month for the first time, which continued through February. The running total of corrected lines has now reached 31 million!

One of the issues the RLG Partners Social Metadata Working Group addressed was to what degree moderation was needed when opening up the descriptions of cultural heritage resources to user contributions. The responses to the working group’s survey of site managers indicated that spam or “inappropriate behavior” was not a problem. Rose Holley (a member of the working group) provides additional corroboration: after a careful review of comments in Trove, spam and derogatory comments turned out not to be a big problem.

…recently we made a decision to manually review the 18,000 comments that have been added to newspaper articles and other items in Trove. We found only 114 spam comments with URL that were removed and 71 comments placed by the same user in the same week that breached our terms and conditions (derogatory). These were also removed.

We thought that was very good news and supported our theory that moderation is still not required. We have however added a feature that enables a user to easily report spam via the trove forum.

This supports one of the working group’s recommendations: Go ahead! Invite user contributions without worrying about spam or abuse.

* For more details, see Holley, R. (2009) Many Hands Make Light Work: Public Collaborative OCR Text Correction in Australian Historic Newspapers, National Library of Australia, ISBN 9780642276940

What We’re Reading — Week of March 14, 2011

Friday, March 18th, 2011 by Bruce

Just a couple of posts from a very busy week, added by the intrepid Merrilee before she hopped a plane to Oxford.

Signal and SXSW: What Should I Ask WordPress Founder Matt Mullenweg? – John Battelle’s Searchblog

WordPress as a good example of a “freemium” business

Need Advice on What to Read? Ask the Internet

OCLC Partner Goodreads integrates algorithm to recommend books.

What does it feel like to be [your library name here]?

Friday, March 18th, 2011 by John

It’s now around six weeks since I swapped the bird’s-eye view of the research library sector internationally, which I had as a member of the staff of OCLC Research, for the perspective of a University Librarian, working within this small, research-intensive university in Scotland. The photograph shows me looking up at the seagulls from the pier at St Andrews. These birds’ eyes scan some beautiful sights, though they seem to induce little more than outputs which are troubling for university buildings.

It has been a fascinating transition. I had a pretty good idea of what to expect, since it was working with the higher-level perspective, for OCLC Research, which was unusual within my career – even though I have had a fondness for it throughout, as my involvement in many JISC-funded UK-wide projects over the years had demonstrated. Reflecting on this, I consider one of the greatest challenges for organisations like the OCLC Research Library Partnership to be that of appropriate engagement. I am in no doubt about the value of the work performed by the staff of OCLC Research. I worked among them for three years, and was impressed day and daily by their intelligence, dedication, and powers of analysis. And so I am sure that we in the Library of the University of St Andrews will want to continue to take advantage of our membership of the Partnership. But to do so does require some virtuous behaviour on our part – a little like going to the gym regularly, or eating yogurt every day. We need to clear some space in diaries thick with meetings and commitments, to read the reports, attend the webinars, take part in the Working Groups, and – every once in a while – book some days out of the office to travel to meetings somewhere else. As a University Librarian I can now see how difficult it can be to make the spaces which these virtuous behaviours require. This is difficult for me, and in some ways it is even more difficult for those members of my Library staff who would benefit from and enjoy the aerial view over sector-wide problem analysis and resolution.

The challenge of behaving virtuously is differently shaped, according to type of research library, and by territory. A small research-intensive library in the UK, I can say with feeling, is likely to be quite different from a small research library in the US. A publicly-funded university will have a different shape from a private. A UK university library will have a different shape from a North American, or northern or southern European library. National libraries have their own shape – but there too the differences are significant (as is very evident in the UK). St Andrews already engages with the Partnership, but I look to OCLC Research to understand my particular difficulties in behaving virtuously, and to shape its way of engaging with me appropriately. As my friend and former OCLC Research colleague Lynn Silipigni Connaway regularly says in her presentations, ‘One size fits … nobody’. I know that the reshaped OCLC Research Library Partnership is already onto this, but there need to be plural forms of engagement within the Partnership, in order to deliver our dedicated and highly capable Partner professionals to the airspace in which the bird’s-eye views can be shared and the problems revealed and tackled.

I thought it would be good for some of our Partner ULs, AULs, etc to post about their own perspectives, and I am grateful to Jim Michalko for offering the pages of this blog as a venue. So – what does it feel like to be St Andrews University Library? Watch this space.

Work for (and with) us!

Tuesday, March 15th, 2011 by Merrilee

I’m happy to report that we have two new job openings in OCLC Research, both based in our office in Leiden (Netherlands). This is a rare opportunity to be a part of what I think is the best team on the planet. (I’m willing to concede that there could be better groups to work with, but I’m not aware of them.)

Check out the job listings and please forward to anyone who may be interested or well-suited for the positions of Research Scientist or Senior Program Officer.

What We’re Reading — Week of March 7, 2011

Friday, March 11th, 2011 by Bruce

A new hangingtogether feature is this round-up of a few interesting links found and shared in OCLC Research in the past week:

Make: Online » Is It Time to Rebuild & Retool Public Libraries and Make “TechShops”?

File under the future of public libraries. Love that the author hasn’t been a heavy public library user but still loves them. Also interesting to note that the first tool library was in Columbus! Hat tip to Eric Celeste.

Liber Quarterly – “Free Library Data?”

A very even-handed description of the landscape of library data, particularly from a European perspective


An old article about Leslie McFarlane, the bitter writer who was Franklin W. Dixon.

Google LatLong: You’ve got better things to do than wait in traffic

This would be handy. Made me want one for “book traffic”, showing me the quickest path to reading (print or e, bought or borrowed)

Mapping the Nation’s Well-Being – The Gallup-Healthways Well-Being Index – Interactive Map

and here in California Congressional District 14 we are happiest of all the citizens. Really.

More From the “Murky Bucket”

Thursday, March 3rd, 2011 by Roy

The inspiration for my title comes from Lorcan Dempsey, who some years ago, before I joined him at OCLC, put a name to the unease I had been feeling about the state of library metadata. In a Library Journal column I had bemoaned the fact that not only was it impossible for library users to limit a search to items available online in full, it was impossible for us to even implement such a feature.

Lorcan responded to that column, citing the ” ‘murky bucket syndrome’ that affects any large bibliographic database—we cannot entirely, unambiguously slice and dice the database because of historic data entry and cataloging practices that…were not oriented toward our new needs.” I’ll say. Also, around that time my soon-to-be colleagues at OCLC Research wrote a paper about some related work they had done: “Mining for Digital Resources: Identifying and Characterizing Digital Materials in WorldCat”.

Later I did a deeper investigation into this while still at the California Digital Library, from which came an informal report called “Trouble in Online Paradise: An Analysis of MARC 856 Usage at One Institution”. Basically, I took 1,000,000 MARC records from UC Berkeley, pulled out all of the 856 fields (about 20,000 at the time), and analyzed them. Since I have that work on my prototype server, you can still play around with it if you want.
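For readers who have not looked closely at 856 fields, here is a minimal sketch of the kind of tally such an analysis might start with. It is not the code behind the report: the sample data is made up, the fields are represented as plain tuples rather than parsed from real MARC records, and the function name is hypothetical. The second-indicator meanings follow the MARC 21 bibliographic format.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical, hand-made sample: each 856 field as (ind1, ind2, subfields).
SAMPLE_856_FIELDS = [
    ("4", "0", {"u": "http://example.org/fulltext/123"}),
    ("4", "1", {"u": "http://example.org/scan/456", "z": "Digitized copy"}),
    ("4", "2", {"u": "http://example.com/toc/789", "3": "Table of contents"}),
    ("4", " ", {"u": "http://example.org/unknown/1"}),
    ("4", "0", {"z": "Ask at the reference desk"}),  # no $u subfield at all
]

# Second indicator of the 856 field: relationship of the link to the item.
RELATIONSHIP = {
    "0": "resource",
    "1": "version of resource",
    "2": "related resource",
    " ": "no information provided",
}


def summarize_856(fields):
    """Tally 856 fields by relationship and by URL host; count fields lacking a URL."""
    by_relationship = Counter()
    by_host = Counter()
    missing_url = 0
    for _ind1, ind2, subfields in fields:
        by_relationship[RELATIONSHIP.get(ind2, "other")] += 1
        url = subfields.get("u")
        if not url:
            missing_url += 1
            continue
        by_host[urlparse(url).netloc] += 1
    return by_relationship, by_host, missing_url


if __name__ == "__main__":
    relationships, hosts, missing = summarize_856(SAMPLE_856_FIELDS)
    print(relationships)
    print(hosts)
    print("fields without a URL:", missing)
```

Even a tally this simple hints at the murkiness: a second indicator of 0 is supposed to mean the link leads to the resource itself, but nothing in the record guarantees that the full text is actually there, or that a URL is present at all.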
