Archive for the 'Research Note' Category

Visualizing Network Flows: Library Inter-lending

Friday, July 12th, 2013 by Constance

Sankey intra-CIC flows

As part of our joint research project with the CIC Center for Library Initiatives and the OSU Library, we’re examining inter-lending flows within and outside of the 13-member CIC consortium. We are using a subset of the OCLC WorldCat Resource Sharing (WCRS) transaction data archive for this analysis. Our current dataset comprises 1.33 million request transactions, representing nearly 900,000 individual titles loaned by CIC libraries over the past several years.

As Max Klein has noted elsewhere in this blog, the OCLC Research group is starting to experiment with new approaches to data visualization using the R statistical modeling environment and the D3 JavaScript library. (The WorldCat Live prototype is a nice example of how these are being put to use.) We are eager to integrate some of this experimental work into the ongoing CIC analysis.

Bruce Washburn, Brian Lavoie and I met recently to look at Mike Bostock’s D3 gallery, which provides some great examples of how different visualization techniques can be implemented. We were looking for approaches that were fit for purpose for this particular project. Since inter-lending is a library-centric approach to balancing supply and demand, Brian suggested that we focus on examples that are particularly expressive for modeling import/export flows. We settled on force-directed graphs and Sankey flow diagrams as good candidates for further exploration. And because we are especially interested in understanding the flow of library resources across geographies, we decided it was worth some additional work to enhance our WCRS transaction dataset with geo-codes, so that we can experiment with mapping flows across regions.
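To make the data-wrangling step concrete, here is a minimal R sketch of how request transactions like these might be rolled up into the source, target, value "links" table that Sankey tools expect. The file name and column names are purely illustrative, not the actual WCRS fields.

    # A minimal sketch, not the actual WCRS schema: roll request transactions
    # up into source -> target flow totals suitable for a Sankey diagram.
    # Assumed columns: request_id, lender_region, borrower_region (geo-coded).
    transactions <- read.csv("wcrs_transactions.csv", stringsAsFactors = FALSE)

    flows <- aggregate(request_id ~ lender_region + borrower_region,
                       data = transactions, FUN = length)
    names(flows) <- c("source", "target", "value")

    # Largest region-to-region flows first
    flows <- flows[order(-flows$value), ]
    head(flows, 10)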

From his experiments with TopicWeb, Bruce has developed some facility with D3, and he is now doing some work with R. But before we run headlong into any new development work, I wanted to do some low-level experimentation to see if the data we have in hand, and the questions we are trying to answer, lend themselves to visualization in Sankey diagrams.

When Brian, Bruce and I met to discuss models, we spent a fair bit of time discussing this diagram of horse import/export activity in Europe. It wasn’t until this week that I realized it had been produced by the prolific blogger and Open Data advocate Tony Hirst (aka @psychemedia, whom I’ve followed on Twitter for a long while) as an experiment in formatting data for Sankey diagrams. His blog post on this topic is great — unfortunately, it’s way beyond my current skill to implement. But in the comments, I noted that Bruce McPherson had developed some VBA code that uses Excel as the data input to a D3 Sankey library. This was just what I needed for some quick experimentation with our current data set.

Here are a few illustrative screenshots:

The three-letter symbols correspond to OCLC institution symbols for the 29 CIC collections we are examining in this project.

Sankey sample outbound CIC flows

Another, showing the breakdown of CIC borrowing of CIC returnables:

Sankey sample CIC flows with detail

And a third, this time with some detail for both Non-CIC and CIC borrowers — NB the number of non-CIC borrowers makes it difficult to represent them all in this format, hence the block of ‘others.’

Sankey sample CIC flows with detail non and CIC

Now, these are admittedly primitive pictures of how resources flow out of CIC libraries and into other places — but they do capture some important attributes that we are interested in exploring further. For instance, it is immediately apparent that there are some major ‘sources’ and ‘sinks’ for CIC returnables. And it’s clear that while the demand generated outside the CIC is significant (greater than the demand generated within the consortium), it is extremely diffuse — spread across a population of thousands of libraries. Both of these are important for understanding how existing library flows can be optimized. As we refine our analysis, we’ll be examining what factors are driving demand to particular libraries: proximity of lender, scarcity of alternative supply options, price incentives, efficiency of service (as measured by turn-around time), etc. And we’ll be looking at new ways to use data visualization to explore — and share — interesting and important patterns in the organization of the library system.
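For the curious, here is a hypothetical R sketch of the kind of summary behind those observations: net ‘source’ and ‘sink’ positions for each lender, and a rough measure of how diffuse external demand is. It reuses the illustrative transactions table from the earlier sketch and assumes lender/borrower symbol columns plus a cic_symbols vector; none of these are the real field names.

    # A sketch only; reuses the illustrative 'transactions' table from above and
    # assumes lender_symbol / borrower_symbol columns plus a character vector
    # 'cic_symbols' of the CIC OCLC symbols.

    # Net position of each CIC institution: loans supplied minus loans borrowed.
    supplied <- table(factor(transactions$lender_symbol,   levels = cic_symbols))
    borrowed <- table(factor(transactions$borrower_symbol, levels = cic_symbols))
    net <- as.vector(supplied) - as.vector(borrowed)
    names(net) <- cic_symbols
    sort(net, decreasing = TRUE)   # large positive = 'source', large negative = 'sink'

    # How diffuse is the demand generated outside the CIC?
    external <- transactions[!(transactions$borrower_symbol %in% cic_symbols), ]
    borrower_counts <- sort(table(external$borrower_symbol), decreasing = TRUE)
    length(borrower_counts)                           # distinct non-CIC borrowers
    sum(head(borrower_counts, 100)) / nrow(external)  # share of requests from the 100 busiest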

Update: Thanks to Tony Hirst’s comment (below) and some subsequent Twitter exchanges with @timelyportfolio, there is now a very nice tutorial on creating Sankey diagrams using rCharts and d3. Many thanks to klr and Tony for taking the initiative.
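For anyone who wants to go straight from R to an interactive Sankey diagram, here is a minimal sketch in that spirit. It uses the networkD3 package rather than rCharts, simply as one readily available route, and it assumes the illustrative flows table built in the earlier sketch.

    # A sketch: render the illustrative 'flows' table from the earlier example as
    # an interactive Sankey diagram with the networkD3 package.
    library(networkD3)

    # networkD3 expects zero-based integer indices into a node table
    nodes <- data.frame(name = unique(c(flows$source, flows$target)),
                        stringsAsFactors = FALSE)
    flows$source_id <- match(flows$source, nodes$name) - 1
    flows$target_id <- match(flows$target, nodes$name) - 1

    sankeyNetwork(Links = flows, Nodes = nodes,
                  Source = "source_id", Target = "target_id",
                  Value = "value", NodeID = "name",
                  fontSize = 12, nodeWidth = 30)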

Subsidence and uplift – the library landscape

Thursday, April 18th, 2013 by Constance

Approximate location of maximum subsidence in the United States.
Source: http://en.wikipedia.org/wiki/File:Gwsanjoaquin.jpg

There’s been a lot of attention to geologic subsidence of late, what with all the sinkholes opening up in Florida, Louisiana and other places. Here in California, we are more often concerned with the gradual change in ground level due to the draining of aquifers that support large-scale farming.  From year to year, the difference in ground level may be nearly imperceptible but over the space of a few decades the landscape has been radically transformed.

The subsidence metaphor was on my mind recently, as I was looking over some data compiled by my colleague Thom Hickey, documenting the usage of headings (subjects and names) in WorldCat. OCLC Research has done quite a lot of work exploring new approaches to managing subject and name authorities, notably in VIAF and FAST. I was interested to see how Thom’s data might be used to measure change — uplift and subsidence — in the library landscape. By computing the frequency with which FAST and VIAF headings occur in institutional collections cataloged in WorldCat, one can identify which libraries hold the most materials related to particular topics, places and people.  And this in turn provides a measure of the relative distinctiveness of library collections, judged not in terms of the ‘rarity’ of holdings but rather by the concentration of related content.
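To make the computation concrete, here is a minimal R sketch with invented file and column names: one row per occurrence of a FAST or VIAF heading in an institution’s cataloged holdings, counted and then ranked to find the ‘center’ for each heading.

    # A sketch with invented file and column names: one row per occurrence of a
    # FAST/VIAF heading in an institution's cataloged holdings.
    holdings <- read.csv("heading_occurrences.csv", stringsAsFactors = FALSE)

    holdings$n <- 1
    counts <- aggregate(n ~ heading + institution, data = holdings, FUN = sum)

    # The 'center' for each heading is the institution with the largest count
    centers <- do.call(rbind, lapply(split(counts, counts$heading), function(d) {
      d[which.max(d$n), ]
    }))

    # Headings for which a given aggregation (symbol invented here) ranks first
    hathi_led <- centers[centers$institution == "HATHI", ]
    head(hathi_led[order(-hathi_led$n), ])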

It seemed to me that Thom’s data might have something interesting to say about how the emergence of large-scale digitized book aggregations – HathiTrust, Google Books, etc. – is altering the library environment. It stands to reason that as these large hubs begin to consolidate content sourced from libraries (and, in Google’s case, publishers), they will displace traditional library ‘centers of excellence’ in some subject areas. Those who remember the DLF Aquifer project will recall that the initial prototype was designed to pool digitized resources in a given subject area (initially American History, later narrowed to Abraham Lincoln and the US Civil War). In the very large aggregations of HathiTrust and Google Books, subject specialization has emerged more gradually. There has not been much public attention to measuring the scope of subject-based collections within those aggregations, nor to benchmarking them against existing institutional holdings.*

The FAST and VIAF centers data provide evidence of both subsidence and uplift in the current collections environment — that is, shifts in centers of excellence as measured by scope of subject based holdings.  The ‘re-leveling’ that has been wrought in just a few years of large-scale digitization is already significant.  Digital aggregations have, by design or accident, emerged as important subject repositories that rival and even outrank some of the largest institutional libraries in WorldCat.

For instance,  HathiTrust, an organization not yet five years old, already holds the greatest concentration of titles on the topic of marine biology, surpassing the Library of Congress as well as two major research universities with world-class oceanography programs.

Marine biology

In the case of Marine biology, the difference between the number of titles held by HathiTrust and the Library of Congress is not very large — fewer than 200 titles.  But in other instances, the relative subsidence of traditional centers of excellence is more dramatic.  For instance, Google Books substantially outranks several major research libraries in holdings related to Russian periodicals (journals, newspapers and the like).

Russian periodicals

This represents an important change in the library system, with monumental old hubs being progressively overshadowed by new collections that are produced not by the slow accretion of library acquisitions but by large-scale digitization and (re)aggregation.  It provides a compelling illustration of how Web-scale content aggregations are altering the library operating environment.  In the case of HathiTrust especially, this disruption can (and I think should) be seen as a positive change:  it enables libraries to rethink traditional, institution-scale collection management and stewardship — a topic we examined in our Cloud-sourcing Research Collections report some years ago.

Using Thom’s ‘centers’ data, we can identify hundreds of topics and identities for which HathiTrust offers better coverage than any other library in WorldCat. Here are a few topics in which the Digital Library distinguishes itself:

Hathi top topics

And a few of the personal names for which its coverage is unrivaled:

Hathi top names

Interestingly, the other top-ranked collections (by size) for these same subjects and identities are not always the source of HathiTrust’s richness.  One might have anticipated that Hathi’s leadership was simply a by-product of aggregating content from existing centers of excellence, but in fact Hathi has developed unexpected strengths by aggregating at a very large scale from a diverse pool of contributors.  For example, Harvard University and the University of Michigan each hold sizable collections of works by the poet Jean Ingelow; yet, the richness of Hathi’s Ingelow collection is mostly due to contributions from campus libraries in the University of California system.
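Here is a rough sketch of how one might check where an aggregation’s strength in a particular heading comes from, assuming item-level records with a contributor column; the data frame, columns, and heading string are all invented for illustration.

    # A sketch with invented data: 'hathi_items' has one row per digitized item,
    # with the heading it carries and the library that contributed the copy.
    ingelow <- hathi_items[hathi_items$heading == "Ingelow, Jean", ]
    share <- sort(prop.table(table(ingelow$contributor)), decreasing = TRUE)
    round(head(share, 5), 2)   # top contributing libraries, by share of items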

The FAST and VIAF ‘centers’ data provide a fascinating new vantage point on the changing collections landscape.  We’ll be looking at ways to integrate it into ongoing research projects, including the mega-regions work, where we hope it can help us detect regional collecting trends that might inform shared stewardship priorities.

*Note: HathiTrust provides some nice visualizations and a list of subject areas in the Digital Library, based on Library of Congress classification numbers. These provide a good overview of subject-based coverage, but without reference to comparable coverage in other libraries. It is generally known that Google is selective with respect to identifying library partners, but I’m not aware of any public documentation related to a specific collection development strategy. Their aim, famously, is to provide comprehensive coverage of the world’s books, not to develop excellence in any given subject area.

Managing print books: A mega-problem?

Wednesday, December 12th, 2012 by Constance

This research note was co-authored by Brian Lavoie  and Constance Malpas.

Opportunity cost seems to be the watchword for print book collections these days. The staff, physical space, and other resources consumed by print-centric collections and services are badly needed to support new priorities in library services, such as deeper user engagement and closer alignment with changing research and learning practices. In the face of evidence of declining print book usage, combined with an ever-expanding array of digital alternatives, it is not difficult to imagine a future where “bookless” libraries are the norm.

But this may be premature. Few libraries are prepared to pack up their print books and send them to off-site high-density storage. On several highly-publicized occasions, plans to reduce local print book inventory have met vigorous opposition – witness the recent firestorm at the New York Public Library. In short, print collections pose a dilemma for libraries: they are assets too valuable to dispose of, yet sinking in priority vis-à-vis other aspects of the library service portfolio. The phrase “managing down print”, increasingly common in print management discussions, neatly captures the dueling imperatives: the need to allocate resources away from managing print book collections, but to do so in a gradual, orderly way. So the search is on for the golden mean: a viable print management strategy that can at once leverage more value out of the legacy print investment, and lower maintenance costs. This question is far from settled, but the contours of the solution are becoming apparent. First, future print management strategies are likely to be collaborative, with print books increasingly viewed as a shared asset to be managed cooperatively. Second, the scale of cooperation receiving the most attention, in terms of both planned and implemented solutions, is at the regional level.

This is not to suggest that the rest is a mere matter of detail: for example, the policy and technical infrastructures needed to support a regional strategy for cooperative print management are still in early stages of development. In the meantime, we can speculate on what a network of cooperatively-managed regional print book collections might look like. The OCLC Research report Print Management at “Mega-scale”: A Regional Perspective on Print Book Collections in North America explores a new geography of print book collections based on the concept of mega-regions. Mega-regions are geographical areas defined on the basis of economic integration and other forms of interdependence. The mega-regions framework has the benefit of basing regional boundaries on a substantive underpinning of shared traditions, mutual interests, and the needs of a common constituency.

In the report, we combine WorldCat data with an operationalization of the mega-region concept by urbanist Richard Florida to produce a network of twelve mega-regional print book collections – i.e., the collective print book holdings of all libraries in each region – corresponding to the twelve North American mega-regions identified by Florida (see figure below; click on image to view full size). We explore the salient characteristics of the mega-regional collections individually and as a group, and synthesize these characteristics into a set of stylized facts. The stylized facts are then used to explore the implications of a regionally-based, cooperative print strategy across a wide spectrum of issues, including access, management, and preservation.
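As a rough sketch of the mechanics (the real analysis is in the report, and the data frames and column names below are invented), assigning libraries to mega-regions and sizing the twelve collective collections might look something like this in R:

    # A sketch with invented names: 'print_holdings' has one row per
    # (oclc_number, institution) pair; 'region_lookup' maps institution symbols
    # to one of the twelve North American mega-regions.
    print_holdings <- merge(print_holdings, region_lookup, by = "institution")

    # Distinct print book titles per mega-region: the regional collective collection
    titles_per_region <- tapply(print_holdings$oclc_number,
                                print_holdings$region,
                                function(x) length(unique(x)))
    sort(titles_per_region, decreasing = TRUE)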


Viewing print book collections as a cooperatively-managed regional resource yields benefits on both the demand side and the supply side. On the demand side, aggregating the print holdings of many institutions into a single collective collection creates a resource of greater scope and depth than any single local collection. Exposing this collective collection to users around the region – or even beyond – may amplify or even create demand for print books that experience little or no local use. On the supply side, regional coordination could streamline print management and reduce costs. Opportunities emerge for collaboration and coordination in collecting and retention decisions – for example, by diminishing excessive duplication and sharing collecting priorities across many institutions.

While our application of the mega-regions framework to print management is speculative, evidence does suggest that the organization of library stewardship is being reconfigured on a new supra-institutional, regional basis. The Western Regional Storage Trust, a cooperative effort to archive print journals held in many Western (and even Midwestern) US libraries, is one among many examples. Some of these initiatives, like the CIC Shared Print Archive or the ASERL Print Journal Archive, have the potential – if not the explicit intent – to deliver benefit at mega-regional scale: CIC member libraries are distributed across the expansive CHI-PITTS region and ASERL’s membership is concentrated in CHAR-LANTA. It will be interesting to see if these natural experiments in redistributing print preservation responsibilities across broad geographies result in a richer collective resource, undergirded by a robust federation of preservation commitments, or a differently fragmented set of regional collections.

In the coming year, we’ll have an opportunity to extend our mega-regions analysis by taking a demand-side view of the North American print book collection. We’ll be working with partner libraries in the CIC (notably the Ohio State University) to examine how inter-lending data might be combined with supply-side holdings data to inform a regional print management strategy for retrospective monographic collections in CHI-PITTS. Here’s a thumbnail sketch of the regional resource, excerpted from our project proposal:

In aggregate, the print book resource held in CHI-PITTS libraries amounts to more than 40% of print book titles in North America. About 16% of these titles are unique to the region, i.e. not duplicated in any of the other eleven mega-region collections. The remainder constitutes a significant preservation “backstop” for other North American libraries: 50-92% of titles held by other individual mega-regions are duplicated in CHI-PITTS libraries. Thus, investments in the preservation of print books in the CHI-PITTS region can deliver significant benefit to libraries throughout North America. Conversely, there are relatively few regional collections that duplicate a significant share of the CHI-PITTS collection, which means that the burden of print preservation responsibilities (and investments) will be largely shouldered by institutions within the region. Since less than a fifth of the print books in the region are held by academic research libraries – traditionally viewed as the institutions with the greatest stake in print preservation – it seems apparent that networks like the CIC will have an important role to play in rationalizing regional print preservation priorities and investment.
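Duplication and uniqueness figures like those in the excerpt boil down to title-level overlap comparisons, roughly as in this sketch, which reuses the invented print_holdings table from the previous example:

    # A sketch, reusing the invented 'print_holdings' table from above.
    titles_by_region <- lapply(split(print_holdings$oclc_number,
                                     print_holdings$region), unique)
    chi_pitts <- titles_by_region[["CHI-PITTS"]]
    others    <- titles_by_region[names(titles_by_region) != "CHI-PITTS"]

    # Share of CHI-PITTS titles not duplicated in any other mega-region
    mean(!(chi_pitts %in% unique(unlist(others))))

    # Share of each other region's titles duplicated in CHI-PITTS libraries
    sapply(others, function(x) mean(x %in% chi_pitts))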

The CIC is an interesting test case for this sort of project, since all libraries in the consortium are partners in the HathiTrust Digital Library, a shared digital repository. By our reckoning, a third or more of the titles held by CIC member libraries are already “backed up” by digital preservation copies in HathiTrust.  Yet from a regional perspective, the situation is strikingly different:  we estimate that less than a fifth of the print books in CHI-PITTS are duplicated by HathiTrust. The collective preservation burden therefore remains significant even in a region with comparatively robust cooperative library infrastructure.

In regions where shared library infrastructure is less developed or less integrated, the challenges may be even greater.  Take Southern California, for example.  We estimate that the regional print book resource in the SO-CAL mega-region amounts to just under 10 million titles with about 40 million library holdings (i.e. holdings set by libraries in the region).  While much smaller in size than the CHI-PITTS collection, the SO-CAL collection represents an important regional asset and a significant stewardship concern for academic libraries in the area.  As elsewhere, these libraries are individually and collectively reassessing the opportunity costs of managing local print inventory and considering “above the institution” solutions.  Not surprisingly, smaller academic libraries look to larger research-intensive institutions as partners in the preservation enterprise and potential providers of shared infrastructure.
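The distinction between titles and holdings in that estimate is simply distinct works versus total holdings records, as in this small sketch (again with the invented print_holdings table):

    # A sketch, again with the invented 'print_holdings' table: titles vs. holdings.
    socal <- print_holdings[print_holdings$region == "SO-CAL", ]
    length(unique(socal$oclc_number))   # distinct print book titles in the region
    nrow(socal)                         # total holdings set by libraries in the region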

The University of California system, with five large research libraries and a high-density storage facility in the SO-CAL region, is an obvious focus of attention. But the infrastructure developed to support a statewide research university system with a global brand cannot simply be extended to serve all other libraries in the region. There is no shared governance model for the regional library resource, which is distributed across hundreds of public and private institutions. And there is no business model currently in place that would enable libraries to opt in to “preservation by proxy” arrangements. Yet, progress is being made. A group of library leaders from academic libraries and consortia in and around Southern California will meet later this week to begin what is certain to be a long conversation about a regional print management strategy. Bob Kieft, a long-time supporter (and sometime agitator) for collaborative collection management, has organized the meeting, which will be hosted by UCLA. It’s impossible to predict what the outcomes of the discussion might be – there is certainly no recipe for success in regional print management – but it is unquestionably an important first step in addressing what is increasingly a “mega” problem.

 

Registering researchers in authority files

Monday, October 29th, 2012 by Karen

Last month we launched a new task group of OCLC Research Library Partner staff and others who are involved in uniquely identifying authors and researchers, so that those identities can be shared in a linked data environment.

We were spurred by institutions’ need to uniquely identify all their researchers in order to measure their scholarly output, a factor in reputation and ranking. Yet national authority files cover researchers only partially. They do not include authors who write only journal articles, or researchers who don’t publish but create or contribute to data sets and other research activities.

We see a number of activities in this “name space” with potential overlap, including: the International Standard Name Identifier (ISNI), the Virtual International Authority File (VIAF), the Open Researcher and Contributor ID (ORCID), the Dutch Digital Author Identifier system (DAI), the Names Project in the UK, the Program for Cooperative Cataloging’s NACO program, researcher profile systems such as VIVO, and Current Research Information Systems (CRIS).

The Registering Researchers in Authority Files Task Group will document the benefits of researcher identification; significant challenges; trade-offs among the current approaches; and mechanisms for linking approaches and data. We are starting with use case scenarios, for example:

  • Researchers who want to identify others in their field
  • Institutions that need to collate the intellectual output of their researchers
  • Funders who want to track the outputs for awarded grants
  • Services providing persistent identifiers for researchers that need to disambiguate names and ensure correct attributions.

We are hoping that our report will help address all of the above needs, and suggest approaches for linking data from different sources in a coherent way. Details on this activity and the task group roster – including experts from the Netherlands, the United Kingdom, and the United States – are on our new Registering Researchers in Authority Files activity page on the OCLC Research website.

If there are systems or “name authority hubs” you want to make sure we look at, please let us know with a comment below.

 

Yet more social metadata for LAMs

Monday, April 23rd, 2012 by Karen

Today we released Social Metadata for Libraries, Archives, and Museums, Part 3: Recommendations and Readings. This is the last in a series of three reports that a 21-member Social Metadata Working Group from five countries produced as the result of our research in 2009 and 2010.

The cultural heritage organizations in the OCLC Research Library Partnership have been eager to expand their reach into user communities and to take advantage of users’ expertise to enrich their descriptive metadata. Social metadata—content contributed by users—is evolving as a way to both augment and recontextualize the content and metadata created by LAMs.

Our first report, Social Metadata for Libraries, Archives, and Museums, Part 1: Site Reviews, provides an environmental scan of sites and third-party hosted social media sites relevant to libraries, archives, and museums. We noted which social media features each site supported, such as tagging, comments, reviews, images, videos, ratings, recommendations, lists, links to related articles, etc.

Our second report, Social Metadata for Libraries, Archives, and Museums, Part 2: Survey Analysis, analyzed the results from a social metadata survey of site managers conducted from October to November 2009. Forty percent of the responses came from outside the United States. More than 70 percent of respondents had been offering social media features for two years or less. The vast majority of respondents considered their sites to be successful.

This third report provides eighteen recommendations and an annotated list of all the resources the working group consulted. The key message: “We believe it is riskier to do nothing and become irrelevant to your user communities than to start using social media features.” Among our recommendations:

  • Establish clear objectives and determine what metrics you need to measure success.
  • Leverage the enthusiasm of your user communities to contribute.
  • Look at other sites similar to your own that are already using social media features successfully before you start.
  • Consider using third-party hosted social media sites rather than creating your own.

All three reports total over 300 pages, so we’ve also prepared a much shorter Executive Summary with the highlights from all three reports.

The reports and the recording of our 9 March 2012 Webinar are all available here. We look forward to hearing your feedback – perhaps on our Social Metadata for LAMs Facebook page?

As with many OCLC Research publications, this report was written to help meet the needs of the OCLC Research Library Partnership. The Partnership not only inspires but also underwrites this type of work, so many thanks to the institutions who both contribute to and support our work!


Social metadata for LAMs on Facebook

Monday, March 12th, 2012 by Karen

Since the sites relevant to libraries, archives, and museums that support social metadata are changing quickly and new ones keep appearing, the Social Metadata Working Group wanted a way for others to share information about the new or enhanced sites they come across. They also wanted to be able to point to interesting articles, blogs, and videos related to social metadata and social media.

Following several of the group’s recommendations in our third report (to be published soon), such as looking at what others have done and considering third-party hosted sites rather than creating your own, we’ve created a Social Metadata for LAMs Facebook page.

Please visit the page and “like” it so you’ll see all future postings on your own Facebook wall. We also encourage you to post any comments you have about our Social Metadata for LAMs reports, new social media “site sightings” relevant to libraries, archives or museums,  or related information.

I look forward to seeing some of you on FB!

 

More social metadata for LAMs

Monday, January 16th, 2012 by Karen

Today we released Social Metadata for Libraries, Archives, and Museums, Part 2: Survey Analysis. This is the second in a series of three reports that a 21-member Social Metadata Working Group from five countries produced as the result of our research in 2009 and 2010.

The cultural heritage organizations in the OCLC Research Library Partnership have been eager to expand their reach into user communities and to take advantage of users’ expertise to enrich their descriptive metadata. Social metadata—content contributed by users—is evolving as a way to both augment and recontextualize the content and metadata created by LAMs.

Our first report, Social Metadata for Libraries, Archives, and Museums, Part 1: Site Reviews, provides an environmental scan of sites and third-party hosted social media sites relevant to libraries, archives, and museums. We noted which social media features each site supported, such as tagging, comments, reviews, images, videos, ratings, recommendations, lists, links to related articles, etc.

The second report is our analysis of the results from a social metadata survey of site managers conducted from October to November 2009. Forty percent of the responses came from outside the United States. A few highlights:

  • More than 70 percent of respondents had been offering social media features for two years or less.
  • Engaging new or existing audiences is cited as a success criterion more frequently than any other.
  • A minority of survey respondents are concerned about the way the site’s content is used or repurposed outside the site.
  • Spam and abusive user behavior are sporadic and easily managed.
  • The survey results indicate that engagement is best measured by quality, not quantity.
  • The vast majority of respondents considered their sites to be successful.

The upcoming third report provides recommendations on the social metadata features most relevant to libraries, archives, and museums, discusses factors contributing to success, and includes an annotated list of all the resources the working group consulted.

As with many OCLC Research publications, this report was written to help meet the needs of the OCLC Research Library Partnership. The Partnership not only inspires but also underwrites this type of work, so many thanks to the institutions who both contribute to and support our work!

We look forward to hearing your feedback!


Social metadata for LAMs

Monday, October 3rd, 2011 by Karen

Metadata helps users locate resources that meet their specific needs. But metadata also helps us to understand the data we find and helps us to evaluate what we should spend our time on. Traditionally, staff at libraries, archives, and museums (LAMs) create metadata for the content they manage. However, social metadata—content contributed by users—is evolving as a way to both augment and recontextualize the content and metadata created by LAMs.

The cultural heritage organizations in the OCLC Research Library Partnership are eager to expand their reach into user communities and to take advantage of users’ expertise to enrich their descriptive metadata. In 2009 and 2010, a 21-member Social Metadata Working Group from five countries reviewed 76 sites of most relevance to libraries, archives, and museums that supported such social media features as tagging, comments, reviews, images, videos, ratings, recommendations, lists, links to related articles, etc. The working group analyzed the results of a survey sent to site managers and discussed the factors that contribute to successful—and not so successful—use of social metadata. The working group considered issues related to assessment, content, policies, technology, and vocabularies. Central to the working group’s interest was how to take advantage of the array of potential user contributions that would improve and deepen their users’ experiences.

Our first of three reports, Social Metadata for Libraries, Archives, and Museums, Part 1: Site Reviews, provides an environmental scan of sites and third-party hosted social media sites relevant to libraries, archives, and museums. It summarizes the results of our review, captured in the “At a Glance: Sites that Support Social Metadata” spreadsheet, and more detailed reviews of 24 representative sites. Cyndi Shein, assistant archivist at the Getty Research Institute, wrote the section on LAMs’ use of third-party sites and blogs. The second report is an analysis of the results from a survey of site managers conducted from October to November 2009. The third report provides recommendations on the social metadata features most relevant to libraries, archives, and museums, discusses factors contributing to success, and includes an annotated list of all the resources the working group consulted.

As with many OCLC Research publications, this report was written to help meet the needs of the OCLC Research Library Partnership. The Partnership not only inspires but also underwrites this type of work, so many thanks to the institutions who both contribute to and support our work!

We look forward to hearing your feedback!

 

A crowdsourcing success story

Monday, March 21st, 2011 by Karen

I’m a great fan of the National Library of Australia’s Trove, a single search interface to 122 million resources—books, journals, photos, digitized newspapers, archives, maps, music, videos, Web sites—focused on Australia and Australians. You can search the OCR’d text of over 45 million newspaper articles that have been digitized.

OCR is not perfect. The original document is juxtaposed with the OCR transcription, so errors are immediately apparent. Since the public launch of Australian Historic Newspapers in July 2008*, people have been correcting errors in the OCR’d text. Both the corrected text and the original text are indexed and searchable.

The enthusiasm of these public text correctors is amazing! The 15 March 2011 Trove newsletter notes:

Text correctors are still doing an outstanding job of improving the electronically translated text, and the number of corrections each month continues to increase. In January we had over 2 million lines of text corrected in a month for the first time, which continued through February. The running total of corrected lines has now reached 31 million!

One of the issues the RLG Partners Social Metadata Working Group addressed was to what degree moderation was needed when opening up the descriptions of cultural heritage resources to user contributions. The responses to the social metadata working group’s survey of site managers indicated that spam or “inappropriate behavior” was not a problem. Rose Holley (a member of the working group), after a careful review of comments in Trove, provides additional corroboration that spam and derogatory comments were not a big problem.

…recently we made a decision to manually review the 18,000 comments that have been added to newspaper articles and other items in Trove. We found only 114 spam comments with URL that were removed and 71 comments placed by the same user in the same week that breached our terms and conditions (derogatory). These were also removed.

We thought that was very good news and supported our theory that moderation is still not required. We have however added a feature that enables a user to easily report spam via the trove forum.

This supports one of the working group’s recommendations: Go ahead! Invite user contributions without worrying about spam or abuse.

* For more details, see Holley, R. (2009) Many Hands Make Light Work: Public Collaborative OCR Text Correction in Australian Historic Newspapers, National Library of Australia, ISBN 9780642276940 http://www.nla.gov.au/ndp/project_details/documents/ANDP_ManyHands.pdf.

Sorting out Demand: some thoughts on library inter-lending

Friday, July 30th, 2010 by Constance

Over the past few years, OCLC Research has done quite a bit of analytic work based on what my colleague Brian Lavoie refers to as “supply-side” data. Examples include the well-known Google 5 study, as well as a variety of projects examining the library long tail, several of them summarized in an article Lorcan published some time ago. Much of this work has been based on data aggregated in the WorldCat bibliographic database. These data have been contributed over many years by OCLC members to support a variety of shared library services, including cooperative cataloging and inter-lending operations; as a secondary effect, the aggregation has provided a rich source of information about the system-wide library collection that is regularly mined in both internal and extra-mural research projects.

More recently, we have begun to think about how we might make better use of the demand-side data that is generated by a variety of routine library operations, especially circulation and inter-lending.  Lorcan in particular has given thought to how “intentional data” might usefully shape library service provision.

Inter-library loan transactions are a particularly interesting example of intentional data, I think.