Archive for the 'Libraries' Category

Authority for Establishing Metadata Practice

Monday, April 7th, 2014 by Karen

A metadata flow diagram
That was the topic discussed recently by OCLC Research Library Partners metadata managers. Carlen Ruschoff (U. Maryland), Philip Schreur (Stanford) and Joan Swanekamp (Yale) had initiated the topic, observing that libraries are taking responsibility for more and more types of metadata (descriptive, preservation, technical, etc.) and its representation in various formats (MARC, MODS, RDF). Responsibility for establishing metadata practice can be spread across different divisions in the library. Practices developed in relative isolation may have unforeseen consequences for discovery, producing awkward juxtapositions.

The discussion revolved around these themes:

Various kinds of splits create varying metadata needs. Splits identified included digital library vs. traditional; MARC vs. non-MARC; projects vs. ongoing operations. Joan Swanekamp noted that many of Yale’s early digitization projects involved special collections which started with their own metadata schemes geared towards specific audiences. But the metadata doesn’t merge well with the rest of the library’s metadata, and it’s been a huge amount of work to try to coordinate these different needs. There is a common belief in controlled vocabularies even when the purposes are different. The granularity of different digital projects makes it difficult to normalize the metadata. Coordination issues include using data elements in different ways, not using some basic elements, and lack of context. Repository managers try to mandate as little as possible to minimize the barriers to contributions. As a result, there’s a lot of user-generated metadata that would be difficult to integrate with catalog data.

Metadata requirements vary due to different systems, metadata standards, communities’ needs. Some digital assets are described using MODS (Metadata Object Description Schema) or VRA. Graphic arts departments need to find images based on subject headings, which may result in what seems to be redundant data. There’s some tension between specific area and general needs. Curators for specific communities such as music and divinity have a deeper sense of what their respective communities need rather than what’s needed in a centralized database. Subject headings that rely on keyword or locally devised schemes can clash with the LC subject headings used centrally.  These differences and inconsistencies have become more visible as libraries have implemented discovery layers that retrieve metadata from across all their resources.

Some sort of “metadata coordination group” is common.  Some libraries have created metadata coordination units (under various names), or are planning to. Such oversight teams provide a clearing house to talk about depth, quality and coverage of metadata. An alternative approach is to “embed” metadata specialists in other units that create metadata such as digital library projects, serving as consultants. After UCLA worked on ten different digital projects, it developed a checklist that could be used across projects: Guidelines for Descriptive Metadata for the UCLA Digital Library Program (2012). It takes time to understand different perspectives of metadata: what is important and relevant to curators’ respective professional standards.  It’s important to start the discussions about expectations and requirements at the beginning of a project.

We can leverage identifiers to link names across metadata silos. As names are a key element regardless of which metadata schema is used, we discussed the possibility of using one or more identifier systems to link them together. Some institutions encourage their researchers to use the Elsevier expert system. Some are experimenting with or considering using identifiers such as ORCID (Open Researcher and Contributor ID), ISNI (International Standard Name Identifier) or VIAF (Virtual International Authority File). VIAF is receiving an increasing number of LC/NACO Authority File records that include other identifiers in the 024 field.

Implications of BIBFRAME Authorities

Thursday, April 3rd, 2014 by Karen


That was the topic discussed recently by OCLC Research Library Partners metadata managers, initiated by Philip Schreur of Stanford. We were fortunate that several staff from the Library of Congress involved with the Bibliographic Framework Initiative (aka BIBFRAME) participated.

Excerpts from On BIBFRAME Authority, dated 15 August 2013, served as background, specifically the sections on the “lightweight abstraction layer” (2.1-2.3) and the “direct” approach (3). During the discussion, Kevin Ford of LC shared the link to the relatively recent BIBFRAME Authorities draft specification dated 7 March 2014, now out for public review.

The discussion revolved around these themes:

The role of identifiers for names vis-à-vis authority records. Ray Denenberg of LC noted that when the initiative first began, the framers searched unsuccessfully for an alternate name for “authorities” as it could be confused with replicating the LC/NACO or local authority files that follow a certain set of cataloging rules and are constantly updated and maintained. BIBFRAME is meant to operate in a linked data environment, giving everyone a lot of flexibility. The “BIBFRAME Authority” is defined as a class that can be used across files. It could be simply an identifier to an authoritative source, and people could link to multiple sources as needed. The identifier link could also be used to grab more information from the “real” authority record.

Concern about sharing authority work done in local “light abstraction layers.” It was posited that Program for Cooperative Cataloging libraries, and others, could share local authorities work and expose it as linked data. This is one of the objectives for the Stanford-Cornell-Harvard Linked Data for Libraries experiment. They plan to use a type of shared light abstraction model, where they may share URIs for names rather than each institution creating their own. Concerns remain about accessing, indexing and displaying shared local authorities across multiple institutions, and the risk of outages that could hamper access. Although libraries could develop a pared down approach to creating local authority data (which may not be much more than an identifier) and then have programs that pull in more information from other sources, some feared that data would only be created locally and not shared and libraries would not ingest richer data available from elsewhere.

Alternate approaches to authority work. Given the limited staff libraries have, fewer have the resources to contribute to the LC/NACO authority file as much as they have in the past. The lightweight model could serve as a place for identifiers and labels, and allow libraries to quickly create identifiers for local researchers prominent in the journal literature but not reflected in national authority files. Using identifiers instead of worrying about validating content—doing something quick locally that you can’t afford to do at a national level—is appealing. Alternatively, a library could bring in information from multiple authority sources—each serving a different community—noting the equivalents and providing an appropriate label.  BIBFRAME Authority supports both approaches. Other sources could include those favored by publishers rather than libraries, such as ORCID (Open Researcher and Contributor ID) or ISNI (International Standard Name Identifier), or by other communities such as those using EAC-CPF (Encoded Archival Context – Corporate bodies, Persons and Families). This interest overlaps the OCLC Research activity on Registering Researchers in Authority Files.

Concern about the future role of the LC/NACO Authority File. Some are concerned that if libraries choose to rely on identifiers to register their scholars or bring in information from other sources, fewer will contribute to the LC/NACO Authority File. Will we lose the great database catalogers have built cooperatively over the past few decades? Some would still prefer to have one place for all authority data and do all their authority work there. LC staff noted that a program could be run to ingest authority data created in these local (or consortial) abstraction layers into the LC/NACO Authority File.

Issues around ingesting authority data. We already have the technology to implement Web “triggers” to launch programs that pull in information from targeted sources and write the information to our own databases. OCLC Research recently held a TAI-CHI webinar demonstrating xEAC and RAMP (Remixing Archival Metadata Project), two tools that do just that. There are other challenges such as evaluating the trustworthiness of the sources, selecting which ones are most appropriate for your own context and reconciling multiple identifiers representing the same entity. Some are looking for third-party reconciliation services that would include links to other identifiers.
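The reconciliation challenge mentioned above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function name `reconcile` is hypothetical, the identifier strings are made-up placeholders rather than real ORCID, ISNI or VIAF values, and a real service would also need to weigh how trustworthy each "same as" assertion is. Given pairs of identifiers that sources assert to be equivalent, it groups them into clusters, one per entity, using a simple union-find structure.

```python
def reconcile(assertions):
    """Group identifiers that are asserted to denote the same entity.

    `assertions` is an iterable of (id_a, id_b) pairs, each meaning
    "these two identifiers refer to the same person or organization".
    Returns a sorted list of sorted identifier clusters.
    """
    parent = {}

    def find(ident):
        # Register unseen identifiers as their own cluster root.
        parent.setdefault(ident, ident)
        while parent[ident] != ident:
            parent[ident] = parent[parent[ident]]  # shorten the chain as we go
            ident = parent[ident]
        return ident

    for id_a, id_b in assertions:
        root_a, root_b = find(id_a), find(id_b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    clusters = {}
    for ident in list(parent):
        clusters.setdefault(find(ident), set()).add(ident)
    return sorted(sorted(cluster) for cluster in clusters.values())


# Placeholder identifiers, not real ones.
pairs = [
    ("viaf:EXAMPLE-A", "orcid:EXAMPLE-A"),
    ("orcid:EXAMPLE-A", "isni:EXAMPLE-A"),
    ("viaf:EXAMPLE-B", "local:EXAMPLE-B"),
]
for cluster in reconcile(pairs):
    print(cluster)
```

Equivalence is treated as transitive here, so one wrong assertion merges two entities; that is exactly why evaluating the trustworthiness of each source matters before ingesting its links.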

Those interested in the continuing discussion of BIBFRAME may wish to subscribe to the BIBFRAME listserv.



Happy Extraterrestrial Abduction Day!

Wednesday, March 19th, 2014 by Roy

Just in time for Extraterrestrial Abduction Day, commemorated by Earthlings everywhere on 20 March, we bring you the list of the top ten most popular library items with the subject heading “Alien abduction”:

  1. Close Encounters of the Third Kind (1977 movie)
  2. The True Meaning of Smekday, by Adam Rex
  3. Abduction: Human Encounters With Aliens, by John E. Mack
  4. Abducted: How People Come to Believe They Were Kidnapped By Aliens, by Susan A. Clancy
  5. Transformation: The Breakthrough, by Whitley Strieber
  6. Little Green Men, by Christopher Buckley
  7. Close encounters of the fourth kind : alien abduction, UFOs, and the conference at M.I.T., by C.D.B. Bryan
  8. The Light-Years Beneath My Feet, by Alan Dean Foster
  9. The Fourth Kind (2009 movie)
  10. Skyline (2010 movie)

For even more suggestions on library items one can borrow to get in the mood, see what FictionFinder suggests.

But in all of our excitement celebrating Extraterrestrial Abduction Day let’s not forget the most important item of all: How to Defend Yourself Against Alien Abduction, by Ann Druffel (see cover). I mean, whether you are returned to Earth or not, the best outcome of any attempt by aliens to abscond with you is to not be abducted in the first place. At least, that’s what I’m thinking.

Another Step on the Road to Developer Support Nirvana

Monday, March 10th, 2014 by Roy

Today we released a brand spanking new web site for library coders. It has some cool features including a new API Explorer that will make it a lot easier for software developers to understand and use our application program interfaces (APIs). But seen from a broader perspective, this is just another way station on a journey we began some years ago to enable our member libraries to have full machine access to our services.

When I joined OCLC in May 2007, I immediately began collaborating with my colleagues in charge of these efforts, as I knew many library developers and had been active in the Code4Lib community. As a part of this effort, we flew in some well-known library coders to our headquarters in Dublin, OH, to pick their brains about the kinds of things they would like to see us do, which helped us to form a strategy for ongoing engagement.

From there we hired Karen Coombs, a well-known library coder from the University of Houston, to lead our engagement efforts. Under Karen’s leadership we engaged with the community in a series of events we began calling hackathons, although we soon changed to calling them “mashathons” in response to the pejorative nature the term “hack” had in Europe. In those events we brought together library developers to spend a day or two of intense learning and open development. The output of those events began populating our Gallery of applications and code libraries.

Karen also dug into the difficult, but very necessary, work of documenting our APIs more thoroughly and consistently. Her yeoman work in this regard gave us a more consistent, easier to understand set of documentation that we continue to build upon and improve.

When Karen was moved into another area of work within OCLC to better use her awesome coding ability, Shelley Hostetler was hired to carry on this important work.

I think you will find this latest web site release even easier to understand and navigate. One essential difference is that it is much easier to get started, since we have better integrated information about, and access to, key requesting and management when those are required (some services do not require a key).

Although this new site offers a great deal to developers who want to know how to use our growing array of web services, we recognize it is but another step along the road to developer nirvana. So check it out and let us know how we can continue to improve. As always, we’re listening!


The Most Edited Book Records in WorldCat

Friday, February 7th, 2014 by Roy

In my last post I identified the most edited records in WorldCat, which, no surprise, were all serials. Someone who read the post asked about this information by format (e.g., books, maps, scores, etc.). I doubt that I will get to all of the various formats, but I decided to take a look at books.

Unlike serials, for which I noted those that had 60 or more edits, for books I had to lower the threshold to 40 to get any at all (the most edited item had 58 edits). So here are the book records which have been edited more than 39 times in WorldCat (in no particular order):

An inevitable conclusion from the above seems to be that the more libraries that hold a book the more likely a cataloger will be to touch the record for it, which would explain how Harry Potter and the Hunger Games books made it on the list.

Countries of Publication in WorldCat

Tuesday, December 10th, 2013 by Roy

I’m a data geek. I just love processing data in various ways to see what I can find out. So recently I decided to look into the countries of publication as recorded in the 300+ million MARC records in WorldCat. Just for kicks I did some processing of the 260 $a subfield, which is the “Place of publication, distribution, etc.” as it appears on the piece, or noted in various other ways if it doesn’t.

As you might imagine, what results from such an investigation is a complete dog’s breakfast, with a large variety of punctuation marks, typographical errors, imaginative spellings, and just plain junk. No, it is much better to parse bytes 15-17 of the 008 field, which at least are supposed to only contain values from this list maintained by the Library of Congress. Progress.

That is, until one discovers that this “Code List for Countries” is not exactly that. If you happen to be in a certain select part of the world (mostly the United States, Canada, and Australia), you can also select state or province-specific codes. So before I used this table to translate the codes for actual countries I first had to translate the table, so that the code for “California” translated instead to “United States”. Progress.

Oh, and then countries have this tiresome tendency to change over time. The Soviet Union broke up. Czechoslovakia split into two. And don’t even get me started about the hot mess that used to fall under the general term of “Micronesia”. So I had to make some executive (and no doubt indefensible) decisions about how to deal with those. By and large, if I could identify some geography (e.g., Uzbekistan) that had a former life that could also be identified (e.g., Uzbek S.S.R.), I translated them both into the current entity. But lord only knows how many items that don’t have this distinction end up being miscounted. But progress of some sort nonetheless.

Oh, and places like “West Berlin” got their own code. How quaint. But now I’m just whining.

In the end I had the table translated into my twisted view of reality and could run my program against the entirety of WorldCat, parsing out the precious three bytes from the 008 and running my undoubtedly flawed translation on the result. I just love that “Unknown” came out on top. Somehow, after this journey, it seemed fitting.
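The process described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the program that was actually run: the function names are hypothetical, the lookup table is a tiny hand-picked subset that should be checked against the Library of Congress list, and treating every unmapped code as “Unknown” is a simplification. It reads character positions 15-17 of the 008 field, rolls state and province codes up to their country, folds a former entity into its current one, and counts the results.

```python
from collections import Counter

# Illustrative subset only; the real Library of Congress list is far longer.
# Two-letter codes are stored without the trailing blank used in the 008.
CODE_TO_COUNTRY = {
    "xxu": "United States",
    "cau": "United States",  # California rolled up to its country
    "nyu": "United States",  # New York rolled up to its country
    "xxc": "Canada",
    "onc": "Canada",         # Ontario rolled up to its country
    "gw": "Germany",
    "fr": "France",
    "uz": "Uzbekistan",
    "uzr": "Uzbekistan",     # Uzbek S.S.R. folded into the current entity
    "xx": "Unknown",         # no place, unknown, or undetermined
}


def country_from_008(field_008):
    """Translate character positions 15-17 of an 008 field to a country name."""
    code = field_008[15:18].strip().lower()
    # Assumption: anything not in the table (junk, fill characters,
    # codes missing from this subset) is simply counted as "Unknown".
    return CODE_TO_COUNTRY.get(code, "Unknown")


def tally_countries(fields_008):
    """Count records per country for an iterable of 008 field strings."""
    return Counter(country_from_008(field) for field in fields_008)


# Made-up 008 values for illustration: 15 leading characters, then the code.
sample = [
    "130101s2013    cau" + " " * 22,
    "130101s2013    xxu" + " " * 22,
    "130101s1975    uzr" + " " * 22,
    "130101s2013    xx " + " " * 22,
]
for country, count in tally_countries(sample).most_common():
    print(f"{count:>3}  {country}")
```

Keeping the translation in a single table means each “executive decision” about a former country or a sub-national code is one visible line that can be revisited later.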

With no further ado, here are the top 25 “countries” of publication from the records in WorldCat:

74,330,023  Unknown
52,460,566  United States
34,014,675  Germany
24,374,828  United Kingdom
21,009,805  France
 9,142,988  Japan
 8,706,853  China
 7,950,373  Spain
 6,649,599  Italy
 6,312,625  Netherlands
 6,142,256  Canada
 5,641,525  Switzerland
 3,725,639  Russia
 3,516,374  Australia
 3,310,194  Poland
 2,923,655  Denmark
 2,739,910  Sweden
 2,219,850  India
 1,996,800  Slovenia
 1,936,800  Austria
 1,612,948  Belgium
 1,518,478  Israel
 1,514,824  Brazil
 1,412,034  Mexico
 1,197,454  Finland

The full list is here. Knock yourself out. I sure did.

Learning Commons: well-made in Japan

Wednesday, November 27th, 2013 by Jim

During a very hectic, very interesting visit to research libraries in Japan last week, I had the good fortune to tour the new (April 2013) Learning Commons at Doshisha University. It is not a library-managed facility, but the library helps to staff it along with other Student Support Services staff. The facility itself is as good an implementation as I’ve seen anywhere, including the new facilities at North Carolina State University’s new library.

The Doshisha University Learning Commons brochure

The Commons itself is a multi-story structure constructed adjacent to the library and connected to the library at various levels. As a consequence students can move very freely from the collections and quiet of the traditional library to the group study, presentation, production and technology areas of the learning commons. There are plenty of visible but unobtrusive staff available to the students. Staff in red jackets offer technology support, those in blue offer peer instruction and guidance, and those in yellow help with media production; each floor also has a desk staffed by a librarian.

There are no fixed furnishings in the entire facility. Everything can be moved. As an experiment they left one group study space with two tables without rollers. That space is the least frequently used in the building. I was impressed with the energy of the staff and the enthusiasm of the students. The location of the facility, bordering on one of the busiest streets in Kyoto, purposely serves to advertise the learning environment of this private university. The big study and computing rooms are lined up along picture windows that face out onto this boulevard, ensuring that Kyoto citizens know that Doshisha is a good place to learn.

Check out some photos taken during my walk-through in this Flickr set. Look for the Global Village sign that designates an area where no Japanese is to be spoken.

P.S. After the original post my colleagues at Doshisha advised me that an English language version of their Learning Commons brochure is available (.pdf).

Metadata for digital objects

Tuesday, November 26th, 2013 by Karen

That was the topic discussed recently by OCLC Research Library Partners metadata managers. It was initiated by Jonathan LeBreton of Temple, who noted the questions staff raised when describing voluminous image collections such as: Do we share the metadata even if it would swamp results? What context can be provided economically? What are others doing both in terms of data schemas and where the metadata is shared?

The discussion revolved around these themes:

Challenges in addressing the sheer volume of digital materials.  Managers are making decisions based on staffing, subject expertise, collection’s importance and funding. It was suggested that some metadata could be extracted from the technical metadata, such as dates and location. We discussed the possibility of crowd-sourcing metadata creation, although experience to date is that a few volunteers are responsible for most contributions, and the successful examples tend to be for transcription, editing OCR’d text, and categorizations. (The At a Glance: Sites that Support Social Metadata chart indicates the ones that enhance data either through improved description or subject access.) The context must matter to people for them to volunteer their efforts. (See the OCLC Report, Social Metadata for Libraries, Archives and Museums: Executive Summary.) With the anticipated increase of born-digital and other digitized materials, there’s a greater need for batch and bulk processing.

Grappling with born-digital materials. Libraries are receiving the digital equivalents of personal papers and using the Forensic Toolkit to “process” these digital collections. Preservation and rights management, in addition to description, are important components, and no commercially available system yet addresses these needs. The Association of Research Libraries is working with the Society of American Archivists to customize its Digital Archives Specialist (DAS) Program so that ARL library staff can develop the requisite skills for managing born-digital materials. OCLC Research has produced several reports in conjunction with its Demystifying Born Digital program of work.

Concerns about “siloization”, or proliferation of “boutique” collections, using different metadata schemas. Metadata is being created in different native systems within an institution, metadata that is often not loaded into a central catalog or even accessible in the local discovery layer. User-created metadata in institutional repositories may be OAI harvested by OCLC and thus may appear in WorldCat even if not visible in the institution’s local discovery tool. Managers grapple with whether to spend resources on updating such metadata before it is exposed for harvesting. Another challenge is deciding what to include in which discovery layer, and what should be siloed. The numerous repositories within an institution can result in complex metadata flows for discovery, as illustrated by UC San Diego’s Prezi diagram. Some institutions map their various metadata schemas to MODS (Metadata Object Description Schema), but all non-MARC metadata is converted to MARC when loaded into WorldCat.

What are the “essential elements” to provide access across collections? We posited that librarians have been discussing “core” or “essential” metadata elements for decades, starting with Dublin Core and the Program for Cooperative Cataloging’s “BIBCO Standard Record”. Librarians have been entering metadata with a particular system in mind, but the data ultimately moves to another system. Library metadata is no longer confined to a single system: it may be exposed to search engines and viewed alongside lots of non-library metadata.

The Library of Congress’ Bibliographic Framework Initiative  portends a future where all metadata will be “non-MARC” and we will rely more on linked data URIs in place of metadata text strings.  How can we use the promise of that future to get to where we need to be?

WorldCat shows dispersal of global resources

Tuesday, November 19th, 2013 by Karen

Number of institutions with WorldCat holdings for Arabic-language resources


A differentiating feature of WorldCat is that it includes more than two billion holdings of libraries from around the world. My colleague Roy Tennant recently generated statistics on the Arabic-language resources described in WorldCat records. I was struck by the dispersal of the holdings of those materials, as shown in the map above.

Big caveat: Many strong Arabic-language collections are under-represented or not represented at all in WorldCat. Even so, we can see at a glance that Arabic-language materials are collected by institutions in countries far away from the countries of origin, even where Arabic is not widely spoken. Scholarship is international. We could produce similar maps for other language materials.

My colleague Brian Lavoie’s report earlier this year, Not Scotch, but Rum: The Scope and Diffusion of the Scottish Presence in the Published Record, describes in detail his analysis of holdings for materials published in Scotland, by Scottish people, and about Scotland. It concludes, “Most holdings of materials in the Scottish national presence are by institutions outside Scotland, which reminds us that a national presence in the published record may be primarily manifested outside the home country’s borders.”  

Does technical services still have a distinct role?

Monday, October 28th, 2013 by Karen

That was the topic discussed recently by OCLC Research Library Partners metadata managers from seven countries. It was initiated by Philip Schreur of Stanford (and recently Chair of the Program for Cooperative Cataloging), who noted that although “technical services” had traditionally been organized around the modules of a local system, changes in the library environment have resulted in some major restructuring. Libraries have increased their use of outsourcing and now batchload records from vendors or other sources, blurring the lines between library and IT, and vastly reducing the number of materials that need to be cataloged manually locally. This in turn has allowed staff to devote time to broader issues of discovery and data management, and make strategic alliances with new partners outside of technical services.  Meanwhile, “metadata creation” is needed for resources not always part of the local catalog, such as digital collections or materials in an Institutional Repository.

The discussion revolved around these themes:

More widespread use and need for metadata, far beyond the traditional “bibliographic” metadata created by technical services staff.  Metadata specialists (a new alternative for “catalogers”) now deal with metadata of all types, with decreased focus on print and more emphasis on digital.  Technical services staff aspire to provide intellectual access to all resources, beyond those represented in the local catalog. A common discovery tool has driven the movement to more active metadata integration from the beginning of projects to ensure that metadata is cohesive.

Changing service portfolios and workflows, with new or expanded expectations. Technical services staff have taken on tasks that used to be done elsewhere. Among these new tasks: authority control for the institutional repository; managing electronic resources and licenses; integration with special collections and archives; helping researchers organize their data; creating metadata for digital projects; producing reports, dataloading and installing system upgrades (which used to be done by systems staff). It is a challenge to balance the workload between the influx of electronic and digital resources and print backlogs. Sensitivity to “organizational culture” in different units is more important than organizational structure.

New collaborations within the institution and with other organizations. Technical services staff increasingly work in cross-divisional teams, such as staff involved with digital projects, archives, data mining, IT and liaisons with faculty. Alison Felstead at Oxford referred to two new posts in the Oxford institutional repository that report to cataloging but are part of the systems staff. Libraries would like to work more closely with publishers to load metadata for e-resources into commonly used tools.

Need for new skill sets. Managers need to “build digital confidence” in their staff: provide training in what is required to adequately describe and provide access to digital and electronic resources, and allow periods for experimentation. Libraries compete with IT, where the pay scale is much higher, to recruit computer-savvy staff.

Several noted the need to evolve beyond “boutique-y” collection development and the need for a “metadata shepherd”. (Stanford recently posted a position for a “Metadata Strategist”.) In general, we are seeing an emerging trend towards more fluid structures that allow staff to adapt to new workflows, rather than structures organized around traditional functions.