Analyzing MARC tags and projecting MARC’s future

The RLG Partners working group that has spent the past two years gathering and analyzing evidence about MARC tag usage to inform library metadata practices has completed its work. The 72-page report, Implications of MARC Tag Usage on Library Metadata Practices, was published on March 12, with links to thirteen detailed data tables for those who love to immerse themselves in statistics. They’re spreadsheets, so you can also filter and sort the data as you like.

The working group’s studies focused on machine applications, an important user category that has generally been ignored in user studies. Besides being read by people, MARC data is used for machine matching and manipulation, linking, harvesting, collection analysis, ranking, and providing systematic views of publications. If we envision a future of linked data in which all the work information professionals have invested in creating and maintaining legacy MARC data is available to the rest of the information universe, machine applications will become increasingly important. Future encoding schemas will need a robust MARC crosswalk to ingest our millions of legacy records.

We believe that MARC data cannot continue to exist in its own discrete environment. It will need to be used in other domains to reach users in their own networked environments. With the increase of digitized full text from various mass digitization efforts, we advise MARC practitioners to focus on authorized names, classifications, identifiers, and controlled vocabularies, which keyword searching of full text will not provide, rather than on “descriptive metadata”.

The working group held a Webinar on March 18, 2010 to discuss its findings and projections for MARC’s future with those interested. I was grateful that Catherine Argus at the National Library of Australia was willing to get up extra early to present her work at 7:00 am local time, so that RLG Partner staff on the east coast of the US could join the discussion at 4:00 pm EDT. A couple of Catherine’s colleagues at the NLA also listened in. Lisa Rowlison de Ortiz (University of California, Berkeley) also joined the discussion; she collaborated on the executive summary, which pulled together all our work and presented the working group’s views on MARC’s future summarized above. The recording of that Webinar will be available on the OCLC Research Webinars page soon.

The working group members each selected a topic to research, and then wrote a report summarizing the findings, which we presented during the Webinar:

  • I analyzed MARC tag usage in WorldCat, with a focus on how tag usage in non-book formats differed from that in the rest of WorldCat. This was based on a September 2009 snapshot, when WorldCat contained 145 million bibliographic records. I focused only on the tags described in MARC 21 documentation, excluding OCLC-specific and local tags, so people could compare tag usage with what they are most likely to find in their own catalogs. Only 39 MARC tags (of 199 total) occur in 5% or more of WorldCat records. The number of tags that occur in 10% or more of non-book records ranges from 21 (Scores) to 30 (Computer files). Some of those tags are far more heavily used in specific formats than in WorldCat as a whole, where they’re scarcely used.
  • Hugh Taylor (University of Cambridge) analyzed the MARC tags used for matching records while building five aggregated databases — the Research Libraries UK’s union catalog for record retrieval, COPAC (the public union catalog derived from the RLUK database), WorldCat, the former RLG Union Catalog, and Libraries Australia — and compared the tags used with those mandated by the Program for Cooperative Cataloging and OCLC Level 3 records. He found that only five elements in the leader, four MARC fields, and a few core bibliographic data elements are common in matching. (Because of the late hour of the Webinar in the UK, Hugh was the only working group member who couldn’t participate, but he reviewed the slides we used.)
  • Catherine Argus analyzed the MARC tags that are indexed in five aggregate databases — AMICUS (the national union catalog of Canada), COPAC, Libraries Australia, WorldCat.org and OCLC’s FirstSearch. She found that only a subset of fields is indexed across all five databases, and that each database offered at least one search option not offered by any of the others. She didn’t see much variation in indexing and display among different formats.
  • Chew Chiat Naun (University of Minnesota) analyzed the MARC fields represented in WorldCat records bearing different encoding levels. He concluded that encoding level has limited value for selecting the “most complete” record and that encoding levels assigned at a batchload or project level can be misleading about a record’s content.
  • Timothy J. Dickey (OCLC Research) collaborated with Peter Hirsch (The New York Public Library) to compare the use of form/genre designations and relator terms in the NYPL’s local catalog and in WorldCat. He found that NYPL catalogers were using these MARC fields with much greater frequency than the profession at large (as reflected in WorldCat) and were much more selective in their choice of terms.
  • Timothy’s work led to documenting requirements for enhanced library data mining, which he presented during the Webinar. Search log data currently captured by library systems usually cannot provide enough information on user behavior. The working group would have benefitted if systems had provided search logs and circulation data that met these data mining requirements.
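
For readers who want to try the kind of tag-usage count described in the first item above on their own catalog data, here is a minimal sketch in Python. It is not the method the working group used. It assumes each record has already been reduced to a plain list of its MARC tags, and the function names (tag_usage, tags_at_or_above) and the sample records are made up for illustration.

```python
from collections import Counter


def tag_usage(records):
    """Return {tag: share of records that contain the tag at least once}.

    `records` is assumed to be a list of tag lists, one list per record
    (a hypothetical, simplified stand-in for parsed MARC records).
    """
    counts = Counter()
    for tags in records:
        # A repeated tag (e.g. several 650 fields) counts once per record.
        counts.update(set(tags))
    total = len(records)
    if total == 0:
        return {}
    return {tag: n / total for tag, n in counts.items()}


def tags_at_or_above(usage, threshold):
    """List the tags whose share of records meets the threshold, sorted."""
    return sorted(tag for tag, share in usage.items() if share >= threshold)


# Made-up sample: four records, each reduced to its list of tags.
sample = [
    ["245", "100", "260", "650", "650"],
    ["245", "260", "300"],
    ["245", "100", "260", "505"],
    ["245", "260", "650"],
]

usage = tag_usage(sample)
common = tags_at_or_above(usage, 0.5)
print(common)
```

Running the same count separately on records of one format (scores, computer files, and so on) and comparing the result with the count for the whole file is one way to see which tags are heavily used in a format but scarce overall.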

Much of the Webinar discussion and chat room focused on the future of MARC, or more specifically, how to transition to a post-MARC future. The discussion identified the need for data models and systems: what would be needed to create, capture, structure, store, search, retrieve, and display objects and metadata if we didn’t have to use MARC and weren’t limited by MARC-centric library systems? One participant commented: “Labels, labels, labels for the specific building blocks (elements) so users can construct whatever they want (users include machines).” Feel free to add your own comments here!


About Karen Smith-Yoshimura

Karen Smith-Yoshimura, program officer, works on topics related to renovating descriptive and organizing practices with a focus on large research libraries and area studies requirements.