How many WorldCat MARC tags are REALLY used?

Answer: Not many. And there are a lot of MARC tags that are rarely used. I recently analyzed the occurrence frequency of MARC tags in a December 2007 WorldCat snapshot prepared by our Office of Research colleagues. At that time WorldCat comprised 96,174,586 records and 1,210,107,485 holdings. The results don’t differ substantially from those Bill Moen presented from his extensive research on MARC designations at the 2006 RLG Members Forum: More, Better, Faster, Cheaper, even though WorldCat comprised just 56 million records at the time he had done his research.

Of the total 226 MARC tags represented in WorldCat:

  • Only 27 tags occur in 10% or more of WorldCat records.
  • 52 tags occur in 1% – 9% of WorldCat records.
  • 147 tags occur in less than 1% of all WorldCat records.

In other words, 65% of all tags in WorldCat occur in less than 1% of all records. Bill Moen reported that of 167 unique fields identified in his research, 66% occurred in less than 1% of all records. The record counts change, but the percentages barely do.
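The 65% figure follows directly from the three bucket counts above. A minimal sketch of that arithmetic, using only the numbers reported in this post (the variable and function names are illustrative, not part of any OCLC tool):

```python
# Tag counts per frequency bucket, as reported in the post.
buckets = {
    "10% or more of records": 27,
    "1% - 9% of records": 52,
    "less than 1% of records": 147,
}

# The buckets should add up to the 226 tags represented in WorldCat.
total_tags = sum(buckets.values())


def bucket_shares(counts):
    """Return each bucket's share of all tags, as a percentage rounded to 0.1."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


shares = bucket_shares(buckets)
for label, pct in shares.items():
    print(f"{label}: {pct}% of tags")
```

The rarely used bucket works out to 147 of 226 tags, which is where the 65% comes from.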

The distribution of MARC tags in WorldCat looks like this.

I also looked at the tags weighted by the 1.2 billion WorldCat holdings attached to the records where these MARC fields appear, that is, by the number of items those records represent. The results are similar:

  • 35 tags occur in 10% or more of WorldCat items.
  • 48 tags occur in 1% – 9% of WorldCat items.
  • 143 tags occur in less than 1% of WorldCat items.
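To make the difference between the two measures concrete, here is a rough sketch of how a record-level frequency and a holdings-weighted frequency could be computed side by side. The four toy records and their holdings counts are invented for illustration; they are not WorldCat data, and this is not necessarily how the actual analysis was run.

```python
from collections import Counter

# Hypothetical toy records: (set of MARC tags present, number of holdings).
records = [
    ({"245", "938"}, 500),  # one widely held record
    ({"245"}, 3),
    ({"245", "653"}, 2),
    ({"245"}, 5),
]


def tag_frequencies(records):
    """Map each tag to (share of records, share of holdings) in which it appears."""
    record_counts = Counter()
    holding_counts = Counter()
    total_records = len(records)
    total_holdings = sum(holdings for _, holdings in records)
    for tags, holdings in records:
        for tag in tags:
            record_counts[tag] += 1           # each record counts once
            holding_counts[tag] += holdings   # each record counts once per holding
    return {
        tag: (record_counts[tag] / total_records,
              holding_counts[tag] / total_holdings)
        for tag in record_counts
    }


freqs = tag_frequencies(records)
for tag, (by_record, by_holding) in sorted(freqs.items()):
    print(f"{tag}: {by_record:.1%} of records, {by_holding:.1%} of items")
```

In this toy data a tag that appears in only one of four records still covers most of the holdings, because that one record is widely held. A tag can therefore land in a different bucket depending on which measure is used.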

Some tags are used more often by specific communities. For example, non-Latin script records are more likely to use uncontrolled subject terms (653 field, used in 18% of non-Latin script records) compared to the rest of WorldCat (4.27%). Vendor-supplied ordering data (in the OCLC-specific 938 field) occurs in more than half of all WorldCat items, although it is present in only 6% of all WorldCat records. Although form/genre terms in the 655 field occur in only 4.15% of all WorldCat records, they occur in more than half of mixed material records (53.15%), 26.53% of visual material records, and 15.77% of integrated resource records.

Still, 40 tags occur in fewer than 1,000 records in a WorldCat database of over 96,000,000 records. Tags we can forget about since no one is using them anyway?

8 Comments on “How many WorldCat MARC tags are REALLY used?”

  1. Bill, I agree completely that this discussion is extremely important and especially so across communities. I greatly appreciate the work you and Shawne have done.

    My initial gripe with this post was the fact that it seemingly ignored all that context. I realize that wasn’t the intention, but I am seeing an extremely large amount of “conversation” lately that takes place at a highly superficial level and that is worrisome, to say the least.

    That, and the points you make, is all that I was trying to point out.

  2. Karen: Thanks for sharing the results and I’m pleased to see the similarity in the analysis you did with our results.

    Mark: At the 2006 RLG Forum, I used one example of a field/subfield that occurred just once in 7.5 million LC-created Books, Pamphlets, and Printed Sheets records. It was the 656 $a. Interestingly, one of the folks in the audience pointed out that in their community (I think it was archives), that field is more frequently used than what was represented in our data. Two take home lessons for me from that…

    1. Even though we worked with 56 million WorldCat records, there are other collections of bib records that might show different patterns of MARC use by catalogers. So, while large, the WorldCat records may not be a representative sample of all MARC bib records in the world.

    2. Different communities served by MARC may find certain fields/subfields critical and use them frequently.

    One of the hopes I had for the research project that Dr. Shawne Miksa and I conducted was to provide results that could engage the various user communities of MARC with a baseline for discussion. While we did develop reports on core elements based on frequency data and comparison with existing guidelines (e.g., Bibco), it was not our intent to make any prescription about what fields/subfields should be retained or removed. We do believe, though, that the empirical data can be used in discussions about changes to MARC, or maybe more to the point, the types of elements to move forward into MODS or other future bib record metadata schemes.

  3. Yes, I agree Karen and I hope I didn’t sound harsh because I didn’t mean to. I simply meant as you said that some of those will be important in context but many will not be.

  4. Mark: There’s lots and lots of data behind the tag frequency summary. You are correct that some field tags are far more important in certain resource types – the 655 form/genre field for mixed materials mentioned in my blog post is just one example. More details to come.

    Nevertheless, there are MARC tags that are RARELY used. I was impressed that WorldCat had grown by over 70% since Bill Moen had done his research and yet we both came to the same result: Two-thirds of MARC tags occur in less than 1% of all records.

  5. @Karen, i was disappointed to click the link to see that “The distribution of MARC tags in WorldCat looks like” a pie chart and not your color coded analysis. Surely you could share a screen shot!

    @Mark, i can understand your comment about diversity and the deeper analysis needed. It will be interesting to see which fields are infrequent but relevant because they’re cataloging very specialized items and which fields are infrequent because it’s just too hard, too time consuming, to create that metadata. I often wish there was more specialized and richer metadata to build interfaces around. It’s a bit of a chicken and egg issue, i expect. One needs a demonstration of the value of the metadata to be motivated to create it.

  6. Based on the information presented one is entirely unable to offer any sort of intelligent answer to the final question.

    Which fields exactly? In which records exactly? Are they critical for that record? for that resource type, etc.? All of those questions need to be addressed before one can offer any sort of considered answer to the question.

    Otherwise, perhaps we can start culling the redheaded population as a start since they are such a small minority. Me? I would fight to the death to keep them because I consider it important to have that diversity [not a redhead personally]. Perhaps the diversity of those MARC tags is needed also. We are certainly unable to begin to discern that, though, from anything presented in this blog post or the image accompanying it. And I am aware that I can get that data thanks to Moen’s work.
