In a previous post, I’ve shared some background about the data analysis phase of our Museum Data Exchange Mellon grant, and posted some of the questions our museum participants wanted to have answered. In the meantime, we have created a spreadsheet [pdf] which captures our ideas to date of what questions we may want to ask of the 850K CDWA Lite XML records from 9 museums. Note that the methodology captured by this spreadsheet lays out a landscape of possibilities – it is not a definitive checklist of all the questions we will answer as part of this project. Only as we get deeper into the analysis will we know which questions are actually tractable with the tools we have at hand. I’d appreciate any thoughts on additional lines of inquiry we could pursue with our analysis, or other observations!
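One concrete line of inquiry from the spreadsheet is simply asking which descriptive elements museums actually populate. The sketch below is a minimal illustration of that kind of completeness tally, assuming a simplified, non-namespaced record; real CDWA Lite records are namespaced and much richer, and the element names here are only illustrative.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# A tiny stand-in for one harvested record; real CDWA Lite records are
# namespaced and far richer -- these element names are illustrative only.
SAMPLE_RECORD = """
<cdwalite>
  <titleWrap><title>Untitled Landscape</title></titleWrap>
  <displayCreator>Unknown artist</displayCreator>
  <displayMaterialsTech></displayMaterialsTech>
</cdwalite>
"""

def populated_fields(record_xml):
    """Return a Counter of element tags that contain non-empty text."""
    root = ET.fromstring(record_xml)
    counts = Counter()
    for elem in root.iter():
        if elem.text and elem.text.strip():
            counts[elem.tag] += 1
    return counts

# Aggregated over all 850K records, tallies like this would show which
# CDWA Lite elements the nine museums actually fill in.
print(populated_fields(SAMPLE_RECORD))
```

Run per record and summed across the corpus, a tally like this is one tractable way to turn the spreadsheet's "which fields are present?" questions into numbers.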
Archive for April, 2009
Susan Hamson (Columbia) came out with this zinger. We were talking about public services and delivery for archives, special collections and rare books. I think the topic that day was ILL of special collections, a real hot potato.
How can we get more people’s fingers on the pages and in the boxes? Not just in reading rooms, but on the web? Over the past three months, the RLG Steering Committee for Special Collections Delivery tackled questions at the public services end of the lifecycle of unique material. Dennis and I listened in as Cristina Favretto (Miami), Mattie Taormina (Stanford) and Susan sifted through creative ambitions to “do it better.” The committee asked, “What is the collective mind? What to stop doing? Who has the most innovative practices?”
Mattie, Cristina and Susan each asked their administration what changes management could support. They settled on four projects for starters: sharing (really sharing) special collections, balancing copyright management and risk, tapping the expertise of users, and best practices for scan-on-demand and photography. If you want to participate in one of these projects, put your hand up.
I felt a bit as if I was watching Wall-E sorting through the detritus of past cultures, considering each piece thoughtfully and then picking up projects that could change the world, system-wide, for real. In every case, at least one or two of the trio had good reasons not to tackle the topic at their own institution, but agreed the project would have an impact. In a Friday afternoon email volley, Susan wrote:
“Are we representing the interests of our institutions or do we move forward representing the interests of the profession and the patron? ILL is tricky, permission fees are too – but what are we doing if not pushing the boundaries to engage a debate and discussion? We’re not establishing policy for our institutions, but we are professionals engaged in the work of exploration and, maybe, change. If not where we sit but some place else. We’re not proposing that our institutions throw caution to the wind and abandon all that is good and holy – we’re just pausing to think about something new. Putting it out there doesn’t make it so (well…).
“Now I’m not comfy with the ILL thing, but I still want to put it out there. We’re archivists, dammit! We have super powers (my bone folder is the source of all my super powers).”
Recently, OCLC launched an experiment in making it easier for members to update and correct WorldCat records. Dubbed the Expert Community Experiment, the goal is to engage the community in improving overall database quality. Specifically, members with full-level cataloging authorizations have the ability to improve and upgrade WorldCat master records during the experiment. It began in February and will last six months.
In March, there were 18,910 Expert Community Experiment replaces, made by 1,001 institutions that did at least one replace each. Individual institution activity ranged widely: 3 institutions did more than 500 replaces apiece, while 242 institutions did just one. Other figures:
Database Enrichment: 18,235
Minimal-Level Upgrade: 14,791
Enhance Regular: 15,052
Enhance National: 3,583
CONSER Authentication: 1,929
CONSER Maintenance: 6,183
To put this into perspective, during the same period OCLC staff replaced 1,086,715 records. This isn’t to say that we couldn’t see substantial improvements in database quality under a less strict editing regime, only that you likely didn’t know just how hard we work to improve the WorldCat database. I sure didn’t, and I work here.
A little over a year ago, I inherited a project that didn’t have much more than a name: “Explore and understand the place of large digital text aggregations in scholarship and research.”
I had several discussions with my colleagues about what this project might turn out to be. We had several ideas:
– Create a shared understanding of the expectations that researchers and students bring to their interactions with large-scale text aggregations on the web and the requirements for making these collections fit for scholarly use.
– Convene an invitational meeting of those already engaged in large-scale digitization efforts to establish a common understanding of scholarly use-cases and the core requirements for library-sourced research services.
– Identify service capabilities (bookmarking, annotation, citation management, etc.) that are required to support scholarly use of text aggregations.
– Assemble a text archive for prototyping and analysis.
– Investigate needs of scholars (via focus groups?)
– Experiment with the metadata we get from OCLC’s e-Content Synchronization service to see how we can characterize the contents of book aggregations
– Experiment with full text functionality we might be able to offer a) on a specific aggregation and b) across aggregations
What we were exploring went beyond finding and using a single document. It was about identifying works from many silos to incorporate into a local environment. And it was about performing actions against an index (or multiple indexes) of aggregated digitized works. We could investigate how scholars would work with the range of book text archives, starting with use case scenarios of the types of queries (e.g., in areas such as linguistic analysis, lexical frequency, translation studies, edition comparisons, things like occurrence of geographic place names in fiction, and coincidence of events – like being able to explore how a race riot affected neighborhood population dynamics).
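The simplest of those query types, lexical frequency across an aggregation, can be sketched in a few lines. The corpus below is invented and trivially small; a real aggregation would be an index over millions of digitized volumes, but the per-document counting logic is the same.

```python
import re

# A toy "aggregation": document id -> full text. These snippets are
# invented stand-ins for digitized books in a real text archive.
CORPUS = {
    "doc1": "The riot spread through the neighborhood; the neighborhood emptied.",
    "doc2": "Census rolls for the neighborhood show a sharp decline after 1919.",
}

def term_frequency(corpus, term):
    """Per-document counts of a query term: lowercased, whole words only."""
    pattern = re.compile(r"\b" + re.escape(term.lower()) + r"\b")
    return {doc_id: len(pattern.findall(text.lower()))
            for doc_id, text in corpus.items()}

print(term_frequency(CORPUS, "neighborhood"))
```

Swapping the query term for a gazetteer of place names, or running the counts over date-sliced subsets of the corpus, gets at the geographic and event-coincidence questions mentioned above.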
After nearly eight months at OCLC Research, I’m finally doing my first blog post. Why am I so intimidated at the prospect? Finally, Merrilee can stop pestering me to get off the dime.
Some of you may be aware that OCLC established a WorldCat Local Special Collections Task Force last summer. This happened quickly after Matt Goldner, Executive Director of End-User Services, learned from the special collections community that Local is missing lots of information that we need for both display and indexing. The group of experts that got together for this task worked industriously throughout the fall and submitted a detailed report to OCLC in December. OCLC has now sent its response back to the Task Force. Both reports are linked from the RBMS Bibliographic Standards Committee website.
As with all good things in life, this took a little longer to see the light of day than I had thought it would, which means I am doubly delighted to announce: we have now officially released the suite of tools generated through the Mellon-funded Museum Data Exchange project. You’ll find a lot of informative detail in this announcement. Here’s what it all boils down to: Museums now have access to COBOAT and OAICatMuseum 1.0 software.
COBOAT is a metadata publishing tool developed by Cognitive Applications Inc. (Cogapp) that transfers information between databases (such as collections management systems) and different formats. As configured for this project, COBOAT allows museums to extract standards-based records in the Categories for the Description of Works of Art (CDWA) Lite XML data format out of Gallery Systems TMS, a leading collections management system. Configuration files allow COBOAT to be adjusted for extraction from different vendor-based or homegrown database systems, or locally divergent implementations of the same collections management system. COBOAT software is now available on the OCLC Web site at www.oclc.org/research/software/coboat/default.htm under a fee-free license for the purpose of publishing a CDWA Lite repository of collections information.
OAICatMuseum 1.0 is an Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) data content provider supporting CDWA Lite XML. It allows museums to share the data extracted with COBOAT using OAI-PMH. OAICatMuseum was developed by OCLC Research and is available under an open source license online at www.oclc.org/research/software/oai/oaicatmuseum.htm.
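A harvester talking to an OAICatMuseum instance would issue standard OAI-PMH requests and walk the returned records. The sketch below builds a ListRecords request URL and parses a skeletal response offline; the endpoint hostname and path are hypothetical, and the `cdwalite` metadata prefix is an assumption about how a given repository is configured, though the verb and parameter names are standard OAI-PMH.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Hypothetical endpoint; "verb" and "metadataPrefix" are standard OAI-PMH.
BASE_URL = "http://museum.example.org/oaicatmuseum/OAIHandler"
params = {"verb": "ListRecords", "metadataPrefix": "cdwalite"}
request_url = BASE_URL + "?" + urllib.parse.urlencode(params)

# A skeletal ListRecords response, trimmed to the parts we inspect here.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:museum.example.org:obj-1</identifier></header>
    </record>
  </ListRecords>
</OAI-PMH>
"""

OAI = "{http://www.openarchives.org/OAI/2.0/}"
root = ET.fromstring(SAMPLE_RESPONSE)
identifiers = [h.text for h in root.iter(OAI + "identifier")]
print(request_url)
print(identifiers)
```

In a live harvest the client would fetch `request_url` over HTTP, collect each record's metadata payload, and follow `resumptionToken` elements until the full set is retrieved.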
Try out the enhanced and expanded Virtual International Authority File at viaf.org. It now contains 7.8 million records built from 9.2 million source authority records from the Library of Congress, the Bibliothèque nationale de France, the Deutsche Nationalbibliothek, and the National Library of Sweden. More files will be added. Thom discusses the recent changes to VIAF in his Outgoing blog:
The VIAF site has recently had a major overhaul. What you now search are records created from a merge of matching source authority records. Within this record you can see what source records were used to create it, along with cross references and other information gleaned both from the authority records and from associated bibliographic records.
We all have our favorite “authority control poster children”, as Lorcan calls them. The example he blogged about is Flann O’Brien. One of my favorites is Chiang Kai-shek – that is the preferred form in the LC and National Library of Sweden authority files, but it’s listed on top with Jiang, Jieshi, the preferred form in the Bibliothèque nationale de France and the Deutsche Nationalbibliothek authority files. It illustrates a difference in perspective: Jiang Jieshi is the Pinyin romanization of the Mandarin pronunciation of the characters in Chiang Kai-shek’s name. One of the beauties of VIAF is that it aggregates the preferred forms used in different sources without itself preferring one form over another.
And it lists all the alternate forms each of those sources includes, a very long list that also includes several forms in Chinese characters.
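The aggregation behavior described above can be sketched in miniature: group matched source records by their preferred heading, keeping every form and every contributing source rather than electing a winner. The records below are invented stand-ins, and real VIAF matching draws on associated bibliographic evidence, not simple string grouping.

```python
# Invented miniature authority records for the Chiang Kai-shek example;
# real VIAF matching uses bibliographic evidence, not string grouping.
SOURCE_RECORDS = [
    {"source": "LC",     "preferred": "Chiang, Kai-shek, 1887-1975"},
    {"source": "SELIBR", "preferred": "Chiang, Kai-shek, 1887-1975"},
    {"source": "BNF",    "preferred": "Jiang, Jieshi, 1887-1975"},
    {"source": "DNB",    "preferred": "Jiang, Jieshi, 1887-1975"},
]

def merge_preferred_forms(records):
    """Group contributing sources under each preferred heading, so that
    every form survives the merge and none is privileged."""
    merged = {}
    for rec in records:
        merged.setdefault(rec["preferred"], []).append(rec["source"])
    return merged

cluster = merge_preferred_forms(SOURCE_RECORDS)
for form, sources in cluster.items():
    print(form, "<-", sources)
```

The merged record ends up with two co-equal preferred forms, each annotated with the files that use it, which is exactly the difference-in-perspective display described above.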
JISC and Oxford University Library Services hosted a meeting in Oxford on Thursday to mark the conclusion of JISC’s Libraries of the Future Campaign. The event took place in the afternoon, with a series of challengingly short presentations from a set of well-known librarians and commentators: Sarah Thomas, Bodley’s Librarian at Oxford; Chris Batt, former Chief Executive of the Museums, Libraries & Archives Council; Santiago de la Mora, of Google Books (Europe); Peter Murray-Rust, Reader in Molecular Informatics at Cambridge; and Robert Darnton, Harvard University Librarian.

The event was amplified – in the way JISC is now becoming very good at – with bloggers and microbloggers in place, a Twitter feed beamed onto one of the three projection screens; a Second Life version of the event (featuring the usual Star Wars cast of avatars) displayed on another; and a live video stream of the event available from JISC’s website. In fact, it was possibly overamplified, as I realised at one point when I checked the website to see not only that a particular colleague was obviously present in the audience somewhere behind me, but that his laptop screen was also in view (fortunately displaying only innocent windows, as far as I could see). Next moment, a tweet appeared on the Twitter-feed screen exclaiming about the fact that people’s screens were visible.

While there are some obvious benefits to amplification, the splitting of attention that it engenders can sometimes seem to defeat the point of having a conference, which is an opportunity for concentration. I expect we will move towards a happy medium in due course, and JISC is leading the way in helping us find it.