Archive for June, 2008

Terminologies, implemented

Monday, June 30th, 2008 by Merrilee

Way back when (in November), I wrote about the report from the terminologies services meeting we held in September. The report gives a ranked list of the highest-priority use cases for a terminologies service. The top vote-getter was “Leveraging terminology for search optimization,” and the elves in Dublin, Ohio (namely, Diane Vizine-Goetz and her team) got right to work. The result is the very nice Terminologies Service.

The service currently supports the following vocabularies: Faceted Application of Subject Terminology (FAST subject headings), Form and genre headings for fiction and drama, Library of Congress AC Subject Headings, Library of Congress Subject Headings, Medical Subject Headings (MeSH®), and the Thesaurus for graphic materials (TGM I & II). Other vocabularies are under consideration.

The service can be queried using SRU, and the retrieved terms/headings are returned in HTML, MARC XML, SKOS, or Zthes. I am really out of my depth here, so I urge you to look at the webpage that Diane and her team have set up, and to read the documentation. Also note that the subtitle of the page is Experimental Services for Controlled Vocabularies.
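For the curious, here is a rough sketch of what an SRU searchRetrieve request to such a service might look like. The base URL, the index name (`oclcts.preferredLabel`), and the record schema value are my own illustrative guesses, not necessarily what Diane’s team implemented, so check their documentation for the real parameters.

```python
from urllib.parse import urlencode

# Hypothetical endpoint for the experimental service -- consult the
# Terminologies Service documentation for the actual base URL.
BASE = "http://tspilot.oclc.org/lcsh/"

def sru_query(term, record_schema="SKOS", maximum=10):
    """Build an SRU searchRetrieve URL for a controlled-vocabulary term."""
    params = {
        # CQL query; the index name here is illustrative only.
        "query": 'oclcts.preferredLabel = "%s"' % term,
        "version": "1.1",
        "operation": "searchRetrieve",
        # The service can return HTML, MARC XML, SKOS, or Zthes;
        # the exact recordSchema identifiers may differ.
        "recordSchema": record_schema,
        "maximumRecords": maximum,
    }
    return BASE + "?" + urlencode(params)

print(sru_query("Grapes"))
```

Fetching that URL would return matched headings in the requested format, which a catalog or search front end could then use to expand or refine a query.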

Something everyone can look at is an example of the experimental services in action. Please keep in mind that what you are looking at is not a production service. It is subject to the “usual” disclaimers (it may go down without notice, it has not been prettied up for end users, etc.), but it will give you some idea of how the service could be used. In this case, it expands a user’s search term, targeting TGM I. I like the terms “grapes” and “beer.” Thanks to our colleagues at Indiana University (particularly Mike Durbin and Jenn Riley) for sharing.

http://fedora-dev.dlib.indiana.edu:8080/search/index.jsp

I am particularly interested in hearing from you. How would you be interested in using web services for terminologies?

RBMS: day three

Saturday, June 28th, 2008 by Merrilee

We had a half day of plenary sessions yesterday, starting with a pairing of a scholar who relies on digital sources to help his work (Matthew Fisher, UCLA) and a digital library professional (Steven Davison, UCLA). Matthew’s talk was a great reminder of all that library cataloging is (and is not) doing for a medieval scholar, and all that digital library projects are (and are not) doing for a medieval scholar. Digital resources have fundamentally changed teaching and learning, he said. However, saying “wow, you can look at this manuscript from Oxford and this one from Paris side by side without going anywhere” is no longer very compelling. Young scholars like himself take this place-shifting for granted. Matthew needs materials to mashup for his work — catalog records, images, transcriptions. Steven’s talk made a useful distinction between digitization (creating piles of content) and digital projects (services for piles of content). Making the shift in digital library projects and programs to a more Web 2.0-like approach is difficult, but at UCLA, they are working on a number of projects that have Web 2.0 characteristics.

Peter Kaufman (Intelligent Television) gave the penultimate talk, reminding us that the truth is out there (online) and that special collections have a responsibility to help feed an inexhaustible appetite for content. This is in keeping with a long tradition of libraries serving as content producers. Special collections content needs to be visible “downtown” (a reference to Karen Calhoun’s reconfigured Tokyo subway map, showing major content providers at the center with libraries nowhere in sight). Peter referenced the importance and emergence of ubiquitous, mobile computing as well as a trend to more and more video consumption. Film editors are your friend. They want your content and your help.

Jackie Dooley gave a very nice wrap-up with her 10 Commandments (delivered in front of the obligatory image of Charlton Heston). I’ll attempt to summarize these:

  1. Embrace the continuum of the book (which includes not only digital, but also other formats). Our missions of access are well served by each step in technology that makes transmission of information easier. Beauty may be sacrificed at each step, but digital has a beauty of its own (okay, I added that last bit myself).
  2. Find yourself. Revisit what you love about your work, and translate this to the digital. We are the permanence people, remember this as you move forward.
  3. Digitize with abandon. Don’t be limited by the edge cases. Find the things that are easy to digitize and go for it.
  4. Educate yourself about the born digital (and know what’s going on in institutional repositories, if you are on a campus).
  5. Make your work economically sustainable.
  6. Follow the archivists’ lead. Greene-Meissner has provided a good path for archivists. Perhaps there is an analog for the rare book crowd?
  7. Be promiscuous (with your metadata and content, that is). Read That Woman’s Report. Broadly expose metadata and collections.
  8. Collaborate. In the digital realm, you cannot go it alone. Work across your institution, with IT people. If you are in a campus environment, be sure the library is involved in any digital humanities center.
  9. Revere the knowledge and opinions of the young. Young people have great ideas and insights into new ways of doing things. Encourage them.
  10. Proactively define our collective future. It’s our time. Let’s make it happen. We’ve started a conversation, let’s keep it going.

The RBMS preconference was terrific this year. It made a lot of people very excited, but also made a number of people deeply uncomfortable. Some of us were talking at dinner last night, and I think phrases like “get on with it” and “get over it” were taken quite personally, and perhaps in some of the wrong contexts. Digitization is not appropriate for all collections. Not everyone has backlogs that they need to worry about. Creating a brief record and making that available in the short term does not mean never adding richer data later. Those of us who are excited about digital futures have an obligation to be more concrete: to provide examples of the very real potential we see in exploiting the power of excellent library data, and to offer methods and models for moving forward that take theory into practice.

RBMS: day two

Friday, June 27th, 2008 by Merrilee

Susan Allen opened the second day by picking up Gary Strong’s (by now) much repeated “it’s our time.” It’s our time, yes, said Susan, but we must be vigilant. Sentiment is not enough for asserting that books and special collections are important. Susan has started a list of reasons and has 17! Send her more (I’ll see if she will share them, and if so, I’ll post them here or point you to them).

Karen Calhoun, my colleague at OCLC, gave the first of two presentations on access. Some takeaways include: library search environments invite defection; our materials need to be highly visible, and available for remixing; discovery happens elsewhere, so we need to make delivery our key area of focus; preserve your right to remix (citing Good Terms again). Karen made numerous references to the Greene-Meissner report, More Product, Less Process, likening it to the Calhoun Report for archives. Karen said, “I was referred to as ‘that woman.’ Do you refer to Mark Greene and Dennis Meissner as ‘those men’?” (For those who are interested in following more of Karen’s ideas, she has started a new blog called Metalogue).

Karen’s presentation was followed by one from Tom Scheinfeldt (Center for History and New Media at George Mason University) on Omeka, open source software for creating exhibits. He described it as a low cost, easy to use system to present and expose data and allow for interaction. Tom provided some interesting ideas: the internet opens us (and our collections) to audiences we don’t know very much about; “users” is a lazy term, because it doesn’t help us think about a variety of uses (he prefers consumers, which I have to admit I don’t like much).

I was unable to attend the seminar on blogging that I was interested in (the room was quickly filled with 100 people who also were interested), so I sat out on the Getty terraces with other blogging refugees and compared notes (gossiped). I later attended a discussion group sponsored by the Digitization of Special Collections Task Force (I am a member) which focused on how “mass digitization” efforts are impacting special collections.

The final plenary for the day was on selection. Rich Szary (University of North Carolina, Chapel Hill) spoke quite eloquently about needing to diversify: building highly curated, selection intensive resources is one model — he’d like to find ways to build much larger pools of content and outlined some of the ways that both UNC and the Archives of American Art at the Smithsonian are doing just that. Rich cited many of the recommendations from Shifting Gears. Rich also cautioned that digitization is but one of many priority functions we need to pursue: acquisition, description, preservation. In thinking digital, we are largely forgetting about media materials and born digital.

I gave the second presentation in this session, and have no perspective on it right now, so I hope someone else shares notes, or I’ll type it up another time (my presentation was based on PowerPoint slides from Barbara Taranto, NYPL Labs).

Today we’re back at the Luxe hotel for two more plenary sessions and the wrap up. Then on to Anaheim.

RBMS: day one

Thursday, June 26th, 2008 by Merrilee

The first day of the Rare Books and Manuscripts Section preconference was actually Tuesday, but since most of that day involved getting myself to the conference site (including sitting in stopped-dead LA traffic for about 20 minutes while an accident was cleared) and attending a reception, there was not much to report. Aside from the fact that RBMS is, as always, a complete love fest.

The conference opened yesterday with a welcome by Gary Strong (UCLA) and a keynote by Alice Prochaska (Yale University). Gary was quite eloquent in urging special collections librarians and archivists — “this is our time.” This phrase has been used quite a bit since then, and there’s a real feeling that we (special collections) are at a turning point in terms of gaining recognition and centrality.

The opening session was followed by a plenary on copyright. Maureen Whalen from the Getty spoke about issues in permissions and licensing. Although there was nothing new for me, it was a great talk, and rather more encouraging in tone than some of the talks given on these subjects. Peter Hirtle followed and gave not the usual “Peter Hirtle talk” — instead, he talked about how Google Books has sharpened our thinking on copyright and licensing. One of his points was, if you look closely, library practices around making materials freely available are just as open to criticism as Google’s practices. Encouraging the creation of proprietary collections is expensive. We are frequently paying for services we don’t want or need (elaborate indexing, when free text searching is what’s most used and useful). Think instead about making materials available in a way that they are open to the most downstream use. This last bit is something we talk about in Programs quite a bit — it’s not just about discovery, it’s also about enabling use. I was pleased that both Peter and Maureen referenced Good Terms. It’s great that our work is getting out there and getting noticed.

The afternoon was given over to seminars. Just quickly, I attended a fabulous presentation by Lisa Berglund (professor of English at Buffalo State College) on teaching rare books to undergraduates. Her students are frequently working, not particularly ambitious (“I’m not interested in learning, I’m interested in getting an education”), and the collections at Buffalo State are not exactly a treasure trove of resources. She’s making it happen, nonetheless. The other seminar presentation of note was from Mattie Taormina from Stanford Special Collections. Mattie has implemented a “beta” policy allowing digital cameras in the reading room. I was particularly gratified to see this presentation because there has been a huge amount of talk about whether or not to allow digital cameras in the reading room, but little action. Mattie confirmed my suspicions: her study found that discussions about digital camera use by patrons have been going on for 8 years. Enough! Erika Dowell (Indiana University) introduced this session by saying she’d been inspired by the Digitization Matters symposium we held last year. Again, we’re making a difference and that’s great.

This is in haste, I’m off to the Getty Center for day two.

Annual meeting materials now online

Tuesday, June 24th, 2008 by Merrilee

All the PowerPoint slides, summary notes, and MP3 files from our annual meeting are available now.

This includes both the general meeting (Day One and Two), as well as the Digitization and the Humanities Symposium.

Jennifer Schaffner and I have written a report about the symposium, which includes our general observations about where we think the community could profitably invest further effort. That report will be available shortly.

Google Books — what’s the likely impact?

Monday, June 23rd, 2008 by Merrilee

A recent article in Library Journal (Google Books vs. BISON) has had a number of us talking, and has reminded us of a presentation that Constance and I gave at CNI last year. I’ve had intentions of turning the presentation into a proper paper, because the information and our reasoning is, I think, quite pertinent to the question: what’s the likely impact of mass digitization on library collections? The answer is, not what you might expect (and not what’s outlined in the LJ BISON article).

One of the assumptions put forth in the BISON article is that increased discoverability (via Google books) will result in increased accessibility. Searchers will find texts via Google Books, and Google Books will likewise serve up the books in digital form. Right? Wrong.

In our CNI presentation, we looked at WorldCat for some measures of just how much material might be classed as out of copyright and hence available for full-text presentation in Google Books (or in any other system). This represents a very small fraction of library-owned content. We also considered how scholars are actually interacting or are potentially interacting with those older texts (for the most part, materials published before 1923, and some portion of materials published to 1962 in the United States). Without delving into the specifics of the presentation (and what we hope to cover in our article), the short version is that public domain full text is inadequate to support current scholarly practice. The inadequacy of Google Books for supply of texts is further compromised by Google’s well-known conservative stance on what qualifies as public domain. So while these books (and articles, if you expand the view to include Google Scholar) are much more highly discoverable, the content is not available online without authentication (in the case of journals) which would be provided by the library, or without purchase.

We find, in many of our discussions with Programs Partners, that there is a real yearning for a repeat of the “Anatomy of Aggregate Collections” study (otherwise known as the Google 5 study). It’s thought that “Google 5 plus me” or “Google 5 plus me and my friends” will allow institutions to get a better handle on the total volume of material scanned, which then will enable institutions to manage their print holdings differently. We think this is not where the answer lies. We will only be able to make use of this information when we can disclose something about the availability and preservation status of those digitized texts.

The BISON article says,

If Google Books is scanning old materials and also getting new content from publishers, this leaves relatively little for small to medium-sized academic libraries to contribute…. [W]hat will happen to the library’s role in preservation, cataloging, and circulation? Will Google and Google Books lead to the extinction of academic research collections as we know them?

What this misses, I think, is the main point. Without the library, without “the stuff,” there is no delivery chain. Books and e-resources, while indexed by Google Scholar and Google Books, are held by libraries. Because of copyright and licensing agreements, Google cannot deliver this material. The fact is, these monographs are discoverable, but not available, online. As long as this continues to be the case, this much-increased discoverability without equal accessibility will put greater pressure on delivery of print holdings for some time to come.

If you are dying to look at our presentation, I’ve loaded it into SlideShare.

Oh, and PS. If any of you are still expecting users to use Boolean operators, take a page out of the BISON study and cut it out right now. I say this with all the love in the world, knowing that you actually want your users to be successful when searching your catalog.

A new word cloud

Friday, June 20th, 2008 by Merrilee

Our Work Agenda

It’s pretty — go ahead and click on the image to see it bigger. From Wordle

June: a busy month

Tuesday, June 17th, 2008 by Merrilee

Some HangingTogether readers may wonder if we are all off on summer vacation. Hardly!

June has been a busy month, starting with our Annual Meeting on June 2 and 3. On June 4th, we held a symposium on the impact of digitization on humanities scholarship. We’re still dotting our i’s and crossing our t’s, and the session recordings and other materials should be available on our website very soon. If you attended the Annual Meeting or the symposium, you should have received a link to a survey. The survey will close on Friday, so please share your thoughts with us so that we can improve future events.

Annual Meeting wrap up and followup has segued right into planning for the American Library Association annual meeting, which will be held in southern California. Jennifer Schaffner and I will both be attending the Rare Books and Manuscripts Preconference (Rare and Special Bytes: Special Collections in the Digital Era) in Los Angeles from the 24th to the 27th, and then we’ll head right to Anaheim for the main event. If you will be at ALA, look for RLG Programs staff (Constance, Bruce, Dennis, Jennifer, Karen, Roy, me) at any number of events.

I hope to see some of you in Los Angeles or Anaheim. And I look forward to more regular blogging soon.

Output from the RLG Metadata Tools Forum

Tuesday, June 17th, 2008 by Karen

I previously blogged about how the May 8, 2008 RLG Programs Metadata Tools Forum all came together, with some photos. We had already added the “summary sheets” created by each of the nine tool developers who showcased their work to the RLG Programs Metadata Tools Forum Web page. Now if you go to that page you’ll also see:

  • Short videos created by David Williamson (LC) demonstrating the WebCat Assistant

To complete the feedback loop, we also conducted a survey of all who attended the forum, and about half responded. The feedback was indeed positive: 97% thought the forum was worth or well worth their time. Most of what people liked best fell into these two categories:

  • The opportunity to talk directly with the tool developers and ask questions of practitioners
  • Exposure to such a variety of tools and to see the tools in action

Attendees were also asked to rank each of the tools they saw in order of “potential utility in your own shop”: “Not useful to me”, “Potentially useful to me but I’m unable to use it or adapt it for my environment”, and “I’m interested in using it or adapting it to our environment”. It was indeed gratifying to us as forum organizers to see that every tool garnered interest in using or adapting it in the attendees’ own environments. Terry Reese’s MARCEdit garnered the highest use/adapt interest. Roughly half of the respondents were also interested in using/adapting Brad Westbrook’s Archivists’ Toolkit, Michael Park’s MODS Editor, Wan Wong’s Subject Selector, and David Williamson’s WebCat Assistant or Raphael Villena’s UCLA adaptation of the WebCat Assistant. A third or so expressed interest in using/adapting Scott Schwartz’s Archon and Mark Phillips’ Metadata Analysis Tool. Several noted interest in Jim LeBlanc’s LS Tools.

During the recent RLG Partners Annual Meeting, staff who attended the breakout session on Renovating Description Practices that I facilitated identified “sharing best practices and tools for streamlining metadata creation workflows in and from diverse environments” as one of the areas for collective action. The tools forum provided an example of how we can do that.

Acronyms – fragile and high

Saturday, June 14th, 2008 by Jim

For quite a while Lorcan has commented on the relatively heavy weight of library data and exchange standards. In general our community has opted for high-value, low-participation choices in this arena. This has diminished the library’s impact in the web world. The barrier of our own high acronymic density has made it more difficult for others (who are operating the platforms where people actually work) to incorporate useful library services and information.

What’s more the infrastructure built on these standards has a very fragile foundation. They are understood fully by few, supported and enhanced by an even smaller group of enthusiasts, and remain invisible to the wider community of web practitioners. Some years ago, I joined with Lorcan to highlight this issue of support, fragility, take-up and sustainability.

In the last month or two there were a few events that I thought might cause these circumstances once again to be debated in a useful way. Google’s cessation of support for OAI-PMH in sitemaps was one. The second was the launch of the OpenLibrary API, which eschewed SRU and OAI-PMH. In a Code4Lib posting Eric Lease Morgan characterized the latter decision saying “In reality I think two things worked against the adoption of SRU and OAI. First my description of their functionality was not as eloquent as it could have been, and second, the Open Library personnel had never heard of nor knew anything about either protocol. This is another example of library standards being too library-centric. Think Z39.50.”

These two events stirred up discussion that disappeared surprisingly quickly. There seemed to be a kind of library-centric satisfaction that allowed people to dismiss the Google decision along the lines of “This wasn’t really for them – it’s for us,” while the OpenLibrary decision got covered over in rhetoric about OpenLibrary re-inventing the idea of a catalog and needing to do it without regard to prior work.
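For readers who have never touched one of these protocols, a minimal sketch may make the discussion more concrete. OAI-PMH is, at bottom, a handful of URL query conventions for harvesting metadata records. The endpoint below is a placeholder (example.org), not a real repository; the verb and argument names come from the OAI-PMH specification.

```python
from urllib.parse import urlencode

def oai_list_records(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build an OAI-PMH ListRecords harvesting request URL."""
    params = {"verb": "ListRecords"}
    if resumption_token:
        # A resumption token pages through large result sets and is
        # sent *instead of* the other request arguments.
        params["resumptionToken"] = resumption_token
    else:
        params["metadataPrefix"] = metadata_prefix
    return base_url + "?" + urlencode(params)

print(oai_list_records("http://example.org/oai"))
```

The protocol itself is not baroque; the fragility Jim describes lies less in any single request than in how few people outside libraries know these conventions exist, which is exactly why Open Library reached for a plain web API instead.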

This arena continues to worry me. We have such a small cadre of skilled systems engineers and designers in the library world. Dispersing their energies in multiple domain-focused directions lessens not only their impact but also that of the library community.

I am not a systems engineer and these are not matters of technical theology for me. They should, however, be matters of genuine management concern. These choices and investments represent crucial scaffolding and infrastructure decisions that will decide the ultimate shape of the future library. Libraries are challenged to recreate their value in a web world by delivering new services around the workflow of their constituents. Management needs to understand how their own system development investments are constraining those ambitions and working against their realization.

Of course, most library managers don’t have the technical understanding to interrogate this level of system design. But they do have a responsibility to insert themselves into the decision process. They shouldn’t be intimidated by prospective embarrassment. They should be asking the same kinds of questions they ask about other service investments. A white board and three ‘Why?’ questions in a row would be a good starting place for a motivated manager.