Yes, for most people the Grail is Google. For many of us in the library world (even for some library users) it’s WorldCat. But there are apparently still numerous quests for one way to search everything that’s available at an institution.
As discovery continues to rise to the network level, there are still some valid reasons to think locally:
• The undergrad who wants to know what she can get her hands on now, but doesn’t want to have to think of all the various places on campus she might look (not to mention the systems she’d have to navigate)
• The museum curator hoping to use other objects or resources in other departments to augment the exhibit he’s planning
• The fundraiser needing to showcase the breadth and depth of the collections, highlighting whatever topic is of interest to the potential donor
• A faculty member wanting to know what resources the university has that she might use for a new course
• The special collections curator who, while courting the donor of a collection, wants to highlight similar materials across the institution and show how easily they are accessed for research
• An institution wishing to showcase and provide access to its collections on the institutional web site.
In eight visits I’ve made to RLG partner institutions in the last six months, six of them said that this quest was a top priority for them (and perhaps when the other two finish their big building projects, the quest will find its way to the top of their priorities, too). Some call it metasearch, others call it federated searching, others plan to use OAI harvesting to create a single index of all their collections, and still others think about putting all the metadata into a single system. Most of them had tried something already and were unhappy with it and were determined to try something else. No matter which approach they had tried, in each case the perceived problem involved mapping the data. It wasn’t done correctly and next time they hoped it would be better.
If only the mapping were better, the functionality would be improved. If only the mapping were better, the results could be presented in a more meaningful manner. If only the mapping were better, disparate data would coexist more easily together. Some thought the poor mapping was a characteristic of their chosen approach and if they went another route (or even the same route with different software) the problem could be addressed. The community is playing musical chairs with federated search software. When the music stops, one will try it again with SingleSearch, one will try it again with MetaLib, one will try it again with MetaFind, one with WebFeat, one with LibraryFind …, and the one left out will try OAI harvesting. All hoping to make the mapping work this time.
On the plane after the last of these visits, it became clear to me that it’s not about the mapping; that’s a red herring. Stop trying to do it better; it can’t be done. Sure, if you’re mapping two collections of books both using MARC and AACR2, you can do a pretty good job (and some of the visited institutions had). But these institutions wanted to allow people to search across books, special collections, archives and museum collections, digital image collections, faculty and departmental collections, and in one case, even course offerings and faculty bios.
The network can’t provide the solution; much of the content is not likely to find its way into WorldCat or the open web, due to rights issues (think slide libraries) or other content ownership issues (think museum images with revenue potential) and some of it is really local (think teaching materials or licensed images).
The simple fact is that, to offer fielded searching of disparate data, the data has to be mapped and the lowest common denominator prevails. As anyone who’s done any mapping knows, not all metadata is created equal. Just coming up with a lowest common denominator is impossible. Images of geological formations may not have a creator, paintings don’t necessarily have a subject, ancient artifacts may not have titles, dates are often unknown… Forcing all records to map to a set of required fields and then offering parametric searching on those fields guarantees that a lot of relevant content will be omitted from the result set.
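To make the point concrete, here is a minimal sketch in Python. The records, the field names, and the set of required fields are all invented for illustration; they are not taken from any real system. It compares a fielded search that only admits records mapped to every required field with a plain keyword search over whatever data each record happens to have.

```python
# Hypothetical records from different kinds of collections.
# Field names and values are illustrative assumptions only.
RECORDS = [
    {"id": "book-1", "creator": "Darwin, Charles",
     "title": "On the Origin of Species", "date": "1859", "subject": "Evolution"},
    {"id": "image-1", "title": "Basalt columns", "subject": "Geology"},  # no creator, no date
    {"id": "painting-1", "creator": "Unknown artist",
     "title": "Untitled landscape", "date": "ca. 1900"},                 # no subject
    {"id": "artifact-1", "subject": "Pottery",
     "description": "Fragment of a basalt grinding stone"},              # no title, no date
]

# An assumed "lowest common denominator" that every record must map to.
REQUIRED_FIELDS = ("creator", "title", "date", "subject")


def mappable(record, required=REQUIRED_FIELDS):
    """True only if the record has a non-empty value for every required field."""
    return all(record.get(field) for field in required)


def fielded_search(records, field, term):
    """Search one field, but only among records that satisfy the mapping."""
    term = term.lower()
    return [r["id"] for r in records
            if mappable(r) and term in r.get(field, "").lower()]


def keyword_search(records, term):
    """Search every value of every record, whatever fields it has."""
    term = term.lower()
    return [r["id"] for r in records
            if any(term in str(value).lower()
                   for key, value in r.items() if key != "id")]


print(fielded_search(RECORDS, "title", "basalt"))  # relevant image is excluded
print(keyword_search(RECORDS, "basalt"))           # image and artifact both surface
```

In this toy data only the book survives the mapping, so a fielded search for "basalt" finds nothing, while the keyword search surfaces both the image and the artifact.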
(At RLG, we found this to be the case even when we had a single standard (EAD) in a single union catalog (ArchiveGrid), because the standard had been applied differently by the various contributors and the collections being described varied in their nature and hence their description. While EAD allows for tagging of personal names, geographic locations, and controlled subject headings, the tags had been used so inconsistently that to offer indexes based on those fields would have resulted in vastly underreporting relevant results.)
So what is the right approach, you might ask. I think we need to shift our focus from mapping and start looking at other ways to approach the problem.
If we offer keyword searching of all the data in the records, we’ll get a big set that will likely include some irrelevant items. Lots of recall, not much precision. What would Google do? Improve the result set.
We can do that in many ways:
• We could tweak the relevance ranking algorithms. While we might not use the fields for searching, we can still use them for display – and for determining relevance. We could decide to rank a result higher if the search term was found in the creator, title, or subject fields.
• We could track previous use and put the most viewed records at the top.
• We could improve the data by using automated processes like Open Calais to identify personal names and geographic locations, to deduce subjects by text analysis, to normalize dates …
o Then we could improve the user experience by using those elements in a meaningful display of the results. Sorting and quantifying the results by various elements would give the user a better sense of the nature of the result set.
o And we could offer the user ways to manipulate the results by those facets, much as WorldCat.org and IndexData’s MasterKey do.
• We could investigate ways to pre-limit by offering ways to search just a slice of the whole (anthropological content, things that have been digitized, non-book materials…).
• We might seek APIs to other services like name authorities or subject thesauri to improve or expand the query.
• We might look for ways to tap into things like LibraryThing, WorldCat, or Flickr to use network effects to enrich our results.
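Two of the ideas above, using fields for ranking rather than for searching and summarizing a result set by facets, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample records, the field names, and the weights are invented, and a real system would more likely rely on its search engine's own boosting and faceting features.

```python
from collections import Counter

# Assumed weights: a hit in title, creator, or subject counts for more
# than a hit anywhere else in the record. The numbers are arbitrary.
FIELD_WEIGHTS = {"title": 3.0, "creator": 2.0, "subject": 2.0}
DEFAULT_WEIGHT = 1.0

# Hypothetical records; note that they do not share a common set of fields.
SAMPLE = [
    {"id": "a", "title": "Letters on geology", "type": "archive"},
    {"id": "b", "description": "Notes on local geology", "type": "archive"},
    {"id": "c", "title": "Geology of the Alps", "subject": "Geology", "type": "book"},
    {"id": "d", "title": "Portrait of a lady", "type": "painting"},
]


def score(record, term):
    """Sum the weights of every field whose value contains the term."""
    term = term.lower()
    total = 0.0
    for field, value in record.items():
        if field == "id":
            continue
        if term in str(value).lower():
            total += FIELD_WEIGHTS.get(field, DEFAULT_WEIGHT)
    return total


def ranked_search(records, term):
    """Keyword search over all fields, best-scoring records first."""
    scored = [(score(r, term), r) for r in records]
    hits = [(s, r) for s, r in scored if s > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [r for _, r in hits]


def facet_counts(results, field):
    """Count how many results carry each value of the given field."""
    return Counter(r[field] for r in results if field in r)


results = ranked_search(SAMPLE, "geology")
print([r["id"] for r in results])
print(dict(facet_counts(results, "type")))
```

No record is excluded for lacking a field; a missing title or subject just means the record earns no boost from it. The facet counts can then drive a display that lets the user narrow a large result set by type, date, or whatever elements the data supports.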
If mapping is the roadblock on the route to Single, Simple, Successful Search, let’s choose a different route and get on with the quest.
And once we’re good at making our wonderful resources accessible within our own institutions, won’t we be in a better position to make them accessible to the world?
Ricky Erway, Senior Program Officer at OCLC Research, worked with staff from the OCLC Research Library Partnership on projects ranging from managing born digital archives to research data curation. Ricky left OCLC in 2015.