Archive for the 'Measurement and Behaviors' Category

Discovery AND Selection = Elsewhere

Monday, July 21st, 2008 by Jim

This slide caused the most discussion and comment during my presentation at the AALL workshop about which I posted previously. I return to it here for a few reasons.

Some of these assertions have attained meme status. In particular I’ve noticed that Roy’s characterization of searching and finding (which he’s been saying since at least 2005 – I’m sure he can tell us the exact date of the coinage) and Lorcan’s dictum about discovery were listened to with some skepticism and resistance only 12 months ago. They are now treated as common knowledge and an accepted starting point for discussions of our issues. This is good for us. It focuses us on change.

The next two about getting our services and assets into the work flow of the user on the network and about needing to present users with all of our system-wide assets aren’t yet memes but they have entered the vocabulary. Lorcan’s ‘networkflow‘ coinage I find helpful and apt in getting at the essence of the way we work and the collective collection phrase (about which Constance has blogged and spoken continuously) neatly and alliteratively captures what people really want to access. I hear other people use these phrases without expecting that they need to be explained. This is progress. These two observations are really about how we should change our services and invest our energy.

The last assertion about selection is far from a meme. In fact, it may not be true. But it could tell us more than any of the others about where we can choose to disinvest and redirect resources and effort.

The formulation arose in a group discussion led by my colleague, Arnold Arcolio, while he was reporting on user testing and interviewing that he was leading in connection with WorldCat Local at the University of California. While he has much analysis to do and considerable discussion yet to come with UC colleagues, one of the preliminary observations emerging is that the test participants overwhelmingly approach the local (or group) catalog with an item already chosen. Using the catalog as a research tool – a place to refine a general interest into a small number of selected ‘best’ items that answer an immediate need – seems to happen very infrequently. In these early interviews the idea seemed quite unusual to the faculty and graduate student users of the catalog.

During our group discussion this user behavior was encapsulated as “Selection takes place without us.” We were intrigued by the potential import for our processes and practices should evidence emerge showing this to be generally true. Our investments in description and classification, in the functionality of the local/group catalog and many other areas could be re-examined and recast. If selection takes place without us then our efforts could be redirected to activities valued by our users and our institutions. I’m interested in spinning out the range of impact but, of course, we need evidence to take this beyond a thought-experiment.

LC and Flickr – 3 months later

Thursday, March 27th, 2008 by Günter

We had the good fortune today to talk to Helena Zinkham, Michelle Springer and some additional staff members from the 12-person team at LC that worked on the LC-Flickr project. We were also joined by George Oates, who shepherded the collaboration from the Flickr side. The conversation highlighted a number of interesting facets of the collaboration which I hadn’t fully appreciated yet, and I thought they’d be worth sharing:

  • In a very elegant way, Flickr solves the authority conundrum of exposing collections content to social process. No need to worry if some comments or tags are misleading, arbitrary or incorrect – it’s not happening on your site, but in a space where people know and expect a wide variety of contributions. On the other hand, LC selectively reaps the benefit of these contributions. Over 100 cataloging records have been changed through input from the Flickr community.
  • Identifying and siphoning off the information of use to LC is a time-consuming and laborious process. While Flickr offers a number of ways to look at user interactions with the content, LC has started building its own database, which pulls in information through the Flickr API for more convenient evaluation. Social tagging in this framework doesn’t mean letting others catalog your collections for you – it really means offering up materials for a conversation which you have to follow closely to extract the bits worth bringing back.
  • We had an interesting discussion about what I’m tempted to call the “absorbency” of Flickr. The 3k+ images LC posted in the prototype seemed a reasonably easy chunk of material for the Flickr community to process, meaning tag and discuss. (In some instances, images actually have reached their Flickr-imposed limit of 75 tags.) The group speculated that a larger upload of images would perhaps have caused a less thorough review of the photographs, and this thinking also seems to have influenced LC’s decision to keep updating their Flickr stream 50 images at a time. George commented that Flickr has made 1000 Flickr friends through the project so far, and 50 images at a time probably seem delightful to them, while tens of thousands at a time might be overwhelming.
  • At a pace of 50 images per week, all of the photographs in the Bain collection (50k) will take about 20 years to expose on Flickr, but I think that piece of math may miss the point: from the conversations I noted a much greater interest in deep engagement with the presented material than in comprehensiveness. The evidence suggests that this deep engagement has been achieved – see, for example, the discussion surrounding these two photographs. Those with the desire and need to see all of Bain can always do that on the LC website – Flickr complements this offering by turning parts of the collection into conversation-starters. LC staff seemed so impressed with the value of the interactions on Flickr that they felt linking back out to the Flickr pages from the catalog was as important as bringing back salient corrections and updates into the catalog.
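For anyone who wants to check the "about 20 years" figure, here is a minimal sketch of the arithmetic. The two input numbers are the ones quoted above; the 52-week year is a simplifying assumption.

```python
# Figures quoted in the post; treating a year as 52 weeks is an assumption.
collection_size = 50_000   # approximate number of Bain collection photographs
images_per_week = 50       # upload pace mentioned in the conversation

weeks_needed = collection_size / images_per_week
years_needed = weeks_needed / 52

print(f"{weeks_needed:.0f} weeks, or roughly {years_needed:.1f} years")
```

The result lands a little under 20 years, consistent with the rounded figure.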

    For LC, Flickr is still a prototype – commitments on a policy level will be discussed after the prototype has been thoroughly evaluated. For Flickr, working with cultural institutions seems to be becoming a way of life. George commented that she has about eight more cultural institutions ready to be launched over the next 8 months, ranging from very large to very small. There will be new and different things to be learned from the next launches – how will the material fare without the boost the LC-Flickr project enjoyed as the groundbreaking initiative? I’m looking forward to continuing the conversation with our LC colleagues, and I’ll be watching out for those next cultural heritage collections on Flickr…

    More Consensus; Less Controversy

    Tuesday, September 4th, 2007 by Jennifer

    A graduate student in history from UIUC told an audience of archivists at SAA that, because she begins her research on the web, she wants to know all the collections we have, and what we archivists did to them. Historians have not abandoned material archives; rather, they are instilling a critical analysis of primary sources. Attention in the archival profession seems to me to be shifting toward a focus on use and research. “Let’s be honest about what people want.” A well-known author and champion of archival description remarked that “our users don’t care about our precious standards.”

    The annual gathering of archivists surprised me with more consensus and less controversy. EAD finding aids are traditional; many archivists experiment with much-admired minimal processing; let’s digitize as much as possible and never mind item-level description; we want EAC to share expertise; the public and scholars may be nearly one and the same.

    Both archivists and researchers value the “stuff,” and want to find out about it on the web. We and they are calling for us to provide both context and content.

    Millennials – again and again

    Saturday, July 21st, 2007 by Jim

    I have one trailing anecdote and observation from my attendance at the recent American Association of Law Libraries annual meeting, based on the session Multitasking Millennials: Blessing or Curse? The session was intended to analyze the types of conflicts that can arise with multitasking law students and new associates and to create specific strategies to minimize them. The first part of the session featured a professional trainer discussing the use of games and a variety of personalization in introductory legal research classes. The second part had a psychologist explaining the characteristics of multi-tasking and the generally accepted psychological and personality features that distinguish this generation.

    Towards the end of the Q&A a young woman took the microphone and explained that she was, in fact, of the millennial generation and now a practicing lawyer. She had been one of the individuals consulted by the trainer about preferred tactics to enhance the training sessions. She stood to announce that as noted in the psychologist’s presentation she exhibited respect for experience and authority figures. In the case of the training consultation that had trumped honesty.

    “What we won’t tell you when you suggest these kinds of tactics is that they’re lame. We think they’re stupid and demeaning. Just treat us like professionals and we’ll learn in our own way.”

    Someone else stood to decry the rudeness of multitasking in the middle of the legal research classes that she led. “These students can sit through 26 viewings of Star Wars. Why can’t they muster the attention to sit through 30 minutes of this class?”

    This was greeted with lots of sympathy and nods of recognition but not much discussion. It occurred to me that the reason this generation feels free to direct their attention in ways that are independent of what is happening before them has to do with their expectation that all experience is replicable and can be repeated at the time of their choosing. They are the first generation for whom that expectation is reasonably and universally true. They’ll watch your class later if they’d rather invest their current attention in an email or video or the 26th viewing of Star Wars.

    You will find much more informed and targeted insights on the information-seeking behaviors of this Millennial generation in the work of my Research colleague, Lynne Silipigni Connaway. Or see this post by Lorcan on student use of the network.

    A wonderful specimen of a user!

    Friday, March 9th, 2007 by Günter

    We’re always on the look-out for this mystical creature called “the user,” and I am excited to report a public sighting: for a good hour during the Bibliographic Control Working Group public meeting, we had a superb specimen right in front of us. Dr. Timothy Burke, an Associate Professor in the Department of History at Swarthmore College, spoke with detail and nuance about his information retrieval (read: search) behavior as a researcher, and provided a plethora of specific scenarios for different searching strategies. My only regret is that I didn’t walk up to him afterwards and thank him in person for his candor, wit and eloquence. I would have written this up in detail, but found that Karen Coyle already beat me to the punch. Thanks, Timothy (and Karen)!

    Let’s do the numbers…

    Thursday, November 30th, 2006 by Günter


    DISCLAIMER: Only trust a statistic you’ve faked yourself! Well, I didn’t fake it, but I did make a mistake in the list of domains – the data I had originally pulled and posted on 11-28 did not reflect a 3-month period, as I had claimed, but a 15-month period. Turns out that the stats for longer periods do look more meaningful for tracing trends, so I’ve changed the language below to reflect the correct time period covered (from hangingtogether’s inception August 1 2005 through November 15 2006). I also took the opportunity to cast a broader net and look at our top 5 visiting countries in addition to .org, .edu and .gov. That, in turn, prompted me to re-write some of the commentary on the domain session stats. All other data does reflect a 3-month time period as stated, and is (and has been) correct. All due apologies!

    What looks like ants crawling up a smear of blue jelly are actually session stats for hangingtogether from its inception on August 1st 2005 through November 15th 2006, and represent one way of trying to evaluate the success of our blog over time. The recent blogging panel at MCN made me curious about what audience we’re actually reaching. Poring over our webstats using the ever-helpful Urchin, I realized that the numbers wouldn’t really mean much to me unless I had stats from other blogs to compare them to. Well, actually they meant even less until Walt explained to me the basics of how to read some of these stats – thanks, Walt!

    I thought this would be a fun thing to do via a meme, and I hope I’m not the only one. I am tagging Walt (no good deed shall go unpunished!), Lorcan and Richard Urban (on behalf of musematic.net) to disclose some of their stats for the 3 months from August 15th 2006 through November 15th 2006, and I hope they’ll make more bloggers tip their hands by tagging others. Let’s start with sessions, IP addresses, number of countries. As a bonus, feel free to throw in a list of the top 30 (or so) domains hitting your blog for a time-period of your choice, of course all the while being fully conscious of the fact that those who digest us through RSS readers may go undetected by this method. (For whatever it’s worth, hangingtogether currently enjoys 248 Bloglines subscriptions (counting both feeds).)

    Hangingtogether Statistics, August 15th 2006 through November 15th 2006
    Sessions: total of 119,469, daily average of 1,285 (an approximate measure of how often somebody spent a chunk of time on the blog)
    IP Addresses: total 9,413, daily about 105 (an approximate measure of how many individuals visited the blog)
    Number of Countries: total 91 (an approximate measure of how international the blog is)
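As a rough check on the daily averages above, here is a small sketch that divides the totals by the number of days in the reporting window. Counting both endpoint dates (93 days) is my assumption; Urchin may well count the window differently, which could explain why the IP address figure comes out a little below the "about 105" quoted.

```python
from datetime import date

# Reporting window stated in the post; including both endpoints is an assumption.
start = date(2006, 8, 15)
end = date(2006, 11, 15)
days = (end - start).days + 1

total_sessions = 119_469
total_ips = 9_413

sessions_per_day = total_sessions / days
ips_per_day = total_ips / days

print(f"{days} days in the window")
print(f"sessions per day: {sessions_per_day:.0f}")
print(f"IP addresses per day: {ips_per_day:.0f}")
```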

    Bonus: I’ve saved the best for last. Here’s a list of the top 35 domains from .edu, .org, .gov and our 5 most visiting countries (.ca, .uk, .jp, .de, .nl) in order of sessions. The time period covered by this statistic is the inception of hangingtogether (08-01-2005) through 11-15-2006 (a total of 15 months, 15 days). Bold means RLG Programs Partner, cursive means we’re having a post-modern moment and reading ourselves. (I claimed the UC Office of the President as a Partner, since I suspect some of the California Digital Library traffic (also under cdlib.org) comes through that domain, but I may be wrong.) I also deleted about half a dozen network providers from the different country domains, since they don’t provide much of a clue in terms of finding out who reads us – it’s just like saying AOL read your blog! Here’s the tally:

    oclc.org 13871
    uchicago.edu 5341
    ucop.edu 4259
    ucsd.edu 3041
    virginia.edu 3014
    dmi.org 2962
    columbia.edu 2824
    yale.edu 2525
    vcu.edu 2261
    umich.edu 2186
    ucalgary.ca 2111
    keio.ac.jp 2079
    si.edu 2065
    loc.gov 1908
    unr.edu 1700
    nga.gov 1688
    pica.nl 1647
    rlg.org 1529
    ufl.edu 1392
    wheatonma.edu 1234
    yorku.ca 901
    leeds.ac.uk 898
    lsuc.on.ca 868
    harvard.edu 866
    brynmawr.edu 862
    shsu.edu 843
    uni-goettingen.de 731
    umn.edu 629
    upenn.edu 605
    swarthmore.edu 563
    mobot.org 557
    cdlib.org 534
    utoronto.ca 513
    coloradocollege.edu 493
    kb.nl 368

    Some random observations:

  • Of the 35 organizations who made the list, 17 are RLG Program Partners, and most others are good friends.
  • The most avid readers of hangingtogether outside of OCLC are at the University of Chicago! Congratulations! Keep coming back!
  • Our most avid readers outside of the US are at the University of Calgary (Canada) with 2,111 sessions, closely followed by Keio University (Japan) with 2,079 sessions. In the UK, the University of Leeds seems to enjoy our offerings (898 sessions), and in Germany, we find the University of Göttingen among the top 35 (731 sessions).
  • Who would have thought the Design Management Institute (dmi.org) accounts for 4% of our sessions?
  • We’ve always thought that UCSD and University of Virginia should be program partners, and here’s the proof – they’re both among our top 5 readers!
  • We’ve had more visits from the Smithsonian (2,065) than from the Library of Congress (1,908), and are generally well-read in the nation’s capital – the National Gallery of Art clocked 1,688 sessions.
  • I wonder what the Law Society of Upper Canada (lsuc.on.ca) is getting out of our blog?
  • Over to you, Walt, Lorcan and Richard!
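A few of the observations above can be recomputed straight from the tally. The sketch below is only an illustration: the session counts are copied from the list, while the parsing helper and the rule that treats .edu, .org and .gov as US domains are assumptions that simply follow the grouping used in this post.

```python
# Session counts copied from the tally above.
TALLY = """\
oclc.org 13871
uchicago.edu 5341
ucop.edu 4259
ucsd.edu 3041
virginia.edu 3014
dmi.org 2962
columbia.edu 2824
yale.edu 2525
vcu.edu 2261
umich.edu 2186
ucalgary.ca 2111
keio.ac.jp 2079
si.edu 2065
loc.gov 1908
unr.edu 1700
nga.gov 1688
pica.nl 1647
rlg.org 1529
ufl.edu 1392
wheatonma.edu 1234
yorku.ca 901
leeds.ac.uk 898
lsuc.on.ca 868
harvard.edu 866
brynmawr.edu 862
shsu.edu 843
uni-goettingen.de 731
umn.edu 629
upenn.edu 605
swarthmore.edu 563
mobot.org 557
cdlib.org 534
utoronto.ca 513
coloradocollege.edu 493
kb.nl 368
"""

def parse_tally(text):
    """Turn 'domain sessions' lines into (domain, sessions) tuples."""
    rows = []
    for line in text.splitlines():
        domain, sessions = line.split()
        rows.append((domain, int(sessions)))
    return rows

rows = parse_tally(TALLY)
ranked = sorted(rows, key=lambda row: row[1], reverse=True)
top5 = [domain for domain, _ in ranked[:5]]

# Assumption: .edu/.org/.gov stand in for "US", as in the post's grouping.
US_SUFFIXES = (".edu", ".org", ".gov")
abroad = [row for row in ranked if not row[0].endswith(US_SUFFIXES)]
top_abroad = abroad[0]

print("top 5:", top5)
print("most avid readers outside the US:", top_abroad)
```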

    Not yet laid to rest – Digital Images in the Classroom

    Friday, November 3rd, 2006 by Günter


    I live in San Francisco’s Mission district, a neighborhood teeming with Mexican and Latin American immigrants, where Dia de Los Muertos gets honored with a fantastic parade and exhibit of altars in Garfield Park. During a pre-parade party at a friend’s house last night, I met a woman who teaches studio art as an adjunct at Stanford and UC Santa Cruz. We had a lively conversation, which quickly turned to the use of digital images in the classroom (don’t all cocktail conversations?)… and her frustration with said topic. Since she mainly teaches contemporary sculpture, she finds it extremely difficult to get her hands on anything worth projecting in digital form. She tried ARTstor at Stanford, but claims that the interface confused her to a degree that she just gave up. I oracled that I was certain her local art librarian would be delighted to show her the ropes, and she acknowledged how wonderful librarians are once you take the time to talk to them. More frustration: even if she can find an image of a sculpture, it usually doesn’t quite show the angle of the piece bringing out the particular feature she’d like to discuss. She also mentioned that a slide projector on eBay was about $40, and she just bought one. I’d claim this is a user we should strive to serve better!

    All of this caused flashbacks of the conversations some of my program colleagues and I had with faculty at Stanford, UC Berkeley and University of Southern California a couple of years ago, and it also reminded me that there are two brand-new studies in this area which I still haven’t gotten around to digesting. A CLIR/Rice University report on Art History and Its Publications in the Electronic Age states as its number 1 recommendation:

    Organize a campaign to break down barriers to access and distribution of images, in all media and at affordable prices, for scholarly research and publication.

    While this recommendation speaks to the availability of digital images, a report commissioned by NITLE and Wesleyan University (based on four hundred survey responses plus three hundred individual interviews with faculty / staff at 33 colleges and universities), authored by David Green, makes its number 1 recommendation faculty tools for enhanced management and sharing of the images:

    Develop and share tools and services to assist faculty in organizing, cataloging and managing their personal digital collections, in a user-centered content model.

    The little I have read of both reports makes me want to read more (and I hope I managed to whet your appetite as well), and it gives me hope that at another cocktail party in the not-too-distant future, I’ll find the faculty members present more impressed with the image resources available to them.

    Visual navigation of textual information

    Sunday, September 17th, 2006 by Jim

    Anne’s recent post about the nice Programs and Research mugs featuring our own decorative tag cloud made me think about the overall lack of success at leveraging visual displays to aid in the navigation of textual information. Tag clouds are one of the few that seem to have traction and add value beyond the decorative. I’m aware of Grokker and a variety of other attempts at visual displays (cf Visual Net) – the only one that I find useful is the hyperbolic tree in the Visual Thesaurus. It’s not that I haven’t been enamored of the notion. Our very early planning for what became the RedLightGreen service (soon to rest in peace within the more expansive potential of WorldCat.org) included my own take at a tree display of results. It’s from around April 2001 and looked like this:
    [image: voltaire hyper]

    We were hurried away from this approach by the design firm that we retained to help us on RedLightGreen – ironically enough the folks who were responsible for ThinkMap, the software that underpins the Visual Thesaurus, and who now concentrate solely on that product. If you have any interest in words, reading and writing, their web site is worth a look.

    In any event I’d be glad to hear about innovative and useful visual displays that add to the utility of bibliographic searches. I expect that there will come a time when such displays will be very helpful and expected but the prerequisite, at least on the discovery dimension, might be the long-anticipated convergence of libraries, museums and archives.

    User studies and ArchiveGrid

    Thursday, April 27th, 2006 by Merrilee

    Two of my favorite things, tied together: access to archival information, and user studies. All documented here in this RLG Focus article by Arnold Arcolio on our user testing for ArchiveGrid.

    After you’ve read the article, take ArchiveGrid for a spin. Fun and easy to use, and now you’ll know why.

    Digital Library Federation meeting, some highlights

    Thursday, April 27th, 2006 by Merrilee

    [warning! long!]

    Before I got distracted, I was going to update you on some of my travels and activities. I’ll start with the Digital Library Federation Forum in Austin, Texas (I’ve already told you about the panel on the Open Content Alliance).

    From my perspective, some highlights from that meeting — keep in mind that I was not able to go to all of the various sessions, and I missed several that I would have loved to attend. Links in presentation titles will take you to PowerPoint presentations.

    Evolution of a Digitization Program from Project to Large Scale [no PowerPoint available] (Aaron Choate, UT Austin)
    Transition from unique and rare to high volume and how to do both. Outsourcing for less fragile items, focus on efficiency and workflow. University of Texas is a member of the Open Content Alliance, so they will be digitizing books to contribute to the overall effort. They are using Stokes Imaging workstations for high volume. Using CCS/DocWorks for automated OCR and structural metadata. High volume scanning leads to increased workflow for preservation and cataloging, as well as collection managers and programming/web development. There is a library team dedicated to working this stuff out, testing processes and communication. Changing image or object IDs from “metadata encumbered” (my term) to arbitrary, streamlining workflows from the more handcrafted earlier stages of digitization. Acknowledged compromise on quality. Using Sharepoint (MS Windows) to manage project data. I liked this presentation because of its practical nature, and because it ties in well with our upcoming Member Forum.

    Contextualizing the Institutional Repository within Faculty Research (Deb Holmes-Wong, USC)
    Anne Van Camp and I heard about this project when we visited USC in January. Before building their institutional repository, USC conducted an assessment. The group looked in the literature and couldn’t find that anyone else had done this type of assessment before launching a repository. Interviewed USC faculty, found that they are uninterested in depositing published works, more interested in supporting materials (which can’t be published due to space reasons) and PhD research. They want a permanent URL, want to be able to strictly control who accesses. Faculty also want high quality scanning services. They want their materials to persist over time and be migrated forward in terms of file formats. I liked this presentation because it ties in with RLG’s interest in working with users before developing/deploying.

    Repurposing Digital Collections at the University of Michigan via Print on Demand
    Interesting presentation on how U Mich is turning MOA and other projects into print on demand books, offered via Amazon with fulfillment done via Lightning Source. Growing business, working towards cost recovery for tracking, etc. I liked this presentation because it explores new economic models for libraries, and also addresses issues of availability — many of these books are out of print and unattainable at a reasonable cost, and this project makes them available to those who do not have ready access to a well-stocked library.

    Serials, the Next Motherlode for Large Scale Digitization? (U Penn, John Mark Ockerbloom)
    Looking at opportunities for digitizing out-of-copyright and orphaned serials, and techniques for determining which ones qualify. A real need for tools to help in this area. I liked this presentation because there is a clear tie-in to the work we are doing on the Open Content Alliance.

    Surfacing consistent topics across aggregated resource collections (clustering and classification techniques)
    All projects were looking, I think, at using data mining techniques to cluster and then classify documents based on metadata, not text, so this work is analogous to work we did “under the hood” with RedLightGreen. I found this set of presentations interesting because of the tie in with RedLightGreen.
    1. Emory, MetaCombine (Martin Halbert) Clustering and classification tools part of MetaCombine, still need work. Looking at creating tools that can be used in an unsupervised mode. Looking at using web services to give access to MetaCombine tools (so you don’t have to install them at your home institution), give access to training sets, etc. Interesting part of the presentation is some work they have done to modify Heritrix, the open source web crawler that almost everyone uses. They’ve taken Bow, developed at Carnegie Mellon, and adapted it to Heritrix so now Heritrix will do crawls based on relevance (only following links from relevant pages onto other relevant pages; if a page is not relevant, it stops crawling in that direction).
    2. OAIster, University of Michigan (Kat Hagedorn). Used the MetaCombine tools from Emory. Conclusion was that clustering was useful over a very large data set, but classification was difficult and less useful. Also, large datasets take a long time (well, we could have told her that, a lesson learned from RedLightGreen where processing the very large dataset that is the Union Catalog took quite some time!).
    3. CDL, Bill Landis. Work from CDL’s American West project. Clustering is good at a global level, classification helps to meet local/project needs. Classification and bags of words can and should be shared.
    4. If you have no idea what any of the above is about, David Newman from TopicSeek gave a nice introduction to clustering and classification.
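To make the relevance-based crawling idea from the Emory presentation a little more concrete, here is a toy sketch. It is not Heritrix or Bow code: the in-memory link graph, the keyword-overlap relevance test and all of the names are hypothetical stand-ins for a real crawler and a trained classifier. It only shows the rule described above, where links are followed from relevant pages and the crawl stops at pages that are not relevant.

```python
from collections import deque

# Hypothetical link graph standing in for the web: page -> (text, outgoing links).
PAGES = {
    "a": ("digital library metadata harvesting", ["b", "c"]),
    "b": ("library catalog metadata clustering", ["d"]),
    "c": ("celebrity gossip and sports scores", ["e"]),
    "d": ("metadata classification for library collections", []),
    "e": ("library metadata aggregation", []),
}

TOPIC_TERMS = {"library", "metadata"}

def is_relevant(text, threshold=2):
    """Toy relevance test: how many topic terms appear in the page text."""
    return len(TOPIC_TERMS & set(text.split())) >= threshold

def focused_crawl(seed):
    """Breadth-first crawl that only follows links found on relevant pages."""
    visited, kept = set(), []
    queue = deque([seed])
    while queue:
        page = queue.popleft()
        if page in visited:
            continue
        visited.add(page)
        text, links = PAGES[page]
        if not is_relevant(text):
            continue  # not relevant: stop crawling in this direction
        kept.append(page)
        queue.extend(links)
    return kept

print(focused_crawl("a"))
```

Note that page "e" is on topic but is never reached, because the only path to it runs through the off-topic page "c". That is the trade-off of this kind of crawl: less wasted fetching, at the cost of missing relevant pages behind irrelevant ones.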

    Recommending and ranking: experiments in next generation library catalogs (on Melvyl, CDL, Brian Tingle presenting)
    Currently investigating how to get XTF to represent MARC data in FRBR, if circulation data or holdings data are more helpful in ranking, if “people who checked out this book also checked out…” features would be interesting. Just finished one round of user testing, will do more in May. XTF providing better ranking of results than are coming from the ILS. I’m inviting this team to come to RLG to share findings, so I will have more to report in June. Lots of RedLightGreen synergies.
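For readers wondering what a “people who checked out this book also checked out…” feature might involve, here is a minimal sketch based on simple co-occurrence counting. This is not the CDL team's method, which I can't describe in detail; the circulation data, titles and function names are all made up for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical circulation data: patron id -> set of titles checked out.
CHECKOUTS = {
    "p1": {"Candide", "Zadig", "Emile"},
    "p2": {"Candide", "Zadig"},
    "p3": {"Candide", "Emile"},
    "p4": {"Zadig", "Micromegas"},
}

def build_cooccurrence(checkouts):
    """Count how often each ordered pair of titles shares a patron."""
    pairs = Counter()
    for titles in checkouts.values():
        for a, b in combinations(sorted(titles), 2):
            pairs[(a, b)] += 1
            pairs[(b, a)] += 1
    return pairs

def also_checked_out(title, pairs, limit=3):
    """Titles most often borrowed by patrons who borrowed `title`."""
    scores = {b: n for (a, b), n in pairs.items() if a == title}
    # Sort by descending count, then alphabetically so ties are stable.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]

pairs = build_cooccurrence(CHECKOUTS)
print(also_checked_out("Candide", pairs))
```

A real service would need to deal with patron privacy, very popular titles swamping the counts, and sparse data for rarely borrowed items; none of that is handled here.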

    Unbundling the ILS: Deploying an E-Commerce Catalog Search Solution
    Andrew Pace and Emily Lynema, North Carolina State
    This project has received quite a bit of play and this was my first real look at it. Using an e-commerce tool, Endeca, to help provide relevance and faceted browsing to the catalog. Runs fast, because all data is held in RAM (no surprise). Takes 7 hours to reindex data, which is done nightly, on something like 1.2 million records. They encountered the same issues we did, in working with a tech partner — wow, you have so many fields and you want to index them all?!? Future plans to FRBRize. I was gratified to see numerous acknowledgements of lessons learned from our RedLightGreen project. If you haven’t seen it, take a look.

    Finally, David Seaman announced that he will be stepping down as the Director of the DLF. This is sad news, and we will miss him, but fortunately he’ll be around through the next Forum in Boston.