Archive for August, 2012

Enjoying the Scots

Friday, August 31st, 2012 by Jim

I had a very enjoyable conversation today with Martyn Wade, National Librarian and Chief Executive of the National Library of Scotland. He made me aware of the relatively new legislation that updates the purpose and functions of the National Library. The library had been operating under legislation that dated from 1925. The new legislation positions the Library to fulfill the kind of role that the citizenry and other national and higher education institutions expect in the digital age. The legislation is brief, to the point, seems actionable and aims to be ‘future-proof’. At 20 very generously spaced pages, it’s worth a quick look. I was particularly taken with a subheading under NLS Functions:

NLS is to exercise its functions with a view to—
(a) encouraging education and research,
(b) promoting understanding and enjoyment of the collections,
(c) promoting the diversity of persons accessing the collections, and
(d) contributing to understanding of Scotland’s national culture.

I’m not aware of other library mission statements that explicitly call out the need to ensure that their collections are enjoyed. I like that very much.

In passing Martyn mentioned the library exhibit called Going to the pictures: Scotland at the cinema. In connection with this exhibit on the library’s Facebook page there was an opportunity to “Scot-ify” famous lines from the movies – Scotland at the Cinema Strikes Back. It’s ongoing and has been very successful. It’s charming and funny. Worth a look. Postcards made from some of the submissions will, of course, be available for sale in the library shop.

Born Digital for Those Born Analog

Thursday, August 23rd, 2012 by Ricky

The first outcome of an ongoing OCLC Research activity, Demystifying Born Digital, is a report, You’ve Got to Walk Before You Can Run: First Steps for Managing Born-Digital Content Received on Physical Media.

Jackie Dooley and I submitted a draft of this document to some of the smartest people we know (see the list here) and what ensued was one of the liveliest professional discussions I have ever been part of. We wanted to end up with a very simple list of actions for getting a born digital backlog under control. The assumed reader was a person without expertise and in an archives where there was no policy in place and little IT support.

Many of our advisors instinctively pulled toward best practices. Others acknowledged that even with their expertise and tech support at relatively well-funded institutions, they were not able to do even what we had been calling minimal steps. Some of the steps became optional; others got a bit of elaboration — even the order of the steps was the subject of much deliberation, but we ended up with a simple, direct document to help those who were afraid to take the first steps.

Wikimania 2012: copyright and closing thoughts

Tuesday, August 14th, 2012 by Merrilee

This is the last in a series of postings on my first Wikimania. I’m (mostly) focusing on the connection between Wikipedia and libraries, and approaching topics thematically, rather than going through the conference in order.

I was distracted by the Society of American Archivists meeting (which I’ll be blogging about soon!), but I’m back to wrap up Wikimania.

Wikisource, Wikicommons and the copyright conundrum

IP rights came up in many discussions and presentations at Wikimania (as you would expect with a group so dedicated to increasing access to free knowledge), but the one I found most interesting was an Oxford-style debate on the topic “That all Wikimedia projects should have Fair Use, or none of them.” Why is this important? Because Wikimedia is more than just Wikipedia, and has a range of projects which make content available. For example, the National Archives and Records Administration (NARA) here in the United States is contributing to both Wikisource and Wikicommons (I explained a little about these two Wikipedia “sister” projects in a previous post). Because Wikimedia projects exist in a very international context, contributions to Wikisource and Wikicommons must be very strictly in the public domain or covered under an appropriate licence that renders the materials as “free content” in a similar way (it’s important to note here that licensing that requires attribution is acceptable). Putting materials into either project and claiming fair use is in fact strictly prohibited.

I know that many institutions will find both Wikisource and Wikicommons to be attractive options, but there are few (U.S.-based) institutions that will be able to put most or all of what they have digitized into these projects (NARA may be an exception, as may other government institutions or those that exclusively collect material from the 19th century or earlier). This is too bad, because otherwise, Wikimedia projects are ideally aligned with the mission and aims of cultural heritage institutions. Still, there is much to collaborate around, so I’m still very excited!

Conference roundup

I want to wrap up by giving some of the high points as well as oddities I noted at this conference. The conference was very inexpensive compared to many library conferences (thank you, sponsors!). Registration ranged from $35 to $95, which included morning food and lunch (additionally, there was a reception each evening with some level of food and drink). This is the first conference I’ve attended (with the possible exception of ALA) which was “trending” on Twitter. Thanks to ubiquitous wireless, ample power, and an enthusiastic cadre of Twitterati, the conference stream was useful, and at times overwhelming. All the people I met were amazing. During lunch I was touched when people noticed that I was scanning for a friendly face and invited me to sit with them.

The conference was not without flaws. On several panels, at least one of the scheduled presenters was not present. Oddly, to me, everyone presented from their own laptop, rather than consolidating presentations on a single machine. To make matters worse, almost everyone was presenting from a Mac and the majority of them had difficulty shifting displays. This was amusing to me, both because of the Mac’s “intuitive” reputation and because of the otherwise extraordinary tech prowess of the presenters. However, the time wasted dorking with technology was considerable. If Ranganathan had rules for conferencing, one of them would surely be “save the time of the attendee.”

Attending Wikimania was a terrific experience and I hope I have the opportunity to attend in the future. As I have said repeatedly, I have been excited about the potential for alliances between libraries and other cultural heritage institutions and Wikipedia / Wikimedia. Attending the conference only cemented my conviction.

Goodbye Wikimania, see you next time!

If you want to take a look at some other blog posts summarizing the conference from the LAM perspective, see Ed Summers on Wikimania Revisited and the Biodiversity Heritage Library’s report, Wikimania 2012 & BHL.

Actualizing “Actualizing Infotopia”: The SXSW Race

Monday, August 13th, 2012 by Max

We’ve made our application to SXSW to present what we think Infotopia could look like.

If you wanted to understand our SXSW proposal in an image, you could use the one below:

If you wanted the same understanding as video, you could watch the one below:

And finally, if you wanted to vote for us and secure this presentation’s actualization, you can click on the image below. (You’ll have to make a SXSW account, unfortunately.)

Max Klein @notconfusing Merrilee Proffitt @merrileeiam

Adventures in Hadoop, #1: Introduction and the Research Cluster

Thursday, August 9th, 2012 by Roy

What is it with geeks and wacky names, anyway? Despite what it sounds like, Hadoop is neither a rarely-glimpsed mammal from Tasmania, nor a children’s board game. Nope, rather it is a family of technologies that implement various aspects of Google’s MapReduce algorithm cum programming model that is optimized for processing huge data sets.

Since WorldCat is nothing if not a huge data set, it seems only natural that we would be using MapReduce technologies to data mine WorldCat, and in fact we have – for years. But what is new to us is making the transition to the Hadoop family of technologies that implements MapReduce in Java and is tuned specifically for running on clusters of computers, as we have in Research.
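For anyone who has not met MapReduce before, here is a minimal sketch of the programming model in plain Python. It is not Hadoop and not how our jobs are actually written; the function names and the sample titles are made up purely for illustration. The idea is that a "map" step turns each record into key/value pairs, the pairs are grouped by key, and a "reduce" step collapses each group into a result.

```python
from collections import defaultdict


def map_phase(record):
    """Emit a (word, 1) pair for every word in one record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group all emitted values by key, as the framework would do between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse all the values for one key into a single count."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce in a single process."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


# Hypothetical sample records, standing in for a real data set.
titles = ["Moby Dick", "Moby Dick or The Whale", "The Whale"]
counts = run_job(titles)
print(counts)
```

On a real cluster the map and reduce steps would run in parallel on many machines, each working on its own slice of the data; this sketch only shows the shape of the computation.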

As I stumble along the learning trail with my colleagues (in particular, Bruce Washburn here in the San Mateo office), I hope to write about some of my experiences here — not so much to educate as to entertain through laughter. You see, I tend to learn new technologies just well enough to create havoc, and that can be quite entertaining if not instructive about what not to do. But more on that later.

For today’s post I will introduce you to “gravel,” which is replacing our former compute cluster that had been dubbed “pebbles” (don’t ask). It looks like this:

  • 1 “head” (control) node – 2 6-core 3.1 GHz processors, 64 GB RAM, 24 TB hard disk
  • 40 “compute” (processing) nodes – each with 2 4-core 2.6 GHz processors, 32 GB RAM, 6 TB hard disk

For those of you following along at home, overall we have well over a terabyte of RAM and about a quarter of a petabyte of disk storage. Needless to say, even with multiple copies of WorldCat we have plenty of headroom for data mining processes.
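If you would like to check those totals yourself, the arithmetic can be sketched in a few lines of Python. The node counts and sizes come from the list above; the dictionary layout, the function name, and the choice of 1024 as the conversion factor are my own assumptions, and the disk figure is the raw total before any replication or filesystem overhead.

```python
# Node specs taken from the list above; the structure itself is illustrative.
HEAD = {"count": 1, "ram_gb": 64, "disk_tb": 24}
COMPUTE = {"count": 40, "ram_gb": 32, "disk_tb": 6}


def cluster_totals(*node_groups):
    """Return (total RAM in GB, total raw disk in TB) across all node groups."""
    ram_gb = sum(group["count"] * group["ram_gb"] for group in node_groups)
    disk_tb = sum(group["count"] * group["disk_tb"] for group in node_groups)
    return ram_gb, disk_tb


ram_gb, disk_tb = cluster_totals(HEAD, COMPUTE)

# Assumes binary units (1 TB = 1024 GB, 1 PB = 1024 TB).
print(f"RAM:  {ram_gb} GB (about {ram_gb / 1024:.2f} TB)")
print(f"Disk: {disk_tb} TB (about {disk_tb / 1024:.2f} PB)")
```

That works out to 1,344 GB of RAM and 264 TB of raw disk, which matches "well over a terabyte" and "about a quarter of a petabyte."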

So hardware isn’t much of a limitation right now, but it will take me more work to get up to speed on the software side. I hope to document some of those antics here over the coming weeks. It may be as pretty as watching sausage being made (which, as an Indiana farm boy with German ancestry, I actually have seen – and smelled!), but I’m hoping it will be humorous if not also informative. You be the judge.

OCLC Research at SAA

Monday, August 6th, 2012 by Merrilee

Well, it’s that time of year — time for many of us to pull up stakes and head to the Society of American Archivists annual meeting, this year in San Diego. I’ll be attending along with other OCLC Research colleagues: Jennifer, Bruce, Ellen, Ixchel, and Jackie, who will be taking office as president of SAA at the close of the meeting. (As part of her official duties, Jackie has helped launch a new blog called “Off the Record” as an informal communication channel for issues related to SAA and “things archival.”)

You will see us at a variety of sessions. I’ll be starting off the meeting by attending the TS-EAD meeting on Wednesday (this is the group overseeing the long-overdue update for Encoded Archival Description). On Thursday, I’ll be chairing session 201, Taking Stock and Making Hay: Archival Collections Assessment in Action. My colleagues will also be taking part in conference sessions. Jackie will be presenting in session 507, Strategies for Undertaking Electronic Records Management in Museums (her talk is titled “Setting the Context for Born-Digital Management in a Cultural Institution”); Jennifer is chairing session 508, Interlibrary Loan and Archives: The Final Frontier. And Ixchel is presenting in session 504, Breaking Down Boundaries Incorporating Users into Digital Repository Development (her talk is titled “Infusing Consumer Data Reuse Practices into Curation and Preservation Activities”). Unfortunately, all my colleagues are presenting in the same Saturday morning slot, so there are some tough choices there.

Jennifer is also helping to convene the EAD Consortia Brown Bag Lunch on Thursday afternoon; this is an opportunity for regional/statewide aggregators of archival resources to exchange information about each other’s projects and programs, but all are welcome.

I will be hanging out at the ArchiveGrid booth with Bruce and Ellen when the Exhibit Hall is open on Thursday and Friday. We will be collecting ideas about what people like about ArchiveGrid and how we can improve it, and also getting your feedback on some new design ideas. I also have some special badges for those who visit the booth, so please drop by booth #302! We will be giving updates about various OCLC Research projects in roundtables and section meetings, so it will be sort of hard to avoid us. Flag us down and say hello!

Wikipedia and Libraries: The Afterwebinar

Thursday, August 2nd, 2012 by Max

At 556 attendees strong, the recent OCLC Research webinars “Librarians are Wikipedians Too” and “Wikipedia and Libraries: The Connection” piqued the progressive, exploratory minds of librarians worldwide. Convinced tech managers at independent research libraries asked for help jumping onto the Commons mass-upload bandwagon. Reference librarians started to dream up combined workshops and editathons after hearing the two explained. As well as workshops and editathons, the webinars outlined the 5 classical points of collaboration between the two communities, and how to forensically evaluate which areas of Wikipedia are fertile for library linking.

A webinar is nothing without its audience and their questions. We answered as many as we could at the time, but some questions were more difficult, and now that I’m clear of time constraints, I’ll answer them in full.

Where to go next:

The answer of where to go next is somewhat of a mantra we hope to impose: “the wiki”.  The Wikipedia Loves Libraries portal is a growing base of related materials, ideas, and links to the subject. We recognize that using a wiki to get help with wikis can be somewhat of a contradiction, and have set up a simple form to get paired with Wikipedians in a more traditional way.

Unanswered questions from Chat:

Question from Bob Kosovsky to All Participants (02:54:43 PM):

Max: WP is 6th most used website; but acc. to visualizations I’ve seen, DPpedia is THE most used data source; can you talk about the implications of DPpedia being the MAIN source of data/information for numerous websites?

I think you’re referring to this image,

Linked Open Data

which shows DBpedia as the center of the Linked Open Data universe. DBpedia is a database of information scraped and inferred from Wikipedia. Its being this large implies that Google searches will be eerily smart, and occasionally possibly wrong. Beyond that, it signals that despite some best efforts to deride crowdsourcing as untrustworthy, the internet is utilitarian.

Question from Madeline Wagner to All Participants:

I would like to know more about how “minority” views on a subject are handled : ie the recent article by a scholar who tried to edit the entry on the Haymarket affair.

This question leads to an advanced and philosophical design choice of Wikipedia. The controversy around the Haymarket affair on Wikipedia (chronicled here) highlights that Wikipedia is not an encyclopedia of truth but an encyclopedia of proof. That is, by design, the facts that belong on Wikipedia are the ones that can be sourced, and true-but-not-provable statements aren’t valid Wikipedic content. Wikipedia is this way for practical reasons. For a full justification, read the essay “Wikipedia:Truth – A place for minority views.”

Question from Michele Combs to All Participants (02:58:36 PM):

Rule of thumb seems to be “no institutional WP accounts,” only individual ones so that there is a single responsible person for each edit; would you advocate permitting creation of institutional accounts for creation/editing so as to make edits more credible/authoritative?

Let us be pragmatic. It’s highly unlikely that Wikipedia would ever change its policy to allow group accounts, because if you are looking to make a user account’s edits more authoritative, then we’ve lost the equity granted to anonymous users, a very historic tenet. To achieve unity and community respect for a library’s editors as a whole, I’d suggest using a naming scheme in the vein of [name]+[institution]. For instance, in my personal life I am User:Maximilianklein, but when I edit for OCLC I use User:Maximiliankleinoclc, which knots together my reputation and my institution’s.

Question from Kjerste Christensen to All Participants (02:33:07 PM):

If your library has a strong focus in a particular area, what about partnering with a WikiProject related to that subject area to look up information or scan media as needed?

This isn’t really a question at all but a fantastic comment. Click here to view the directory of WikiProjects.

And remember: it’s not confusing



Happy Birthday HangingTogether!

Wednesday, August 1st, 2012 by Merrilee

Today marks the 7th anniversary of this blog (and the first time we’ve noted it here). We’ve marked a lot of changes over the last 7 years, and we also have a lot of stats to share. By the numbers, we’ve had 673 posts (make that 674) and 741 comments (why are you so quiet?); thanks to Spam Karma and reCAPTCHA we’ve trapped almost 95,000 spam comments; according to Feedburner, we have 1860 subscribers (you are in good company!).

When we started the blog in 2005 our original contributors were Jim, Günter, Anne, and me, all voices from the Research Libraries Group and our Member Programs division. Karen and Constance started posting in late 2006, and were joined by Brian, whom we gained after the merger between RLG and OCLC. Dennis joined our ranks in 2007, along with new additions Roy and Jennifer. In 2008, we added John (who added British spelling and a Scottish voice) as well as Ricky. Jackie joined in 2009. Bruce started contributing in 2011 with a short-lived but popular (according to our stats) series called “What We’re Reading”. Nancy also lent her voice to the choir in 2011, to introduce the new OCLC Research Library Partnership. And in 2012, we’re pleased to have Max join our ranks.

We’ve been through a lot together in the last seven years: new hires and departures (Günter, Anne, and John); weddings and births; illness; grants; papers published, webinars given, conferences planned and attended. Lots of phone calls, across many timezones. Too many emails to count. But through it all we’ve been hanging together. On that note, I’ll leave you with a link to our very first post, for an explanation of our name (a declaration of dependence), and a link to Roy’s post Of Rivers and RLG, where he makes the connection between rafting, teamwork, and riding the rapids of change. This blog is and has been about that change, and about how libraries and archives can make the transition together, without falling out of the boat. Thanks for being along for the ride.

Birthday Cake -- help us blow out the candles!
