Archive for the 'Miscellaneous' Category

Happy Extraterrestrial Abduction Day!

Wednesday, March 19th, 2014 by Roy

Just in time for Extraterrestrial Abduction Day, commemorated by Earthlings everywhere on 20 March, we bring you the list of the top ten most popular library items with the subject heading “Alien abduction”:

  1. Close Encounters of the Third Kind (1977 movie)
  2. The True Meaning of Smekday, by Adam Rex
  3. Abduction: Human Encounters With Aliens, by John E. Mack
  4. Abducted: How People Come to Believe They Were Kidnapped By Aliens, by Susan A. Clancy
  5. Transformation: The Breakthrough, by Whitley Strieber
  6. Little Green Men, by Christopher Buckley
  7. Close Encounters of the Fourth Kind: Alien Abduction, UFOs, and the Conference at M.I.T., by C.D.B. Bryan
  8. The Light-Years Beneath My Feet, by Alan Dean Foster
  9. The Fourth Kind (2009 movie)
  10. Skyline (2010 movie)

For even more suggestions on library items one can borrow to get in the mood, see what FictionFinder suggests.

But in all of our excitement celebrating Extraterrestrial Abduction Day, let’s not forget the most important item of all: How to Defend Yourself Against Alien Abduction, by Ann Druffel (see cover). I mean, whether you are returned to Earth or not, the best outcome of any attempt by aliens to abscond with you is to not be abducted in the first place. At least, that’s what I’m thinking.

Top 10 Love Stories in Libraries

Friday, February 14th, 2014 by Jim
from The Visual Thesaurus


Folks may already have seen the list of top love stories we put out yesterday for St. Valentine’s Day. It was pulled together very quickly by JD Shipengrover and Diane Vizine-Goetz.

This one is quite nice and we are pleased with the response.

It is programmatically generated and based on a fiction subset of WorldCat (which includes movies); it is FRBRized, so it is at the work level and consolidates editions, etc.; the items are identified by genre data in the MARC records; and it is ranked by holdings.
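The steps in that description (filter by genre, roll editions up to the work, rank by holdings) can be sketched roughly as follows. This is only an illustration under stated assumptions: the records, work identifiers, titles, and holdings counts are invented, and the function name `rank_works` is hypothetical, not the code actually used to build the list.

```python
from collections import defaultdict

# Invented records for illustration only: (work_id, title, genre, holdings).
# Real input would come from MARC records clustered into FRBR works.
records = [
    ("w1", "Work A", "Love stories", 1200),
    ("w1", "Work A", "Love stories", 800),   # another edition of the same work
    ("w2", "Work B", "Love stories", 1500),
    ("w3", "Work C", "Detective and mystery stories", 5000),
]


def rank_works(records, genre, top_n=10):
    """Sum holdings per work for one genre and return the top works."""
    holdings = defaultdict(int)
    titles = {}
    for work_id, title, record_genre, count in records:
        if record_genre == genre:
            holdings[work_id] += count          # consolidate editions at the work level
            titles.setdefault(work_id, title)
    ranked = sorted(holdings.items(), key=lambda item: item[1], reverse=True)
    return [(titles[work_id], total) for work_id, total in ranked[:top_n]]


top_love_stories = rank_works(records, "Love stories")
for title, total in top_love_stories:
    print(total, title)
```

Summing at the work level is what lets a title with many modestly held editions outrank a single widely held edition.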

We are going to do more lists in future.

Library Engagement with Digital Humanities

Wednesday, February 12th, 2014 by Ricky

When our colleagues at OCLC Research Library Partner institutions were asked for topics that they’d like to see addressed, they kept saying, “Do something about Digital Humanities,” but they didn’t say what needed doing. When we said at staff meetings, “We should do something about Digital Humanities,” we were asked what exactly and demurred. What did we have to contribute? Then Jen and I started looking around and realized that a lot has been written by librarians about DH and libraries, and librarians often go to meetings to talk to other librarians about DH. We decided to try to steep ourselves in DH from the researchers’ point of view and see what might come of it. After lots of reading, several conferences, and a couple of focus group sessions, we’d learned a lot. We synthesized what we heard and shared it with several colleagues who thought it would be useful to library directors who are not already in the thick of DH.

Humanités Numériques [crop] Calvinius CC BY-SA 3.0


We extrapolated from what we heard from the DH scholars to what we thought it could mean for library directors. The result was a spectrum of possible ways for libraries to engage with DH. It ranged from making sure DH scholars knew about services you were already offering that could be helpful to them, to full-bore immersion in a DH center, and with lots of other possibilities in between. There are ideas about offering training, involving DH scholars in digitization initiatives, helping to make their work discoverable, enabling them to enhance metadata, preserving the outcomes of their work, and making sure that it’s all sustainable. Whether or not a library has one staff member, several, or a center devoted to DH is largely a local issue. That may be based on how much is wanted by the researchers and how much is doable or affordable — but these things will change over time. Where a library begins on the spectrum is not at all where they might end up.

From our conclusion:

Humanities scholars have always been a central constituency for research libraries. The digital humanities constitute an evolving approach to research, and directors must support this work as a component of the university’s research mission. Libraries offer many useful services to digital humanists. Where the need is clear and DH scholars are receptive, libraries can offer various dedicated services to further DH efforts. In some cases, a full-blown DH center may be warranted…

…No matter which approaches to supporting the digital humanities you opt to take, keep in mind that what we call “The Digital Humanities” today will soon be considered “The Humanities.” Supporting DH scholarship is not much different than supporting digital scholarship in any discipline. Increasingly, digital scholarship is simply scholarship.

Since our essay, “Does Every Research Library Need a Digital Humanities Center?” was released last week there has been a lot of important discussion on Twitter and on various blogs. The conversation has gone in many useful directions. The ACRL blog dh+lib has offered to continue the conversation.

The Most Edited Book Records in WorldCat

Friday, February 7th, 2014 by Roy

In my last post I identified the most edited records in WorldCat, which, no surprise, were all serials. Someone who read the post asked about this information by format (e.g., books, maps, scores, etc.). I doubt that I will get to all of the various formats, but I decided to take a look at books.

Unlike serials, for which I noted those that had 60 or more edits, for books I had to lower the threshold to 40 to get any at all (the most edited item had 58 edits). So here are the book records which have been edited more than 39 times in WorldCat (in no particular order):

An inevitable conclusion from the above seems to be that the more libraries that hold a book the more likely a cataloger will be to touch the record for it, which would explain how Harry Potter and the Hunger Games books made it on the list.

The Most Edited Records in WorldCat

Thursday, January 30th, 2014 by Roy

Recently I’ve been doing a large pile of data processing jobs that has me working in cycles of 20 minutes or so. In other words, I do some edits, kick off a job on our compute cluster (fondly named “Gravel” — don’t ask) and about 20 minutes later I do roughly the same thing. Yeah, I know, you’re thinking “why doesn’t he automate it?”. And I would, except that this is a shared resource and rather than kicking off my monster list of jobs that could keep the cluster running from now until…well…a long while from now I think it’s better to introduce some variability in load.

All of that is a long introduction to how I came to discover the most edited records in WorldCat. To fill in those 20 minute blocks I took up some “mini investigations” that do not take as long to perform.

For one such investigation I looked into how often WorldCat Records have been edited and by whom. I will be blogging about this in an upcoming post, but a small slice of this investigation was a closer look at the records that have been edited a lot. Since we keep track of the cataloging symbol of every institution that has edited a record, these can stack up for records that require updates on a regular basis — in other words, serials.

All of the records for these serials were edited more than 60 times over their life in WorldCat, and in no particular order:

Take a bow, serials catalogers, you’ve clearly earned your pay.

The Most Used English Title Words in WorldCat

Friday, January 3rd, 2014 by Roy

This is another installment in my continuing series of eclectic, peripatetic, and yes, let’s just say it: “pathetic” data investigations. The most recent identified the top countries of publication for WorldCat records. For whatever reason, I got it into my head to determine which English words appear the most in the main title of WorldCat items.

Clearly there are at least two ways to go about this: a) a formal, well-designed, highly replicable and ultimately near perfect investigation, or b) a slapdash, fast, seat-of-your-pants investigation of questionable merit. When given such a choice, I find the latter completely irresistible. So I took part of my day today and did exactly that.

Since I already had code on our research cluster affectionately named “Gravel” that could extract a specific subfield, I powered it up and sucked out all of the 245 $a fields from WorldCat. As part of that process, I extracted only unique strings. The sharp ones among you have likely noticed a couple flaws already: 1) I was too lazy to filter based on language, and 2) I was too careless to normalize the title strings.

Flaws have never stopped me before, so I blazed on as if nothing was amiss. Then I threw that monster file onto another computer where I didn’t have to worry about interfering with any of the actually useful work that my colleagues were doing on Gravel (you’re welcome). There I wrote a special-purpose Perl script to take each title string, split it into individual words, lowercase them, and count up the occurrences. I dabbled in creating a “stop-words” list of useless words like “a” and “an” and “and” and “the” (ad infinitum) but that quickly began looking like a rabbit hole. As I was only really interested in identifying the top 30 or so words I figured my human eyeball would be sufficient to trap those in the end. Likewise with the foreign words.
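For the curious, the split, lowercase, and count step looks roughly like the sketch below. It is written in Python rather than the original Perl, and it is not the script that was actually run: the sample titles are made up, the small stop-word set is a hypothetical stand-in for the list that was abandoned, and the punctuation stripping is an extra that the original run skipped.

```python
from collections import Counter

# Hypothetical stop-word set; the post does not give a real list.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "for", "to", "on"}


def count_title_words(titles, stop_words=STOP_WORDS):
    """Count lowercased words across an iterable of title strings."""
    counts = Counter()
    for title in titles:
        for word in title.lower().split():
            # Trim surrounding punctuation (an addition, not in the original run).
            word = word.strip(".,:;!?/()[]\"'")
            if word and word not in stop_words:
                counts[word] += 1
    return counts


# Made-up titles standing in for the extracted 245 $a strings.
sample_titles = [
    "A new history of the county /",
    "Report on water systems :",
    "New report of the state",
]

word_counts = count_title_words(sample_titles)
top_words = word_counts.most_common(2)
for word, count in top_words:
    print(count, word)
```

Passing an empty set as `stop_words` reproduces the "eyeball the results afterwards" approach, at the cost of a top of the list crowded with articles and prepositions.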

That was really about it. Well, except for all the time I spent on Facebook waiting for the operations to complete. Did I say that out loud?

Anyway, without further ado (thank god) here are the top occurring meaningful English title words in WorldCat:

2020380 new
1853252 report
1431184 study
1159042 development
1069940 analysis
1004554 history
978681 county
968097 international
929294 state
890928 guide
856935 system
789983 education
778732 school
756569 united
748894 national
736474 management
706559 social
700137 book
688993 states
688328 studies
687695 general
687665 american
679083 systems
678582 public
677286 water
671552 research
666407 life
661707 health
645966 plan
644212 world
642100 effects

OK, now move along, nothing to see here.

Conversations about “Starting the Conversation”

Friday, December 6th, 2013 by Ricky

One of the best parts of my job is working with OCLC Research Library Partner staff on working groups. In this case we never got together face-to-face, but managed to put together a pretty good report, Starting the Conversation: University-wide Research Data Management Policy. Though we started out with a conference call, all the work took place via email and shared documents. The working group consisted of:
Dan Tsang, chair — University of California, Irvine
Anna Clements — University of St. Andrews
Joy Davidson — DCC, University of Glasgow
Mike Furlough — Pennsylvania State University
Amy Nurnberger — Columbia University
Sally Rumsey — University of Oxford
Anna Shadbolt — University of Melbourne
Claire Stewart — Northwestern University
Beth Warner — Ohio State University
Perry Willett — California Digital Library

I supplied the bones, they filled in some of the sections, and I polished it up.

Working in OCLC Research, we try to stay on top of the literature and we hear a lot about application, but there’s nothing like being in the thick of it, so it’s really great to have the expert input of those actually working in research libraries.

Learning Commons: well-made in Japan

Wednesday, November 27th, 2013 by Jim

Last week, during a very hectic, very interesting visit to research libraries in Japan, I had the good fortune to tour the new (April 2013) Learning Commons at Doshisha University. It is not a library-managed facility, but the library helps to staff it along with other Student Support Services staff. The facility itself is as good an implementation as I’ve seen anywhere, including the new facilities at North Carolina State University’s new library.

The Doshisha University Learning Commons brochure

The Commons itself is a multi-story structure constructed adjacent to the library and connected to the library at various levels. As a consequence students can move very freely from the collections and quiet of the traditional library to the group study, presentation, production and technology areas of the learning commons. There are plenty of visible but unobtrusive staff available to the students. People in red jackets offer technology support, those in blue jackets offer peer instruction and guidance, those in yellow handle media production, and on each floor there is a desk staffed by a librarian.

There are no fixed furnishings in the entire facility. Everything can be moved. As an experiment they left one group study space with two tables without rollers. That space is the least frequently used in the building. I was impressed with the energy of the staff and the enthusiasm of the students. The location of the facility, bordering on one of the busiest streets in Kyoto, purposely serves to advertise the learning environment of this private university. The big study and computing rooms are lined up along picture windows that face out onto this boulevard, ensuring that Kyoto citizens know that Doshisha is a good place to learn.

Check out some photos taken during my walk-through in this Flickr set. Look for the Global Village sign that designates an area where no Japanese is to be spoken.

P.S. After the original post my colleagues at Doshisha advised me that an English language version of their Learning Commons brochure is available (.pdf).

Harvesting Book Metadata From Wikipedia to Wikidata

Wednesday, November 27th, 2013 by Max

Infoboxes were for a long time Wikipedia’s way of storing data, and Wikidata is set to replace that technology, with added bonuses like inter-language sharing. To get to that promise, one first step is for Infoboxes to be harvested into Wikidata. I have started by harvesting Infobox Book in the 9 biggest Wikipedia languages that share the template: English, Italian, French, Spanish, Russian, Polish, Portuguese, Swedish, and Japanese.

The point of harvesting Infobox Book specifically is that the Wikidata citation guidelines for books specify that the library FRBR concept should be used, so I wanted to build out infrastructure to that end. FRBR is about describing bibliographic records at many different levels, and here’s an example of what this kind of citation would look like in Wikidata:

With that in mind, let’s have a look at the data. Our entry point is the set of Wikipedia pages that use Infobox Book (a “transclusion” in Wikipedia parlance) in the 9 aforementioned languages. This measure is only an approximation and does not completely reflect how many Wikipedia topics are about books in a language, for three reasons. The first is that the conception of what a book is is not strictly enforced on Wikipedia. An article could be about a physical item or an amorphous work idea, or sometimes the inclusion of an Infobox Book template is only a nod to a book, as in the French article on this racing pigeon. The second is that not all articles about a book necessarily contain a transclusion of Infobox Book. And thirdly, some specialised Infobox Books have developed and are used instead, like Infobox Doctor Who Book.

In this next chart we look at the total Infobox Book transclusions, the total articles of a language, and the ratio between the two. Despite large variation in absolute numbers, the percentage of book articles in a Wikipedia is somewhere between 0.1% and 1% of all articles. Italians affirm themselves as the most bibliophilic. We’ll also see later on how their practice of labelling genre differs from the others.


Infobox Book Transclusion Counts By Language

Language  Infobox Book transclusions  Total articles (thousands)  Percentage of total articles
en        30582                       4432                        0.690
es        3534                        1057                        0.334
sv        3023                        1598                        0.189
pl        2782                        1005                        0.277
pt        1975                        803                         0.246
ru        1865                        1061                        0.176
it        10788                       1082                        0.997
ja        1446                        886                         0.163
fr        7935                        1441                        0.551
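
As a quick check on the last column, the percentage is simply the transclusion count divided by the total article count (which the table gives in thousands), times 100. A minimal sketch, using the figures from the table above; the function and variable names are just illustrative:

```python
# (Infobox Book transclusions, total articles in thousands), copied from the table.
counts = {
    "en": (30582, 4432),
    "es": (3534, 1057),
    "sv": (3023, 1598),
    "pl": (2782, 1005),
    "pt": (1975, 803),
    "ru": (1865, 1061),
    "it": (10788, 1082),
    "ja": (1446, 886),
    "fr": (7935, 1441),
}


def percent_of_articles(transclusions, total_thousands):
    """Share of a Wikipedia's articles that transclude Infobox Book, in percent."""
    return 100 * transclusions / (total_thousands * 1000)


# Languages ordered from the highest share of book articles to the lowest.
ranked = sorted(counts, key=lambda lang: percent_of_articles(*counts[lang]), reverse=True)
for lang in ranked:
    print(f"{lang}  {percent_of_articles(*counts[lang]):.3f}")
```

Sorting by the computed share rather than by raw counts is what puts Italian ahead of English, despite English having nearly three times as many transclusions.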


In each Infobox I crawled for the properties that are most used across all languages and whose values are either string identifiers or links to other Wikipedia pages. A value that links to another Wikipedia page, for instance the page of the author, is useful because, once harvested, Wikidata can store the author property as a link to another Wikidata item. This is desirable because in Wikidata we seek to build a wiki of relations.

Here is a graph of the properties that were found, which were added to Wikidata, and which were already in the database.

Properties Harvest

So as you can see, there are now over 30,000 relations between books and their authors and illustrators in Wikidata, as well as the original language and genres of the books. In addition, knowing which book is which from a disambiguation perspective is made easier by the inclusion of over 50,000 identifiers.

One difficulty was that even though ISBNs are recorded in Infobox Book, the type of ISBN (10 or 13) is not distinguished. Wikidata does distinguish, however, so as I was sorting these ISBNs I thought it would be sage to also verify them. OCLC runs an API called xID for this very purpose. While using xID it also struck me that the OCLC control number (OCN) could be returned for a given ISBN. As Wikidata is rapidly evolving into a hub of identifiers, I included those when pushing to Wikidata. During this harvest, then, I also inserted an additional 10,117 OCNs (not pictured above).
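
Sorting ISBNs by type and sanity-checking them can also be done locally with the standard check-digit rules. The sketch below is not the xID service and does not call it; it is a stand-in that only verifies the checksum, and the function name `isbn_kind` is hypothetical.

```python
def isbn_kind(raw):
    """Return "ISBN-10", "ISBN-13", or None if the string fails its check digit."""
    chars = raw.replace("-", "").replace(" ", "").upper()
    body, check = chars[:-1], chars[-1:]
    if not body or any(c not in "0123456789" for c in body):
        return None
    if len(chars) == 10 and check in "0123456789X":
        # ISBN-10: weights 10 down to 1, final "X" counts as 10, sum divisible by 11.
        values = [int(c) for c in body] + [10 if check == "X" else int(check)]
        total = sum(w * v for w, v in zip(range(10, 0, -1), values))
        return "ISBN-10" if total % 11 == 0 else None
    if len(chars) == 13 and check in "0123456789":
        # ISBN-13: alternating weights 1 and 3, sum divisible by 10.
        total = sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(chars))
        return "ISBN-13" if total % 10 == 0 else None
    return None


for candidate in ["0-306-40615-2", "978-0-306-40615-7", "0-306-40615-3"]:
    print(candidate, isbn_kind(candidate))
```

A checksum only catches typos; it cannot confirm that an ISBN was ever assigned to a book, which is one reason to consult a lookup service as well.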

As I mentioned, it’s not just boring, nameless identifiers that we eventually want to integrate into all the Wikipedia pages via Wikidata. I inspected genre data as well to see how much cross-cultural benefit we’d receive by doing these sorts of harvests. Below are the top 10 genres found in Infobox Book for each language. The text shown is the English label of the Wikidata item for each link found in a local Infobox. I’ve also outlined those genres which are unique. So you can see that Swedes care a bit more about choir books and the Japanese have a bent towards police drama.


Infobox Book Top 10s

What first jumped out at me is how inconsistently the idea of genre is used. In some ways it’s used to describe the content’s emotion and focus, like “science fiction” or “horror”. Other times it’s used to describe form, like “novel”. In fact, only the Italians are really consistent, albeit in describing form: their top ten includes “novel”, “essay”, “short story”, “poetry”, “anthology”, “autobiography”, “novella”, “dialogue”, and “poem”.

Another problem between languages is that the genres often mismatch because they point to only slightly different articles. That is, we see appearances from the Wikidata items for “fantasy”, “fantasy literature”, “Fantastique”, and “high fantasy”. (By the way, you can draw your own conclusions about the demographics of Wikipedia editors when this much fantasy lit pervades the results.)

A conclusion that can be drawn from all this is that there is still some work to be done on negotiating cultural differences on Wikidata. Wikidata has made a lot of connections between Wikipedia articles in different languages, but not all of those merges are clean. The French conflate a pigeon and a book about a pigeon, and it’s linked to languages that discuss only the pigeon. Meanwhile, how the Italians interpret “genre” is a different, though not necessarily incompatible, notion from that of the others. There are probably some discussions still to be had before Infoboxes completely switch over to using Wikidata data, but we are at least one step closer to that goal.

Building and managing your social media brand

Monday, October 7th, 2013 by Merrilee

This blog posting evolved out of an assignment we received, to share with colleagues at OCLC Research how to build and maintain a social media brand. While there’s nothing on this list that is particularly original, we both thought that the advice we came up with for colleagues was also worth sharing with a larger audience. Some of you may be consumers of social media, but not already actively blogging, Tweeting, or Tumbling. Others of you are hardened experts and we hope you will share your wisdom in the comments below!

Your online brand is the reputation you establish over time by providing useful and appreciated value to others. Establishing your brand and maintaining it requires commitment, since constant activity is better than episodic participation. Also, it is much easier to damage your online brand than it is to build it, so participate thoughtfully and with grace.

  • Determine what your online “handle” (nickname) will be and use it everywhere. The more consistent you can be with the use of your chosen handle the more likely your potential audience will recognize that it represents you in a variety of contexts (e.g., Twitter, Facebook, etc.). Register the domain name as well, in case you ever want to have your own web site.
  • Select the fora in which you will participate (e.g., Twitter, Facebook, Google+, etc.). You should participate in sites and services where the audience you wish to reach can already be found. For example, if no one you know hangs out on Google+, then skip it. Don’t forget mailing lists, which can still be an important venue for participating with a given community.
  • Participate consistently. Participating in online forums should be a regular part of your professional life if you are trying to build and maintain an online brand. Being a consistently contributing member of a community is the most important method to build your reputation.
  • Contribute real value. Your contributions should carry value for those who will likely see it. For example, if you are trying to establish a reputation for insightful commentary about libraries you may wish to avoid commenting a lot about politics or what you had for dinner. This can vary a bit by venue, with some being more formal than others, but always consider the impression you are trying to make within a given venue.
  • Say why. If you’re linking to something, say why your readers should click through to it. Be a good citizen and give as much context as possible, even with character limits.
  • Do not post when impaired (e.g., angry, drunk, depressed, etc.). What seems like a good idea at the time may not be. If you must, write it up but don’t post it until you’ve slept on it.
  • Consider reposting key items at opposite times of day (to make your impact felt in the broadest set of timezones possible).
  • Use the option to schedule posts when appropriate. Most social software clients have methods to schedule a post hitting a particular service at a particular day and/or time. This can be used, for example, to keep up an online presence during a vacation or to repost in a different timezone.
  • Keep in mind that nothing on the Internet can be considered private. Nothing. Whatever you write can show up in places you don’t expect, so be nice. Always. This doesn’t mean you should not forcefully argue your point, just be respectful about it.
  • Post appropriately for the venue. Twitter, for example, is a friendly venue for humor and personal comments. LinkedIn groups, however, are likely not.
  • Avoid “tweet bombing”. If you only check your social media account once a day, don’t make the mistake of posting or tweeting a lot during that one session. If people see many posts coming from you during a short period, it becomes more annoying than helpful. Rather, use the scheduling feature that your client software may (should) have to space out your posts or shares over time.
  • Be a good (and reciprocal) citizen. Everyone loves to have their links and wisdom shared. If you share something you got from someone else, give credit where credit is due. A name check at the right time will go a long way towards establishing good relationships.
  • Be in the flow. At conferences, you can leverage the conference stream both to keep up with what is happening in parallel sessions but also to boost your own signal. People who don’t normally follow you will follow a conference stream — if you are active in the stream, you will pick up new followers and also find some new people to follow yourself.
  • Periodically review your online presence. Are you participating in the right forums? Are there new venues you should add to your repertoire? Others that you can withdraw from? Keep in mind that your assessment of a given venue may change over time. I initially thought Twitter was only really useful at conferences, but later assessments changed my mind.
  • Use services such as Feedburner, etc. to assess, not obsess. That is, services that rate the impact of your social media presence can be useful to get feedback on your impact, but do not become obsessed with increasing your score. Remember that your overall professional brand also includes other important factors, such as the articles you publish and the presentations you give.
  • Don’t worry about turning your back: Once you’re engaged, it can be hard to step away from the stream, but it will be okay. Anything really important will come back around again.
  • Be yourself (within limits!). Although social media is durable, there is no reason to hide who you are. In fact, expressing your personality lets people know that you are not just someone who works for a particular organization, but a person with passions, interests, and (hopefully) a sense of humor.

With time and consistent performance, your online reputation can be a strong complement to your overall professional reputation. By establishing a strong and valued online presence you can increase the demand for your work in other venues, such as presentations at professional conferences or invited articles for professional journals. Without such a presence in an era dominated by electronic communication, you may run the risk of damaging an otherwise stellar professional reputation.