Recently I’ve required access to more computing power than I normally have available, since as part of the “Cloud Library” project managed by my colleague Constance Malpas we need to process millions of records from WorldCat, the Hathi Trust, and a regional storage facility. Luckily, OCLC Research sits on quite a bit of processing power, and it’s called “pebbles”.
Pebbles is a cluster of computers that consists of one “head” node and 32 “compute” nodes. Each node has 4 CPUs, 16 GB RAM, and around 1.5 TB of disk storage. The cluster takes its name from the Rocks open source cluster administration software that we use to manage it. Those of you who have been paying attention will remember that my colleague Thom Hickey wrote about this cluster when it was new, some 2 1/2 years ago.
So lately I’ve been spending some time learning how to work with it. Of course, my colleagues who have been working on it for some time have been quite helpful, but part of what I need to do is simply wrap my mind around how to process in parallel. I remember my first “cluster-fork” command (I’m telling you, I can’t make this stuff up) like it was yesterday. Oh, wait, it was yesterday. And if I had used the right switch, it would have run on all the nodes in parallel instead of one node after another. Oh well, there’s always tomorrow.
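For anyone else trying to picture the difference between serial and parallel, here is a toy sketch in Python. It is not how cluster-fork works and it does not touch a real cluster: the node names are made up, and `run_on_node` is a hypothetical stand-in that just sleeps for a moment to imitate a command taking time on one node. The only point is that visiting the nodes one after another costs the sum of the waits, while starting them all at once costs roughly one wait.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical node names for illustration; the real host names are not given here.
NODES = [f"compute-{i}" for i in range(32)]


def run_on_node(node, delay=0.01):
    """Stand-in for running a command on one node: it only sleeps, then reports."""
    time.sleep(delay)
    return f"{node}: done"


def run_serial(nodes):
    """Visit each node in turn, waiting for one to finish before starting the next."""
    return [run_on_node(node) for node in nodes]


def run_parallel(nodes):
    """Start the work for every node at once and collect results in node order."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        return list(pool.map(run_on_node, nodes))


if __name__ == "__main__":
    for label, runner in (("serial", run_serial), ("parallel", run_parallel)):
        start = time.perf_counter()
        results = runner(NODES)
        elapsed = time.perf_counter() - start
        print(f"{label}: {len(results)} nodes in {elapsed:.2f} seconds")
```

Both versions return the same results in the same order; only the elapsed time differs.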
Part of what the Rocks software does is provide load monitoring. Below is a screenshot of when the cluster was under a heavy load:
I’m not yet knowledgeable enough to put it under that much load by my little lonesome, but give me time. I’ll get there.
Roy Tennant works on projects related to improving the technological infrastructure of libraries, museums, and archives.