Having recently returned from the International Conference on Digital Archive Technologies in Taiwan, I am still struck by both the event and the location. A note about Taipei: it has shopping malls of a size and number to make any upstanding American city greenback-green with envy, and they contrast rather strikingly with the Buddhist, Taoist and Confucian temples sprouting out of every nook and cranny of the city. Add to that the diversity of Asian cuisines offered up for sampling in upscale venues as well as on the streets of the legendary Taipei night markets, and you have a very hearty first introduction to Asia (as this was for me).
More to the point of the event and the topic of this blog: if you don’t turn green with envy at the mention of massive shopping malls (and bookstores, may I add!), you’ll probably turn green once you learn more about the National Digital Archives Program (NDAP) of Taiwan, the organizing body of the conference. Through massive funding from the Taiwanese government (US$10-20 million per year from 2002 on), NDAP has been able to solder together the nation’s major libraries, archives and museums into a reasonably unified digitization machine. I think in the US we can only dream of a similarly focused vision of bringing together all digitized cultural resources under one description and access framework.
The sophistication of their approach to descriptive practice impressed me in a talk [pdf] given by Shu-Jiun (Sophy) Chen at MCN 2005, and in Taiwan I learned that Sophy’s team (Metadata Architecture & Application Team) is now extending its reach into the sphere of Learning Objects. NDAP has also made great strides in digitizing video, as witnessed by an introduction to their research and tool-building efforts by Chih-Yi Chiu. As a matter of fact, digital video turned out to be one of the focal points of the conference. Richard Wright from the BBC eloquently described the conundrum audio-visual archives face: having turned to digitization to escape the deterioration of analog tape, they now find themselves in an even tighter race against obsolescence with their digital content. According to Wright, analog formats were dependable for a few decades, while the first generation of digital video had a life-span of 10 years at the BBC. Another invited speaker, Pasquale Savino (National Research Council of Italy), introduced the European project ECHO, which provides access to historical documentary films. He offered interesting insights into techniques for automated indexing of key frames, object recognition, text extraction from audio and subtitles, etc., as well as a data model [pdf] which makes use of FRBR to describe audio-visual materials.
Another cluster of talks focused on web archiving. Most fascinating: the keynote by Hsinchun Chen (Artificial Intelligence Lab, U of Arizona) described how he uses web archiving techniques to document the activities of terrorists. Striking fact: according to Chen, 80% of anti-terrorism intelligence information is public, i.e., available in chat-rooms, on websites, in online videos, etc. The tools created by the lab make it possible to gather information from websites (reasonably dynamic) as well as forums (extremely dynamic), and then to run automatic statistical analysis on the content, for example to correlate the level of aggression in language with the actual real-life violence produced by a given group. The Artificial Intelligence Lab has gathered 1.2 terabytes of information on terrorist groups using these techniques.
There’s more to tell, but I’ve also got a full inbox of e-mail to read! I’ll write a little more about the conference later on in the week…