VIAFbot Debriefing
Wednesday, November 28th, 2012 by Max
Shortly after reaching the quarter-million-edit milestone, VIAFbot finished linking Wikipedia biography articles to VIAF.org. Examining the bot’s logs reveals telling statistics about the landscape of Authorities on Wikipedia. We now know how much linked authority data is on Wikipedia, its composition, and the similarities between languages.
First, let’s understand the flow of the bot’s job. With VIAFbot I sought to reciprocate the links from VIAF.org to Wikipedia, which were algorithmically matched by name, important dates, and selected works. It therefore started by visiting all the Wikipedia links that existed on VIAF.org. Note that owing to the delay between when the links were created and now, some of the pages had been deleted or merged (Fig. 1 orange region). For the rest of the setup it drew on German Wikipedia, whose community has put a lot of work into its authorities data. For each of our English matches, VIAFbot also loaded the equivalent German Wikipedia article where one was available (the “interwiki link” in Wikipedia parlance).
Next, VIAFbot searched those pages for the equivalent structured-data templates, Authority control on English Wikipedia and Normdaten on German Wikipedia, to see what preexisting authorities data they held. German Wikipedia shone with 92,253 Normdaten templates (Fig. 1 purple region), of which 74,864 had the VIAF parameter filled (Fig. 1 pink region), compared to English Wikipedia’s mere 9,034 templates with 770 VIAF IDs.
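To make the template check concrete, here is a minimal sketch of pulling a VIAF parameter out of a page’s wikitext. It is not VIAFbot’s actual code: the function name `find_viaf_id` and the regex approach are assumptions for illustration, and the pattern deliberately ignores templates nested inside the parameters.

```python
import re

# Template names as mentioned in the post; the parsing strategy is an assumption.
TEMPLATE_NAMES = ("Authority control", "Normdaten")


def find_viaf_id(wikitext):
    """Return (template_found, viaf_id) for the first matching template.

    viaf_id is None when the template is absent or its VIAF parameter is empty.
    """
    for name in TEMPLATE_NAMES:
        # Match {{Name}} or {{Name|param=value|...}} with no nested braces.
        pattern = r"\{\{\s*" + re.escape(name) + r"\s*(\|[^{}]*)?\}\}"
        match = re.search(pattern, wikitext, flags=re.IGNORECASE)
        if match is None:
            continue
        params = match.group(1) or ""
        for part in params.split("|"):
            key, sep, value = part.partition("=")
            if sep and key.strip().upper() == "VIAF" and value.strip():
                return True, value.strip()
        # Template present, but no usable VIAF parameter.
        return True, None
    return False, None
```

A page can therefore fall into one of three buckets: no template at all, a template without a VIAF ID, or a template with one, which mirrors the regions counted in Fig. 1.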
The program then compared the VIAF IDs supplied by English Wikipedia, German Wikipedia, and VIAF.org, although all three sources were not always present. As long as the available sources didn’t conflict, VIAFbot wrote the VIAF ID to the English Wikipedia page. If a conflict was found, the bot noted it on Wikipedia for human inspection, along with which sources conflicted. One telling statistic was how often the different sources disagreed with one another. The disagreement rates were surprisingly similar, but German Wikipedia seemed to disagree marginally less with VIAF.org, at 11.3% compared to English Wikipedia’s 15.9% (Fig. 2).
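The decision rule described here can be sketched in a few lines. This is an illustration under stated assumptions, not the bot’s real implementation: the function `reconcile`, the source labels, and the `"write"`/`"flag"`/`"skip"` return values are all hypothetical names, and it assumes a lone source is enough to write an ID.

```python
from itertools import combinations


def reconcile(enwiki=None, dewiki=None, viaf_org=None):
    """Decide what to do with the VIAF IDs offered by up to three sources.

    Returns (action, viaf_id, conflicting_pairs) where action is
    "write", "flag" (for human inspection), or "skip" (no source had an ID).
    """
    sources = {"enwiki": enwiki, "dewiki": dewiki, "viaf.org": viaf_org}
    # Keep only the sources that actually supplied an ID.
    present = {name: vid for name, vid in sources.items() if vid}
    if not present:
        return ("skip", None, [])

    distinct = set(present.values())
    if len(distinct) == 1:
        # Every available source agrees, so the ID is safe to write.
        return ("write", distinct.pop(), [])

    # Record exactly which pairs of sources disagree.
    conflicting = [
        (a, b)
        for a, b in combinations(sorted(present), 2)
        if present[a] != present[b]
    ]
    return ("flag", None, conflicting)
```

For example, `reconcile(enwiki="1", dewiki="2", viaf_org="1")` flags the page and reports that German Wikipedia disagrees with both other sources, while `reconcile(viaf_org="1")` simply writes the ID. Counting the flagged pairs per source combination is one way per-pair disagreement rates like those in Fig. 2 could be tallied.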
In the non-conflicting cases, of which there were 254,678, a different kind of error still turned up. Even where there was no disagreement among the sources, and probably in instances where VIAF.org was the only source, the wrong VIAF number was sometimes written. Some very dedicated Wikipedians took to reporting these errors, and VIAF.org will incorporate those corrections. That is the power of crowdsourcing to refine algorithmic accuracy.
The question still remains: how much are these links being used? Google Analytics on the VIAF.org site can help answer that. German Wikipedia was the largest Wikipedia referrer to VIAF.org as late as September 2012. VIAFbot started editing in October, and the effect was immediately tangible, with English Wikipedia soon gaining pole position and then doubling total referrals (Fig. 3). It must be said, though, that this level of viewership may not be sustained as the “curiosity clicks” of Wikipedians being notified of changes through their watchlists start to fade.
Still, don’t doubt the usefulness of the project. For instance, we received this email from John Myers of Union College in Schenectady, NY:
“I had an Arabic name to enter into a record as part of a note, and I wasn’t confident about the diacritics. So, I look in the authority file to temporarily download it, copy the form of the name, and then move on. Couldn’t find the name in OCLC. Look in Wikipedia under his common name – bingo. Even better, Wikipedia has a link to VIAF, double bingo! With the authorized form from VIAF, I could readily find the record in OCLC (I was tempted to copy the name form directly from VIAF, but didn’t want to push my luck.) The miracles of an interconnected bibliographic dataverse!”
VIAFbot had written the link for ‘Aziz ‘Aku ak-Misri only a few days prior.
The principal benefit of VIAFbot is the interconnected structure. Recognizing this, other Wikipedias (Italian and Swedish) have been in contact and asked for the same on their wikis. Yet to be truly interconnected, the next step is to integrate VIAF IDs not into any one Wikipedia but into the forthcoming Wikidata, a central database for all Wikipedias across languages. Fortuitously, the pywikidata bot framework is stabilizing, and I’m in need of a new project now.
Without confusion,
Max Klein (@notconfusing)






