Archive for the 'Wikipedia' Category

Sex Ratios in Wikidata, Wikipedias, and VIAF

Monday, May 13th, 2013 by Max

Last week I wrote about the ‘rope bridge’ between Wikidata and VIAF, and the new research it would afford. Today I bring you a sample of that research. I am investigating the sex associated with different Wikipedia biography articles for two reasons. Firstly, the Properties “Sex” and “VIAF” are two of the top 10 most used Wikidata Properties, with Sex at 587,312 items tagged, and VIAF at 301,763 (and rising, since VIAFbot hasn’t finished scraping all languages yet). VIAF independently records sex per VIAF item, which gives us two comparable datasets. Secondly, after the so-called “Categorygate” piece in the New York Times I dug into Wikidata’s Sex Property and wanted to shed some light on the model currently in use.

Currently the Wikidata Property for Sex states:

Sex for humans, should be one of male, female, intersex, or the special “unknown” value

Finding this to be a rather rigid view of the world, I started discussing it on the Discussion Page as per protocol. Of note, on the other hand, is that VIAF records “gender,” not “sex.” The current VIAF data model similarly limits values to male, female, or unknown, but a change to a more nuanced model is planned for June. It’s worth remembering that VIAF is populated with data from the many authority files it aggregates. One underlying authority file, which has a more nuanced view on this recording, is the Library of Congress Control Number (LCCN). The LCCN will record many “sexes” for a specific person with accompanying dates of validity. This at least shows that there are better ways of recording sex – if it’s necessary to record it at all – which prompts me to invite your input on the Wikidata Discussion Page about better ways to record sex. With that said, let’s dig into some graphs.

Sex Ratios by Language

The method used to perform this visualization is to view all the Wikidata items with Property:Sex and then look at the inter-language link section of the item to see which languages have articles relating to this item. Dividing along the lines of language, we can find sex ratios per language. Below shows each language with more than 1,000 articles tagged with sex data, sorted by the percentage of Female values.
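The counting described above can be sketched in Python. This is a minimal illustration rather than the script behind the charts: the shape of the item records, the `sex` and `sitelinks` field names, and the `min_items` argument are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def sex_ratios_by_language(items, min_items=1000):
    """Return {language: {sex: percentage}} for languages with enough tagged items.

    `items` is a hypothetical iterable of dicts such as
    {"sex": "female", "sitelinks": ["enwiki", "dewiki"]}.
    """
    counts = defaultdict(Counter)
    for item in items:
        sex = item.get("sex")
        if sex is None:  # no sex claim on this item, so it is not counted
            continue
        # Every language that has an article for the item gets one tally.
        for wiki in item.get("sitelinks", []):
            counts[wiki][sex] += 1

    ratios = {}
    for wiki, tally in counts.items():
        total = sum(tally.values())
        if total >= min_items:
            ratios[wiki] = {sex: 100.0 * n / total for sex, n in tally.items()}
    return ratios


def sort_by_female_share(ratios):
    """Order languages by their percentage of female values, highest first."""
    return sorted(ratios, key=lambda w: ratios[w].get("female", 0.0), reverse=True)
```

Calling `sort_by_female_share(sex_ratios_by_language(items))` then gives the left-to-right order for a chart like the ones below.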

Wikidata Sex Ratios By Language, Minimum 1000 Items

If you’re not well versed in Wikidata’s use of language codes, you can look them up. And if you’ve never browsed the winning and losing htwiki and tlwiki (the Haitian and Tagalog Wikipedias), you can instead peruse the list restricted to languages with a minimum of 10,000 items with sex data.

Wikidata Sex Ratios By Language, Minimum 10000 Items

Two notable things arise here. Firstly, Chinese Wikipedia is seemingly the most progressive. Secondly, the Intersex category fails to score a single pixel of recognition. In fact the Wikipedia with the highest ratio of Intersex values – as determined by Wikidata – is Korean Wikipedia, but at just 0.0078%.

Data Caveats

Is this data reliable? A lot of it was imported from the German and other major Wikipedias. That can be a problem, because for any given Wikipedia language there exist articles that have no linked equivalents in other languages. There may very well be Wikipedias with more or less skewed sex ratios, but they haven’t migrated their sex data to Wikidata, or they have no equivalent article in a language which has migrated its sex data. Let’s see which languages have the most articles associated with sex data, of those above 1,000.

Total Number of Wikidata Items Tagged with Sex

Unsurprisingly we get a very Western view of the world. But wait, there are other data sources to corroborate against; that was one of the points of VIAFbot importing VIAF IDs into Wikidata. Let’s imagine an enhanced version of Wikidata that uses VIAF sex data in addition to what’s currently tagged, using that VIAF ID bridge. I ran a simulation of such an enhanced version of Wikidata, but before we look at it, let’s understand VIAF’s own biases.

Introducing VIAF

VIAF IDs have gender info derived from National Library files. There’s hope this may give us a different picture because VIAF may be ever so slightly less severe in its skew, although looking at its list of contributors also reveals a Western bias. Of ~24 million VIAF records (not all about people), 1,299,396 have gender “male,” and 418,394 have gender “female.” This comes out to 24.36% female among the records carrying either value. (Unfortunately VIAF doesn’t note directly where LCCN has a more nuanced view, but it can be determined by crawling the RDF link to LCCN’s MARC XML, which I explain later.) Now to compare the Wikidata and VIAF-enhanced-Wikidata sex ratios we overlay the two graphs. Here, wherever you see light green, Wikidata’s data alone gave a higher female ratio, and where you see red, VIAF-enhanced-Wikidata data gives a higher female ratio.

Comparison of Wikidata Sex Ratios with and Without VIAF by Language, Minimum 1000 Items with sex

Reassuringly, VIAF and Wikidata only disagreed on 0.0024% of 91,406 matches. There were seven cases where LCCN did have multiple sexes and qualifying dates. Furthermore there are 52,407 cases where VIAF has Sex data but Wikidata does not. This might be a good juncture to import that data, if the Wikidata community wants.
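For readers who want to reproduce this kind of comparison, here is a minimal Python sketch of how the matched items could be sorted into agreeing, disagreeing, and VIAF-only cases. The pair-based input format and both function names are hypothetical, and the "Wikidata first, VIAF as fallback" rule in `enhanced_sex` is my assumption about how a simulated enhanced dataset would be built, not a description of the actual scripts.

```python
def compare_sex_data(matches):
    """Classify items that are linked to a VIAF record.

    `matches` is a hypothetical iterable of (wikidata_sex, viaf_gender) pairs,
    where None means that the source records no value.
    """
    summary = {"agree": 0, "disagree": 0, "viaf_only": 0,
               "wikidata_only": 0, "neither": 0}
    for wikidata_sex, viaf_gender in matches:
        if wikidata_sex is None and viaf_gender is None:
            summary["neither"] += 1
        elif wikidata_sex is None:
            summary["viaf_only"] += 1       # candidate for import into Wikidata
        elif viaf_gender is None:
            summary["wikidata_only"] += 1
        elif wikidata_sex == viaf_gender:
            summary["agree"] += 1
        else:
            summary["disagree"] += 1        # worth a human look
    return summary


def enhanced_sex(wikidata_sex, viaf_gender):
    """Assumed rule for the simulated data: keep Wikidata's value, else use VIAF's."""
    return wikidata_sex if wikidata_sex is not None else viaf_gender
```

The disagreement rate quoted above would then be `summary["disagree"]` divided by the number of pairs where both sources have a value.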

Conclusions

There are articles in Wikidata which are not currently tagged with sex information, but whose sex information can be programmatically determined. There is some indication that tagging more articles would tend to produce more even sex ratios in Wikidata. If that were true, it would mean that “male” articles are more likely to be associated with sex data, though we cannot be positive about that claim. Finally recall that Wikidata’s data model for sex could also use some attention, and you the community are the instruments for that.

Software used

I wrote some simple scripts to crawl Wikidata and compare it with VIAF and LCCN; they are on GitHub. I also modified code from the Wikidata community for parsing dumps, which I plan to contribute back.

Did you find anything confusing? Leave your comments below or find me online. On twitter I’m @notconfusing.

The Ropebridges: Authority Control in Wikidata

Thursday, May 9th, 2013 by Max

You may recall that our Wikipedia reciprocal linking robot “VIAFbot” finished adding Authority Control to more than a quarter of a million (English language) Wikipedia articles, but what was the utility? Five months on, that question has been answered. Luckily, and unsurprisingly, other netizens proved additional Wikipedia -> VIAF linking utility. Unanticipated reuse is the magic of collaborative and open datasets, and four such examples highlight the benefits of Library data in Wikipedia.

First was John Mark Ockerbloom’s Forward To Libraries which proposes “find in a Library” boxes in Wikipedia pages. The idea is compelling: facilitate automatic searches in your preferred library site on the topics of Wikipedia articles — one option utilizes VIAF IDs.

Similar look-up facilities were created by Owen Stephens and Thomas Meehan, each pointing inquiries at the British Library site and other UK academic resources. Stephens’ “contemporaneous” look-up finds authors sharing their birth year with the subject of the Wikipedia page in question. Meanwhile Meehan’s bookmarklet will funnel you into relevant pages linked by VIAF at UCL’s Explore and COPAC.

VIAF connections can also pave the way for new scholarly research. A team from Vienna University of Technology released a paper that visualized Art History networks of Wikipedia, through VIAF IDs, and then ULAN. Here you can see the proportion of Art History subjects in Wikipedia, displayed on two dimensions derived from the ULAN connection: time and nationality.

All of this is to say that VIAF data in English Wikipedia can act as a very good ropebridge that allows for reuse, or recombination. The idea of a ropebridge is apt because the connection is somewhat shaky: at the moment it’s free-text, semi-structured data that can be changed by anybody. But that doesn’t mean that the chasm isn’t being crossed.

Can you spot the weakness in all this collaboration though? We focused our first effort on English Language Wikipedia. The Germans, to their credit, have just as many VIAF IDs in their Wikipedia. The Italians copied the English Language data. However, these separate efforts are not scalable to all 285 Wikipedias, nor do they allow all 285 Wikipedias to collaborate on the language-neutral VIAF Unique Identifiers.

Fortunately there is a solution, and that solution is Wikidata. Wikidata is the first new Wikimedia Project since 2006, and will do three things. It will organize inter-language links into a central database (inter-language linking before was arduous and asymmetric). It will provide a central store of Semantic Data from the Wikipedia articles. And in the future it will be able to query that semantic data. Want to know more about Wikidata? Then look up Wikidata on Wikidata (obviously?!).

    Now for a surprise – I’ve just finished migrating English Wikipedia’s VIAF data to Wikidata, and German, French, Italian, and Japanese datasets are in progress. (Code on Github). It takes about two weeks to inspect, clean, and copy the data over from each Wikipedia. I’ll post a full statistical breakdown once all the languages have finished. For now I’ll just say that the Wikidata VIAFbot is also migrating LCCN, GND, BNF, and SUDOC Identifiers as well as integrating for the first time ISNI IDs. At the time of this writing it records 750,000 edits and counting.

    What does VIAF in Wikidata look like, you ask? All pages about encyclopedic concepts are known as “Items” in Wikidata parlance, so let’s inspect the item for Germaine Greer.

    [Screenshot: Wikidata claims for the Germaine Greer item]

    We first see all the Semantic Data Wikidata has about this topic. Each modicum of data is known as a “Claim” in Wikidata; it is a triple, structured as [this page] [property] [value]. You can see that [Germaine Greer] [GND (read: "is a " according to the German National Library)] [Person], and that [Germaine Greer] [is of sex] [female]. You can also see here that she’s got a lot of identifiers associated with her thanks to VIAFbot, which has sourced where it found the original VIAF ID. Now let’s draw our attention to the bottom of the page to understand the impact.
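To make the triple structure concrete, here is a toy Python model of claims. It is only an illustration: the `Claim` type, the `values_for` helper, and the plain-text property labels are hypothetical and do not reflect Wikidata's real identifiers or API.

```python
from typing import NamedTuple


class Claim(NamedTuple):
    """One [item] [property] [value] triple, using plain-text labels."""
    item: str
    prop: str
    value: str


# Illustrative claims taken from the two examples in the text above.
claims = [
    Claim("Germaine Greer", "GND type", "Person"),
    Claim("Germaine Greer", "sex", "female"),
]


def values_for(claims, item, prop):
    """Return every value recorded for the given item and property."""
    return [c.value for c in claims if c.item == item and c.prop == prop]
```

An item with no claim for a property simply yields an empty list, which is how an untagged biography would look to any script reading the data.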

    [Screenshot: inter-language links for the Germaine Greer item]

    This Wikidata page is associated with articles in 48 other languages. Each of those articles can capitalize on the semantic data stored above. That’s the beauty of Wikidata: all of the data reuse cases that previously only worked for the English language Wikipedia will now work for all of them. Austrian researchers can inspect Art History biases of not just English Wikipedia, but of dansk, Ελληνικά, हिन्दी, interlingua, Runa Simi, 中文, etc. That’s one of the starting reasons why it’s important to have Authority Control in Wikidata. There are of course more directions than one to travel across a ropebridge. Leading data-mules of bibliographic information across from VIAF into Wikidata is next.

    Wikipedia Analytics Engine

    Monday, January 14th, 2013 by Max

    Wikipedia has its own data structure in templates with parameters. If you are not familiar with Wikipedia templates, an example is “infoboxes,” which show up as fixed-format tables in the top right-hand corner of articles. Templates, and the metadata they contain, have been exploited for research in the past, but I’ve wanted to create a toolchain that would connect Wikipedia data and library data. I also wanted to be able to include a few more features than the standard Wikipedia statistics engines. For instance: (a) working over all pages in a MediaWiki dump to analyze the differences between pages that do and don’t include certain templates, (b) taking into account what I term subparameters of templates, and (c) doing it all in a multithreaded way. Here is an early look at some analysis which may shed light on the notion of systemic biases in Wikipedia.

    Birthdates

    Of all the biases Wikipedia is accused of, “recentism” has seemed to me one of the more subtle. To investigate, I wanted to compare the shape of the curve of global population to that of birthdates of biography articles on Wikipedia. For data, I looked in templates, specifically English Wikipedia’s {{Persondata}} for the parameter DATE OF BIRTH, and German Wikipedia’s {{Personendaten}} for the parameter GEBURTSDATUM. For the comparison with global population I used UN data. In both cases you can see that the Wikipedia curves are below global population until about 1800, and outpace population in growth thereafter. These more exponential curves corroborate the claim that Wikipedia leans toward covering more recent events more heavily. Curiously both Wikipedia lines peak at about 1988 and then all but disappear. If you want a biography article on Wikipedia apparently it helps to be 25 years old.
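Pulling a parameter such as DATE OF BIRTH out of a template can be sketched with a regular expression. This is a simplified stand-in for the real toolchain: the `template_params` function is hypothetical, and it only handles template calls whose values contain no nested templates or piped links.

```python
import re


def template_params(wikitext, template_name):
    """Return the parameters of the first simple {{template_name | ...}} call.

    Returns None when the template is absent or has no parameters. Values
    containing nested templates or piped links are not handled.
    """
    pattern = re.compile(
        r"\{\{\s*" + re.escape(template_name) + r"\s*\|([^{}]*)\}\}",
        re.IGNORECASE,
    )
    match = pattern.search(wikitext)
    if match is None:
        return None
    params = {}
    for part in match.group(1).split("|"):
        if "=" in part:
            key, value = part.split("=", 1)
            params[key.strip()] = value.strip()
    return params


# Made-up article text for illustration.
sample = "{{Persondata\n| NAME = Example, Person\n| DATE OF BIRTH = 1 January 1900\n}}"
birth = template_params(sample, "Persondata")["DATE OF BIRTH"]
```

Running the same function with "Personendaten" and reading GEBURTSDATUM would cover the German case; turning the extracted strings into years is a separate parsing step.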

    Occurrences of Birth Dates in English and German Wikipedia Compared to Global Population

    Simple Metrics

    This is quite a simple analysis. One of the chief benefits of working with OCLC is that there is a lot of bibliographic data to play with, so let’s marry the two sources: Wikipedia template data and OCLC data. For this section I queried all the Wikipedia pages from December 2012 for all the citation templates, and extracted all the ISBNs and OCLC numbers.

    One way to characterize the cited books is audience level, derived from WorldCat holdings data. Audience level is expressed as “a decimal between 0.01 (juvenile books) and 1.00 (scholarly research works).” Taking simple mean averages of audience level across all citations gives 0.47 on English Wikipedia. In German it’s 0.44. If we plot the histograms of each, we get moderately normal curves that actually even tend to skew left.

    Audience Level English Audience Level German

    Is Wikipedia stuffed with incomprehensibly dense knowledge? Maybe, but its citations aren’t necessarily.

    Subject Analysis

    Another bias claim lodged against Wikipedia is that content is heavily concentrated towards certain subjects. Is the same true for its citations? Every Wikipedia article could have any number of ISBNs or OCLC numbers (see figure below). In FRBR terms, these identifiers relate to manifestations, so using WorldCat they were clustered into works, at the expression level. And every work is about any number of subjects. Here I used the FAST subject headings, which are a faceted version of Library of Congress Subject Headings.

    Subject Analysis Procedure for Wikipedia

    Then I totaled the number of citations on Wikipedia within each subject, creating a list of subjects with their respective citation frequency. Utilizing that list, here is a word-cloud visualization of Wikipedia’s 100 most cited subjects, inferred through the subjects assigned to the works cited.
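The tallying step can be sketched in a few lines of Python. The two lookup tables stand in for the WorldCat clustering and the FAST assignment, which I am not reproducing here; their names and the dict-based format are assumptions for the example.

```python
from collections import Counter


def subject_citation_counts(citations, work_of, subjects_of):
    """Count citations per subject heading.

    citations   -- iterable of cited identifiers (ISBNs or OCLC numbers),
                   one entry per citation
    work_of     -- hypothetical mapping from identifier to a work id
    subjects_of -- hypothetical mapping from work id to a list of FAST headings
    """
    counts = Counter()
    for identifier in citations:
        work = work_of.get(identifier)
        if work is None:  # identifier could not be clustered into a work
            continue
        for subject in subjects_of.get(work, []):
            counts[subject] += 1
    return counts
```

`subject_citation_counts(...).most_common(100)` then yields the subject and frequency pairs that a word-cloud tool needs.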

    A word cloud of the FAST Subject Headings of the most cited books in English Wikipedia

    There is a large preponderance of subjects that confirm the subcultures for which Wikipedia is noted for its bias: Politics, Military History, Religion, Math and Physics, Comics and Video Games, and Mycology. At least if they are going to be overrepresented in general, they should be well cited.

    Below is the same algorithm applied to a different Wikipedia – can you guess the language?  Quite funny to see courts, administrative agencies, and executive departments with such prominence.

    dewiki-fast-word-cloud

    That should give just a glimpse of the range of avenues of inquiry available from being able to deeply search and connect Wikipedia template parameters with library data. Any special requests for specific queries?

    Wikily yours,

    Max

    twitter: notconfusing

    OCLC Research 2012: Wikipedia and Libraries

    Tuesday, December 18th, 2012 by Merrilee

    At the end of 2012, we are doing a mini series of blog postings to reflect on some of the year’s high points. This posting is the first in the series. Watch for updates!

    2012 has been a great year for me, because I’ve had the privilege of seeing a project I’ve been passionate about for some time come to life — exploring the connection between Wikipedia and Libraries. Around this time last year I began making connections with the Wikipedia GLAM community, and exploring the idea of OCLC Research hosting a Wikipedian in Residence. We were fortunate enough to receive organizational support for this idea, and with help from folks in the Wikipedia community, craft a position description, and bring Max Klein into our team in OCLC Research. Having Max working with us has been terrific and not just because of his Wikipedia skills.

    Since we’ve had Max on board, we attended Wikimania, have held not one but two Wikipedia Loves Libraries events, held two successful webinars attended by more than 500 librarians, done countless videos (okay, I counted them up and there are at least 8). And then there was the Open Access Wikipedia Challenge on P2PU. Oh, and VIAFbot, which brought authority control templates and VIAF links to thousands of articles on the English language Wikipedia.

    Earlier this month, I presented a breakout session at CNI (along with Sara Snyder, from the Archives of American Art) on the connection between Wikipedia and Libraries. The session was well attended but more importantly, there was a lot of interest and excitement about the connection between Wikipedia and libraries. I’m very pleased that Max’s term has been extended, so he can help us explore some of those possibilities. So as we close out a successful and productive year, I look forward to another year of highlights in this area.

    Want to know more? View all the HangingTogether blog posts on this topic!

    VIAFbot Debriefing

    Wednesday, November 28th, 2012 by Max

    Shortly after reaching the 1/4 million edits milestone, VIAFbot finished linking Wikipedia biography articles to VIAF.org. Examining the bot’s logs reveals telling statistics about the landscape of Authorities on Wikipedia. We can now know how much linked authority data is on Wikipedia, its composition, and the similarities between languages.

    First, let’s understand the flow of the bot’s job. With VIAFbot I sought to reciprocate the links from VIAF.org to Wikipedia, which were algorithmically matched by name, important dates, and selected works. Therefore it started by visiting all the Wikipedia links that existed on VIAF.org. Note that owing to the delay between when the links were created and now, some of the pages had been deleted or merged (Fig. 1 orange region). For the rest of the set-up it utilized German Wikipedia, which has focused a lot on their authorities data. VIAFbot also loaded all available equivalent German Wikipedia articles to our English matches, the “interwiki link” in Wikipedia parlance.

    Next VIAFbot searched for the equivalent structured-data Authority control, and Normdaten templates to see what preexisting authorities data those pages held. German Wikipedia shone with 92,253 Normdaten templates (Fig. 1 purple region), of which 74,864 had the VIAF parameter filled (Fig. 1 pink region), compared to English Wikipedia’s mere score of 9,034 templates with 770 VIAF IDs.

    Figure 1.

    The program then compared the VIAF IDs supplied by English Wikipedia, German Wikipedia, and VIAF.org, although all three sources were not always present. As long as the sources present didn’t conflict, VIAFbot wrote the VIAF ID to the English Wikipedia page. If a conflict was found, then the bot noted it for human inspection on Wikipedia along with which sources conflicted. One telling statistic was how often the different sources disagreed with one another. These disagreement rates were surprisingly similar, but German Wikipedia seemed to disagree marginally less with VIAF.org at 11.3% compared to English’s 15.9% (Fig. 2).
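The decision rule described above can be sketched roughly as follows. This is a simplified reconstruction under my own assumptions, not VIAFbot's actual code: the function name, the source labels, and the choice to report every present source on a conflict (rather than only the ones that differ) are all hypothetical.

```python
def decide_viaf_id(enwiki_id=None, dewiki_id=None, viaf_org_id=None):
    """Decide what to do with one article given up to three candidate VIAF IDs.

    Returns ("write", viaf_id) when every available source agrees,
    ("conflict", [source names]) when they differ, and ("skip", None)
    when no source supplies an ID.
    """
    sources = {"enwiki": enwiki_id, "dewiki": dewiki_id, "viaf.org": viaf_org_id}
    present = {name: value for name, value in sources.items() if value is not None}
    if not present:
        return ("skip", None)
    distinct = set(present.values())
    if len(distinct) == 1:
        return ("write", distinct.pop())
    # Sources disagree: report them so a human can inspect the case.
    return ("conflict", sorted(present))
```

Note that a lone source counts as "no conflict" in this sketch, which matches the observation in the next section that some wrong IDs came from cases with only the VIAF.org source.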

    Figure 2.

    In the noncontroversial non-disagreement cases, of which there were 254,678, there were still some errors found of a different variety. Even though there was no disagreement among the sources, and probably in the instances in which there was only the VIAF.org source, the wrong VIAF number was written. Some very dedicated Wikipedians took to reporting these errors, and VIAF.org will incorporate those corrections. That is the power of crowdsourcing refining algorithmic accuracy.

    The question still remains of how much these links are being used. Google Analytics on the VIAF.org site can help answer that. German Wikipedia was the largest referring Wikipedia as late as September 2012. VIAFbot started editing in October, and the effect was immediately tangible – soon gaining pole position and then doubling total referrals (Fig. 3). It must be said though that this level of viewership may not be sustained as the “curiosity clicks” of Wikipedians being notified of changes through their watchlists start to fade.

    Figure 3. Referral traffic to VIAF.org.

    Still, don’t doubt the usefulness of the project. For instance we received this email from John Myers of Union College in  Schenectady NY,

     “I had an Arabic name to enter into a record as part of a note, and I wasn’t confident about the diacritics.  So, I look in the authority file to temporarily download it, copy the form of the name, and then move on.  Couldn’t find the name in OCLC.  Look in Wikipedia under his common name – bingo.  Even better, Wikipedia has a link to VIAF, double bingo!  With the authorized form from VIAF, I could readily find the record in OCLC (I was tempted to copy the name form directly from VIAF, but didn’t want to push my luck.)  The miracles of an interconnected bibliographic dataverse!”

    VIAFbot had written the link for ‘Aziz ‘Aku ak-Misri only a few days prior.

    The principal benefit of VIAFbot is the interconnected structure. Recognizing this, other Wikipedias (Italian and Swedish) have been in contact and asked for the same on their wikis. Yet to truly be interconnected the next step forwards is to integrate VIAF IDs not into any one Wikipedia, but into the forthcoming Wikidata, a central database for all Wikipedias across languages. Fortuitously, the pywikidata bot framework is stabilizing, and I’m in need of a new project now.

    Without confusion,

    Max Klein (@notconfusing)

    Open Access Wikipedia Challenge on P2PU

    Tuesday, October 23rd, 2012 by Max

    It’s been traditional recently to hold Wikipedia Loves Libraries events during Open Access Week, and I fully support the practice. What’s also been traditional, in a way that I wanted to change, was the editathon format for those events. After racking my brain and consulting with other Wikipedia Loves Libraries volunteers on experimental kinds of training and celebration, we came up with the Open Access Wikipedia Challenge. The challenge is to embed media that was harvested from Open Access journals in Wikipedia, and we created a special edition barnstar for completing it. This challenge is totally friendly to newbies and librarians, as it includes six screencast tutorial videos, over an hour in total, that explain every detail from account creation to Wikipedia’s transclusion, and each module has waypoint challenges. At the time of this writing nine challengers have already accepted.

    Below is the introductory video which is hosted on youtube, and the challenge is on P2PU.

    Max Klein
    twitter: @notconfusing 

    Wikipedia Loves Libraries — how you can participate?

    Monday, October 15th, 2012 by Merrilee

    Over the summer, our Wikipedian in Residence, Max, did two webinars that gave librarians a glimpse behind the curtain of Wikipedia. One of the things he highlighted in those webinars was Wikipedia Loves Libraries, a Wikipedia-conceived initiative to bring libraries (and archives) closer together. We were heartened to learn that at least two of the events that are planned (at the Multnomah County Library on October 27th and West Hollywood on November 17th) were at least in part inspired by our webinars! There are also events planned at the New York Public Library, Princeton University, the Smithsonian, George Washington University, Indiana University, and elsewhere. You can check out the full list here.

    What about you? Are you interested in hosting an event and partnering with local Wikipedians? There is a handy form to get you started, and lots of good models online. And if you want some handholding or have questions, don’t hesitate to get in touch.

    You can also watch the webinars if you are intrigued.

    Cousins: The Bookworm and Wikignome

    Wednesday, October 3rd, 2012 by Max

    As we all know, the best you can hope of a meeting is not a conclusion, but a chuckle at a statistical oddity. When OCLC’s Top Library Loans List came out, such a positive meeting was had. Upon glancing at the pulp fiction (see chart below) I wondered if Wikipedia editors were also driven by such trivia. I turned to Python, R, and article edit histories to find out.

    The top 10 list is such:

    1. Hunger Games by Suzanne Collins
    2. Catching Fire by Suzanne Collins
    3. Mockingjay by Suzanne Collins
    4. Fifty Shades of Grey by E.L. James
    5. Game of Thrones by George R. R. Martin
    6. The Help by Kathryn Stockett
    7. Thinking, Fast and Slow by Daniel Kahneman
    8. Steve Jobs by Walter Isaacson
    9. Quiet: The Power of Introverts in a World That Can’t Stop Thinking by Susan Cain
    10. Dance with Dragons by George R.R. Martin

    Now let’s take a tour through Wikipedia’s history to get a feeling for the editors’ affinities towards these monographs:

    We can tell that there isn’t a lot of similarity between the novels, except that they’ve all experienced small peaks within the last year or so. That isn’t surprising, because the list in question is for the most requested inter-library loans for the period of a year starting July 11th 2011. So let’s take a look at how actively edited these books were in that time frame.

    Besides the fact that the relationship here looks a bit exponential, as we’d expect of crowdsourced material, there is another curious correlation afoot. The ordering of the monographs by edits is remarkably similar to the ranking by loan-requests. Keeping the by-edits ordering, then charting the loan positions, we get something reassuringly linear.

    In fact the Top 6 are exactly predicted. If you were going to use quick-sort inversion counting analysis to compare the closeness of two lists, I believe you get the low count of 2 (correct me if I’m wrong). This indicates that there is a possible correlation between book demand in the library and Wikipedia editor interest online. So librarians take note: when deciding on your stock, pre-empt the rush and look to Wikipedia – the Wikignomes have a psychic connection with the bookworms.
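For anyone who wants to check an inversion count themselves, here is a small Python sketch. It uses a plain pairwise comparison rather than a sort-based method, which is slower but perfectly adequate for a list of ten; the function name and the example ranking are made up for illustration and are not the measured by-edits ordering.

```python
def count_inversions(ranks):
    """Count pairs that appear in the opposite order to their rank.

    `ranks` lists the loan rank of each book, in by-edits order. A count of
    zero means the two orderings are identical.
    """
    inversions = 0
    for i in range(len(ranks)):
        for j in range(i + 1, len(ranks)):
            if ranks[i] > ranks[j]:
                inversions += 1
    return inversions


# Made-up ordering: the first six positions match, then two adjacent swaps.
hypothetical = [1, 2, 3, 4, 5, 6, 8, 7, 10, 9]
```

The maximum possible count for ten items is 45 (a fully reversed list), which gives a sense of scale for a count as low as 2.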

    Not confusingly yours,

    twitter.com/notconfusing

    Max Klein, Wikipedian in Residence

    P.S. The code to look at and graph Wikipedia articles is a small project I’ve open sourced, and is available on github. I’ve also built in the functionality to pull stats from a Wikipedia category, which allows for such fun as, looking at the entire edit histories of all the Pulitzer Prize Winners.

    Wikimania 2012: copyright and closing thoughts

    Tuesday, August 14th, 2012 by Merrilee

    This is the last in a series of postings on my first Wikimania. I’m (mostly) focusing on the connection between Wikipedia and libraries, and approaching topics thematically, rather than going through the conference in order.

    I was distracted by the Society of American Archivists meeting (which I’ll be blogging about soon!), but I’m back to wrap up Wikimania.

    Wikisource, Wikicommons and the copyright conundrum

    Discussion about IP rights came up in many discussions and presentations at Wikimania (as you would expect with a group so dedicated to increasing access to free knowledge), but the one I found most interesting was an Oxford style debate on the topic “That all Wikimedia projects should have Fair Use, or none of them.” Why is this important? Because Wikimedia is more than just Wikipedia, and has a range of projects which make content available. For example, The National Archives and Records Administration (NARA) here in the United States is contributing to both Wikisource and to Wikicommons (I explained a little about these two Wikipedia “sister” projects in a previous post). Because Wikimedia projects exist in a very international context, contributions to Wikisource and Wikicommons must be very strictly in the public domain or covered under an appropriate licence that renders the materials as “free content” in a similar way (it’s important to note here that licensing that requires attribution is acceptable). Putting materials into either project and claiming fair use is in fact strictly prohibited.

    I know that many institutions will find both Wikisource and Wikicommons to be attractive options, but there are few (U.S. based) institutions that will be able to put most or all of what they have digitized into these projects (NARA may be an exception, as may other government institutions or those who exclusively collect material from the 19th century or earlier.) This is too bad, because otherwise, Wikimedia projects are ideally aligned with the mission and aims of cultural heritage institutions. Still, there is much to collaborate around, so I’m still very excited!

    Conference roundup

    I want to wrap up by giving some of the high points as well as oddities I noted at this conference. The conference was very inexpensive compared to many library conferences (thank you, sponsors!). Registration ranged from $35 to $95, which included morning food and lunch (additionally, there was a reception each evening with some level of food and drink). This is the first conference I’ve attended (with the possible exception of ALA) which was “trending” on Twitter. Thanks to ubiquitous wireless, ample power, and an enthusiastic cadre of Twitterati, the conference stream was useful, and at times overwhelming. All the people I met were amazing. During lunch I was touched when people noticed that I was scanning for a friendly face and invited me to sit with them.

    The conference was not without flaws. On several panels, at least one of the scheduled presenters was not present. Odd to me, everyone presented from their own laptop, rather than consolidating presentations on a single machine. To make matters worse, almost everyone was presenting from a Mac and the majority of them had difficulty shifting displays. This was amusing to me, both because of the Mac’s “intuitive” reputation and also because of the otherwise extraordinary tech prowess of presenters. However, the time wasted dorking with technology was considerable. If Ranganathan had rules for conferencing, one of them would surely be “save the time of the attendee.”

    Attending Wikimania was a terrific experience and I hope I have the opportunity to attend in the future. As I have said repeatedly, I have been excited about the potential for alliances between libraries and other cultural heritage institutions and Wikipedia / Wikimedia. Attending the conference only cemented my conviction.

    Goodbye Wikimania, see you next time!

    If you want to take a look at some other blog posts summarizing the conference from the LAM perspective, see Ed Summers on Wikimania Revisited and the Biodiversity Heritage Library’s report, Wikimania 2012 & BHL

    Wikipedia and Libraries: The Afterwebinar

    Thursday, August 2nd, 2012 by Max

    At 556 attendees strong, the recent OCLC Research Webinars “Librarians are Wikipedians Too” and “Wikipedia and Libraries: The Connection” piqued the progressive, exploratory minds of Librarians worldwide. Convinced tech managers at independent research libraries asked for help jumping onto the Commons mass upload bandwagon. Reference Librarians, having heard both explained, started to dream up combined workshop / editathons. As well as workshops and editathons, the webinars outlined the 5 classical points of collaboration between the two communities, and how to forensically evaluate which areas of Wikipedia are fertile for Library linking.

    A webinar is nothing without its audience and their questions. We answered as many as we could at the time, but some questions were more difficult, and now that I am clear of time constraints, I’ll answer them in full.

    Where to go next:

    The answer to where to go next is somewhat of a mantra we hope to impose: “the wiki”. The Wikipedia Loves Libraries portal is a growing base of related materials, ideas, and links on the subject. We recognize that using a wiki to get help with wikis can be somewhat of a contradiction, and have set up a simple form to get paired with Wikipedians in a more traditional way.

    Unanswered questions from Chat:

    Question from Bob Kosovsky to All Participants (02:54:43 PM):

    Max: WP is 6th most used website; but acc. to visualizations I’ve seen, DPpedia is THE most used data source; can you talk about the implications of DPpedia being the MAIN source of data/information for numerous websites?

    I think you’re referring to this image,

    Linked Open Data

    which shows DBpedia as the center of the Linked Open Data universe. DBpedia is a database of information scraped and inferred from Wikipedia. Its size implies that Google searches will be eerily smart, and occasionally wrong. Beyond that, it signals that despite some best efforts to deride crowdsourcing as untrustworthy, the internet is utilitarian.

    Question from Madeline Wagner to All Participants:

    I would like to know more about how “minority” views on a subject are handled : ie the recent article by a scholar who tried to edit the entry on the Haymarket affair.

    This question leads to an advanced and philosophical design choice of Wikipedia. The controversy around the Haymarket affair on Wiki (chronicled here) highlights that Wikipedia is not an encyclopedia of truth but an encyclopedia of proof. That is, by design, the facts that belong on Wikipedia are the ones that can be sourced, and true-but-not-provable statements aren’t valid Wikipedic content. Wikipedia is this way for practical reasons. For a full justification read the essay “Wikipedia:Truth – A place for minority views.”

    Question from Michele Combs to All Participants (02:58:36 PM):

    Rule of thumb seems to be “no institutional WP accounts,” only individual ones so that there is a single responsible person for each edit; would you advocate permitting creation of institutional accounts for creation/editing so as to make edits more credible/authoritative?

    Let us be pragmatic. It’s highly unlikely that Wikipedia would ever change its policy to allow group accounts, because once one account’s edits are made more authoritative than another’s, we lose the equity granted to anonymous users – a very historic tenet. To achieve unity and community respect for a library’s editors as a whole, I’d suggest using a naming scheme in the vein of [name]+[institution]. For instance, in my personal life I am User:Maximilianklein, but when I edit for OCLC I use User:Maximiliankleinoclc, which ties my reputation to my institution’s.

    Question from Kjerste Christensen to All Participants (02:33:07 PM):

    If your library has a strong focus in a particular area, what about partnering with a WikiProject related to that subject area to look up information or scan media as needed?

    This isn’t really a question at all but a fantastic comment. Click here to view the directory of WikiProjects.

    And remember — it’s not confusing: http://twitter.com/notconfusing.