Cross-posting the Collection to Wikimedia Commons and the Internet Archive

I’ve said it before and I’ll say it again: it’s simply not enough to publish assets on our own website. We cannot expect people to come to www.brooklynmuseum.org; we need to reach out to communities on the web to engage interest in our collections. With that, I’m happy to announce that we are now cross-posting our collections to Wikimedia Commons and the Internet Archive. This is something we’d wanted to do for a long time, but to get here we first needed to get through the recent rights project and the records release project. Now that those two key elements are in place, cross-posting made sense as the next logical step.

I should probably take an aside at this point and remind everyone that we’ve not always had great success working with the Wikipedia community.  If you remember Wikipedia Loves Art and managed to get through all the blog posts including the four-part lessons learned, then you know just how complicated and painful that project was.  However, the wiki community is one of the most vibrant on the web and, as a community-minded organization, we needed to regroup and figure out a better way of working with these folks—turning our backs and giving up was not an option.  Let’s look at how this is different.

One of the biggest issues with Wikipedia Loves Art was how much work it created for all involved: countless hours from volunteer photographers, hundreds of staff hours to clean up and caption submissions, and even more hours from the wiki community to upload all the assets. This was a project that simply didn’t scale. So, in this round, our aim was to keep this a collaborative process with much simpler information management. By cross-posting our assets using a programmable bot, we can get the information to the wiki community in a much more efficient way.

Since the wiki is a living, breathing thing that constantly changes, one of the most important parts of this project is creating a second bot that will monitor the changes the wiki community makes to these records and show us how the records are used. Once we have a grasp on these collective changes, we can think about ways to integrate that information back into our collection online. It’s this second bot that creates a two-way exchange and allows us to collaborate more effectively with the wiki community. In addition, this second bot will write our own metadata changes to the wiki, so our data there does not get stale. This is a process that must be carefully choreographed so we don’t overwrite community changes, but we think this delicate dance is one that we can learn an enormous amount from.

All that said, the first bot has been created and is happily uploading assets as I write this. The second bot will follow shortly after we’ve gotten everything posted.
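To make the "don't overwrite community changes" idea a little more concrete, here is a minimal sketch of one way a sync bot could decide what to do with each metadata field. It is not the Museum's bot. The function names, the `Action` labels, and the idea of keeping a snapshot of what was originally published are all assumptions made for illustration. The sketch compares three versions of a field (what was published, what the museum record says now, and what the wiki says now) and only pushes a museum change when the community has not touched that field.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical outcomes for one metadata field."""
    NO_CHANGE = "no_change"            # nothing to do
    PUSH = "push"                      # only the museum changed the field
    KEEP_COMMUNITY = "keep_community"  # only the wiki community changed it
    REVIEW = "review"                  # both changed it, and they disagree


def plan_field(published, museum_now, wiki_now):
    """Decide what to do with a single field.

    published  -- value that was originally uploaded
    museum_now -- current value in the museum's own record
    wiki_now   -- current value on the wiki page
    """
    museum_changed = museum_now != published
    wiki_changed = wiki_now != published
    if museum_changed and wiki_changed:
        # Both sides edited: fine if they agree, otherwise a person decides.
        return Action.NO_CHANGE if museum_now == wiki_now else Action.REVIEW
    if museum_changed:
        return Action.PUSH
    if wiki_changed:
        return Action.KEEP_COMMUNITY
    return Action.NO_CHANGE


def plan_record(published, museum_now, wiki_now):
    """Return {field_name: Action} for every field seen in any version."""
    fields = set(published) | set(museum_now) | set(wiki_now)
    return {
        name: plan_field(
            published.get(name), museum_now.get(name), wiki_now.get(name)
        )
        for name in sorted(fields)
    }


if __name__ == "__main__":
    # Made-up example record, not real collection data.
    published = {"title": "Vase", "date": "ca. 1900", "medium": "Glass"}
    museum_now = {"title": "Vase", "date": "1902", "medium": "Blown glass"}
    wiki_now = {"title": "Vase (detail)", "date": "ca. 1900", "medium": "Glass, blown"}
    for field, action in plan_record(published, museum_now, wiki_now).items():
        print(field, action.value)
```

The design choice worth noting is that nothing is ever pushed over a community edit automatically. Whether a conflict should go to human review, be appended with a note, or be handled some other way is exactly the kind of thing that would need to be worked out with the wiki community.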

By contrast, posting to the Internet Archive was a much simpler process, primarily because it’s a one-way dump: they’ve got a clearly documented API and a very open structure to work with. It’s a bit of a blank slate. You can create your own fields, which means you can apply rights information as needed. We are posting all of our “no known copyright” images there, as well as all images that we’ve licensed with CC-BY-NC. Wikimedia Commons will be getting fewer assets because it doesn’t accept Creative Commons licenses that restrict use to non-commercial purposes, and retaining commercial rights is still something the Museum is interested in.
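The rights rule in the paragraph above boils down to a small lookup. Here is a rough sketch of it, assuming each image carries a single rights label. The label strings and the `destinations_for` helper are hypothetical, and the mapping is only my reading of the paragraph, not the bot's actual code.

```python
# Hypothetical rights labels; the real field values may differ.
NO_KNOWN_COPYRIGHT = "no known copyright"
CC_BY_NC = "CC-BY-NC"

# Which sites an image can be cross-posted to, by rights label.
# Wikimedia Commons is left off the CC-BY-NC row because it does not
# accept licenses restricted to non-commercial use.
DESTINATIONS = {
    NO_KNOWN_COPYRIGHT: ["Internet Archive", "Wikimedia Commons"],
    CC_BY_NC: ["Internet Archive"],
}


def destinations_for(rights_label):
    """Return the sites an image may go to; unknown labels go nowhere."""
    return list(DESTINATIONS.get(rights_label, []))


if __name__ == "__main__":
    for label in (NO_KNOWN_COPYRIGHT, CC_BY_NC, "in copyright"):
        print(label, "->", destinations_for(label))
```

Defaulting to an empty list for anything unrecognized is deliberate: an image with an unexpected or missing rights label should not be posted anywhere until someone has looked at it.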

Posting to these two communities complements what we’ve been doing at The Commons on Flickr.  Seb has an excellent blog detailing why The Commons on Flickr is fundamentally different in nature and these are all things that we agree with.  Here’s to hoping that bot number two helps to bridge some of that gap.

Our bots are uploading now and will be making progress throughout the next several weeks to finish up the initial upload.  To see the progress, check out: Internet Archive | Wikimedia Commons.

Many thanks are owed to Paul Beaudoin for his great work (bot magic, really!) on this project – thanks, Paul. For all of their help and coordination, thanks are also owed to Maarten Dammers, Richard Knipel, and Liam Wyatt on the Wikipedia front, and to Alexis Rossi, George Oates, and Yolanda King at the Internet Archive. Cheers!

Author profile

About Shelley Bernstein

Shelley is the Chief of Technology at the Brooklyn Museum where she works to further the Museum's community-oriented mission through projects including free public wireless access, web-enabled comment books, projects for mobile devices and putting the Brooklyn Museum collection online. She is the initiator and community manager of the Museum's initiatives on the social web. She organized Click! A Crowd-Curated Exhibition, Split Second: Indian Paintings, and GO: a community-curated open studio project. In 2010, Shelley was named one of the 40 Under 40 in Crain's New York Business and she's been featured in the New York Times. She can be found biking to work or driving a '74 VW Super Beetle in Red Hook, Brooklyn with her dog Teddy.
Filed under: Technology

10 Responses to Cross-posting the Collection to Wikimedia Commons and the Internet Archive

  1. emijrp says:

    Great. A question, how many images are going to be uploaded to Wikimedia Commons? Thanks for your effort. Regards.

  2. Paul says:

    Hi emijrp
    At the time of writing, we have 5,157 primary object images and 4,354 Library & Archives images queued for upload.

  3. GerardM says:

    Thank you for this wonderful news. I have been so bold and blogged about it :) http://ultimategerardm.blogspot.com/2010/04/brooklynmuseum-finds-its-way-to-work.html
    Thanks,
    GerardM

  4. ammeveleigh says:

    Could you explain in a little more detail please about how you envisage the workflow for the second bot? I’m particularly interested in the ways in which organisations are starting to revise their own catalogues (and other resources hosted by the organisation) with community-sourced information. When you say “integrate that information back into our collection online” am I correct in thinking you mean you will be incorporating revisions from wikimedia directly into records on http://www.brooklynmuseum.org? And then copying these changes back over to wikimedia, without obliterating any further user-changes made in the meantime? Will you be taking any steps to verify/authenticate user alterations or scan them for ‘usefulness’, or does the ‘crowd’ do this for you?

  5. Hi Alexandra,

    The answer to your first two questions is yes. One of the interesting things about this project, to me at least, is the idea that we don’t know what kinds of changes are going to be made. In this situation, we know what we published, so we can programmatically go in and look at what changed. That’s step one: to see which records are changing and what those changes are and to see how the records are being used to illustrate articles. This will likely be released as a project in our collection labs area. The labs area was set up for experimentation, so this is a perfect project for it.

    What I can’t answer yet is your last question. We need to get the data, analyze it and then make some decisions about further integration into the collection records. I will say, our experience with user-generated corrections has been stellar on the Flickr Commons, but we’ve been working with that community for many years and know the ropes and what to expect and we’ve established workflow around that change process. That’s what we are trying to do today with Wikimedia – get started with the community, see what happens and then figure out how best to adjust internally.

    In the meantime, so we don’t have stale records on the wiki, bot 2 will push our metadata changes to wikimedia on a regular basis, but it will be respectful to not overwrite community changes. Likely, it will push the change automatically if we have not seen a community change. If the bot sees the field it is trying to update has been changed on the community side, then it will be either flagged for human review or our data will somehow be appended and a notation made. Not sure exactly, but this next stage of the bot process is going to be something we work with the wiki community on so they are happy with how we are making updates to records.

    As with everything, I’ll be posting back here as we learn things and will publish lessons learned along the way.

  6. Mary Harrsch says:

    Shelley, I have a large archive of images on Flickr licensed with Creative Commons non-commercial attribution share alike and plan to upload 800X600 derivatives to Wikimedia Commons for all uses with attribution while preserving my commercial rights for my high-resolution originals. Would this scenario work for the Brooklyn Museum’s image collection? I’ve found the most time consuming part of the Wikimedia upload process is the assignment of categories. I assume your bot was written specifically for your project. Do you know of any bot software available to the public that could be used for batch uploading to Wikimedia?

  7. Hi Mary,

    You should contact one of the wikimedians I mentioned in the post – they should be able to help you. As for the bot, yes, we wrote it specifically for us. I don’t know of others, but I think the wiki folks could help with that as well.

  8. Jonathan says:

    Artists’ books? Like mine:

    http://library.brooklynmuseum.org/record=b624893~S2

    What happens to them? Are they getting photographed and added to the collection?

  9. Hi Jonathan,

    Contemporary works still fall under copyright, so those are not being cross-posted.

  10. Elitre says:

    Hi Shelley :)
    Is there a chance that http://www.brooklynmuseum.org/opencollection/labs/whereinthewiki.php can be updated anytime soon?
    Thanks!

3 Reactions

  1. Pingback: Establishing Trust in Archives Online « Around the World in 80 Gigabytes

  2. Pingback: Museu Picasso Barcelona » Blog Archive » Museos - Wikimedia: resumen del encuentro en Museums and the Web

  3. Pingback: Brooklyn Museum: Community: bloggers@brooklynmuseum » Where in the Wikiverse is the Brooklyn Museum?