@kylemcdonald
Last active September 8, 2015 19:54
Instructions for downloading and manipulating some raw dumps from the Ars Electronica Prix archive server.

Ars Prix Data Dump

Download 27 years of Ars Electronica Prix submissions: prixdata.zip (1.5MB zipped, 15MB unzipped).

This data was pulled from a public-facing Django endpoint:

curl -o prixdata.json "http://archive.aec.at/prixdata/?iDisplayLength=49119&iDisplayStart=0&sEcho=0"

(It can take a few minutes for the server to prepare the data.)

After downloading, some metadata was removed:

  • The number of submissions each year.
  • Some HTML describing a page that displays this data.
  • The total number of submissions.

Beyond this, the data has not been sanitized or processed, and it still contains HTML meant for formatting on the main search page. Looking at the data, each entry appears to be an array of four values:

  1. <b>\d{4}</b> Corresponding to the year. 2014 only has the winners right now, but every other year has all submissions.
  2. Authorship information. Multiple authors are usually separated by commas, but sometimes by slashes or other markers; the inconsistency suggests free-text entry. Institutional affiliation is followed by a | pipe, which appears to be more consistent.
  3. Title of the work. If the work was a winner, it is surrounded by HTML bold tags, and includes a link to /showmode/prix/?id=\d+ corresponding to the ID of the piece. Other pieces are also available at /showmode/prix but do not appear to have any additional information besides what is presented in this JSON file.
  4. The entry category.
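
The four-value layout above can be turned into plain objects with a few regular expressions. The following is a minimal sketch, not a tested parser for the real file: the helper names (stripTags, parseEntry) and the sample row are made up for illustration, and the exact HTML in each field is assumed from the description above. Because it is not clear from a single record which side of the pipe holds the affiliation, the sketch only splits the authorship field on | and leaves the interpretation to the reader.

```javascript
// Sketch only: the field layout is assumed from the description above.

// Remove HTML tags and collapse runs of whitespace.
function stripTags(html) {
  return String(html).replace(/<[^>]*>/g, ' ').replace(/\s+/g, ' ').trim();
}

// Turn one four-element row into a plain object.
function parseEntry(row) {
  var yearMatch = /\d{4}/.exec(row[0]);
  var idMatch = /\/showmode\/prix\/\?id=(\d+)/.exec(row[2]);
  return {
    year: yearMatch ? Number(yearMatch[0]) : null,
    // Split on the pipe; check the real data to see which part is the affiliation.
    creditParts: stripTags(row[1]).split('|').map(function (part) {
      return part.trim();
    }),
    title: stripTags(row[2]),
    // Winners are described as bold and linked to /showmode/prix/?id=...
    winner: /<b[\s>]/i.test(row[2]),
    id: idMatch ? Number(idMatch[1]) : null,
    category: stripTags(row[3])
  };
}

// Hypothetical rows for illustration only; these are not taken from the dump.
var sampleWinner = [
  '<b>1999</b>',
  'Jane Doe, John Roe | Example Lab',
  '<b><a href="/showmode/prix/?id=12345">Example Work</a></b>',
  'Interactive Art'
];
var sampleOther = ['<b>1999</b>', 'Jane Doe', 'Another Work', 'Hybrid Art'];

console.log(parseEntry(sampleWinner));
console.log(parseEntry(sampleOther));
```

With the real file, something like require('./prixdata.json').map(parseEntry) should give one object per submission, assuming every row really has four string fields.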

For the quickest way to work with this data, consider node and some command line tools:

$ node
> var data = require('./prixdata.json');
> data[0][1];
'Nicolas Deveaux'
> var output = '';
> data.forEach(function(piece) { output += piece[3] + '\n'; })
> var fs = require('fs');
> fs.writeFileSync('output.txt', output);
> .exit
$ sort output.txt | uniq -c | sort -n
 117 Visionary Pioneers of Media Art
 153 Media.Art.Research Award
 312 World Wide Web
 688 [the next idea]
1083 Net Vision
1199 Net Vision / Net Excellence
1299 .net
1516 cybergeneration - u19 freestyle computing
1781 Computeranimation
2362 Digital Communities
2722 Hybrid Art
2724 Computermusik
2798 Computer Animation / Film / VFX
4221 Computer Animation / Visual Effects
4709 u19 - freestyle computing
5688 Computergraphik
7447 Interactive Art
8300 Digital Musics & Sound Art
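
The same tally can be produced without leaving node. This is a rough sketch under the same assumption as above (each row is an array whose fourth value is the category); the function names countColumn and sortedCounts and the sample rows are hypothetical.

```javascript
// Count how many times each value occurs in one column of the rows.
function countColumn(rows, column) {
  var counts = Object.create(null); // no prototype, so any string is a safe key
  rows.forEach(function (row) {
    var key = row[column];
    counts[key] = (counts[key] || 0) + 1;
  });
  return counts;
}

// Return [value, count] pairs sorted by ascending count, like `sort -n`.
function sortedCounts(counts) {
  return Object.keys(counts)
    .map(function (key) { return [key, counts[key]]; })
    .sort(function (a, b) { return a[1] - b[1]; });
}

// Made-up rows for illustration; with the real file use require('./prixdata.json').
var sampleRows = [
  ['<b>1999</b>', 'A', 'Work one', 'Interactive Art'],
  ['<b>1999</b>', 'B', 'Work two', 'Hybrid Art'],
  ['<b>2000</b>', 'C', 'Work three', 'Interactive Art']
];

sortedCounts(countColumn(sampleRows, 3)).forEach(function (pair) {
  console.log(pair[1] + ' ' + pair[0]);
});
// prints:
// 1 Hybrid Art
// 2 Interactive Art
```

Passing column 0 instead of 3 would count submissions per year, although the keys would still carry the bold tags.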

Notes

The data appears to be pulled from a PostgreSQL database at 90.146.8.4 called webarchive, but it seems to be meant for internal use only.
