Download 27 years of Ars Electronica Prix submissions: prixdata.zip (1.5MB zipped, 15MB unzipped).
This data was pulled from a public-facing Django endpoint:
curl -o prixdata.json "http://archive.aec.at/prixdata/?iDisplayLength=49119&iDisplayStart=0&sEcho=0"
(It can take a few minutes for the server to prepare the data.)
After downloading, some metadata was removed:
- The number of submissions each year.
- Some HTML describing a page that displays this data.
- The total number of submissions.
Beyond this, it has not been sanitized or processed, and it still contains HTML meant for formatting on the main search page. Looking at the data, each entry appears to be an array with four values:
- <b>\d{4}</b> corresponding to the year. 2014 only has the winners right now, but every other year has all submissions.
- Authorship information. Multiple authors are usually separated by commas, but sometimes by slashes or other markers; the inconsistency suggests free text entry. Institutional affiliation is followed by a | pipe, which appears to be more consistent.
- Title of the work. If the work was a winner, it is surrounded by HTML bold tags and includes a link to /showmode/prix/?id=\d+ corresponding to the ID of the piece. Other pieces are also available at /showmode/prix but do not appear to have any additional information besides what is presented in this JSON file.
- The entry category.
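The four values described above could be turned into plain fields with something like the following. This is only a sketch: the sample row is made up, the exact markup of the title cell (an anchor inside bold tags) is an assumption, and parseRow and stripTags are hypothetical helper names. Because it is not clear which side of the | pipe holds the affiliation, the authorship cell is only split into segments and not interpreted further.

```javascript
// Hypothetical sample row; the real markup in prixdata.json may differ.
var sample = [
  '<b>2013</b>',
  'Jane Doe, John Roe | Example Institute',
  '<b><a href="/showmode/prix/?id=12345">Example Work</a></b>',
  'Interactive Art'
];

// Remove anything that looks like an HTML tag and trim whitespace.
function stripTags(html) {
  return String(html).replace(/<[^>]*>/g, '').trim();
}

// Convert one four-value row into an object with plain fields.
function parseRow(row) {
  var yearMatch = /\d{4}/.exec(row[0]);
  var idMatch = /\/showmode\/prix\/\?id=(\d+)/.exec(row[2]);
  return {
    year: yearMatch ? Number(yearMatch[0]) : null,
    // Pipe-separated pieces of the authorship cell, left uninterpreted.
    segments: stripTags(row[1]).split('|').map(function (s) { return s.trim(); }),
    title: stripTags(row[2]),
    // Winners are described as being wrapped in bold tags.
    winner: /<b>/i.test(row[2]),
    id: idMatch ? Number(idMatch[1]) : null,
    category: row[3]
  };
}

console.log(parseRow(sample));
```

With the real file, data.map(parseRow) would apply the same conversion to every row, though the regular expressions may need adjusting once the actual markup is inspected.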
For the quickest way to work with this data, consider node and some command line tools:
$ node
> var data = require('./prixdata.json');
> data[0][1];
'Nicolas Deveaux'
> var output = '';
> data.forEach(function(piece) { output += piece[3] + '\n'; })
> var fs = require('fs');
> fs.writeFileSync('output.txt', output);
$ sort output.txt | uniq -c | sort -n
117 Visionary Pioneers of Media Art
153 Media.Art.Research Award
312 World Wide Web
688 [the next idea]
1083 Net Vision
1199 Net Vision / Net Excellence
1299 .net
1516 cybergeneration - u19 freestyle computing
1781 Computeranimation
2362 Digital Communities
2722 Hybrid Art
2724 Computermusik
2798 Computer Animation / Film / VFX
4221 Computer Animation / Visual Effects
4709 u19 - freestyle computing
5688 Computergraphik
7447 Interactive Art
8300 Digital Musics & Sound Art
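The same tally can also be computed without leaving node. The following is a minimal sketch, assuming each row keeps the category as its fourth value; countByCategory is a hypothetical helper name, and the rows here are made up for illustration.

```javascript
// Count how many rows fall into each category (the fourth value of each row).
function countByCategory(rows) {
  // Object.create(null) avoids clashes with inherited keys like "constructor".
  var counts = Object.create(null);
  rows.forEach(function (piece) {
    var category = piece[3];
    counts[category] = (counts[category] || 0) + 1;
  });
  // Return [count, name] pairs in ascending order of count.
  return Object.keys(counts)
    .map(function (name) { return [counts[name], name]; })
    .sort(function (a, b) { return a[0] - b[0]; });
}

// Made-up rows; with the real file, pass require('./prixdata.json') instead.
var rows = [
  ['<b>2013</b>', 'A', 'Work 1', 'Hybrid Art'],
  ['<b>2013</b>', 'B', 'Work 2', 'Interactive Art'],
  ['<b>2012</b>', 'C', 'Work 3', 'Interactive Art']
];

countByCategory(rows).forEach(function (entry) {
  console.log(entry[0] + ' ' + entry[1]);
});
```

Keeping the counts in node makes it easy to combine them with other fields, for example tallying categories per year.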
The data appears to be pulled from a psql database at 90.146.8.4 called webarchive, but it seems to be meant for internal use only.