@dfkoz
Last active August 29, 2015 14:02
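#!/usr/bin/env ruby
# Scrape IMDb's TV-series search results (sorted by number of votes) and write
# one "name ||| year ||| rating ||| genre" line per show to results.txt.
# Note: the CSS selectors below target IMDb's 2014-era markup and may no
# longer match the live site.
require 'nokogiri'
require 'open-uri'

File.open('results.txt', 'w') do |file|
  # Results are paginated 50 per page; `start` is the 1-based index of the
  # first result on each page.
  (1..94_000).step(50) do |start|
    dir_url = format('http://www.imdb.com/search/title?at=0&sort=num_votes,desc&start=%d&title_type=tv_series', start)
    puts dir_url

    # Fetch and parse the page. URI.open is needed on Ruby 3.0+, where
    # Kernel#open no longer opens URLs.
    doc = Nokogiri::HTML(URI.open(dir_url))

    # Each element with class "detailed" holds one TV series.
    doc.css('.detailed').each do |row|
      title = row.at_css('.title')
      next if title.nil? # skip rows without a title cell

      # Extract name, year, rating and genre; missing pieces become ''.
      link   = title.at_css('a')
      name   = link ? link.text.strip : ''
      year   = title.css('.year_type').text.strip
      rating = title.css('.value').text.strip
      genre  = row.css('.genre').text.strip

      line = [name, year, rating, genre].join(' ||| ')
      file.puts(line)
      puts line
    end
  end
end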
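The script walks the result pages by stepping a `start` offset from 1 to 94000 in increments of 50. A minimal sketch of generating that same list of page URLs with the standard library, assuming the query parameters shown in the script; the `search_urls` helper and its keyword arguments are hypothetical. Note that `URI.encode_www_form` percent-encodes the comma in `num_votes,desc` as `%2C`, which a server should decode to the same value.

```ruby
require 'uri'

# Hypothetical helper: build every search-results URL the scraper visits.
# Mirrors the script's loop: 50 results per page, first offset 1,
# last offset at most `last_start`.
def search_urls(last_start: 94_000, per_page: 50)
  (1..last_start).step(per_page).map do |start|
    query = URI.encode_www_form(
      at: 0,
      sort: 'num_votes,desc',   # encoded as num_votes%2Cdesc
      start: start,
      title_type: 'tv_series'
    )
    "http://www.imdb.com/search/title?#{query}"
  end
end
```

With the defaults this yields 1880 URLs, the last one using `start=93951`, so the loop can be sized or resumed without fetching anything.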
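Each line of `results.txt` holds four fields joined by ` ||| `. A minimal sketch of loading that file back into records for sorting or filtering, assuming no field itself contains the ` ||| ` separator; the `Show` struct and `parse_results` function are hypothetical names, not part of the original gist.

```ruby
# Hypothetical record for one line of results.txt.
Show = Struct.new(:name, :year, :rating, :genre)

# Parse lines of the form "name ||| year ||| rating ||| genre".
# A limit of 4 keeps trailing empty fields (e.g. a missing genre).
def parse_results(lines)
  lines.map(&:chomp).reject(&:empty?).map do |line|
    name, year, rating, genre = line.split(' ||| ', 4)
    # Ratings were scraped as text; convert to Float, or nil if absent.
    rating = rating.to_s.strip
    Show.new(name, year, rating.empty? ? nil : rating.to_f, genre.to_s)
  end
end
```

For example, `File.open('results.txt') { |f| parse_results(f.each_line) }.select(&:rating).max_by(&:rating)` would return the highest-rated scraped show.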