@ChuckO
Created April 5, 2012 20:26
Find and clean duplicate files (Ruby)
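#!/usr/bin/env ruby
require 'digest/md5'
require 'fileutils'
require 'shellwords'

# Argument processing: each argument goes through Dir[], so globs like '**/*' work.
# A directory argument expands to its immediate children; the default is '.'.
ARGV[0] ||= '.'

# Build the list of filenames, rejecting symlinks, directories, and zero-length files.
filenames = ARGV.flat_map { |arg| Dir[File.directory?(arg) ? "#{arg}/*" : arg] }
filenames.reject! { |fn| File.symlink?(fn) || File.directory?(fn) || File.size?(fn).nil? }

# Group filenames by the MD5 digest of their contents.
# Digest::MD5.file streams the file instead of reading it all into memory.
h_files = Hash.new { |hash, md5| hash[md5] = [] }
filenames.each do |fn|
  h_files[Digest::MD5.file(fn).hexdigest] << fn
end

# For each group sharing a digest, keep the first file and print an rm command
# for every other file that is byte-for-byte identical to it.
dup_arrays = h_files.values.select { |a| a.count > 1 }
dup_arrays.each do |dups|
  keep = dups.shift
  dups.each do |fn|
    # Compare in Ruby instead of shelling out to cmp, so names containing spaces
    # or shell metacharacters are handled safely; escape the printed name too.
    puts "rm #{Shellwords.escape(fn)}" if FileUtils.compare_file(keep, fn)
  end
end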
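
The script hashes every candidate file. One possible refinement, sketched here as an assumption rather than as part of the original gist, is to group files by size first, since files of different sizes cannot be duplicates, and to hash only the files whose sizes collide. The helper name `duplicate_groups` is hypothetical; it returns the groups instead of printing `rm` commands so the caller decides what to do with them.

```ruby
require 'digest/md5'
require 'fileutils'

# Hypothetical helper (not part of the gist): returns arrays of paths whose
# contents are byte-for-byte identical, each array sorted by path.
def duplicate_groups(paths)
  # Same filter as the script: skip symlinks, directories, and empty files.
  files = paths.select { |fn| !File.symlink?(fn) && File.file?(fn) && File.size?(fn) }

  # Files of different sizes cannot be duplicates, so hash only size collisions.
  candidates = files.group_by { |fn| File.size(fn) }.values.select { |g| g.size > 1 }

  candidates.flat_map do |same_size|
    same_size.group_by { |fn| Digest::MD5.file(fn).hexdigest }.values.flat_map do |same_md5|
      # Confirm with a full comparison so a digest collision cannot cause a false match.
      keep, *rest = same_md5.sort
      dups = rest.select { |fn| FileUtils.compare_file(keep, fn) }
      dups.empty? ? [] : [[keep, *dups]]
    end
  end
end
```

Calling `duplicate_groups(Dir['**/*'])` would yield one array per set of identical files; treating the first entry of each array as the copy to keep mirrors the `keep = dups.shift` step in the script.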