@shujishigenobu
Created August 16, 2012 04:33
Reads a tab-delimited table and retrieves taxonomy information for the sequence ID in each row. Works only with the NCBI nr database.
#===
# nr_taxinfo.rb
#
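# Reads a tab-delimited table and retrieves taxonomy information.
# Works only with the NCBI nr database.
# Three columns (tax_id, scientific_name, common_name) are appended to each row.
#
# Arguments: <infile> <blastdb> [key_column]
#   key_column is the 0-based index of the column holding the sequence ID
#   (default: 1, i.e. the second column).
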
infile  = ARGV[0]
blastdb = ARGV[1]
keypos  = (ARGV[2] || 1).to_i # 0-based column index of the sequence ID

# Taxonomy lookups keyed by sequence ID, so each ID is queried only once.
cache = {}

File.foreach(infile) do |l|
  next if l.start_with?('#')

  a = l.chomp.split("\t")
  target_id = a[keypos]

  unless cache.key?(target_id)
    # The argument-array form of popen bypasses the shell, so IDs containing
    # quotes or other shell metacharacters are passed through unchanged.
    cmd = ['fastacmd', '-d', blastdb, '-s', target_id.to_s, '-T']
    res = IO.popen(cmd) { |io| io.read }

    ## analyze only first record
    ## TODO: should be analyzed multiple records
    r = res.split(/\n\n/).first

    if r && /^NCBI/.match(r)
      # [ \t]* (rather than \s*) keeps each match on its own line,
      # even when a field such as the common name is empty.
      tid   = r[/^NCBI taxonomy id:[ \t]*(\d+)/, 1]
      cname = r[/^Common name:[ \t]*(.*)$/, 1]
      sname = r[/^Scientific name:[ \t]*(.*)$/, 1]
      cache[target_id] = { 'tid' => tid && tid.to_i, 'cname' => cname, 'sname' => sname }
    else
      ## malformed report
      STDERR.puts "WARNING: taxinfo not found for #{target_id}"
      cache[target_id] = { 'tid' => nil, 'cname' => nil, 'sname' => nil }
    end
  end

  data = cache[target_id]
  outdat = a + [data['tid'], data['sname'], data['cname']]
  puts outdat.join("\t")
end
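
The TODO in the script notes that only the first record of the report is analyzed. One way to handle several records is sketched below. This is not part of the original script: `parse_tax_records` is a hypothetical helper, and `SAMPLE_REPORT` is made-up text (including the sequence IDs) that only assumes the field labels matched by the script's regular expressions.

```ruby
# Hypothetical helper: parse every blank-line-separated record of a
# taxonomy report, using the same field labels the script matches.
# Records without an "NCBI taxonomy id:" line are skipped.
def parse_tax_records(report)
  records = report.to_s.split(/\n{2,}/).map do |rec|
    tid = rec[/^NCBI taxonomy id:[ \t]*(\d+)/, 1]
    next unless tid

    {
      'tid'   => tid.to_i,
      'sname' => rec[/^Scientific name:[ \t]*(.*)$/, 1],
      'cname' => rec[/^Common name:[ \t]*(.*)$/, 1]
    }
  end
  records.compact
end

# Made-up sample text for illustration; the sequence IDs are placeholders.
SAMPLE_REPORT = <<~TEXT
  NCBI sequence id: gi|0000001
  NCBI taxonomy id: 7029
  Common name: pea aphid
  Scientific name: Acyrthosiphon pisum

  NCBI sequence id: gi|0000002
  NCBI taxonomy id: 9606
  Common name: human
  Scientific name: Homo sapiens
TEXT

parse_tax_records(SAMPLE_REPORT).each do |t|
  puts [t['tid'], t['sname'], t['cname']].join("\t")
end
```

Returning an array leaves the choice to the caller: take the first element to keep the current behavior, or join the values of all records into the three appended columns.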