Idowu Olawoye idolawoye

idolawoye / filter_fasta.py

Created February 17, 2026 18:51

Filter a multi-FASTA file using a list of sequence IDs, one per line

	#!/usr/bin/env python3
	"""
	Filter sequences from a multifasta file based on an exclusion list.

	Usage:
	python filter_fasta.py -i input.fasta -e exclude_ids.txt -o output.fasta

	The exclude list should have one sequence ID per line (without the leading '>').
	Matching is done against the first word of each FASTA header.
	"""

idolawoye / kraken_bash.txt

Last active December 7, 2022 20:08

One-liner to print the percentage of reads for a taxon from Kraken report files

	grep -w -F -f taxon.txt *report.txt | awk 'BEGIN{OFS="\t"}{ print $1,$2}' > tb.txt

	# taxon.txt is a file containing the taxon name(s) you want to summarize, one per line, e.g. Mycobacterium tuberculosis complex
	# *report.txt is a wildcard that selects multiple Kraken report files
	# tb.txt is the output TSV file
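For comparison, here is a rough Python sketch of the same lookup. It assumes the standard tab-separated Kraken report layout (percentage, clade reads, taxon reads, rank code, taxid, indented name). It also matches the taxon name exactly, which is stricter than the word match done by `grep -w -F`. The function name `summarize_report` is hypothetical.

```python
def summarize_report(lines, taxa):
    """Return (percent, clade_reads, name) for rows whose taxon name is in `taxa`."""
    rows = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        # Names are indented with spaces in the report, so strip them.
        name = fields[-1].strip()
        if name in taxa:
            rows.append((float(fields[0]), int(fields[1]), name))
    return rows
```

Calling this once per report file and prefixing each row with the file name would give output comparable to `tb.txt`.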

idolawoye / count_N_percentage.py

Created June 11, 2022 13:37

Python script to count the number of Ns in a multi-FASTA file

	#!/usr/bin/env python

	from Bio import SeqIO

	fasta = "the_fasta_file.fasta"

	# Print the ID, length, and N count of every record
	for record in SeqIO.parse(fasta, "fasta"):
	    print("ID: %s" % record.id)
	    print("Sequence length: %s" % len(record))
	    print("Number of Ns: %s" % record.seq.count('N'))

idolawoye / assembly_coverage.txt

Created October 2, 2020 08:30

Calculate average genome coverage on aligned BAM files

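	samtools depth CIV3724802_ref_bwa_sorted.bam | awk '{sum+=$3} END { print "Average = ",sum/NR}'

	# Without -a, samtools depth omits zero-coverage positions, so this is the mean over reported positions only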
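The same arithmetic can be sketched in Python, assuming the three-column, tab-separated output of `samtools depth` (reference name, position, depth). The function name `mean_depth` and its optional `genome_length` parameter are hypothetical. Passing a genome length divides by it instead of by the number of reported positions, so unreported positions count as zero.

```python
def mean_depth(lines, genome_length=None):
    """Average the depth column (third field) of `samtools depth` output."""
    total = 0
    positions = 0
    for line in lines:
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        total += int(fields[2])
        positions += 1
    # Fall back to the number of reported positions, as the awk one-liner does.
    denominator = genome_length if genome_length is not None else positions
    return total / denominator if denominator else 0.0
```

Fed from a pipe, this would be called as `mean_depth(sys.stdin)`. Empty input returns 0.0 rather than dividing by zero.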

idolawoye / gist:0c219560f82e8981aefc716b78d1c019

Created April 8, 2019 11:24

Shell script for downloading bulk files

	list=`cat TEXT_FILE` # list of the record file IDs.
	for i in $list
	do echo $i
	SHELL COMMAND [OPTIONS] $i #Command with file id
	done

idolawoye / gist:069615f51911b1c64d985cf816fa04be

Last active January 30, 2019 13:42

BWA mapping of different samples against a reference genome

	total_files=`find -name '*.fastq' | wc -l`
	arr=( $(ls *.fastq) )
	echo "mapping started" >> map.log
	echo "---------------" >> map.log

	for ((i=0; i<$total_files; i+=2))
	{
	ref_genome=../ref.gb
	sample_name=`echo ${arr[$i]} | awk -F "_" '{print $1}'`
	echo "[mapping running for] $sample_name"