@decodebiology
Last active February 4, 2025 12:36
Rscript GTF2Table.R ensembl http://ftp.ensembl.org/pub/release-106/gtf/mus_musculus/Mus_musculus.GRCm39.106.chr.gtf.gz ./
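# Convert the gene records of a GTF file into a tab-delimited annotation table.
# Arguments: <gtf_source: ensembl|gencode> <GTF path or URL> <output directory>
args <- commandArgs(trailingOnly = TRUE)
if (length(args) != 3) {
  stop("Usage: Rscript GTF2Table.R <ensembl|gencode> <gtf_url> <outpath>")
}
gtf_source <- args[1]
url <- args[2]
outpath <- args[3]

# The biotype attribute is named differently in Ensembl and GENCODE GTFs;
# any other source value stops the script instead of failing later on xanot.
biotype_col <- switch(gtf_source, ensembl = "gene_biotype", gencode = "gene_type",
                      stop("gtf_source must be 'ensembl' or 'gencode'"))

# Import the GTF and keep only the gene-level records
gtf <- rtracklayer::import(url)
gtf_df <- as.data.frame(gtf)
gtf_df_gene <- subset(gtf_df, type == "gene")

xanot <- gtf_df_gene[, c("gene_id", "gene_name", biotype_col, "seqnames", "start", "end", "strand")]
colnames(xanot)[4] <- "chromosome"
colnames(xanot)[3] <- "gene_biotype"
colnames(xanot)[1] <- "Geneid"
# Strip only a leading "chr" prefix (the original gsub removed "chr" anywhere in the name)
xanot$chromosome <- sub("^chr", "", xanot$chromosome)
write.table(xanot, file.path(outpath, paste0(gtf_source, "_gene_annotation.txt")),
            sep = "\t", quote = FALSE, row.names = FALSE)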
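To make the column selection in the script easier to follow, here is a minimal base-R sketch of what happens to a single gene record. It is not how rtracklayer is implemented. The GTF line, its values (`GENE0001`, `Abc1`) and the helper `get_attr` are hypothetical and exist only for illustration.

```r
# One made-up gene record in GTF layout: 8 fixed fields plus an attribute string.
gtf_line <- paste(
  "chr1", "example_source", "gene", "100", "500", ".", "+", ".",
  'gene_id "GENE0001"; gene_name "Abc1"; gene_biotype "protein_coding";',
  sep = "\t"
)
fields <- strsplit(gtf_line, "\t", fixed = TRUE)[[1]]

# Hypothetical helper: return the quoted value stored under one attribute key,
# or NA if the key is absent.
get_attr <- function(attributes, key) {
  pattern <- paste0("(^|;\\s*)", key, ' "([^"]*)"')
  hit <- regmatches(attributes, regexec(pattern, attributes))[[1]]
  if (length(hit) == 3) hit[3] else NA_character_
}

# Build one row with the same seven columns the script writes out.
row <- data.frame(
  Geneid       = get_attr(fields[9], "gene_id"),
  gene_name    = get_attr(fields[9], "gene_name"),
  gene_biotype = get_attr(fields[9], "gene_biotype"),
  chromosome   = sub("^chr", "", fields[1]),
  start        = as.integer(fields[4]),
  end          = as.integer(fields[5]),
  strand       = fields[7],
  stringsAsFactors = FALSE
)
print(row)
```

For a GENCODE-style record, the third lookup would use the key `gene_type` instead of `gene_biotype`, which is the only source-specific difference the script handles.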
trying URL 'http://ftp.ensembl.org/pub/release-106/gtf/mus_musculus/Mus_musculus.GRCm39.106.chr.gtf.gz'
Content type 'application/x-gzip' length 31324691 bytes (29.9 MB)
==================================================
downloaded 29.9 MB
Geneid              gene_name  gene_biotype          chromosome  start      end        strand
ENSMUSG00000102628  Gm37671    TEC                   1           150956201  150958296  +
ENSMUSG00000100595  Gm19087    processed_pseudogene  1           150983666  150984611  +
ENSMUSG00000097426  Gm8941     processed_pseudogene  1           151012258  151013531  +
ENSMUSG00000104478  Gm38212    TEC                   1           108344807  108347562  +
ENSMUSG00000104385  Gm7449     processed_pseudogene  1           6980784    6981446    +
ENSMUSG00000086053  Gm15178    lncRNA                1           75368775   75373007   -
ENSMUSG00000101231  Gm28283    processed_pseudogene  1           108540067  108540244  -
ENSMUSG00000102135  Gm37108    processed_pseudogene  1           6986783    6993812    +
ENSMUSG00000103282  Gm37275    processed_pseudogene  1           6999983    7000012    +
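A table like the one above can be read back into R for downstream work. The sketch below is one possible way to do that, not part of the original gist. It embeds three of the rows shown above as tab-delimited text so that it runs without the real file; in practice the `text =` argument would be replaced by the path to `ensembl_gene_annotation.txt`. The `length` column is an addition made here for illustration.

```r
# Three rows copied from the sample output, joined with tabs as write.table emits them.
txt <- paste(
  "Geneid\tgene_name\tgene_biotype\tchromosome\tstart\tend\tstrand",
  "ENSMUSG00000102628\tGm37671\tTEC\t1\t150956201\t150958296\t+",
  "ENSMUSG00000100595\tGm19087\tprocessed_pseudogene\t1\t150983666\t150984611\t+",
  "ENSMUSG00000086053\tGm15178\tlncRNA\t1\t75368775\t75373007\t-",
  sep = "\n"
)

# Read chromosome as character so names such as "X" or "MT" and numeric
# names such as "1" end up in the same column type.
annot <- read.delim(text = txt, stringsAsFactors = FALSE,
                    colClasses = c(chromosome = "character"))

# Gene length from the 1-based, inclusive GTF coordinates (illustrative column).
annot$length <- annot$end - annot$start + 1

# Number of genes per biotype.
biotype_counts <- table(annot$gene_biotype)
print(annot)
print(biotype_counts)
```

Reading `chromosome` as character is a deliberate choice: left to type detection, a file containing only numbered chromosomes would be parsed as integers, which complicates later joins against tables that also list `X`, `Y` or `MT`.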