Last active
February 4, 2025 12:36
| Rscript GTF2Table.R ensembl http://ftp.ensembl.org/pub/release-106/gtf/mus_musculus/Mus_musculus.GRCm39.106.chr.gtf.gz ./ |
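| # Usage: Rscript GTF2Table.R <ensembl or gencode> <GTF URL or path> <output directory> | |
| args <- commandArgs(trailingOnly = TRUE) | |
| if (length(args) != 3) stop("Expected 3 arguments: source (ensembl or gencode), GTF URL or path, output directory") | |
| gtf_source <- args[1] | |
| url <- args[2] | |
| outpath <- args[3] | |
| # The biotype attribute is named "gene_biotype" in Ensembl GTFs and "gene_type" in GENCODE GTFs | |
| biotype_col <- switch(gtf_source, ensembl = "gene_biotype", gencode = "gene_type", | |
|   stop("gtf_source must be 'ensembl' or 'gencode'")) | |
| # Import the GTF and keep only the gene-level records | |
| gtf <- rtracklayer::import(url) | |
| gtf_df <- as.data.frame(gtf) | |
| gtf_df_gene <- subset(gtf_df, type == "gene") | |
| xanot <- gtf_df_gene[, c("gene_id", "gene_name", biotype_col, "seqnames", "start", "end", "strand")] | |
| colnames(xanot) <- c("Geneid", "gene_name", "gene_biotype", "chromosome", "start", "end", "strand") | |
| # Strip only a leading "chr" prefix so both sources share the same chromosome labels | |
| xanot$chromosome <- sub("^chr", "", xanot$chromosome) | |
| write.table(xanot, file.path(outpath, paste0(gtf_source, "_gene_annotation.txt")), sep = "\t", quote = FALSE, row.names = FALSE) | |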
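The column handling in the script can be tried without downloading a GTF. The following is a minimal sketch in base R, assuming a hypothetical helper `harmonise_gene_table()` (not part of the original gist) and a made-up GENCODE-style data frame. It only illustrates how the source-specific biotype column is selected and renamed, and how a leading `chr` prefix is stripped.

```r
# Hypothetical helper (an assumption, not in the original script): bring
# gene-level GTF records from either source into one column layout.
harmonise_gene_table <- function(gtf_df, gtf_source) {
  # Ensembl names the attribute "gene_biotype", GENCODE names it "gene_type"
  biotype_col <- switch(gtf_source,
    ensembl = "gene_biotype",
    gencode = "gene_type",
    stop("gtf_source must be 'ensembl' or 'gencode'")
  )
  genes <- gtf_df[gtf_df$type == "gene", , drop = FALSE]
  out <- genes[, c("gene_id", "gene_name", biotype_col,
                   "seqnames", "start", "end", "strand")]
  colnames(out) <- c("Geneid", "gene_name", "gene_biotype",
                     "chromosome", "start", "end", "strand")
  # Remove only a leading "chr", leaving the rest of the name untouched
  out$chromosome <- sub("^chr", "", as.character(out$chromosome))
  rownames(out) <- NULL
  out
}

# Toy GENCODE-style records (made-up values, for illustration only)
toy <- data.frame(
  seqnames  = c("chr1", "chr1", "chrX"),
  start     = c(100L, 100L, 500L),
  end       = c(900L, 400L, 800L),
  strand    = c("+", "+", "-"),
  type      = c("gene", "transcript", "gene"),
  gene_id   = c("G1", "G1", "G2"),
  gene_name = c("Abc", "Abc", "Xyz"),
  gene_type = c("protein_coding", "protein_coding", "lncRNA"),
  stringsAsFactors = FALSE
)

toy_table <- harmonise_gene_table(toy, "gencode")
print(toy_table)
```

Anchoring the pattern with `^chr` rather than replacing every occurrence of `chr` avoids altering sequence names that happen to contain that substring elsewhere.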
| trying URL 'http://ftp.ensembl.org/pub/release-106/gtf/mus_musculus/Mus_musculus.GRCm39.106.chr.gtf.gz' | |
| Content type 'application/x-gzip' length 31324691 bytes (29.9 MB) | |
| ================================================== | |
| downloaded 29.9 MB |
| Geneid gene_name gene_biotype chromosome start end strand | |
| ENSMUSG00000102628 Gm37671 TEC 1 150956201 150958296 + | |
| ENSMUSG00000100595 Gm19087 processed_pseudogene 1 150983666 150984611 + | |
| ENSMUSG00000097426 Gm8941 processed_pseudogene 1 151012258 151013531 + | |
| ENSMUSG00000104478 Gm38212 TEC 1 108344807 108347562 + | |
| ENSMUSG00000104385 Gm7449 processed_pseudogene 1 6980784 6981446 + | |
| ENSMUSG00000086053 Gm15178 lncRNA 1 75368775 75373007 - | |
| ENSMUSG00000101231 Gm28283 processed_pseudogene 1 108540067 108540244 - | |
| ENSMUSG00000102135 Gm37108 processed_pseudogene 1 6986783 6993812 + | |
| ENSMUSG00000103282 Gm37275 processed_pseudogene 1 6999983 7000012 + |
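As a rough sketch of how the resulting table might be used downstream, the snippet below reads three of the rows shown above back into R. The rows are embedded as tab-separated text so the example runs on its own. In practice one would presumably point `read.delim()` at the written file (for the command above, `./ensembl_gene_annotation.txt`). The `length` column and the biotype tally are illustrative additions, not something the original script computes.

```r
# Three rows copied from the sample output, tab-separated as write.table emits them
sample_txt <- paste(
  "Geneid\tgene_name\tgene_biotype\tchromosome\tstart\tend\tstrand",
  "ENSMUSG00000102628\tGm37671\tTEC\t1\t150956201\t150958296\t+",
  "ENSMUSG00000100595\tGm19087\tprocessed_pseudogene\t1\t150983666\t150984611\t+",
  "ENSMUSG00000086053\tGm15178\tlncRNA\t1\t75368775\t75373007\t-",
  sep = "\n"
)

# Keep chromosome as character so names such as "X" or "MT" are handled
# the same way as numeric ones
annot <- read.delim(text = sample_txt, stringsAsFactors = FALSE,
                    colClasses = c(chromosome = "character"))

# GTF coordinates are 1-based and inclusive, hence the + 1
annot$length <- annot$end - annot$start + 1L

# Number of genes per biotype
biotype_counts <- table(annot$gene_biotype)
print(annot[, c("Geneid", "gene_biotype", "length")])
print(biotype_counts)
```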