Taj Azarian EpiDemos82

Good Practices
1) It uniquely identifies the source of the sample (the location, the biological individual), or at least can be traced back to this information.
2) It is clear and cannot be confused with any other sample in the same project, lab, department, university, universe.
3) It is impossible to mis-write or mis-read.
4) It is short.
-Study abbreviation (2-3 letters) or a numeric study code
CF
CAR
IH
EpiDemos82 / InterCladeGeneticDistance.R
Last active August 27, 2017 20:02
Mean SNP/genetic distance between groups/clades using R
library(ape)
library(vegan)
library(adegenet) #Might as well load if you are doing phylo stuff
setwd("/Users/../floder")
Alignment <- read.dna("mfa.fasta",format = "fasta") #Importing alignment in fasta format
fasta.seq.labels <- as.data.frame(labels(Alignment)) #Obtaining ordered taxa
colnames(fasta.seq.labels) <- "taxa"
clades <- as.data.frame(readr::read_delim("~/clade_assignments_for_tax.txt")) #Importing clade assignments for each isolate (in the future this could be automated using PCA); read_delim() comes from the readr package
EpiDemos82 / Useful_BASH_commands.txt
Last active March 1, 2023 19:01
Useful BASH commands for working with NGS data
##GENERAL TEXT OR FILE MANIPULATION
#Find lines in a list (e.g. file names) that are not present in another list
#This is good for checking whether downstream files are present (i.e. pipeline ran successfully)
comm -23 <(sort All.txt) <(sort Finished.txt)
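A small worked example may make the `comm -23` line easier to follow. This is only a sketch: the sample names and the temporary directory are made up for illustration, and it assumes bash, since process substitution `<(...)` is not available in plain `sh`.

```shell
# Work in a throwaway directory so no real files are touched
workdir=$(mktemp -d)

# All.txt: every sample that should have been processed (hypothetical names)
printf '%s\n' sampleC sampleA sampleB > "$workdir/All.txt"
# Finished.txt: samples whose downstream file was found (hypothetical names)
printf '%s\n' sampleB sampleA > "$workdir/Finished.txt"

# comm expects sorted input, hence the two sorts.
# -2 hides lines found only in Finished.txt, -3 hides lines found in both,
# leaving the lines that appear only in All.txt.
missing=$(comm -23 <(sort "$workdir/All.txt") <(sort "$workdir/Finished.txt"))
echo "$missing"

rm -rf "$workdir"
```

Here the only line left is `sampleC`, i.e. the sample that still needs to be run.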
#Looping over anything
for f in $(cat names.txt); do echo "${f}"; done #replace echo with whatever command should run on each ${f}
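One caveat with the `$(cat names.txt)` form is that the shell splits its output on any whitespace, so an entry containing a space is handed to the loop as two separate items. A possible alternative, sketched below with made-up file names, reads the list line by line instead; the `echo` stands in for whatever command is actually needed.

```shell
# Build a small list in a throwaway directory (hypothetical entries)
workdir=$(mktemp -d)
printf '%s\n' 'isolate 1.fastq' 'isolate_2.fastq' > "$workdir/names.txt"

count=0
# IFS= keeps leading/trailing whitespace; -r stops backslashes being interpreted
while IFS= read -r f; do
    # Replace this echo with the command to run on each entry
    echo "processing: ${f}"
    count=$((count + 1))
done < "$workdir/names.txt"

echo "entries seen: ${count}"
rm -rf "$workdir"
```

With the two-line list above, the loop body runs twice, once per line, even though the first entry contains a space.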
#renaming file extensions using bash code
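One common way to rename extensions in bash is a loop with the `${f%pattern}` parameter expansion. The sketch below is a general illustration, not necessarily the command this gist uses, and the `.txt` to `.tsv` change and the file names are assumptions chosen for the demo.

```shell
# Create two dummy files in a throwaway directory (hypothetical names)
workdir=$(mktemp -d)
touch "$workdir/a.txt" "$workdir/b.txt"

for f in "$workdir"/*.txt; do
    # ${f%.txt} removes the trailing ".txt"; the new extension is then appended.
    # "--" guards against file names that begin with a dash.
    mv -- "$f" "${f%.txt}.tsv"
done

renamed=$(ls "$workdir")
echo "$renamed"
rm -rf "$workdir"
```

Swapping `mv` for `echo mv` first is a cheap way to preview the renames before committing to them.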