Cory cory-weller
@cory-weller
cory-weller / Homo_sapiens_assembly38.haplotype_database.txt
Created July 31, 2024 12:13
Haplotype Map for GATK CrosscheckFingerprints
@HD VN:1.5 SO:unsorted
@SQ SN:chr1 LN:248956422 M5:6aef897c3d6ff0c78aff06ac189178dd AS:20 UR:/seq/references/kendrix/v0/kendrix.fasta SP:Homo sapiens
@SQ SN:chr2 LN:242193529 M5:f98db672eb0993dcfdabafe2a882905c AS:20 UR:/seq/references/kendrix/v0/kendrix.fasta SP:Homo sapiens
@SQ SN:chr3 LN:198295559 M5:76635a41ea913a405ded820447d067b0 AS:20 UR:/seq/references/kendrix/v0/kendrix.fasta SP:Homo sapiens
@SQ SN:chr4 LN:190214555 M5:3210fecf1eb92d5489da4346b3fddc6e AS:20 UR:/seq/references/kendrix/v0/kendrix.fasta SP:Homo sapiens
@SQ SN:chr5 LN:181538259 M5:a811b3dc9fe66af729dc0dddf7fa4f13 AS:20 UR:/seq/references/kendrix/v0/kendrix.fasta SP:Homo sapiens
@SQ SN:chr6 LN:170805979 M5:5691468a67c7e7a7b5f2a3a683792c29 AS:20 UR:/seq/references/kendrix/v0/kendrix.fasta SP:Homo sapiens
@SQ SN:chr7 LN:159345973 M5:cc044cc2256a1141212660fb07b6171e AS:20 UR:/seq/references/kendrix/v0/kendrix.fasta SP:Homo sapiens
@SQ SN:chr8 LN:145138636 M5:c67955b5f7815a9a1edfaa15893d3616 AS:20 UR:/seq/references/kendrix/v0/kendrix.fasta S
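The preview above is the SAM-style header that opens the haplotype map. As a minimal sketch (not part of the gist), the sequence names and lengths can be pulled from the `@SQ` lines in Python. It assumes tab-separated `TAG:VALUE` fields as in the SAM header specification (the scraped preview shows spaces instead), and the function name `parse_sq_lengths` is hypothetical.

```python
def parse_sq_lengths(header_lines):
    """Map each @SQ sequence name (SN) to its length (LN).

    Assumes SAM-style header lines whose fields are tab-separated TAG:VALUE pairs.
    """
    lengths = {}
    for line in header_lines:
        if not line.startswith("@SQ"):
            continue  # skip @HD and anything that is not a sequence line
        fields = {}
        for field in line.rstrip("\n").split("\t")[1:]:
            tag, sep, value = field.partition(":")  # split on the first ':' only
            if sep:
                fields[tag] = value
        lengths[fields["SN"]] = int(fields["LN"])
    return lengths


if __name__ == "__main__":
    example = [
        "@HD\tVN:1.5\tSO:unsorted",
        "@SQ\tSN:chr1\tLN:248956422\tAS:20\tSP:Homo sapiens",
        "@SQ\tSN:chr2\tLN:242193529\tUR:/seq/references/kendrix/v0/kendrix.fasta",
    ]
    print(parse_sq_lengths(example))
```

Comparing the result against the `@SQ` lines of the BAM files being fingerprinted is one possible way to spot a reference mismatch before running CrosscheckFingerprints.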
cory-weller / # sim anchors.md
Last active September 8, 2023 12:55
sim_anchors

Overview

`script_specific.sh` is an example of extracting the sequence that lies between two anchors.

The output is a three-column, tab-delimited table in the format:

| anchor1 | insert | anchor2 |
|---------|--------|---------|

Only lines that match the pattern are included.

`min_insert` is the minimum number of characters the insert must contain for a line to count as a match.
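`script_specific.sh` itself is not shown here. A rough Python equivalent of the behaviour described above might look like this, assuming the anchors are literal strings and that `min_insert` bounds the length of the insert; `extract_inserts` is a hypothetical name, not the script's actual implementation.

```python
import re


def extract_inserts(lines, anchor1, anchor2, min_insert):
    """Yield (anchor1, insert, anchor2) for each line containing anchor1, then at
    least min_insert characters, then anchor2. Lines without a match are skipped."""
    # Anchors are escaped so they are matched literally. re.search takes the
    # leftmost match, and the lazy quantifier then picks the shortest insert.
    pattern = re.compile(
        re.escape(anchor1) + "(.{" + str(min_insert) + ",}?)" + re.escape(anchor2)
    )
    for line in lines:
        match = pattern.search(line)
        if match:
            yield anchor1, match.group(1), anchor2
```

Printing each yielded tuple with `print("\t".join(row))` reproduces the three-column, tab-delimited output. Because `.` does not match a newline, trailing line endings never end up in the insert.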

cory-weller / packages.R
Created June 14, 2023 14:34
renv setup
library(abind)
library(annotate)
library(AnnotationDbi)
library(AnnotationFilter)
library(askpass)
library(Azimuth)
library(backports)
library(base64enc)
library(beachmat)
library(BH)
#!/usr/bin/env bash
# DETAILS: Works by setting the field separator to a regular expression matching a leading ">", a tab, or a space,
# so that on a header line $2 is the first word of the header.
# For each header line (beginning with ">"), set variable s to "<first_word_in_header>.fa"
# and write a header line containing only that first word.
# Every following sequence line is written to that same file, splitting the input into one file per contig.
# The input path must be supplied in the variable inFasta.
awk -F "(^>|\t| )" '{if($0 ~ /^>/) {s=$2".fa"; print ">"$2 > s} else print > s}' "${inFasta:?set inFasta to the input FASTA path}"
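For comparison, here is a minimal Python sketch of the same splitting logic (the name `split_fasta` is hypothetical). Like the awk command, it keeps only the first word of each header and keeps appending to a file whose name was already seen earlier in the input; unlike the awk command, it splits headers on any run of whitespace and silently skips lines that appear before the first header.

```python
from pathlib import Path


def split_fasta(in_path, out_dir="."):
    """Split a multi-FASTA into one <first_header_word>.fa file per record.

    Each output header keeps only the first word of the original header.
    A name seen twice is appended to, as awk does for a file it already has open.
    """
    written = []  # output paths in the order they were first created
    out = None
    with open(in_path) as fasta:
        for line in fasta:
            if line.startswith(">"):
                if out is not None:
                    out.close()
                name = line[1:].split()[0]
                path = Path(out_dir) / f"{name}.fa"
                out = open(path, "a" if path in written else "w")
                if path not in written:
                    written.append(path)
                out.write(f">{name}\n")
            elif out is not None:
                out.write(line)  # sequence line: copy unchanged
    if out is not None:
        out.close()
    return written
```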
cory-weller / gcloud-install-singularity.sh
Last active June 9, 2023 13:48
Commands to install singularity on google cloud VM
#!/usr/bin/env bash
# Installs singularity
sudo apt-get update
sudo apt-get install -y --no-install-recommends \
build-essential \
libssl-dev \
uuid-dev \

Instructions based on those from the Biowulf team

You must be on the NIH network, either on campus or connected through the VPN.

Then, start an interactive session from the command line on Biowulf:

# On BIOWULF:
sinteractive --mem 20G --time 8:00:00 --gres lscratch:20 --tunnel
cory-weller / ampliconsplit.sh
Created May 25, 2023 16:22
Reads through a fastq file, assigns each read to a sequence in a fasta database, and outputs a text file of the reads assigned to each fasta sequence.
#!/usr/bin/env bash
echo "Running ampliconsplit.sh $*"
## convenience functions for arg parsing
usage_error () { echo >&2 "ERROR: $1"; exit 2; }
assert_argument () { test "$1" != "$EOL" || usage_error "$2 requires an argument. Try --help"; }
# Parse Arguments
## import $@
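The preview stops before the read-assignment logic, so the method ampliconsplit.sh actually uses is not visible here. Purely as an illustration, and assuming a simple shared-k-mer score (not necessarily what the script does), the fastq-to-reference assignment and per-reference output could be sketched in Python as follows. All function names and the default `k=8` are hypothetical.

```python
from collections import defaultdict
from pathlib import Path


def read_fastq(lines):
    """Yield (read_id, sequence) from FASTQ text, four lines per record."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # skip blank lines, e.g. a trailing empty line
        seq = next(it).strip()
        next(it)  # '+' separator line
        next(it)  # quality line
        yield header[1:].split()[0], seq


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def assign_reads(reads, references, k=8):
    """Group read IDs by the reference sharing the most k-mers with each read.

    Ties go to the reference listed first; reads sharing no k-mer go under None.
    """
    ref_kmers = {name: kmers(seq, k) for name, seq in references.items()}
    groups = defaultdict(list)
    for read_id, seq in reads:
        read_kmers = kmers(seq, k)
        best, best_score = None, 0
        for name, ref_set in ref_kmers.items():
            score = len(read_kmers & ref_set)
            if score > best_score:
                best, best_score = name, score
        groups[best].append(read_id)
    return dict(groups)


def write_groups(groups, out_dir="."):
    """Write one <reference>.txt file of read IDs per group ('unassigned' for None)."""
    for name, read_ids in groups.items():
        label = name if name is not None else "unassigned"
        Path(out_dir, f"{label}.txt").write_text("\n".join(read_ids) + "\n")
```

Counting shared k-mers tolerates a few sequencing errors better than exact matching, at the cost of holding every reference's k-mer set in memory, which stays small for a modest amplicon panel.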
cory-weller / gene_table_GRCh38-2020-A.tsv
Created May 23, 2023 13:46
Generating a table containing features of SYMBOL and ENSEMBL (ENSG) IDs including Chr, start, and stop positions
| SYMBOL | ENSEMBL | seqnames | start | end | width | strand |
|---|---|---|---|---|---|---|
|  | ENSG00000243485 | chr1 | 29554 | 31109 | 1556 | + |
| FAM138A | ENSG00000237613 | chr1 | 34554 | 36081 | 1528 | - |
| OR4F5 | ENSG00000186092 | chr1 | 65419 | 71585 | 6167 | + |
|  | ENSG00000238009 | chr1 | 89295 | 133723 | 44429 | - |
|  | ENSG00000239945 | chr1 | 89551 | 91105 | 1555 | - |
|  | ENSG00000239906 | chr1 | 139790 | 140339 | 550 | - |
|  | ENSG00000241860 | chr1 | 141474 | 173862 | 32389 | - |
|  | ENSG00000241599 | chr1 | 160446 | 161525 | 1080 | + |
|  | ENSG00000286448 | chr1 | 266855 | 268655 | 1801 | + |
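Because rows without a gene symbol leave the first column empty, a parser should keep that empty field rather than shifting the remaining columns left. A minimal Python sketch using the standard `csv` module, assuming the file is tab-separated with the header shown above (function names are hypothetical):

```python
import csv


def load_gene_table(handle):
    """Parse the gene TSV into dicts with integer coordinates.

    Empty SYMBOL cells (genes with no symbol) become None.
    """
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        row["SYMBOL"] = row["SYMBOL"] or None
        for col in ("start", "end", "width"):
            row[col] = int(row[col])
        # The sample rows satisfy width == end - start + 1 (1-based, inclusive ranges).
        if row["width"] != row["end"] - row["start"] + 1:
            raise ValueError(f"inconsistent width for {row['ENSEMBL']}")
        rows.append(row)
    return rows


def index_by_symbol(rows):
    """Map gene symbol -> row, skipping genes that have no symbol."""
    return {row["SYMBOL"]: row for row in rows if row["SYMBOL"]}
```

If a symbol occurs more than once, `index_by_symbol` keeps the last row, so the ENSEMBL ID is the safer lookup key when uniqueness matters.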
cory-weller / example_doublet_finder.R
Last active May 15, 2023 17:03
Seurat DoubletFinder initial example
#!/usr/bin/env Rscript
library(Seurat)
library(DoubletFinder)
## Standard pre-processing
seurat_obj <- CreateSeuratObject(kidney.data)
seurat_obj <- NormalizeData(seurat_obj)
seurat_obj <- FindVariableFeatures(seurat_obj, selection.method = "vst", nfeatures = 2000)
cory-weller / extract_fasta_range.py
Last active July 14, 2022 17:28
finds a header within a given fasta file, then prints the desired nucleotide ranges to STDOUT
#!/usr/bin/env python
'''finds a header within a given fasta file, then prints the desired nucleotide ranges to STDOUT'''
import sys
import argparse
import regex as re
def wrap_fasta(seq):
'''wraps sequence every 80 characters with newlines'''
return '\n'.join([seq[x:x+80] for x in range(0,len(seq),80)])
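Only the start of extract_fasta_range.py is shown. A self-contained sketch of the described behaviour (find a header, then print a nucleotide range) could look like the following. It assumes 1-based, inclusive ranges and matches a header on its first word; both are assumptions rather than details taken from the gist. `read_record` and `extract_range` are hypothetical names, and unlike the original this sketch does not use the third-party `regex` module.

```python
def read_record(lines, name):
    """Return the sequence of the first record whose header's first word is `name`."""
    seq, found = [], False
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if found:
                break  # the next record starts, so the wanted one is complete
            words = line[1:].split()
            found = bool(words) and words[0] == name
        elif found:
            seq.append(line)
    if not found:
        raise KeyError(name)
    return "".join(seq)


def extract_range(seq, start, end):
    """Slice a 1-based, inclusive range from seq."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError(f"range {start}-{end} outside 1-{len(seq)}")
    return seq[start - 1:end]


def wrap_fasta(seq, width=80):
    """Wrap a sequence every `width` characters with newlines."""
    return "\n".join(seq[i:i + width] for i in range(0, len(seq), width))
```

For example, `print(wrap_fasta(extract_range(read_record(open("genome.fa"), "chr1"), 100, 200)))` would print bases 100 to 200 of `chr1` from a hypothetical `genome.fa`, wrapped at 80 characters.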