Ensembl's VEP (Variant Effect Predictor) is popular for how it picks a single effect per gene as detailed here, its CLIA-compliant HGVS variant format, and Sequence Ontology nomenclature for variant effects.
Instead of the official instructions, we will use conda to install VEP and its dependencies. If you don't already have conda, install it into $HOME/miniconda3 as follows:
curl -sL https://repo.anaconda.com/miniconda/Miniconda3-py37_4.9.2-Linux-x86_64.sh -o /tmp/miniconda.sh
sh /tmp/miniconda.sh -bfp $HOME/miniconda3Add the conda bin folder into your $PATH so that all installed tools are accessible via command-line. You can also add this to your ~/.bashrc or ~/.profile for this to persist across logins:
export PATH=$HOME/miniconda3/bin:$PATHDownload and install VEP, its dependencies, and also samtools/bcftools/liftOver:
conda install -qy -c conda-forge -c bioconda -c defaults ensembl-vep==102.0 htslib==1.10.2 bcftools==1.10.2 samtools==1.10 ucsc-liftover==377Download VEP's offline cache for GRCh38, and the reference FASTA:
mkdir -p $HOME/.vep/homo_sapiens/102_GRCh38/
rsync -avr --progress rsync://ftp.ensembl.org/ensembl/pub/release-102/variation/indexed_vep_cache/homo_sapiens_vep_102_GRCh38.tar.gz $HOME/.vep/
tar -zxf $HOME/.vep/homo_sapiens_vep_102_GRCh38.tar.gz -C $HOME/.vep/
rsync -avr --progress rsync://ftp.ensembl.org/ensembl/pub/release-102/fasta/homo_sapiens/dna_index/ $HOME/.vep/homo_sapiens/102_GRCh38/(Optional) Download VEP's offline cache for GRCh37, and the reference FASTA which we must bgzip instead of gzip:
mkdir -p $HOME/.vep/homo_sapiens/102_GRCh37/
rsync -avr --progress rsync://ftp.ensembl.org/ensembl/pub/release-102/variation/indexed_vep_cache/homo_sapiens_vep_102_GRCh37.tar.gz $HOME/.vep/
tar -zxf $HOME/.vep/homo_sapiens_vep_102_GRCh37.tar.gz -C $HOME/.vep/
rsync -avr --progress rsync://ftp.ensembl.org/ensembl/pub/grch37/release-102/fasta/homo_sapiens/dna/Homo_sapiens.GRCh37.dna.toplevel.fa.gz $HOME/.vep/homo_sapiens/102_GRCh37/
gzip -d $HOME/.vep/homo_sapiens/102_GRCh37/Homo_sapiens.GRCh37.dna.toplevel.fa.gz
bgzip -i $HOME/.vep/homo_sapiens/102_GRCh37/Homo_sapiens.GRCh37.dna.toplevel.fa
samtools faidx $HOME/.vep/homo_sapiens/102_GRCh37/Homo_sapiens.GRCh37.dna.toplevel.fa.gzTest running VEP in offline mode on a GRCh38 VCF:
curl -sLO https://raw.githubusercontent.com/Ensembl/ensembl-vep/release/102/examples/homo_sapiens_GRCh38.vcf
vep --species homo_sapiens --assembly GRCh38 --offline --no_progress --no_stats --sift b --ccds --uniprot --hgvs --symbol --numbers --domains --gene_phenotype --canonical --protein --biotype --tsl --pubmed --variant_class --shift_hgvs 1 --check_existing --total_length --allele_number --no_escape --xref_refseq --failed 1 --vcf --minimal --flag_pick_allele --pick_order canonical,tsl,biotype,rank,ccds,length --dir $HOME/.vep --fasta $HOME/.vep/homo_sapiens/102_GRCh38/Homo_sapiens.GRCh38.dna.toplevel.fa.gz --input_file homo_sapiens_GRCh38.vcf --output_file homo_sapiens_GRCh38.vep.vcf --polyphen b --af --af_1kg --af_esp --regulatory
Great post! I was having issues with the rsync ftp and found out that the route is wrong I don't know if they changed it, but the path is without the "ensembl" (rsync://ftp.ensembl.org/ensembl/pub => rsync://ftp.ensembl.org/pub). So the code would look like this
Download VEP's offline cache for GRCh38, and the reference FASTA:
(Optional) Download VEP's offline cache for GRCh37, and the reference FASTA which we must bgzip instead of gzip:
Please update it because this gist is a great resource and it would be great for everyone who uses it in the future
Best wishes!