SnpEff-Sapelo2
Category
Bioinformatics
Program On
Sapelo2
Version
5.0e
Author / Distributor
Description
"SnpEff is a variant annotation and effect prediction tool. It annotates and predicts the effects of genetic variants (such as amino acid changes)." More details are available at the snpEff website.
Running Program
The latest version of this application is available with Java 11 and either GCCcore-11.2.0 or GCCcore-11.3.0.
To use one of these versions, please load the corresponding module with
ml snpEff/5.0e-GCCcore-11.2.0-Java-11
or
ml snpEff/5.0e-GCCcore-11.3.0-Java-11
Note that no snpEff databases are currently available centrally. Users are instead encouraged to download (or build) whatever databases they need in their own space. To do so, first copy the snpEff config file (available at either /apps/eb/snpEff/5.0e-GCCcore-11.2.0-Java-11/snpEff.config or /apps/eb/snpEff/5.0e-GCCcore-11.3.0-Java-11/snpEff.config) to your own space and modify it as needed. Then use the '-c' option to specify the location of this custom config file when running the software, as described below and at the snpEff website.
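The setup described above could be sketched as follows. This is a minimal illustration, not an official procedure: the directory SNPEFF_WORK and the function names snpeff_setup_config and snpeff_download are hypothetical, and the sketch uses the '-dataDir' option (listed in the help output of snpEff) to point at a private data directory instead of editing the copied config file.

```shell
#!/bin/bash
# Sketch only. SNPEFF_WORK and both function names are hypothetical;
# adjust the paths to your own space.
SNPEFF_CONFIG_SRC="${SNPEFF_CONFIG_SRC:-/apps/eb/snpEff/5.0e-GCCcore-11.3.0-Java-11/snpEff.config}"
SNPEFF_WORK="${SNPEFF_WORK:-$HOME/snpEff_work}"

# Copy the central config file into your own space.
# An existing private copy is kept, so local edits are not overwritten.
snpeff_setup_config() {
    mkdir -p "$SNPEFF_WORK/data" || return 1
    if [ ! -e "$SNPEFF_WORK/snpEff.config" ]; then
        cp "$SNPEFF_CONFIG_SRC" "$SNPEFF_WORK/snpEff.config"
    fi
}

# Download one database into the private data directory.
# $1 is a genome name as listed by "snpEff databases".
snpeff_download() {
    snpEff download -c "$SNPEFF_WORK/snpEff.config" \
        -dataDir "$SNPEFF_WORK/data" -v "$1"
}
```

After loading the module, one would call snpeff_setup_config once, look up the desired genome name with snpEff databases -c "$SNPEFF_WORK/snpEff.config", and then run snpeff_download with that name. Downloads can be large, so a directory with enough quota is assumed.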
Here is an example of a shell script, sub.sh, to run on the batch queue:
#!/bin/bash
#SBATCH --job-name=j_snpEff
#SBATCH --partition=batch
#SBATCH --mail-type=ALL
#SBATCH --mail-user=username@uga.edu
#SBATCH --ntasks=1
#SBATCH --mem=10gb
#SBATCH --time=08:00:00
#SBATCH --output=snpEff.%j.out
#SBATCH --error=snpEff.%j.err
cd $SLURM_SUBMIT_DIR
ml snpEff/5.0e-GCCcore-11.3.0-Java-11
snpEff [command] [options] [files]
In a real submission script, the values above (at minimum the job name, e-mail address, memory and time limits, and the snpEff command line) need to be reviewed and replaced with values appropriate for your job.
Please refer to Running_Jobs_on_Sapelo2.
Here is an example of a job submission command:
sbatch ./sub.sh
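The template script above leaves the snpEff command line generic. As a hedged illustration of a filled-in version, the helper below prints a job script that annotates one VCF file with a privately downloaded database. The function name make_snpeff_job and its three arguments (genome name, input VCF, and a work directory containing snpEff.config and a data subdirectory) are assumptions made for this sketch; the resource requests are copied from the template and may need adjusting.

```shell
#!/bin/bash
# Sketch only: print a filled-in job script to stdout.
# Hypothetical arguments: $1 genome name, $2 input VCF file,
# $3 work directory holding snpEff.config and data/.
make_snpeff_job() {
    local genome="$1" vcf="$2" workdir="$3"
    cat <<EOF
#!/bin/bash
#SBATCH --job-name=j_snpEff
#SBATCH --partition=batch
#SBATCH --mail-type=ALL
#SBATCH --mail-user=username@uga.edu
#SBATCH --ntasks=1
#SBATCH --mem=10gb
#SBATCH --time=08:00:00
#SBATCH --output=snpEff.%j.out
#SBATCH --error=snpEff.%j.err

cd \$SLURM_SUBMIT_DIR
ml snpEff/5.0e-GCCcore-11.3.0-Java-11
# Annotate the input VCF; the annotated VCF is written to standard output.
snpEff ann -c "$workdir/snpEff.config" -dataDir "$workdir/data" -v "$genome" "$vcf" > "${vcf%.vcf}.ann.vcf"
EOF
}
```

For example, make_snpeff_job GENOME_NAME input.vcf "$HOME/snpEff_work" > sub.sh writes the script, which can then be submitted with sbatch ./sub.sh. GENOME_NAME is a placeholder for a database name reported by snpEff databases.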
Documentation
Note: despite what the documentation states, it is NOT necessary to use "java -jar snpEff.jar" or "java -jar snpEff.jar command"; you can simply type "snpEff [command] -h".
ml snpEff/5.0e-GCCcore-11.3.0-Java-11
$ snpEff -h
SnpEff version SnpEff 5.0e (build 2021-03-09 06:01), by Pablo Cingolani
Usage: snpEff [command] [options] [files]

Run 'java -jar snpEff.jar command' for help on each specific command

Available commands:
    [eff|ann] : Annotate variants / calculate effects (you can use either 'ann' or 'eff', they mean the same). Default: ann (no command or 'ann').
    build : Build a SnpEff database.
    buildNextProt : Build a SnpEff for NextProt (using NextProt's XML files).
    cds : Compare CDS sequences calculated form a SnpEff database to the one in a FASTA file. Used for checking databases correctness.
    closest : Annotate the closest genomic region.
    count : Count how many intervals (from a BAM, BED or VCF file) overlap with each genomic interval.
    databases : Show currently available databases (from local config file).
    download : Download a SnpEff database.
    dump : Dump to STDOUT a SnpEff database (mostly used for debugging).
    genes2bed : Create a bed file from a genes list.
    len : Calculate total genomic length for each marker type.
    pdb : Build interaction database (based on PDB data).
    protein : Compare protein sequences calculated form a SnpEff database to the one in a FASTA file. Used for checking databases correctness.
    seq : Show sequence (from command line) translation.
    show : Show a text representation of genes or transcripts coordiantes, DNA sequence and protein sequence.
    translocReport : Create a translocations report (from VCF file).

Generic options:
    -c , -config : Specify config file
    -configOption name=value : Override a config file option
    -d , -debug : Debug mode (very verbose).
    -dataDir <path> : Override data_dir parameter from config file.
    -download : Download a SnpEff database, if not available locally. Default: true
    -nodownload : Do not download a SnpEff database, if not available locally.
    -h , -help : Show this help and exit
    -noLog : Do not report usage statistics to server
    -q , -quiet : Quiet mode (do not show any messages or errors)
    -v , -verbose : Verbose mode
    -version : Show version number and exit

Database options:
    -canon : Only use canonical transcripts.
    -canonList <file> : Only use canonical transcripts, replace some transcripts using the 'gene_id transcript_id' entries in <file>.
    -interaction : Annotate using inteactions (requires interaciton database). Default: true
    -interval <file> : Use a custom intervals in TXT/BED/BigBed/VCF/GFF file (you may use this option many times)
    -maxTSL <TSL_number> : Only use transcripts having Transcript Support Level lower than <TSL_number>.
    -motif : Annotate using motifs (requires Motif database). Default: true
    -nextProt : Annotate using NextProt (requires NextProt database).
    -noGenome : Do not load any genomic database (e.g. annotate using custom files).
    -noExpandIUB : Disable IUB code expansion in input variants
    -noInteraction : Disable inteaction annotations
    -noMotif : Disable motif annotations.
    -noNextProt : Disable NextProt annotations.
    -onlyReg : Only use regulation tracks.
    -onlyProtein : Only use protein coding transcripts. Default: false
    -onlyTr <file.txt> : Only use the transcripts in this file. Format: One transcript ID per line.
    -reg <name> : Regulation track to use (this option can be used add several times).
    -ss , -spliceSiteSize <int> : Set size for splice sites (donor and acceptor) in bases. Default: 2
    -spliceRegionExonSize <int> : Set size for splice site region within exons. Default: 3 bases
    -spliceRegionIntronMin <int> : Set minimum number of bases for splice site region within intron. Default: 3 bases
    -spliceRegionIntronMax <int> : Set maximum number of bases for splice site region within intron. Default: 8 bases
    -strict : Only use 'validated' transcripts (i.e. sequence has been checked). Default: false
    -ud , -upDownStreamLen <int> : Set upstream downstream interval length (in bases)
Installation
Source code is obtained from snpEff
System
64-bit Linux