Stampy-Sapelo2old

From Research Computing Center Wiki

[[Category:Sapelo2old]]

Category

Bioinformatics

Program On

Sapelo2

Version

1.0.31

Author / Distributor

Please see https://www.well.ox.ac.uk/research/research-groups/lunter-group/lunter-group/stampy

Description

From https://www.well.ox.ac.uk/research/research-groups/lunter-group/lunter-group/stampy: "Stampy is a package for the mapping of short reads from illumina sequencing machines onto a reference genome. It's recommended for most workflows, including those for genomic resequencing, RNA-Seq and Chip-seq."

Running Program

Also refer to Running Jobs on Sapelo2

For more information on Environment Modules on Sapelo2, please see the Lmod page.

  • Version 1.0.31, installed in /usr/local/apps/eb/Stampy/1.0.31-foss-2016b-Python-2.7.14

To use this version of Stampy, please first load the module with

module load Stampy/1.0.31-foss-2016b-Python-2.7.14

This module will automatically load Python/2.7.14-foss-2016b and other dependencies.


Sample job submission script (sub.sh) to run stampy.py:

#PBS -S /bin/bash
#PBS -q batch
#PBS -N jobname
#PBS -l nodes=1:ppn=1
#PBS -l walltime=4:00:00
#PBS -l mem=2gb

cd $PBS_O_WORKDIR

module load Stampy/1.0.31-foss-2016b-Python-2.7.14

stampy.py [options]

Replace [options] with the Stampy command option and arguments you want to use. Adjust the other job parameters as needed: the maximum wall clock time (walltime), the maximum memory (mem), the number of cores per node (ppn), and the job name (-N). If you request more than one core, pass the same number to stampy.py with -t, since Stampy uses one thread by default.
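As an illustration of what [options] might look like, the sketch below prints the three commands of a typical Stampy run: building the genome index (-G), building the hash (-H), and mapping paired-end reads (-M). The flags are taken from the help output of stampy.py; the index prefix, the file names, and the run and stampy_workflow helpers are hypothetical and exist only for this example. By default the commands are only printed (DRY_RUN=1), so they can be reviewed before being placed in sub.sh.

```shell
#!/bin/bash
# Sketch of a three-step Stampy workflow. File names and the index prefix
# are hypothetical placeholders, not files that exist on Sapelo2.
# With DRY_RUN=1 (the default here) each command is printed, not executed.
# Set DRY_RUN=0 inside a real job script to actually run the commands.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Arguments: index prefix, reference fasta, first and second read file.
stampy_workflow() {
    local prefix="$1" fasta="$2" reads1="$3" reads2="$4"
    # 1. Build the genome index PREFIX.stidx from the reference fasta file.
    run stampy.py -G "$prefix" "$fasta"
    # 2. Build the hash PREFIX.sthash from that genome index.
    run stampy.py -g "$prefix" -H "$prefix"
    # 3. Map the paired-end reads and write SAM output to PREFIX.sam.
    run stampy.py -g "$prefix" -h "$prefix" -M "$reads1,$reads2" -o "$prefix.sam"
}

stampy_workflow ref ref.fa reads_1.fastq reads_2.fastq
```

The index and hash only need to be built once per reference genome; later jobs that map against the same reference can run the third command alone.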

Submit the job to the queue with

qsub sub.sh

Documentation

See, for example, https://www.well.ox.ac.uk/files-library/readme.txt

The built-in help lists all available options. To display it, run:

module load Stampy/1.0.31-foss-2016b-Python-2.7.14

stampy.py --help

stampy v1.0.31 (r3759), <gerton.lunter@well.ox.ac.uk>

Usage: /usr/local/apps/eb/Stampy/1.0.31-foss-2016b-Python-2.7.14/stampy.py [options] [files]

Command options (one required)
 -G PREFIX, --build-genome=PREFIX   Build genome index PREFIX.stidx from fasta file(s) on command line
 -H PREFIX, --build-hash=PREFIX     Build hash PREFIX.sthash
 -M FILE, --map=FILE[,FILE]         Map fastq/fasta/BAM file(s)
 -S FILE, --simulate=FILE           Simulate reads following empirical qualities from read FILE
 -P FILE, --parse=FILE              Parse simulated SAM file, and report statistics
 -A FILE                            Convert qualities; strip adapters

Mapping options:
 -g PREFIX, --genome=PREFIX         Use genome index file PREFIX.stidx
 -h PREFIX, --hash=PREFIX           Use hash file PREFIX.sthash
 -t N, --threads=N                  Number of threads to use [1]
 --numrecords=N                     Number of records to process [all]
 --processpart=A/B                  Process the Ath read (pair) in every block of B read (pairs) [1/1]
 --minposterior=N                   Minimum read mapping phred posterior [no filtering]
 --maxscore=N                       Maximum read likelihood phred score [no filtering]
 --qualitybase=C                    Character or ASCII for quality score 0 in input [33 or !]
 --logit                            Qualities are logit (log p/1-p) rather than phred (log p) scores [phred]
 --solexa, --solexaold, --sanger    Short for: --qualitybase=@; --qualitybase=@ --logit; default
 --substitutionrate=F               Set substitution rate for mapping and simulation [0.001]
 --gapopen=N                        Gap open penalty (phred score) [40]
 --gapextend=N                      Gap extension penalty (phred score) [3]
 --insertsize=N                     (Initial) mean insert size for paired-end reads [250]
 --insertsd=N                       (Initial) standard deviation for insert size [60]
 --insertsize2=N                    (Initial) mean insert size for mate pairs [-2000]
 --insertsd2=N                      (Initial) standard deviation for mate pairs [-1 = deactivated]
 --maxpairseeds=N                   Number of ambiguous single mappings to take forward for paired realignment [25]
 --noautosense                      Do not auto-sense insert size distribution(s) [False]
 --sensitive                        More sensitive; about 25-50% slower
 --fast                             Less sensitive; about 50% (paired-end) to 100% (single-ended) faster
 --bwaoptions=opts                  Options and <prefix> for BWA pre-mapper (quote multiple options)
 --bwamaxmismatch=N                 Max number of mismatches for BWA maps; -1=auto [-1]
 --bwatmpdir=S                      Set directory for BWA temporary files
 --bwa=F                            Set BWA executable [default: bwa]
 --bwamark                          Include/mark BWA-mapped reads with XP:Z:BWA tag (produces more output lines)

Output options:
 -o FILE, --output=FILE             Write mapping output to FILE [stdout]
 -f FMT, --outputformat=FMT         Select mapping output format (sam,maqtxt,maqmap,maqmapN) [sam]
 --rightmost                        Flush gaps to rightmost position in reference coordinates [leftmost]
 --xa-max                           Number of alternative hits to output for single/concordant paired reads [0, max 20]
 --xa-max-discordant                Number of alternative hits to output for discordant paired reads [0]
 --overwrite                        Allow overwriting of output files
 --baq                              (SAM format) Compute base-alignment quality (BAQ; BQ tag)
 --alignquals                       (SAM format) Compute posterior alignment probabilities (YQ tag)
 --xcigar                           (SAM format) Compute the extended CIGAR string (XC tag)
 --readgroup=ID:id,tag:value,...    (SAM format) Set readgroup tags (ID,SM,LB,DS,PU,PI,CN,DT,PL); ",," quotes ","
 --norefoutput                      (SAM format) Use = signs at read positions that match the reference
 --comment=comment1%comment2%...    (SAM format) Add @COmments.  % separate lines; commas are converted into tabs
 --comment=@F                       (SAM format) Add @COmments from file F

Simulation options (use with -S):
 --seed=N                           Set random number seed [1]
 --simulate-minindellen=N           Minimum indel length to simulate; negative is deletion [0]
 --simulate-maxindellen=N           Maximum indel length to simulate; negative is deletion [0]
 --simulate-duplications            For insertions, simulate duplications rather than (default) random sequence insertion
 --simulate-numsubstitutions=N      Insert N substitutions in read (pairs) (default = Poisson distribution)

Advanced options:
 --maxcount=N                       (-H) Maximum multiplicity of repeat words [200]
 --assembly=S                       (-G) Set assembly identifier (appears in @SQ field (AS tag) in SAM output)
 --species=S                        (-G) Set species identifier (appears in @SQ field (SP tag) in SAM output)
 --referenceuri=S                   (-G) Set URI for the reference (appears in @SQ field (UR tag) in SAM output)
 --noparseNCBI                      (-G) Don't parse but just copy gi|nnn|ref|xxx| labels in NCBI fasta files
 --strip-adapter                    (-A) Strip adapters from fastq files
 --inputformat=FMT                  Read input format (fasta, fastq, sam, bam) [fastq]
 --banding=N                        Band size for banded alignment; -1=none [60]
 --fastaqual=N                      Base quality for fasta input [30]
 --maxbasequal=N                    Maximum accepted base quality [50]
 --svprior=N                        Prior probability of read pair bridging a SV (phred score) [55]
 --longindelprior=N                 Prior probability of read pair bridging a long indel (phred score) [40]
 --baseentropy=N                    Entropy of inserted nucleotides (phred score) [5]
 --remote=U                         Command to get input reads from URL %s; shortcuts are 'wget' and 'scp'
 --keepreforder                     (-M) (default) Use reference order in .fa/.stidx file, rather than alphabetical
 --alphabetical                     (-M) Order reference identifiers alphabetically
 --labelfilter=R                    (-M) Only map reads / read pairs whose label match the regular expression R
 --casava8                          (-M) Parse Casava v1.8 FastQ headers; include but do not map filtered reads
 --separator1=c                     (-M) Use 'c' as separator for Casava data columns [:]
 --separator2=c                     (-M) use 'c' as separator between first 7 columns, and the auxiliary columns [ ]
 --gatkcigarworkaround              (-M) Remove adjacent I/D CIGAR operators (valid in SAM spec, but trips up GATK)
 --unclipq                          (-M X.bam / -M --bamoptions) Undo soft clipping over low Q bases (-1=disable) [10]
 --bamsortprefix=S                  (-M X.bam) Temporary storage for sorting [--output argument, or /tmp/bamsort]
 --bamsortbuflen=N                  (-M X.bam) Entries in memory sort buffer [10000]
 --bamsortmemory=N                  (-M X.bam) Value for option -m to samtools sort [200000000] (samtools 1.3: 200M)
 --bamkeepgoodreads                 (-M X.bam) If set, do not re-map already well-mapped reads from BAM [False]

General options:
 -?, --help                         Display this help page
 -v N, --verbosity=N                Set verbosity level to N [2]
 --logfile=FILE                     Send logging information to FILE  [stderr]
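The mapping and output options listed above can be combined on one command line. The sketch below assembles such a command with a thread count (-t), a read group (--readgroup), and an output file (-o). It assumes the scheduler exports PBS_NUM_PPN with the ppn value requested in the job script, and falls back to 1 if the variable is unset. The helper name stampy_map_cmd, the index prefix, the read files, and the sample name are hypothetical.

```shell
#!/bin/bash
# Sketch: print a multithreaded Stampy mapping command for review.
# Assumption: PBS_NUM_PPN holds the number of cores requested via ppn;
# if it is not set, a single thread is used (Stampy's own default).

# Arguments: index prefix, read file(s) as FILE[,FILE], sample name.
stampy_map_cmd() {
    local prefix="$1" reads="$2" sample="$3"
    local threads="${PBS_NUM_PPN:-1}"
    # The sample name is reused for the read group ID, the SM tag,
    # and the output file name; this is only an illustrative convention.
    printf 'stampy.py -g %s -h %s -t %s --readgroup=ID:%s,SM:%s -o %s.sam -M %s\n' \
        "$prefix" "$prefix" "$threads" "$sample" "$sample" "$sample" "$reads"
}

stampy_map_cmd ref reads_1.fastq,reads_2.fastq sampleA
```

The printed line can be pasted into sub.sh in place of stampy.py [options]. For the thread count to have any effect, the ppn value in the #PBS -l nodes line has to be raised to match.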


Installation

System

64-bit Linux