Bowtie2-Teaching: Difference between revisions

From Research Computing Center Wiki
Line 9: Line 9:


=== Version ===
=== Version ===
2.3.4.1
2.2.3
   
   
=== Author / Distributor ===
=== Author / Distributor ===
Line 16: Line 16:
   
   
=== Description ===
=== Description ===
"Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. It is particularly good at aligning reads of about 50 up to 100s or 1,000s of characters, and particularly good at aligning to relatively long (e.g. mammalian) genomes. Bowtie 2 indexes the genome with an FM Index to keep its memory footprint small: for the human genome, its memory footprint is typically around 3.2 GB. Bowtie 2 supports gapped, local, and paired-end alignment modes."
"Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads   to long reference sequences. It is particularly good at aligning reads of about 50 up to 100s or 1,000s   of characters, and particularly good at aligning to relatively long (e.g. mammalian) genomes.   Bowtie 2 indexes the genome with an FM Index to keep its memory footprint small: for the human genome,   its memory footprint is typically around 3.2 GB. Bowtie 2 supports gapped, local, and paired-end alignment modes."
More details are at [http://bowtie-bio.sourceforge.net/bowtie2/index.shtml Bowtie2]
More details are at [http://bowtie-bio.sourceforge.net/bowtie2/index.shtml Bowtie2]


=== Running Program ===
=== Running Program ===


The last version of this application is at /usr/local/apps/eb/Bowtie2/2.3.4.1-foss-2016b
The last version of this application is at /usr/local/apps/eb/Bowtie2/2.2.3-foss-2016b


To use this version, please load the module with
To use this version, please load the module with
<pre class="gscript">
<pre class="gscript">
ml Bowtie2/2.3.4.1-foss-2016b  
ml Bowtie2/2.2.3-foss-2016b  
</pre>  
</pre>  


Line 43: Line 43:
   
   
cd $SLURM_SUBMIT_DIR<br>
cd $SLURM_SUBMIT_DIR<br>
ml Bowtie2/2.3.4.1-foss-2016b<br>     
ml Bowtie2/2.2.3-foss-2016b<br>     
bowtie2 <u>[options]</u><br>   
bowtie2 <u>[options]</u><br>   
</div>
</div>
Line 59: Line 59:
   
   
<pre  class="gcommand">
<pre  class="gcommand">
ml Bowtie2/2.3.4.1-foss-2016b  
ml Bowtie2/2.2.3-foss-2016b  
bowtie2 -h
bowtie2 -h
Bowtie 2 version 2.3.4.1 by Ben Langmead (langmea@cs.jhu.edu, www.cs.jhu.edu/~langmea)
Bowtie 2 version 2.2.3 by Ben Langmead (langmea@cs.jhu.edu, www.cs.jhu.edu/~langmea)
Usage:  
Usage:  
   bowtie2 [options]* -x <bt2-idx> {-1 <m1> -2 <m2> | -U <r> | --interleaved <i>} [-S <sam>]
   bowtie2 [options]* -x <bt2-idx> {-1 <m1> -2 <m2> | -U <r>} [-S <sam>]


   <bt2-idx>  Index filename prefix (minus trailing .X.bt2).
   <bt2-idx>  Index filename prefix (minus trailing .X.bt2).
Line 72: Line 72:
             Could be gzip'ed (extension: .gz) or bzip2'ed (extension: .bz2).
             Could be gzip'ed (extension: .gz) or bzip2'ed (extension: .bz2).
   <r>        Files with unpaired reads.
   <r>        Files with unpaired reads.
            Could be gzip'ed (extension: .gz) or bzip2'ed (extension: .bz2).
  <i>        Files with interleaved paired-end FASTQ reads
             Could be gzip'ed (extension: .gz) or bzip2'ed (extension: .bz2).
             Could be gzip'ed (extension: .gz) or bzip2'ed (extension: .bz2).
   <sam>      File for SAM output (default: stdout)
   <sam>      File for SAM output (default: stdout)
Line 84: Line 82:
  Input:
  Input:
   -q                query input files are FASTQ .fq/.fastq (default)
   -q                query input files are FASTQ .fq/.fastq (default)
  --tab5            query input files are TAB5 .tab5
  --tab6            query input files are TAB6 .tab6
   --qseq            query input files are in Illumina's qseq format
   --qseq            query input files are in Illumina's qseq format
   -f                query input files are (multi-)FASTA .fa/.mfa
   -f                query input files are (multi-)FASTA .fa/.mfa
   -r                query input files are raw one-sequence-per-line
   -r                query input files are raw one-sequence-per-line
  -F k:<int>,i:<int> query input files are continuous FASTA where reads
                    are substrings (k-mers) extracted from a FASTA file <s>
                    and aligned at offsets 1, 1+i, 1+2i ... end of reference
   -c                <m1>, <m2>, <r> are sequences themselves, not files
   -c                <m1>, <m2>, <r> are sequences themselves, not files
   -s/--skip <int>    skip the first <int> reads/pairs in the input (none)
   -s/--skip <int>    skip the first <int> reads/pairs in the input (none)
Line 156: Line 149:
   --no-mixed        suppress unpaired alignments for paired reads
   --no-mixed        suppress unpaired alignments for paired reads
   --no-discordant    suppress discordant alignments for paired reads
   --no-discordant    suppress discordant alignments for paired reads
   --dovetail         concordant when mates extend past each other
   --no-dovetail     not concordant when mates extend past each other
   --no-contain      not concordant when one mate alignment contains other
   --no-contain      not concordant when one mate alignment contains other
   --no-overlap      not concordant when mates overlap at all
   --no-overlap      not concordant when mates overlap at all
Line 162: Line 155:
  Output:
  Output:
   -t/--time          print wall-clock time taken by search phases
   -t/--time          print wall-clock time taken by search phases
   --un <path>       write unpaired reads that didn't align to <path>
   --un <path>           write unpaired reads that didn't align to <path>
   --al <path>       write unpaired reads that aligned at least once to <path>
   --al <path>           write unpaired reads that aligned at least once to <path>
   --un-conc <path>   write pairs that didn't align concordantly to <path>
   --un-conc <path>     write pairs that didn't align concordantly to <path>
   --al-conc <path>   write pairs that aligned concordantly at least once to <path>
   --al-conc <path>     write pairs that aligned concordantly at least once to <path>
    (Note: for --un, --al, --un-conc, or --al-conc, add '-gz' to the option name, e.g.
  (Note: for --un, --al, --un-conc, or --al-conc, add '-gz' to the option name, e.g.
    --un-gz <path>, to gzip compress output, or add '-bz2' to bzip2 compress output.)
  --un-gz <path>, to gzip compress output, or add '-bz2' to bzip2 compress output.)
   --quiet            print nothing to stderr except serious errors
   --quiet            print nothing to stderr except serious errors
   --met-file <path>  send metrics to file at <path> (off)
   --met-file <path>  send metrics to file at <path> (off)
   --met-stderr      send metrics to stderr (off)
   --met-stderr      send metrics to stderr (off)
   --met <int>        report internal counters & metrics every <int> secs (1)
   --met <int>        report internal counters & metrics every <int> secs (1)
  --no-unal          suppress SAM records for unaligned reads
   --no-head          supppress header lines, i.e. lines starting with @
   --no-head          suppress header lines, i.e. lines starting with @
   --no-sq            supppress @SQ header lines
   --no-sq            suppress @SQ header lines
   --rg-id <text>    set read group id, reflected in @RG line and RG:Z: opt field
   --rg-id <text>    set read group id, reflected in @RG line and RG:Z: opt field
   --rg <text>        add <text> ("lab:value") to @RG line of SAM header.
   --rg <text>        add <text> ("lab:value") to @RG line of SAM header.
                     Note: @RG line only printed when --rg-id is set.
                     Note: @RG line only printed when --rg-id is set.
   --omit-sec-seq    put '*' in SEQ and QUAL fields for secondary alignments.
   --omit-sec-seq    put '*' in SEQ and QUAL fields for secondary alignments.
  --sam-no-qname-trunc Suppress standard behavior of truncating readname at first whitespace
                      at the expense of generating non-standard SAM.
  --xeq              Use '='/'X', instead of 'M,' to specify matches/mismatches in SAM record.
  --soft-clipped-unmapped-tlen Exclude soft-clipped bases when reporting TLEN


  Performance:
  Performance:

Revision as of 15:26, 15 August 2018

Category

Bioinformatics

Program On

Teaching

Version

2.2.3

Author / Distributor

Bowtie2

Description

"Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. It is particularly good at aligning reads of about 50 up to 100s or 1,000s of characters, and particularly good at aligning to relatively long (e.g. mammalian) genomes. Bowtie 2 indexes the genome with an FM Index to keep its memory footprint small: for the human genome, its memory footprint is typically around 3.2 GB. Bowtie 2 supports gapped, local, and paired-end alignment modes."

More details are at Bowtie2 (http://bowtie-bio.sourceforge.net/bowtie2/index.shtml).

Running Program

The latest version of this application is installed at /usr/local/apps/eb/Bowtie2/2.2.3-foss-2016b

To use this version, please load the module with

ml Bowtie2/2.2.3-foss-2016b 

Here is an example of a shell script, sub.sh, to run on the batch queue:

#!/bin/bash
#SBATCH --job-name=j_Bowtie2
#SBATCH --partition=batch
#SBATCH --mail-type=ALL
#SBATCH --mail-user=username@uga.edu
#SBATCH --ntasks=1
#SBATCH --mem=10gb
#SBATCH --time=08:00:00
#SBATCH --output=Bowtie2.%j.out
#SBATCH --error=Bowtie2.%j.err

cd $SLURM_SUBMIT_DIR
ml Bowtie2/2.2.3-foss-2016b
bowtie2 [options]

In a real submission script, at least the underlined values above (such as the bowtie2 [options] placeholder) need to be reviewed and replaced with values appropriate for your job.
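As a rough illustration of replacing the [options] placeholder, the sketch below fills in a paired-end alignment following the usage line printed by bowtie2 -h. The index prefix and file names are hypothetical placeholders, and the helper function run_or_print is introduced here only for illustration: it prints the command instead of running it when bowtie2 or the index is not available.

```bash
#!/bin/bash
# Sketch of the commands that could follow "ml Bowtie2/2.2.3-foss-2016b" in sub.sh.
# All names below are hypothetical placeholders, not files on the cluster.
index_prefix="my_genome"       # assumed index prefix (minus trailing .X.bt2)
mates_1="sample_R1.fastq.gz"   # assumed file with #1 mates
mates_2="sample_R2.fastq.gz"   # assumed file with #2 mates
sam_out="sample.sam"           # file for SAM output (-S)

# Use the CPU count granted by SLURM, falling back to bowtie2's default of 1 thread.
threads="${SLURM_CPUS_PER_TASK:-1}"

# Hypothetical helper: run the command only if the program and the index exist,
# otherwise print what would have been run.
run_or_print() {
    if command -v "$1" >/dev/null 2>&1 && [ -e "${index_prefix}.1.bt2" ]; then
        "$@"
    else
        echo "dry run: $*"
    fi
}

run_or_print bowtie2 -p "$threads" -x "$index_prefix" \
    -1 "$mates_1" -2 "$mates_2" -S "$sam_out"
```

The example script above requests a single task and no extra CPUs, so threads resolves to 1 here. To use more alignment threads, one would presumably also add a line such as #SBATCH --cpus-per-task=4 to the header so that the -p value matches the allocation.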

Please refer to Running_Jobs_on_the_teaching_cluster, Run X window Jobs and Run interactive Jobs for more details on running jobs on the teaching cluster.


Here is an example of a job submission command:

sbatch ./sub.sh 

Documentation

ml Bowtie2/2.2.3-foss-2016b 
bowtie2 -h
Bowtie 2 version 2.2.3 by Ben Langmead (langmea@cs.jhu.edu, www.cs.jhu.edu/~langmea)
Usage: 
  bowtie2 [options]* -x <bt2-idx> {-1 <m1> -2 <m2> | -U <r>} [-S <sam>]

  <bt2-idx>  Index filename prefix (minus trailing .X.bt2).
             NOTE: Bowtie 1 and Bowtie 2 indexes are not compatible.
  <m1>       Files with #1 mates, paired with files in <m2>.
             Could be gzip'ed (extension: .gz) or bzip2'ed (extension: .bz2).
  <m2>       Files with #2 mates, paired with files in <m1>.
             Could be gzip'ed (extension: .gz) or bzip2'ed (extension: .bz2).
  <r>        Files with unpaired reads.
             Could be gzip'ed (extension: .gz) or bzip2'ed (extension: .bz2).
  <sam>      File for SAM output (default: stdout)

  <m1>, <m2>, <r> can be comma-separated lists (no whitespace) and can be
  specified many times.  E.g. '-U file1.fq,file2.fq -U file3.fq'.

Options (defaults in parentheses):

 Input:
  -q                 query input files are FASTQ .fq/.fastq (default)
  --qseq             query input files are in Illumina's qseq format
  -f                 query input files are (multi-)FASTA .fa/.mfa
  -r                 query input files are raw one-sequence-per-line
  -c                 <m1>, <m2>, <r> are sequences themselves, not files
  -s/--skip <int>    skip the first <int> reads/pairs in the input (none)
  -u/--upto <int>    stop after first <int> reads/pairs (no limit)
  -5/--trim5 <int>   trim <int> bases from 5'/left end of reads (0)
  -3/--trim3 <int>   trim <int> bases from 3'/right end of reads (0)
  --phred33          qualities are Phred+33 (default)
  --phred64          qualities are Phred+64
  --int-quals        qualities encoded as space-delimited integers

 Presets:                 Same as:
  For --end-to-end:
   --very-fast            -D 5 -R 1 -N 0 -L 22 -i S,0,2.50
   --fast                 -D 10 -R 2 -N 0 -L 22 -i S,0,2.50
   --sensitive            -D 15 -R 2 -N 0 -L 22 -i S,1,1.15 (default)
   --very-sensitive       -D 20 -R 3 -N 0 -L 20 -i S,1,0.50

  For --local:
   --very-fast-local      -D 5 -R 1 -N 0 -L 25 -i S,1,2.00
   --fast-local           -D 10 -R 2 -N 0 -L 22 -i S,1,1.75
   --sensitive-local      -D 15 -R 2 -N 0 -L 20 -i S,1,0.75 (default)
   --very-sensitive-local -D 20 -R 3 -N 0 -L 20 -i S,1,0.50

 Alignment:
  -N <int>           max # mismatches in seed alignment; can be 0 or 1 (0)
  -L <int>           length of seed substrings; must be >3, <32 (22)
  -i <func>          interval between seed substrings w/r/t read len (S,1,1.15)
  --n-ceil <func>    func for max # non-A/C/G/Ts permitted in aln (L,0,0.15)
  --dpad <int>       include <int> extra ref chars on sides of DP table (15)
  --gbar <int>       disallow gaps within <int> nucs of read extremes (4)
  --ignore-quals     treat all quality values as 30 on Phred scale (off)
  --nofw             do not align forward (original) version of read (off)
  --norc             do not align reverse-complement version of read (off)
  --no-1mm-upfront   do not allow 1 mismatch alignments before attempting to
                     scan for the optimal seeded alignments
  --end-to-end       entire read must align; no clipping (on)
   OR
  --local            local alignment; ends might be soft clipped (off)

 Scoring:
  --ma <int>         match bonus (0 for --end-to-end, 2 for --local) 
  --mp <int>         max penalty for mismatch; lower qual = lower penalty (6)
  --np <int>         penalty for non-A/C/G/Ts in read/ref (1)
  --rdg <int>,<int>  read gap open, extend penalties (5,3)
  --rfg <int>,<int>  reference gap open, extend penalties (5,3)
  --score-min <func> min acceptable alignment score w/r/t read length
                     (G,20,8 for local, L,-0.6,-0.6 for end-to-end)

 Reporting:
  (default)          look for multiple alignments, report best, with MAPQ
   OR
  -k <int>           report up to <int> alns per read; MAPQ not meaningful
   OR
  -a/--all           report all alignments; very slow, MAPQ not meaningful

 Effort:
  -D <int>           give up extending after <int> failed extends in a row (15)
  -R <int>           for reads w/ repetitive seeds, try <int> sets of seeds (2)

 Paired-end:
  -I/--minins <int>  minimum fragment length (0)
  -X/--maxins <int>  maximum fragment length (500)
  --fr/--rf/--ff     -1, -2 mates align fw/rev, rev/fw, fw/fw (--fr)
  --no-mixed         suppress unpaired alignments for paired reads
  --no-discordant    suppress discordant alignments for paired reads
  --no-dovetail      not concordant when mates extend past each other
  --no-contain       not concordant when one mate alignment contains other
  --no-overlap       not concordant when mates overlap at all

 Output:
  -t/--time          print wall-clock time taken by search phases
  --un <path>           write unpaired reads that didn't align to <path>
  --al <path>           write unpaired reads that aligned at least once to <path>
  --un-conc <path>      write pairs that didn't align concordantly to <path>
  --al-conc <path>      write pairs that aligned concordantly at least once to <path>
  (Note: for --un, --al, --un-conc, or --al-conc, add '-gz' to the option name, e.g.
  --un-gz <path>, to gzip compress output, or add '-bz2' to bzip2 compress output.)
  --quiet            print nothing to stderr except serious errors
  --met-file <path>  send metrics to file at <path> (off)
  --met-stderr       send metrics to stderr (off)
  --met <int>        report internal counters & metrics every <int> secs (1)
  --no-head          suppress header lines, i.e. lines starting with @
  --no-sq            suppress @SQ header lines
  --rg-id <text>     set read group id, reflected in @RG line and RG:Z: opt field
  --rg <text>        add <text> ("lab:value") to @RG line of SAM header.
                     Note: @RG line only printed when --rg-id is set.
  --omit-sec-seq     put '*' in SEQ and QUAL fields for secondary alignments.

 Performance:
  -p/--threads <int> number of alignment threads to launch (1)
  --reorder          force SAM output order to match order of input reads
  --mm               use memory-mapped I/O for index; many 'bowtie's can share

 Other:
  --qc-filter        filter out reads that are bad according to QSEQ filter
  --seed <int>       seed for random number generator (0)
  --non-deterministic seed rand. gen. arbitrarily instead of using read attributes
  --version          print version information and quit
  -h/--help          print this usage message
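
Several options in the help text above take a <func> argument such as L,-0.6,-0.6 or S,1,1.15. According to the Bowtie 2 manual, these have the form type,constant,coefficient and are evaluated on the read length x, where the type is C (constant), L (linear), S (square root) or G (natural log). The sketch below works through that arithmetic for a 100-base read; the function name eval_func is hypothetical and is not part of Bowtie 2.

```bash
#!/bin/bash
# eval_func TYPE CONSTANT COEFFICIENT READ_LENGTH
# Evaluates a Bowtie 2 style function f(x) for read length x:
#   C: f = constant
#   L: f = constant + coefficient * x
#   S: f = constant + coefficient * sqrt(x)
#   G: f = constant + coefficient * ln(x)
eval_func() {
    awk -v type="$1" -v a="$2" -v b="$3" -v x="$4" 'BEGIN {
        if (type == "C")      f = a
        else if (type == "L") f = a + b * x
        else if (type == "S") f = a + b * sqrt(x)
        else if (type == "G") f = a + b * log(x)   # awk log() is the natural log
        else { print "unknown function type: " type > "/dev/stderr"; exit 1 }
        printf "%.2f\n", f
    }'
}

# Default --score-min for --end-to-end (L,-0.6,-0.6) with a 100-base read
eval_func L -0.6 -0.6 100    # prints -60.60
# Default --score-min for --local (G,20,8) with a 100-base read
eval_func G 20 8 100         # prints 56.84
# Default seed interval -i (S,1,1.15) with a 100-base read
eval_func S 1 1.15 100       # prints 12.50
```

Read this way, a 100-base read aligned in end-to-end mode with default settings needs an alignment score of at least -60.6 to be considered valid, while in local mode the threshold is about 56.84.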

Installation

Source code is obtained from Bowtie2

System

64-bit Linux