Bowtie2-Teaching: Difference between revisions
(17 intermediate revisions by 2 users not shown) | |||
Line 9: | Line 9: | ||
=== Version === | === Version === | ||
2. | 2.4.1, 2.4.4, 2.4.5, 2.5.2 | ||
=== Author / Distributor === | === Author / Distributor === | ||
Line 16: | Line 16: | ||
=== Description === | === Description === | ||
"Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads | "Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. It is particularly good at aligning reads of about 50 up to 100s or 1,000s of characters, and particularly good at aligning to relatively long (e.g. mammalian) genomes. Bowtie 2 indexes the genome with an FM Index to keep its memory footprint small: for the human genome, its memory footprint is typically around 3.2 GB. Bowtie 2 supports gapped, local, and paired-end alignment modes." | ||
More details are at [http://bowtie-bio.sourceforge.net/bowtie2/index.shtml Bowtie2] | More details are at [http://bowtie-bio.sourceforge.net/bowtie2/index.shtml Bowtie2] | ||
=== Running Program === | === Running Program === | ||
The last version of this application is at | The last version of this application is at /apps/eb/Bowtie2/2.5.2-GCC-11.3.0 | ||
To use this version, please | To use this version, please load the module with | ||
<pre class="gscript"> | <pre class="gscript"> | ||
ml Bowtie2/2. | ml Bowtie2/2.5.2-GCC-11.3.0 | ||
</pre> | </pre> | ||
Here is an example of a shell script, sub.sh, to run on | To use version 2.4.5, please load the module with | ||
<pre class="gscript"> | |||
ml Bowtie2/2.4.5-GCC-11.3.0 | |||
</pre> | |||
To use version 2.4.4, please load the module with | |||
<pre class="gscript"> | |||
ml Bowtie2/2.4.4-GCC-11.2.0 | |||
</pre> | |||
To use version 2.4.1, please load the module with | |||
<pre class="gscript"> | |||
ml Bowtie2/2.4.1-GCC-8.3.0 | |||
</pre> | |||
Here is an example of a shell script, sub.sh, to run on the batch queue: | |||
<div class="gscript2"> | <div class="gscript2"> | ||
Line 40: | Line 55: | ||
<nowiki>#</nowiki>SBATCH --time=<u>08:00:00</u><br> | <nowiki>#</nowiki>SBATCH --time=<u>08:00:00</u><br> | ||
<nowiki>#</nowiki>SBATCH --output=Bowtie2.%j.out<br> | <nowiki>#</nowiki>SBATCH --output=Bowtie2.%j.out<br> | ||
<nowiki>#</nowiki>SBATCH --error=Bowtie2.%j.err<br> | |||
cd $SLURM_SUBMIT_DIR<br> | cd $SLURM_SUBMIT_DIR<br> | ||
ml Bowtie2/2. | ml Bowtie2/2.4.5-GCC-11.3.0<br> | ||
bowtie2 <u>[options]</u><br> | bowtie2 <u>[options]</u><br> | ||
</div> | </div> | ||
Line 58: | Line 74: | ||
<pre class="gcommand"> | <pre class="gcommand"> | ||
ml Bowtie2/2. | ml Bowtie2/2.4.5-GCC-11.3.0 | ||
bowtie2 -h | |||
Bowtie 2 version 2. | |||
Bowtie 2 version 2.4.5 by Ben Langmead (langmea@cs.jhu.edu, www.cs.jhu.edu/~langmea) | |||
Usage: | Usage: | ||
bowtie2 [options]* -x <bt2-idx> {-1 <m1> -2 <m2> | -U <r> | --interleaved <i>} [-S <sam>] | bowtie2 [options]* -x <bt2-idx> {-1 <m1> -2 <m2> | -U <r> | --interleaved <i> | -b <bam>} [-S <sam>] | ||
<bt2-idx> Index filename prefix (minus trailing .X.bt2). | <bt2-idx> Index filename prefix (minus trailing .X.bt2). | ||
Line 72: | Line 89: | ||
<r> Files with unpaired reads. | <r> Files with unpaired reads. | ||
Could be gzip'ed (extension: .gz) or bzip2'ed (extension: .bz2). | Could be gzip'ed (extension: .gz) or bzip2'ed (extension: .bz2). | ||
<i> Files with interleaved paired-end FASTQ reads | <i> Files with interleaved paired-end FASTQ/FASTA reads | ||
Could be gzip'ed (extension: .gz) or bzip2'ed (extension: .bz2). | Could be gzip'ed (extension: .gz) or bzip2'ed (extension: .bz2). | ||
<bam> Files are unaligned BAM sorted by read name. | |||
<sam> File for SAM output (default: stdout) | <sam> File for SAM output (default: stdout) | ||
Line 88: | Line 106: | ||
-f query input files are (multi-)FASTA .fa/.mfa | -f query input files are (multi-)FASTA .fa/.mfa | ||
-r query input files are raw one-sequence-per-line | -r query input files are raw one-sequence-per-line | ||
-F k:<int>,i:<int> query input files are continuous FASTA where reads | |||
are substrings (k-mers) extracted from a FASTA file <s> | |||
and aligned at offsets 1, 1+i, 1+2i ... end of reference | |||
-c <m1>, <m2>, <r> are sequences themselves, not files | -c <m1>, <m2>, <r> are sequences themselves, not files | ||
-s/--skip <int> skip the first <int> reads/pairs in the input (none) | -s/--skip <int> skip the first <int> reads/pairs in the input (none) | ||
Line 93: | Line 114: | ||
-5/--trim5 <int> trim <int> bases from 5'/left end of reads (0) | -5/--trim5 <int> trim <int> bases from 5'/left end of reads (0) | ||
-3/--trim3 <int> trim <int> bases from 3'/right end of reads (0) | -3/--trim3 <int> trim <int> bases from 3'/right end of reads (0) | ||
--trim-to [3:|5:]<int> trim reads exceeding <int> bases from either 3' or 5' end | |||
If the read end is not specified then it defaults to 3 (0) | |||
--phred33 qualities are Phred+33 (default) | --phred33 qualities are Phred+33 (default) | ||
--phred64 qualities are Phred+64 | --phred64 qualities are Phred+64 | ||
Line 155: | Line 178: | ||
--no-contain not concordant when one mate alignment contains other | --no-contain not concordant when one mate alignment contains other | ||
--no-overlap not concordant when mates overlap at all | --no-overlap not concordant when mates overlap at all | ||
BAM: | |||
--align-paired-reads | |||
Bowtie2 will, by default, attempt to align unpaired BAM reads. | |||
Use this option to align paired-end reads instead. | |||
--preserve-tags Preserve tags from the original BAM record by | |||
appending them to the end of the corresponding SAM output. | |||
Output: | Output: | ||
-t/--time print wall-clock time taken by search phases | -t/--time print wall-clock time taken by search phases | ||
--un <path> | --un <path> write unpaired reads that didn't align to <path> | ||
--al <path> | --al <path> write unpaired reads that aligned at least once to <path> | ||
--un-conc <path> | --un-conc <path> write pairs that didn't align concordantly to <path> | ||
--al-conc <path> | --al-conc <path> write pairs that aligned concordantly at least once to <path> | ||
(Note: for --un, --al, --un-conc, or --al-conc, add '-gz' to the option name, e.g. | |||
--un-gz <path>, to gzip compress output, or add '-bz2' to bzip2 compress output.) | |||
--quiet print nothing to stderr except serious errors | --quiet print nothing to stderr except serious errors | ||
--met-file <path> send metrics to file at <path> (off) | --met-file <path> send metrics to file at <path> (off) | ||
Line 175: | Line 205: | ||
Note: @RG line only printed when --rg-id is set. | Note: @RG line only printed when --rg-id is set. | ||
--omit-sec-seq put '*' in SEQ and QUAL fields for secondary alignments. | --omit-sec-seq put '*' in SEQ and QUAL fields for secondary alignments. | ||
--sam- | --sam-no-qname-trunc | ||
Suppress standard behavior of truncating readname at first whitespace | |||
at the expense of generating non-standard SAM. | |||
--xeq Use '='/'X', instead of 'M,' to specify matches/mismatches in SAM record. | --xeq Use '='/'X', instead of 'M,' to specify matches/mismatches in SAM record. | ||
--soft-clipped-unmapped-tlen Exclude soft-clipped bases when reporting TLEN | --soft-clipped-unmapped-tlen | ||
Exclude soft-clipped bases when reporting TLEN | |||
--sam-append-comment | |||
Append FASTA/FASTQ comment to SAM record | |||
Performance: | Performance: | ||
Line 188: | Line 222: | ||
--qc-filter filter out reads that are bad according to QSEQ filter | --qc-filter filter out reads that are bad according to QSEQ filter | ||
--seed <int> seed for random number generator (0) | --seed <int> seed for random number generator (0) | ||
--non-deterministic seed rand. gen. arbitrarily instead of using read attributes | --non-deterministic | ||
seed rand. gen. arbitrarily instead of using read attributes | |||
--version print version information and quit | --version print version information and quit | ||
-h/--help print this usage message | -h/--help print this usage message |
Latest revision as of 09:12, 9 May 2024
Category
Bioinformatics
Program On
Teaching
Version
2.4.1, 2.4.4, 2.4.5, 2.5.2
Author / Distributor
Description
"Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. It is particularly good at aligning reads of about 50 up to 100s or 1,000s of characters, and particularly good at aligning to relatively long (e.g. mammalian) genomes. Bowtie 2 indexes the genome with an FM Index to keep its memory footprint small: for the human genome, its memory footprint is typically around 3.2 GB. Bowtie 2 supports gapped, local, and paired-end alignment modes." More details are at Bowtie2
Running Program
The latest version of this application is installed at /apps/eb/Bowtie2/2.5.2-GCC-11.3.0
To use this version, please load the module with
ml Bowtie2/2.5.2-GCC-11.3.0
To use version 2.4.5, please load the module with
ml Bowtie2/2.4.5-GCC-11.3.0
To use version 2.4.4, please load the module with
ml Bowtie2/2.4.4-GCC-11.2.0
To use version 2.4.1, please load the module with
ml Bowtie2/2.4.1-GCC-8.3.0
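The version-to-module pairs listed above can be collected in a small lookup, which is convenient when a script should take the Bowtie2 version as a parameter. This is only a minimal sketch: `bowtie2_module` is a hypothetical helper name (not part of Lmod or Bowtie2), and the module names are copied from this page, so they may not match what is installed on other clusters.

```shell
#!/bin/bash
# Hypothetical helper: map a Bowtie2 version to the module name listed on this page.
bowtie2_module() {
    case "$1" in
        2.5.2) echo "Bowtie2/2.5.2-GCC-11.3.0" ;;
        2.4.5) echo "Bowtie2/2.4.5-GCC-11.3.0" ;;
        2.4.4) echo "Bowtie2/2.4.4-GCC-11.2.0" ;;
        2.4.1) echo "Bowtie2/2.4.1-GCC-8.3.0" ;;
        *)     echo "unknown Bowtie2 version: $1" >&2; return 1 ;;
    esac
}

# Print the load command instead of running it, so the sketch also works off-cluster.
echo "ml $(bowtie2_module 2.4.5)"
```

In a job script, the printed line would be replaced by the actual `ml` call with the looked-up module name.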
Here is an example of a shell script, sub.sh, to run on the batch queue:
#!/bin/bash
#SBATCH --job-name=j_Bowtie2
#SBATCH --partition=batch
#SBATCH --mail-type=ALL
#SBATCH --mail-user=username@uga.edu
#SBATCH --ntasks=1
#SBATCH --mem=10gb
#SBATCH --time=08:00:00
#SBATCH --output=Bowtie2.%j.out
#SBATCH --error=Bowtie2.%j.err
cd $SLURM_SUBMIT_DIR
ml Bowtie2/2.4.5-GCC-11.3.0
bowtie2 [options]
In a real submission script, review the values above, such as the email address, memory, time limit, and bowtie2 options, and replace them with values appropriate for your job.
Please refer to Running_Jobs_on_the_teaching_cluster, Run X window Jobs and Run interactive Jobs for more details on running jobs on the teaching cluster.
Here is an example of a job submission command:
sbatch ./sub.sh
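The `bowtie2 [options]` line in sub.sh is a placeholder. As a rough sketch of how it might be filled in for paired-end reads, the snippet below assembles a command from the `-p`, `-x`, `-1`, `-2` and `-S` options described in the help text. All file names and the index prefix are hypothetical, and `build_bowtie2_cmd` is an illustrative helper, not part of Bowtie2. The thread count is taken from `SLURM_CPUS_PER_TASK`, which Slurm sets when the job requests `--cpus-per-task`; that request is not in the sub.sh shown above, so without it the sketch falls back to one thread, which is also bowtie2's default.

```shell
#!/bin/bash
# Hypothetical inputs: replace with your own index prefix and read files.
INDEX_PREFIX="genome_index"
READS_1="sample_R1.fastq.gz"
READS_2="sample_R2.fastq.gz"
OUT_SAM="sample.sam"

# Use the CPU count granted by Slurm if available, otherwise 1 thread.
THREADS="${SLURM_CPUS_PER_TASK:-1}"

# Illustrative helper: assemble a paired-end bowtie2 command line as a string.
build_bowtie2_cmd() {
    echo "bowtie2 -p ${THREADS} -x ${INDEX_PREFIX} -1 ${READS_1} -2 ${READS_2} -S ${OUT_SAM}"
}

# Print the command for review; in sub.sh it would be run instead of echoed.
build_bowtie2_cmd
```

Printing the command first makes it easy to check paths and thread counts before spending queue time on a job.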
Documentation
ml Bowtie2/2.4.5-GCC-11.3.0
bowtie2 -h
Bowtie 2 version 2.4.5 by Ben Langmead (langmea@cs.jhu.edu, www.cs.jhu.edu/~langmea)
Usage:
  bowtie2 [options]* -x <bt2-idx> {-1 <m1> -2 <m2> | -U <r> | --interleaved <i> | -b <bam>} [-S <sam>]

  <bt2-idx>  Index filename prefix (minus trailing .X.bt2).
             NOTE: Bowtie 1 and Bowtie 2 indexes are not compatible.
  <m1>       Files with #1 mates, paired with files in <m2>.
             Could be gzip'ed (extension: .gz) or bzip2'ed (extension: .bz2).
  <m2>       Files with #2 mates, paired with files in <m1>.
             Could be gzip'ed (extension: .gz) or bzip2'ed (extension: .bz2).
  <r>        Files with unpaired reads.
             Could be gzip'ed (extension: .gz) or bzip2'ed (extension: .bz2).
  <i>        Files with interleaved paired-end FASTQ/FASTA reads
             Could be gzip'ed (extension: .gz) or bzip2'ed (extension: .bz2).
  <bam>      Files are unaligned BAM sorted by read name.
  <sam>      File for SAM output (default: stdout)

  <m1>, <m2>, <r> can be comma-separated lists (no whitespace) and can be
  specified many times.  E.g. '-U file1.fq,file2.fq -U file3.fq'.

Options (defaults in parentheses):

 Input:
  -q                 query input files are FASTQ .fq/.fastq (default)
  --tab5             query input files are TAB5 .tab5
  --tab6             query input files are TAB6 .tab6
  --qseq             query input files are in Illumina's qseq format
  -f                 query input files are (multi-)FASTA .fa/.mfa
  -r                 query input files are raw one-sequence-per-line
  -F k:<int>,i:<int> query input files are continuous FASTA where reads
                     are substrings (k-mers) extracted from a FASTA file <s>
                     and aligned at offsets 1, 1+i, 1+2i ... end of reference
  -c                 <m1>, <m2>, <r> are sequences themselves, not files
  -s/--skip <int>    skip the first <int> reads/pairs in the input (none)
  -u/--upto <int>    stop after first <int> reads/pairs (no limit)
  -5/--trim5 <int>   trim <int> bases from 5'/left end of reads (0)
  -3/--trim3 <int>   trim <int> bases from 3'/right end of reads (0)
  --trim-to [3:|5:]<int> trim reads exceeding <int> bases from either 3' or 5' end
                     If the read end is not specified then it defaults to 3 (0)
  --phred33          qualities are Phred+33 (default)
  --phred64          qualities are Phred+64
  --int-quals        qualities encoded as space-delimited integers

 Presets:                 Same as:
  For --end-to-end:
   --very-fast            -D 5 -R 1 -N 0 -L 22 -i S,0,2.50
   --fast                 -D 10 -R 2 -N 0 -L 22 -i S,0,2.50
   --sensitive            -D 15 -R 2 -N 0 -L 22 -i S,1,1.15 (default)
   --very-sensitive       -D 20 -R 3 -N 0 -L 20 -i S,1,0.50

  For --local:
   --very-fast-local      -D 5 -R 1 -N 0 -L 25 -i S,1,2.00
   --fast-local           -D 10 -R 2 -N 0 -L 22 -i S,1,1.75
   --sensitive-local      -D 15 -R 2 -N 0 -L 20 -i S,1,0.75 (default)
   --very-sensitive-local -D 20 -R 3 -N 0 -L 20 -i S,1,0.50

 Alignment:
  -N <int>           max # mismatches in seed alignment; can be 0 or 1 (0)
  -L <int>           length of seed substrings; must be >3, <32 (22)
  -i <func>          interval between seed substrings w/r/t read len (S,1,1.15)
  --n-ceil <func>    func for max # non-A/C/G/Ts permitted in aln (L,0,0.15)
  --dpad <int>       include <int> extra ref chars on sides of DP table (15)
  --gbar <int>       disallow gaps within <int> nucs of read extremes (4)
  --ignore-quals     treat all quality values as 30 on Phred scale (off)
  --nofw             do not align forward (original) version of read (off)
  --norc             do not align reverse-complement version of read (off)
  --no-1mm-upfront   do not allow 1 mismatch alignments before attempting to
                     scan for the optimal seeded alignments
  --end-to-end       entire read must align; no clipping (on)
   OR
  --local            local alignment; ends might be soft clipped (off)

 Scoring:
  --ma <int>         match bonus (0 for --end-to-end, 2 for --local)
  --mp <int>         max penalty for mismatch; lower qual = lower penalty (6)
  --np <int>         penalty for non-A/C/G/Ts in read/ref (1)
  --rdg <int>,<int>  read gap open, extend penalties (5,3)
  --rfg <int>,<int>  reference gap open, extend penalties (5,3)
  --score-min <func> min acceptable alignment score w/r/t read length
                     (G,20,8 for local, L,-0.6,-0.6 for end-to-end)

 Reporting:
  (default)          look for multiple alignments, report best, with MAPQ
   OR
  -k <int>           report up to <int> alns per read; MAPQ not meaningful
   OR
  -a/--all           report all alignments; very slow, MAPQ not meaningful

 Effort:
  -D <int>           give up extending after <int> failed extends in a row (15)
  -R <int>           for reads w/ repetitive seeds, try <int> sets of seeds (2)

 Paired-end:
  -I/--minins <int>  minimum fragment length (0)
  -X/--maxins <int>  maximum fragment length (500)
  --fr/--rf/--ff     -1, -2 mates align fw/rev, rev/fw, fw/fw (--fr)
  --no-mixed         suppress unpaired alignments for paired reads
  --no-discordant    suppress discordant alignments for paired reads
  --dovetail         concordant when mates extend past each other
  --no-contain       not concordant when one mate alignment contains other
  --no-overlap       not concordant when mates overlap at all

 BAM:
  --align-paired-reads
                     Bowtie2 will, by default, attempt to align unpaired BAM reads.
                     Use this option to align paired-end reads instead.
  --preserve-tags    Preserve tags from the original BAM record by
                     appending them to the end of the corresponding SAM output.

 Output:
  -t/--time          print wall-clock time taken by search phases
  --un <path>        write unpaired reads that didn't align to <path>
  --al <path>        write unpaired reads that aligned at least once to <path>
  --un-conc <path>   write pairs that didn't align concordantly to <path>
  --al-conc <path>   write pairs that aligned concordantly at least once to <path>
    (Note: for --un, --al, --un-conc, or --al-conc, add '-gz' to the option name, e.g.
    --un-gz <path>, to gzip compress output, or add '-bz2' to bzip2 compress output.)
  --quiet            print nothing to stderr except serious errors
  --met-file <path>  send metrics to file at <path> (off)
  --met-stderr       send metrics to stderr (off)
  --met <int>        report internal counters & metrics every <int> secs (1)
  --no-unal          suppress SAM records for unaligned reads
  --no-head          suppress header lines, i.e. lines starting with @
  --no-sq            suppress @SQ header lines
  --rg-id <text>     set read group id, reflected in @RG line and RG:Z: opt field
  --rg <text>        add <text> ("lab:value") to @RG line of SAM header.
                     Note: @RG line only printed when --rg-id is set.
  --omit-sec-seq     put '*' in SEQ and QUAL fields for secondary alignments.
  --sam-no-qname-trunc
                     Suppress standard behavior of truncating readname at first whitespace
                     at the expense of generating non-standard SAM.
  --xeq              Use '='/'X', instead of 'M,' to specify matches/mismatches in SAM record.
  --soft-clipped-unmapped-tlen
                     Exclude soft-clipped bases when reporting TLEN
  --sam-append-comment
                     Append FASTA/FASTQ comment to SAM record

 Performance:
  -p/--threads <int> number of alignment threads to launch (1)
  --reorder          force SAM output order to match order of input reads
  --mm               use memory-mapped I/O for index; many 'bowtie's can share

 Other:
  --qc-filter        filter out reads that are bad according to QSEQ filter
  --seed <int>       seed for random number generator (0)
  --non-deterministic
                     seed rand. gen. arbitrarily instead of using read attributes
  --version          print version information and quit
  -h/--help          print this usage message
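The Presets section of the help text states that each preset is shorthand for a fixed set of `-D`, `-R`, `-N`, `-L` and `-i` values. The sketch below simply encodes that table so a preset can be expanded and inspected before tuning individual options. `expand_preset` is a hypothetical helper, not a Bowtie2 feature, and the values are copied from the 2.4.5 help output on this page, so they should be rechecked against `bowtie2 -h` for other versions.

```shell
#!/bin/bash
# Hypothetical helper: print the options a preset stands for,
# as listed in the Bowtie 2 version 2.4.5 help text.
expand_preset() {
    case "$1" in
        --very-fast)            echo "-D 5 -R 1 -N 0 -L 22 -i S,0,2.50" ;;
        --fast)                 echo "-D 10 -R 2 -N 0 -L 22 -i S,0,2.50" ;;
        --sensitive)            echo "-D 15 -R 2 -N 0 -L 22 -i S,1,1.15" ;;
        --very-sensitive)       echo "-D 20 -R 3 -N 0 -L 20 -i S,1,0.50" ;;
        --very-fast-local)      echo "-D 5 -R 1 -N 0 -L 25 -i S,1,2.00" ;;
        --fast-local)           echo "-D 10 -R 2 -N 0 -L 22 -i S,1,1.75" ;;
        --sensitive-local)      echo "-D 15 -R 2 -N 0 -L 20 -i S,1,0.75" ;;
        --very-sensitive-local) echo "-D 20 -R 3 -N 0 -L 20 -i S,1,0.50" ;;
        *)                      echo "unknown preset: $1" >&2; return 1 ;;
    esac
}

# Show what the default end-to-end preset expands to.
expand_preset --sensitive
```

Starting from an expanded preset and changing one value at a time (for example a shorter seed length with `-L`) is usually easier to reason about than choosing all five values from scratch.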
Installation
Source code is obtained from Bowtie2
System
64-bit Linux