HISAT2-Teaching
Revision as of 21:09, 27 August 2020
Category
Bioinformatics
Program On
Teaching
Version
2.1.0
Author / Distributor
Description
"HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) against the general human population (as well as against a single reference genome)." More details are available at HISAT2.
Running Program
The latest version of this application is installed at /apps/eb/HISAT2/2.1.0-foss-2019b
To use this version, please load the module with:
ml HISAT2/2.1.0-foss-2019b
Here is an example of a shell script, sub.sh, to run on the batch queue:
#!/bin/bash
#SBATCH --job-name=j_HISAT2
#SBATCH --partition=batch
#SBATCH --mail-type=ALL
#SBATCH --mail-user=username@uga.edu
#SBATCH --ntasks=1
#SBATCH --mem=10gb
#SBATCH --time=08:00:00
#SBATCH --output=HISAT2.%j.out
#SBATCH --error=HISAT2.%j.err
cd "$SLURM_SUBMIT_DIR"
ml HISAT2/2.1.0-foss-2019b
hisat2 [options]
In a real submission script, review at least the placeholder values above (such as the e-mail address, the resource requests, and [options]) and replace them with the proper values.
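The template above requests a single task, but hisat2 can run several alignment threads through its -p option. The sketch below shows one way to fill in the last lines of sub.sh for a paired-end sample. It assumes you also add "#SBATCH --cpus-per-task=4" to the header. The index prefix genome_index, the FASTQ file names, the align_sample function, and the HISAT2_BIN dry-run variable are all hypothetical and not part of the installed module.

```bash
# Sketch of the tail of sub.sh (after "ml HISAT2/2.1.0-foss-2019b").
# Assumes the header also has "#SBATCH --cpus-per-task=4".
# genome_index and the FASTQ names are hypothetical placeholders.

# SLURM sets SLURM_CPUS_PER_TASK when --cpus-per-task is given; fall back to 1.
threads=${SLURM_CPUS_PER_TASK:-1}

# Align one paired-end sample against the index with prefix genome_index.
# HISAT2_BIN is a hypothetical hook: HISAT2_BIN=echo prints the command instead.
align_sample() {
  local r1=$1 r2=$2 sam=$3
  "${HISAT2_BIN:-hisat2}" -p "$threads" -x genome_index \
    -1 "$r1" -2 "$r2" -S "$sam"
}

# Example call inside the job:
#   align_sample sample_R1.fastq.gz sample_R2.fastq.gz sample.sam
```

Keeping -p tied to SLURM_CPUS_PER_TASK means that changing the CPU request in the header also changes the number of threads hisat2 starts.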
Please refer to Running_Jobs_on_the_teaching_cluster, Run X window Jobs, and Run interactive Jobs for more details on running jobs on the Teaching cluster.
Here is an example of the job submission command:
sbatch ./sub.sh
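sbatch normally reports the new job's ID on a line such as "Submitted batch job 123456". The following is a small sketch for capturing that ID in a script, for example so you can check the job later with squeue -j. The job_id_from helper is hypothetical. Running sbatch --parsable, which prints the ID without the surrounding text, is an alternative.

```bash
# Read sbatch output on stdin and print only the numeric job ID.
# Hypothetical helper; assumes the usual "Submitted batch job <id>" message.
job_id_from() {
  awk '/^Submitted batch job [0-9]+$/ { print $4 }'
}

# Usage on the cluster:
#   jobid=$(sbatch ./sub.sh | job_id_from)
#   squeue -j "$jobid"
```

If sbatch prints an error instead, the helper prints nothing. A script can test for an empty jobid before continuing.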
Documentation
ml HISAT2/2.1.0-foss-2019b
hisat2 -help
HISAT2 version 2.1.0 by Daehwan Kim (infphilo@gmail.com, www.ccb.jhu.edu/people/infphilo)
Usage:
hisat2 [options]* -x <ht2-idx> {-1 <m1> -2 <m2> | -U <r> | --sra-acc <SRA accession number>} [-S <sam>]
<ht2-idx> Index filename prefix (minus trailing .X.ht2).
<m1> Files with #1 mates, paired with files in <m2>.
Could be gzip'ed (extension: .gz) or bzip2'ed (extension: .bz2).
<m2> Files with #2 mates, paired with files in <m1>.
Could be gzip'ed (extension: .gz) or bzip2'ed (extension: .bz2).
<r> Files with unpaired reads.
Could be gzip'ed (extension: .gz) or bzip2'ed (extension: .bz2).
<SRA accession number> Comma-separated list of SRA accession numbers, e.g. --sra-acc SRR353653,SRR353654.
<sam> File for SAM output (default: stdout)
<m1>, <m2>, <r> can be comma-separated lists (no whitespace) and can be
specified many times. E.g. '-U file1.fq,file2.fq -U file3.fq'.
Options (defaults in parentheses):
Input:
-q query input files are FASTQ .fq/.fastq (default)
--qseq query input files are in Illumina's qseq format
-f query input files are (multi-)FASTA .fa/.mfa
-r query input files are raw one-sequence-per-line
-c <m1>, <m2>, <r> are sequences themselves, not files
-s/--skip <int> skip the first <int> reads/pairs in the input (none)
-u/--upto <int> stop after first <int> reads/pairs (no limit)
-5/--trim5 <int> trim <int> bases from 5'/left end of reads (0)
-3/--trim3 <int> trim <int> bases from 3'/right end of reads (0)
--phred33 qualities are Phred+33 (default)
--phred64 qualities are Phred+64
--int-quals qualities encoded as space-delimited integers
--sra-acc SRA accession ID
Alignment:
--n-ceil <func> func for max # non-A/C/G/Ts permitted in aln (L,0,0.15)
--ignore-quals treat all quality values as 30 on Phred scale (off)
--nofw do not align forward (original) version of read (off)
--norc do not align reverse-complement version of read (off)
Spliced Alignment:
--pen-cansplice <int> penalty for a canonical splice site (0)
--pen-noncansplice <int> penalty for a non-canonical splice site (12)
--pen-canintronlen <func> penalty for long introns (G,-8,1) with canonical splice sites
--pen-noncanintronlen <func> penalty for long introns (G,-8,1) with noncanonical splice sites
--min-intronlen <int> minimum intron length (20)
--max-intronlen <int> maximum intron length (500000)
--known-splicesite-infile <path> provide a list of known splice sites
--novel-splicesite-outfile <path> report a list of splice sites
--novel-splicesite-infile <path> provide a list of novel splice sites
--no-temp-splicesite disable the use of splice sites found
--no-spliced-alignment disable spliced alignment
--rna-strandness <string> specify strand-specific information (unstranded)
--tmo reports only those alignments within known transcriptome
--dta reports alignments tailored for transcript assemblers
--dta-cufflinks reports alignments tailored specifically for cufflinks
--avoid-pseudogene tries to avoid aligning reads to pseudogenes (experimental option)
--no-templatelen-adjustment disables template length adjustment for RNA-seq reads
Scoring:
--mp <int>,<int> max and min penalties for mismatch; lower qual = lower penalty <6,2>
--sp <int>,<int> max and min penalties for soft-clipping; lower qual = lower penalty <2,1>
--no-softclip no soft-clipping
--np <int> penalty for non-A/C/G/Ts in read/ref (1)
--rdg <int>,<int> read gap open, extend penalties (5,3)
--rfg <int>,<int> reference gap open, extend penalties (5,3)
--score-min <func> min acceptable alignment score w/r/t read length
(L,0.0,-0.2)
Reporting:
-k <int> (default: 5) report up to <int> alns per read
Paired-end:
-I/--minins <int> minimum fragment length (0), only valid with --no-spliced-alignment
-X/--maxins <int> maximum fragment length (500), only valid with --no-spliced-alignment
--fr/--rf/--ff -1, -2 mates align fw/rev, rev/fw, fw/fw (--fr)
--no-mixed suppress unpaired alignments for paired reads
--no-discordant suppress discordant alignments for paired reads
Output:
-t/--time print wall-clock time taken by search phases
--un <path> write unpaired reads that didn't align to <path>
--al <path> write unpaired reads that aligned at least once to <path>
--un-conc <path> write pairs that didn't align concordantly to <path>
--al-conc <path> write pairs that aligned concordantly at least once to <path>
(Note: for --un, --al, --un-conc, or --al-conc, add '-gz' to the option name, e.g.
--un-gz <path>, to gzip compress output, or add '-bz2' to bzip2 compress output.)
--summary-file print alignment summary to this file.
--new-summary print alignment summary in a new style, which is more machine-friendly.
--quiet print nothing to stderr except serious errors
--met-file <path> send metrics to file at <path> (off)
--met-stderr send metrics to stderr (off)
--met <int> report internal counters & metrics every <int> secs (1)
--no-head supppress header lines, i.e. lines starting with @
--no-sq supppress @SQ header lines
--rg-id <text> set read group id, reflected in @RG line and RG:Z: opt field
--rg <text> add <text> ("lab:value") to @RG line of SAM header.
Note: @RG line only printed when --rg-id is set.
--omit-sec-seq put '*' in SEQ and QUAL fields for secondary alignments.
Performance:
-o/--offrate <int> override offrate of index; must be >= index's offrate
-p/--threads <int> number of alignment threads to launch (1)
--reorder force SAM output order to match order of input reads
--mm use memory-mapped I/O for index; many 'hisat2's can share
Other:
--qc-filter filter out reads that are bad according to QSEQ filter
--seed <int> seed for random number generator (0)
--non-deterministic seed rand. gen. arbitrarily instead of using read attributes
--remove-chrname remove 'chr' from reference names in alignment
--add-chrname add 'chr' to reference names in alignment
--version print version information and quit
-h/--help print this usage message
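The usage line expects two things: -x takes the index prefix without the trailing .X.ht2, and read files can be passed as comma-separated lists with no whitespace. The helpers below are hypothetical sketches of both conventions. The file names genome.fa, genome_index, and the lane files are placeholders. An index with that prefix would be built with "hisat2-build genome.fa genome_index", which writes genome_index.1.ht2, genome_index.2.ht2, and so on.

```bash
# Strip a trailing ".<digit>.ht2" so that e.g. ref/genome_index.1.ht2
# becomes ref/genome_index, the prefix that -x expects.
index_prefix() {
  local f=$1
  printf '%s\n' "${f%.[0-9].ht2}"
}

# Join the arguments with commas and no whitespace, the list form
# accepted by -1, -2 and -U.
join_commas() {
  local IFS=,
  printf '%s\n' "$*"
}

# Usage, e.g. one sample sequenced on two lanes (hypothetical files):
#   hisat2 -x "$(index_prefix ref/genome_index.1.ht2)" \
#     -U "$(join_commas lane1.fq.gz lane2.fq.gz)" -S sample.sam
```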
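The --phred33 (default) and --phred64 options must match how quality values are encoded in the FASTQ input. One rough heuristic is that quality characters ! through : (ASCII 33 to 58) occur only in Phred+33 data. The hypothetical guess_phred helper below applies that check. It reads uncompressed FASTQ only. A Phred+33 file whose qualities are all high can still be reported as Phred+64, so treat the result as a hint rather than a verdict.

```bash
# Guess the quality encoding of an uncompressed FASTQ file from its first
# N records (default 10000). Heuristic: quality characters '!' to ':'
# (ASCII 33-58) only occur in Phred+33 data.
guess_phred() {
  local fq=$1 n=${2:-10000}
  # The 4th line of each FASTQ record is the quality string.
  if LC_ALL=C awk -v n="$n" 'NR % 4 == 0 && NR <= 4 * n' "$fq" |
     LC_ALL=C grep -q '[!-:]'; then
    printf '%s\n' --phred33
  else
    printf '%s\n' --phred64
  fi
}

# Usage (hypothetical file):
#   hisat2 "$(guess_phred reads.fq)" -x genome_index -U reads.fq -S reads.sam
```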
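Several options take a function string rather than a single number, for example --n-ceil (L,0,0.15), --score-min (L,0.0,-0.2), and the long-intron penalties (G,-8,1). The sketch below assumes HISAT2 follows the Bowtie2 convention f(x) = A + B * g(x). Here x is the read length (or the intron length for the intron penalties), and g(x) is x for L, sqrt(x) for S, and ln(x) for G. Under that assumption, the hypothetical eval_func helper evaluates such a string. With the defaults above, --score-min gives a minimum alignment score of -30 for a 150 bp read, and --n-ceil allows up to 15 non-A/C/G/T positions in a 100 bp read.

```bash
# Evaluate a function string "T,A,B" at x, assuming f(x) = A + B * g(x)
# with g(x) = x (L), sqrt(x) (S) or ln(x) (G). Other types are rejected.
eval_func() {
  local spec=$1 x=$2
  awk -v spec="$spec" -v x="$x" 'BEGIN {
    n = split(spec, p, ",")
    if (n != 3) { print "expected T,A,B: " spec > "/dev/stderr"; exit 1 }
    if      (p[1] == "L") g = x
    else if (p[1] == "S") g = sqrt(x)
    else if (p[1] == "G") g = log(x)
    else { print "unsupported type: " p[1] > "/dev/stderr"; exit 1 }
    printf "%g\n", p[2] + p[3] * g
  }'
}

# Examples with the default values from the help text above:
#   eval_func L,0.0,-0.2 150   # --score-min for a 150 bp read
#   eval_func L,0,0.15 100     # --n-ceil for a 100 bp read
```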
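Two option groups in the help text come with constraints. First, -I/--minins and -X/--maxins are only valid together with --no-spliced-alignment. Second, --rna-strandness takes a value that depends on the read layout: the HISAT2 manual lists F or R for single-end reads and FR or RF for paired-end reads. The hypothetical helpers below sketch checks for both. Which strandness value is correct depends on your library preparation protocol.

```bash
# Return success if an --rna-strandness value fits the layout
# ("single" or "paired"): F/R for single-end, FR/RF for paired-end.
valid_strandness() {
  local layout=$1 value=$2
  case "$layout:$value" in
    single:F | single:R | paired:FR | paired:RF) return 0 ;;
    *) return 1 ;;
  esac
}

# Print options for unspliced paired-end alignment (e.g. DNA-seq), one per
# line. -I/-X are only valid with --no-spliced-alignment, so always add it.
unspliced_pe_opts() {
  local minins=$1 maxins=$2
  if (( minins > maxins )); then
    printf 'minimum fragment length %s exceeds maximum %s\n' "$minins" "$maxins" >&2
    return 1
  fi
  printf '%s\n' --no-spliced-alignment -I "$minins" -X "$maxins"
}

# Usage sketch (hypothetical file names):
#   valid_strandness paired RF && echo "RF is a valid paired-end value"
#   mapfile -t opts < <(unspliced_pe_opts 100 800)
#   hisat2 "${opts[@]}" -x genome_index -1 r1.fq.gz -2 r2.fq.gz -S out.sam
```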
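When one job runs several hisat2 processes at once, the --mm option lets them share the memory-mapped index, and -p can divide the job's CPUs between them. Below is a hypothetical threads_per_run helper for that division. It assumes the CPU count comes from SLURM_CPUS_PER_TASK, and uses 1 when that variable is unset.

```bash
# Split the job's CPUs evenly between N concurrent hisat2 runs, at least 1 each.
threads_per_run() {
  local runs=$1 cpus=${SLURM_CPUS_PER_TASK:-1}
  if (( runs < 1 )); then
    printf 'number of runs must be >= 1\n' >&2
    return 1
  fi
  local t=$(( cpus / runs ))
  (( t >= 1 )) || t=1
  printf '%s\n' "$t"
}

# Usage sketch with two hypothetical samples sharing one index:
#   t=$(threads_per_run 2)
#   hisat2 --mm -p "$t" -x genome_index -U a.fq.gz -S a.sam &
#   hisat2 --mm -p "$t" -x genome_index -U b.fq.gz -S b.sam &
#   wait
```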
Installation
Source code was obtained from HISAT2.
System
64-bit Linux