TopHat-Teaching
Category
Bioinformatics
Program On
Teaching
Version
2.1.1
Author / Distributor
Description
"TopHat is a fast splice junction mapper for RNA-Seq reads." More details are at TopHat
Running Program
The latest version of this application is installed at /usr/local/apps/eb/TopHat/2.1.1-foss-2016b
To use this version, please load the module with
ml TopHat/2.1.1-foss-2016b
Here is an example of a shell script, sub.sh, to run on the batch queue:
#!/bin/bash
#SBATCH --job-name=j_TopHat
#SBATCH --partition=batch
#SBATCH --mail-type=ALL
#SBATCH --mail-user=username@uga.edu
#SBATCH --ntasks=1
#SBATCH --mem=10gb
#SBATCH --time=08:00:00
#SBATCH --output=TopHat.%j.out
#SBATCH --error=TopHat.%j.err
cd "$SLURM_SUBMIT_DIR"
ml TopHat/2.1.1-foss-2016b
tophat [options]
In a real submission script, the values shown above (such as the job name, e-mail address, memory, time limit, and the tophat options) need to be reviewed and replaced with appropriate values.
Please refer to Running_Jobs_on_the_teaching_cluster, Run X window Jobs, and Run interactive Jobs for more details on running jobs on the teaching cluster.
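As a rough illustration of what might replace the `tophat [options]` line in the script above, the sketch below assembles a paired-end command line following the usage string printed by `tophat --help`. The index prefix `genome_index`, the read files `reads_1.fastq` and `reads_2.fastq`, and the variable names are hypothetical placeholders, not files provided on the cluster. The thread count is read from `SLURM_CPUS_PER_TASK`, which Slurm sets only when `--cpus-per-task` is requested in the job header, so it falls back to 1 otherwise.

```shell
# Hypothetical inputs: replace with your own Bowtie2 index prefix and read files.
index_prefix="genome_index"
reads_1="reads_1.fastq"
reads_2="reads_2.fastq"
out_dir="tophat_out"

# Use the cores granted by Slurm; fall back to 1 outside a job
# or when --cpus-per-task was not requested.
threads="${SLURM_CPUS_PER_TASK:-1}"

# Store the command in the positional parameters so each argument stays intact.
set -- tophat -p "$threads" -o "$out_dir" "$index_prefix" "$reads_1" "$reads_2"

# Record the exact command in the job's output file.
cmd_line="$*"
echo "Command: $cmd_line"

# Run only if tophat is on PATH (that is, the module has been loaded).
if command -v tophat >/dev/null 2>&1; then
    "$@"
else
    echo "tophat not found; load the module with: ml TopHat/2.1.1-foss-2016b" >&2
fi
```

If more than one thread is wanted, the job header would also need a matching `#SBATCH --cpus-per-task=<n>` line so that the `-p` value does not exceed the cores actually allocated.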
Here is an example of a job submission command:
sbatch ./sub.sh
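To keep track of a submitted job, it can help to capture the job ID that `sbatch` reports. The sketch below uses `sbatch --parsable`, which prints only the job ID (optionally followed by `;` and the cluster name), and then queries the job with `squeue`. The helper names `parse_job_id` and `submit_and_check` are made up for this illustration. With the example script above, the job's standard output and error would be written to `TopHat.<jobid>.out` and `TopHat.<jobid>.err`.

```shell
# Strip an optional ";cluster" suffix from `sbatch --parsable` output.
parse_job_id() {
    printf '%s\n' "${1%%;*}"
}

# Submit a script and show its queue status (hypothetical helper).
submit_and_check() {
    raw=$(sbatch --parsable "$1") || return 1
    job_id=$(parse_job_id "$raw")
    echo "Submitted job $job_id; output will be in TopHat.$job_id.out"
    squeue -j "$job_id"
}

# Usage on the cluster:
#   submit_and_check ./sub.sh
```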
Documentation
ml TopHat/2.1.1-foss-2016b
tophat --help
tophat:
TopHat maps short sequences from spliced transcripts to whole genomes.
Usage:
tophat [options] <bowtie_index> <reads1[,reads2,...]> [reads1[,reads2,...]] \
[quals1,[quals2,...]] [quals1[,quals2,...]]
Options:
-v/--version
-o/--output-dir <string> [ default: ./tophat_out ]
--bowtie1 [ default: bowtie2 ]
-N/--read-mismatches <int> [ default: 2 ]
--read-gap-length <int> [ default: 2 ]
--read-edit-dist <int> [ default: 2 ]
--read-realign-edit-dist <int> [ default: "read-edit-dist" + 1 ]
-a/--min-anchor <int> [ default: 8 ]
-m/--splice-mismatches <0-2> [ default: 0 ]
-i/--min-intron-length <int> [ default: 50 ]
-I/--max-intron-length <int> [ default: 500000 ]
-g/--max-multihits <int> [ default: 20 ]
--suppress-hits
-x/--transcriptome-max-hits <int> [ default: 60 ]
-M/--prefilter-multihits ( for -G/--GTF option, enable an initial bowtie search against the genome )
--max-insertion-length <int> [ default: 3 ]
--max-deletion-length <int> [ default: 3 ]
--solexa-quals
--solexa1.3-quals (same as phred64-quals)
--phred64-quals (same as solexa1.3-quals)
-Q/--quals
--integer-quals
-C/--color (Solid - color space)
--color-out
--library-type <string> (fr-unstranded, fr-firststrand, fr-secondstrand)
-p/--num-threads <int> [ default: 1 ]
-R/--resume <out_dir> ( try to resume execution )
-G/--GTF <filename> (GTF/GFF with known transcripts)
--transcriptome-index <bwtidx> (transcriptome bowtie index)
-T/--transcriptome-only (map only to the transcriptome)
-j/--raw-juncs <filename>
--insertions <filename>
--deletions <filename>
-r/--mate-inner-dist <int> [ default: 50 ]
--mate-std-dev <int> [ default: 20 ]
--no-novel-juncs
--no-novel-indels
--no-gtf-juncs
--no-coverage-search
--coverage-search
--microexon-search
--keep-tmp
--tmp-dir <dirname> [ default: <output_dir>/tmp ]
-z/--zpacker <program> [ default: gzip ]
-X/--unmapped-fifo [use mkfifo to compress more temporary files for color space reads]
Advanced Options:
--report-secondary-alignments
--no-discordant
--no-mixed
--segment-mismatches <int> [ default: 2 ]
--segment-length <int> [ default: 25 ]
--bowtie-n [ default: bowtie -v ]
--min-coverage-intron <int> [ default: 50 ]
--max-coverage-intron <int> [ default: 20000 ]
--min-segment-intron <int> [ default: 50 ]
--max-segment-intron <int> [ default: 500000 ]
--no-sort-bam (Output BAM is not coordinate-sorted)
--no-convert-bam (Do not output bam format. Output is <output_dir>/accepted_hits.sam)
--keep-fasta-order
--allow-partial-mapping
Bowtie2 related options:
Preset options in --end-to-end mode (local alignment is not used in TopHat2)
--b2-very-fast
--b2-fast
--b2-sensitive
--b2-very-sensitive
Alignment options
--b2-N <int> [ default: 0 ]
--b2-L <int> [ default: 20 ]
--b2-i <func> [ default: S,1,1.25 ]
--b2-n-ceil <func> [ default: L,0,0.15 ]
--b2-gbar <int> [ default: 4 ]
Scoring options
--b2-mp <int>,<int> [ default: 6,2 ]
--b2-np <int> [ default: 1 ]
--b2-rdg <int>,<int> [ default: 5,3 ]
--b2-rfg <int>,<int> [ default: 5,3 ]
--b2-score-min <func> [ default: L,-0.6,-0.6 ]
Effort options
--b2-D <int> [ default: 15 ]
--b2-R <int> [ default: 2 ]
Fusion related options:
--fusion-search
--fusion-anchor-length <int> [ default: 20 ]
--fusion-min-dist <int> [ default: 10000000 ]
--fusion-read-mismatches <int> [ default: 2 ]
--fusion-multireads <int> [ default: 2 ]
--fusion-multipairs <int> [ default: 2 ]
--fusion-ignore-chromosomes <list> [ e.g, <chrM,chrX> ]
--fusion-do-not-resolve-conflicts [this is for test purposes ]
SAM Header Options (for embedding sequencing run metadata in output):
--rg-id <string> (read group ID)
--rg-sample <string> (sample ID)
--rg-library <string> (library ID)
--rg-description <string> (descriptive string, no tabs allowed)
--rg-platform-unit <string> (e.g Illumina lane ID)
--rg-center <string> (sequencing center name)
--rg-date <string> (ISO 8601 date of the sequencing run)
--rg-platform <string> (Sequencing platform descriptor)
for detailed help see http://ccb.jhu.edu/software/tophat/manual.shtml
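After a run finishes, TopHat writes its results into the output directory (`./tophat_out` unless `-o` is given). The sketch below checks that the main alignment file exists and is not empty: `accepted_hits.bam` by default, or `accepted_hits.sam` when `--no-convert-bam` is used, as the help text above notes. The function name `check_tophat_output` is hypothetical and only meant as a quick sanity check at the end of a job script.

```shell
# Report whether a TopHat output directory contains a non-empty alignment file.
check_tophat_output() {
    out_dir="${1:-./tophat_out}"
    for f in accepted_hits.bam accepted_hits.sam; do
        if [ -s "$out_dir/$f" ]; then
            echo "Found $out_dir/$f"
            return 0
        fi
    done
    echo "No accepted_hits file found in $out_dir" >&2
    return 1
}

# Usage at the end of a job script:
#   check_tophat_output tophat_out || exit 1
```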
Installation
The source code was obtained from the TopHat website.
System
64-bit Linux