StringTie-Teaching
Category
Bioinformatics
Program On
Teaching
Version
2.1.1
Author / Distributor
Description
"StringTie is a fast and highly efficient assembler of RNA-Seq alignments into potential transcripts." More details are at StringTie
Running Program
Version 2.1.1 of this application is installed in /apps/eb/StringTie/2.1.1-GCC-8.3.0
To use this version, please load the module with
ml StringTie/2.1.1-GCC-8.3.0
Here is an example of a shell script, sub.sh, to run on the batch queue:
#!/bin/bash
#SBATCH --job-name=j_StringTie
#SBATCH --partition=batch
#SBATCH --mail-type=ALL
#SBATCH --mail-user=username@uga.edu
#SBATCH --ntasks=1
#SBATCH --mem=10gb
#SBATCH --time=08:00:00
#SBATCH --output=StringTie.%j.out
#SBATCH --error=StringTie.%j.err
cd $SLURM_SUBMIT_DIR
ml StringTie/2.1.1-GCC-8.3.0
stringtie [options]
In a real submission script, review at least the values shown above (for example the job name, e-mail address, memory, and time limit, along with the stringtie options) and replace them with values appropriate for your job.
Please refer to Running_Jobs_on_the_teaching_cluster, Run X window Jobs and Run interactive Jobs for more details on running jobs on the teaching cluster.
Here is an example of a job submission command:
sbatch ./sub.sh
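As an illustration of what a filled-in sub.sh might look like, the sketch below requests four cores and passes the granted core count to stringtie with -p. The file names sample.sorted.bam, annotation.gtf, and sample.gtf are hypothetical placeholders, and the --cpus-per-task value is an assumption rather than a recommendation. The script prints the command instead of running it when stringtie or the input file is not found, so it can be checked before submission.

```shell
#!/bin/bash
#SBATCH --job-name=j_StringTie
#SBATCH --partition=batch
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4          # assumed value; used for the -p option below
#SBATCH --mem=10gb
#SBATCH --time=08:00:00
#SBATCH --output=StringTie.%j.out
#SBATCH --error=StringTie.%j.err

# Fall back to the current directory when not running under Slurm
cd "${SLURM_SUBMIT_DIR:-.}" || exit 1

# Load the module only if the ml command is available (as on the cluster)
if command -v ml >/dev/null 2>&1; then
    ml StringTie/2.1.1-GCC-8.3.0
fi

# Hypothetical file names; replace with your own.
# StringTie expects a coordinate-sorted BAM file as input.
bam="sample.sorted.bam"
guide="annotation.gtf"
out="sample.gtf"

# Use the core count granted by Slurm, or 1 outside a job
threads="${SLURM_CPUS_PER_TASK:-1}"

cmd=(stringtie "$bam" -G "$guide" -o "$out" -p "$threads")

if command -v stringtie >/dev/null 2>&1 && [[ -f "$bam" ]]; then
    "${cmd[@]}"
else
    # Dry run: show the command that would be executed
    echo "Would run: ${cmd[*]}"
fi
```

Reading the thread count from SLURM_CPUS_PER_TASK keeps the -p value in step with the #SBATCH request, so only one line needs to change when the core count is adjusted.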
Documentation
ml StringTie/2.1.1-GCC-8.3.0
stringtie --help
StringTie v2.1.1 usage:
stringtie <input.bam ..> [-G <guide_gff>] [-l <label>] [-o <out_gtf>] [-p <cpus>]
[-v] [-a <min_anchor_len>] [-m <min_tlen>] [-j <min_anchor_cov>] [-f <min_iso>]
[-C <coverage_file_name>] [-c <min_bundle_cov>] [-g <bdist>] [-u] [-L]
[-e] [-x <seqid,..>] [-A <gene_abund.out>] [-h] {-B | -b <dir_path>}
Assemble RNA-Seq alignments into potential transcripts.
Options:
--version : print just the version at stdout and exit
--conservative : conservative transcriptome assembly, same as -t -c 1.5 -f 0.05
--rf assume stranded library fr-firststrand
--fr assume stranded library fr-secondstrand
-G reference annotation to use for guiding the assembly process (GTF/GFF3)
-o output path/file name for the assembled transcripts GTF (default: stdout)
-l name prefix for output transcripts (default: STRG)
-f minimum isoform fraction (default: 0.01)
-L use long reads settings (default:false)
-R if long reads are provided, just clean and collapse the reads but do not assemble
-m minimum assembled transcript length (default: 200)
-a minimum anchor length for junctions (default: 10)
-j minimum junction coverage (default: 1)
-t disable trimming of predicted transcripts based on coverage
(default: coverage trimming is enabled)
-c minimum reads per bp coverage to consider for multi-exon transcript
(default: 1)
-s minimum reads per bp coverage to consider for single-exon transcript
(default: 4.75)
-v verbose (log bundle processing details)
-g maximum gap allowed between read mappings (default: 50)
-M fraction of bundle allowed to be covered by multi-hit reads (default:1)
-p number of threads (CPUs) to use (default: 1)
-A gene abundance estimation output file
-B enable output of Ballgown table files which will be created in the
same directory as the output GTF (requires -G, -o recommended)
-b enable output of Ballgown table files but these files will be
created under the directory path given as <dir_path>
-e only estimate the abundance of given reference transcripts (requires -G)
-x do not assemble any transcripts on the given reference sequence(s)
-u no multi-mapping correction (default: correction enabled)
-h print this usage message and exit
Transcript merge usage mode:
stringtie --merge [Options] { gtf_list | strg1.gtf ...}
With this option StringTie will assemble transcripts from multiple
input files generating a unified non-redundant set of isoforms. In this mode
the following options are available:
-G <guide_gff> reference annotation to include in the merging (GTF/GFF3)
-o <out_gtf> output file name for the merged transcripts GTF
(default: stdout)
-m <min_len> minimum input transcript length to include in the merge
(default: 50)
-c <min_cov> minimum input transcript coverage to include in the merge
(default: 0)
-F <min_fpkm> minimum input transcript FPKM to include in the merge
(default: 1.0)
-T <min_tpm> minimum input transcript TPM to include in the merge
(default: 1.0)
-f <min_iso> minimum isoform fraction (default: 0.01)
-g <gap_len> gap between transcripts to merge together (default: 250)
-i keep merged transcripts with retained introns; by default
these are not kept unless there is strong evidence for them
-l <label> name prefix for output transcripts (default: MSTRG)
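The assembly and merge modes listed above are commonly combined: assemble each sample, merge the per-sample GTF files, then re-estimate abundances against the merged annotation with -e and -B. The following is a minimal sketch of that sequence using only options from the help text. The sample names, file names, and the ballgown/ directory layout are hypothetical, and the run helper is an illustrative function, not part of StringTie. By default it only prints the commands.

```shell
#!/bin/bash
# Dry run by default: commands are printed. Set DRY_RUN=0 to execute them.
run() {
    if [[ "${DRY_RUN:-1}" == "1" ]]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical sample names and reference annotation
samples=(sampleA sampleB)
guide="annotation.gtf"
threads="${SLURM_CPUS_PER_TASK:-1}"

# Step 1: assemble transcripts for each sample, guided by the annotation
gtfs=()
for s in "${samples[@]}"; do
    run stringtie "${s}.sorted.bam" -G "$guide" -l "$s" -o "${s}.gtf" -p "$threads"
    gtfs+=("${s}.gtf")
done

# Step 2: merge the per-sample assemblies into one non-redundant GTF
run stringtie --merge -G "$guide" -o merged.gtf "${gtfs[@]}"

# Step 3: estimate abundances for the merged transcripts only (-e) and
# write Ballgown tables (-B) next to each output GTF
for s in "${samples[@]}"; do
    run mkdir -p "ballgown/${s}"
    run stringtie "${s}.sorted.bam" -e -B -G merged.gtf \
        -A "ballgown/${s}/gene_abund.tab" \
        -o "ballgown/${s}/${s}.gtf" -p "$threads"
done
```

Writing each sample's output GTF into its own directory keeps the Ballgown table files, which -B places beside the GTF, from overwriting one another.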
Installation
Source code was obtained from StringTie
System
64-bit Linux