StringTie-Teaching
Latest revision as of 08:37, 14 May 2024
Category
Bioinformatics
Program On
Teaching
Version
2.2.1
Author / Distributor
Description
"StringTie is a fast and highly efficient assembler of RNA-Seq alignments into potential transcripts." More details are available at StringTie.
Running Program
Version 2.2.1 of this application is in /apps/eb/StringTie/2.2.1-GCC-11.2.0 or /apps/eb/StringTie/2.2.1-GCC-11.3.0
To use this version, please load the module with
ml StringTie/2.2.1-GCC-11.2.0
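Since two builds are installed, it can help to confirm that the expected one is active after loading the module. The following is a minimal sketch, not part of the original page: the helper name `check_stringtie` is hypothetical, and the snippet assumes the Lmod `ml` command is only present on the cluster, so it is skipped elsewhere. The `--version` flag is the one listed in the help output under Documentation.

```shell
#!/bin/bash
MODULE="StringTie/2.2.1-GCC-11.2.0"   # module name taken from the text above

# Load the module only where the `ml` command exists (e.g. on the cluster).
if command -v ml >/dev/null 2>&1; then
    ml "$MODULE"
fi

# Hypothetical helper: print the version if stringtie is on PATH,
# otherwise print a hint to stderr and return 1.
check_stringtie() {
    if command -v stringtie >/dev/null 2>&1; then
        stringtie --version
    else
        echo "stringtie not found on PATH; was $MODULE loaded?" >&2
        return 1
    fi
}

check_stringtie || echo "Skipping version check."
```

On the cluster, a successful check should report the version of whichever build was loaded.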
Here is an example of a shell script, sub.sh, to run on the batch queue:
#!/bin/bash
#SBATCH --job-name=j_StringTie
#SBATCH --partition=batch
#SBATCH --mail-type=ALL
#SBATCH --mail-user=username@uga.edu
#SBATCH --ntasks=1
#SBATCH --mem=10gb
#SBATCH --time=08:00:00
#SBATCH --output=StringTie.%j.out
#SBATCH --error=StringTie.%j.err
cd $SLURM_SUBMIT_DIR
ml StringTie/2.2.1-GCC-11.2.0
stringtie [options]
In a real submission script, at least the underlined values above must be reviewed and replaced with appropriate values.
Please refer to Running_Jobs_on_the_teaching_cluster, Run X window Jobs and Run interactive Jobs for more details on running jobs on the teaching cluster.
Here is an example of a job submission command:
sbatch ./sub.sh
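As an illustration of what might replace the `stringtie [options]` placeholder in sub.sh, here is a minimal sketch of a reference-guided assembly command. The file names (`sample1.sorted.bam`, `annotation.gtf`, `sample1.gtf`) and the `run` helper with its `DRY_RUN` switch are assumptions made for this example, not part of the original page. The flags `-G`, `-o` and `-p` are those listed in the help output under Documentation. To use more than one thread, the script would also need a line such as `#SBATCH --cpus-per-task=4`, which is what makes SLURM set `SLURM_CPUS_PER_TASK`.

```shell
#!/bin/bash
# Sketch only: the file names below are hypothetical placeholders.
BAM="sample1.sorted.bam"    # input alignments (assumed name)
GUIDE="annotation.gtf"      # reference annotation passed to -G (assumed name)
OUT="sample1.gtf"           # output GTF passed to -o (assumed name)

# SLURM sets SLURM_CPUS_PER_TASK only when --cpus-per-task is requested;
# otherwise fall back to StringTie's default of 1 thread.
THREADS="${SLURM_CPUS_PER_TASK:-1}"

# With DRY_RUN=1 (the default here) commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "Would run: $*"
    else
        "$@"
    fi
}

run stringtie "$BAM" -G "$GUIDE" -o "$OUT" -p "$THREADS"
```

Setting `DRY_RUN=0` in the job environment makes the helper execute the command instead of printing it.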
Documentation
ml StringTie/2.2.1-GCC-11.2.0
stringtie --help
StringTie v2.2.1 usage:
stringtie <in.bam ..> [-G <guide_gff>] [-l <prefix>] [-o <out.gtf>] [-p <cpus>]
 [-v] [-a <min_anchor_len>] [-m <min_len>] [-j <min_anchor_cov>] [-f <min_iso>]
 [-c <min_bundle_cov>] [-g <bdist>] [-u] [-L] [-e] [--viral] [-E <err_margin>]
 [--ptf <f_tab>] [-x <seqid,..>] [-A <gene_abund.out>] [-h] {-B|-b <dir_path>}
 [--mix] [--conservative] [--rf] [--fr]
Assemble RNA-Seq alignments into potential transcripts.
Options:
 --version : print just the version at stdout and exit
 --conservative : conservative transcript assembly, same as -t -c 1.5 -f 0.05
 --mix : both short and long read data alignments are provided
    (long read alignments must be the 2nd BAM/CRAM input file)
 --rf : assume stranded library fr-firststrand
 --fr : assume stranded library fr-secondstrand
 -G reference annotation to use for guiding the assembly process (GTF/GFF)
 --ptf : load point-features from a given 4 column feature file <f_tab>
 -o output path/file name for the assembled transcripts GTF (default: stdout)
 -l name prefix for output transcripts (default: STRG)
 -f minimum isoform fraction (default: 0.01)
 -L long reads processing; also enforces -s 1.5 -g 0 (default:false)
 -R if long reads are provided, just clean and collapse the reads but
    do not assemble
 -m minimum assembled transcript length (default: 200)
 -a minimum anchor length for junctions (default: 10)
 -j minimum junction coverage (default: 1)
 -t disable trimming of predicted transcripts based on coverage
    (default: coverage trimming is enabled)
 -c minimum reads per bp coverage to consider for multi-exon transcript
    (default: 1)
 -s minimum reads per bp coverage to consider for single-exon transcript
    (default: 4.75)
 -v verbose (log bundle processing details)
 -g maximum gap allowed between read mappings (default: 50)
 -M fraction of bundle allowed to be covered by multi-hit reads (default:1)
 -p number of threads (CPUs) to use (default: 1)
 -A gene abundance estimation output file
 -E define window around possibly erroneous splice sites from long reads to
    look out for correct splice sites (default: 25)
 -B enable output of Ballgown table files which will be created in the
    same directory as the output GTF (requires -G, -o recommended)
 -b enable output of Ballgown table files but these files will be
    created under the directory path given as <dir_path>
 -e only estimate the abundance of given reference transcripts (requires -G)
 --viral : only relevant for long reads from viral data where splice sites
    do not follow consensus (default:false)
 -x do not assemble any transcripts on the given reference sequence(s)
 -u no multi-mapping correction (default: correction enabled)
 -h print this usage message and exit
 --ref/--cram-ref reference genome FASTA file for CRAM input

Transcript merge usage mode:
  stringtie --merge [Options] { gtf_list | strg1.gtf ...}
With this option StringTie will assemble transcripts from multiple
input files generating a unified non-redundant set of isoforms. In this mode
the following options are available:
  -G <guide_gff>   reference annotation to include in the merging (GTF/GFF3)
  -o <out_gtf>     output file name for the merged transcripts GTF
                   (default: stdout)
  -m <min_len>     minimum input transcript length to include in the merge
                   (default: 50)
  -c <min_cov>     minimum input transcript coverage to include in the merge
                   (default: 0)
  -F <min_fpkm>    minimum input transcript FPKM to include in the merge
                   (default: 1.0)
  -T <min_tpm>     minimum input transcript TPM to include in the merge
                   (default: 1.0)
  -f <min_iso>     minimum isoform fraction (default: 0.01)
  -g <gap_len>     gap between transcripts to merge together (default: 250)
  -i               keep merged transcripts with retained introns; by default
                   these are not kept unless there is strong evidence for them
  -l <label>       name prefix for output transcripts (default: MSTRG)
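The options above can be combined into a multi-sample workflow: assemble each sample, merge the assemblies with `--merge`, then re-estimate abundances against the merged annotation with `-e -B`. The sketch below is one possible arrangement under stated assumptions, not a prescribed procedure. The sample names, the `annotation.gtf` guide file, the `ballgown/` output layout and the `run` dry-run helper are all hypothetical; only the flags come from the help output.

```shell
#!/bin/bash
# Sketch only: names below are hypothetical placeholders.
GUIDE="annotation.gtf"        # reference annotation (assumed name)
SAMPLES="sample1 sample2"     # expects <name>.bam for each entry (assumed)

# With DRY_RUN=1 (the default here) commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "Would run: $*"
    else
        "$@"
    fi
}

# Step 1: assemble transcripts for each sample, guided by the annotation.
gtfs=""
for s in $SAMPLES; do
    run stringtie "$s.bam" -G "$GUIDE" -l "$s" -o "$s.gtf"
    gtfs="$gtfs $s.gtf"
done

# Step 2: merge the per-sample assemblies into one non-redundant set.
# $gtfs is left unquoted on purpose so it splits into one argument per file.
run stringtie --merge -G "$GUIDE" -o merged.gtf $gtfs

# Step 3: estimate abundances for the merged transcripts only (-e) and
# write Ballgown tables (-B) next to each output GTF.
for s in $SAMPLES; do
    run mkdir -p "ballgown/$s"
    run stringtie "$s.bam" -e -B -G merged.gtf -o "ballgown/$s/$s.gtf"
done
```

Giving each sample its own output directory keeps the Ballgown tables from different samples apart, since `-B` writes them to the same directory as the output GTF.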
Installation
Source code is obtained from StringTie
System
64-bit Linux