Trinity-Teaching
Category
Bioinformatics
Program On
Teaching
Version
2.6.6
Author / Distributor
Description
"Trinity represents a novel method for the efficient and robust de novo reconstruction of transcriptomes from RNA-Seq data. Trinity combines three independent software modules: Inchworm, Chrysalis, and Butterfly, applied sequentially to process large volumes of RNA-Seq reads." More details are at Trinity
Running Program
The latest version of this application is installed at /usr/local/apps/eb/Trinity/2.6.6-foss-2016b
To use this version, please load the module with
ml Trinity/2.6.6-foss-2016b
Here is an example of a shell script, sub.sh, to run on the batch queue:
#!/bin/bash
#SBATCH --job-name=j_Trinity
#SBATCH --partition=batch
#SBATCH --mail-type=ALL
#SBATCH --mail-user=username@uga.edu
#SBATCH --ntasks=1
#SBATCH --mem=10gb
#SBATCH --time=08:00:00
#SBATCH --output=Trinity.%j.out
#SBATCH --error=Trinity.%j.err
cd $SLURM_SUBMIT_DIR
ml Trinity/2.6.6-foss-2016b
Trinity [options]
In a real submission script, the example values above (such as the job name, email address, memory, time limit, and Trinity options) need to be reviewed and replaced with appropriate values.
Please refer to Running_Jobs_on_the_teaching_cluster, Run X window Jobs and Run interactive Jobs for more details on running jobs on the teaching cluster.
Here is an example of a job submission command:
sbatch ./sub.sh
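The `Trinity [options]` placeholder in sub.sh has to be replaced with a real command line. The following is a minimal sketch of one way to fill it in so that Trinity's `--CPU` and `--max_memory` follow the Slurm allocation instead of being hard-coded. It makes several assumptions: sub.sh also requests cores with `#SBATCH --cpus-per-task` (the example above does not), Slurm exports `SLURM_MEM_PER_NODE` in megabytes when `--mem` is set, the read files `reads_1.fq` and `reads_2.fq` are placeholders, and the helper name `trinity_mem_gb` and its 10% memory headroom are illustrative choices rather than a site recommendation.

```shell
#!/bin/bash
# Sketch only: derive Trinity's resource flags from the Slurm allocation.

# Convert a memory size in MB to whole GB, keeping about 10% headroom.
# (The 10% margin is an arbitrary illustrative choice.)
trinity_mem_gb() {
    local mem_mb="$1"
    local gb=$(( mem_mb * 9 / 10 / 1024 ))
    if (( gb < 1 )); then
        gb=1
    fi
    echo "${gb}G"
}

# Fall back to Trinity's default of 2 CPUs and to 10 GB when run outside Slurm.
cpus="${SLURM_CPUS_PER_TASK:-2}"
max_mem="$(trinity_mem_gb "${SLURM_MEM_PER_NODE:-10240}")"

# Placeholder read files; replace with real paths.
# The output directory name must contain 'trinity'.
cmd=(Trinity --seqType fq --max_memory "$max_mem"
     --left reads_1.fq --right reads_2.fq
     --CPU "$cpus" --output trinity_out_dir)

echo "${cmd[*]}"    # dry run: print the command that would be executed
# "${cmd[@]}"       # uncomment to actually run Trinity
```

Printing the command first makes it easy to confirm that the memory and CPU values match the `#SBATCH` requests before committing queue time to a full assembly.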
Documentation
ml Trinity/2.6.6-foss-2016b
Trinity --show_full_usage_info
###############################################################################
#
     ______  ____   ____  ____   ____  ______  __ __
    |      ||    \ |    ||    \ |    ||      ||  |  |
    |      ||  D  ) |  | |  _  | |  | |      ||  |  |
    |_|  |_||    /  |  | |  |  | |  | |_|  |_||  ~  |
      |  |  |    \  |  | |  |  | |  |   |  |  |___, |
      |  |  |  .  \ |  | |  |  | |  |   |  |  |     |
      |__|  |__|\_||____||__|__||____|  |__|  |____/
#
#
# Required:
#
# --seqType <string> :type of reads: ('fa' or 'fq')
#
# --max_memory <string> :suggested max memory to use by Trinity where limiting can be enabled. (jellyfish, sorting, etc)
# provided in Gb of RAM, ie. '--max_memory 10G'
#
# If paired reads:
# --left <string> :left reads, one or more file names (separated by commas, no spaces)
# --right <string> :right reads, one or more file names (separated by commas, no spaces)
#
# Or, if unpaired reads:
# --single <string> :single reads, one or more file names, comma-delimited (note, if single file contains pairs, can use flag: --run_as_paired )
#
# Or,
# --samples_file <string> tab-delimited text file indicating biological replicate relationships.
# ex.
# cond_A cond_A_rep1 A_rep1_left.fq A_rep1_right.fq
# cond_A cond_A_rep2 A_rep2_left.fq A_rep2_right.fq
# cond_B cond_B_rep1 B_rep1_left.fq B_rep1_right.fq
# cond_B cond_B_rep2 B_rep2_left.fq B_rep2_right.fq
#
# # if single-end instead of paired-end, then leave the 4th column above empty.
#
####################################
## Misc: #########################
#
# --SS_lib_type <string> :Strand-specific RNA-Seq read orientation.
# if paired: RF or FR,
# if single: F or R. (dUTP method = RF)
# See web documentation.
#
# --CPU <int> :number of CPUs to use, default: 2
# --min_contig_length <int> :minimum assembled contig length to report
# (def=200)
#
# --long_reads <string> :fasta file containing error-corrected or circular consensus (CCS) pac bio reads
# (** note: experimental parameter **, this functionality continues to be under development)
#
# --genome_guided_bam <string> :genome guided mode, provide path to coordinate-sorted bam file.
# (see genome-guided param section under --show_full_usage_info)
#
# --jaccard_clip :option, set if you have paired reads and
# you expect high gene density with UTR
# overlap (use FASTQ input file format
# for reads).
# (note: jaccard_clip is an expensive
# operation, so avoid using it unless
# necessary due to finding excessive fusion
# transcripts w/o it.)
#
# --trimmomatic :run Trimmomatic to quality trim reads
# see '--quality_trimming_params' under full usage info for tailored settings.
#
#
# --no_normalize_reads :Do *not* run in silico normalization of reads. Defaults to max. read coverage of 50.
# see '--normalize_max_read_cov' under full usage info for tailored settings.
# (note, as of Sept 21, 2016, normalization is on by default)
#
# --no_distributed_trinity_exec :do not run Trinity phase 2 (assembly of partitioned reads), and stop after generating command list.
#
#
# --output <string> :name of directory for output (will be
# created if it doesn't already exist)
# default( your current working directory: "/home/yhuang/projects/kpan/1.5.9/trinity_out_dir"
# note: must include 'trinity' in the name as a safety precaution! )
#
# --workdir <string> :where Trinity phase-2 assembly computation takes place (defaults to --output setting).
# (can set this to a node-local drive or RAM disk)
#
# --full_cleanup :only retain the Trinity fasta file, rename as ${output_dir}.Trinity.fasta
#
# --cite :show the Trinity literature citation
#
# --verbose :provide additional job status info during the run.
#
# --version :reports Trinity version (Trinity-v2.6.6) and exits.
#
# --show_full_usage_info :show the many many more options available for running Trinity (expert usage).
#
# --KMER_SIZE <int> :kmer length to use (default: 25) max=32
#
# --prep :Only prepare files (high I/O usage) and stop before kmer counting.
#
# --no_cleanup :retain all intermediate input files.
#
# --no_version_check :dont run a network check to determine if software updates are available.
#
# --monitoring :use collectl to monitor all steps of Trinity
# --monitor_sec <int> : number of seconds for each interval of runtime monitoring (default: 60)
#
####################################################
# Inchworm and K-mer counting-related options: #####
#
# --min_kmer_cov <int> :min count for K-mers to be assembled by
# Inchworm (default: 1)
# --inchworm_cpu <int> :number of CPUs to use for Inchworm, default is min(6, --CPU option)
#
# --no_run_inchworm :stop after running jellyfish, before inchworm. (phase 1, read clustering only)
#
###################################
# Chrysalis-related options: ######
#
# --max_reads_per_graph <int> :maximum number of reads to anchor within
# a single graph (default: 200000)
# --min_glue <int> :min number of reads needed to glue two inchworm contigs
# together. (default: 2)
#
# --no_bowtie :dont run bowtie to use pair info in chrysalis clustering.
#
# --no_run_chrysalis :stop after running inchworm, before chrysalis. (phase 1, read clustering only)
#
#####################################
### Butterfly-related options: ####
#
# --bfly_opts <string> :additional parameters to pass through to butterfly
# (see butterfly options: java -jar Butterfly.jar ).
# (note: only for expert or experimental use. Commonly used parameters are exposed through this Trinity menu here).
#
#
# Butterfly read-pair grouping settings (used to define 'pair paths'):
#
# --group_pairs_distance <int> :maximum length expected between fragment pairs (default: 500)
# (reads outside this distance are treated as single-end)
#
# ///////////////////////////////////////////////
# Butterfly default reconstruction mode settings.
#
# --path_reinforcement_distance <int> :minimum overlap of reads with growing transcript
# path (default: PE: 25, SE: 25)
# Set to 1 for the most lenient path extension requirements.
#
#
# /////////////////////////////////////////
# Butterfly transcript reduction settings:
#
# --no_path_merging : all final transcript candidates are output (including SNP variations, however, some SNPs may be unphased)
#
# By default, alternative transcript candidates are merged (in reality, discarded) if they are found to be too similar, according to the following logic:
#
# (identity=(numberOfMatches/shorterLen) > 95.0% or if we have <= 2 mismatches) and if we have internal gap lengths <= 10
#
# with parameters as:
#
# --min_per_id_same_path <int> default: 98 min percent identity for two paths to be merged into single paths
# --max_diffs_same_path <int> default: 2 max allowed differences encountered between path sequences to combine them
# --max_internal_gap_same_path <int> default: 10 maximum number of internal consecutive gap characters allowed for paths to be merged into single paths.
#
# If, in a comparison between two alternative transcripts, they are found too similar, the transcript with the greatest cumulative
# compatible read (pair-path) support is retained, and the other is discarded.
#
#
# //////////////////////////////////////////////
# Butterfly Java and parallel execution settings.
#
# --bflyHeapSpaceMax <string> :java max heap space setting for butterfly
# (default: 4G) => yields command
# 'java -Xmx4G -jar Butterfly.jar ... $bfly_opts'
# --bflyHeapSpaceInit <string> :java initial heap space settings for
# butterfly (default: 1G) => yields command
# 'java -Xms1G -jar Butterfly.jar ... $bfly_opts'
# --bflyGCThreads <int> :threads for garbage collection
# (default: 2))
# --bflyCPU <int> :CPUs to use (default will be normal
# number of CPUs; e.g., 2)
# --bflyCalculateCPU :Calculate CPUs based on 80% of max_memory
# divided by maxbflyHeapSpaceMax
#
# --bfly_jar <string> : /path/to/Butterfly.jar, otherwise default
# Trinity-installed version is used.
#
#
################################################################################
#### Quality Trimming Options ####
#
# --quality_trimming_params <string> defaults to: "ILLUMINACLIP:/usr/local/apps/eb/Trinity/2.6.6-foss-2016b/trinityrnaseq-Trinity-v2.6.6/trinity-plugins/Trimmomatic/adapters/TruSeq3-PE.fa:2:30:10 SLIDINGWINDOW:4:5 LEADING:5 TRAILING:5 MINLEN:25"
#
################################################################################
#### In silico Read Normalization Options ###
#
# --normalize_max_read_cov <int> defaults to 50
# --normalize_by_read_set run normalization separate for each pair of fastq files,
# then one final normalization that combines the individual normalized reads.
# Consider using this if RAM limitations are a consideration.
#
################################################################################
#### Genome-guided de novo assembly
#
# * required:
#
# --genome_guided_max_intron <int> :maximum allowed intron length (also maximum fragment span on genome)
#
# * optional:
#
# --genome_guided_min_coverage <int> :minimum read coverage for identifying and expressed region of the genome. (default: 1)
#
# --genome_guided_min_reads_per_partition <int> :default min of 10 reads per partition
#
#
#######################################################################
# Trinity phase 2 (parallel assembly of read clusters) Options: #######
#
# --grid_exec <string> :your command-line utility for submitting jobs to the grid.
# This should be a command line tool that accepts a single parameter:
# ${your_submission_tool} /path/to/file/containing/commands.txt
# and this submission tool should exit(0) upon successful
# completion of all commands.
#
# --grid_node_CPU <int> number of threads for each parallel process to leverage. (default: 1)
#
# --grid_node_max_memory <string> max memory targeted for each grid node. (default: 1G)
#
# The --grid_node_CPU and --grid_node_max_memory are applied as
# the --CPU and --max_memory parameters for the Trinity jobs run in
# Trinity Phase 2 (assembly of read clusters)
#
#
#
###############################################################################
#
# *Note, a typical Trinity command might be:
#
# Trinity --seqType fq --max_memory 50G --left reads_1.fq --right reads_2.fq --CPU 6
#
#
# and for Genome-guided Trinity:
#
# Trinity --genome_guided_bam rnaseq_alignments.csorted.bam --max_memory 50G
# --genome_guided_max_intron 10000 --CPU 6
#
# see: /usr/local/apps/eb/Trinity/2.6.6-foss-2016b/trinityrnaseq-Trinity-v2.6.6/sample_data/test_Trinity_Assembly/
# for sample data and 'runMe.sh' for example Trinity execution
#
# For more details, visit: http://trinityrnaseq.github.io
#
###############################################################################
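The `--samples_file` format described in the usage text above (tab-delimited columns: condition, replicate name, left reads, right reads) can be generated from a directory of read files rather than typed by hand. The sketch below assumes a hypothetical naming convention, `<condition>_rep<N>_left.fq` and `<condition>_rep<N>_right.fq`, modeled on the file names in the usage example; the function name `make_samples_file` is likewise made up for illustration. When no matching right-read file exists, the fourth column is left empty, as the usage text describes for single-end data.

```shell
#!/bin/bash
# make_samples_file DIR
# Print one tab-delimited row per *_left.fq file found in DIR.
# Assumed (hypothetical) naming: <condition>_rep<N>_left.fq / <condition>_rep<N>_right.fq
make_samples_file() {
    local dir="$1" left base cond right
    for left in "$dir"/*_left.fq; do
        [ -e "$left" ] || continue            # skip the literal pattern when nothing matches
        base="$(basename "$left" _left.fq)"   # e.g. A_rep1
        cond="${base%_rep*}"                  # e.g. A
        right="$dir/${base}_right.fq"
        if [ -e "$right" ]; then
            printf 'cond_%s\tcond_%s\t%s\t%s\n' "$cond" "$base" "$left" "$right"
        else
            # Single-end replicate: the 4th column is left empty.
            printf 'cond_%s\tcond_%s\t%s\n' "$cond" "$base" "$left"
        fi
    done
}

# Example use (paths and resource values are placeholders):
#   make_samples_file ./reads > samples.txt
#   Trinity --seqType fq --max_memory 10G --samples_file samples.txt --CPU 2
```

It is worth inspecting the generated file before starting the assembly, since any file that does not follow the assumed naming pattern is silently skipped.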
Installation
Source code is obtained from Trinity
System
64-bit Linux