Trinity-Teaching
Category
Bioinformatics
Program On
Teaching
Version
2.6.6
Author / Distributor
Description
"Trinity represents a novel method for the efficient and robust de novo reconstruction of transcriptomes from RNA-Seq data. Trinity combines three independent software modules: Inchworm, Chrysalis, and Butterfly, applied sequentially to process large volumes of RNA-Seq reads." More details are at Trinity
Running Program
The latest version of this application is at /usr/local/apps/eb/Trinity/2.6.6-foss-2016b
To use this version, please load the module with
ml Trinity/2.6.6-foss-2016b
Here is an example of a shell script, sub.sh, to run on the batch queue:
#!/bin/bash
#SBATCH --job-name=j_Trinity
#SBATCH --partition=batch
#SBATCH --mail-type=ALL
#SBATCH --mail-user=username@uga.edu
#SBATCH --ntasks=1
#SBATCH --mem=10gb
#SBATCH --time=08:00:00
#SBATCH --output=Trinity.%j.out
#SBATCH --error=Trinity.%j.err
cd $SLURM_SUBMIT_DIR
ml Trinity/2.6.6-foss-2016b
Trinity [options]
In a real submission script, review the values above (for example the job name, e-mail address, memory, run time, and Trinity options) and replace them with values appropriate for your job.
Please refer to Running_Jobs_on_the_teaching_cluster, Run X window Jobs and Run interactive Jobs for more details on running jobs on the teaching cluster.
Here is an example of a job submission command:
sbatch ./sub.sh
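The template above requests 10gb of memory from the scheduler, while Trinity takes its own limits through --max_memory and --CPU. One way to keep the two in step is to derive the Trinity flags from the job's allocation. The following is a minimal sketch, not part of the cluster documentation: the function name trinity_resource_flags is hypothetical, the 80% headroom factor is an illustrative assumption, and it assumes SLURM exports SLURM_CPUS_PER_TASK (only when --cpus-per-task is requested) and SLURM_MEM_PER_NODE (the --mem request, in MB).

```shell
#!/bin/bash
# Sketch: derive Trinity's --CPU and --max_memory from the SLURM allocation.
# Assumptions (not from the cluster documentation):
#   - SLURM_CPUS_PER_TASK is set only if --cpus-per-task was requested
#   - SLURM_MEM_PER_NODE holds the --mem request in MB
#   - leaving 20% of the memory as headroom is an illustrative choice

trinity_resource_flags() {
    local cpus=${SLURM_CPUS_PER_TASK:-1}
    local mem_mb=${SLURM_MEM_PER_NODE:-10240}   # fall back to the 10gb used in sub.sh
    # 80% of the allocation, converted from MB to whole GB
    local mem_gb=$(( mem_mb * 80 / 100 / 1024 ))
    if (( mem_gb < 1 )); then
        mem_gb=1
    fi
    echo "--CPU ${cpus} --max_memory ${mem_gb}G"
}

# In sub.sh the Trinity line could then read (reads_1.fq and reads_2.fq are
# hypothetical file names):
#   Trinity --seqType fq --left reads_1.fq --right reads_2.fq $(trinity_resource_flags)
```

If more than one CPU is wanted, a matching #SBATCH --cpus-per-task line would need to be added to the script so that the scheduler actually reserves the cores Trinity is told to use.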
Documentation
ml Trinity/2.6.6-foss-2016b
Trinity --show_full_usage_info
###############################################################################
#
# Required:
#
#  --seqType <string>      :type of reads: ('fa' or 'fq')
#
#  --max_memory <string>   :suggested max memory to use by Trinity where limiting can be enabled. (jellyfish, sorting, etc)
#                           provided in Gb of RAM, ie. '--max_memory 10G'
#
#  If paired reads:
#      --left  <string>    :left reads, one or more file names (separated by commas, no spaces)
#      --right <string>    :right reads, one or more file names (separated by commas, no spaces)
#
#  Or, if unpaired reads:
#      --single <string>   :single reads, one or more file names, comma-delimited (note, if single file contains pairs, can use flag: --run_as_paired )
#
#  Or,
#      --samples_file <string>   tab-delimited text file indicating biological replicate relationships.
#                                ex.
#                                   cond_A    cond_A_rep1    A_rep1_left.fq    A_rep1_right.fq
#                                   cond_A    cond_A_rep2    A_rep2_left.fq    A_rep2_right.fq
#                                   cond_B    cond_B_rep1    B_rep1_left.fq    B_rep1_right.fq
#                                   cond_B    cond_B_rep2    B_rep2_left.fq    B_rep2_right.fq
#
#                                # if single-end instead of paired-end, then leave the 4th column above empty.
#
####################################
##  Misc:  #########################
#
#  --SS_lib_type <string>          :Strand-specific RNA-Seq read orientation.
#                                   if paired: RF or FR,
#                                   if single: F or R. (dUTP method = RF)
#                                   See web documentation.
#
#  --CPU <int>                     :number of CPUs to use, default: 2
#  --min_contig_length <int>       :minimum assembled contig length to report
#                                   (def=200)
#
#  --long_reads <string>           :fasta file containing error-corrected or circular consensus (CCS) pac bio reads
#                                   (** note: experimental parameter **, this functionality continues to be under development)
#
#  --genome_guided_bam <string>    :genome guided mode, provide path to coordinate-sorted bam file.
#                                   (see genome-guided param section under --show_full_usage_info)
#
#  --jaccard_clip                  :option, set if you have paired reads and
#                                   you expect high gene density with UTR
#                                   overlap (use FASTQ input file format
#                                   for reads).
#                                   (note: jaccard_clip is an expensive
#                                   operation, so avoid using it unless
#                                   necessary due to finding excessive fusion
#                                   transcripts w/o it.)
#
#  --trimmomatic                   :run Trimmomatic to quality trim reads
#                                   see '--quality_trimming_params' under full usage info for tailored settings.
#
#  --no_normalize_reads            :Do *not* run in silico normalization of reads. Defaults to max. read coverage of 50.
#                                   see '--normalize_max_read_cov' under full usage info for tailored settings.
#                                   (note, as of Sept 21, 2016, normalization is on by default)
#
#  --no_distributed_trinity_exec   :do not run Trinity phase 2 (assembly of partitioned reads), and stop after generating command list.
#
#  --output <string>               :name of directory for output (will be
#                                   created if it doesn't already exist)
#                                   default( your current working directory: "/home/yhuang/projects/kpan/1.5.9/trinity_out_dir"
#                                   note: must include 'trinity' in the name as a safety precaution! )
#
#  --workdir <string>              :where Trinity phase-2 assembly computation takes place (defaults to --output setting).
#                                   (can set this to a node-local drive or RAM disk)
#
#  --full_cleanup                  :only retain the Trinity fasta file, rename as ${output_dir}.Trinity.fasta
#
#  --cite                          :show the Trinity literature citation
#
#  --verbose                       :provide additional job status info during the run.
#
#  --version                       :reports Trinity version (Trinity-v2.6.6) and exits.
#
#  --show_full_usage_info          :show the many many more options available for running Trinity (expert usage).
#
#  --KMER_SIZE <int>               :kmer length to use (default: 25) max=32
#
#  --prep                          :Only prepare files (high I/O usage) and stop before kmer counting.
#
#  --no_cleanup                    :retain all intermediate input files.
#
#  --no_version_check              :dont run a network check to determine if software updates are available.
#
#  --monitoring                    :use collectl to monitor all steps of Trinity
#  --monitor_sec <int>             : number of seconds for each interval of runtime monitoring (default: 60)
#
####################################################
# Inchworm and K-mer counting-related options: #####
#
#  --min_kmer_cov <int>            :min count for K-mers to be assembled by
#                                   Inchworm (default: 1)
#  --inchworm_cpu <int>            :number of CPUs to use for Inchworm, default is min(6, --CPU option)
#
#  --no_run_inchworm               :stop after running jellyfish, before inchworm. (phase 1, read clustering only)
#
###################################
# Chrysalis-related options: ######
#
#  --max_reads_per_graph <int>     :maximum number of reads to anchor within
#                                   a single graph (default: 200000)
#  --min_glue <int>                :min number of reads needed to glue two inchworm contigs
#                                   together. (default: 2)
#
#  --no_bowtie                     :dont run bowtie to use pair info in chrysalis clustering.
#
#  --no_run_chrysalis              :stop after running inchworm, before chrysalis. (phase 1, read clustering only)
#
#####################################
###  Butterfly-related options:  ####
#
#  --bfly_opts <string>            :additional parameters to pass through to butterfly
#                                   (see butterfly options: java -jar Butterfly.jar ).
#                                   (note: only for expert or experimental use. Commonly used parameters are exposed through this Trinity menu here).
#
#  Butterfly read-pair grouping settings (used to define 'pair paths'):
#
#  --group_pairs_distance <int>    :maximum length expected between fragment pairs (default: 500)
#                                   (reads outside this distance are treated as single-end)
#
#  ///////////////////////////////////////////////
#  Butterfly default reconstruction mode settings.
#
#  --path_reinforcement_distance <int>   :minimum overlap of reads with growing transcript
#                                         path (default: PE: 25, SE: 25)
#                                         Set to 1 for the most lenient path extension requirements.
#
#  /////////////////////////////////////////
#  Butterfly transcript reduction settings:
#
#  --no_path_merging               : all final transcript candidates are output (including SNP variations, however, some SNPs may be unphased)
#
#  By default, alternative transcript candidates are merged (in reality, discarded) if they are found to be too similar, according to the following logic:
#
#  (identity=(numberOfMatches/shorterLen) > 95.0% or if we have <= 2 mismatches) and if we have internal gap lengths <= 10
#
#  with parameters as:
#
#      --min_per_id_same_path <int>          default: 98 min percent identity for two paths to be merged into single paths
#      --max_diffs_same_path <int>           default: 2 max allowed differences encountered between path sequences to combine them
#      --max_internal_gap_same_path <int>    default: 10 maximum number of internal consecutive gap characters allowed for paths to be merged into single paths.
#
#  If, in a comparison between two alternative transcripts, they are found too similar, the transcript with the greatest cumulative
#  compatible read (pair-path) support is retained, and the other is discarded.
#
#  //////////////////////////////////////////////
#  Butterfly Java and parallel execution settings.
#
#  --bflyHeapSpaceMax <string>     :java max heap space setting for butterfly
#                                   (default: 4G) => yields command
#                                   'java -Xmx4G -jar Butterfly.jar ... $bfly_opts'
#  --bflyHeapSpaceInit <string>    :java initial heap space settings for
#                                   butterfly (default: 1G) => yields command
#                                   'java -Xms1G -jar Butterfly.jar ... $bfly_opts'
#  --bflyGCThreads <int>           :threads for garbage collection
#                                   (default: 2))
#  --bflyCPU <int>                 :CPUs to use (default will be normal
#                                   number of CPUs; e.g., 2)
#  --bflyCalculateCPU              :Calculate CPUs based on 80% of max_memory
#                                   divided by maxbflyHeapSpaceMax
#
#  --bfly_jar <string>             : /path/to/Butterfly.jar, otherwise default
#                                   Trinity-installed version is used.
#
################################################################################
#### Quality Trimming Options ####
#
#  --quality_trimming_params <string>   defaults to: "ILLUMINACLIP:/usr/local/apps/eb/Trinity/2.6.6-foss-2016b/trinityrnaseq-Trinity-v2.6.6/trinity-plugins/Trimmomatic/adapters/TruSeq3-PE.fa:2:30:10 SLIDINGWINDOW:4:5 LEADING:5 TRAILING:5 MINLEN:25"
#
################################################################################
#### In silico Read Normalization Options ###
#
#  --normalize_max_read_cov <int>  defaults to 50
#  --normalize_by_read_set         run normalization separate for each pair of fastq files,
#                                  then one final normalization that combines the individual normalized reads.
#                                  Consider using this if RAM limitations are a consideration.
#
################################################################################
#### Genome-guided de novo assembly
#
#  * required:
#
#  --genome_guided_max_intron <int>               :maximum allowed intron length (also maximum fragment span on genome)
#
#  * optional:
#
#  --genome_guided_min_coverage <int>             :minimum read coverage for identifying and expressed region of the genome. (default: 1)
#
#  --genome_guided_min_reads_per_partition <int>  :default min of 10 reads per partition
#
#######################################################################
# Trinity phase 2 (parallel assembly of read clusters) Options: #######
#
#  --grid_exec <string>            :your command-line utility for submitting jobs to the grid.
#                                   This should be a command line tool that accepts a single parameter:
#                                   ${your_submission_tool} /path/to/file/containing/commands.txt
#                                   and this submission tool should exit(0) upon successful
#                                   completion of all commands.
#
#  --grid_node_CPU <int>           number of threads for each parallel process to leverage. (default: 1)
#
#  --grid_node_max_memory <string> max memory targeted for each grid node. (default: 1G)
#
#  The --grid_node_CPU and --grid_node_max_memory are applied as
#  the --CPU and --max_memory parameters for the Trinity jobs run in
#  Trinity Phase 2 (assembly of read clusters)
#
###############################################################################
#
#  *Note, a typical Trinity command might be:
#
#        Trinity --seqType fq --max_memory 50G --left reads_1.fq --right reads_2.fq --CPU 6
#
#  and for Genome-guided Trinity:
#
#        Trinity --genome_guided_bam rnaseq_alignments.csorted.bam --max_memory 50G
#                --genome_guided_max_intron 10000 --CPU 6
#
#  see: /usr/local/apps/eb/Trinity/2.6.6-foss-2016b/trinityrnaseq-Trinity-v2.6.6/sample_data/test_Trinity_Assembly/
#       for sample data and 'runMe.sh' for example Trinity execution
#
#  For more details, visit: http://trinityrnaseq.github.io
#
###############################################################################
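The usage text describes --samples_file as a tab-delimited file with one row per replicate (condition, replicate name, left reads, right reads). Because tabs are easy to lose when such a file is typed by hand, it can help to generate and check it from the shell. The following is a minimal sketch under stated assumptions: the function names write_samples_file and check_samples_file are hypothetical helpers, and the condition, replicate, and read file names simply mirror the example in the usage text.

```shell
#!/bin/bash
# Sketch: write a tab-delimited samples file for --samples_file and verify
# that every row has four tab-separated columns.
# The function names and the file naming pattern are illustrative assumptions.

write_samples_file() {
    local out=$1
    local cond rep
    : > "$out"                      # create or truncate the output file
    for cond in A B; do
        for rep in 1 2; do
            # columns: condition, replicate name, left reads, right reads
            printf 'cond_%s\tcond_%s_rep%s\t%s_rep%s_left.fq\t%s_rep%s_right.fq\n' \
                "$cond" "$cond" "$rep" "$cond" "$rep" "$cond" "$rep" >> "$out"
        done
    done
}

check_samples_file() {
    # Exit status 0 only if every line has exactly 4 tab-separated fields.
    awk -F'\t' 'BEGIN { bad = 0 } NF != 4 { bad = 1 } END { exit bad }' "$1"
}
```

After calling write_samples_file samples.txt, and once check_samples_file samples.txt succeeds, the file could be passed to Trinity along the lines of Trinity --seqType fq --max_memory 10G --samples_file samples.txt, with the memory value adjusted to the job's allocation.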
Installation
Source code is obtained from Trinity
System
64-bit Linux