Trinity-Teaching

From Research Computing Center Wiki
Revision as of 11:24, 15 August 2018 by Yhuang (talk | contribs)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigation Jump to search

Category

Bioinformatics

Program On

Teaching

Version

2.6.6

Author / Distributor

Trinity

Description

"Trinity represents a novel method for the efficient and robust de novo reconstruction of transcriptomes from RNA-Seq data. Trinity combines three independent software modules: Inchworm, Chrysalis, and Butterfly, applied sequentially to process large volumes of RNA-Seq reads." More details are at Trinity

Running Program

The last version of this application is at /usr/local/apps/eb/Trinity/2.6.6-foss-2016b

To use this version, please load the module with

ml Trinity/2.6.6-foss-2016b 

Here is an example of a shell script, sub.sh, to run on the batch queue:

#!/bin/bash
#SBATCH --job-name=j_Trinity
#SBATCH --partition=batch
#SBATCH --mail-type=ALL
#SBATCH --mail-user=username@uga.edu
#SBATCH --ntasks=1
#SBATCH --mem=10gb
#SBATCH --time=08:00:00
#SBATCH --output=Trinity.%j.out
#SBATCH --error=Trinity.%j.err

cd $SLURM_SUBMIT_DIR
ml Trinity/2.6.6-foss-2016b
Trinity [options]

In the real submission script, at least all the above underlined values need to be reviewed or to be replaced by the proper values.

Please refer to Running_Jobs_on_the_teaching_cluster, Run X window Jobs and Run interactive Jobs for more details of running jobs at Teaching cluster.


Here is an example of job submission command:

sbatch ./sub.sh 

Documentation

ml Trinity/2.6.6-foss-2016b 
Trinity --show_full_usage_info



###############################################################################
#

     ______  ____   ____  ____   ____  ______  __ __
    |      ||    \ |    ||    \ |    ||      ||  |  |
    |      ||  D  ) |  | |  _  | |  | |      ||  |  |
    |_|  |_||    /  |  | |  |  | |  | |_|  |_||  ~  |
      |  |  |    \  |  | |  |  | |  |   |  |  |___, |
      |  |  |  .  \ |  | |  |  | |  |   |  |  |     |
      |__|  |__|\_||____||__|__||____|  |__|  |____/

#
#
# Required:
#
#  --seqType <string>      :type of reads: ('fa' or 'fq')
#
#  --max_memory <string>      :suggested max memory to use by Trinity where limiting can be enabled. (jellyfish, sorting, etc)
#                            provided in Gb of RAM, ie.  '--max_memory 10G'
#
#  If paired reads:
#      --left  <string>    :left reads, one or more file names (separated by commas, no spaces)
#      --right <string>    :right reads, one or more file names (separated by commas, no spaces)
#
#  Or, if unpaired reads:
#      --single <string>   :single reads, one or more file names, comma-delimited (note, if single file contains pairs, can use flag: --run_as_paired )
#
#  Or,
#      --samples_file <string>         tab-delimited text file indicating biological replicate relationships.
#                                   ex.
#                                        cond_A    cond_A_rep1    A_rep1_left.fq    A_rep1_right.fq
#                                        cond_A    cond_A_rep2    A_rep2_left.fq    A_rep2_right.fq
#                                        cond_B    cond_B_rep1    B_rep1_left.fq    B_rep1_right.fq
#                                        cond_B    cond_B_rep2    B_rep2_left.fq    B_rep2_right.fq
#
#                      # if single-end instead of paired-end, then leave the 4th column above empty.
#
####################################
##  Misc:  #########################
#
#  --SS_lib_type <string>          :Strand-specific RNA-Seq read orientation.
#                                   if paired: RF or FR,
#                                   if single: F or R.   (dUTP method = RF)
#                                   See web documentation.
#
#  --CPU <int>                     :number of CPUs to use, default: 2
#  --min_contig_length <int>       :minimum assembled contig length to report
#                                   (def=200)
#
#  --long_reads <string>           :fasta file containing error-corrected or circular consensus (CCS) pac bio reads
#                                   (** note: experimental parameter **, this functionality continues to be under development)
#
#  --genome_guided_bam <string>    :genome guided mode, provide path to coordinate-sorted bam file.
#                                   (see genome-guided param section under --show_full_usage_info)
#
#  --jaccard_clip                  :option, set if you have paired reads and
#                                   you expect high gene density with UTR
#                                   overlap (use FASTQ input file format
#                                   for reads).
#                                   (note: jaccard_clip is an expensive
#                                   operation, so avoid using it unless
#                                   necessary due to finding excessive fusion
#                                   transcripts w/o it.)
#
#  --trimmomatic                   :run Trimmomatic to quality trim reads
#                                        see '--quality_trimming_params' under full usage info for tailored settings.
#                                  
#
#  --no_normalize_reads            :Do *not* run in silico normalization of reads. Defaults to max. read coverage of 50.
#                                       see '--normalize_max_read_cov' under full usage info for tailored settings.
#                                       (note, as of Sept 21, 2016, normalization is on by default)
#     
#  --no_distributed_trinity_exec   :do not run Trinity phase 2 (assembly of partitioned reads), and stop after generating command list.
#
#
#  --output <string>               :name of directory for output (will be
#                                   created if it doesn't already exist)
#                                   default( your current working directory: "/home/yhuang/projects/kpan/1.5.9/trinity_out_dir" 
#                                    note: must include 'trinity' in the name as a safety precaution! )
#             
#  --workdir <string>              :where Trinity phase-2 assembly computation takes place (defaults to --output setting).
#                                  (can set this to a node-local drive or RAM disk)     
#  
#  --full_cleanup                  :only retain the Trinity fasta file, rename as ${output_dir}.Trinity.fasta
#
#  --cite                          :show the Trinity literature citation
#
#  --verbose                       :provide additional job status info during the run.
#
#  --version                       :reports Trinity version (Trinity-v2.6.6) and exits.
#
#  --show_full_usage_info          :show the many many more options available for running Trinity (expert usage).

#
#  --KMER_SIZE <int>               :kmer length to use (default: 25)  max=32
#
#  --prep                          :Only prepare files (high I/O usage) and stop before kmer counting.
#
#  --no_cleanup                    :retain all intermediate input files.
#
#  --no_version_check              :dont run a network check to determine if software updates are available.
#
#  --monitoring                    :use collectl to monitor all steps of Trinity
#     --monitor_sec <int>          : number of seconds for each interval of runtime monitoring (default: 60)
#  
####################################################
# Inchworm and K-mer counting-related options: #####
#
#  --min_kmer_cov <int>           :min count for K-mers to be assembled by
#                                  Inchworm (default: 1)
#  --inchworm_cpu <int>           :number of CPUs to use for Inchworm, default is min(6, --CPU option)
#
#  --no_run_inchworm              :stop after running jellyfish, before inchworm. (phase 1, read clustering only)
#
###################################
# Chrysalis-related options: ######
#
#  --max_reads_per_graph <int>    :maximum number of reads to anchor within
#                                  a single graph (default: 200000)
#  --min_glue <int>               :min number of reads needed to glue two inchworm contigs
#                                  together. (default: 2) 
#
#  --no_bowtie                    :dont run bowtie to use pair info in chrysalis clustering.
#
#  --no_run_chrysalis             :stop after running inchworm, before chrysalis. (phase 1, read clustering only)
#
#####################################
###  Butterfly-related options:  ####
#
#  --bfly_opts <string>            :additional parameters to pass through to butterfly
#                                   (see butterfly options: java -jar Butterfly.jar ).
#                                   (note: only for expert or experimental use.  Commonly used parameters are exposed through this Trinity menu here).
#
#
#  Butterfly read-pair grouping settings (used to define 'pair paths'):
#
#  --group_pairs_distance <int>    :maximum length expected between fragment pairs (default: 500)
#                                   (reads outside this distance are treated as single-end)
#
#  ///////////////////////////////////////////////
#  Butterfly default reconstruction mode settings.
#                                   
#  --path_reinforcement_distance <int>   :minimum overlap of reads with growing transcript 
#                                         path (default: PE: 25, SE: 25)
#                                         Set to 1 for the most lenient path extension requirements.
#
#
#  /////////////////////////////////////////
#  Butterfly transcript reduction settings:
#
#  --no_path_merging            : all final transcript candidates are output (including SNP variations, however, some SNPs may be unphased)  
#
#  By default, alternative transcript candidates are merged (in reality, discarded) if they are found to be too similar, according to the following logic:
#
#  (identity=(numberOfMatches/shorterLen) > 95.0% or if we have <= 2 mismatches) and if we have internal gap lengths <= 10
#
#  with parameters as:
#      
#      --min_per_id_same_path <int>          default: 98     min percent identity for two paths to be merged into single paths
#      --max_diffs_same_path <int>           default: 2      max allowed differences encountered between path sequences to combine them
#      --max_internal_gap_same_path <int>    default: 10     maximum number of internal consecutive gap characters allowed for paths to be merged into single paths.
#
#      If, in a comparison between two alternative transcripts, they are found too similar, the transcript with the greatest cumulative 
#      compatible read (pair-path) support is retained, and the other is discarded.
#
#
#  //////////////////////////////////////////////
#  Butterfly Java and parallel execution settings.
#
#  --bflyHeapSpaceMax <string>     :java max heap space setting for butterfly
#                                   (default: 4G) => yields command
#                  'java -Xmx4G -jar Butterfly.jar ... $bfly_opts'
#  --bflyHeapSpaceInit <string>    :java initial heap space settings for
#                                   butterfly (default: 1G) => yields command
#                  'java -Xms1G -jar Butterfly.jar ... $bfly_opts'
#  --bflyGCThreads <int>           :threads for garbage collection
#                                   (default: 2))
#  --bflyCPU <int>                 :CPUs to use (default will be normal 
#                                   number of CPUs; e.g., 2)
#  --bflyCalculateCPU              :Calculate CPUs based on 80% of max_memory
#                                   divided by maxbflyHeapSpaceMax
#
#  --bfly_jar <string>             : /path/to/Butterfly.jar, otherwise default
#                                    Trinity-installed version is used. 
#                                    
#
################################################################################
#### Quality Trimming Options ####  
# 
#  --quality_trimming_params <string>   defaults to: "ILLUMINACLIP:/usr/local/apps/eb/Trinity/2.6.6-foss-2016b/trinityrnaseq-Trinity-v2.6.6/trinity-plugins/Trimmomatic/adapters/TruSeq3-PE.fa:2:30:10 SLIDINGWINDOW:4:5 LEADING:5 TRAILING:5 MINLEN:25"
#
################################################################################
####  In silico Read Normalization Options ###
#
#  --normalize_max_read_cov <int>       defaults to 50
#  --normalize_by_read_set              run normalization separate for each pair of fastq files,
#                                       then one final normalization that combines the individual normalized reads.
#                                       Consider using this if RAM limitations are a consideration.
#
################################################################################
#### Genome-guided de novo assembly
# 
#  * required:
#
# --genome_guided_max_intron <int>     :maximum allowed intron length (also maximum fragment span on genome)
#
#  * optional:
#
# --genome_guided_min_coverage <int>   :minimum read coverage for identifying and expressed region of the genome. (default: 1)
#
# --genome_guided_min_reads_per_partition <int>   :default min of 10 reads per partition
#
#
#######################################################################
# Trinity phase 2 (parallel assembly of read clusters) Options: #######
#
#  --grid_exec <string>                 :your command-line utility for submitting jobs to the grid.
#                                        This should be a command line tool that accepts a single parameter:
#                                        ${your_submission_tool} /path/to/file/containing/commands.txt
#                                        and this submission tool should exit(0) upon successful 
#                                        completion of all commands.
#
#  --grid_node_CPU <int>                number of threads for each parallel process to leverage. (default: 1)
#
#  --grid_node_max_memory <string>         max memory targeted for each grid node. (default: 1G)
#
#            The --grid_node_CPU and --grid_node_max_memory are applied as 
#              the --CPU and --max_memory parameters for the Trinity jobs run in 
#              Trinity Phase 2 (assembly of read clusters)
#
    #
#
###############################################################################
#
#  *Note, a typical Trinity command might be:
#
#        Trinity --seqType fq --max_memory 50G --left reads_1.fq  --right reads_2.fq --CPU 6
#
#
#    and for Genome-guided Trinity:
#
#        Trinity --genome_guided_bam rnaseq_alignments.csorted.bam --max_memory 50G
#                --genome_guided_max_intron 10000 --CPU 6
#
#     see: /usr/local/apps/eb/Trinity/2.6.6-foss-2016b/trinityrnaseq-Trinity-v2.6.6/sample_data/test_Trinity_Assembly/
#          for sample data and 'runMe.sh' for example Trinity execution
#
#     For more details, visit: http://trinityrnaseq.github.io
#
###############################################################################


    

Back to Top

Installation

Source code is obtained from Trinity

System

64-bit Linux