Trinity-Teaching
Latest revision as of 11:24, 15 August 2018
Category
Bioinformatics
Program On
Teaching
Version
2.6.6
Author / Distributor
Description
"Trinity represents a novel method for the efficient and robust de novo reconstruction of transcriptomes from RNA-Seq data. Trinity combines three independent software modules: Inchworm, Chrysalis, and Butterfly, applied sequentially to process large volumes of RNA-Seq reads." More details are at Trinity
Running Program
The latest version of this application is installed at /usr/local/apps/eb/Trinity/2.6.6-foss-2016b
To use this version, please load the module with:
ml Trinity/2.6.6-foss-2016b
Here is an example of a shell script, sub.sh, to run on the batch queue:
#!/bin/bash
#SBATCH --job-name=j_Trinity
#SBATCH --partition=batch
#SBATCH --mail-type=ALL
#SBATCH --mail-user=username@uga.edu
#SBATCH --ntasks=1
#SBATCH --mem=10gb
#SBATCH --time=08:00:00
#SBATCH --output=Trinity.%j.out
#SBATCH --error=Trinity.%j.err
cd $SLURM_SUBMIT_DIR
ml Trinity/2.6.6-foss-2016b
Trinity [options]
In a real submission script, review the placeholder values above (underlined on the original page, for example the --time limit and the Trinity [options]) and replace them with values appropriate for your job.
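As a rough illustration of filling in the Trinity [options] placeholder, the sketch below derives the --CPU and --max_memory values from the resources SLURM granted instead of hard-coding them. The helper name trinity_max_memory, the 10% memory headroom, the read file names reads_1.fq and reads_2.fq, and the use of --cpus-per-task are assumptions made for illustration and are not part of the original page. The sketch only prints the command so it can be reviewed first.

```shell
#!/bin/bash
# Hypothetical helper: convert a memory size in MB (the unit SLURM uses for
# SLURM_MEM_PER_NODE) into a Trinity --max_memory value, keeping ~10% headroom.
trinity_max_memory() {
  local mem_mb="$1"
  local mem_gb=$(( mem_mb * 9 / 10 / 1024 ))
  if [ "$mem_gb" -lt 1 ]; then
    mem_gb=1
  fi
  echo "${mem_gb}G"
}

# Inside a job, SLURM sets these when --mem and --cpus-per-task are requested.
# The fallbacks mirror the example script (10 GB) and Trinity's default of 2 CPUs.
mem_mb="${SLURM_MEM_PER_NODE:-10240}"
cpus="${SLURM_CPUS_PER_TASK:-2}"

# reads_1.fq and reads_2.fq are placeholder file names (assumption).
trinity_cmd="Trinity --seqType fq --max_memory $(trinity_max_memory "$mem_mb") --CPU $cpus --left reads_1.fq --right reads_2.fq --output trinity_out_dir"

# Print the command for review before placing it in sub.sh.
echo "$trinity_cmd"
```

If more than one CPU is wanted, sub.sh would also need a matching request such as #SBATCH --cpus-per-task=N, since the example script asks only for a single task.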
Please refer to Running_Jobs_on_the_teaching_cluster, Run X window Jobs and Run interactive Jobs for more details on running jobs on the Teaching cluster.
Here is an example of a job submission command:
sbatch ./sub.sh
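After submission, the job ID determines the names of the log files, because %j in the --output and --error lines of sub.sh expands to the job ID. The sketch below is one possible way to capture the ID and locate the logs. The function name trinity_log_names and the fallback job ID 12345 are made up for illustration, and the sbatch branch only runs where sbatch and ./sub.sh are actually available.

```shell
#!/bin/bash
# Given a SLURM job ID, print the log file names the example script would
# write (%j in --output/--error expands to the job ID).
trinity_log_names() {
  local job_id="$1"
  printf 'Trinity.%s.out\nTrinity.%s.err\n' "$job_id" "$job_id"
}

if command -v sbatch >/dev/null 2>&1 && [ -f ./sub.sh ]; then
  # --parsable makes sbatch print only the job ID (optionally ";cluster").
  job_id=$(sbatch --parsable ./sub.sh)
  job_id="${job_id%%;*}"
  echo "Submitted job $job_id"
  squeue -j "$job_id"            # show the job's current queue state
  trinity_log_names "$job_id"
else
  echo "sbatch or ./sub.sh not found; showing log names for a made-up job ID:"
  trinity_log_names 12345
fi
```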
Documentation
ml Trinity/2.6.6-foss-2016b
Trinity --show_full_usage_info
###############################################################################
#
     ______  ____   ____  ____   ____  ______  __ __
    |      ||    \ |    ||    \ |    ||      ||  |  |
    |      ||  D  ) |  | |  _  | |  | |      ||  |  |
    |_|  |_||    /  |  | |  |  | |  | |_|  |_||  ~  |
      |  |  |    \  |  | |  |  | |  |   |  |  |___, |
      |  |  |  .  \ |  | |  |  | |  |   |  |  |     |
      |__|  |__|\_||____||__|__||____|  |__|  |____/
#
#
# Required:
#
#  --seqType <string>      :type of reads: ('fa' or 'fq')
#
#  --max_memory <string>   :suggested max memory to use by Trinity where limiting can be enabled. (jellyfish, sorting, etc)
#                           provided in Gb of RAM, ie. '--max_memory 10G'
#
#  If paired reads:
#      --left  <string>    :left reads, one or more file names (separated by commas, no spaces)
#      --right <string>    :right reads, one or more file names (separated by commas, no spaces)
#
#  Or, if unpaired reads:
#      --single <string>   :single reads, one or more file names, comma-delimited (note, if single file contains pairs, can use flag: --run_as_paired )
#
#  Or,
#      --samples_file <string>   tab-delimited text file indicating biological replicate relationships.
#                                ex.
#                                     cond_A    cond_A_rep1    A_rep1_left.fq    A_rep1_right.fq
#                                     cond_A    cond_A_rep2    A_rep2_left.fq    A_rep2_right.fq
#                                     cond_B    cond_B_rep1    B_rep1_left.fq    B_rep1_right.fq
#                                     cond_B    cond_B_rep2    B_rep2_left.fq    B_rep2_right.fq
#
#                                # if single-end instead of paired-end, then leave the 4th column above empty.
#
####################################
##  Misc:  #########################
#
#  --SS_lib_type <string>          :Strand-specific RNA-Seq read orientation.
#                                   if paired: RF or FR,
#                                   if single: F or R. (dUTP method = RF)
#                                   See web documentation.
#
#  --CPU <int>                     :number of CPUs to use, default: 2
#  --min_contig_length <int>       :minimum assembled contig length to report
#                                   (def=200)
#
#  --long_reads <string>           :fasta file containing error-corrected or circular consensus (CCS) pac bio reads
#                                   (** note: experimental parameter **, this functionality continues to be under development)
#
#  --genome_guided_bam <string>    :genome guided mode, provide path to coordinate-sorted bam file.
#                                   (see genome-guided param section under --show_full_usage_info)
#
#  --jaccard_clip                  :option, set if you have paired reads and
#                                   you expect high gene density with UTR
#                                   overlap (use FASTQ input file format
#                                   for reads).
#                                   (note: jaccard_clip is an expensive
#                                   operation, so avoid using it unless
#                                   necessary due to finding excessive fusion
#                                   transcripts w/o it.)
#
#  --trimmomatic                   :run Trimmomatic to quality trim reads
#                                   see '--quality_trimming_params' under full usage info for tailored settings.
#
#
#  --no_normalize_reads            :Do *not* run in silico normalization of reads. Defaults to max. read coverage of 50.
#                                   see '--normalize_max_read_cov' under full usage info for tailored settings.
#                                   (note, as of Sept 21, 2016, normalization is on by default)
#
#  --no_distributed_trinity_exec   :do not run Trinity phase 2 (assembly of partitioned reads), and stop after generating command list.
#
#
#  --output <string>               :name of directory for output (will be
#                                   created if it doesn't already exist)
#                                   default( your current working directory: "/home/yhuang/projects/kpan/1.5.9/trinity_out_dir"
#                                   note: must include 'trinity' in the name as a safety precaution! )
#
#  --workdir <string>              :where Trinity phase-2 assembly computation takes place (defaults to --output setting).
#                                   (can set this to a node-local drive or RAM disk)
#
#  --full_cleanup                  :only retain the Trinity fasta file, rename as ${output_dir}.Trinity.fasta
#
#  --cite                          :show the Trinity literature citation
#
#  --verbose                       :provide additional job status info during the run.
#
#  --version                       :reports Trinity version (Trinity-v2.6.6) and exits.
#
#  --show_full_usage_info          :show the many many more options available for running Trinity (expert usage).
#
#  --KMER_SIZE <int>               :kmer length to use (default: 25) max=32
#
#  --prep                          :Only prepare files (high I/O usage) and stop before kmer counting.
#
#  --no_cleanup                    :retain all intermediate input files.
#
#  --no_version_check              :dont run a network check to determine if software updates are available.
#
#  --monitoring                    :use collectl to monitor all steps of Trinity
#  --monitor_sec <int>             : number of seconds for each interval of runtime monitoring (default: 60)
#
####################################################
# Inchworm and K-mer counting-related options: #####
#
#  --min_kmer_cov <int>            :min count for K-mers to be assembled by
#                                   Inchworm (default: 1)
#  --inchworm_cpu <int>            :number of CPUs to use for Inchworm, default is min(6, --CPU option)
#
#  --no_run_inchworm               :stop after running jellyfish, before inchworm. (phase 1, read clustering only)
#
###################################
# Chrysalis-related options: ######
#
#  --max_reads_per_graph <int>     :maximum number of reads to anchor within
#                                   a single graph (default: 200000)
#  --min_glue <int>                :min number of reads needed to glue two inchworm contigs
#                                   together. (default: 2)
#
#  --no_bowtie                     :dont run bowtie to use pair info in chrysalis clustering.
#
#  --no_run_chrysalis              :stop after running inchworm, before chrysalis. (phase 1, read clustering only)
#
#####################################
###  Butterfly-related options:  ####
#
#  --bfly_opts <string>            :additional parameters to pass through to butterfly
#                                   (see butterfly options: java -jar Butterfly.jar ).
#                                   (note: only for expert or experimental use. Commonly used parameters are exposed through this Trinity menu here).
#
#
#  Butterfly read-pair grouping settings (used to define 'pair paths'):
#
#  --group_pairs_distance <int>    :maximum length expected between fragment pairs (default: 500)
#                                   (reads outside this distance are treated as single-end)
#
#  ///////////////////////////////////////////////
#  Butterfly default reconstruction mode settings.
#
#  --path_reinforcement_distance <int>   :minimum overlap of reads with growing transcript
#                                         path (default: PE: 25, SE: 25)
#                                         Set to 1 for the most lenient path extension requirements.
#
#
#  /////////////////////////////////////////
#  Butterfly transcript reduction settings:
#
#  --no_path_merging               : all final transcript candidates are output (including SNP variations, however, some SNPs may be unphased)
#
#  By default, alternative transcript candidates are merged (in reality, discarded) if they are found to be too similar, according to the following logic:
#
#  (identity=(numberOfMatches/shorterLen) > 95.0% or if we have <= 2 mismatches) and if we have internal gap lengths <= 10
#
#  with parameters as:
#
#      --min_per_id_same_path <int>         default: 98 min percent identity for two paths to be merged into single paths
#      --max_diffs_same_path <int>          default: 2 max allowed differences encountered between path sequences to combine them
#      --max_internal_gap_same_path <int>   default: 10 maximum number of internal consecutive gap characters allowed for paths to be merged into single paths.
#
#  If, in a comparison between two alternative transcripts, they are found too similar, the transcript with the greatest cumulative
#  compatible read (pair-path) support is retained, and the other is discarded.
#
#
#  //////////////////////////////////////////////
#  Butterfly Java and parallel execution settings.
#
#  --bflyHeapSpaceMax <string>     :java max heap space setting for butterfly
#                                   (default: 4G) => yields command
#                                   'java -Xmx4G -jar Butterfly.jar ... $bfly_opts'
#  --bflyHeapSpaceInit <string>    :java initial heap space settings for
#                                   butterfly (default: 1G) => yields command
#                                   'java -Xms1G -jar Butterfly.jar ... $bfly_opts'
#  --bflyGCThreads <int>           :threads for garbage collection
#                                   (default: 2))
#  --bflyCPU <int>                 :CPUs to use (default will be normal
#                                   number of CPUs; e.g., 2)
#  --bflyCalculateCPU              :Calculate CPUs based on 80% of max_memory
#                                   divided by maxbflyHeapSpaceMax
#
#  --bfly_jar <string>             : /path/to/Butterfly.jar, otherwise default
#                                   Trinity-installed version is used.
#
#
################################################################################
#### Quality Trimming Options ####
#
#  --quality_trimming_params <string>   defaults to: "ILLUMINACLIP:/usr/local/apps/eb/Trinity/2.6.6-foss-2016b/trinityrnaseq-Trinity-v2.6.6/trinity-plugins/Trimmomatic/adapters/TruSeq3-PE.fa:2:30:10 SLIDINGWINDOW:4:5 LEADING:5 TRAILING:5 MINLEN:25"
#
################################################################################
#### In silico Read Normalization Options ###
#
#  --normalize_max_read_cov <int>  defaults to 50
#  --normalize_by_read_set         run normalization separate for each pair of fastq files,
#                                  then one final normalization that combines the individual normalized reads.
#                                  Consider using this if RAM limitations are a consideration.
#
################################################################################
#### Genome-guided de novo assembly
#
#  * required:
#
#  --genome_guided_max_intron <int>               :maximum allowed intron length (also maximum fragment span on genome)
#
#  * optional:
#
#  --genome_guided_min_coverage <int>             :minimum read coverage for identifying and expressed region of the genome. (default: 1)
#
#  --genome_guided_min_reads_per_partition <int>  :default min of 10 reads per partition
#
#
#######################################################################
# Trinity phase 2 (parallel assembly of read clusters) Options: #######
#
#  --grid_exec <string>            :your command-line utility for submitting jobs to the grid.
#                                   This should be a command line tool that accepts a single parameter:
#                                   ${your_submission_tool} /path/to/file/containing/commands.txt
#                                   and this submission tool should exit(0) upon successful
#                                   completion of all commands.
#
#  --grid_node_CPU <int>           number of threads for each parallel process to leverage. (default: 1)
#
#  --grid_node_max_memory <string> max memory targeted for each grid node. (default: 1G)
#
#  The --grid_node_CPU and --grid_node_max_memory are applied as
#  the --CPU and --max_memory parameters for the Trinity jobs run in
#  Trinity Phase 2 (assembly of read clusters)
#
#
#
###############################################################################
#
#  *Note, a typical Trinity command might be:
#
#        Trinity --seqType fq --max_memory 50G --left reads_1.fq --right reads_2.fq --CPU 6
#
#
#  and for Genome-guided Trinity:
#
#        Trinity --genome_guided_bam rnaseq_alignments.csorted.bam --max_memory 50G
#                --genome_guided_max_intron 10000 --CPU 6
#
#  see: /usr/local/apps/eb/Trinity/2.6.6-foss-2016b/trinityrnaseq-Trinity-v2.6.6/sample_data/test_Trinity_Assembly/
#       for sample data and 'runMe.sh' for example Trinity execution
#
#  For more details, visit: http://trinityrnaseq.github.io
#
###############################################################################
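The usage text describes --samples_file as a tab-delimited file with one row per replicate: condition, replicate name, left reads, right reads. The sketch below shows one possible way to generate and sanity-check such a file before passing it to Trinity. The file name samples.txt and the loop over conditions A and B are assumptions that simply reproduce the example rows from the usage text, and the final line prints the Trinity command instead of running it.

```shell
#!/bin/bash
# Write a tab-delimited samples file in the layout shown in the usage text:
# condition <TAB> replicate <TAB> left reads <TAB> right reads
samples_file="samples.txt"   # hypothetical file name
: > "$samples_file"          # start from an empty file
for cond in A B; do
  for rep in 1 2; do
    printf 'cond_%s\tcond_%s_rep%s\t%s_rep%s_left.fq\t%s_rep%s_right.fq\n' \
      "$cond" "$cond" "$rep" "$cond" "$rep" "$cond" "$rep" >> "$samples_file"
  done
done

# Sanity check: count rows that do not have exactly four tab-separated fields.
# (For single-end data the usage text says to leave the 4th column empty.)
bad_lines=$(awk -F'\t' 'NF != 4 {n++} END {print n+0}' "$samples_file")
echo "malformed lines: $bad_lines"

# Print (rather than run) a command that would consume the file.
echo "Trinity --seqType fq --max_memory 10G --CPU 2 --samples_file $samples_file"
```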
Installation
Source code is obtained from Trinity
System
64-bit Linux