Difference between revisions of "Trinity-Sapelo2"

From Research Computing Center Wiki
 
(40 intermediate revisions by 2 users not shown)
Line 10: Line 10:
 
=== Version ===
 
=== Version ===
  
2.5.1, 2.8.4, 2.8.5, 2.9.1, 2.10.0
+
2.5.1, 2.8.4, 2.8.5, 2.15.1
 
   
 
   
 
=== Author / Distributor ===
 
=== Author / Distributor ===
Line 29: Line 29:
  
 
Butterfly then processes the individual graphs in parallel, tracing the paths that reads and pairs of reads take within the graph, ultimately reporting full-length transcripts for alternatively spliced isoforms, and teasing apart transcripts that correspond to paralogous genes."
 
Butterfly then processes the individual graphs in parallel, tracing the paths that reads and pairs of reads take within the graph, ultimately reporting full-length transcripts for alternatively spliced isoforms, and teasing apart transcripts that correspond to paralogous genes."
 +
 +
[[#top|Back to Top]]
  
 
=== Running Program ===
 
=== Running Program ===
+
 
 +
====General Instructions====
 +
 
 
Also refer to [[Running Jobs on Sapelo2]]
 
Also refer to [[Running Jobs on Sapelo2]]
  
Trinity 2.5.1, 2.8.4, 2.8.5, 2.9.1, and 2.10.0 are installed at Sapelo2.  
+
Trinity v2.5.1, v2.8.4, v2.8.5, and v2.15.1 are installed on Sapelo2.
  
 
* Trinity usually needs to run in the large-memory partition, namely highmem_p, as in the sample script below.
 
* Trinity usually needs to run in the large-memory partition, namely highmem_p, as in the sample script below.
 +
* '''<u>Trinity is best run utilizing /lscratch, please click [[Trinity-Sapelo2#Utilizing .2Flscratch in Trinity Job Submission Script|here]] or scroll down to see how to configure your job submission script to use /lscratch</u>'''
  
 
* Here is a post for memory estimates. For a 4 billion base mouse, it uses about 50 GB memory at peak. [http://trinityrnaseq.github.io/performance/index.html performance]
 
* Here is a post for memory estimates. For a 4 billion base mouse, it uses about 50 GB memory at peak. [http://trinityrnaseq.github.io/performance/index.html performance]
  
* Do not ask for more than 24 CPUs at the command and double the quantity of requesting CPU from queue. e.g. at the following, command ask for 8 CPU and at the header, it asks for ppn=16.   
+
* Do not request more than 24 CPU cores in the Trinity command, and request twice that many cores from the queue. In the example below, the command asks for 8 CPU cores (--CPU 8) and the header asks for --cpus-per-task=8.
  
 
* Using --normalize_reads can greatly reduce memory requirements. For this feature, please ensure there are no spaces in the sequence names or quality score names, and append "/1" and "/2" to the sequence names so that each name is unique for paired reads in the FASTA/FASTQ headers.
 
* Using --normalize_reads can greatly reduce memory requirements. For this feature, please ensure there are no spaces in the sequence names or quality score names, and append "/1" and "/2" to the sequence names so that each name is unique for paired reads in the FASTA/FASTQ headers.
Line 48: Line 53:
 
* If a previous job left a trinity_out_dir directory behind, remove it before starting another Trinity job.
 
* If a previous job left a trinity_out_dir directory behind, remove it before starting another Trinity job.
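
The read-name requirement for --normalize_reads can be met with a small filter. The following is a minimal sketch, assuming uncompressed four-line FASTQ records; the function name tag_fastq_reads and the file names are hypothetical and not part of Trinity.

```shell
#!/bin/bash
# Hypothetical helper: keep only the first whitespace-delimited token of each
# FASTQ header and append the mate suffix (/1 or /2).
# Assumes plain four-line FASTQ records (no wrapped sequence lines).
tag_fastq_reads() {
    local suffix="$1"   # "1" for left reads, "2" for right reads
    awk -v s="$suffix" '
        NR % 4 == 1 {
            name = $1                   # drop everything after the first space
            sub(/\/[12]$/, "", name)    # do not double an existing suffix
            print name "/" s
            next
        }
        NR % 4 == 3 { print "+"; next } # bare separator, no repeated name
        { print }                       # sequence and quality lines unchanged
    '
}

# Example usage (file names are placeholders):
#   tag_fastq_reads 1 < reads_R1.fastq > reads_R1.tagged.fastq
#   tag_fastq_reads 2 < reads_R2.fastq > reads_R2.tagged.fastq
```

For gzipped input, the same function can be fed from zcat and its output piped to gzip.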
  
 
+
====Trinity v2.5.1, v2.8.4 and v2.8.5 Singularity Container on Sapelo2====
====Trinity 2.5.1, 2.8.4, and 2.8.5 are installed as a singularity container on Sapelo2====
 
  
 
On the Sapelo2 cluster, singularity containers have access to the user's home directory ($HOME), scratch directory (/scratch), lscratch directory (/lscratch), and /tmp directory (/tmp) inside the container.
 
On the Sapelo2 cluster, singularity containers have access to the user's home directory ($HOME), scratch directory (/scratch), lscratch directory (/lscratch), and /tmp directory (/tmp) inside the container.
Line 55: Line 59:
 
All environment variables set before executing the singularity command are available inside the container.
 
All environment variables set before executing the singularity command are available inside the container.
  
To run Trinity 2.8.4, sample command is as below:
+
To run Trinity v2.5.1, use a command of the following form:
 +
 
 +
<pre class="gcommand">
 +
singularity exec /apps/singularity-images/trinity-2.5.1.simg COMMAND
 +
</pre>
 +
 
 +
where COMMAND should be replaced by the command you want to use.
 +
 
 +
To run Trinity v2.8.4, use a command of the following form:
  
 
<pre class="gcommand">
 
<pre class="gcommand">
 
singularity exec /apps/singularity-images/trinity-2.8.4.simg COMMAND
 
singularity exec /apps/singularity-images/trinity-2.8.4.simg COMMAND
 
</pre>
 
</pre>
 +
 +
To run Trinity v2.8.5, use a command of the following form:
 +
 +
<pre class="gcommand">
 +
singularity exec /apps/singularity-images/trinity-2.8.5.simg COMMAND
 +
</pre>
 +
 
where COMMAND should be replaced by the command you want to use.
 
where COMMAND should be replaced by the command you want to use.
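
As an illustration of replacing COMMAND, the sketch below builds the full singularity command line for one of the containerized versions and prints it. The wrapper name trinity_simg_cmd is hypothetical; the image paths follow the pattern of the commands shown above.

```shell
#!/bin/bash
# Hypothetical wrapper: print the singularity command line for one of the
# containerized Trinity versions. It only prints the command; remove "echo"
# to execute it inside a job on a compute node.
trinity_simg_cmd() {
    local version="$1"
    shift
    case "$version" in
        2.5.1|2.8.4|2.8.5) ;;
        *)
            echo "unsupported container version: $version" >&2
            return 1
            ;;
    esac
    echo singularity exec "/apps/singularity-images/trinity-${version}.simg" "$@"
}

# COMMAND replaced by "Trinity --version":
trinity_simg_cmd 2.8.4 Trinity --version
```

Rejecting unknown versions early avoids a less obvious "image not found" failure later in the job.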
  
Example of a shell script sub.sh to run on the batch queue:  
+
Example of a shell script sub.sh to run Trinity v2.8.4 on the batch partition:  
  
 
<pre class="gscript">
 
<pre class="gscript">
Line 68: Line 87:
 
#SBATCH --job-name=j_Trinity # Job name (j_Trinity)
 
#SBATCH --job-name=j_Trinity # Job name (j_Trinity)
 
#SBATCH --partition=batch # Partition name (batch or highmem_p)
 
#SBATCH --partition=batch # Partition name (batch or highmem_p)
#SBATCH --ntasks=1 # Run job in single task, by default using 1 CPU core on a single node
+
#SBATCH --ntasks=1 # Run job in single task
#SBATCH --cpus-per-task=8 # CPU core count per task, by default 1 CPU core per task
+
#SBATCH --cpus-per-task=8 # CPU core count per task
#SBATCH --mem=100G # Memory per node (4GB); by default using M as unit
+
#SBATCH --mem=100G # Memory per node (100GB)
#SBATCH --time=1:00:00              # Time limit hrs:min:sec or days-hours:minutes:seconds
+
#SBATCH --time=48:00:00              # Time limit hrs:min:sec or days-hours:minutes:seconds
 
#SBATCH --export=NONE                  # Do not export any user’s explicit environment variables to compute node
 
#SBATCH --export=NONE                  # Do not export any user’s explicit environment variables to compute node
#SBATCH --output=%x_%j.out # Standard output log
+
#SBATCH --output=log.%j.out # Standard output log
#SBATCH --error=%x_%j.err # Standard error log
+
#SBATCH --error=log.%j.err # Standard error log
 +
 
 
#SBATCH --mail-user=username@uga.edu    # Where to send mail
 
#SBATCH --mail-user=username@uga.edu    # Where to send mail
#SBATCH --mail-type=END,FAIL         # Mail events (BEGIN, END, FAIL, ALL)
+
#SBATCH --mail-type=ALL         # Mail events (BEGIN, END, FAIL, ALL)
  
 
cd $SLURM_SUBMIT_DIR
 
cd $SLURM_SUBMIT_DIR
Line 83: Line 103:
 
</pre>  
 
</pre>  
  
Submit the job to the queue with
+
Example of running the Trinity script align_and_estimate_abundance.pl:
 +
 
 +
<pre class="gscript">
 +
#!/bin/bash
 +
#SBATCH --job-name=j_Trinity # Job name (j_Trinity)
 +
#SBATCH --partition=batch # Partition name (batch or highmem_p)
 +
#SBATCH --ntasks=1 # Run job in single task
 +
#SBATCH --cpus-per-task=1 # CPU core count per task
 +
#SBATCH --mem=20G # Memory per node (20GB)
 +
#SBATCH --time=48:00:00              # Time limit hrs:min:sec or days-hours:minutes:seconds
 +
#SBATCH --export=NONE                  # Do not export any user’s explicit environment variables to compute node
 +
#SBATCH --output=%x_%j.out # Standard output log
 +
#SBATCH --error=%x_%j.err # Standard error log
 +
 
 +
#SBATCH --mail-user=username@uga.edu    # Where to send mail
 +
#SBATCH --mail-type=ALL          # Mail events (BEGIN, END, FAIL, ALL)
 +
 
 +
cd $SLURM_SUBMIT_DIR
  
<pre  class="gcommand">
+
singularity exec /apps/singularity-images/trinity-2.8.4.simg /usr/local/bin/trinityrnaseq/util/align_and_estimate_abundance.pl [options]
qsub  ./sub.sh
 
 
</pre>
 
</pre>
  
Example to run script of rsem
+
Where [options] need to be added as appropriate. Other parameters of the job, such as the maximum wall clock time, maximum memory, the number of cores per node, and the job name, need to be modified appropriately as well.
<pre class="gscript">
+
 
#!/bin/bash
+
[[#top|Back to Top]]
 +
 
 +
====Trinity v2.15.1 Software Module on Sapelo2 ====
 +
 
 +
* Version 2.15.1, running with Python3, is installed at /apps/eb/Trinity/2.15.1-foss-2022a
  
#PBS -N j_s_rsem
+
To run Trinity v2.15.1, please load the module:
#PBS -q batch
 
#PBS -l nodes=1:ppn=1
 
#PBS -l walltime=480:00:00
 
#PBS -l mem=100gb
 
  
cd $PBS_O_WORKDIR
+
<pre class="gcommand">
singularity exec /usr/local/singularity-images/trinity-2.5.1--0.simg /usr/local/bin/trinityrnaseq/util/align_and_estimate_abundance.pl [-parameters]
+
module load Trinity/2.15.1-foss-2022a
 
</pre>
 
</pre>
  
====Trinity 2.6.6 is installed at /usr/local/apps/eb/Trinity/2.6.6-foss-2016b====
+
Example of a shell script sub.sh to run Trinity v2.15.1 on the batch partition:
  
Here is an example of a shell script sub.sh to run on at the batch queue:
 
 
<pre class="gscript">
 
<pre class="gscript">
#PBS -S /bin/bash
+
#!/bin/bash
#PBS -N j_trinity
+
#SBATCH --job-name=j_Trinity # Job name (j_Trinity)
#PBS -q highmem_q
+
#SBATCH --partition=batch # Partition name (batch or highmem_p)
#PBS -l nodes=1:ppn=16
+
#SBATCH --ntasks=1 # Run job in single task
#PBS -l walltime=480:00:00
+
#SBATCH --cpus-per-task=8 # CPU core count per task
#PBS -l mem=100gb
+
#SBATCH --mem=100G # Memory per node (100GB)
 +
#SBATCH --time=48:00:00             # Time limit hrs:min:sec or days-hours:minutes:seconds
 +
#SBATCH --export=NONE                  # Do not export any user’s explicit environment variables to compute node
 +
#SBATCH --output=log.%j.out # Standard output log
 +
#SBATCH --error=log.%j.err # Standard error log
 +
 
 +
#SBATCH --mail-user=username@uga.edu    # Where to send mail
 +
#SBATCH --mail-type=ALL          # Mail events (BEGIN, END, FAIL, ALL)
 +
 
 +
cd $SLURM_SUBMIT_DIR
  
cd $PBS_O_WORKDIR
+
module load Trinity/2.15.1-foss-2022a
module load Trinity/2.6.6-foss-2016b
 
  
Trinity --seqType <string> --max_memory 100G --CPU 8 --no_version_check --full_cleanup --normalize_reads   [options]
+
Trinity --seqType <string> --max_memory 100G --CPU 8 --no_version_check --full_cleanup --normalize_reads  
 
</pre>  
 
</pre>  
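
The --CPU and --max_memory values in the script above are typed by hand to match the Slurm header. A minimal sketch of deriving them from the allocation instead is shown below. SLURM_CPUS_PER_TASK and SLURM_MEM_PER_NODE (in MB) are set by Slurm when --cpus-per-task and --mem are requested; the function name trinity_resources, the fallback values, and the 10% memory headroom are assumptions rather than site policy.

```shell
#!/bin/bash
# Sketch: derive Trinity's --CPU and --max_memory from the Slurm allocation.
# The fallbacks (1 core, 4096 MB) and the 10% headroom are assumptions.
trinity_resources() {
    local cpus="${SLURM_CPUS_PER_TASK:-1}"
    local mem_mb="${SLURM_MEM_PER_NODE:-4096}"
    local max_cpus=24     # upper limit given in the general instructions

    if [ "$cpus" -gt "$max_cpus" ]; then
        cpus="$max_cpus"
    fi

    # Keep roughly 10% of the allocation free, then convert MB to whole GB.
    local mem_gb=$(( mem_mb * 9 / 10 / 1024 ))
    if [ "$mem_gb" -lt 1 ]; then
        mem_gb=1
    fi

    echo "--CPU ${cpus} --max_memory ${mem_gb}G"
}

# Inside a job script the result could be spliced into the command, e.g.:
#   Trinity --seqType fq --left reads_1.fq --right reads_2.fq $(trinity_resources)
```

Deriving the values this way keeps the Trinity command in step with the header when the requested resources are changed later.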
  
Example to run script of rsem
+
Example of running the Trinity script align_and_estimate_abundance.pl:
 +
 
 
<pre class="gscript">
 
<pre class="gscript">
 
#!/bin/bash
 
#!/bin/bash
 +
#SBATCH --job-name=j_Trinity # Job name (j_Trinity)
 +
#SBATCH --partition=batch # Partition name (batch or highmem_p)
 +
#SBATCH --ntasks=1 # Run job in single task
 +
#SBATCH --cpus-per-task=1 # CPU core count per task
 +
#SBATCH --mem=20G # Memory per node (20GB)
 +
#SBATCH --time=48:00:00              # Time limit hrs:min:sec or days-hours:minutes:seconds
 +
#SBATCH --export=NONE                  # Do not export any user’s explicit environment variables to compute node
 +
#SBATCH --output=%x_%j.out # Standard output log
 +
#SBATCH --error=%x_%j.err # Standard error log
  
#PBS -N j_rsem
+
#SBATCH --mail-user=username@uga.edu    # Where to send mail
#PBS -q batch
+
#SBATCH --mail-type=ALL          # Mail events (BEGIN, END, FAIL, ALL)
#PBS -l nodes=1:ppn=1
 
#PBS -l walltime=480:00:00
 
#PBS -l mem=100gb
 
  
cd $PBS_O_WORKDIR
+
cd $SLURM_SUBMIT_DIR
module load Trinity/2.6.6-foss-2016b
 
/usr/local/apps/eb/Trinity/2.6.6-foss-2016b/trinityrnaseq-Trinity-v2.6.6/util/align_and_estimate_abundance.pl  [-parameters]
 
</pre>
 
  
Where options need to be added as appropriate. Other parameters of the job, such as the maximum wall clock time, maximum memory, the number cores per node, and the job name need to be modified appropriately as well.  
+
module load Trinity/2.15.1-foss-2022a
  
Here is an example of job submission
+
${EBROOTTRINITY}/trinityrnaseq-v2.15.1/util/align_and_estimate_abundance.pl [options]
<pre  class="gcommand">
 
qsub  ./sub.sh
 
 
</pre>
 
</pre>
  
====Trinity 2.5.1 is installed as a singularity container on Sapelo2====
+
Where EBROOTTRINITY is the environment variable storing the Trinity installation path, i.e., /apps/eb/Trinity/2.15.1-foss-2022a; [options] need to be added as appropriate. Other parameters of the job, such as the maximum wall clock time, maximum memory, the number of cores per node, and the job name, need to be modified appropriately as well.
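
Because the versioned subdirectory under EBROOTTRINITY changes between releases, the utility script can also be located with a glob instead of a hard-coded directory name. This is a sketch only: the helper name find_trinity_util is hypothetical, and it assumes the trinityrnaseq-*/util layout seen in the installation paths on this page.

```shell
#!/bin/bash
# Sketch: locate a Trinity utility script under the module's install root.
# EBROOTTRINITY is set by "module load"; find_trinity_util is a hypothetical
# helper, not something shipped with Trinity.
find_trinity_util() {
    local script="$1"
    local root="${EBROOTTRINITY:-}"
    local match

    if [ -z "$root" ]; then
        echo "EBROOTTRINITY is not set; load the Trinity module first" >&2
        return 1
    fi

    for match in "$root"/trinityrnaseq-*/util/"$script"; do
        if [ -f "$match" ]; then
            echo "$match"
            return 0
        fi
    done

    echo "not found under $root: $script" >&2
    return 1
}

# Usage after "module load Trinity/2.15.1-foss-2022a":
#   "$(find_trinity_util align_and_estimate_abundance.pl)" [options]
```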
  
For information on Singularity please visit: http://singularity.lbl.gov/
+
[[#top|Back to Top]]
  
On the Sapelo2 cluster, singularity containers have access to the users home directory ($HOME), lustre1 directory (/lustre1), lscratch directory (/lscratch), /tmp directory (/tmp) inside the container.
+
===Utilizing /lscratch in Trinity Job Submission Script===
 +
* Utilizing /lscratch allows Trinity jobs to run much faster and smoother and also negates the effects of heavy IO traffic.
 +
* This /lscratch directory resides on the local hard drive of the compute node that your job gets allocated to (which means you cannot access this directory outside the job submission script).
 +
* Below is a sample job submission script including steps so you can see what you need to add to your job submission script in order to make your Trinity job utilize /lscratch.
 +
** In addition to the 6 steps below, please also add the Slurm header --gres=lscratch:___, which requests space in /lscratch. The default unit is GB; in the example submission script below, we request 200GB of space with the line '''''#SBATCH --gres=lscratch:200''''' (it is the last Slurm header). Please request only as much space in /lscratch as your job needs.
  
All environment variables set before executing singularity command is available inside the container.
+
<pre class="gscript">
 +
#!/bin/bash
 +
#SBATCH --job-name=j_Trinity # Job name (j_Trinity)
 +
#SBATCH --partition=batch # Partition name (batch or highmem_p)
 +
#SBATCH --ntasks=1 # Run job in single task
 +
#SBATCH --cpus-per-task=36 # CPU core count per task
 +
#SBATCH --mem=128G # Memory per node (128GB)
 +
#SBATCH --time=48:00:00              # Time limit hrs:min:sec or days-hours:minutes:seconds
 +
#SBATCH --export=NONE                  # Do not export any user’s explicit environment variables to compute node
 +
#SBATCH --output=log.%j.out # Standard output log
 +
#SBATCH --error=log.%j.err # Standard error log
 +
#SBATCH --mail-user=username@uga.edu    # Where to send mail
 +
#SBATCH --mail-type=ALL          # Mail events (BEGIN, END, FAIL, ALL)
 +
#SBATCH --gres=lscratch:200
  
To run Trinity 2.5.1, sample command is as below:
+
cd $SLURM_SUBMIT_DIR
 +
 +
# Step 1: create a directory in /lscratch
  
<pre class="gcommand">
+
mkdir -p /lscratch/${USER}/${SLURM_JOB_ID}/trinity_outputs
singularity exec /usr/local/singularity-images/trinity-2.5.1--0.simg COMMAND
 
</pre>
 
where COMMAND should be replaced by the command you want to use.
 
  
Example of a shell script sub.sh to run on the batch queue:
 
  
<pre class="gscript">
+
# Step 2: copy over any input files.
#!/bin/bash
 
  
#PBS -N j_s_trinity
+
cp file1.fastq.gz /lscratch/${USER}/${SLURM_JOB_ID}/trinity_outputs
#PBS -q highmem_q
+
cp file2.fastq.gz /lscratch/${USER}/${SLURM_JOB_ID}/trinity_outputs
#PBS -l nodes=1:ppn=16
+
cp file3.bam /lscratch/${USER}/${SLURM_JOB_ID}/trinity_outputs
#PBS -l walltime=480:00:00
 
#PBS -l mem=100gb
 
 
   
 
   
cd $PBS_O_WORKDIR
 
  
singularity exec /usr/local/singularity-images/trinity-2.5.1--0.simg Trinity --seqType <string> --max_memory 100G --CPU 8 --no_version_check --full_cleanup --normalize_reads   
+
# Step 3: change directories into /lscratch
</pre>
+
 
 +
cd /lscratch/${USER}/${SLURM_JOB_ID}/trinity_outputs
 +
 
 +
 
 +
# Step 4: your normal job lines (loading Trinity and running Trinity command)
 +
 
 +
module load Trinity/2.15.1-foss-2022a
 +
 
 +
Trinity --seqType fq --left 'file1.fastq.gz' --right 'file2.fastq.gz' --CPU 36 --max_memory 120G --output "/lscratch/${USER}/${SLURM_JOB_ID}/trinity_outputs/trinity"
  
Submit the job to the queue with
+
Trinity --genome_guided_bam 'file3.bam' --genome_guided_max_intron 10000 --CPU 36 --max_memory 120G --output "/lscratch/${USER}/${SLURM_JOB_ID}/trinity_outputs/trinity"
  
<pre  class="gcommand">
+
### NOTE: the directory specified in --output is the directory created in step 1 with the addition of /trinity at the end. This is because Trinity writes some files in the --output dir and some right above it.
qsub  ./sub.sh
 
</pre>
 
  
Example to run script of rsem
 
<pre class="gscript">
 
#!/bin/bash
 
  
#PBS -N j_s_rsem
+
# Step 5: copy output files back over to a certain location in /scratch which you can change below
#PBS -q batch
 
#PBS -l nodes=1:ppn=1
 
#PBS -l walltime=480:00:00
 
#PBS -l mem=100gb
 
  
cd $PBS_O_WORKDIR
+
cp -r /lscratch/${USER}/* /scratch/${USER}/some/directory
singularity exec /usr/local/singularity-images/trinity-2.5.1--0.simg align_and_estimate_abundance.pl [-parameters]
+
</pre>
 
  
====Trinity r20140717 is installed at /usr/local/apps/gb/trinity/r20140717====
+
# Step 6: clean up /lscratch directory **VERY IMPORTANT STEP**
  
=== Documentation ===
+
rm -rf /lscratch/${USER}/${SLURM_JOB_ID}
 +
</pre>
 
   
 
   
More details at [http://trinityrnaseq.github.io/ Trinity]
+
* Please feel free to submit a ticket to us if you would like further help, an explanation of how /lscratch works, or a review of your submission script to ensure it is using /lscratch correctly!
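
The cleanup in step 6 runs only if the script reaches its last line. One common way to make it run even when an earlier command fails is to register it as an EXIT trap right after the directory is created. The sketch below illustrates the idea under stated assumptions: run_with_cleanup and the LSCRATCH_BASE override are hypothetical names introduced here so the sketch can be tried outside a Slurm job, and a trap does not fire if the process is killed outright.

```shell
#!/bin/bash
# Sketch: create the per-job /lscratch directory, register its removal as an
# EXIT trap, then run the given command from inside it. The function body is
# a subshell, so the trap fires when that subshell exits, on success or failure.
run_with_cleanup() (
    set -e
    base="${LSCRATCH_BASE:-/lscratch}"     # hypothetical override for testing
    workdir="${base}/${USER:-unknown}/${SLURM_JOB_ID:-manual}"

    mkdir -p "${workdir}/trinity_outputs"  # step 1
    trap 'rm -rf "${workdir}"' EXIT        # step 6, registered early

    cd "${workdir}/trinity_outputs"        # step 3
    "$@"                                   # step 4: the actual work
)

# Try it with a temporary directory standing in for /lscratch:
#   LSCRATCH_BASE=$(mktemp -d) run_with_cleanup sh -c 'pwd; false'
```

Copying results back to /scratch (step 5) still has to happen as part of the command that is run, before the subshell exits and the directory is removed.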
 +
 
 +
===Job Submission===
  
<pre  class="gcommand">
+
Submit a job submission script (sub.sh) to Sapelo2:
[shtsai@n204 ~]$ singularity exec /usr/local/singularity-images/trinity-2.5.1--0.simg Trinity --show_full_usage_info
 
perl: warning: Setting locale failed.
 
perl: warning: Please check that your locale settings:
 
LANGUAGE = (unset),
 
LC_ALL = (unset),
 
LANG = "en_US.UTF-8"
 
    are supported and installed on your system.
 
perl: warning: Falling back to the standard locale ("C").
 
BusyBox v1.22.1 (2014-05-23 01:24:27 UTC) multi-call binary.
 
  
Usage: sort [-nrugMcszbdfimSTokt] [-o FILE] [-k start[.offset][opts][,end[.offset][opts]] [-t CHAR] [FILE]...
+
<pre class="gcommand">
 +
sbatch  sub.sh
 +
</pre>
  
Sort lines of text
+
[[#top|Back to Top]]
  
-b Ignore leading blanks
+
===Documentation ===
-c Check whether input is sorted
+
-d Dictionary order (blank or alphanumeric only)
+
More details at [http://trinityrnaseq.github.io/ Trinity]
-f Ignore case
 
-g General numerical sort
 
-i Ignore unprintable characters
 
-k Sort key
 
-M Sort month
 
-n Sort numbers
 
-o Output to file
 
-k Sort by key
 
-t CHAR Key separator
 
-r Reverse sort order
 
-s Stable (don't sort ties alphabetically)
 
-u Suppress duplicate lines
 
-z Lines are terminated by NUL, not newline
 
-mST Ignored for GNU compatibility
 
  
 +
<pre class="gcommand">
 +
[cft07037@b1-24 ~]$ ml Trinity/2.15.1-foss-2022a
 +
To execute picard run: java -jar $EBROOTPICARD/picard.jar
 +
[cft07037@b1-24 ~]$ Trinity --show_full_usage_info
  
  
Line 245: Line 285:
 
       |  |  |  .  \ |  | |  |  | |  |  |  |  |    |
 
       |  |  |  .  \ |  | |  |  | |  |  |  |  |    |
 
       |__|  |__|\_||____||__|__||____|  |__|  |____/
 
       |__|  |__|\_||____||__|__||____|  |__|  |____/
 +
 +
    Trinity-v2.15.1
 +
  
 
#
 
#
Line 282: Line 325:
 
#  --CPU <int>                    :number of CPUs to use, default: 2
 
#  --CPU <int>                    :number of CPUs to use, default: 2
 
#  --min_contig_length <int>      :minimum assembled contig length to report
 
#  --min_contig_length <int>      :minimum assembled contig length to report
#                                  (def=200)
+
#                                  (def=200, must be >= 100)
 
#
 
#
 
#  --long_reads <string>          :fasta file containing error-corrected or circular consensus (CCS) pac bio reads
 
#  --long_reads <string>          :fasta file containing error-corrected or circular consensus (CCS) pac bio reads
Line 289: Line 332:
 
#  --genome_guided_bam <string>    :genome guided mode, provide path to coordinate-sorted bam file.
 
#  --genome_guided_bam <string>    :genome guided mode, provide path to coordinate-sorted bam file.
 
#                                  (see genome-guided param section under --show_full_usage_info)
 
#                                  (see genome-guided param section under --show_full_usage_info)
 +
#
 +
#  --long_reads_bam <string>      :long reads to include for genome-guided Trinity
 +
#                                  (bam file consists of error-corrected or circular consensus (CCS) pac bio read aligned to the genome)
 
#
 
#
 
#  --jaccard_clip                  :option, set if you have paired reads and
 
#  --jaccard_clip                  :option, set if you have paired reads and
Line 301: Line 347:
 
#  --trimmomatic                  :run Trimmomatic to quality trim reads
 
#  --trimmomatic                  :run Trimmomatic to quality trim reads
 
#                                        see '--quality_trimming_params' under full usage info for tailored settings.
 
#                                        see '--quality_trimming_params' under full usage info for tailored settings.
#                                 
 
#
 
#  --no_normalize_reads            :Do *not* run in silico normalization of reads. Defaults to max. read coverage of 50.
 
#                                      see '--normalize_max_read_cov' under full usage info for tailored settings.
 
#                                      (note, as of Sept 21, 2016, normalization is on by default)
 
#   
 
#  --no_distributed_trinity_exec  :do not run Trinity phase 2 (assembly of partitioned reads), and stop after generating command list.
 
#
 
 
#
 
#
 
#  --output <string>              :name of directory for output (will be
 
#  --output <string>              :name of directory for output (will be
 
#                                  created if it doesn't already exist)
 
#                                  created if it doesn't already exist)
#                                  default( your current working directory: "/home/shtsai/trinity_out_dir"  
+
#                                  default( your current working directory: "/home/cft07037/trinity_out_dir"  
 
#                                    note: must include 'trinity' in the name as a safety precaution! )
 
#                                    note: must include 'trinity' in the name as a safety precaution! )
#           
 
#  --workdir <string>              :where Trinity phase-2 assembly computation takes place (defaults to --output setting).
 
#                                  (can set this to a node-local drive or RAM disk)   
 
 
#   
 
#   
 
#  --full_cleanup                  :only retain the Trinity fasta file, rename as ${output_dir}.Trinity.fasta
 
#  --full_cleanup                  :only retain the Trinity fasta file, rename as ${output_dir}.Trinity.fasta
Line 324: Line 359:
 
#  --verbose                      :provide additional job status info during the run.
 
#  --verbose                      :provide additional job status info during the run.
 
#
 
#
#  --version                      :reports Trinity version (Trinity-v2.5.1) and exits.
+
#  --version                      :reports Trinity version (Trinity-v2.15.1) and exits.
 
#
 
#
 
#  --show_full_usage_info          :show the many many more options available for running Trinity (expert usage).
 
#  --show_full_usage_info          :show the many many more options available for running Trinity (expert usage).
  
 
#
 
#
#  --KMER_SIZE <int>              :kmer length to use (default: 25)  max=32
+
#  --no_super_reads                :turn off super-reads mode
 
#
 
#
 
#  --prep                          :Only prepare files (high I/O usage) and stop before kmer counting.
 
#  --prep                          :Only prepare files (high I/O usage) and stop before kmer counting.
Line 336: Line 371:
 
#
 
#
 
#  --no_version_check              :dont run a network check to determine if software updates are available.
 
#  --no_version_check              :dont run a network check to determine if software updates are available.
 +
#
 +
#  --no_symlink                    :dont symlink, just copy files instead (sets env var NO_SYMLINK=TRUE)
 
#
 
#
 
#  --monitoring                    :use collectl to monitor all steps of Trinity
 
#  --monitoring                    :use collectl to monitor all steps of Trinity
 
#    --monitor_sec <int>          : number of seconds for each interval of runtime monitoring (default: 60)
 
#    --monitor_sec <int>          : number of seconds for each interval of runtime monitoring (default: 60)
 
#   
 
#   
 +
#  --no_distributed_trinity_exec  :do not run Trinity phase 2 (assembly of partitioned reads), and stop after generating command list.
 +
#
 +
#  --workdir <string>              :where Trinity phase-2 assembly computation takes place (defaults to --output setting).
 +
#                                  (can set this to a node-local drive or RAM disk)   
 +
#
 
####################################################
 
####################################################
 
# Inchworm and K-mer counting-related options: #####
 
# Inchworm and K-mer counting-related options: #####
Line 356: Line 398:
 
#  --min_glue <int>              :min number of reads needed to glue two inchworm contigs
 
#  --min_glue <int>              :min number of reads needed to glue two inchworm contigs
 
#                                  together. (default: 2)  
 
#                                  together. (default: 2)  
 +
#
 +
#  --max_chrysalis_cluster_size <int>  :max number of Inchworm contigs to be included in a single Chrysalis cluster. (default: 25)
 
#
 
#
 
#  --no_bowtie                    :dont run bowtie to use pair info in chrysalis clustering.
 
#  --no_bowtie                    :dont run bowtie to use pair info in chrysalis clustering.
Line 363: Line 407:
 
#####################################
 
#####################################
 
###  Butterfly-related options:  ####
 
###  Butterfly-related options:  ####
 +
#
 +
#  --bfly_algorithm <string>      : assembly algorithm to use. Options: ORIGINAL PASAFLY
 
#
 
#
 
#  --bfly_opts <string>            :additional parameters to pass through to butterfly
 
#  --bfly_opts <string>            :additional parameters to pass through to butterfly
Line 389: Line 435:
 
#  By default, alternative transcript candidates are merged (in reality, discarded) if they are found to be too similar, according to the following logic:
 
#  By default, alternative transcript candidates are merged (in reality, discarded) if they are found to be too similar, according to the following logic:
 
#
 
#
#  (identity=(numberOfMatches/shorterLen) > 95.0% or if we have <= 2 mismatches) and if we have internal gap lengths <= 10
+
#  (identity=(numberOfMatches/shorterLen) > 98.0% or if we have <= 2 mismatches) and if we have internal gap lengths <= 10
 
#
 
#
 
#  with parameters as:
 
#  with parameters as:
Line 405: Line 451:
 
#
 
#
 
#  --bflyHeapSpaceMax <string>    :java max heap space setting for butterfly
 
#  --bflyHeapSpaceMax <string>    :java max heap space setting for butterfly
#                                  (default: 4G) => yields command
+
#                                  (default: 10G) => yields command
#                  'java -Xmx4G -jar Butterfly.jar ... $bfly_opts'
+
#                  'java -Xmx10G -jar Butterfly.jar ... $bfly_opts'
 
#  --bflyHeapSpaceInit <string>    :java initial heap space settings for
 
#  --bflyHeapSpaceInit <string>    :java initial heap space settings for
 
#                                  butterfly (default: 1G) => yields command
 
#                                  butterfly (default: 1G) => yields command
Line 424: Line 470:
 
#### Quality Trimming Options ####   
 
#### Quality Trimming Options ####   
 
#  
 
#  
#  --quality_trimming_params <string>  defaults to: "ILLUMINACLIP:/usr/local/share/trimmomatic/adapters/TruSeq3-PE.fa:2:30:10 SLIDINGWINDOW:4:5 LEADING:5 TRAILING:5 MINLEN:25"
+
#  --quality_trimming_params <string>  defaults to: "ILLUMINACLIP:/apps/eb/Trinity/2.15.1-foss-2022a/trinityrnaseq-v2.15.1/trinity-plugins/Trimmomatic/adapters/TruSeq3-PE.fa:2:30:10 SLIDINGWINDOW:4:5 LEADING:5 TRAILING:5 MINLEN:25"
 
#
 
#
 
################################################################################
 
################################################################################
 
####  In silico Read Normalization Options ###
 
####  In silico Read Normalization Options ###
 
#
 
#
#  --normalize_max_read_cov <int>      defaults to 50
+
#  --normalize_max_read_cov <int>      defaults to 200
 
#  --normalize_by_read_set              run normalization separate for each pair of fastq files,
 
#  --normalize_by_read_set              run normalization separate for each pair of fastq files,
 
#                                      then one final normalization that combines the individual normalized reads.
 
#                                      then one final normalization that combines the individual normalized reads.
 
#                                      Consider using this if RAM limitations are a consideration.
 
#                                      Consider using this if RAM limitations are a consideration.
 
#
 
#
################################################################################
+
# --just_normalize_reads              stop after performing read normalization
 +
#
 +
#  --no_normalize_reads            :Do *not* run in silico normalization of reads. Defaults to max. read coverage of 200.
 +
#                                      see '--normalize_max_read_cov' under full usage info for tailored settings.
 +
#                                      (Note, as of Sept 21, 2016, normalization is on by default)
 +
#                                      (*Turning off normalization is not recommended for most applications)
 +
#   
 +
#  --no_parallel_norm_stats            :Do not try to run the high-mem normalization stats generator in parallel for paired-end fastqs.
 +
#
 +
###############################################################################
 
#### Genome-guided de novo assembly
 
#### Genome-guided de novo assembly
 
#  
 
#  
Line 465: Line 520:
 
#              Trinity Phase 2 (assembly of read clusters)
 
#              Trinity Phase 2 (assembly of read clusters)
 
#
 
#
 +
#  --FORCE                              ignore failed commands from earlier run, continue on.
 +
#                                          (Note, this should only be used after you've
 +
#                                          already dealt with these failed commands directly as needed)
 +
#
 +
########################################################################
 +
# Singularity-related options
 +
#
 +
# --singularity_img <string>        :path to a Trinity singularity image to use
 +
#
 +
# --singularity_extra_params <string>  :additional parameters to include for the singularity command execution
 +
#
 +
#
 +
 
     #
 
     #
 
#
 
#
Line 473: Line 541:
 
#        Trinity --seqType fq --max_memory 50G --left reads_1.fq  --right reads_2.fq --CPU 6
 
#        Trinity --seqType fq --max_memory 50G --left reads_1.fq  --right reads_2.fq --CPU 6
 
#
 
#
 +
#            (if you have multiple samples, use --samples_file ... see above for details)
 
#
 
#
#    and for Genome-guided Trinity:
+
#    and for Genome-guided Trinity, provide a coordinate-sorted bam:
 
#
 
#
 
#        Trinity --genome_guided_bam rnaseq_alignments.csorted.bam --max_memory 50G
 
#        Trinity --genome_guided_bam rnaseq_alignments.csorted.bam --max_memory 50G
 
#                --genome_guided_max_intron 10000 --CPU 6
 
#                --genome_guided_max_intron 10000 --CPU 6
 
#
 
#
#    see: /usr/local/opt/trinity-2.5.1/sample_data/test_Trinity_Assembly/
+
#    see: /apps/eb/Trinity/2.15.1-foss-2022a/trinityrnaseq-v2.15.1/sample_data/test_Trinity_Assembly/
 
#          for sample data and 'runMe.sh' for example Trinity execution
 
#          for sample data and 'runMe.sh' for example Trinity execution
 
#
 
#
Line 485: Line 554:
 
#
 
#
 
###############################################################################
 
###############################################################################
 
 
</pre>
 
</pre>
 
[[#top|Back to Top]]
 
[[#top|Back to Top]]
  
=== Installation ===
+
=== Installation===
 
   
 
   
Installed as a singularity image: /usr/local/singularity-images/trinity-2.5.1--0.simg
+
Sources are downloaded from [https://trinityrnaseq.github.io Trinity]
+
 
=== System ===
+
===System===
 
64-bit Linux
 
64-bit Linux
 +
 +
[[#top|Back to Top]]

Latest revision as of 14:23, 10 January 2024

Category

Bioinformatics

Program On

Sapelo2

Version

2.5.1, 2.8.4, 2.8.5, 2.15.1

Author / Distributor

Trinity is now published online at Nature Biotechnology. The Broad Institute’s blog has a story on how the Trinity project came together.

More details at Trinity

Description

From Trinity:

"Trinity, developed at the Broad Institute, represents a novel method for the efficient and robust de novo reconstruction of transcriptomes from RNA-Seq data. Trinity combines three independent software modules: Inchworm, Chrysalis, and Butterfly, applied sequentially to process large volumes of RNA-Seq reads. Trinity partitions the sequence data into many individual de Bruijn graphs, each representing the transcriptional complexity at a given gene or locus, and then processes each graph independently to extract full-length splicing isoforms and to tease apart transcripts derived from paralogous genes. Briefly, the process works like so:

Inchworm assembles the RNA-Seq data into the unique sequences of transcripts, often generating full-length transcripts for a dominant isoform, but then reports just the unique portions of alternatively spliced transcripts.

Chrysalis clusters the Inchworm contigs into clusters and constructs complete de Bruijn graphs for each cluster. Each cluster represents the full transcriptional complexity for a given gene (or sets of genes that share sequences in common). Chrysalis then partitions the full read set among these disjoint graphs.

Butterfly then processes the individual graphs in parallel, tracing the paths that reads and pairs of reads take within the graph, ultimately reporting full-length transcripts for alternatively spliced isoforms, and teasing apart transcripts that correspond to paralogous genes."

Back to Top

Running Program

General Instructions

Also refer to Running Jobs on Sapelo2

Trinity v2.5.1, v2.8.4, v2.8.5, and v2.15.1 are installed on Sapelo2.

  • Trinity usually needs to run on the large-memory partition, highmem_p; set the partition line in the sample scripts below accordingly.
  • Trinity runs best when it uses /lscratch; see the section Utilizing /lscratch in Trinity Job Submission Script below for how to configure your job submission script.
  • For memory estimates, see the Trinity performance page. As a reference point, a mouse data set of about 4 billion bases uses about 50 GB of memory at peak.
  • Do not ask for more than 24 CPU cores on the Trinity command line (--CPU), and make the number of cores requested from the queue match it. For example, in the sample scripts below the command asks for 8 CPU cores (--CPU 8) and the header asks for --cpus-per-task=8.
  • Using --normalize_reads can greatly reduce memory requirements. For this feature, make sure there are no spaces in the sequence names and quality score names, and append "/1" and "/2" to the sequence names so that each name in a read pair is unique in the FASTA/FASTQ headers.
  • Please use the --full_cleanup option to make sure Trinity cleans up after itself. This helps a lot in keeping the number of files on Lustre storage under control.
  • If a previous job left a trinity_out_dir directory behind, remove it before starting another Trinity job.
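The read-name requirement for normalization can be met with a small filter. The following is a minimal sketch, assuming plain 4-line FASTQ records on standard input; tag_reads is a hypothetical helper name, not a tool shipped with Trinity or installed on Sapelo2.

```shell
#!/bin/bash
# Sketch: normalize FASTQ read names before running Trinity.
# Assumes plain 4-line FASTQ records on stdin; tag_reads is a hypothetical helper name.
tag_reads() {
    local mate="$1"   # 1 for left reads, 2 for right reads
    awk -v mate="$mate" '
        NR % 4 == 1 {                 # header line: "@name optional description"
            split($0, f, " ")         # keep only the name before the first space
            sub(/\/[12]$/, "", f[1])  # drop an existing /1 or /2 so it is not doubled
            print f[1] "/" mate
            next
        }
        NR % 4 == 3 { print "+"; next }   # bare "+" leaves no spaces on the quality header
        { print }                          # sequence and quality lines pass through
    '
}
```

For gzipped input, something like zcat reads_1.fq.gz | tag_reads 1 > reads_1.tagged.fq would work, where the file names are placeholders.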

Trinity v2.5.1, v2.8.4 and v2.8.5 Singularity Container on Sapelo2

On the Sapelo2 cluster, Singularity containers have access to the user's home directory ($HOME), scratch directory (/scratch), local scratch directory (/lscratch), and /tmp directory from inside the container.

All environment variables set before executing the singularity command are available inside the container.

To run Trinity v2.5.1, use a command of the following form:

singularity exec /apps/singularity-images/trinity-2.5.1.simg COMMAND

where COMMAND should be replaced by the command you want to use.

To run Trinity v2.8.4, use a command of the following form:

singularity exec /apps/singularity-images/trinity-2.8.4.simg COMMAND

To run Trinity v2.8.5, use a command of the following form:

singularity exec /apps/singularity-images/trinity-2.8.5.simg COMMAND

where COMMAND should be replaced by the command you want to use.
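Because the three container commands differ only in the version number, the image path can be derived from the version. This is a minimal sketch using the image paths listed above; trinity_image is a hypothetical helper, not a command provided on Sapelo2.

```shell
#!/bin/bash
# Sketch: map a Trinity version to its container image path as listed above.
# trinity_image is a hypothetical helper, not a command provided on Sapelo2.
trinity_image() {
    case "$1" in
        2.5.1|2.8.4|2.8.5)
            echo "/apps/singularity-images/trinity-$1.simg" ;;
        *)
            # Any other version has no container image listed on this page.
            echo "no container image listed for Trinity version '$1'" >&2
            return 1 ;;
    esac
}
```

A job script could then call singularity exec "$(trinity_image 2.8.4)" COMMAND, with COMMAND replaced as described above.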

Example of a shell script sub.sh to run Trinity v2.8.4 on the batch partition:

#!/bin/bash
#SBATCH --job-name=j_Trinity		# Job name (j_Trinity)
#SBATCH --partition=batch		# Partition name (batch or highmem_p)
#SBATCH --ntasks=1			# Run job in single task
#SBATCH --cpus-per-task=8	 	# CPU core count per task
#SBATCH --mem=100G			# Memory per node (100GB)
#SBATCH --time=48:00:00              	# Time limit hrs:min:sec or days-hours:minutes:seconds
#SBATCH --export=NONE                   # Do not export any user’s explicit environment variables to compute node
#SBATCH --output=log.%j.out		# Standard output log
#SBATCH --error=log.%j.err		# Standard error log

#SBATCH --mail-user=username@uga.edu    # Where to send mail
#SBATCH --mail-type=ALL          	# Mail events (BEGIN, END, FAIL, ALL)

cd $SLURM_SUBMIT_DIR

singularity exec /apps/singularity-images/trinity-2.8.4.simg Trinity --seqType <string> --max_memory 100G --CPU 8 --no_version_check --full_cleanup --normalize_reads    

Example to run Trinity script align_and_estimate_abundance.pl:

#!/bin/bash
#SBATCH --job-name=j_Trinity		# Job name (j_Trinity)
#SBATCH --partition=batch		# Partition name (batch or highmem_p)
#SBATCH --ntasks=1			# Run job in single task
#SBATCH --cpus-per-task=1	 	# CPU core count per task
#SBATCH --mem=20G			# Memory per node (20GB)
#SBATCH --time=48:00:00              	# Time limit hrs:min:sec or days-hours:minutes:seconds
#SBATCH --export=NONE                   # Do not export any user’s explicit environment variables to compute node
#SBATCH --output=%x_%j.out		# Standard output log
#SBATCH --error=%x_%j.err		# Standard error log

#SBATCH --mail-user=username@uga.edu    # Where to send mail
#SBATCH --mail-type=ALL          	# Mail events (BEGIN, END, FAIL, ALL)

cd $SLURM_SUBMIT_DIR

singularity exec /apps/singularity-images/trinity-2.8.4.simg /usr/local/bin/trinityrnaseq/util/align_and_estimate_abundance.pl [options]

Here [options] must be filled in as appropriate. Other job parameters, such as the maximum wall clock time, maximum memory, number of cores per node, and job name, need to be modified appropriately as well.
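When these job parameters change, the --CPU and --max_memory values on the Trinity command line have to change with them. One way to keep them in sync is to derive them from the Slurm allocation. The sketch below assumes Slurm sets SLURM_CPUS_PER_TASK (from --cpus-per-task) and SLURM_MEM_PER_NODE (from --mem, in MB) inside the job; trinity_resource_args is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: build --CPU and --max_memory from the Slurm allocation.
# Assumes SLURM_CPUS_PER_TASK and SLURM_MEM_PER_NODE (in MB) are set by Slurm;
# trinity_resource_args is a hypothetical helper name.
trinity_resource_args() {
    local cpus="${SLURM_CPUS_PER_TASK:-1}"
    local mem_mb="${SLURM_MEM_PER_NODE:-1024}"
    # Follow the general instructions: never pass more than 24 cores to Trinity.
    if [ "$cpus" -gt 24 ]; then
        cpus=24
    fi
    # Convert MB to whole GB for Trinity's --max_memory format (e.g. 100G).
    echo "--CPU ${cpus} --max_memory $(( mem_mb / 1024 ))G"
}
```

In a job script the result would be used unquoted so that it splits into separate arguments, for example Trinity --seqType fq --left reads_1.fq --right reads_2.fq $(trinity_resource_args) --full_cleanup, where the read file names are placeholders.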

Back to Top

Trinity v2.15.1 Software Module on Sapelo2

  • Version 2.15.1, which runs with Python 3, is installed at /apps/eb/Trinity/2.15.1-foss-2022a

To run Trinity v2.15.1, please load the module:

module load Trinity/2.15.1-foss-2022a 

Example of a shell script sub.sh to run Trinity v2.15.1 on the batch partition:

#!/bin/bash
#SBATCH --job-name=j_Trinity		# Job name (j_Trinity)
#SBATCH --partition=batch		# Partition name (batch or highmem_p)
#SBATCH --ntasks=1			# Run job in single task
#SBATCH --cpus-per-task=8	 	# CPU core count per task
#SBATCH --mem=100G			# Memory per node (100GB)
#SBATCH --time=48:00:00              	# Time limit hrs:min:sec or days-hours:minutes:seconds
#SBATCH --export=NONE                   # Do not export any user’s explicit environment variables to compute node
#SBATCH --output=log.%j.out		# Standard output log
#SBATCH --error=log.%j.err		# Standard error log

#SBATCH --mail-user=username@uga.edu    # Where to send mail
#SBATCH --mail-type=ALL          	# Mail events (BEGIN, END, FAIL, ALL)

cd $SLURM_SUBMIT_DIR

module load Trinity/2.15.1-foss-2022a 

Trinity --seqType <string> --max_memory 100G --CPU 8 --no_version_check --full_cleanup --normalize_reads    

Example to run Trinity script align_and_estimate_abundance.pl:

#!/bin/bash
#SBATCH --job-name=j_Trinity		# Job name (j_Trinity)
#SBATCH --partition=batch		# Partition name (batch or highmem_p)
#SBATCH --ntasks=1			# Run job in single task
#SBATCH --cpus-per-task=1	 	# CPU core count per task
#SBATCH --mem=20G			# Memory per node (20GB)
#SBATCH --time=48:00:00              	# Time limit hrs:min:sec or days-hours:minutes:seconds
#SBATCH --export=NONE                   # Do not export any user’s explicit environment variables to compute node
#SBATCH --output=%x_%j.out		# Standard output log
#SBATCH --error=%x_%j.err		# Standard error log

#SBATCH --mail-user=username@uga.edu    # Where to send mail
#SBATCH --mail-type=ALL          	# Mail events (BEGIN, END, FAIL, ALL)

cd $SLURM_SUBMIT_DIR

module load Trinity/2.15.1-foss-2022a 

${EBROOTTRINITY}/trinityrnaseq-v2.15.1/util/align_and_estimate_abundance.pl [options]

Here EBROOTTRINITY is the environment variable that stores the Trinity installation path, i.e., /apps/eb/Trinity/2.15.1-foss-2022a, and [options] must be filled in as appropriate. Other job parameters, such as the maximum wall clock time, maximum memory, number of cores per node, and job name, need to be modified appropriately as well.
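The utility script sits in a versioned directory under the installation path, so a hard-coded directory name goes stale when the module version changes. The following sketch looks the script up instead. It assumes the module sets EBROOTTRINITY and that the layout is $EBROOTTRINITY/trinityrnaseq-v<version>/util/; find_trinity_util is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: locate a Trinity utility script under the module's install root
# instead of hard-coding the trinityrnaseq-v<version> directory name.
# Assumes EBROOTTRINITY is set by the module; find_trinity_util is hypothetical.
find_trinity_util() {
    local script="$1"
    local match
    # Depth 3 = install root / trinityrnaseq-v<version> / util / <script>
    match=$(find "${EBROOTTRINITY:?load the Trinity module first}" \
                 -maxdepth 3 -path "*/util/${script}" -type f | head -n 1)
    [ -n "$match" ] || { echo "cannot find ${script} under ${EBROOTTRINITY}" >&2; return 1; }
    echo "$match"
}
```

After module load, a job script could run "$(find_trinity_util align_and_estimate_abundance.pl)" [options].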

Back to Top

Utilizing /lscratch in Trinity Job Submission Script

  • Utilizing /lscratch allows Trinity jobs to run much faster and smoother and also negates the effects of heavy IO traffic.
  • This /lscratch directory resides on the local hard drive of the compute node that your job gets allocated to (which means you cannot access this directory outside the job submission script).
  • Below is a sample job submission script including steps so you can see what you need to add to your job submission script in order to make your Trinity job utilize /lscratch.
    • In addition to the 6 steps below, add the Slurm header --gres=lscratch:___, which requests space in /lscratch. The default unit is GB; the example submission script below requests 200 GB with the line #SBATCH --gres=lscratch:200 (the last Slurm header). Please request only as much space in /lscratch as your job needs.
#!/bin/bash
#SBATCH --job-name=j_Trinity		# Job name (j_Trinity)
#SBATCH --partition=batch		# Partition name (batch or highmem_p)
#SBATCH --ntasks=1			# Run job in single task
#SBATCH --cpus-per-task=36	 	# CPU core count per task
#SBATCH --mem=128G			# Memory per node (128GB)
#SBATCH --time=48:00:00              	# Time limit hrs:min:sec or days-hours:minutes:seconds
#SBATCH --export=NONE                   # Do not export any user’s explicit environment variables to compute node
#SBATCH --output=log.%j.out		# Standard output log
#SBATCH --error=log.%j.err		# Standard error log
#SBATCH --mail-user=username@uga.edu    # Where to send mail
#SBATCH --mail-type=ALL          	# Mail events (BEGIN, END, FAIL, ALL)
#SBATCH --gres=lscratch:200

cd $SLURM_SUBMIT_DIR
 
# Step 1: create a directory in /lscratch

mkdir -p /lscratch/${USER}/${SLURM_JOB_ID}/trinity_outputs


# Step 2: copy over any input files. 

cp file1.fastq.gz /lscratch/${USER}/${SLURM_JOB_ID}/trinity_outputs
cp file2.fastq.gz /lscratch/${USER}/${SLURM_JOB_ID}/trinity_outputs
cp file3.bam /lscratch/${USER}/${SLURM_JOB_ID}/trinity_outputs
 

# Step 3: change directories into /lscratch

cd /lscratch/${USER}/${SLURM_JOB_ID}/trinity_outputs


# Step 4: your normal job lines (loading Trinity and running Trinity command)

module load Trinity/2.15.1-foss-2022a 

# De novo assembly from paired-end reads (double quotes so ${USER} and ${SLURM_JOB_ID} are expanded):
Trinity --seqType fq --left 'file1.fastq.gz' --right 'file2.fastq.gz' --CPU 36 --max_memory 120G --output "/lscratch/${USER}/${SLURM_JOB_ID}/trinity_outputs/trinity"

# Genome-guided assembly from a coordinate-sorted BAM:
Trinity --genome_guided_bam 'file3.bam' --genome_guided_max_intron 10000 --CPU 36 --max_memory 120G --output "/lscratch/${USER}/${SLURM_JOB_ID}/trinity_outputs/trinity"

### NOTE: the directory specified in --output is the directory created in step 1 with the addition of /trinity at the end. This is because Trinity writes some files in the --output dir and some right above it.


# Step 5: copy output files back over to a certain location in /scratch which you can change below

cp -r /lscratch/${USER}/* /scratch/${USER}/some/directory
 

# Step 6: clean up /lscratch directory **VERY IMPORTANT STEP**

rm -rf /lscratch/${USER}/${SLURM_JOB_ID}
  • Please feel free to submit a ticket to us if you would like further help, an explanation of how /lscratch works, or a review of your submission script to make sure it uses /lscratch correctly!
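Because the cleanup in Step 6 is skipped if the script stops early, one option is to register the cleanup when the directory is created. This is a minimal sketch, not the procedure prescribed on this page: setup_lscratch, JOB_DIR, WORK_DIR, and LSCRATCH_BASE are hypothetical names, and LSCRATCH_BASE exists only so the function can be tried outside a compute node.

```shell
#!/bin/bash
# Sketch: create the per-job /lscratch directory (Step 1) and register its
# removal (Step 6) so it also runs when the script exits early.
# setup_lscratch, JOB_DIR, WORK_DIR and LSCRATCH_BASE are hypothetical names.
setup_lscratch() {
    local base="${LSCRATCH_BASE:-/lscratch}"
    JOB_DIR="${base}/${USER}/${SLURM_JOB_ID}"
    WORK_DIR="${JOB_DIR}/trinity_outputs"
    mkdir -p "$WORK_DIR"
    # The EXIT trap fires when the job script ends, whether or not Trinity succeeded.
    trap 'rm -rf "$JOB_DIR"' EXIT
}
```

With this approach, the copy back to /scratch (Step 5) still has to happen before the script exits, since the trap removes the directory afterwards.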

Job Submission

Submit a job submission script (sub.sh) to Sapelo2:

sbatch  sub.sh

Back to Top

Documentation

More details at Trinity

[cft07037@b1-24 ~]$ ml Trinity/2.15.1-foss-2022a 
To execute picard run: java -jar $EBROOTPICARD/picard.jar
[cft07037@b1-24 ~]$ Trinity --show_full_usage_info



###############################################################################
#

     ______  ____   ____  ____   ____  ______  __ __
    |      ||    \ |    ||    \ |    ||      ||  |  |
    |      ||  D  ) |  | |  _  | |  | |      ||  |  |
    |_|  |_||    /  |  | |  |  | |  | |_|  |_||  ~  |
      |  |  |    \  |  | |  |  | |  |   |  |  |___, |
      |  |  |  .  \ |  | |  |  | |  |   |  |  |     |
      |__|  |__|\_||____||__|__||____|  |__|  |____/

    Trinity-v2.15.1


#
#
# Required:
#
#  --seqType <string>      :type of reads: ('fa' or 'fq')
#
#  --max_memory <string>      :suggested max memory to use by Trinity where limiting can be enabled. (jellyfish, sorting, etc)
#                            provided in Gb of RAM, ie.  '--max_memory 10G'
#
#  If paired reads:
#      --left  <string>    :left reads, one or more file names (separated by commas, no spaces)
#      --right <string>    :right reads, one or more file names (separated by commas, no spaces)
#
#  Or, if unpaired reads:
#      --single <string>   :single reads, one or more file names, comma-delimited (note, if single file contains pairs, can use flag: --run_as_paired )
#
#  Or,
#      --samples_file <string>         tab-delimited text file indicating biological replicate relationships.
#                                   ex.
#                                        cond_A    cond_A_rep1    A_rep1_left.fq    A_rep1_right.fq
#                                        cond_A    cond_A_rep2    A_rep2_left.fq    A_rep2_right.fq
#                                        cond_B    cond_B_rep1    B_rep1_left.fq    B_rep1_right.fq
#                                        cond_B    cond_B_rep2    B_rep2_left.fq    B_rep2_right.fq
#
#                      # if single-end instead of paired-end, then leave the 4th column above empty.
#
####################################
##  Misc:  #########################
#
#  --SS_lib_type <string>          :Strand-specific RNA-Seq read orientation.
#                                   if paired: RF or FR,
#                                   if single: F or R.   (dUTP method = RF)
#                                   See web documentation.
#
#  --CPU <int>                     :number of CPUs to use, default: 2
#  --min_contig_length <int>       :minimum assembled contig length to report
#                                   (def=200, must be >= 100)
#
#  --long_reads <string>           :fasta file containing error-corrected or circular consensus (CCS) pac bio reads
#                                   (** note: experimental parameter **, this functionality continues to be under development)
#
#  --genome_guided_bam <string>    :genome guided mode, provide path to coordinate-sorted bam file.
#                                   (see genome-guided param section under --show_full_usage_info)
#
#  --long_reads_bam <string>       :long reads to include for genome-guided Trinity
#                                  (bam file consists of error-corrected or circular consensus (CCS) pac bio read aligned to the genome)
#
#  --jaccard_clip                  :option, set if you have paired reads and
#                                   you expect high gene density with UTR
#                                   overlap (use FASTQ input file format
#                                   for reads).
#                                   (note: jaccard_clip is an expensive
#                                   operation, so avoid using it unless
#                                   necessary due to finding excessive fusion
#                                   transcripts w/o it.)
#
#  --trimmomatic                   :run Trimmomatic to quality trim reads
#                                        see '--quality_trimming_params' under full usage info for tailored settings.
#
#  --output <string>               :name of directory for output (will be
#                                   created if it doesn't already exist)
#                                   default( your current working directory: "/home/cft07037/trinity_out_dir" 
#                                    note: must include 'trinity' in the name as a safety precaution! )
#  
#  --full_cleanup                  :only retain the Trinity fasta file, rename as ${output_dir}.Trinity.fasta
#
#  --cite                          :show the Trinity literature citation
#
#  --verbose                       :provide additional job status info during the run.
#
#  --version                       :reports Trinity version (Trinity-v2.15.1) and exits.
#
#  --show_full_usage_info          :show the many many more options available for running Trinity (expert usage).

#
#  --no_super_reads                :turn off super-reads mode
#
#  --prep                          :Only prepare files (high I/O usage) and stop before kmer counting.
#
#  --no_cleanup                    :retain all intermediate input files.
#
#  --no_version_check              :dont run a network check to determine if software updates are available.
#
#  --no_symlink                    :dont symlink, just copy files instead (sets env var NO_SYMLINK=TRUE)
#
#  --monitoring                    :use collectl to monitor all steps of Trinity
#     --monitor_sec <int>          : number of seconds for each interval of runtime monitoring (default: 60)
#  
#  --no_distributed_trinity_exec   :do not run Trinity phase 2 (assembly of partitioned reads), and stop after generating command list.
#
#  --workdir <string>              :where Trinity phase-2 assembly computation takes place (defaults to --output setting).
#                                  (can set this to a node-local drive or RAM disk)     
#
####################################################
# Inchworm and K-mer counting-related options: #####
#
#  --min_kmer_cov <int>           :min count for K-mers to be assembled by
#                                  Inchworm (default: 1)
#  --inchworm_cpu <int>           :number of CPUs to use for Inchworm, default is min(6, --CPU option)
#
#  --no_run_inchworm              :stop after running jellyfish, before inchworm. (phase 1, read clustering only)
#
###################################
# Chrysalis-related options: ######
#
#  --max_reads_per_graph <int>    :maximum number of reads to anchor within
#                                  a single graph (default: 200000)
#  --min_glue <int>               :min number of reads needed to glue two inchworm contigs
#                                  together. (default: 2) 
#
#  --max_chrysalis_cluster_size <int>  :max number of Inchworm contigs to be included in a single Chrysalis cluster. (default: 25)
#
#  --no_bowtie                    :dont run bowtie to use pair info in chrysalis clustering.
#
#  --no_run_chrysalis             :stop after running inchworm, before chrysalis. (phase 1, read clustering only)
#
#####################################
###  Butterfly-related options:  ####
#
#  --bfly_algorithm <string>       : assembly algorithm to use. Options: ORIGINAL PASAFLY
#
#  --bfly_opts <string>            :additional parameters to pass through to butterfly
#                                   (see butterfly options: java -jar Butterfly.jar ).
#                                   (note: only for expert or experimental use.  Commonly used parameters are exposed through this Trinity menu here).
#
#
#  Butterfly read-pair grouping settings (used to define 'pair paths'):
#
#  --group_pairs_distance <int>    :maximum length expected between fragment pairs (default: 500)
#                                   (reads outside this distance are treated as single-end)
#
#  ///////////////////////////////////////////////
#  Butterfly default reconstruction mode settings.
#                                   
#  --path_reinforcement_distance <int>   :minimum overlap of reads with growing transcript 
#                                         path (default: PE: 25, SE: 25)
#                                         Set to 1 for the most lenient path extension requirements.
#
#
#  /////////////////////////////////////////
#  Butterfly transcript reduction settings:
#
#  --no_path_merging            : all final transcript candidates are output (including SNP variations, however, some SNPs may be unphased)  
#
#  By default, alternative transcript candidates are merged (in reality, discarded) if they are found to be too similar, according to the following logic:
#
#  (identity=(numberOfMatches/shorterLen) > 98.0% or if we have <= 2 mismatches) and if we have internal gap lengths <= 10
#
#  with parameters as:
#      
#      --min_per_id_same_path <int>          default: 98     min percent identity for two paths to be merged into single paths
#      --max_diffs_same_path <int>           default: 2      max allowed differences encountered between path sequences to combine them
#      --max_internal_gap_same_path <int>    default: 10     maximum number of internal consecutive gap characters allowed for paths to be merged into single paths.
#
#      If, in a comparison between two alternative transcripts, they are found too similar, the transcript with the greatest cumulative 
#      compatible read (pair-path) support is retained, and the other is discarded.
#
#
#  //////////////////////////////////////////////
#  Butterfly Java and parallel execution settings.
#
#  --bflyHeapSpaceMax <string>     :java max heap space setting for butterfly
#                                   (default: 10G) => yields command
#                  'java -Xmx10G -jar Butterfly.jar ... $bfly_opts'
#  --bflyHeapSpaceInit <string>    :java initial heap space settings for
#                                   butterfly (default: 1G) => yields command
#                  'java -Xms1G -jar Butterfly.jar ... $bfly_opts'
#  --bflyGCThreads <int>           :threads for garbage collection
#                                   (default: 2))
#  --bflyCPU <int>                 :CPUs to use (default will be normal 
#                                   number of CPUs; e.g., 2)
#  --bflyCalculateCPU              :Calculate CPUs based on 80% of max_memory
#                                   divided by maxbflyHeapSpaceMax
#
#  --bfly_jar <string>             : /path/to/Butterfly.jar, otherwise default
#                                    Trinity-installed version is used. 
#                                    
#
################################################################################
#### Quality Trimming Options ####  
# 
#  --quality_trimming_params <string>   defaults to: "ILLUMINACLIP:/apps/eb/Trinity/2.15.1-foss-2022a/trinityrnaseq-v2.15.1/trinity-plugins/Trimmomatic/adapters/TruSeq3-PE.fa:2:30:10 SLIDINGWINDOW:4:5 LEADING:5 TRAILING:5 MINLEN:25"
#
################################################################################
####  In silico Read Normalization Options ###
#
#  --normalize_max_read_cov <int>       defaults to 200 
#  --normalize_by_read_set              run normalization separate for each pair of fastq files,
#                                       then one final normalization that combines the individual normalized reads.
#                                       Consider using this if RAM limitations are a consideration.
#
#  --just_normalize_reads               stop after performing read normalization
#
#  --no_normalize_reads            :Do *not* run in silico normalization of reads. Defaults to max. read coverage of 200.
#                                       see '--normalize_max_read_cov' under full usage info for tailored settings.
#                                       (Note, as of Sept 21, 2016, normalization is on by default)
#                                       (*Turning off normalization is not recommended for most applications)
#     
#  --no_parallel_norm_stats            :Do not try to run the high-mem normalization stats generator in parallel for paired-end fastqs.
#
###############################################################################
#### Genome-guided de novo assembly
# 
#  * required:
#
# --genome_guided_max_intron <int>     :maximum allowed intron length (also maximum fragment span on genome)
#
#  * optional:
#
# --genome_guided_min_coverage <int>   :minimum read coverage for identifying and expressed region of the genome. (default: 1)
#
# --genome_guided_min_reads_per_partition <int>   :default min of 10 reads per partition
#
#
#######################################################################
# Trinity phase 2 (parallel assembly of read clusters) Options: #######
#
#  --grid_exec <string>                 :your command-line utility for submitting jobs to the grid.
#                                        This should be a command line tool that accepts a single parameter:
#                                        ${your_submission_tool} /path/to/file/containing/commands.txt
#                                        and this submission tool should exit(0) upon successful 
#                                        completion of all commands.
#
#  --grid_node_CPU <int>                number of threads for each parallel process to leverage. (default: 1)
#
#  --grid_node_max_memory <string>         max memory targeted for each grid node. (default: 1G)
#
#            The --grid_node_CPU and --grid_node_max_memory are applied as 
#              the --CPU and --max_memory parameters for the Trinity jobs run in 
#              Trinity Phase 2 (assembly of read clusters)
#
#  --FORCE                               ignore failed commands from earlier run, continue on. 
#                                          (Note, this should only be used after you've
#                                           already dealt with these failed commands directly as needed)
#
########################################################################
# Singularity-related options
#
# --singularity_img <string>         :path to a Trinity singularity image to use
#
# --singularity_extra_params <string>   :additional parameters to include for the singularity command execution
#
#

    #
#
###############################################################################
#
#  *Note, a typical Trinity command might be:
#
#        Trinity --seqType fq --max_memory 50G --left reads_1.fq  --right reads_2.fq --CPU 6
#
#            (if you have multiple samples, use --samples_file ... see above for details)
#
#    and for Genome-guided Trinity, provide a coordinate-sorted bam:
#
#        Trinity --genome_guided_bam rnaseq_alignments.csorted.bam --max_memory 50G
#                --genome_guided_max_intron 10000 --CPU 6
#
#     see: /apps/eb/Trinity/2.15.1-foss-2022a/trinityrnaseq-v2.15.1/sample_data/test_Trinity_Assembly/
#          for sample data and 'runMe.sh' for example Trinity execution
#
#     For more details, visit: http://trinityrnaseq.github.io
#
###############################################################################
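The tab-delimited --samples_file format described in the usage text above is easy to get wrong by hand, since spaces are not accepted in place of tabs. The following is a minimal sketch for writing rows; add_sample is a hypothetical helper and the file names are placeholders taken from the usage example.

```shell
#!/bin/bash
# Sketch: write one tab-delimited row of a Trinity --samples_file.
# add_sample is a hypothetical helper; file names passed to it are placeholders.
add_sample() {
    # Arguments: condition, replicate name, left reads, right reads.
    # For single-end data omit the fourth argument; the column is left empty.
    printf '%s\t%s\t%s\t%s\n' "$1" "$2" "$3" "${4:-}"
}
```

For example, add_sample cond_A cond_A_rep1 A_rep1_left.fq A_rep1_right.fq > samples.txt writes the first row of the usage example, and further rows can be appended with >>.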

Back to Top

Installation

Sources are downloaded from Trinity

System

64-bit Linux

Back to Top