Trinity-HpcGridRunner

From Research Computing Center Wiki

Description

Instructions on how to modify a Trinity script to run in conjunction with HpcGridRunner.

Running Program

Step 1: Create a normal Trinity script

  • Note: normal Trinity jobs should be run using /lscratch. However, running with HpcGridRunner does not benefit from /lscratch, so this example does not include any /lscratch components. For more information on running normal Trinity jobs with /lscratch, please see here.
  • For more information on creating a Trinity job (without using /lscratch), please see here.

Step 2: Add a line to load the HpcGridRunner module and add the --grid_exec flag to your Trinity command (see how to format --grid_exec below)

  • Your --grid_exec flag should look exactly like the one below. The only part you need to change is the path to your config.conf file after --grid_conf. Note the placement of the quotation marks, as they are necessary.
#!/bin/bash
#SBATCH --job-name=Trinity_HpcGridRunner
#SBATCH --partition=batch		
#SBATCH --ntasks=1			
#SBATCH --cpus-per-task=8	 	
#SBATCH --mem=200G			
#SBATCH --time=48:00:00              	
#SBATCH --output=log.%j.out		
#SBATCH --error=log.%j.err		

cd "$SLURM_SUBMIT_DIR"

ml Trinity/2.15.1-foss-2022a 
ml HpcGridRunner/1.0.2

# Replace <string> with your read type (fq or fa) before submitting.
Trinity --seqType <string> --max_memory 100G \
        --CPU 8 \
        --left reads.left.fq.gz \
        --right reads.right.fq.gz \
        --output /scratch/cft07037/trinity_tests/testing/${SLURM_JOB_ID}/outputs/trinity/ \
        --full_cleanup \
        --grid_exec "/apps/eb/HpcGridRunner/1.0.2/hpc_cmds_GridRunner.pl --grid_conf /scratch/path/to/your/configfile/config.conf -c"
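
Because the quotation marks matter, one way to keep them manageable is to build the --grid_exec value in a variable first. The following is only a sketch: GRID_CONF and GRID_EXEC are illustrative variable names (not Trinity or HpcGridRunner settings), and the config path is the same placeholder used above.

```shell
#!/bin/bash
# GRID_CONF and GRID_EXEC are illustrative names chosen for this sketch.
GRID_CONF=/scratch/path/to/your/configfile/config.conf

# The whole HpcGridRunner command, including --grid_conf and -c, is one string.
GRID_EXEC="/apps/eb/HpcGridRunner/1.0.2/hpc_cmds_GridRunner.pl --grid_conf ${GRID_CONF} -c"

# Print the value to confirm it before using it in the Trinity command.
echo "$GRID_EXEC"
```

With this in place, the last line of the Trinity command could be written as --grid_exec "$GRID_EXEC". The double quotes keep the entire string as a single argument.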


Step 3: Create the config.conf file

# grid type:
grid=SLURM

# template for a grid submission:
cmd=sbatch -p batch --mem 80gb -n 1 -t 01:00:00

##########################################################################################
# settings below configure the Trinity job submission system, not tied to the grid itself.
##########################################################################################

# number of grid submissions to be maintained at steady state by the Trinity submission system
max_nodes=5

# number of commands that are batched into a single grid submission job.
cmds_per_node=3
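
Before submitting, it may help to confirm that the file exists and defines the four settings shown above. The helper below is a minimal sketch: check_grid_conf is a hypothetical function written for this page, not part of HpcGridRunner, and it only checks that each setting name appears at the start of a line.

```shell
#!/bin/bash
# check_grid_conf is a hypothetical helper, not part of HpcGridRunner.
# It reports any of the four settings that are missing from the file.
check_grid_conf() {
    local conf="$1" key missing=0
    if [ ! -f "$conf" ]; then
        echo "config file not found: $conf" >&2
        return 1
    fi
    for key in grid cmd max_nodes cmds_per_node; do
        # A setting counts as present if an uncommented line starts with "key=".
        if ! grep -q "^${key}=" "$conf"; then
            echo "missing setting: $key" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, check_grid_conf /scratch/path/to/your/configfile/config.conf returns 0 when all four settings are present and a non-zero status otherwise.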
  • The main settings to change in your config.conf file are max_nodes (how many individual grid jobs are allowed to run at the same time) and cmds_per_node (the number of commands per grid job). Together they determine how the grid runs on the cluster. You can also alter cmd, the command used to submit the individual grid jobs.
  • The number of grid jobs submitted depends on how many recursive Trinity commands you have and on the values you choose for max_nodes and cmds_per_node.
    • Increasing cmds_per_node and max_nodes lowers the total time it takes for all of the recursive Trinity commands to finish. Note, however, that even if you set a large max_nodes, the number of jobs actually running at once is determined by SLURM and ultimately depends on which resources are currently available and how many jobs you already have running. Similarly, a large cmds_per_node can make the overall job complete more quickly, but the individual jobs (submitted with the cmd setting) may then need more memory.
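
As a rough illustration of how these two settings interact, the sketch below estimates the number of grid jobs for a made-up workload. The value of total_cmds is purely hypothetical (your run will produce a different number of commands), and the "rounds" figure assumes SLURM actually runs max_nodes jobs at once, which it may not.

```shell
#!/bin/bash
# Hypothetical number of recursive Trinity commands; your run will differ.
total_cmds=1000
# Values taken from the example config.conf above.
cmds_per_node=3
max_nodes=5

# Ceiling division: commands are batched cmds_per_node at a time.
grid_jobs=$(( (total_cmds + cmds_per_node - 1) / cmds_per_node ))

# Rough number of rounds if max_nodes grid jobs run at the same time.
rounds=$(( (grid_jobs + max_nodes - 1) / max_nodes ))

echo "grid jobs: $grid_jobs"
echo "rounds of up to $max_nodes concurrent jobs: $rounds"
```

With these example values, 1000 commands batched 3 at a time give 334 grid jobs, which take about 67 rounds when at most 5 run concurrently.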


Other Resources

https://github.com/trinityrnaseq/trinityrnaseq/wiki/Running-Trinity#optional-adapting-trinity-to-a-computing-grid-for-massively-parallel-processing-of-embarrassingly-parallel-steps