HpcGridRunner-Sapelo2

From Research Computing Center Wiki
Revision as of 10:39, 8 December 2020


Category

Tools

Program On

Sapelo2

Version

1.0.2

Author / Distributor

Please see https://github.com/HpcGridRunner/HpcGridRunner

Description

HpcGridRunner executes a file of commands in parallel on a compute cluster.

Running Program

Also refer to Running Jobs on Sapelo2.

  • Version 1.0.2, installed in /apps/eb/HpcGridRunner/1.0.2

To use this version of HpcGridRunner, please first load the module with

ml HpcGridRunner/1.0.2

Running a BLAST job in parallel by splitting the FASTA file: Replace YOUR_QUERY_FASTA.fasta with the name of your FASTA file. The cmd_template can be changed to reflect your BLAST command, but leave the -query parameter set to __QUERY_FILE__; hpc_FASTA_GridRunner.pl fills it in dynamically. You can also change the values for outfmt, evalue, max_target_seqs and db. Do not change the num_threads value unless you use a conf file that allocates more than one CPU. The grid_conf file specified below requests one CPU core and 3 GB of RAM.

#!/bin/bash
#SBATCH --partition batch
#SBATCH --ntasks=1
#SBATCH --time=48:00:00
#SBATCH --mem=3gb

ml HpcGridRunner/1.0.2
ml BLAST+/2.10.1-gompi-2019b

hpc_FASTA_GridRunner.pl \
    --grid_conf="$HPCGR_CONF_DIR/sapelo_1c_3G.conf" \
    --cmd_template "blastp -query __QUERY_FILE__ -db /db/uniprot/latest/uniprot_sprot -max_target_seqs 1 -outfmt 6 -evalue 1e-5 -num_threads 1" \
    --seqs_per_bin 250 \
    --query_fasta YOUR_QUERY_FASTA.fasta \
    --out_dir blast_result
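
To get a rough idea of how many split files a query will produce, you can count the FASTA headers and divide by the --seqs_per_bin value. The following is a minimal sketch, assuming that --seqs_per_bin sets the maximum number of sequences per split file; count_bins and the mock FASTA file are hypothetical names used only for illustration and are not part of HpcGridRunner.

```shell
#!/bin/bash
# Hypothetical helper (not part of HpcGridRunner): estimate how many
# split files a FASTA file yields for a given sequences-per-bin value.
# Assumption: each split file holds at most seqs_per_bin sequences.
count_bins() {
    local fasta="$1"
    local seqs_per_bin="$2"
    local n_seqs
    # Every FASTA record starts with a '>' header line.
    n_seqs=$(grep -c '^>' "$fasta")
    # Integer ceiling division: round up so a partial bin still counts.
    echo $(( (n_seqs + seqs_per_bin - 1) / seqs_per_bin ))
}

# Demonstration on a small mock FASTA file with 5 sequences.
demo_fasta=$(mktemp)
printf '>s1\nMKV\n>s2\nMLA\n>s3\nMTT\n>s4\nMAA\n>s5\nMGG\n' > "$demo_fasta"
count_bins "$demo_fasta" 2    # prints 3
```

For a real run, call count_bins YOUR_QUERY_FASTA.fasta 250 before submitting the job.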

The BLAST results for each split group will be in a separate file under the specified output directory (blast_result in the example above), and each file name ends in fa.OUT. To gather the BLAST output, you can generally concatenate the individual outputs. The following command finds and concatenates all the output files; replace OUTPUT_DIR with the name of your output directory.

find OUTPUT_DIR -name '*.fa.OUT' | xargs cat > combined_blast_results.out
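
After concatenating, it can be worth confirming that no output was lost. The sketch below compares the total line count of the individual fa.OUT files with the line count of the combined file. check_combined and the mock file names are hypothetical and used only for illustration; the actual file names under your output directory may differ.

```shell
#!/bin/bash
# Hypothetical sanity check (not part of HpcGridRunner): compare the
# total number of lines in the per-bin outputs with the combined file.
check_combined() {
    local out_dir="$1"
    local combined="$2"
    local parts_lines combined_lines
    # $(( ... )) normalizes any whitespace padding that wc may emit.
    parts_lines=$(( $(find "$out_dir" -name '*.fa.OUT' -exec cat {} + | wc -l) ))
    combined_lines=$(( $(wc -l < "$combined") ))
    if [ "$parts_lines" -eq "$combined_lines" ]; then
        echo "OK: $combined_lines lines"
    else
        echo "MISMATCH: parts=$parts_lines combined=$combined_lines" >&2
        return 1
    fi
}

# Demonstration on a mock output directory with two small result files.
demo_dir=$(mktemp -d)
printf 'q1\tsp1\n' > "$demo_dir/bin1.fa.OUT"
printf 'q2\tsp2\nq3\tsp3\n' > "$demo_dir/bin2.fa.OUT"
demo_combined=$(mktemp)
find "$demo_dir" -name '*.fa.OUT' | xargs cat > "$demo_combined"
check_combined "$demo_dir" "$demo_combined"    # prints "OK: 3 lines"
```

Write the combined file outside the output directory, or give it a name that does not end in fa.OUT, so that it is not counted as one of the parts.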

Documentation

Please see http://hpcgridrunner.github.io/


Installation

Installed in /apps/eb/HpcGridRunner/1.0.2

System

64-bit Linux