Harvest-Teaching

From Research Computing Center Wiki
Revision as of 13:03, 12 November 2018 by Yhuang

Category

Bioinformatics

Program On

Teaching

Version

1.1.2

Author / Distributor

Harvest

Description

" Harvest is a suite of core-genome alignment and visualization tools for quickly analyzing thousands of intraspecific microbial genomes." More details are at Harvest

Running Program

The latest version of this application is installed at /usr/local/apps/gb/Harvest/1.1.2

To use this version, please load the module with

ml Harvest/1.1.2 

Here is an example of a shell script, sub.sh, to run on the batch queue:

#!/bin/bash
#SBATCH --job-name=j_Harvest
#SBATCH --partition=batch
#SBATCH --mail-type=ALL
#SBATCH --mail-user=username@uga.edu
#SBATCH --ntasks=1
#SBATCH --mem=10gb
#SBATCH --time=08:00:00
#SBATCH --output=Harvest.%j.out
#SBATCH --error=Harvest.%j.err

cd $SLURM_SUBMIT_DIR
ml Harvest/1.1.2
parsnp [options]

In a real submission script, review at least the example values above (such as the e-mail address, the resource requests, and the parsnp options) and replace them with values appropriate for your job.
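As one way to fill in the placeholder parsnp [options] line, the sketch below uses scenario 2 from the parsnp help (a reference genome without a GenBank file). The file names reference.fasta, genomes and parsnp_out are hypothetical, and the --cpus-per-task=4 request is an assumption rather than a site recommendation. The thread count passed to -p is read from SLURM_CPUS_PER_TASK so that it matches the allocation. The run_parsnp helper is also hypothetical: it only prints the command when parsnp is not on the PATH, which makes the script safe to try outside the cluster.

```shell
#!/bin/bash
#SBATCH --job-name=j_Harvest
#SBATCH --partition=batch
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=10gb
#SBATCH --time=08:00:00
#SBATCH --output=Harvest.%j.out
#SBATCH --error=Harvest.%j.err

# Hypothetical inputs: replace with your own reference and genome directory.
REFERENCE=${REFERENCE:-reference.fasta}
GENOME_DIR=${GENOME_DIR:-genomes}
OUT_DIR=${OUT_DIR:-parsnp_out}

# Match parsnp's thread count (-p) to the CPUs Slurm allocated; fall back to 1.
THREADS=${SLURM_CPUS_PER_TASK:-1}

cd "${SLURM_SUBMIT_DIR:-.}" || exit 1

# Load the module only where the ml command exists (i.e. on the cluster).
if command -v ml >/dev/null 2>&1; then
    ml Harvest/1.1.2
fi

# Hypothetical helper: run parsnp if available, otherwise print the command.
run_parsnp() {
    if command -v parsnp >/dev/null 2>&1; then
        parsnp "$@"
    else
        echo "parsnp $*"
    fi
}

# Scenario 2 from the parsnp help: reference genome, no GenBank file.
run_parsnp -r "$REFERENCE" -d "$GENOME_DIR" -p "$THREADS" -o "$OUT_DIR"
```

Requesting CPUs with --cpus-per-task and passing the same number to -p keeps parsnp from starting more threads than the job was granted.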

Please refer to Running_Jobs_on_the_teaching_cluster, Run X window Jobs and Run interactive Jobs for more details on running jobs on the teaching cluster.


Here is an example of a job submission command:

sbatch ./sub.sh 

Documentation

ml Harvest/1.1.2 
parsnp -h
|--Parsnp v1.2--|
For detailed documentation please see --> http://harvest.readthedocs.org/en/latest
usage: parsnp [options] [-g|-r|-q](see below) -d <genome_dir> -p <threads>

Parsnp quick start for three example scenarios: 
1) With reference & genbank file: 
 >parsnp -g <reference_genbank_file1,reference_genbank_file2,..> -d <genome_dir> -p <threads> 

2) With reference but without genbank file:
 >parsnp -r <reference_genome> -d <genome_dir> -p <threads> 

3) Autorecruit reference to a draft assembly:
 >parsnp -q <draft_assembly> -d <genome_db> -p <threads> 

[Input parameters]
<<input/output>>
 -c = <flag>: (c)urated genome directory, use all genomes in dir and ignore MUMi? (default = NO)
 -d = <path>: (d)irectory containing genomes/contigs/scaffolds
 -r = <path>: (r)eference genome (set to ! to pick random one from genome dir)
 -g = <string>: Gen(b)ank file(s) (gbk), comma separated list (default = None)
 -o = <string>: output directory? default [./P_CURRDATE_CURRTIME]
 -q = <path>: (optional) specify (assembled) query genome to use, in addition to genomes found in genome dir (default = NONE)

<<MUMi>>
 -U = <float>: max MUMi distance value for MUMi distribution 
 -M = <flag>: calculate MUMi and exit? overrides all other choices! (default: NO)
 -i = <float>: max MUM(i) distance (default: autocutoff based on distribution of MUMi values)

<<MUM search>>
 -a = <int>: min (a)NCHOR length (default = 1.1*Log(S))
 -C = <int>: maximal cluster D value? (default=100)
 -z = <path>: min LCB si(z)e? (default = 25)

<<LCB alignment>>
 -D = <float>: maximal diagonal difference? Either percentage (e.g. 0.2) or bp (e.g. 100bp) (default = 0.12)
 -e = <flag> greedily extend LCBs? experimental! (default = NO)
 -n = <string>: alignment program (default: libMUSCLE)
 -u = <flag>: output unaligned regions? .unaligned (default: NO)

<<Recombination filtration>>
 -x = <flag>: enable filtering of SNPs located in PhiPack identified regions of recombination? (default: NO)

<<Misc>>
 -h = <flag>: (h)elp: print this message and exit
 -p = <int>: number of threads to use? (default= 1)
 -P = <int>: max partition size? limits memory usage (default= 15000000)
 -v = <flag>: (v)erbose output? (default = NO)
 -V = <flag>: output (V)ersion and exit
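The three quick-start scenarios in the help output above differ only in the first option (-g, -r, or -q). A minimal sketch of that mapping follows. The function name parsnp_quickstart and its mode names are hypothetical and not part of Harvest, and the function only prints a command line rather than running parsnp.

```shell
#!/bin/bash
# Print a parsnp command line for one of the three quick-start scenarios.
# Usage: parsnp_quickstart <genbank|reference|query> <input> <genome_dir> [threads]
parsnp_quickstart() {
    if [ "$#" -lt 3 ]; then
        echo "usage: parsnp_quickstart <genbank|reference|query> <input> <genome_dir> [threads]" >&2
        return 2
    fi
    local mode="$1" input="$2" genome_dir="$3" threads="${4:-1}"
    local flag
    case "$mode" in
        genbank)   flag=-g ;;  # reference GenBank file(s), comma separated
        reference) flag=-r ;;  # reference genome without a GenBank file
        query)     flag=-q ;;  # draft assembly; the reference is auto-recruited
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
    echo "parsnp $flag $input -d $genome_dir -p $threads"
}

# Example with hypothetical file names:
parsnp_quickstart reference reference.fasta genomes 4
```

The printed line can be pasted into the submission script in place of parsnp [options].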


Installation

The source code was obtained from the Harvest project.

System

64-bit Linux