DRAM-bio-Sapelo2

From Research Computing Center Wiki
Revision as of 14:17, 7 November 2022 by Karen

Category

Bioinformatics

Program On

Sapelo2

Version

1.4.0

Author / Distributor

Please see https://github.com/WrightonLabCSU/DRAM

Description

DRAM (Distilled and Refined Annotation of Metabolism) is a tool for annotating metagenome-assembled genomes (MAGs) and VirSorter-identified viral contigs. DRAM annotates MAGs and viral contigs using KEGG (if provided by the user), UniRef90, Pfam, dbCAN, RefSeq viral, VOGDB and the MEROPS peptidase database, as well as custom user databases.


The databases DRAM uses are already available on Sapelo2 in /db/DRAM_data/20222204/.


The database locations should already be set in the DRAM-bio configuration file, so you do not need to download the databases yourself.
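Before starting a long annotation job, it can be worth confirming that the shared databases are visible from the compute node. The sketch below is a hypothetical helper (`dram_db_ready` is not part of DRAM); it only checks that the directory named on this page exists and is non-empty. To see which paths the installed DRAM configuration actually points to, DRAM 1.x also provides `DRAM-setup.py print_config` after the module is loaded.

```shell
# Hypothetical helper: check that the shared DRAM database directory is usable.
# The default path is the one given on this page; pass another path to override it.
dram_db_ready() {
  local db_dir="${1:-/db/DRAM_data/20222204}"
  # Require an existing directory that contains at least one entry
  if [ -d "$db_dir" ] && [ -n "$(ls -A "$db_dir" 2>/dev/null)" ]; then
    echo "DRAM databases found in $db_dir"
  else
    echo "DRAM databases not found in $db_dir" >&2
    return 1
  fi
}
```

In sub.sh, a line such as `dram_db_ready || exit 1` placed before the `DRAM.py` call would stop the job early instead of failing partway through the annotation.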

Running Program

  • Version 1.4.0 is installed at /apps/eb/DRAM-bio/1.4.0/

To use version 1.4.0, please load the module with

ml DRAM-bio/1.4.0


Here is an example of a shell script, sub.sh, to run on the batch queue:

#!/bin/bash
#SBATCH --job-name=dram
#SBATCH --partition=batch
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=10gb
#SBATCH --time=2:00:00
#SBATCH --output=log.%j.out
#SBATCH --error=log.%j.err
#SBATCH --mail-user=username@uga.edu
#SBATCH --mail-type=ALL


cd "$SLURM_SUBMIT_DIR"

ml DRAM-bio/1.4.0

DRAM.py annotate -i <seq.fasta> -o outdir --threads <threads>

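In your own submission script, review at least the job name, partition, CPU count, memory, wall time, e-mail address, input FASTA file, output directory and thread count shown above, and replace them with values suitable for your job. The --threads value passed to DRAM.py should match the number of cores requested with --cpus-per-task. If you add --use_uniref, expect to need considerably more memory and wall time, since the DRAM help text notes that it drastically increases run time and memory requirements.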
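One way to keep `--threads` and `--cpus-per-task` in step is to read the CPU count from the environment instead of typing it twice. The sketch below assumes Slurm sets `SLURM_CPUS_PER_TASK` inside a job that requested `--cpus-per-task`; the function name `run_dram_annotate` and the fallback of 1 thread outside a job are illustrative choices, not part of DRAM.

```shell
# Hypothetical helper for sub.sh: pass Slurm's CPU allocation to DRAM.
# SLURM_CPUS_PER_TASK is set by Slurm when --cpus-per-task is requested;
# falling back to 1 thread outside a job is an assumption of this sketch.
run_dram_annotate() {
  local input_fasta="$1"
  local out_dir="$2"
  DRAM.py annotate -i "$input_fasta" -o "$out_dir" \
    --threads "${SLURM_CPUS_PER_TASK:-1}"
}
```

Placed after `ml DRAM-bio/1.4.0` in sub.sh and called as `run_dram_annotate seq.fasta outdir`, this leaves the `#SBATCH --cpus-per-task` line as the single place to change the core count.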


Here is an example of a job submission command:

sbatch ./sub.sh 

Documentation

ml DRAM-bio/1.4.0
DRAM.py -h
usage: DRAM.py [-h] {annotate,annotate_genes,distill,strainer,neighborhoods,merge_annotations} ...

positional arguments:
  {annotate,annotate_genes,distill,strainer,neighborhoods,merge_annotations}
    annotate            Annotate genomes/contigs/bins/MAGs
    annotate_genes      Annotate already called genes, limited functionality compared to annotate
    distill             Summarize metabolic content of annotated genomes
    strainer            Strain annotations down to genes of interest
    neighborhoods       Find neighborhoods around genes of interest
    merge_annotations   Merge multiple annotations to one larger set

options:
  -h, --help            show this help message and exit



DRAM.py annotate -h
usage: DRAM.py annotate [-h] -i INPUT_FASTA [-o OUTPUT_DIR] [--min_contig_size MIN_CONTIG_SIZE] [--prodigal_mode {train,meta,single}] [--trans_table {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25}] [--bit_score_threshold BIT_SCORE_THRESHOLD]
                        [--rbh_bit_score_threshold RBH_BIT_SCORE_THRESHOLD] [--kofam_use_dbcan2_thresholds] [--custom_db_name CUSTOM_DB_NAME] [--custom_fasta_loc CUSTOM_FASTA_LOC] [--custom_hmm_name CUSTOM_HMM_NAME] [--custom_hmm_loc CUSTOM_HMM_LOC]
                        [--custom_hmm_cutoffs_loc CUSTOM_HMM_CUTOFFS_LOC] [--gtdb_taxonomy GTDB_TAXONOMY] [--checkm_quality CHECKM_QUALITY] [--use_uniref] [--use_vogdb] [--low_mem_mode] [--skip_trnascan] [--keep_tmp_dir] [--threads THREADS] [--verbose]

options:
  -h, --help            show this help message and exit
  -i INPUT_FASTA, --input_fasta INPUT_FASTA
                        fasta file, optionally with wildcards to point to multiple fastas (default: None)
  -o OUTPUT_DIR, --output_dir OUTPUT_DIR
                        output directory (default: None)
  --min_contig_size MIN_CONTIG_SIZE
                        minimum contig size to be used for gene prediction (default: 2500)
  --prodigal_mode {train,meta,single}
                        Mode of prodigal to use for gene calling. NOTE: normal or single mode require genomes which are high quality with low contamination and long contigs (average length >3 Kbp). (default: meta)
  --trans_table {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25}
                        Translation table for prodigal to use for gene calling. (default: 11)
  --bit_score_threshold BIT_SCORE_THRESHOLD
                        minimum bitScore of search to retain hits (default: 60)
  --rbh_bit_score_threshold RBH_BIT_SCORE_THRESHOLD
                        minimum bitScore of reverse best hits to retain hits (default: 350)
  --kofam_use_dbcan2_thresholds
                        Use dbcan2 suggested HMM cutoffs for KOfam annotation instead of KOfam recommended cutoffs. This will be ignored if annotating with KEGG Genes. (default: False)
  --custom_db_name CUSTOM_DB_NAME
                        Names of custom databases, can be usedmultiple times. (default: None)
  --custom_fasta_loc CUSTOM_FASTA_LOC
                        Location of fastas to annotate against, can be used multiple times butmust match nubmer of custom_db_name's (default: [])
  --custom_hmm_name CUSTOM_HMM_NAME
                        Names of custom hmm databases, can be used multiple times. (default: [])
  --custom_hmm_loc CUSTOM_HMM_LOC
                        Location of hmms to annotate against, can be used multiple times butmust match nubmer of custom_hmm_name's (default: [])
  --custom_hmm_cutoffs_loc CUSTOM_HMM_CUTOFFS_LOC
                        Location of file with custom HMM cutoffs and descriptions, can be used multiple times. (default: [])
  --gtdb_taxonomy GTDB_TAXONOMY
                        Summary file from gtdbtk taxonomy assignment from bins, can be used multipletimes (default: [])
  --checkm_quality CHECKM_QUALITY
                        Summary of of checkM quality assessment from bins, can be used multiple times (default: [])
  --use_uniref          Annotate these fastas against UniRef, drastically increases run time and memory requirements (default: False)
  --use_vogdb           Annotate these fastas against VOGDB, drastically decreases run time (default: False)
  --low_mem_mode        Skip annotating with uniref and use kofam instead of KEGG genes even if provided. Drastically decreases memory usage (default: False)
  --skip_trnascan
  --keep_tmp_dir
  --threads THREADS     number of processors to use (default: 10)
  --verbose
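Per the help text above, `-i` accepts a wildcard pointing to several FASTA files, so a whole directory of bins can be annotated in one run. The sketch below quotes the pattern so that the shell passes it to `DRAM.py` unexpanded, and optionally attaches a CheckM summary through `--checkm_quality`. The helper name `annotate_bins`, the `.fa` extension and the directory layout are assumptions about your data.

```shell
# Hypothetical helper: annotate every bin in one directory with a single DRAM run.
# Assumes bins end in .fa; adjust the pattern to match your files.
annotate_bins() {
  local bin_dir="$1" out_dir="$2" checkm_file="${3:-}"
  # Quoted pattern: DRAM.py receives the wildcard itself, not an expanded list
  set -- annotate -i "${bin_dir}/*.fa" -o "$out_dir" \
    --threads "${SLURM_CPUS_PER_TASK:-1}"
  # Optional CheckM quality summary for the same bins
  if [ -n "$checkm_file" ]; then
    set -- "$@" --checkm_quality "$checkm_file"
  fi
  DRAM.py "$@"
}
```

For example, `annotate_bins my_bins dram_out checkm_summary.tsv` would run one annotation over all `my_bins/*.fa` files.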



DRAM.py annotate_genes -h
usage: DRAM.py annotate_genes [-h] -i INPUT_FAA [-o OUTPUT_DIR] [--bit_score_threshold BIT_SCORE_THRESHOLD] [--rbh_bit_score_threshold RBH_BIT_SCORE_THRESHOLD] [--kofam_use_dbcan2_thresholds] [--custom_db_name CUSTOM_DB_NAME] [--custom_fasta_loc CUSTOM_FASTA_LOC]
                              [--custom_hmm_name CUSTOM_HMM_NAME] [--custom_hmm_loc CUSTOM_HMM_LOC] [--custom_hmm_cutoffs_loc CUSTOM_HMM_CUTOFFS_LOC] [--use_uniref] [--low_mem_mode] [--keep_tmp_dir] [--threads THREADS] [--verbose]

options:
  -h, --help            show this help message and exit
  -i INPUT_FAA, --input_faa INPUT_FAA
                        fasta file, optionally with wildcards to point to individual MAGs (default: None)
  -o OUTPUT_DIR, --output_dir OUTPUT_DIR
                        output directory (default: None)
  --bit_score_threshold BIT_SCORE_THRESHOLD
                        minimum bitScore of search to retain hits (default: 60)
  --rbh_bit_score_threshold RBH_BIT_SCORE_THRESHOLD
                        minimum bitScore of reverse best hits to retain hits (default: 350)
  --kofam_use_dbcan2_thresholds
                        Use dbcan2 suggested HMM cutoffs for KOfam annotation instead of KOfam recommended cutoffs. This will be ignored if annotating with KEGG Genes. (default: False)
  --custom_db_name CUSTOM_DB_NAME
                        Names of custom databases, can be used multiple times. (default: [])
  --custom_fasta_loc CUSTOM_FASTA_LOC
                        Location of fastas to annotate against, can be used multiple times butmust match nubmer of custom_db_name's (default: [])
  --custom_hmm_name CUSTOM_HMM_NAME
                        Names of custom hmm databases, can be used multiple times. (default: [])
  --custom_hmm_loc CUSTOM_HMM_LOC
                        Location of hmms to annotate against, can be used multiple times butmust match nubmer of custom_hmm_name's (default: [])
  --custom_hmm_cutoffs_loc CUSTOM_HMM_CUTOFFS_LOC
                        Location of file with custom HMM cutoffs and descriptions, can be used multiple times. (default: [])
  --use_uniref          Annotate these fastas against UniRef, drastically increases run time and memory requirements (default: False)
  --low_mem_mode        Skip annotating with uniref and use kofam instead of KEGG genes even if provided. Drastically decreases memory usage (default: False)
  --keep_tmp_dir
  --threads THREADS     number of processors to use (default: 10)
  --verbose
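When genes have already been called elsewhere, `annotate_genes` starts from protein FASTA files instead of contigs; the help text describes it as having limited functionality compared to `annotate`. The sketch below assumes one `.faa` file per MAG in a single directory, which the `-i` wildcard then selects; the helper name `annotate_called_genes` is hypothetical.

```shell
# Hypothetical helper: annotate already-called genes from protein FASTA files.
# Assumes one .faa file per MAG in faa_dir; the quoted wildcard goes to DRAM.py.
annotate_called_genes() {
  local faa_dir="$1" out_dir="$2"
  DRAM.py annotate_genes -i "${faa_dir}/*.faa" -o "$out_dir" \
    --threads "${SLURM_CPUS_PER_TASK:-1}"
}
```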



DRAM.py distill -h
usage: DRAM.py distill [-h] [-i INPUT_FILE] [-o OUTPUT_DIR] [--rrna_path RRNA_PATH] [--trna_path TRNA_PATH] [--groupby_column GROUPBY_COLUMN] [--custom_distillate CUSTOM_DISTILLATE] [--distillate_gene_names] [--genomes_per_product GENOMES_PER_PRODUCT]

options:
  -h, --help            show this help message and exit
  -i INPUT_FILE, --input_file INPUT_FILE
                        Annotations path (default: None)
  -o OUTPUT_DIR, --output_dir OUTPUT_DIR
                        Directory to write summarized genomes (default: None)
  --rrna_path RRNA_PATH
                        rRNA output from annotation (default: None)
  --trna_path TRNA_PATH
                        tRNA output from annotation (default: None)
  --groupby_column GROUPBY_COLUMN
                        Column from annotations to group as organism units (default: fasta)
  --custom_distillate CUSTOM_DISTILLATE
                        Custom distillate form to add your own modules (default: None)
  --distillate_gene_names
                        Give names of genes instead of counts in genome metabolism summary (default: False)
  --genomes_per_product GENOMES_PER_PRODUCT
                        Number of genomes per product.html output. Decrease value if getting JavaScript Error: Maximum call stack size exceeded when viewing product.html in browser. (default: 1000)
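`distill` summarizes the output of a previous `annotate` run. The sketch below assumes the annotation directory uses the file names shown in the DRAM README examples (`annotations.tsv`, `trnas.tsv`, `rrnas.tsv`). Because the tRNA and rRNA tables may be missing (for instance after `--skip_trnascan`), it passes `--trna_path` and `--rrna_path` only when the files exist. The helper name `distill_annotations` is hypothetical.

```shell
# Hypothetical helper: distill a DRAM annotation directory into summaries.
# File names follow the DRAM README examples; adjust if your run differs.
distill_annotations() {
  local ann_dir="$1" out_dir="$2"
  set -- -i "${ann_dir}/annotations.tsv" -o "$out_dir"
  # Only pass tRNA/rRNA tables that the annotate step actually produced
  if [ -f "${ann_dir}/trnas.tsv" ]; then
    set -- "$@" --trna_path "${ann_dir}/trnas.tsv"
  fi
  if [ -f "${ann_dir}/rrnas.tsv" ]; then
    set -- "$@" --rrna_path "${ann_dir}/rrnas.tsv"
  fi
  DRAM.py distill "$@"
}
```

Calling `distill_annotations outdir genome_summaries` after the annotate step in the same sub.sh would produce the summaries in one job.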



DRAM.py strainer -h
usage: DRAM.py strainer [-h] -i INPUT_ANNOTATIONS -f INPUT_FASTA [-o OUTPUT_FASTA] [--fastas [FASTAS ...]] [--scaffolds [SCAFFOLDS ...]] [--genes [GENES ...]] [--identifiers [IDENTIFIERS ...]] [--categories [CATEGORIES ...]]
                        [--custom_distillate CUSTOM_DISTILLATE] [--taxonomy [TAXONOMY ...]] [--completeness COMPLETENESS] [--contamination CONTAMINATION]

options:
  -h, --help            show this help message and exit

Input and output files:
  -i INPUT_ANNOTATIONS, --input_annotations INPUT_ANNOTATIONS
                        annotations file to pull genes from (default: None)
  -f INPUT_FASTA, --input_fasta INPUT_FASTA
                        fasta file to filter (default: None)
  -o OUTPUT_FASTA, --output_fasta OUTPUT_FASTA
                        location to write filtered fasta (default: pull_genes.fasta)

Specific names to keep:
  --fastas [FASTAS ...]
                        space separated list of fastas to keep (default: None)
  --scaffolds [SCAFFOLDS ...]
                        space separated list of scaffolds to keep (default: None)
  --genes [GENES ...]   space separated list of genes to keep (default: None)

Annotation filters:
  --identifiers [IDENTIFIERS ...]
                        database identifiers to keep (default: None)
  --categories [CATEGORIES ...]
                        distillate categories to keep genes from (default: None)
  --custom_distillate CUSTOM_DISTILLATE
                        Custom distillate form to add your own modules (default: None)

DRAM based filters:
  --taxonomy [TAXONOMY ...]
                        Level of GTDBTk taxonomy to keep (e.g. c__Clostridia), space separated list (default: None)
  --completeness COMPLETENESS
                        Minimum completeness of genome to keep genes (default: None)
  --contamination CONTAMINATION
                        Maximum contamination of genome to keep genes (default: None)
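`strainer` pulls a subset of genes out of a FASTA file based on the annotations. The sketch below keeps genes annotated with given database identifiers, taking `annotations.tsv` and `genes.faa` from an annotate output directory (file names as in the DRAM README). The helper name `strain_by_identifier` is hypothetical, and `K00370` is used only as an example KEGG Orthology identifier.

```shell
# Hypothetical helper: write the genes carrying given identifiers to a FASTA file.
# Usage: strain_by_identifier ANNOTATION_DIR OUTPUT_FASTA ID [ID ...]
strain_by_identifier() {
  local ann_dir="$1" out_fasta="$2"
  shift 2
  # Remaining arguments are database identifiers (e.g. KEGG KO IDs) to keep
  DRAM.py strainer -i "${ann_dir}/annotations.tsv" -f "${ann_dir}/genes.faa" \
    -o "$out_fasta" --identifiers "$@"
}
```

For example, `strain_by_identifier outdir selected_genes.faa K00370` would write the matching protein sequences to `selected_genes.faa`.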



DRAM.py neighborhoods -h
usage: DRAM.py neighborhoods [-h] [-i INPUT_FILE] [-o OUTPUT_DIR] [--genes [GENES ...]] [--identifiers [IDENTIFIERS ...]] [--custom_distillate CUSTOM_DISTILLATE] [--categories CATEGORIES] [--genes_loc GENES_LOC] [--scaffolds_loc SCAFFOLDS_LOC]
                             [--distance_genes DISTANCE_GENES] [--distance_bp DISTANCE_BP]

options:
  -h, --help            show this help message and exit
  -i INPUT_FILE, --input_file INPUT_FILE
                        Annotations path (default: None)
  -o OUTPUT_DIR, --output_dir OUTPUT_DIR
                        Directory to write gene neighborhoods (default: None)
  --genes [GENES ...]   Gene names from DRAM to find neighborhoods around (default: None)
  --identifiers [IDENTIFIERS ...]
                        Database identifiers assigned by DRAM to find neighborhoods around (default: None)
  --custom_distillate CUSTOM_DISTILLATE
                        Custom distillate form to add your own modules (default: None)
  --categories CATEGORIES
                        Distillate categories to build gene neighborhoods around. (default: None)
  --genes_loc GENES_LOC
                        Location of genes.fna/genes.faa file to filter to neighborhoods (default: None)
  --scaffolds_loc SCAFFOLDS_LOC
                        Location of scaffolds.fna file to filter to neighborhoods (default: None)
  --distance_genes DISTANCE_GENES
                        Number of genes away from center to include in neighborhoods (default: None)
  --distance_bp DISTANCE_BP
                        Number of genes away from center to include in neighborhoods (default: None)
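`neighborhoods` extracts the genes surrounding genes of interest. The sketch below centers neighborhoods on database identifiers and includes a fixed number of genes on either side via `--distance_genes`, filtering `genes.fna` from an annotate output directory. The helper name `gene_neighborhoods` and the directory layout are assumptions.

```shell
# Hypothetical helper: build gene neighborhoods around given identifiers.
# Usage: gene_neighborhoods ANNOTATION_DIR OUTPUT_DIR DISTANCE ID [ID ...]
gene_neighborhoods() {
  local ann_dir="$1" out_dir="$2" distance="$3"
  shift 3
  DRAM.py neighborhoods -i "${ann_dir}/annotations.tsv" -o "$out_dir" \
    --genes_loc "${ann_dir}/genes.fna" --distance_genes "$distance" \
    --identifiers "$@"
}
```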



DRAM.py merge_annotations -h
usage: DRAM.py merge_annotations [-h] [-i INPUT_DIRS] [-o OUTPUT_DIR]

options:
  -h, --help            show this help message and exit
  -i INPUT_DIRS, --input_dirs INPUT_DIRS
                        Path with wildcards pointing to DRAM annotation output directories (default: None)
  -o OUTPUT_DIR, --output_dir OUTPUT_DIR
                        Path to output merged annotations files (default: None)
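`merge_annotations` is useful when a large set of bins has been split across several DRAM runs. The sketch below assumes a Slurm job array in which each task annotates one bin into its own `dram_out_<task id>` directory, followed by a merge over those directories. The helper names, the `.fa` extension and the `dram_out_` prefix are hypothetical; `SLURM_ARRAY_TASK_ID` is set by Slurm in array jobs.

```shell
# Hypothetical helper: annotate the Nth bin (sorted by name) for this array task.
annotate_array_task() {
  local bin_dir="$1" task_id="${SLURM_ARRAY_TASK_ID:-1}"
  local fasta
  fasta=$(find "$bin_dir" -maxdepth 1 -name '*.fa' | sort | sed -n "${task_id}p")
  if [ -z "$fasta" ]; then
    echo "no bin for task $task_id in $bin_dir" >&2
    return 1
  fi
  DRAM.py annotate -i "$fasta" -o "dram_out_${task_id}" \
    --threads "${SLURM_CPUS_PER_TASK:-1}"
}

# Hypothetical helper: merge all per-task outputs into one annotation set.
merge_array_results() {
  # Quoted so DRAM.py, not the shell, expands the wildcard
  DRAM.py merge_annotations -i 'dram_out_*' -o "$1"
}
```

The array could be submitted with `sbatch --array=1-N ./sub.sh`, where N is the number of bins, and the merge run in a second job submitted with `--dependency=afterok:<array job ID>` so that it starts only after every task has finished successfully.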