DRAM-bio-Sapelo2
Latest revision as of 14:17, 7 November 2022
Category
Bioinformatics
Program On
Sapelo2
Version
1.4.0
Author / Distributor
Please see https://github.com/WrightonLabCSU/DRAM
Description
DRAM (Distilled and Refined Annotation of Metabolism) is a tool for annotating metagenome-assembled genomes (MAGs) and VirSorter-identified viral contigs. DRAM annotates MAGs and viral contigs using KEGG (if provided by the user), UniRef90, PFAM, dbCAN, RefSeq viral, VOGDB and the MEROPS peptidase database, as well as custom user databases.
The databases DRAM uses are already available on Sapelo2 in /db/DRAM_data/20222204/.
The database locations should already be set in the config file for DRAM-bio, so you do not need to download them.
Running Program
- Version 1.4.0 is installed at /apps/eb/DRAM-bio/1.4.0/
To use version 1.4.0, please load the module with
ml DRAM-bio/1.4.0
Here is an example of a shell script, sub.sh, to run on the batch queue:
#!/bin/bash
#SBATCH --job-name=dram
#SBATCH --partition=batch
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=10gb
#SBATCH --time=2:00:00
#SBATCH --output=log.%j.out
#SBATCH --error=log.%j.err
#SBATCH --mail-user=username@uga.edu
#SBATCH --mail-type=ALL
cd "$SLURM_SUBMIT_DIR"
ml DRAM-bio/1.4.0
DRAM.py annotate -i <seq.fasta> -o outdir --threads <threads>
In a real submission script, at least the placeholder values above (for example <seq.fasta>, outdir and <threads>) need to be reviewed or replaced with proper values.
Here is an example of a job submission command:
sbatch ./sub.sh
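The example script requests one CPU (--cpus-per-task=1), while the DRAM.py annotate help text lists a default of 10 threads, so it can help to derive --threads from the Slurm allocation instead of typing the number twice. The following is a minimal sketch, not part of the installed software. It assumes Slurm exports SLURM_CPUS_PER_TASK when --cpus-per-task is set, and the helper names dram_threads and dram_annotate_cmd are hypothetical. It only prints the command so the line can be inspected before it goes into sub.sh.

```shell
#!/bin/bash
# Sketch: keep DRAM's --threads equal to the CPUs Slurm allocated.
# Assumption: SLURM_CPUS_PER_TASK is set by Slurm when --cpus-per-task
# is given; fall back to 1 when it is unset (e.g. on a login node).
dram_threads() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

# Hypothetical helper: print the annotate command instead of running it.
dram_annotate_cmd() {
    local input="$1" outdir="$2"
    echo "DRAM.py annotate -i ${input} -o ${outdir} --threads $(dram_threads)"
}

# Show the command that would be used for the example inputs.
dram_annotate_cmd "seq.fasta" "outdir"
```

In sub.sh itself, the same idea reduces to passing --threads "${SLURM_CPUS_PER_TASK:-1}" on the DRAM.py annotate line and raising --cpus-per-task to the number of threads wanted.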
Documentation
ml DRAM-bio/1.4.0
DRAM.py -h
usage: DRAM.py [-h] {annotate,annotate_genes,distill,strainer,neighborhoods,merge_annotations} ...

positional arguments:
  {annotate,annotate_genes,distill,strainer,neighborhoods,merge_annotations}
    annotate            Annotate genomes/contigs/bins/MAGs
    annotate_genes      Annotate already called genes, limited functionality compared to annotate
    distill             Summarize metabolic content of annotated genomes
    strainer            Strain annotations down to genes of interest
    neighborhoods       Find neighborhoods around genes of interest
    merge_annotations   Merge multiple annotations to one larger set

options:
  -h, --help            show this help message and exit

DRAM.py annotate -h
usage: DRAM.py annotate [-h] -i INPUT_FASTA [-o OUTPUT_DIR]
                        [--min_contig_size MIN_CONTIG_SIZE]
                        [--prodigal_mode {train,meta,single}]
                        [--trans_table {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25}]
                        [--bit_score_threshold BIT_SCORE_THRESHOLD]
                        [--rbh_bit_score_threshold RBH_BIT_SCORE_THRESHOLD]
                        [--kofam_use_dbcan2_thresholds]
                        [--custom_db_name CUSTOM_DB_NAME]
                        [--custom_fasta_loc CUSTOM_FASTA_LOC]
                        [--custom_hmm_name CUSTOM_HMM_NAME]
                        [--custom_hmm_loc CUSTOM_HMM_LOC]
                        [--custom_hmm_cutoffs_loc CUSTOM_HMM_CUTOFFS_LOC]
                        [--gtdb_taxonomy GTDB_TAXONOMY]
                        [--checkm_quality CHECKM_QUALITY] [--use_uniref]
                        [--use_vogdb] [--low_mem_mode] [--skip_trnascan]
                        [--keep_tmp_dir] [--threads THREADS] [--verbose]

options:
  -h, --help
        show this help message and exit
  -i INPUT_FASTA, --input_fasta INPUT_FASTA
        fasta file, optionally with wildcards to point to multiple fastas (default: None)
  -o OUTPUT_DIR, --output_dir OUTPUT_DIR
        output directory (default: None)
  --min_contig_size MIN_CONTIG_SIZE
        minimum contig size to be used for gene prediction (default: 2500)
  --prodigal_mode {train,meta,single}
        Mode of prodigal to use for gene calling. NOTE: normal or single mode require genomes which are high quality with low contamination and long contigs (average length >3 Kbp). (default: meta)
  --trans_table {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25}
        Translation table for prodigal to use for gene calling. (default: 11)
  --bit_score_threshold BIT_SCORE_THRESHOLD
        minimum bitScore of search to retain hits (default: 60)
  --rbh_bit_score_threshold RBH_BIT_SCORE_THRESHOLD
        minimum bitScore of reverse best hits to retain hits (default: 350)
  --kofam_use_dbcan2_thresholds
        Use dbcan2 suggested HMM cutoffs for KOfam annotation instead of KOfam recommended cutoffs. This will be ignored if annotating with KEGG Genes. (default: False)
  --custom_db_name CUSTOM_DB_NAME
        Names of custom databases, can be usedmultiple times. (default: None)
  --custom_fasta_loc CUSTOM_FASTA_LOC
        Location of fastas to annotate against, can be used multiple times butmust match nubmer of custom_db_name's (default: [])
  --custom_hmm_name CUSTOM_HMM_NAME
        Names of custom hmm databases, can be used multiple times. (default: [])
  --custom_hmm_loc CUSTOM_HMM_LOC
        Location of hmms to annotate against, can be used multiple times butmust match nubmer of custom_hmm_name's (default: [])
  --custom_hmm_cutoffs_loc CUSTOM_HMM_CUTOFFS_LOC
        Location of file with custom HMM cutoffs and descriptions, can be used multiple times. (default: [])
  --gtdb_taxonomy GTDB_TAXONOMY
        Summary file from gtdbtk taxonomy assignment from bins, can be used multipletimes (default: [])
  --checkm_quality CHECKM_QUALITY
        Summary of of checkM quality assessment from bins, can be used multiple times (default: [])
  --use_uniref
        Annotate these fastas against UniRef, drastically increases run time and memory requirements (default: False)
  --use_vogdb
        Annotate these fastas against VOGDB, drastically decreases run time (default: False)
  --low_mem_mode
        Skip annotating with uniref and use kofam instead of KEGG genes even if provided. Drastically decreases memory usage (default: False)
  --skip_trnascan
  --keep_tmp_dir
  --threads THREADS
        number of processors to use (default: 10)
  --verbose

DRAM.py annotate_genes -h
usage: DRAM.py annotate_genes [-h] -i INPUT_FAA [-o OUTPUT_DIR]
                              [--bit_score_threshold BIT_SCORE_THRESHOLD]
                              [--rbh_bit_score_threshold RBH_BIT_SCORE_THRESHOLD]
                              [--kofam_use_dbcan2_thresholds]
                              [--custom_db_name CUSTOM_DB_NAME]
                              [--custom_fasta_loc CUSTOM_FASTA_LOC]
                              [--custom_hmm_name CUSTOM_HMM_NAME]
                              [--custom_hmm_loc CUSTOM_HMM_LOC]
                              [--custom_hmm_cutoffs_loc CUSTOM_HMM_CUTOFFS_LOC]
                              [--use_uniref] [--low_mem_mode] [--keep_tmp_dir]
                              [--threads THREADS] [--verbose]

options:
  -h, --help
        show this help message and exit
  -i INPUT_FAA, --input_faa INPUT_FAA
        fasta file, optionally with wildcards to point to individual MAGs (default: None)
  -o OUTPUT_DIR, --output_dir OUTPUT_DIR
        output directory (default: None)
  --bit_score_threshold BIT_SCORE_THRESHOLD
        minimum bitScore of search to retain hits (default: 60)
  --rbh_bit_score_threshold RBH_BIT_SCORE_THRESHOLD
        minimum bitScore of reverse best hits to retain hits (default: 350)
  --kofam_use_dbcan2_thresholds
        Use dbcan2 suggested HMM cutoffs for KOfam annotation instead of KOfam recommended cutoffs. This will be ignored if annotating with KEGG Genes. (default: False)
  --custom_db_name CUSTOM_DB_NAME
        Names of custom databases, can be used multiple times. (default: [])
  --custom_fasta_loc CUSTOM_FASTA_LOC
        Location of fastas to annotate against, can be used multiple times butmust match nubmer of custom_db_name's (default: [])
  --custom_hmm_name CUSTOM_HMM_NAME
        Names of custom hmm databases, can be used multiple times. (default: [])
  --custom_hmm_loc CUSTOM_HMM_LOC
        Location of hmms to annotate against, can be used multiple times butmust match nubmer of custom_hmm_name's (default: [])
  --custom_hmm_cutoffs_loc CUSTOM_HMM_CUTOFFS_LOC
        Location of file with custom HMM cutoffs and descriptions, can be used multiple times. (default: [])
  --use_uniref
        Annotate these fastas against UniRef, drastically increases run time and memory requirements (default: False)
  --low_mem_mode
        Skip annotating with uniref and use kofam instead of KEGG genes even if provided. Drastically decreases memory usage (default: False)
  --keep_tmp_dir
  --threads THREADS
        number of processors to use (default: 10)
  --verbose

DRAM.py distill -h
usage: DRAM.py distill [-h] [-i INPUT_FILE] [-o OUTPUT_DIR]
                       [--rrna_path RRNA_PATH] [--trna_path TRNA_PATH]
                       [--groupby_column GROUPBY_COLUMN]
                       [--custom_distillate CUSTOM_DISTILLATE]
                       [--distillate_gene_names]
                       [--genomes_per_product GENOMES_PER_PRODUCT]

options:
  -h, --help
        show this help message and exit
  -i INPUT_FILE, --input_file INPUT_FILE
        Annotations path (default: None)
  -o OUTPUT_DIR, --output_dir OUTPUT_DIR
        Directory to write summarized genomes (default: None)
  --rrna_path RRNA_PATH
        rRNA output from annotation (default: None)
  --trna_path TRNA_PATH
        tRNA output from annotation (default: None)
  --groupby_column GROUPBY_COLUMN
        Column from annotations to group as organism units (default: fasta)
  --custom_distillate CUSTOM_DISTILLATE
        Custom distillate form to add your own modules (default: None)
  --distillate_gene_names
        Give names of genes instead of counts in genome metabolism summary (default: False)
  --genomes_per_product GENOMES_PER_PRODUCT
        Number of genomes per product.html output. Decrease value if getting JavaScript Error: Maximum call stack size exceeded when viewing product.html in browser. (default: 1000)

DRAM.py strainer -h
usage: DRAM.py strainer [-h] -i INPUT_ANNOTATIONS -f INPUT_FASTA
                        [-o OUTPUT_FASTA] [--fastas [FASTAS ...]]
                        [--scaffolds [SCAFFOLDS ...]] [--genes [GENES ...]]
                        [--identifiers [IDENTIFIERS ...]]
                        [--categories [CATEGORIES ...]]
                        [--custom_distillate CUSTOM_DISTILLATE]
                        [--taxonomy [TAXONOMY ...]]
                        [--completeness COMPLETENESS]
                        [--contamination CONTAMINATION]

options:
  -h, --help
        show this help message and exit

Input and output files:
  -i INPUT_ANNOTATIONS, --input_annotations INPUT_ANNOTATIONS
        annotations file to pull genes from (default: None)
  -f INPUT_FASTA, --input_fasta INPUT_FASTA
        fasta file to filter (default: None)
  -o OUTPUT_FASTA, --output_fasta OUTPUT_FASTA
        location to write filtered fasta (default: pull_genes.fasta)

Specific names to keep:
  --fastas [FASTAS ...]
        space separated list of fastas to keep (default: None)
  --scaffolds [SCAFFOLDS ...]
        space separated list of scaffolds to keep (default: None)
  --genes [GENES ...]
        space separated list of genes to keep (default: None)

Annotation filters:
  --identifiers [IDENTIFIERS ...]
        database identifiers to keep (default: None)
  --categories [CATEGORIES ...]
        distillate categories to keep genes from (default: None)
  --custom_distillate CUSTOM_DISTILLATE
        Custom distillate form to add your own modules (default: None)

DRAM based filters:
  --taxonomy [TAXONOMY ...]
        Level of GTDBTk taxonomy to keep (e.g. c__Clostridia), space separated list (default: None)
  --completeness COMPLETENESS
        Minimum completeness of genome to keep genes (default: None)
  --contamination CONTAMINATION
        Maximum contamination of genome to keep genes (default: None)

DRAM.py neighborhoods -h
usage: DRAM.py neighborhoods [-h] [-i INPUT_FILE] [-o OUTPUT_DIR]
                             [--genes [GENES ...]]
                             [--identifiers [IDENTIFIERS ...]]
                             [--custom_distillate CUSTOM_DISTILLATE]
                             [--categories CATEGORIES] [--genes_loc GENES_LOC]
                             [--scaffolds_loc SCAFFOLDS_LOC]
                             [--distance_genes DISTANCE_GENES]
                             [--distance_bp DISTANCE_BP]

options:
  -h, --help
        show this help message and exit
  -i INPUT_FILE, --input_file INPUT_FILE
        Annotations path (default: None)
  -o OUTPUT_DIR, --output_dir OUTPUT_DIR
        Directory to write gene neighborhoods (default: None)
  --genes [GENES ...]
        Gene names from DRAM to find neighborhoods around (default: None)
  --identifiers [IDENTIFIERS ...]
        Database identifiers assigned by DRAM to find neighborhoods around (default: None)
  --custom_distillate CUSTOM_DISTILLATE
        Custom distillate form to add your own modules (default: None)
  --categories CATEGORIES
        Distillate categories to build gene neighborhoods around. (default: None)
  --genes_loc GENES_LOC
        Location of genes.fna/genes.faa file to filter to neighborhoods (default: None)
  --scaffolds_loc SCAFFOLDS_LOC
        Location of scaffolds.fna file to filter to neighborhoods (default: None)
  --distance_genes DISTANCE_GENES
        Number of genes away from center to include in neighborhoods (default: None)
  --distance_bp DISTANCE_BP
        Number of genes away from center to include in neighborhoods (default: None)

DRAM.py merge_annotations -h
usage: DRAM.py merge_annotations [-h] [-i INPUT_DIRS] [-o OUTPUT_DIR]

options:
  -h, --help
        show this help message and exit
  -i INPUT_DIRS, --input_dirs INPUT_DIRS
        Path with wildcards pointing to DRAM annotation output directories (default: None)
  -o OUTPUT_DIR, --output_dir OUTPUT_DIR
        Path to output merged annotations files (default: None)
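
The help text above suggests a two-step use: annotate writes an output directory, and distill then summarizes the annotations found there. The sketch below illustrates that chaining under stated assumptions. The file names annotations.tsv, rrnas.tsv and trnas.tsv are assumptions about what annotate writes, so check them against your own output directory. The helpers run and annotate_then_distill are hypothetical. With DRY_RUN=1 (the default in this sketch) the commands are only printed, so it can be tried without loading the module.

```shell
#!/bin/bash
# DRY_RUN=1 prints each command; DRY_RUN=0 executes it.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical wrapper: echo the command in dry-run mode, else run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper chaining the two subcommands from the help text.
annotate_then_distill() {
    local fasta="$1" annot_dir="$2" distill_dir="$3" threads="$4"
    run DRAM.py annotate -i "$fasta" -o "$annot_dir" --threads "$threads"
    # Assumed file names inside the annotate output directory.
    run DRAM.py distill -i "$annot_dir/annotations.tsv" -o "$distill_dir" \
        --rrna_path "$annot_dir/rrnas.tsv" --trna_path "$annot_dir/trnas.tsv"
}

annotate_then_distill "seq.fasta" "annotation" "genome_summaries" 4
```

Inside a job script, setting DRY_RUN=0 after ml DRAM-bio/1.4.0 would execute the two commands instead of printing them. When passing a wildcard pattern to -i, quoting it (for example 'bins/*.fa', a hypothetical path) keeps the shell from expanding it before DRAM.py receives it.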