BLAST+-Sapelo2: Difference between revisions
Revision as of 12:53, 8 April 2022
Category
Bioinformatics
Program On
Sapelo2
Version
2.2.31, 2.9.0, 2.10.1, 2.11.0, 2.12.0
Author / Distributor
Description
Basic Local Alignment Search Tool, or BLAST, is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences. More information: http://blast.ncbi.nlm.nih.gov/
Running Program
Also refer to Running Jobs on Sapelo2
For more information on Environment Modules on Sapelo2, please see the Lmod page. To find all versions of BLAST+ installed on Sapelo2, use the module spider command, as shown below:
module spider BLAST+
NCBI databases are available on Sapelo2; please refer to Bioinformatics_Databases for more details. Datasets are located in the commonly shared "/db" filesystem. NCBI BLAST datasets are pre-formatted to work with BLAST and BLAST+, are located in "/db/ncbiblast/", and are organized by date.
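As a rough sketch of how to see which dated dataset directories exist, the helper below lists the sub-directories of "/db/ncbiblast/" and prints the most recent one. It assumes the date-based directories are named with eight digits (YYYYMMDD, by analogy with the ncbiblastdb/20220404 module name); that naming, and the function name latest_db_dir, are assumptions for illustration and not something this page confirms.

```shell
#!/bin/bash
# latest_db_dir [ROOT]: print the highest-sorting 8-digit sub-directory of ROOT.
# ROOT defaults to /db/ncbiblast; the YYYYMMDD naming is an assumption.
latest_db_dir() {
    local root="${1:-/db/ncbiblast}"
    find "$root" -mindepth 1 -maxdepth 1 -type d \
        -name '[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]' 2>/dev/null \
        | sort | tail -n 1
}

latest="$(latest_db_dir)"
echo "Most recent dated directory: ${latest:-none found}"
```

If nothing is printed for the directory, the layout differs from the assumption above; in that case browse "/db/ncbiblast/" directly or consult the Bioinformatics_Databases page.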
To use BLAST+ version 2.12.0, please first load the module, for example:
module load BLAST+/2.12.0-gompi-2020b
Example of a shell script sub.sh to run blastn on the batch partition:
#!/bin/bash
#SBATCH --job-name=j_BLAST+
#SBATCH --partition=batch
#SBATCH --mail-type=ALL
#SBATCH --mail-user=username@uga.edu
#SBATCH --ntasks=1
#SBATCH --mem=10gb
#SBATCH --time=08:00:00
#SBATCH --output=BLAST+.%j.out
#SBATCH --error=BLAST+.%j.err

cd $SLURM_SUBMIT_DIR

ml BLAST+/2.12.0-gompi-2020b
ml ncbiblastdb/20220404

blastn [options]

where [options] needs to be replaced by the options (command and arguments) you want to use. Other parameters of the job, such as the maximum wall clock time, maximum memory, the number of nodes and cores per node, and the job name, need to be modified appropriately as well.
Submit the job to the queue with
sbatch ./sub.sh
Running BLAST+ with multiple threads
Some BLAST+ commands, such as blastn, can use multiple threads via the -num_threads option. Each thread should run on its own core, so request one more CPU core than the value given to -num_threads: the worker threads plus one main process thread.
To request cores, use the Slurm header #SBATCH --cpus-per-task.
Below is an example with -num_threads 8, which therefore requests 9 cores:
#!/bin/bash
#SBATCH --job-name=j_BLAST+_multithread
#SBATCH --partition=batch
#SBATCH --mail-type=ALL
#SBATCH --mail-user=username@uga.edu
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=9
#SBATCH --mem=10gb
#SBATCH --time=08:00:00
#SBATCH --output=BLAST+.%j.out
#SBATCH --error=BLAST+.%j.err

cd $SLURM_SUBMIT_DIR

ml BLAST+/2.12.0-gompi-2020b
ml ncbiblastdb/20220404

blastn -num_threads 8 -query example.fasta -out results.out -db nt
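To avoid keeping the two numbers in sync by hand, the thread count can be derived from the core request inside the job script. The following is a minimal sketch of that arithmetic, not part of the official example: the helper name blast_threads is hypothetical, and it relies on Slurm setting SLURM_CPUS_PER_TASK when --cpus-per-task is requested.

```shell
#!/bin/bash
# blast_threads CPUS: print the -num_threads value for a given core count,
# using the rule cores = worker threads + 1 main process thread.
blast_threads() {
    local cpus="${1:-1}"
    if [ "$cpus" -gt 1 ]; then
        echo $(( cpus - 1 ))
    else
        echo 1   # with a single core, still run one worker thread
    fi
}

# SLURM_CPUS_PER_TASK is set inside a job that requested --cpus-per-task;
# default to 1 so the sketch also runs outside a job.
threads="$(blast_threads "${SLURM_CPUS_PER_TASK:-1}")"
echo "Using -num_threads ${threads}"

# Run the search only if blastn is on PATH and the example query file exists.
if command -v blastn >/dev/null 2>&1 && [ -f example.fasta ]; then
    blastn -num_threads "${threads}" -query example.fasta -out results.out -db nt
fi
```

With #SBATCH --cpus-per-task=9 this yields -num_threads 8, matching the script above; changing the core request then adjusts the thread count automatically.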
Documentation
jump to blastn; jump to blastp; jump to blastx; jump to makeblastdb; Back to Top
module load BLAST+/2.7.1-foss-2016b-Python-2.7.14
blastn -help

USAGE
  blastn [-h] [-help] [-import_search_strategy filename]
    [-export_search_strategy filename] [-task task_name] [-db database_name]
    [-dbsize num_letters] [-gilist filename] [-seqidlist filename]
    [-negative_gilist filename] [-negative_seqidlist filename]
    [-entrez_query entrez_query] [-db_soft_mask filtering_algorithm]
    [-db_hard_mask filtering_algorithm] [-subject subject_input_file]
    [-subject_loc range] [-query input_file] [-out output_file]
    [-evalue evalue] [-word_size int_value] [-gapopen open_penalty]
    [-gapextend extend_penalty] [-perc_identity float_value]
    [-qcov_hsp_perc float_value] [-max_hsps int_value]
    [-xdrop_ungap float_value] [-xdrop_gap float_value]
    [-xdrop_gap_final float_value] [-searchsp int_value]
    [-sum_stats bool_value] [-penalty penalty] [-reward reward] [-no_greedy]
    [-min_raw_gapped_score int_value] [-template_type type]
    [-template_length int_value] [-dust DUST_options]
    [-filtering_db filtering_database]
    [-window_masker_taxid window_masker_taxid]
    [-window_masker_db window_masker_db] [-soft_masking soft_masking]
    [-ungapped] [-culling_limit int_value] [-best_hit_overhang float_value]
    [-best_hit_score_edge float_value] [-window_size int_value]
    [-off_diagonal_range int_value] [-use_index boolean] [-index_name string]
    [-lcase_masking] [-query_loc range] [-strand strand] [-parse_deflines]
    [-outfmt format] [-show_gis] [-num_descriptions int_value]
    [-num_alignments int_value] [-line_length line_length] [-html]
    [-max_target_seqs num_sequences] [-num_threads int_value] [-remote]
    [-version]

DESCRIPTION
   Nucleotide-Nucleotide BLAST 2.7.1+

OPTIONAL ARGUMENTS
 -h
   Print USAGE and DESCRIPTION; ignore all other parameters
 -help
   Print USAGE, DESCRIPTION and ARGUMENTS; ignore all other parameters
 -version
   Print version number; ignore other arguments

 *** Input query options
 -query <File_In>
   Input file name
   Default = `-'
 -query_loc <String>
   Location on the query sequence in 1-based offsets (Format: start-stop)
 -strand <String, `both', `minus', `plus'>
   Query strand(s) to search against database/subject
   Default = `both'

 *** General search options
 -task <String, Permissible values: 'blastn' 'blastn-short' 'dc-megablast'
                'megablast' 'rmblastn' >
   Task to execute
   Default = `megablast'
 -db <String>
   BLAST database name
    * Incompatible with: subject, subject_loc
 -out <File_Out>
   Output file name
   Default = `-'
 -evalue <Real>
   Expectation value (E) threshold for saving hits
   Default = `10'
 -word_size <Integer, >=4>
   Word size for wordfinder algorithm (length of best perfect match)
 -gapopen <Integer>
   Cost to open a gap
 -gapextend <Integer>
   Cost to extend a gap
 -penalty <Integer, <=0>
   Penalty for a nucleotide mismatch
 -reward <Integer, >=0>
   Reward for a nucleotide match
 -use_index <Boolean>
   Use MegaBLAST database index
   Default = `false'
 -index_name <String>
   MegaBLAST database index name (deprecated; use only for old style indices)

 *** BLAST-2-Sequences options
 -subject <File_In>
   Subject sequence(s) to search
    * Incompatible with: db, gilist, seqidlist, negative_gilist,
   negative_seqidlist, db_soft_mask, db_hard_mask
 -subject_loc <String>
   Location on the subject sequence in 1-based offsets (Format: start-stop)
    * Incompatible with: db, gilist, seqidlist, negative_gilist,
   negative_seqidlist, db_soft_mask, db_hard_mask, remote

 *** Formatting options
 -outfmt <String>
   alignment view options:
     0 = Pairwise,
     1 = Query-anchored showing identities,
     2 = Query-anchored no identities,
     3 = Flat query-anchored showing identities,
     4 = Flat query-anchored no identities,
     5 = BLAST XML,
     6 = Tabular,
     7 = Tabular with comment lines,
     8 = Seqalign (Text ASN.1),
     9 = Seqalign (Binary ASN.1),
    10 = Comma-separated values,
    11 = BLAST archive (ASN.1),
    12 = Seqalign (JSON),
    13 = Multiple-file BLAST JSON,
    14 = Multiple-file BLAST XML2,
    15 = Single-file BLAST JSON,
    16 = Single-file BLAST XML2,
    17 = Sequence Alignment/Map (SAM),
    18 = Organism Report

   Options 6, 7, 10 and 17 can be additionally configured to produce
   a custom format specified by space delimited format specifiers.
   The supported format specifiers for options 6, 7 and 10 are:
        qseqid means Query Seq-id
        qgi means Query GI
        qacc means Query accesion
        qaccver means Query accesion.version
        qlen means Query sequence length
        sseqid means Subject Seq-id
        sallseqid means All subject Seq-id(s), separated by a ';'
        sgi means Subject GI
        sallgi means All subject GIs
        sacc means Subject accession
        saccver means Subject accession.version
        sallacc means All subject accessions
        slen means Subject sequence length
        qstart means Start of alignment in query
        qend means End of alignment in query
        sstart means Start of alignment in subject
        send means End of alignment in subject
        qseq means Aligned part of query sequence
        sseq means Aligned part of subject sequence
        evalue means Expect value
        bitscore means Bit score
        score means Raw score
        length means Alignment length
        pident means Percentage of identical matches
        nident means Number of identical matches
        mismatch means Number of mismatches
        positive means Number of positive-scoring matches
        gapopen means Number of gap openings
        gaps means Total number of gaps
        ppos means Percentage of positive-scoring matches
        frames means Query and subject frames separated by a '/'
        qframe means Query frame
        sframe means Subject frame
        btop means Blast traceback operations (BTOP)
        staxid means Subject Taxonomy ID
        ssciname means Subject Scientific Name
        scomname means Subject Common Name
        sblastname means Subject Blast Name
        sskingdom means Subject Super Kingdom
        staxids means unique Subject Taxonomy ID(s), separated by a ';'
                (in numerical order)
        sscinames means unique Subject Scientific Name(s), separated by a ';'
        scomnames means unique Subject Common Name(s), separated by a ';'
        sblastnames means unique Subject Blast Name(s), separated by a ';'
                (in alphabetical order)
        sskingdoms means unique Subject Super Kingdom(s), separated by a ';'
                (in alphabetical order)
        stitle means Subject Title
        salltitles means All Subject Title(s), separated by a '<>'
        sstrand means Subject Strand
        qcovs means Query Coverage Per Subject
        qcovhsp means Query Coverage Per HSP
        qcovus means Query Coverage Per Unique Subject (blastn only)
   When not provided, the default value is:
   'qaccver saccver pident length mismatch gapopen qstart qend sstart send
   evalue bitscore', which is equivalent to the keyword 'std'
   The supported format specifier for option 17 is:
        SQ means Include Sequence Data
        SR means Subject as Reference Seq
   Default = `0'
 -show_gis
   Show NCBI GIs in deflines?
 -num_descriptions <Integer, >=0>
   Number of database sequences to show one-line descriptions for
   Not applicable for outfmt > 4
   Default = `500'
    * Incompatible with: max_target_seqs
 -num_alignments <Integer, >=0>
   Number of database sequences to show alignments for
   Default = `250'
    * Incompatible with: max_target_seqs
 -line_length <Integer, >=1>
   Line length for formatting alignments
   Not applicable for outfmt > 4
   Default = `60'
 -html
   Produce HTML output?

 *** Query filtering options
 -dust <String>
   Filter query sequence with DUST (Format: 'yes', 'level window linker', or
   'no' to disable)
   Default = `20 64 1'
 -filtering_db <String>
   BLAST database containing filtering elements (i.e.: repeats)
 -window_masker_taxid <Integer>
   Enable WindowMasker filtering using a Taxonomic ID
 -window_masker_db <String>
   Enable WindowMasker filtering using this repeats database.
 -soft_masking <Boolean>
   Apply filtering locations as soft masks
   Default = `true'
 -lcase_masking
   Use lower case filtering in query and subject sequence(s)?

 *** Restrict search or results
 -gilist <String>
   Restrict search of database to list of GI's
    * Incompatible with: negative_gilist, seqidlist, negative_seqidlist,
   remote, subject, subject_loc
 -seqidlist <String>
   Restrict search of database to list of SeqId's
    * Incompatible with: gilist, negative_gilist, negative_seqidlist, remote,
   subject, subject_loc
 -negative_gilist <String>
   Restrict search of database to everything except the listed GIs
    * Incompatible with: gilist, seqidlist, remote, subject, subject_loc
 -negative_seqidlist <String>
   Restrict search of database to everything except the listed SeqIDs
    * Incompatible with: gilist, seqidlist, remote, subject, subject_loc
 -entrez_query <String>
   Restrict search with the given Entrez query
    * Requires: remote
 -db_soft_mask <String>
   Filtering algorithm ID to apply to the BLAST database as soft masking
    * Incompatible with: db_hard_mask, subject, subject_loc
 -db_hard_mask <String>
   Filtering algorithm ID to apply to the BLAST database as hard masking
    * Incompatible with: db_soft_mask, subject, subject_loc
 -perc_identity <Real, 0..100>
   Percent identity
 -qcov_hsp_perc <Real, 0..100>
   Percent query coverage per hsp
 -max_hsps <Integer, >=1>
   Set maximum number of HSPs per subject sequence to save for each query
 -culling_limit <Integer, >=0>
   If the query range of a hit is enveloped by that of at least this many
   higher-scoring hits, delete the hit
    * Incompatible with: best_hit_overhang, best_hit_score_edge
 -best_hit_overhang <Real, (>0 and <0.5)>
   Best Hit algorithm overhang value (recommended value: 0.1)
    * Incompatible with: culling_limit
 -best_hit_score_edge <Real, (>0 and <0.5)>
   Best Hit algorithm score edge value (recommended value: 0.1)
    * Incompatible with: culling_limit
 -max_target_seqs <Integer, >=1>
   Maximum number of aligned sequences to keep
   Not applicable for outfmt <= 4
   Default = `500'
    * Incompatible with: num_descriptions, num_alignments

 *** Discontiguous MegaBLAST options
 -template_type <String, `coding', `coding_and_optimal', `optimal'>
   Discontiguous MegaBLAST template type
    * Requires: template_length
 -template_length <Integer, Permissible values: '16' '18' '21' >
   Discontiguous MegaBLAST template length
    * Requires: template_type

 *** Statistical options
 -dbsize <Int8>
   Effective length of the database
 -searchsp <Int8, >=0>
   Effective length of the search space
 -sum_stats <Boolean>
   Use sum statistics

 *** Search strategy options
 -import_search_strategy <File_In>
   Search strategy to use
    * Incompatible with: export_search_strategy
 -export_search_strategy <File_Out>
   File name to record the search strategy used
    * Incompatible with: import_search_strategy

 *** Extension options
 -xdrop_ungap <Real>
   X-dropoff value (in bits) for ungapped extensions
 -xdrop_gap <Real>
   X-dropoff value (in bits) for preliminary gapped extensions
 -xdrop_gap_final <Real>
   X-dropoff value (in bits) for final gapped alignment
 -no_greedy
   Use non-greedy dynamic programming extension
 -min_raw_gapped_score <Integer>
   Minimum raw gapped score to keep an alignment in the preliminary gapped and
   traceback stages
 -ungapped
   Perform ungapped alignment only?
 -window_size <Integer, >=0>
   Multiple hits window size, use 0 to specify 1-hit algorithm
 -off_diagonal_range <Integer, >=0>
   Number of off-diagonals to search for the 2nd hit, use 0 to turn off
   Default = `0'

 *** Miscellaneous options
 -parse_deflines
   Should the query and subject defline(s) be parsed?
 -num_threads <Integer, (>=1 and =<48)>
   Number of threads (CPUs) to use in the BLAST search
   Default = `1'
    * Incompatible with: remote
 -remote
   Execute search remotely?
    * Incompatible with: gilist, seqidlist, negative_gilist,
   negative_seqidlist, subject_loc, num_threads
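The tabular formats described under -outfmt lend themselves to simple post-processing. As an illustrative sketch only, the helper below filters the default 12-column output of -outfmt 6 (the 'std' layout listed above: qaccver saccver pident length ...) by percent identity and alignment length. The function name filter_hits, the thresholds, and the file name results.tsv are hypothetical.

```shell
#!/bin/bash
# filter_hits MIN_PIDENT MIN_LENGTH < tabular_blast_output
# Assumes tab-separated -outfmt 6 output in the default 'std' column order:
# qaccver saccver pident length mismatch gapopen qstart qend sstart send evalue bitscore
filter_hits() {
    local min_pident="$1" min_length="$2"
    # Column 3 is pident, column 4 is the alignment length.
    awk -F'\t' -v p="$min_pident" -v l="$min_length" \
        '$3 + 0 >= p + 0 && $4 + 0 >= l + 0'
}

# Example use on a hypothetical results file produced with -outfmt 6:
if [ -f results.tsv ]; then
    filter_hits 90 100 < results.tsv > results.filtered.tsv
fi
```

If a custom specifier list is passed to -outfmt, the column numbers in the awk expression need to be adjusted to wherever pident and length appear in that list.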
module load BLAST+/2.7.1-foss-2016b-Python-2.7.14
blastp -help

USAGE
  blastp [-h] [-help] [-import_search_strategy filename]
    [-export_search_strategy filename] [-task task_name] [-db database_name]
    [-dbsize num_letters] [-gilist filename] [-seqidlist filename]
    [-negative_gilist filename] [-negative_seqidlist filename]
    [-entrez_query entrez_query] [-db_soft_mask filtering_algorithm]
    [-db_hard_mask filtering_algorithm] [-subject subject_input_file]
    [-subject_loc range] [-query input_file] [-out output_file]
    [-evalue evalue] [-word_size int_value] [-gapopen open_penalty]
    [-gapextend extend_penalty] [-qcov_hsp_perc float_value]
    [-max_hsps int_value] [-xdrop_ungap float_value] [-xdrop_gap float_value]
    [-xdrop_gap_final float_value] [-searchsp int_value]
    [-sum_stats bool_value] [-seg SEG_options] [-soft_masking soft_masking]
    [-matrix matrix_name] [-threshold float_value] [-culling_limit int_value]
    [-best_hit_overhang float_value] [-best_hit_score_edge float_value]
    [-window_size int_value] [-lcase_masking] [-query_loc range]
    [-parse_deflines] [-outfmt format] [-show_gis]
    [-num_descriptions int_value] [-num_alignments int_value]
    [-line_length line_length] [-html] [-max_target_seqs num_sequences]
    [-num_threads int_value] [-ungapped] [-remote] [-comp_based_stats compo]
    [-use_sw_tback] [-version]

DESCRIPTION
   Protein-Protein BLAST 2.7.1+

OPTIONAL ARGUMENTS
 -h
   Print USAGE and DESCRIPTION; ignore all other parameters
 -help
   Print USAGE, DESCRIPTION and ARGUMENTS; ignore all other parameters
 -version
   Print version number; ignore other arguments

 *** Input query options
 -query <File_In>
   Input file name
   Default = `-'
 -query_loc <String>
   Location on the query sequence in 1-based offsets (Format: start-stop)

 *** General search options
 -task <String, Permissible values: 'blastp' 'blastp-fast' 'blastp-short' >
   Task to execute
   Default = `blastp'
 -db <String>
   BLAST database name
    * Incompatible with: subject, subject_loc
 -out <File_Out>
   Output file name
   Default = `-'
 -evalue <Real>
   Expectation value (E) threshold for saving hits
   Default = `10'
 -word_size <Integer, >=2>
   Word size for wordfinder algorithm
 -gapopen <Integer>
   Cost to open a gap
 -gapextend <Integer>
   Cost to extend a gap
 -matrix <String>
   Scoring matrix name (normally BLOSUM62)
 -threshold <Real, >=0>
   Minimum word score such that the word is added to the BLAST lookup table
 -comp_based_stats <String>
   Use composition-based statistics:
       D or d: default (equivalent to 2 )
       0 or F or f: No composition-based statistics
       1: Composition-based statistics as in NAR 29:2994-3005, 2001
       2 or T or t : Composition-based score adjustment as in Bioinformatics
   21:902-911,
       2005, conditioned on sequence properties
       3: Composition-based score adjustment as in Bioinformatics 21:902-911,
       2005, unconditionally
   Default = `2'

 *** BLAST-2-Sequences options
 -subject <File_In>
   Subject sequence(s) to search
    * Incompatible with: db, gilist, seqidlist, negative_gilist,
   negative_seqidlist, db_soft_mask, db_hard_mask
 -subject_loc <String>
   Location on the subject sequence in 1-based offsets (Format: start-stop)
    * Incompatible with: db, gilist, seqidlist, negative_gilist,
   negative_seqidlist, db_soft_mask, db_hard_mask, remote

 *** Formatting options
 -outfmt <String>
   alignment view options:
     0 = Pairwise,
     1 = Query-anchored showing identities,
     2 = Query-anchored no identities,
     3 = Flat query-anchored showing identities,
     4 = Flat query-anchored no identities,
     5 = BLAST XML,
     6 = Tabular,
     7 = Tabular with comment lines,
     8 = Seqalign (Text ASN.1),
     9 = Seqalign (Binary ASN.1),
    10 = Comma-separated values,
    11 = BLAST archive (ASN.1),
    12 = Seqalign (JSON),
    13 = Multiple-file BLAST JSON,
    14 = Multiple-file BLAST XML2,
    15 = Single-file BLAST JSON,
    16 = Single-file BLAST XML2,
    18 = Organism Report

   Options 6, 7 and 10 can be additionally configured to produce
   a custom format specified by space delimited format specifiers.
   The supported format specifiers are:
        qseqid means Query Seq-id
        qgi means Query GI
        qacc means Query accesion
        qaccver means Query accesion.version
        qlen means Query sequence length
        sseqid means Subject Seq-id
        sallseqid means All subject Seq-id(s), separated by a ';'
        sgi means Subject GI
        sallgi means All subject GIs
        sacc means Subject accession
        saccver means Subject accession.version
        sallacc means All subject accessions
        slen means Subject sequence length
        qstart means Start of alignment in query
        qend means End of alignment in query
        sstart means Start of alignment in subject
        send means End of alignment in subject
        qseq means Aligned part of query sequence
        sseq means Aligned part of subject sequence
        evalue means Expect value
        bitscore means Bit score
        score means Raw score
        length means Alignment length
        pident means Percentage of identical matches
        nident means Number of identical matches
        mismatch means Number of mismatches
        positive means Number of positive-scoring matches
        gapopen means Number of gap openings
        gaps means Total number of gaps
        ppos means Percentage of positive-scoring matches
        frames means Query and subject frames separated by a '/'
        qframe means Query frame
        sframe means Subject frame
        btop means Blast traceback operations (BTOP)
        staxid means Subject Taxonomy ID
        ssciname means Subject Scientific Name
        scomname means Subject Common Name
        sblastname means Subject Blast Name
        sskingdom means Subject Super Kingdom
        staxids means unique Subject Taxonomy ID(s), separated by a ';'
                (in numerical order)
        sscinames means unique Subject Scientific Name(s), separated by a ';'
        scomnames means unique Subject Common Name(s), separated by a ';'
        sblastnames means unique Subject Blast Name(s), separated by a ';'
                (in alphabetical order)
        sskingdoms means unique Subject Super Kingdom(s), separated by a ';'
                (in alphabetical order)
        stitle means Subject Title
        salltitles means All Subject Title(s), separated by a '<>'
        sstrand means Subject Strand
        qcovs means Query Coverage Per Subject
        qcovhsp means Query Coverage Per HSP
        qcovus means Query Coverage Per Unique Subject (blastn only)
   When not provided, the default value is:
   'qaccver saccver pident length mismatch gapopen qstart qend sstart send
   evalue bitscore', which is equivalent to the keyword 'std'
   Default = `0'
 -show_gis
   Show NCBI GIs in deflines?
 -num_descriptions <Integer, >=0>
   Number of database sequences to show one-line descriptions for
   Not applicable for outfmt > 4
   Default = `500'
    * Incompatible with: max_target_seqs
 -num_alignments <Integer, >=0>
   Number of database sequences to show alignments for
   Default = `250'
    * Incompatible with: max_target_seqs
 -line_length <Integer, >=1>
   Line length for formatting alignments
   Not applicable for outfmt > 4
   Default = `60'
 -html
   Produce HTML output?

 *** Query filtering options
 -seg <String>
   Filter query sequence with SEG (Format: 'yes', 'window locut hicut', or
   'no' to disable)
   Default = `no'
 -soft_masking <Boolean>
   Apply filtering locations as soft masks
   Default = `false'
 -lcase_masking
   Use lower case filtering in query and subject sequence(s)?

 *** Restrict search or results
 -gilist <String>
   Restrict search of database to list of GI's
    * Incompatible with: negative_gilist, seqidlist, negative_seqidlist,
   remote, subject, subject_loc
 -seqidlist <String>
   Restrict search of database to list of SeqId's
    * Incompatible with: gilist, negative_gilist, negative_seqidlist, remote,
   subject, subject_loc
 -negative_gilist <String>
   Restrict search of database to everything except the listed GIs
    * Incompatible with: gilist, seqidlist, remote, subject, subject_loc
 -negative_seqidlist <String>
   Restrict search of database to everything except the listed SeqIDs
    * Incompatible with: gilist, seqidlist, remote, subject, subject_loc
 -entrez_query <String>
   Restrict search with the given Entrez query
    * Requires: remote
 -db_soft_mask <String>
   Filtering algorithm ID to apply to the BLAST database as soft masking
    * Incompatible with: db_hard_mask, subject, subject_loc
 -db_hard_mask <String>
   Filtering algorithm ID to apply to the BLAST database as hard masking
    * Incompatible with: db_soft_mask, subject, subject_loc
 -qcov_hsp_perc <Real, 0..100>
   Percent query coverage per hsp
 -max_hsps <Integer, >=1>
   Set maximum number of HSPs per subject sequence to save for each query
 -culling_limit <Integer, >=0>
   If the query range of a hit is enveloped by that of at least this many
   higher-scoring hits, delete the hit
    * Incompatible with: best_hit_overhang, best_hit_score_edge
 -best_hit_overhang <Real, (>0 and <0.5)>
   Best Hit algorithm overhang value (recommended value: 0.1)
    * Incompatible with: culling_limit
 -best_hit_score_edge <Real, (>0 and <0.5)>
   Best Hit algorithm score edge value (recommended value: 0.1)
    * Incompatible with: culling_limit
 -max_target_seqs <Integer, >=1>
   Maximum number of aligned sequences to keep
   Not applicable for outfmt <= 4
   Default = `500'
    * Incompatible with: num_descriptions, num_alignments

 *** Statistical options
 -dbsize <Int8>
   Effective length of the database
 -searchsp <Int8, >=0>
   Effective length of the search space
 -sum_stats <Boolean>
   Use sum statistics

 *** Search strategy options
 -import_search_strategy <File_In>
   Search strategy to use
    * Incompatible with: export_search_strategy
 -export_search_strategy <File_Out>
   File name to record the search strategy used
    * Incompatible with: import_search_strategy

 *** Extension options
 -xdrop_ungap <Real>
   X-dropoff value (in bits) for ungapped extensions
 -xdrop_gap <Real>
   X-dropoff value (in bits) for preliminary gapped extensions
 -xdrop_gap_final <Real>
   X-dropoff value (in bits) for final gapped alignment
 -window_size <Integer, >=0>
   Multiple hits window size, use 0 to specify 1-hit algorithm
 -ungapped
   Perform ungapped alignment only?

 *** Miscellaneous options
 -parse_deflines
   Should the query and subject defline(s) be parsed?
 -num_threads <Integer, (>=1 and =<48)>
   Number of threads (CPUs) to use in the BLAST search
   Default = `1'
    * Incompatible with: remote
 -remote
   Execute search remotely?
    * Incompatible with: gilist, seqidlist, negative_gilist,
   negative_seqidlist, subject_loc, num_threads
 -use_sw_tback
   Compute locally optimal Smith-Waterman alignments?
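To show how several of the blastp options listed above combine, here is a hedged sketch that assembles a command line in a bash array, prints it for review, and runs it only if blastp is available. The query file, database name, and output file are placeholders chosen for illustration, and the thresholds are arbitrary rather than recommended values. Note that -max_target_seqs is used with a tabular format, since the help text marks it as not applicable for outfmt <= 4 and incompatible with -num_descriptions and -num_alignments.

```shell
#!/bin/bash
# Placeholder names (assumptions): replace with your own query, database, and output.
query="proteins.fasta"
db="nr"
out="proteins_vs_nr.tsv"

# Assemble the command as an array so arguments with spaces stay intact.
cmd=(blastp
     -query "$query"
     -db "$db"
     -out "$out"
     -evalue 1e-5
     -max_target_seqs 10
     -outfmt "6 qaccver saccver pident length evalue bitscore stitle")

# Print the command, shell-quoted, so it can be checked before running.
printf '%q ' "${cmd[@]}"
echo

# Execute only when the BLAST+ module is loaded and the query file exists.
if command -v blastp >/dev/null 2>&1 && [ -f "$query" ]; then
    "${cmd[@]}"
fi
```

Keeping the options in an array makes it straightforward to append further flags, for example -num_threads, without re-quoting the whole command.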
module load BLAST+/2.7.1-foss-2016b-Python-2.7.14 blastx -help USAGE blastx [-h] [-help] [-import_search_strategy filename] [-export_search_strategy filename] [-task task_name] [-db database_name] [-dbsize num_letters] [-gilist filename] [-seqidlist filename] [-negative_gilist filename] [-negative_seqidlist filename] [-entrez_query entrez_query] [-db_soft_mask filtering_algorithm] [-db_hard_mask filtering_algorithm] [-subject subject_input_file] [-subject_loc range] [-query input_file] [-out output_file] [-evalue evalue] [-word_size int_value] [-gapopen open_penalty] [-gapextend extend_penalty] [-qcov_hsp_perc float_value] [-max_hsps int_value] [-xdrop_ungap float_value] [-xdrop_gap float_value] [-xdrop_gap_final float_value] [-searchsp int_value] [-sum_stats bool_value] [-max_intron_length length] [-seg SEG_options] [-soft_masking soft_masking] [-matrix matrix_name] [-threshold float_value] [-culling_limit int_value] [-best_hit_overhang float_value] [-best_hit_score_edge float_value] [-window_size int_value] [-ungapped] [-lcase_masking] [-query_loc range] [-strand strand] [-parse_deflines] [-query_gencode int_value] [-outfmt format] [-show_gis] [-num_descriptions int_value] [-num_alignments int_value] [-line_length line_length] [-html] [-max_target_seqs num_sequences] [-num_threads int_value] [-remote] [-comp_based_stats compo] [-use_sw_tback] [-version] DESCRIPTION Translated Query-Protein Subject BLAST 2.7.1+ OPTIONAL ARGUMENTS -h Print USAGE and DESCRIPTION; ignore all other parameters -help Print USAGE, DESCRIPTION and ARGUMENTS; ignore all other parameters -version Print version number; ignore other arguments *** Input query options -query <File_In> Input file name Default = `-' -query_loc <String> Location on the query sequence in 1-based offsets (Format: start-stop) -strand <String, `both', `minus', `plus'> Query strand(s) to search against database/subject Default = `both' -query_gencode <Integer, values between: 1-6, 9-16, 21-25> Genetic code to use to 
translate query (see user manual for details) Default = `1' *** General search options -task <String, Permissible values: 'blastx' 'blastx-fast' > Task to execute Default = `blastx' -db <String> BLAST database name * Incompatible with: subject, subject_loc -out <File_Out> Output file name Default = `-' -evalue <Real> Expectation value (E) threshold for saving hits Default = `10' -word_size <Integer, >=2> Word size for wordfinder algorithm -gapopen <Integer> Cost to open a gap -gapextend <Integer> Cost to extend a gap -max_intron_length <Integer, >=0> Length of the largest intron allowed in a translated nucleotide sequence when linking multiple distinct alignments Default = `0' -matrix <String> Scoring matrix name (normally BLOSUM62) -threshold <Real, >=0> Minimum word score such that the word is added to the BLAST lookup table -comp_based_stats <String> Use composition-based statistics: D or d: default (equivalent to 2 ) 0 or F or f: No composition-based statistics 1: Composition-based statistics as in NAR 29:2994-3005, 2001 2 or T or t : Composition-based score adjustment as in Bioinformatics 21:902-911, 2005, conditioned on sequence properties 3: Composition-based score adjustment as in Bioinformatics 21:902-911, 2005, unconditionally Default = `2' *** BLAST-2-Sequences options -subject <File_In> Subject sequence(s) to search * Incompatible with: db, gilist, seqidlist, negative_gilist, negative_seqidlist, db_soft_mask, db_hard_mask -subject_loc <String> Location on the subject sequence in 1-based offsets (Format: start-stop) * Incompatible with: db, gilist, seqidlist, negative_gilist, negative_seqidlist, db_soft_mask, db_hard_mask, remote *** Formatting options -outfmt <String> alignment view options: 0 = Pairwise, 1 = Query-anchored showing identities, 2 = Query-anchored no identities, 3 = Flat query-anchored showing identities, 4 = Flat query-anchored no identities, 5 = BLAST XML, 6 = Tabular, 7 = Tabular with comment lines, 8 = Seqalign (Text ASN.1), 9 = 
Seqalign (Binary ASN.1), 10 = Comma-separated values, 11 = BLAST archive (ASN.1), 12 = Seqalign (JSON), 13 = Multiple-file BLAST JSON, 14 = Multiple-file BLAST XML2, 15 = Single-file BLAST JSON, 16 = Single-file BLAST XML2, 18 = Organism Report Options 6, 7 and 10 can be additionally configured to produce a custom format specified by space delimited format specifiers. The supported format specifiers are: qseqid means Query Seq-id qgi means Query GI qacc means Query accesion qaccver means Query accesion.version qlen means Query sequence length sseqid means Subject Seq-id sallseqid means All subject Seq-id(s), separated by a ';' sgi means Subject GI sallgi means All subject GIs sacc means Subject accession saccver means Subject accession.version sallacc means All subject accessions slen means Subject sequence length qstart means Start of alignment in query qend means End of alignment in query sstart means Start of alignment in subject send means End of alignment in subject qseq means Aligned part of query sequence sseq means Aligned part of subject sequence evalue means Expect value bitscore means Bit score score means Raw score length means Alignment length pident means Percentage of identical matches nident means Number of identical matches mismatch means Number of mismatches positive means Number of positive-scoring matches gapopen means Number of gap openings gaps means Total number of gaps ppos means Percentage of positive-scoring matches frames means Query and subject frames separated by a '/' qframe means Query frame sframe means Subject frame btop means Blast traceback operations (BTOP) staxid means Subject Taxonomy ID ssciname means Subject Scientific Name scomname means Subject Common Name sblastname means Subject Blast Name sskingdom means Subject Super Kingdom staxids means unique Subject Taxonomy ID(s), separated by a ';' (in numerical order) sscinames means unique Subject Scientific Name(s), separated by a ';' scomnames means unique Subject Common 
Name(s), separated by a ';'
   	 sblastnames means unique Subject Blast Name(s), separated by a ';'
   	   	 (in alphabetical order)
   	 sskingdoms means unique Subject Super Kingdom(s), separated by a ';'
   	   	 (in alphabetical order)
   	 stitle means Subject Title
   	 salltitles means All Subject Title(s), separated by a '<>'
   	 sstrand means Subject Strand
   	 qcovs means Query Coverage Per Subject
   	 qcovhsp means Query Coverage Per HSP
   	 qcovus means Query Coverage Per Unique Subject (blastn only)
   When not provided, the default value is:
   'qaccver saccver pident length mismatch gapopen qstart qend sstart send
   evalue bitscore', which is equivalent to the keyword 'std'
   Default = `0'
 -show_gis
   Show NCBI GIs in deflines?
 -num_descriptions <Integer, >=0>
   Number of database sequences to show one-line descriptions for
   Not applicable for outfmt > 4
   Default = `500'
    * Incompatible with:  max_target_seqs
 -num_alignments <Integer, >=0>
   Number of database sequences to show alignments for
   Default = `250'
    * Incompatible with:  max_target_seqs
 -line_length <Integer, >=1>
   Line length for formatting alignments
   Not applicable for outfmt > 4
   Default = `60'
 -html
   Produce HTML output?

 *** Query filtering options
 -seg <String>
   Filter query sequence with SEG (Format: 'yes', 'window locut hicut', or
   'no' to disable)
   Default = `12 2.2 2.5'
 -soft_masking <Boolean>
   Apply filtering locations as soft masks
   Default = `false'
 -lcase_masking
   Use lower case filtering in query and subject sequence(s)?

 *** Restrict search or results
 -gilist <String>
   Restrict search of database to list of GI's
    * Incompatible with:  negative_gilist, seqidlist, negative_seqidlist,
   remote, subject, subject_loc
 -seqidlist <String>
   Restrict search of database to list of SeqId's
    * Incompatible with:  gilist, negative_gilist, negative_seqidlist,
   remote, subject, subject_loc
 -negative_gilist <String>
   Restrict search of database to everything except the listed GIs
    * Incompatible with:  gilist, seqidlist, remote, subject, subject_loc
 -negative_seqidlist <String>
   Restrict search of database to everything except the listed SeqIDs
    * Incompatible with:  gilist, seqidlist, remote, subject, subject_loc
 -entrez_query <String>
   Restrict search with the given Entrez query
    * Requires:  remote
 -db_soft_mask <String>
   Filtering algorithm ID to apply to the BLAST database as soft masking
    * Incompatible with:  db_hard_mask, subject, subject_loc
 -db_hard_mask <String>
   Filtering algorithm ID to apply to the BLAST database as hard masking
    * Incompatible with:  db_soft_mask, subject, subject_loc
 -qcov_hsp_perc <Real, 0..100>
   Percent query coverage per hsp
 -max_hsps <Integer, >=1>
   Set maximum number of HSPs per subject sequence to save for each query
 -culling_limit <Integer, >=0>
   If the query range of a hit is enveloped by that of at least this many
   higher-scoring hits, delete the hit
    * Incompatible with:  best_hit_overhang, best_hit_score_edge
 -best_hit_overhang <Real, (>0 and <0.5)>
   Best Hit algorithm overhang value (recommended value: 0.1)
    * Incompatible with:  culling_limit
 -best_hit_score_edge <Real, (>0 and <0.5)>
   Best Hit algorithm score edge value (recommended value: 0.1)
    * Incompatible with:  culling_limit
 -max_target_seqs <Integer, >=1>
   Maximum number of aligned sequences to keep
   Not applicable for outfmt <= 4
   Default = `500'
    * Incompatible with:  num_descriptions, num_alignments

 *** Statistical options
 -dbsize <Int8>
   Effective length of the database
 -searchsp <Int8, >=0>
   Effective length of the search space
 -sum_stats <Boolean>
   Use sum statistics

 *** Search strategy options
 -import_search_strategy <File_In>
   Search strategy to use
    * Incompatible with:  export_search_strategy
 -export_search_strategy <File_Out>
   File name to record the search strategy used
    * Incompatible with:  import_search_strategy

 *** Extension options
 -xdrop_ungap <Real>
   X-dropoff value (in bits) for ungapped extensions
 -xdrop_gap <Real>
   X-dropoff value (in bits) for preliminary gapped extensions
 -xdrop_gap_final <Real>
   X-dropoff value (in bits) for final gapped alignment
 -window_size <Integer, >=0>
   Multiple hits window size, use 0 to specify 1-hit algorithm
 -ungapped
   Perform ungapped alignment only?

 *** Miscellaneous options
 -parse_deflines
   Should the query and subject defline(s) be parsed?
 -num_threads <Integer, (>=1 and =<48)>
   Number of threads (CPUs) to use in the BLAST search
   Default = `1'
    * Incompatible with:  remote
 -remote
   Execute search remotely?
    * Incompatible with:  gilist, seqidlist, negative_gilist,
   negative_seqidlist, subject_loc, num_threads
 -use_sw_tback
   Compute locally optimal Smith-Waterman alignments?
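As an illustration of how several of the options listed above combine, the following is a minimal sketch rather than a recommended configuration. It uses blastx as the example program and requests tabular output (format 6) with the standard columns plus qcovs and stitle. The file names query.fa and results.tsv, the database name nr, and the cutoff values are hypothetical placeholders; it is also assumed that the database module makes nr resolvable by name and that the job requested its cores with #SBATCH --cpus-per-task.

```shell
#!/bin/bash
# Sketch: assemble a tabular-output blastx command from the options above.
# query.fa, nr, and results.tsv are hypothetical placeholders.

# Use the cores Slurm granted to the job; fall back to 1 outside a job.
threads="${SLURM_CPUS_PER_TASK:-1}"

blast_cmd=(
  blastx
  -query query.fa
  -db nr
  -outfmt "6 std qcovs stitle"   # 'std' expands to the 12 default columns
  -max_target_seqs 10            # only valid with outfmt > 4
  -evalue 1e-5
  -num_threads "$threads"        # cannot be combined with -remote
  -out results.tsv
)

# Print the assembled command for review.
printf '%q ' "${blast_cmd[@]}"; echo

# Run it only if BLAST+ is loaded and the query file exists.
if command -v blastx >/dev/null 2>&1 && [[ -f query.fa ]]; then
  "${blast_cmd[@]}"
fi
```

Note that -max_target_seqs is listed as incompatible with -num_descriptions and -num_alignments, so the sketch uses only the former together with a tabular format.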
module load BLAST+/2.7.1-foss-2016b-Python-2.7.14
makeblastdb -help

USAGE
  makeblastdb [-h] [-help] [-in input_file] [-input_type type]
    -dbtype molecule_type [-title database_title] [-parse_seqids]
    [-hash_index] [-mask_data mask_data_files] [-mask_id mask_algo_ids]
    [-mask_desc mask_algo_descriptions] [-gi_mask]
    [-gi_mask_name gi_based_mask_names] [-out database_name]
    [-max_file_sz number_of_bytes] [-logfile File_Name] [-taxid TaxID]
    [-taxid_map TaxIDMapFile] [-version]

DESCRIPTION
   Application to create BLAST databases, version 2.7.1+

REQUIRED ARGUMENTS
 -dbtype <String, `nucl', `prot'>
   Molecule type of target db

OPTIONAL ARGUMENTS
 -h
   Print USAGE and DESCRIPTION;  ignore all other parameters
 -help
   Print USAGE, DESCRIPTION and ARGUMENTS; ignore all other parameters
 -version
   Print version number;  ignore other arguments

 *** Input options
 -in <File_In>
   Input file/database name
   Default = `-'
 -input_type <String, `asn1_bin', `asn1_txt', `blastdb', `fasta'>
   Type of the data specified in input_file
   Default = `fasta'

 *** Configuration options
 -title <String>
   Title for BLAST database
   Default = input file name provided to -in argument
 -parse_seqids
   Option to parse seqid for FASTA input if set, for all other input types
   seqids are parsed automatically
 -hash_index
   Create index of sequence hash values.

 *** Sequence masking options
 -mask_data <String>
   Comma-separated list of input files containing masking data as produced by
   NCBI masking applications (e.g. dustmasker, segmasker, windowmasker)
 -mask_id <String>
   Comma-separated list of strings to uniquely identify the masking algorithm
    * Requires:  mask_data
    * Incompatible with:  gi_mask
 -mask_desc <String>
   Comma-separated list of free form strings to describe the masking algorithm
   details
    * Requires:  mask_id
 -gi_mask
   Create GI indexed masking data.
    * Requires:  parse_seqids
    * Incompatible with:  mask_id
 -gi_mask_name <String>
   Comma-separated list of masking data output files.
    * Requires:  mask_data, gi_mask

 *** Output options
 -out <String>
   Name of BLAST database to be created
   Default = input file name provided to -in argument
   Required if multiple file(s)/database(s) are provided as input
 -max_file_sz <String>
   Maximum file size for BLAST database files
   Default = `1GB'
 -logfile <File_Out>
   File to which the program log should be redirected

 *** Taxonomy options
 -taxid <Integer, >=0>
   Taxonomy ID to assign to all sequences
    * Incompatible with:  taxid_map
 -taxid_map <File_In>
   Text file mapping sequence IDs to taxonomy IDs.
   Format:<SequenceId> <TaxonomyId><newline>
    * Requires:  parse_seqids
    * Incompatible with:  taxid
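To show how the makeblastdb arguments above fit together, here is a minimal sketch of building a nucleotide database from a FASTA file. The input file my_sequences.fasta, the title, and the output name my_db are hypothetical placeholders, and the option choices are only one reasonable combination.

```shell
#!/bin/bash
# Sketch: build a custom nucleotide BLAST database from a FASTA file.
# my_sequences.fasta and my_db are hypothetical placeholders.

makeblastdb_cmd=(
  makeblastdb
  -in my_sequences.fasta
  -dbtype nucl              # required: 'nucl' or 'prot'
  -parse_seqids             # keep sequence IDs from the FASTA deflines
  -title "my_sequences"
  -out my_db
)

# Print the assembled command for review.
printf '%q ' "${makeblastdb_cmd[@]}"; echo

# Run it only if BLAST+ is loaded and the input file exists.
if command -v makeblastdb >/dev/null 2>&1 && [[ -f my_sequences.fasta ]]; then
  "${makeblastdb_cmd[@]}"
fi
```

The resulting database could then be searched by passing -db my_db to blastn from the directory where it was created. For a protein FASTA file, -dbtype prot would be used instead.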