Difference between revisions of "BLAST+-Sapelo2"

From Research Computing Center Wiki
Line 91: Line 91:
 
<span id="blastn"></span>
 
<span id="blastn"></span>
 
<pre  class="gcommand">  
 
<pre  class="gcommand">  
module load BLAST+/2.7.1-foss-2016b-Python-2.7.14
+
module load BLAST+/2.12.0-gompi-2020b
 
blastn  -help
 
blastn  -help
 
USAGE
 
USAGE
Line 98: Line 98:
 
     [-dbsize num_letters] [-gilist filename] [-seqidlist filename]
 
     [-dbsize num_letters] [-gilist filename] [-seqidlist filename]
 
     [-negative_gilist filename] [-negative_seqidlist filename]
 
     [-negative_gilist filename] [-negative_seqidlist filename]
     [-entrez_query entrez_query] [-db_soft_mask filtering_algorithm]
+
     [-taxids taxids] [-negative_taxids taxids] [-taxidlist filename]
    [-db_hard_mask filtering_algorithm] [-subject subject_input_file]
+
    [-negative_taxidlist filename] [-entrez_query entrez_query]
    [-subject_loc range] [-query input_file] [-out output_file]
+
    [-db_soft_mask filtering_algorithm] [-db_hard_mask filtering_algorithm]
    [-evalue evalue] [-word_size int_value] [-gapopen open_penalty]
+
    [-subject subject_input_file] [-subject_loc range] [-query input_file]
    [-gapextend extend_penalty] [-perc_identity float_value]
+
    [-out output_file] [-evalue evalue] [-word_size int_value]
    [-qcov_hsp_perc float_value] [-max_hsps int_value]
+
    [-gapopen open_penalty] [-gapextend extend_penalty]
    [-xdrop_ungap float_value] [-xdrop_gap float_value]
+
    [-perc_identity float_value] [-qcov_hsp_perc float_value]
 +
    [-max_hsps int_value] [-xdrop_ungap float_value] [-xdrop_gap float_value]
 
     [-xdrop_gap_final float_value] [-searchsp int_value]
 
     [-xdrop_gap_final float_value] [-searchsp int_value]
 
     [-sum_stats bool_value] [-penalty penalty] [-reward reward] [-no_greedy]
 
     [-sum_stats bool_value] [-penalty penalty] [-reward reward] [-no_greedy]
Line 113: Line 114:
 
     [-window_masker_db window_masker_db] [-soft_masking soft_masking]
 
     [-window_masker_db window_masker_db] [-soft_masking soft_masking]
 
     [-ungapped] [-culling_limit int_value] [-best_hit_overhang float_value]
 
     [-ungapped] [-culling_limit int_value] [-best_hit_overhang float_value]
     [-best_hit_score_edge float_value] [-window_size int_value]
+
     [-best_hit_score_edge float_value] [-subject_besthit]
    [-off_diagonal_range int_value] [-use_index boolean] [-index_name string]
+
    [-window_size int_value] [-off_diagonal_range int_value]
    [-lcase_masking] [-query_loc range] [-strand strand] [-parse_deflines]
+
    [-use_index boolean] [-index_name string] [-lcase_masking]
    [-outfmt format] [-show_gis] [-num_descriptions int_value]
+
    [-query_loc range] [-strand strand] [-parse_deflines] [-outfmt format]
    [-num_alignments int_value] [-line_length line_length] [-html]
+
    [-show_gis] [-num_descriptions int_value] [-num_alignments int_value]
     [-max_target_seqs num_sequences] [-num_threads int_value] [-remote]
+
    [-line_length line_length] [-html] [-sorthits sort_hits]
    [-version]
+
     [-sorthsps sort_hsps] [-max_target_seqs num_sequences]
 +
    [-num_threads int_value] [-mt_mode int_value] [-remote] [-version]
  
 
DESCRIPTION
 
DESCRIPTION
   Nucleotide-Nucleotide BLAST 2.7.1+
+
   Nucleotide-Nucleotide BLAST 2.12.0+
  
 
OPTIONAL ARGUMENTS
 
OPTIONAL ARGUMENTS
Line 176: Line 178:
 
   Subject sequence(s) to search
 
   Subject sequence(s) to search
 
     * Incompatible with:  db, gilist, seqidlist, negative_gilist,
 
     * Incompatible with:  db, gilist, seqidlist, negative_gilist,
   negative_seqidlist, db_soft_mask, db_hard_mask
+
   negative_seqidlist, taxids, taxidlist, negative_taxids, negative_taxidlist,
 +
  db_soft_mask, db_hard_mask
 
  -subject_loc <String>
 
  -subject_loc <String>
 
   Location on the subject sequence in 1-based offsets (Format: start-stop)
 
   Location on the subject sequence in 1-based offsets (Format: start-stop)
 
     * Incompatible with:  db, gilist, seqidlist, negative_gilist,
 
     * Incompatible with:  db, gilist, seqidlist, negative_gilist,
   negative_seqidlist, db_soft_mask, db_hard_mask, remote
+
   negative_seqidlist, taxids, taxidlist, negative_taxids, negative_taxidlist,
 +
  db_soft_mask, db_hard_mask, remote
  
 
  *** Formatting options
 
  *** Formatting options
Line 206: Line 210:
 
    
 
    
 
   Options 6, 7, 10 and 17 can be additionally configured to produce
 
   Options 6, 7, 10 and 17 can be additionally configured to produce
   a custom format specified by space delimited format specifiers.
+
   a custom format specified by space delimited format specifiers,
 +
  or in the case of options 6, 7, and 10, by a token specified
 +
  by the delim keyword. E.g.: "17 delim=@ qacc sacc score".
 +
  The delim keyword must appear after the numeric output format
 +
  specification.
 
   The supported format specifiers for options 6, 7 and 10 are:
 
   The supported format specifiers for options 6, 7 and 10 are:
            qseqid means Query Seq-id
+
      qseqid means Query Seq-id
              qgi means Query GI
+
        qgi means Query GI
              qacc means Query accesion
+
        qacc means Query accesion
          qaccver means Query accesion.version
+
    qaccver means Query accesion.version
              qlen means Query sequence length
+
        qlen means Query sequence length
            sseqid means Subject Seq-id
+
      sseqid means Subject Seq-id
        sallseqid means All subject Seq-id(s), separated by a ';'
+
  sallseqid means All subject Seq-id(s), separated by a ';'
              sgi means Subject GI
+
        sgi means Subject GI
            sallgi means All subject GIs
+
      sallgi means All subject GIs
              sacc means Subject accession
+
        sacc means Subject accession
          saccver means Subject accession.version
+
    saccver means Subject accession.version
          sallacc means All subject accessions
+
    sallacc means All subject accessions
              slen means Subject sequence length
+
        slen means Subject sequence length
            qstart means Start of alignment in query
+
      qstart means Start of alignment in query
              qend means End of alignment in query
+
        qend means End of alignment in query
            sstart means Start of alignment in subject
+
      sstart means Start of alignment in subject
              send means End of alignment in subject
+
        send means End of alignment in subject
              qseq means Aligned part of query sequence
+
        qseq means Aligned part of query sequence
              sseq means Aligned part of subject sequence
+
        sseq means Aligned part of subject sequence
            evalue means Expect value
+
      evalue means Expect value
          bitscore means Bit score
+
    bitscore means Bit score
            score means Raw score
+
      score means Raw score
            length means Alignment length
+
      length means Alignment length
            pident means Percentage of identical matches
+
      pident means Percentage of identical matches
            nident means Number of identical matches
+
      nident means Number of identical matches
          mismatch means Number of mismatches
+
    mismatch means Number of mismatches
          positive means Number of positive-scoring matches
+
    positive means Number of positive-scoring matches
          gapopen means Number of gap openings
+
    gapopen means Number of gap openings
              gaps means Total number of gaps
+
        gaps means Total number of gaps
              ppos means Percentage of positive-scoring matches
+
        ppos means Percentage of positive-scoring matches
            frames means Query and subject frames separated by a '/'
+
      frames means Query and subject frames separated by a '/'
            qframe means Query frame
+
      qframe means Query frame
            sframe means Subject frame
+
      sframe means Subject frame
              btop means Blast traceback operations (BTOP)
+
        btop means Blast traceback operations (BTOP)
            staxid means Subject Taxonomy ID
+
      staxid means Subject Taxonomy ID
          ssciname means Subject Scientific Name
+
    ssciname means Subject Scientific Name
          scomname means Subject Common Name
+
    scomname means Subject Common Name
        sblastname means Subject Blast Name
+
  sblastname means Subject Blast Name
        sskingdom means Subject Super Kingdom
+
  sskingdom means Subject Super Kingdom
          staxids means unique Subject Taxonomy ID(s), separated by a ';'
+
    staxids means unique Subject Taxonomy ID(s), separated by a ';'
                        (in numerical order)
+
  (in numerical order)
        sscinames means unique Subject Scientific Name(s), separated by a ';'
+
  sscinames means unique Subject Scientific Name(s), separated by a ';'
        scomnames means unique Subject Common Name(s), separated by a ';'
+
  scomnames means unique Subject Common Name(s), separated by a ';'
        sblastnames means unique Subject Blast Name(s), separated by a ';'
+
  sblastnames means unique Subject Blast Name(s), separated by a ';'
                        (in alphabetical order)
+
  (in alphabetical order)
        sskingdoms means unique Subject Super Kingdom(s), separated by a ';'
+
  sskingdoms means unique Subject Super Kingdom(s), separated by a ';'
                        (in alphabetical order)  
+
  (in alphabetical order)  
            stitle means Subject Title
+
      stitle means Subject Title
        salltitles means All Subject Title(s), separated by a '<>'
+
  salltitles means All Subject Title(s), separated by a '<>'
          sstrand means Subject Strand
+
    sstrand means Subject Strand
            qcovs means Query Coverage Per Subject
+
      qcovs means Query Coverage Per Subject
          qcovhsp means Query Coverage Per HSP
+
    qcovhsp means Query Coverage Per HSP
            qcovus means Query Coverage Per Unique Subject (blastn only)
+
      qcovus means Query Coverage Per Unique Subject (blastn only)
 
   When not provided, the default value is:
 
   When not provided, the default value is:
 
   'qaccver saccver pident length mismatch gapopen qstart qend sstart send
 
   'qaccver saccver pident length mismatch gapopen qstart qend sstart send
 
   evalue bitscore', which is equivalent to the keyword 'std'
 
   evalue bitscore', which is equivalent to the keyword 'std'
 
   The supported format specifier for option 17 is:
 
   The supported format specifier for option 17 is:
                SQ means Include Sequence Data
+
          SQ means Include Sequence Data
                SR means Subject as Reference Seq
+
          SR means Subject as Reference Seq
 
   Default = `0'
 
   Default = `0'
 
  -show_gis
 
  -show_gis
Line 285: Line 293:
 
  -html
 
  -html
 
   Produce HTML output?
 
   Produce HTML output?
 +
-sorthits <Integer, (>=0 and =<4)>
 +
  Sorting option for hits:
 +
  alignment view options:
 +
    0 = Sort by evalue,
 +
    1 = Sort by bit score,
 +
    2 = Sort by total score,
 +
    3 = Sort by percent identity,
 +
    4 = Sort by query coverage
 +
  Not applicable for outfmt > 4
 +
-sorthsps <Integer, (>=0 and =<4)>
 +
  Sorting option for hps:
 +
    0 = Sort by hsp evalue,
 +
    1 = Sort by hsp score,
 +
    2 = Sort by hsp query start,
 +
    3 = Sort by hsp percent identity,
 +
    4 = Sort by hsp subject start
 +
  Not applicable for outfmt != 0
  
 
  *** Query filtering options
 
  *** Query filtering options
Line 305: Line 330:
 
  *** Restrict search or results
 
  *** Restrict search or results
 
  -gilist <String>
 
  -gilist <String>
   Restrict search of database to list of GI's
+
   Restrict search of database to list of GIs
     * Incompatible with:  negative_gilist, seqidlist, negative_seqidlist,
+
     * Incompatible with:  seqidlist, taxids, taxidlist, negative_gilist,
   remote, subject, subject_loc
+
   negative_seqidlist, negative_taxids, negative_taxidlist, remote, subject,
 +
  subject_loc
 
  -seqidlist <String>
 
  -seqidlist <String>
   Restrict search of database to list of SeqId's
+
   Restrict search of database to list of SeqIDs
     * Incompatible with:  gilist, negative_gilist, negative_seqidlist, remote,
+
     * Incompatible with:  gilist, taxids, taxidlist, negative_gilist,
   subject, subject_loc
+
  negative_seqidlist, negative_taxids, negative_taxidlist, remote, subject,
 +
   subject_loc
 
  -negative_gilist <String>
 
  -negative_gilist <String>
   Restrict search of database to everything except the listed GIs
+
   Restrict search of database to everything except the specified GIs
     * Incompatible with:  gilist, seqidlist, remote, subject, subject_loc
+
     * Incompatible with:  gilist, seqidlist, taxids, taxidlist,
 +
  negative_seqidlist, negative_taxids, negative_taxidlist, remote, subject,
 +
  subject_loc
 
  -negative_seqidlist <String>
 
  -negative_seqidlist <String>
   Restrict search of database to everything except the listed SeqIDs
+
   Restrict search of database to everything except the specified SeqIDs
     * Incompatible with:  gilist, seqidlist, remote, subject, subject_loc
+
     * Incompatible with:  gilist, seqidlist, taxids, taxidlist,
 +
  negative_gilist, negative_taxids, negative_taxidlist, remote, subject,
 +
  subject_loc
 +
-taxids <String>
 +
  Restrict search of database to include only the specified taxonomy IDs
 +
  (multiple IDs delimited by ',')
 +
    * Incompatible with:  gilist, seqidlist, taxidlist, negative_gilist,
 +
  negative_seqidlist, negative_taxids, negative_taxidlist, remote, subject,
 +
  subject_loc
 +
-negative_taxids <String>
 +
  Restrict search of database to everything except the specified taxonomy IDs
 +
  (multiple IDs delimited by ',')
 +
    * Incompatible with:  gilist, seqidlist, taxids, taxidlist,
 +
  negative_gilist, negative_seqidlist, negative_taxidlist, remote, subject,
 +
  subject_loc
 +
-taxidlist <String>
 +
  Restrict search of database to include only the specified taxonomy IDs
 +
    * Incompatible with:  gilist, seqidlist, taxids, negative_gilist,
 +
  negative_seqidlist, negative_taxids, negative_taxidlist, remote, subject,
 +
  subject_loc
 +
-negative_taxidlist <String>
 +
  Restrict search of database to everything except the specified taxonomy IDs
 +
    * Incompatible with:  gilist, seqidlist, taxids, taxidlist,
 +
  negative_gilist, negative_seqidlist, negative_taxids, remote, subject,
 +
  subject_loc
 
  -entrez_query <String>
 
  -entrez_query <String>
 
   Restrict search with the given Entrez query
 
   Restrict search with the given Entrez query
Line 343: Line 396:
 
   Best Hit algorithm score edge value (recommended value: 0.1)
 
   Best Hit algorithm score edge value (recommended value: 0.1)
 
     * Incompatible with:  culling_limit
 
     * Incompatible with:  culling_limit
 +
-subject_besthit
 +
  Turn on best hit per subject sequence
 
  -max_target_seqs <Integer, >=1>
 
  -max_target_seqs <Integer, >=1>
 
   Maximum number of aligned sequences to keep  
 
   Maximum number of aligned sequences to keep  
   Not applicable for outfmt <= 4
+
   (value of 5 or more is recommended)
 
   Default = `500'
 
   Default = `500'
 
     * Incompatible with:  num_descriptions, num_alignments
 
     * Incompatible with:  num_descriptions, num_alignments
Line 396: Line 451:
 
  -parse_deflines
 
  -parse_deflines
 
   Should the query and subject defline(s) be parsed?
 
   Should the query and subject defline(s) be parsed?
  -num_threads <Integer, (>=1 and =<48)>
+
  -num_threads <Integer, >=1>
 
   Number of threads (CPUs) to use in the BLAST search
 
   Number of threads (CPUs) to use in the BLAST search
 
   Default = `1'
 
   Default = `1'
 
     * Incompatible with:  remote
 
     * Incompatible with:  remote
 +
-mt_mode <Integer, (>=0 and =<1)>
 +
  Multi-thread mode to use in BLAST search:
 +
    0 (auto) split by database
 +
    1 split by queries
 +
  Default = `0'
 +
    * Requires:  num_threads
 
  -remote
 
  -remote
 
   Execute search remotely?
 
   Execute search remotely?
     * Incompatible with:  gilist, seqidlist, negative_gilist,
+
     * Incompatible with:  gilist, seqidlist, taxids, taxidlist,
   negative_seqidlist, subject_loc, num_threads
+
   negative_gilist, negative_seqidlist, negative_taxids, negative_taxidlist,
 +
  subject_loc, num_threads
 
</pre>
 
</pre>
 
[[#blastn| jump to blastn]]; [[#blastp| jump to blastp]]; [[#blastx| jump to blastx]]; [[#makeblastdb| jump to makeblastdb]]; [[#top|Back to Top]]
 
[[#blastn| jump to blastn]]; [[#blastp| jump to blastp]]; [[#blastx| jump to blastx]]; [[#makeblastdb| jump to makeblastdb]]; [[#top|Back to Top]]
Line 409: Line 471:
  
 
<pre  class="gcommand">  
 
<pre  class="gcommand">  
module load BLAST+/2.7.1-foss-2016b-Python-2.7.14
+
module load BLAST+/2.12.0-gompi-2020b
 
blastp -help
 
blastp -help
 
USAGE
 
USAGE
Line 698: Line 760:
  
 
<pre  class="gcommand">  
 
<pre  class="gcommand">  
module load BLAST+/2.7.1-foss-2016b-Python-2.7.14
+
module load BLAST+/2.12.0-gompi-2020b
 
blastx -help     
 
blastx -help     
 
USAGE
 
USAGE
Line 999: Line 1,061:
  
 
<pre  class="gcommand">  
 
<pre  class="gcommand">  
module load BLAST+/2.7.1-foss-2016b-Python-2.7.14
+
module load BLAST+/2.12.0-gompi-2020b
 
makeblastdb -help
 
makeblastdb -help
 
USAGE
 
USAGE
Line 1,007: Line 1,069:
 
     [-mask_desc mask_algo_descriptions] [-gi_mask]
 
     [-mask_desc mask_algo_descriptions] [-gi_mask]
 
     [-gi_mask_name gi_based_mask_names] [-out database_name]
 
     [-gi_mask_name gi_based_mask_names] [-out database_name]
     [-max_file_sz number_of_bytes] [-logfile File_Name] [-taxid TaxID]
+
     [-blastdb_version version] [-max_file_sz number_of_bytes]
    [-taxid_map TaxIDMapFile] [-version]
+
    [-logfile File_Name] [-taxid TaxID] [-taxid_map TaxIDMapFile] [-version]
  
 
DESCRIPTION
 
DESCRIPTION
   Application to create BLAST databases, version 2.7.1+
+
   Application to create BLAST databases, version 2.12.0+
  
 
REQUIRED ARGUMENTS
 
REQUIRED ARGUMENTS
Line 1,068: Line 1,130:
 
   Default = input file name provided to -in argumentRequired if multiple
 
   Default = input file name provided to -in argumentRequired if multiple
 
   file(s)/database(s) are provided as input
 
   file(s)/database(s) are provided as input
 +
-blastdb_version <Integer, 4..5>
 +
  Version of BLAST database to be created
 +
  Default = `5'
 
  -max_file_sz <String>
 
  -max_file_sz <String>
 
   Maximum file size for BLAST database files
 
   Maximum file size for BLAST database files
Line 1,083: Line 1,148:
 
     * Requires:  parse_seqids
 
     * Requires:  parse_seqids
 
     * Incompatible with:  taxid
 
     * Incompatible with:  taxid
 +
  
 
</pre>
 
</pre>
  
 
[[#top|Back to Top]]
 
[[#top|Back to Top]]

Revision as of 14:00, 8 April 2022

Category

Bioinformatics

Program On

Sapelo2

Version

2.2.31, 2.9.0, 2.10.1, 2.11.0, 2.12.0

Author / Distributor

NCBI

Description

Basic Local Alignment Search Tool, or BLAST, is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences. More information: http://blast.ncbi.nlm.nih.gov/

Running Program

Also refer to Running Jobs on Sapelo2

For more information on Environment Modules on Sapelo2, please see the Lmod page. To find all versions of BLAST+ installed on Sapelo2, search with the module spider command, as shown below:

module spider BLAST+

NCBI databases are available on Sapelo2; please refer to Bioinformatics_Databases for more details. Datasets are located in the commonly shared "/db" filesystem. NCBI BLAST datasets are pre-formatted to work with BLAST and BLAST+, are located in "/db/ncbiblast/", and are organized by date.

To use BLAST+ version 2.12.0, please first load the module, for example:

module load BLAST+/2.12.0-gompi-2020b

Example of a shell script sub.sh to run blastn on the batch partition:

#!/bin/bash
#SBATCH --job-name=j_BLAST+
#SBATCH --partition=batch
#SBATCH --mail-type=ALL
#SBATCH --mail-user=username@uga.edu
#SBATCH --ntasks=1
#SBATCH --mem=10gb
#SBATCH --time=08:00:00
#SBATCH --output=BLAST+.%j.out
#SBATCH --error=BLAST+.%j.err
cd $SLURM_SUBMIT_DIR
ml BLAST+/2.12.0-gompi-2020b
ml ncbiblastdb/20220404
blastn [options]


where [options] must be replaced by the arguments you want to pass to blastn. Other parameters of the job, such as the maximum wall clock time, the maximum memory, the number of nodes and cores per node, and the job name, should be adjusted to fit your run as well.
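
As an illustration of what might go in place of [options], the sketch below assembles a blastn command that writes tabular output (-outfmt 6) with the standard columns plus the query and subject lengths. The file names example.fasta and results.tsv and the E-value cutoff of 1e-5 are assumptions chosen for the example, not site recommendations; the flags themselves are taken from the blastn help text on this page.

```shell
#!/bin/bash
# Hypothetical input and output names; replace with your own files.
query="example.fasta"
out="results.tsv"

# Build the command as an array so the multi-word -outfmt value
# stays a single argument when the command is run.
blast_cmd=(blastn -query "$query" -db nt
           -outfmt "6 std qlen slen"
           -evalue 1e-5 -out "$out")

# Print the command for review before adding it to the job script.
printf '%s\n' "${blast_cmd[*]}"
```

In the job script itself, the array can be executed with "${blast_cmd[@]}" after the two ml lines.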

Submit the job to the queue with

sbatch ./sub.sh
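
If you want to keep the job ID for later status checks, one possible approach is sketched below. It relies on sbatch's --parsable option, which prints the job ID (optionally followed by ";" and a cluster name) instead of the usual message. The helper name job_id_from_parsable is made up for this example, and the guard simply skips submission on machines where sbatch is not installed.

```shell
#!/bin/bash
# Hypothetical helper: keep only the job ID from `sbatch --parsable`
# output, which has the form "jobid" or "jobid;cluster".
job_id_from_parsable() {
  echo "${1%%;*}"
}

if command -v sbatch >/dev/null 2>&1; then
  job_id=$(job_id_from_parsable "$(sbatch --parsable ./sub.sh)")
  # Show the state of the submitted job in the queue.
  squeue -j "$job_id"
else
  echo "sbatch not found; run this on the cluster" >&2
fi
```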

Running BLAST+ with multiple threads

Some BLAST+ commands, such as blastn, can use multiple threads through the -num_threads option. Each thread should run on its own core, so the number of CPUs you request should equal -num_threads + 1: one core per worker thread, plus one for the main process thread.

To request cores, use the Slurm header #SBATCH --cpus-per-task.

Below is an example with -num_threads 8:

#!/bin/bash
#SBATCH --job-name=j_BLAST+_multithread
#SBATCH --partition=batch
#SBATCH --mail-type=ALL
#SBATCH --mail-user=username@uga.edu
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=9
#SBATCH --mem=10gb
#SBATCH --time=08:00:00
#SBATCH --output=BLAST+.%j.out
#SBATCH --error=BLAST+.%j.err
cd $SLURM_SUBMIT_DIR
ml BLAST+/2.12.0-gompi-2020b
ml ncbiblastdb/20220404
blastn -num_threads 8 -query example.fasta -out results.out -db nt
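
To keep the thread count and the core request from drifting apart, the -num_threads value can be derived from the allocation instead of being typed twice. The sketch below applies the "threads + 1" rule in reverse, using the SLURM_CPUS_PER_TASK variable that Slurm sets when --cpus-per-task is given. The function name threads_for_cpus and the fallback of 1 outside a job are assumptions made for this example.

```shell
#!/bin/bash
# Hypothetical helper: given the number of allocated cores, return the
# number of BLAST worker threads, leaving one core for the main process
# thread and never going below one thread.
threads_for_cpus() {
  local cpus="$1"
  echo $(( cpus > 1 ? cpus - 1 : 1 ))
}

# SLURM_CPUS_PER_TASK is set inside a job that requested --cpus-per-task;
# fall back to 1 when trying this outside a job.
threads=$(threads_for_cpus "${SLURM_CPUS_PER_TASK:-1}")

# Print the resulting command line; with --cpus-per-task=9 this gives
# -num_threads 8, matching the example above.
echo "blastn -num_threads ${threads} -query example.fasta -out results.out -db nt"
```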

Documentation


 
module load  BLAST+/2.12.0-gompi-2020b
blastn  -help
USAGE
  blastn [-h] [-help] [-import_search_strategy filename]
    [-export_search_strategy filename] [-task task_name] [-db database_name]
    [-dbsize num_letters] [-gilist filename] [-seqidlist filename]
    [-negative_gilist filename] [-negative_seqidlist filename]
    [-taxids taxids] [-negative_taxids taxids] [-taxidlist filename]
    [-negative_taxidlist filename] [-entrez_query entrez_query]
    [-db_soft_mask filtering_algorithm] [-db_hard_mask filtering_algorithm]
    [-subject subject_input_file] [-subject_loc range] [-query input_file]
    [-out output_file] [-evalue evalue] [-word_size int_value]
    [-gapopen open_penalty] [-gapextend extend_penalty]
    [-perc_identity float_value] [-qcov_hsp_perc float_value]
    [-max_hsps int_value] [-xdrop_ungap float_value] [-xdrop_gap float_value]
    [-xdrop_gap_final float_value] [-searchsp int_value]
    [-sum_stats bool_value] [-penalty penalty] [-reward reward] [-no_greedy]
    [-min_raw_gapped_score int_value] [-template_type type]
    [-template_length int_value] [-dust DUST_options]
    [-filtering_db filtering_database]
    [-window_masker_taxid window_masker_taxid]
    [-window_masker_db window_masker_db] [-soft_masking soft_masking]
    [-ungapped] [-culling_limit int_value] [-best_hit_overhang float_value]
    [-best_hit_score_edge float_value] [-subject_besthit]
    [-window_size int_value] [-off_diagonal_range int_value]
    [-use_index boolean] [-index_name string] [-lcase_masking]
    [-query_loc range] [-strand strand] [-parse_deflines] [-outfmt format]
    [-show_gis] [-num_descriptions int_value] [-num_alignments int_value]
    [-line_length line_length] [-html] [-sorthits sort_hits]
    [-sorthsps sort_hsps] [-max_target_seqs num_sequences]
    [-num_threads int_value] [-mt_mode int_value] [-remote] [-version]

DESCRIPTION
   Nucleotide-Nucleotide BLAST 2.12.0+

OPTIONAL ARGUMENTS
 -h
   Print USAGE and DESCRIPTION;  ignore all other parameters
 -help
   Print USAGE, DESCRIPTION and ARGUMENTS; ignore all other parameters
 -version
   Print version number;  ignore other arguments

 *** Input query options
 -query <File_In>
   Input file name
   Default = `-'
 -query_loc <String>
   Location on the query sequence in 1-based offsets (Format: start-stop)
 -strand <String, `both', `minus', `plus'>
   Query strand(s) to search against database/subject
   Default = `both'

 *** General search options
 -task <String, Permissible values: 'blastn' 'blastn-short' 'dc-megablast'
                'megablast' 'rmblastn' >
   Task to execute
   Default = `megablast'
 -db <String>
   BLAST database name
    * Incompatible with:  subject, subject_loc
 -out <File_Out>
   Output file name
   Default = `-'
 -evalue <Real>
   Expectation value (E) threshold for saving hits 
   Default = `10'
 -word_size <Integer, >=4>
   Word size for wordfinder algorithm (length of best perfect match)
 -gapopen <Integer>
   Cost to open a gap
 -gapextend <Integer>
   Cost to extend a gap
 -penalty <Integer, <=0>
   Penalty for a nucleotide mismatch
 -reward <Integer, >=0>
   Reward for a nucleotide match
 -use_index <Boolean>
   Use MegaBLAST database index
   Default = `false'
 -index_name <String>
   MegaBLAST database index name (deprecated; use only for old style indices)

 *** BLAST-2-Sequences options
 -subject <File_In>
   Subject sequence(s) to search
    * Incompatible with:  db, gilist, seqidlist, negative_gilist,
   negative_seqidlist, taxids, taxidlist, negative_taxids, negative_taxidlist,
   db_soft_mask, db_hard_mask
 -subject_loc <String>
   Location on the subject sequence in 1-based offsets (Format: start-stop)
    * Incompatible with:  db, gilist, seqidlist, negative_gilist,
   negative_seqidlist, taxids, taxidlist, negative_taxids, negative_taxidlist,
   db_soft_mask, db_hard_mask, remote

 *** Formatting options
 -outfmt <String>
   alignment view options:
     0 = Pairwise,
     1 = Query-anchored showing identities,
     2 = Query-anchored no identities,
     3 = Flat query-anchored showing identities,
     4 = Flat query-anchored no identities,
     5 = BLAST XML,
     6 = Tabular,
     7 = Tabular with comment lines,
     8 = Seqalign (Text ASN.1),
     9 = Seqalign (Binary ASN.1),
    10 = Comma-separated values,
    11 = BLAST archive (ASN.1),
    12 = Seqalign (JSON),
    13 = Multiple-file BLAST JSON,
    14 = Multiple-file BLAST XML2,
    15 = Single-file BLAST JSON,
    16 = Single-file BLAST XML2,
    17 = Sequence Alignment/Map (SAM),
    18 = Organism Report
   
   Options 6, 7, 10 and 17 can be additionally configured to produce
   a custom format specified by space delimited format specifiers,
   or in the case of options 6, 7, and 10, by a token specified
   by the delim keyword. E.g.: "17 delim=@ qacc sacc score".
   The delim keyword must appear after the numeric output format
   specification.
   The supported format specifiers for options 6, 7 and 10 are:
   	    qseqid means Query Seq-id
   	       qgi means Query GI
   	      qacc means Query accesion
   	   qaccver means Query accesion.version
   	      qlen means Query sequence length
   	    sseqid means Subject Seq-id
   	 sallseqid means All subject Seq-id(s), separated by a ';'
   	       sgi means Subject GI
   	    sallgi means All subject GIs
   	      sacc means Subject accession
   	   saccver means Subject accession.version
   	   sallacc means All subject accessions
   	      slen means Subject sequence length
   	    qstart means Start of alignment in query
   	      qend means End of alignment in query
   	    sstart means Start of alignment in subject
   	      send means End of alignment in subject
   	      qseq means Aligned part of query sequence
   	      sseq means Aligned part of subject sequence
   	    evalue means Expect value
   	  bitscore means Bit score
   	     score means Raw score
   	    length means Alignment length
   	    pident means Percentage of identical matches
   	    nident means Number of identical matches
   	  mismatch means Number of mismatches
   	  positive means Number of positive-scoring matches
   	   gapopen means Number of gap openings
   	      gaps means Total number of gaps
   	      ppos means Percentage of positive-scoring matches
   	    frames means Query and subject frames separated by a '/'
   	    qframe means Query frame
   	    sframe means Subject frame
   	      btop means Blast traceback operations (BTOP)
   	    staxid means Subject Taxonomy ID
   	  ssciname means Subject Scientific Name
   	  scomname means Subject Common Name
   	sblastname means Subject Blast Name
   	 sskingdom means Subject Super Kingdom
   	   staxids means unique Subject Taxonomy ID(s), separated by a ';'
   			 (in numerical order)
   	 sscinames means unique Subject Scientific Name(s), separated by a ';'
   	 scomnames means unique Subject Common Name(s), separated by a ';'
   	sblastnames means unique Subject Blast Name(s), separated by a ';'
   			 (in alphabetical order)
   	sskingdoms means unique Subject Super Kingdom(s), separated by a ';'
   			 (in alphabetical order) 
   	    stitle means Subject Title
   	salltitles means All Subject Title(s), separated by a '<>'
   	   sstrand means Subject Strand
   	     qcovs means Query Coverage Per Subject
   	   qcovhsp means Query Coverage Per HSP
   	    qcovus means Query Coverage Per Unique Subject (blastn only)
   When not provided, the default value is:
   'qaccver saccver pident length mismatch gapopen qstart qend sstart send
   evalue bitscore', which is equivalent to the keyword 'std'
   The supported format specifier for option 17 is:
   	        SQ means Include Sequence Data
   	        SR means Subject as Reference Seq
   Default = `0'
 -show_gis
   Show NCBI GIs in deflines?
 -num_descriptions <Integer, >=0>
   Number of database sequences to show one-line descriptions for
   Not applicable for outfmt > 4
   Default = `500'
    * Incompatible with:  max_target_seqs
 -num_alignments <Integer, >=0>
   Number of database sequences to show alignments for
   Default = `250'
    * Incompatible with:  max_target_seqs
 -line_length <Integer, >=1>
   Line length for formatting alignments
   Not applicable for outfmt > 4
   Default = `60'
 -html
   Produce HTML output?
 -sorthits <Integer, (>=0 and =<4)>
   Sorting option for hits:
   alignment view options:
     0 = Sort by evalue,
     1 = Sort by bit score,
     2 = Sort by total score,
     3 = Sort by percent identity,
     4 = Sort by query coverage
   Not applicable for outfmt > 4
 -sorthsps <Integer, (>=0 and =<4)>
   Sorting option for hps:
     0 = Sort by hsp evalue,
     1 = Sort by hsp score,
     2 = Sort by hsp query start,
     3 = Sort by hsp percent identity,
     4 = Sort by hsp subject start
   Not applicable for outfmt != 0

 *** Query filtering options
 -dust <String>
   Filter query sequence with DUST (Format: 'yes', 'level window linker', or
   'no' to disable)
   Default = `20 64 1'
 -filtering_db <String>
   BLAST database containing filtering elements (i.e.: repeats)
 -window_masker_taxid <Integer>
   Enable WindowMasker filtering using a Taxonomic ID
 -window_masker_db <String>
   Enable WindowMasker filtering using this repeats database.
 -soft_masking <Boolean>
   Apply filtering locations as soft masks
   Default = `true'
 -lcase_masking
   Use lower case filtering in query and subject sequence(s)?

 *** Restrict search or results
 -gilist <String>
   Restrict search of database to list of GIs
    * Incompatible with:  seqidlist, taxids, taxidlist, negative_gilist,
   negative_seqidlist, negative_taxids, negative_taxidlist, remote, subject,
   subject_loc
 -seqidlist <String>
   Restrict search of database to list of SeqIDs
    * Incompatible with:  gilist, taxids, taxidlist, negative_gilist,
   negative_seqidlist, negative_taxids, negative_taxidlist, remote, subject,
   subject_loc
 -negative_gilist <String>
   Restrict search of database to everything except the specified GIs
    * Incompatible with:  gilist, seqidlist, taxids, taxidlist,
   negative_seqidlist, negative_taxids, negative_taxidlist, remote, subject,
   subject_loc
 -negative_seqidlist <String>
   Restrict search of database to everything except the specified SeqIDs
    * Incompatible with:  gilist, seqidlist, taxids, taxidlist,
   negative_gilist, negative_taxids, negative_taxidlist, remote, subject,
   subject_loc
 -taxids <String>
   Restrict search of database to include only the specified taxonomy IDs
   (multiple IDs delimited by ',')
    * Incompatible with:  gilist, seqidlist, taxidlist, negative_gilist,
   negative_seqidlist, negative_taxids, negative_taxidlist, remote, subject,
   subject_loc
 -negative_taxids <String>
   Restrict search of database to everything except the specified taxonomy IDs
   (multiple IDs delimited by ',')
    * Incompatible with:  gilist, seqidlist, taxids, taxidlist,
   negative_gilist, negative_seqidlist, negative_taxidlist, remote, subject,
   subject_loc
 -taxidlist <String>
   Restrict search of database to include only the specified taxonomy IDs
    * Incompatible with:  gilist, seqidlist, taxids, negative_gilist,
   negative_seqidlist, negative_taxids, negative_taxidlist, remote, subject,
   subject_loc
 -negative_taxidlist <String>
   Restrict search of database to everything except the specified taxonomy IDs
    * Incompatible with:  gilist, seqidlist, taxids, taxidlist,
   negative_gilist, negative_seqidlist, negative_taxids, remote, subject,
   subject_loc
 -entrez_query <String>
   Restrict search with the given Entrez query
    * Requires:  remote
 -db_soft_mask <String>
   Filtering algorithm ID to apply to the BLAST database as soft masking
    * Incompatible with:  db_hard_mask, subject, subject_loc
 -db_hard_mask <String>
   Filtering algorithm ID to apply to the BLAST database as hard masking
    * Incompatible with:  db_soft_mask, subject, subject_loc
 -perc_identity <Real, 0..100>
   Percent identity
 -qcov_hsp_perc <Real, 0..100>
   Percent query coverage per hsp
 -max_hsps <Integer, >=1>
   Set maximum number of HSPs per subject sequence to save for each query
 -culling_limit <Integer, >=0>
   If the query range of a hit is enveloped by that of at least this many
   higher-scoring hits, delete the hit
    * Incompatible with:  best_hit_overhang, best_hit_score_edge
 -best_hit_overhang <Real, (>0 and <0.5)>
   Best Hit algorithm overhang value (recommended value: 0.1)
    * Incompatible with:  culling_limit
 -best_hit_score_edge <Real, (>0 and <0.5)>
   Best Hit algorithm score edge value (recommended value: 0.1)
    * Incompatible with:  culling_limit
 -subject_besthit
   Turn on best hit per subject sequence
 -max_target_seqs <Integer, >=1>
   Maximum number of aligned sequences to keep 
   (value of 5 or more is recommended)
   Default = `500'
    * Incompatible with:  num_descriptions, num_alignments

 *** Discontiguous MegaBLAST options
 -template_type <String, `coding', `coding_and_optimal', `optimal'>
   Discontiguous MegaBLAST template type
    * Requires:  template_length
 -template_length <Integer, Permissible values: '16' '18' '21' >
   Discontiguous MegaBLAST template length
    * Requires:  template_type

 *** Statistical options
 -dbsize <Int8>
   Effective length of the database 
 -searchsp <Int8, >=0>
   Effective length of the search space
 -sum_stats <Boolean>
   Use sum statistics

 *** Search strategy options
 -import_search_strategy <File_In>
   Search strategy to use
    * Incompatible with:  export_search_strategy
 -export_search_strategy <File_Out>
   File name to record the search strategy used
    * Incompatible with:  import_search_strategy

 *** Extension options
 -xdrop_ungap <Real>
   X-dropoff value (in bits) for ungapped extensions
 -xdrop_gap <Real>
   X-dropoff value (in bits) for preliminary gapped extensions
 -xdrop_gap_final <Real>
   X-dropoff value (in bits) for final gapped alignment
 -no_greedy
   Use non-greedy dynamic programming extension
 -min_raw_gapped_score <Integer>
   Minimum raw gapped score to keep an alignment in the preliminary gapped and
   traceback stages
 -ungapped
   Perform ungapped alignment only?
 -window_size <Integer, >=0>
   Multiple hits window size, use 0 to specify 1-hit algorithm
 -off_diagonal_range <Integer, >=0>
   Number of off-diagonals to search for the 2nd hit, use 0 to turn off
   Default = `0'

 *** Miscellaneous options
 -parse_deflines
   Should the query and subject defline(s) be parsed?
 -num_threads <Integer, >=1>
   Number of threads (CPUs) to use in the BLAST search
   Default = `1'
    * Incompatible with:  remote
 -mt_mode <Integer, (>=0 and =<1)>
   Multi-thread mode to use in BLAST search:
    0 (auto) split by database 
    1 split by queries
   Default = `0'
    * Requires:  num_threads
 -remote
   Execute search remotely?
    * Incompatible with:  gilist, seqidlist, taxids, taxidlist,
   negative_gilist, negative_seqidlist, negative_taxids, negative_taxidlist,
   subject_loc, num_threads
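
The restriction and threading options above can be combined in a single local search. The following is a minimal sketch, not a recommended configuration: the names query.fasta, my_nt_db and hits.tsv are hypothetical, and the -taxids filter is assumed to be run against a version 5 database that was built with taxonomy information. Taxonomy IDs 9606 and 10090 (human and mouse) are used only as an illustration. The command is printed first and executed only if blastn is on the PATH and the query file exists.

```shell
# On Sapelo2, load the module first:
#   module load BLAST+/2.12.0-gompi-2020b

# Hypothetical file and database names; adjust to your own data.
query=query.fasta
db=my_nt_db      # assumed: a v5 nucleotide database with taxonomy information
out=hits.tsv

# Assemble the argument list once so it can be printed and reused.
# -mt_mode 1 splits the work by queries and requires -num_threads.
set -- -query "$query" -db "$db" -out "$out" \
    -taxids 9606,10090 \
    -perc_identity 90 \
    -max_target_seqs 10 \
    -num_threads 4 -mt_mode 1 \
    -outfmt 6

blastn_cmd="blastn $*"
echo "$blastn_cmd"

# Run only where blastn is available and the query file exists.
if command -v blastn >/dev/null 2>&1 && [ -f "$query" ]; then
    blastn "$@"
fi
```

Because -taxids and -num_threads are both listed as incompatible with -remote, this sketch applies to local searches only.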

jump to blastn; jump to blastp; jump to blastx; jump to makeblastdb; Back to Top

 
module load BLAST+/2.12.0-gompi-2020b
blastp -help
USAGE
  blastp [-h] [-help] [-import_search_strategy filename]
    [-export_search_strategy filename] [-task task_name] [-db database_name]
    [-dbsize num_letters] [-gilist filename] [-seqidlist filename]
    [-negative_gilist filename] [-negative_seqidlist filename]
    [-entrez_query entrez_query] [-db_soft_mask filtering_algorithm]
    [-db_hard_mask filtering_algorithm] [-subject subject_input_file]
    [-subject_loc range] [-query input_file] [-out output_file]
    [-evalue evalue] [-word_size int_value] [-gapopen open_penalty]
    [-gapextend extend_penalty] [-qcov_hsp_perc float_value]
    [-max_hsps int_value] [-xdrop_ungap float_value] [-xdrop_gap float_value]
    [-xdrop_gap_final float_value] [-searchsp int_value]
    [-sum_stats bool_value] [-seg SEG_options] [-soft_masking soft_masking]
    [-matrix matrix_name] [-threshold float_value] [-culling_limit int_value]
    [-best_hit_overhang float_value] [-best_hit_score_edge float_value]
    [-window_size int_value] [-lcase_masking] [-query_loc range]
    [-parse_deflines] [-outfmt format] [-show_gis]
    [-num_descriptions int_value] [-num_alignments int_value]
    [-line_length line_length] [-html] [-max_target_seqs num_sequences]
    [-num_threads int_value] [-ungapped] [-remote] [-comp_based_stats compo]
    [-use_sw_tback] [-version]

DESCRIPTION
   Protein-Protein BLAST 2.7.1+

OPTIONAL ARGUMENTS
 -h
   Print USAGE and DESCRIPTION;  ignore all other parameters
 -help
   Print USAGE, DESCRIPTION and ARGUMENTS; ignore all other parameters
 -version
   Print version number;  ignore other arguments

 *** Input query options
 -query <File_In>
   Input file name
   Default = `-'
 -query_loc <String>
   Location on the query sequence in 1-based offsets (Format: start-stop)

 *** General search options
 -task <String, Permissible values: 'blastp' 'blastp-fast' 'blastp-short' >
   Task to execute
   Default = `blastp'
 -db <String>
   BLAST database name
    * Incompatible with:  subject, subject_loc
 -out <File_Out>
   Output file name
   Default = `-'
 -evalue <Real>
   Expectation value (E) threshold for saving hits 
   Default = `10'
 -word_size <Integer, >=2>
   Word size for wordfinder algorithm
 -gapopen <Integer>
   Cost to open a gap
 -gapextend <Integer>
   Cost to extend a gap
 -matrix <String>
   Scoring matrix name (normally BLOSUM62)
 -threshold <Real, >=0>
   Minimum word score such that the word is added to the BLAST lookup table
 -comp_based_stats <String>
   Use composition-based statistics:
       D or d: default (equivalent to 2 )
       0 or F or f: No composition-based statistics
       1: Composition-based statistics as in NAR 29:2994-3005, 2001
       2 or T or t : Composition-based score adjustment as in Bioinformatics
       21:902-911, 2005, conditioned on sequence properties
       3: Composition-based score adjustment as in Bioinformatics 21:902-911,
       2005, unconditionally
   Default = `2'

 *** BLAST-2-Sequences options
 -subject <File_In>
   Subject sequence(s) to search
    * Incompatible with:  db, gilist, seqidlist, negative_gilist,
   negative_seqidlist, db_soft_mask, db_hard_mask
 -subject_loc <String>
   Location on the subject sequence in 1-based offsets (Format: start-stop)
    * Incompatible with:  db, gilist, seqidlist, negative_gilist,
   negative_seqidlist, db_soft_mask, db_hard_mask, remote

 *** Formatting options
 -outfmt <String>
   alignment view options:
     0 = Pairwise,
     1 = Query-anchored showing identities,
     2 = Query-anchored no identities,
     3 = Flat query-anchored showing identities,
     4 = Flat query-anchored no identities,
     5 = BLAST XML,
     6 = Tabular,
     7 = Tabular with comment lines,
     8 = Seqalign (Text ASN.1),
     9 = Seqalign (Binary ASN.1),
    10 = Comma-separated values,
    11 = BLAST archive (ASN.1),
    12 = Seqalign (JSON),
    13 = Multiple-file BLAST JSON,
    14 = Multiple-file BLAST XML2,
    15 = Single-file BLAST JSON,
    16 = Single-file BLAST XML2,
    18 = Organism Report
   
   Options 6, 7 and 10 can be additionally configured to produce
   a custom format specified by space delimited format specifiers.
   The supported format specifiers are:
            qseqid means Query Seq-id
               qgi means Query GI
              qacc means Query accession
           qaccver means Query accession.version
              qlen means Query sequence length
            sseqid means Subject Seq-id
         sallseqid means All subject Seq-id(s), separated by a ';'
               sgi means Subject GI
            sallgi means All subject GIs
              sacc means Subject accession
           saccver means Subject accession.version
           sallacc means All subject accessions
              slen means Subject sequence length
            qstart means Start of alignment in query
              qend means End of alignment in query
            sstart means Start of alignment in subject
              send means End of alignment in subject
              qseq means Aligned part of query sequence
              sseq means Aligned part of subject sequence
            evalue means Expect value
          bitscore means Bit score
             score means Raw score
            length means Alignment length
            pident means Percentage of identical matches
            nident means Number of identical matches
          mismatch means Number of mismatches
          positive means Number of positive-scoring matches
           gapopen means Number of gap openings
              gaps means Total number of gaps
              ppos means Percentage of positive-scoring matches
            frames means Query and subject frames separated by a '/'
            qframe means Query frame
            sframe means Subject frame
              btop means Blast traceback operations (BTOP)
            staxid means Subject Taxonomy ID
          ssciname means Subject Scientific Name
          scomname means Subject Common Name
        sblastname means Subject Blast Name
         sskingdom means Subject Super Kingdom
           staxids means unique Subject Taxonomy ID(s), separated by a ';'
                         (in numerical order)
         sscinames means unique Subject Scientific Name(s), separated by a ';'
         scomnames means unique Subject Common Name(s), separated by a ';'
        sblastnames means unique Subject Blast Name(s), separated by a ';'
                         (in alphabetical order)
        sskingdoms means unique Subject Super Kingdom(s), separated by a ';'
                         (in alphabetical order) 
            stitle means Subject Title
        salltitles means All Subject Title(s), separated by a '<>'
           sstrand means Subject Strand
             qcovs means Query Coverage Per Subject
           qcovhsp means Query Coverage Per HSP
            qcovus means Query Coverage Per Unique Subject (blastn only)
   When not provided, the default value is:
   'qaccver saccver pident length mismatch gapopen qstart qend sstart send
   evalue bitscore', which is equivalent to the keyword 'std'
   Default = `0'
 -show_gis
   Show NCBI GIs in deflines?
 -num_descriptions <Integer, >=0>
   Number of database sequences to show one-line descriptions for
   Not applicable for outfmt > 4
   Default = `500'
    * Incompatible with:  max_target_seqs
 -num_alignments <Integer, >=0>
   Number of database sequences to show alignments for
   Default = `250'
    * Incompatible with:  max_target_seqs
 -line_length <Integer, >=1>
   Line length for formatting alignments
   Not applicable for outfmt > 4
   Default = `60'
 -html
   Produce HTML output?

 *** Query filtering options
 -seg <String>
   Filter query sequence with SEG (Format: 'yes', 'window locut hicut', or
   'no' to disable)
   Default = `no'
 -soft_masking <Boolean>
   Apply filtering locations as soft masks
   Default = `false'
 -lcase_masking
   Use lower case filtering in query and subject sequence(s)?

 *** Restrict search or results
 -gilist <String>
   Restrict search of database to list of GIs
    * Incompatible with:  negative_gilist, seqidlist, negative_seqidlist,
   remote, subject, subject_loc
 -seqidlist <String>
   Restrict search of database to list of SeqIDs
    * Incompatible with:  gilist, negative_gilist, negative_seqidlist, remote,
   subject, subject_loc
 -negative_gilist <String>
   Restrict search of database to everything except the listed GIs
    * Incompatible with:  gilist, seqidlist, remote, subject, subject_loc
 -negative_seqidlist <String>
   Restrict search of database to everything except the listed SeqIDs
    * Incompatible with:  gilist, seqidlist, remote, subject, subject_loc
 -entrez_query <String>
   Restrict search with the given Entrez query
    * Requires:  remote
 -db_soft_mask <String>
   Filtering algorithm ID to apply to the BLAST database as soft masking
    * Incompatible with:  db_hard_mask, subject, subject_loc
 -db_hard_mask <String>
   Filtering algorithm ID to apply to the BLAST database as hard masking
    * Incompatible with:  db_soft_mask, subject, subject_loc
 -qcov_hsp_perc <Real, 0..100>
   Percent query coverage per hsp
 -max_hsps <Integer, >=1>
   Set maximum number of HSPs per subject sequence to save for each query
 -culling_limit <Integer, >=0>
   If the query range of a hit is enveloped by that of at least this many
   higher-scoring hits, delete the hit
    * Incompatible with:  best_hit_overhang, best_hit_score_edge
 -best_hit_overhang <Real, (>0 and <0.5)>
   Best Hit algorithm overhang value (recommended value: 0.1)
    * Incompatible with:  culling_limit
 -best_hit_score_edge <Real, (>0 and <0.5)>
   Best Hit algorithm score edge value (recommended value: 0.1)
    * Incompatible with:  culling_limit
 -max_target_seqs <Integer, >=1>
   Maximum number of aligned sequences to keep 
   Not applicable for outfmt <= 4
   Default = `500'
    * Incompatible with:  num_descriptions, num_alignments

 *** Statistical options
 -dbsize <Int8>
   Effective length of the database 
 -searchsp <Int8, >=0>
   Effective length of the search space
 -sum_stats <Boolean>
   Use sum statistics

 *** Search strategy options
 -import_search_strategy <File_In>
   Search strategy to use
    * Incompatible with:  export_search_strategy
 -export_search_strategy <File_Out>
   File name to record the search strategy used
    * Incompatible with:  import_search_strategy

 *** Extension options
 -xdrop_ungap <Real>
   X-dropoff value (in bits) for ungapped extensions
 -xdrop_gap <Real>
   X-dropoff value (in bits) for preliminary gapped extensions
 -xdrop_gap_final <Real>
   X-dropoff value (in bits) for final gapped alignment
 -window_size <Integer, >=0>
   Multiple hits window size, use 0 to specify 1-hit algorithm
 -ungapped
   Perform ungapped alignment only?

 *** Miscellaneous options
 -parse_deflines
   Should the query and subject defline(s) be parsed?
 -num_threads <Integer, (>=1 and =<48)>
   Number of threads (CPUs) to use in the BLAST search
   Default = `1'
    * Incompatible with:  remote
 -remote
   Execute search remotely?
    * Incompatible with:  gilist, seqidlist, negative_gilist,
   negative_seqidlist, subject_loc, num_threads
 -use_sw_tback
   Compute locally optimal Smith-Waterman alignments?
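
Tabular output (-outfmt 6) with the default 'std' columns listed above can be filtered with standard tools. The following is a sketch, assuming the twelve default columns in their documented order (pident in column 3, evalue in column 11). The three rows are made-up data for illustration, not real search results.

```shell
# Made-up rows in the default 'std' column order:
# qaccver saccver pident length mismatch gapopen qstart qend sstart send evalue bitscore
hits=$(mktemp)
printf 'q1\ts1\t98.5\t200\t3\t0\t1\t200\t5\t204\t1e-100\t370\n'  >  "$hits"
printf 'q1\ts2\t85.0\t160\t24\t0\t1\t160\t1\t160\t1e-30\t120\n'  >> "$hits"
printf 'q2\ts3\t95.0\t100\t5\t0\t1\t100\t1\t100\t0.5\t40\n'      >> "$hits"

# Keep hits with at least 90% identity and an E-value of at most 1e-5,
# then print query, subject and percent identity.
filtered=$(awk -F '\t' -v OFS='\t' \
    '$3 >= 90 && $11 <= 1e-5 { print $1, $2, $3 }' "$hits")
echo "$filtered"

rm -f "$hits"
```

The second row fails the identity threshold and the third fails the E-value threshold, so only the first row is printed.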

jump to blastn; jump to blastp; jump to blastx; jump to makeblastdb; Back to Top

 
module load BLAST+/2.12.0-gompi-2020b
blastx -help     
USAGE
  blastx [-h] [-help] [-import_search_strategy filename]
    [-export_search_strategy filename] [-task task_name] [-db database_name]
    [-dbsize num_letters] [-gilist filename] [-seqidlist filename]
    [-negative_gilist filename] [-negative_seqidlist filename]
    [-entrez_query entrez_query] [-db_soft_mask filtering_algorithm]
    [-db_hard_mask filtering_algorithm] [-subject subject_input_file]
    [-subject_loc range] [-query input_file] [-out output_file]
    [-evalue evalue] [-word_size int_value] [-gapopen open_penalty]
    [-gapextend extend_penalty] [-qcov_hsp_perc float_value]
    [-max_hsps int_value] [-xdrop_ungap float_value] [-xdrop_gap float_value]
    [-xdrop_gap_final float_value] [-searchsp int_value]
    [-sum_stats bool_value] [-max_intron_length length] [-seg SEG_options]
    [-soft_masking soft_masking] [-matrix matrix_name]
    [-threshold float_value] [-culling_limit int_value]
    [-best_hit_overhang float_value] [-best_hit_score_edge float_value]
    [-window_size int_value] [-ungapped] [-lcase_masking] [-query_loc range]
    [-strand strand] [-parse_deflines] [-query_gencode int_value]
    [-outfmt format] [-show_gis] [-num_descriptions int_value]
    [-num_alignments int_value] [-line_length line_length] [-html]
    [-max_target_seqs num_sequences] [-num_threads int_value] [-remote]
    [-comp_based_stats compo] [-use_sw_tback] [-version]

DESCRIPTION
   Translated Query-Protein Subject BLAST 2.7.1+

OPTIONAL ARGUMENTS
 -h
   Print USAGE and DESCRIPTION;  ignore all other parameters
 -help
   Print USAGE, DESCRIPTION and ARGUMENTS; ignore all other parameters
 -version
   Print version number;  ignore other arguments

 *** Input query options
 -query <File_In>
   Input file name
   Default = `-'
 -query_loc <String>
   Location on the query sequence in 1-based offsets (Format: start-stop)
 -strand <String, `both', `minus', `plus'>
   Query strand(s) to search against database/subject
   Default = `both'
 -query_gencode <Integer, values between: 1-6, 9-16, 21-25>
   Genetic code to use to translate query (see user manual for details)
   Default = `1'

 *** General search options
 -task <String, Permissible values: 'blastx' 'blastx-fast' >
   Task to execute
   Default = `blastx'
 -db <String>
   BLAST database name
    * Incompatible with:  subject, subject_loc
 -out <File_Out>
   Output file name
   Default = `-'
 -evalue <Real>
   Expectation value (E) threshold for saving hits 
   Default = `10'
 -word_size <Integer, >=2>
   Word size for wordfinder algorithm
 -gapopen <Integer>
   Cost to open a gap
 -gapextend <Integer>
   Cost to extend a gap
 -max_intron_length <Integer, >=0>
   Length of the largest intron allowed in a translated nucleotide sequence
   when linking multiple distinct alignments
   Default = `0'
 -matrix <String>
   Scoring matrix name (normally BLOSUM62)
 -threshold <Real, >=0>
   Minimum word score such that the word is added to the BLAST lookup table
 -comp_based_stats <String>
   Use composition-based statistics:
       D or d: default (equivalent to 2 )
       0 or F or f: No composition-based statistics
       1: Composition-based statistics as in NAR 29:2994-3005, 2001
       2 or T or t : Composition-based score adjustment as in Bioinformatics
       21:902-911, 2005, conditioned on sequence properties
       3: Composition-based score adjustment as in Bioinformatics 21:902-911,
       2005, unconditionally
   Default = `2'

 *** BLAST-2-Sequences options
 -subject <File_In>
   Subject sequence(s) to search
    * Incompatible with:  db, gilist, seqidlist, negative_gilist,
   negative_seqidlist, db_soft_mask, db_hard_mask
 -subject_loc <String>
   Location on the subject sequence in 1-based offsets (Format: start-stop)
    * Incompatible with:  db, gilist, seqidlist, negative_gilist,
   negative_seqidlist, db_soft_mask, db_hard_mask, remote

 *** Formatting options
 -outfmt <String>
   alignment view options:
     0 = Pairwise,
     1 = Query-anchored showing identities,
     2 = Query-anchored no identities,
     3 = Flat query-anchored showing identities,
     4 = Flat query-anchored no identities,
     5 = BLAST XML,
     6 = Tabular,
     7 = Tabular with comment lines,
     8 = Seqalign (Text ASN.1),
     9 = Seqalign (Binary ASN.1),
    10 = Comma-separated values,
    11 = BLAST archive (ASN.1),
    12 = Seqalign (JSON),
    13 = Multiple-file BLAST JSON,
    14 = Multiple-file BLAST XML2,
    15 = Single-file BLAST JSON,
    16 = Single-file BLAST XML2,
    18 = Organism Report
   
   Options 6, 7 and 10 can be additionally configured to produce
   a custom format specified by space delimited format specifiers.
   The supported format specifiers are:
            qseqid means Query Seq-id
               qgi means Query GI
              qacc means Query accession
           qaccver means Query accession.version
              qlen means Query sequence length
            sseqid means Subject Seq-id
         sallseqid means All subject Seq-id(s), separated by a ';'
               sgi means Subject GI
            sallgi means All subject GIs
              sacc means Subject accession
           saccver means Subject accession.version
           sallacc means All subject accessions
              slen means Subject sequence length
            qstart means Start of alignment in query
              qend means End of alignment in query
            sstart means Start of alignment in subject
              send means End of alignment in subject
              qseq means Aligned part of query sequence
              sseq means Aligned part of subject sequence
            evalue means Expect value
          bitscore means Bit score
             score means Raw score
            length means Alignment length
            pident means Percentage of identical matches
            nident means Number of identical matches
          mismatch means Number of mismatches
          positive means Number of positive-scoring matches
           gapopen means Number of gap openings
              gaps means Total number of gaps
              ppos means Percentage of positive-scoring matches
            frames means Query and subject frames separated by a '/'
            qframe means Query frame
            sframe means Subject frame
              btop means Blast traceback operations (BTOP)
            staxid means Subject Taxonomy ID
          ssciname means Subject Scientific Name
          scomname means Subject Common Name
        sblastname means Subject Blast Name
         sskingdom means Subject Super Kingdom
           staxids means unique Subject Taxonomy ID(s), separated by a ';'
                         (in numerical order)
         sscinames means unique Subject Scientific Name(s), separated by a ';'
         scomnames means unique Subject Common Name(s), separated by a ';'
        sblastnames means unique Subject Blast Name(s), separated by a ';'
                         (in alphabetical order)
        sskingdoms means unique Subject Super Kingdom(s), separated by a ';'
                         (in alphabetical order) 
            stitle means Subject Title
        salltitles means All Subject Title(s), separated by a '<>'
           sstrand means Subject Strand
             qcovs means Query Coverage Per Subject
           qcovhsp means Query Coverage Per HSP
            qcovus means Query Coverage Per Unique Subject (blastn only)
   When not provided, the default value is:
   'qaccver saccver pident length mismatch gapopen qstart qend sstart send
   evalue bitscore', which is equivalent to the keyword 'std'
   Default = `0'
 -show_gis
   Show NCBI GIs in deflines?
 -num_descriptions <Integer, >=0>
   Number of database sequences to show one-line descriptions for
   Not applicable for outfmt > 4
   Default = `500'
    * Incompatible with:  max_target_seqs
 -num_alignments <Integer, >=0>
   Number of database sequences to show alignments for
   Default = `250'
    * Incompatible with:  max_target_seqs
 -line_length <Integer, >=1>
   Line length for formatting alignments
   Not applicable for outfmt > 4
   Default = `60'
 -html
   Produce HTML output?

 *** Query filtering options
 -seg <String>
   Filter query sequence with SEG (Format: 'yes', 'window locut hicut', or
   'no' to disable)
   Default = `12 2.2 2.5'
 -soft_masking <Boolean>
   Apply filtering locations as soft masks
   Default = `false'
 -lcase_masking
   Use lower case filtering in query and subject sequence(s)?

 *** Restrict search or results
 -gilist <String>
   Restrict search of database to list of GIs
    * Incompatible with:  negative_gilist, seqidlist, negative_seqidlist,
   remote, subject, subject_loc
 -seqidlist <String>
   Restrict search of database to list of SeqIDs
    * Incompatible with:  gilist, negative_gilist, negative_seqidlist, remote,
   subject, subject_loc
 -negative_gilist <String>
   Restrict search of database to everything except the listed GIs
    * Incompatible with:  gilist, seqidlist, remote, subject, subject_loc
 -negative_seqidlist <String>
   Restrict search of database to everything except the listed SeqIDs
    * Incompatible with:  gilist, seqidlist, remote, subject, subject_loc
 -entrez_query <String>
   Restrict search with the given Entrez query
    * Requires:  remote
 -db_soft_mask <String>
   Filtering algorithm ID to apply to the BLAST database as soft masking
    * Incompatible with:  db_hard_mask, subject, subject_loc
 -db_hard_mask <String>
   Filtering algorithm ID to apply to the BLAST database as hard masking
    * Incompatible with:  db_soft_mask, subject, subject_loc
 -qcov_hsp_perc <Real, 0..100>
   Percent query coverage per hsp
 -max_hsps <Integer, >=1>
   Set maximum number of HSPs per subject sequence to save for each query
 -culling_limit <Integer, >=0>
   If the query range of a hit is enveloped by that of at least this many
   higher-scoring hits, delete the hit
    * Incompatible with:  best_hit_overhang, best_hit_score_edge
 -best_hit_overhang <Real, (>0 and <0.5)>
   Best Hit algorithm overhang value (recommended value: 0.1)
    * Incompatible with:  culling_limit
 -best_hit_score_edge <Real, (>0 and <0.5)>
   Best Hit algorithm score edge value (recommended value: 0.1)
    * Incompatible with:  culling_limit
 -max_target_seqs <Integer, >=1>
   Maximum number of aligned sequences to keep 
   Not applicable for outfmt <= 4
   Default = `500'
    * Incompatible with:  num_descriptions, num_alignments

 *** Statistical options
 -dbsize <Int8>
   Effective length of the database 
 -searchsp <Int8, >=0>
   Effective length of the search space
 -sum_stats <Boolean>
   Use sum statistics

 *** Search strategy options
 -import_search_strategy <File_In>
   Search strategy to use
    * Incompatible with:  export_search_strategy
 -export_search_strategy <File_Out>
   File name to record the search strategy used
    * Incompatible with:  import_search_strategy

 *** Extension options
 -xdrop_ungap <Real>
   X-dropoff value (in bits) for ungapped extensions
 -xdrop_gap <Real>
   X-dropoff value (in bits) for preliminary gapped extensions
 -xdrop_gap_final <Real>
   X-dropoff value (in bits) for final gapped alignment
 -window_size <Integer, >=0>
   Multiple hits window size, use 0 to specify 1-hit algorithm
 -ungapped
   Perform ungapped alignment only?

 *** Miscellaneous options
 -parse_deflines
   Should the query and subject defline(s) be parsed?
 -num_threads <Integer, (>=1 and =<48)>
   Number of threads (CPUs) to use in the BLAST search
   Default = `1'
    * Incompatible with:  remote
 -remote
   Execute search remotely?
    * Incompatible with:  gilist, seqidlist, negative_gilist,
   negative_seqidlist, subject_loc, num_threads
 -use_sw_tback
   Compute locally optimal Smith-Waterman alignments?
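
Formats 6, 7 and 10 accept a space-delimited list of format specifiers, so the format number and its specifiers must reach blastx together as one quoted argument. The following is a sketch, assuming hypothetical file and database names (transcripts.fasta, my_protein_db, blastx_hits.tsv). Genetic code 11 and the plus strand are chosen only as example values within the permitted ranges.

```shell
# Hypothetical names; adjust to your own data.
query=transcripts.fasta
db=my_protein_db
out=blastx_hits.tsv

# Format number followed by the specifiers, kept in one string.
fmt="6 qaccver saccver pident length evalue bitscore qframe"

set -- -query "$query" -db "$db" -out "$out" \
    -query_gencode 11 -strand plus -evalue 1e-5 \
    -max_target_seqs 5 -outfmt "$fmt"

# The quoted format counts as a single argument.
nargs=$#
for arg in "$@"; do last_arg=$arg; done
echo "arguments: $nargs"
echo "last argument: $last_arg"

# Run only where blastx is available and the query file exists.
if command -v blastx >/dev/null 2>&1 && [ -f "$query" ]; then
    blastx "$@"
fi
```

Passing the format unquoted would split it into separate words, and blastx would then treat the specifiers as unrelated arguments.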


jump to blastn; jump to blastp; jump to blastx; jump to makeblastdb; Back to Top

 
module load BLAST+/2.12.0-gompi-2020b
makeblastdb -help
USAGE
  makeblastdb [-h] [-help] [-in input_file] [-input_type type]
    -dbtype molecule_type [-title database_title] [-parse_seqids]
    [-hash_index] [-mask_data mask_data_files] [-mask_id mask_algo_ids]
    [-mask_desc mask_algo_descriptions] [-gi_mask]
    [-gi_mask_name gi_based_mask_names] [-out database_name]
    [-blastdb_version version] [-max_file_sz number_of_bytes]
    [-logfile File_Name] [-taxid TaxID] [-taxid_map TaxIDMapFile] [-version]

DESCRIPTION
   Application to create BLAST databases, version 2.12.0+

REQUIRED ARGUMENTS
 -dbtype <String, `nucl', `prot'>
   Molecule type of target db

OPTIONAL ARGUMENTS
 -h
   Print USAGE and DESCRIPTION;  ignore all other parameters
 -help
   Print USAGE, DESCRIPTION and ARGUMENTS; ignore all other parameters
 -version
   Print version number;  ignore other arguments

 *** Input options
 -in <File_In>
   Input file/database name
   Default = `-'
 -input_type <String, `asn1_bin', `asn1_txt', `blastdb', `fasta'>
   Type of the data specified in input_file
   Default = `fasta'

 *** Configuration options
 -title <String>
   Title for BLAST database
   Default = input file name provided to -in argument
 -parse_seqids
   Option to parse seqid for FASTA input if set, for all other input types
   seqids are parsed automatically
 -hash_index
   Create index of sequence hash values.

 *** Sequence masking options
 -mask_data <String>
   Comma-separated list of input files containing masking data as produced by
   NCBI masking applications (e.g. dustmasker, segmasker, windowmasker)
 -mask_id <String>
   Comma-separated list of strings to uniquely identify the masking algorithm
    * Requires:  mask_data
    * Incompatible with:  gi_mask
 -mask_desc <String>
   Comma-separated list of free form strings to describe the masking algorithm
   details
    * Requires:  mask_id
 -gi_mask
   Create GI indexed masking data.
    * Requires:  parse_seqids
    * Incompatible with:  mask_id
 -gi_mask_name <String>
   Comma-separated list of masking data output files.
    * Requires:  mask_data, gi_mask

 *** Output options
 -out <String>
   Name of BLAST database to be created
   Default = input file name provided to -in argument
   Required if multiple file(s)/database(s) are provided as input
 -blastdb_version <Integer, 4..5>
   Version of BLAST database to be created
   Default = `5'
 -max_file_sz <String>
   Maximum file size for BLAST database files
   Default = `1GB'
 -logfile <File_Out>
   File to which the program log should be redirected

 *** Taxonomy options
 -taxid <Integer, >=0>
   Taxonomy ID to assign to all sequences
    * Incompatible with:  taxid_map
 -taxid_map <File_In>
   Text file mapping sequence IDs to taxonomy IDs.
   Format:<SequenceId> <TaxonomyId><newline>
    * Requires:  parse_seqids
    * Incompatible with:  taxid
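
The -taxid_map option requires -parse_seqids and a text file with one sequence ID and one taxonomy ID per line. The following is a sketch that prepares such input in a temporary directory. The sequences, their IDs (seq1, seq2), the database title and the output name demo_db are all made up for illustration, and the taxonomy IDs 9606 and 10090 are example values only. makeblastdb is run only if it is on the PATH.

```shell
workdir=$(mktemp -d)

# Made-up nucleotide sequences with simple local identifiers.
cat > "$workdir/seqs.fasta" <<'EOF'
>seq1 hypothetical sequence one
ACGTACGTACGTACGTACGT
>seq2 hypothetical sequence two
TTGACCGATAGCTAGCTAGG
EOF

# Mapping file in the documented format: <SequenceId> <TaxonomyId>
printf 'seq1 9606\nseq2 10090\n' > "$workdir/taxid_map.txt"
map_lines=$(wc -l < "$workdir/taxid_map.txt" | tr -d ' ')
echo "taxid map entries: $map_lines"

set -- -in "$workdir/seqs.fasta" -dbtype nucl -parse_seqids \
    -taxid_map "$workdir/taxid_map.txt" \
    -title "demo nucleotide db" \
    -out "$workdir/demo_db"

# Build the database only where makeblastdb is available.
if command -v makeblastdb >/dev/null 2>&1; then
    makeblastdb "$@"
fi
```

Since -taxid is listed as incompatible with -taxid_map, use -taxid instead when every sequence belongs to the same organism.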


Back to Top