GATK-Teaching

From Research Computing Center Wiki
 

Latest revision as of 14:22, 15 August 2018

Category

Bioinformatics

Program On

Teaching

Version

3.8-0

Author / Distributor

GATK

Description

"The Genome Analysis Toolkit or GATK is a software package developed at the Broad Institute to analyse next-generation resequencing data. The toolkit offers a wide variety of tools, with a primary focus on variant discovery and genotyping as well as strong emphasis on data quality assurance. Its robust architecture, powerful processing engine and high-performance computing features make it capable of taking on projects of any size." More details are available on the GATK website: http://www.broadinstitute.org/gatk/

Running Program

The latest version of this application is installed at /usr/local/apps/eb/GATK/3.8-0-Java-1.8.0_144

To use this version, please load the module with

ml GATK/3.8-0-Java-1.8.0_144 
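To check that the module loaded correctly before submitting a long job, a small helper like the one below can be used. This is a minimal sketch, not a command provided by the cluster: the function name check_gatk_jar is hypothetical, and it relies only on what the examples on this page already assume, namely that loading the module sets EBROOTGATK to the directory containing GenomeAnalysisTK.jar.

```shell
#!/bin/bash
# Minimal sketch (hypothetical helper, not a cluster-provided command):
# print the path to GenomeAnalysisTK.jar, or an error if it is missing.
# With no argument it checks $EBROOTGATK, which loading the module sets.
check_gatk_jar() {
  local root="${1:-$EBROOTGATK}"
  if [ -z "$root" ]; then
    echo "EBROOTGATK is not set; load the module with: ml GATK/3.8-0-Java-1.8.0_144" >&2
    return 1
  fi
  if [ ! -f "$root/GenomeAnalysisTK.jar" ]; then
    echo "GenomeAnalysisTK.jar not found under $root" >&2
    return 1
  fi
  echo "$root/GenomeAnalysisTK.jar"
}
```

For example, after ml GATK/3.8-0-Java-1.8.0_144, running check_gatk_jar should print the full path to the jar; a non-zero exit status means the module is not loaded or the jar is not where expected.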

Here is an example of a shell script, sub.sh, to run on the batch queue:

#!/bin/bash
#SBATCH --job-name=j_GATK
#SBATCH --partition=batch
#SBATCH --mail-type=ALL
#SBATCH --mail-user=username@uga.edu
#SBATCH --ntasks=1
#SBATCH --mem=10gb
#SBATCH --time=08:00:00
#SBATCH --output=GATK.%j.out
#SBATCH --error=GATK.%j.err

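cd "$SLURM_SUBMIT_DIR"             # run from the directory the job was submitted from
ml GATK/3.8-0-Java-1.8.0_144       # load GATK 3.8-0 (sets $EBROOTGATK)
# Replace [options] with a tool (-T <analysis_type>) and its arguments
java -jar "$EBROOTGATK/GenomeAnalysisTK.jar" [options]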

In a real submission script, at least the job-specific values above (for example the job name, e-mail address, resource requests and [options]) need to be reviewed or replaced with proper values.
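As an illustration of what can replace [options], the sketch below assembles a germline variant-calling command for HaplotypeCaller. It is a hedged example, not a recommended pipeline: the helper build_gatk_cmd and the file names ref.fasta, sample.bam and sample.vcf are hypothetical. -T, -R and -I come from the usage line shown under Documentation, and -o is HaplotypeCaller's output-file argument in GATK 3.x. Building the command in a bash array keeps file names with spaces intact and makes it easy to print the command before running it.

```shell
#!/bin/bash
# Sketch: build a GATK 3.8 HaplotypeCaller command in a bash array.
# build_gatk_cmd and the file names passed to it are hypothetical.
# Arguments: $1 = reference FASTA (GATK 3 also expects its .fai and .dict
#            next to it), $2 = input BAM, $3 = output VCF.
# Result: the global array GATK_CMD, ready to run as "${GATK_CMD[@]}".
build_gatk_cmd() {
  local jar="$EBROOTGATK/GenomeAnalysisTK.jar"   # EBROOTGATK is set by the GATK module
  GATK_CMD=(java -jar "$jar" -T HaplotypeCaller -R "$1" -I "$2" -o "$3")
}
```

In sub.sh, the last line could then become build_gatk_cmd ref.fasta sample.bam sample.vcf followed by "${GATK_CMD[@]}"; echoing "${GATK_CMD[*]}" first records the exact command in the job's output file.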

Please refer to Running_Jobs_on_the_teaching_cluster, Run X window Jobs, and Run interactive Jobs for more details on running jobs on the teaching cluster.
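The script above requests 10 GB with --mem=10gb but does not tell Java how much of it to use, so the JVM picks its own default maximum heap. One common approach, shown here as an assumption rather than a site requirement, is to pass -Xmx explicitly and keep some headroom below the Slurm limit. The sketch reads SLURM_MEM_PER_NODE, which Slurm sets (in megabytes) when --mem is given; the function name java_heap_mb and the 90% ratio are illustrative choices.

```shell
#!/bin/bash
# Sketch (assumption, not a site requirement): choose a Java heap size
# from the Slurm memory request, leaving ~10% headroom for non-heap JVM memory.
# Uses $1 if given, else SLURM_MEM_PER_NODE (MB, set by Slurm when --mem is used),
# else 10240 MB to match "#SBATCH --mem=10gb" in sub.sh.
java_heap_mb() {
  local mem_mb="${1:-${SLURM_MEM_PER_NODE:-10240}}"
  echo $(( mem_mb * 90 / 100 ))
}
```

The run line in sub.sh would then read java -Xmx"$(java_heap_mb)"m -jar "$EBROOTGATK/GenomeAnalysisTK.jar" [options], which with --mem=10gb gives -Xmx9216m.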


Here is an example of a job submission command:

sbatch ./sub.sh 
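When submitting several jobs it can help to record each job ID, since the --output=GATK.%j.out and --error=GATK.%j.err lines in sub.sh name the log files after it. The sketch below uses sbatch --parsable, which prints only the job ID (possibly followed by ";" and a cluster name). The function names gatk_log_files and submit_gatk_job are hypothetical.

```shell
#!/bin/bash
# Sketch: submit sub.sh and report its log file names.
# Function names are hypothetical; the GATK.<jobid>.out/.err names follow
# the --output=GATK.%j.out and --error=GATK.%j.err lines in sub.sh.
gatk_log_files() {
  printf 'GATK.%s.out GATK.%s.err\n' "$1" "$1"
}

submit_gatk_job() {
  local jobid
  jobid=$(sbatch --parsable "${1:-./sub.sh}") || return 1
  jobid=${jobid%%;*}   # --parsable may print "jobid;cluster"; keep only the ID
  echo "Submitted job $jobid"
  echo "Log files: $(gatk_log_files "$jobid")"
}
```

Running submit_gatk_job (or submit_gatk_job path/to/other_script.sh) submits the script and prints the file names to watch, for example with tail -f.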

Documentation

ml GATK/3.8-0-Java-1.8.0_144 
java -jar $EBROOTGATK/GenomeAnalysisTK.jar -h
----------------------------------------------------------------------------------
The Genome Analysis Toolkit (GATK) v3.8-0-ge9d806836, Compiled 2017/07/28 21:26:50
Copyright (c) 2010-2016 The Broad Institute
For support and documentation go to https://software.broadinstitute.org/gatk
[Wed Aug 15 15:22:45 EDT 2018] Executing on Linux 3.10.0-862.9.1.el7.x86_64 amd64
Java HotSpot(TM) 64-Bit Server VM 1.8.0_144-b01
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
usage: java -jar GenomeAnalysisTK.jar -T <analysis_type> [-args <arg_file>] [-I <input_file>] [--showFullBamList] [-rbs 
       <read_buffer_size>] [-rf <read_filter>] [-drf <disable_read_filter>] [-L <intervals>] [-XL <excludeIntervals>] [-isr 
       <interval_set_rule>] [-im <interval_merging>] [-ip <interval_padding>] [-R <reference_sequence>] [-ndrs] [-maxRuntime 
       <maxRuntime>] [-maxRuntimeUnits <maxRuntimeUnits>] [-dt <downsampling_type>] [-dfrac <downsample_to_fraction>] [-dcov 
       <downsample_to_coverage>] [-baq <baq>] [-baqGOP <baqGapOpenPenalty>] [-fixNDN] [-fixMisencodedQuals] 
       [-allowPotentiallyMisencodedQuals] [-OQ] [-DBQ <defaultBaseQualities>] [-PF <performanceLog>] [-BQSR <BQSR>] [-qq 
       <quantize_quals>] [-SQQ <static_quantized_quals>] [-DIQ] [-EOQ] [-preserveQ <preserve_qscores_less_than>] 
       [-globalQScorePrior <globalQScorePrior>] [-secondsBetweenProgressUpdates <secondsBetweenProgressUpdates>] [-S 
       <validation_strictness>] [-rpr] [-kpr] [-sample_rename_mapping_file <sample_rename_mapping_file>] [-U <unsafe>] 
       [-jdk_deflater] [-jdk_inflater] [-disable_auto_index_creation_and_locking_when_reading_rods] [-no_cmdline_in_header] 
       [-sites_only] [-writeFullFormat] [-compress <bam_compression>] [-simplifyBAM] [--disable_bam_indexing] [--generate_md5] 
       [-nt <num_threads>] [-nct <num_cpu_threads_per_data_thread>] [-mte] [-rgbl <read_group_black_list>] [-ped <pedigree>] 
       [-pedString <pedigreeString>] [-pedValidationType <pedigreeValidationType>] [-variant_index_type <variant_index_type>] 
       [-variant_index_parameter <variant_index_parameter>] [-ref_win_stop <reference_window_stop>] [-l <logging_level>] [-log 
       <log_to_file>] [-h] [-version]

 -T,--analysis_type <analysis_type>                                                       Name of the tool to run
 -args,--arg_file <arg_file>                                                              Reads arguments from the 
                                                                                          specified file
 -I,--input_file <input_file>                                                             Input file containing sequence 
                                                                                          data (BAM or CRAM)
 --showFullBamList                                                                        Emit list of input BAM/CRAM 
                                                                                          files to log
 -rbs,--read_buffer_size <read_buffer_size>                                               Number of reads per SAM file 
                                                                                          to buffer in memory
 -rf,--read_filter <read_filter>                                                          Filters to apply to reads 
                                                                                          before analysis
 -drf,--disable_read_filter <disable_read_filter>                                         Read filters to disable
 -L,--intervals <intervals>                                                               One or more genomic intervals 
                                                                                          over which to operate
 -XL,--excludeIntervals <excludeIntervals>                                                One or more genomic intervals 
                                                                                          to exclude from processing
 -isr,--interval_set_rule <interval_set_rule>                                             Set merging approach to use 
                                                                                          for combining interval inputs 
                                                                                          (UNION|INTERSECTION)
 -im,--interval_merging <interval_merging>                                                Interval merging rule for 
                                                                                          abutting intervals (ALL|
                                                                                          OVERLAPPING_ONLY)
 -ip,--interval_padding <interval_padding>                                                Amount of padding (in bp) to 
                                                                                          add to each interval
 -R,--reference_sequence <reference_sequence>                                             Reference sequence file
 -ndrs,--nonDeterministicRandomSeed                                                       Use a non-deterministic random 
                                                                                          seed
 -maxRuntime,--maxRuntime <maxRuntime>                                                    Stop execution cleanly as soon 
                                                                                          as maxRuntime has been reached
 -maxRuntimeUnits,--maxRuntimeUnits <maxRuntimeUnits>                                     Unit of time used by 
                                                                                          maxRuntime (NANOSECONDS|
                                                                                          MICROSECONDS|MILLISECONDS|
                                                                                          SECONDS|MINUTES|HOURS|DAYS)
 -dt,--downsampling_type <downsampling_type>                                              Type of read downsampling to 
                                                                                          employ at a given locus (NONE|
                                                                                          ALL_READS|BY_SAMPLE)
 -dfrac,--downsample_to_fraction <downsample_to_fraction>                                 Fraction of reads to 
                                                                                          downsample to
 -dcov,--downsample_to_coverage <downsample_to_coverage>                                  Target coverage threshold for 
                                                                                          downsampling to coverage
 -baq,--baq <baq>                                                                         Type of BAQ calculation to 
                                                                                          apply in the engine (OFF|
                                                                                          CALCULATE_AS_NECESSARY|
                                                                                          RECALCULATE)
 -baqGOP,--baqGapOpenPenalty <baqGapOpenPenalty>                                          BAQ gap open penalty
 -fixNDN,--refactor_NDN_cigar_string                                                      Reduce NDN elements in CIGAR 
                                                                                          string
 -fixMisencodedQuals,--fix_misencoded_quality_scores                                      Fix mis-encoded base quality 
                                                                                          scores
 -allowPotentiallyMisencodedQuals,--allow_potentially_misencoded_quality_scores           Ignore warnings about base 
                                                                                          quality score encoding
 -OQ,--useOriginalQualities                                                               Use the base quality scores 
                                                                                          from the OQ tag
 -DBQ,--defaultBaseQualities <defaultBaseQualities>                                       Assign a default base quality
 -PF,--performanceLog <performanceLog>                                                    Write GATK runtime performance 
                                                                                          log to this file
 -BQSR,--BQSR <BQSR>                                                                      Input covariates table file 
                                                                                          for on-the-fly base quality 
                                                                                          score recalibration
 -qq,--quantize_quals <quantize_quals>                                                    Quantize quality scores to a 
                                                                                          given number of levels (with 
                                                                                          -BQSR)
 -SQQ,--static_quantized_quals <static_quantized_quals>                                   Use static quantized quality 
                                                                                          scores to a given number of 
                                                                                          levels (with -BQSR)
 -DIQ,--disable_indel_quals                                                               Disable printing of base 
                                                                                          insertion and deletion tags 
                                                                                          (with -BQSR)
 -EOQ,--emit_original_quals                                                               Emit the OQ tag with the 
                                                                                          original base qualities (with 
                                                                                          -BQSR)
 -preserveQ,--preserve_qscores_less_than <preserve_qscores_less_than>                     Don't recalibrate bases with 
                                                                                          quality scores less than this 
                                                                                          threshold (with -BQSR)
 -globalQScorePrior,--globalQScorePrior <globalQScorePrior>                               Global Qscore Bayesian prior 
                                                                                          to use for BQSR
 -secondsBetweenProgressUpdates,--secondsBetweenProgressUpdates                           Time interval for process 
<secondsBetweenProgressUpdates>                                                           meter information output (in 
                                                                                          seconds)
 -S,--validation_strictness <validation_strictness>                                       How strict should we be with 
                                                                                          validation (STRICT|LENIENT|
                                                                                          SILENT)
 -rpr,--remove_program_records                                                            Remove program records from 
                                                                                          the SAM header
 -kpr,--keep_program_records                                                              Keep program records in the 
                                                                                          SAM header
 -sample_rename_mapping_file,--sample_rename_mapping_file <sample_rename_mapping_file>    Rename sample IDs on-the-fly 
                                                                                          at runtime using the provided 
                                                                                          mapping file
 -U,--unsafe <unsafe>                                                                     Enable unsafe operations: 
                                                                                          nothing will be checked at 
                                                                                          runtime (ALLOW_N_CIGAR_READS|
                                                                                          ALLOW_UNINDEXED_BAM|
                                                                                          ALLOW_UNSET_BAM_SORT_ORDER|
                                                                                          NO_READ_ORDER_VERIFICATION|
                                                                                          ALLOW_SEQ_DICT_INCOMPATIBILITY|
                                                                                          LENIENT_VCF_PROCESSING|ALL)
 -jdk_deflater,--use_jdk_deflater                                                         Use the JDK Deflater instead 
                                                                                          of the IntelDeflater for 
                                                                                          writing BAMs
 -jdk_inflater,--use_jdk_inflater                                                         Use the JDK Inflater instead 
                                                                                          of the IntelInflater for 
                                                                                          reading BAMs
 -disable_auto_index_creation_and_locking_when_reading_rods,--disable_auto_index_creation_and_locking_when_reading_rods   Disable both auto-generation
                                                                                          of index files and index file 
                                                                                          locking
 -no_cmdline_in_header,--no_cmdline_in_header                                             Don't include the command line 
                                                                                          in output VCF headers
 -sites_only,--sites_only                                                                 Output sites-only VCF
 -writeFullFormat,--never_trim_vcf_format_field                                           Always output all the records 
                                                                                          in VCF FORMAT fields, even if 
                                                                                          some are missing
 -compress,--bam_compression <bam_compression>                                            Compression level to use for 
                                                                                          writing BAM files (0 - 9, 
                                                                                          higher is more compressed)
 -simplifyBAM,--simplifyBAM                                                               Strip down read content and 
                                                                                          tags
 --disable_bam_indexing                                                                   Turn off on-the-fly creation 
                                                                                          of indices for output BAM/CRAM 
                                                                                          files
 --generate_md5                                                                           Enable on-the-fly creation of 
                                                                                          md5s for output BAM files.
 -nt,--num_threads <num_threads>                                                          Number of data threads to 
                                                                                          allocate to this analysis
 -nct,--num_cpu_threads_per_data_thread <num_cpu_threads_per_data_thread>                 Number of CPU threads to 
                                                                                          allocate per data thread
 -mte,--monitorThreadEfficiency                                                           Enable threading efficiency 
                                                                                          monitoring
 -rgbl,--read_group_black_list <read_group_black_list>                                    Exclude read groups based on 
                                                                                          tags
 -ped,--pedigree <pedigree>                                                               Pedigree files for samples
 -pedString,--pedigreeString <pedigreeString>                                             Pedigree string for samples
 -pedValidationType,--pedigreeValidationType <pedigreeValidationType>                     Validation strictness for 
                                                                                          pedigree (STRICT|SILENT)
 -variant_index_type,--variant_index_type <variant_index_type>                            Type of IndexCreator to use 
                                                                                          for VCF/BCF indices 
                                                                                          (DYNAMIC_SEEK|DYNAMIC_SIZE|
                                                                                          LINEAR|INTERVAL)
 -variant_index_parameter,--variant_index_parameter <variant_index_parameter>             Parameter to pass to the 
                                                                                          VCF/BCF IndexCreator
 -ref_win_stop,--reference_window_stop <reference_window_stop>                            Reference window stop
 -l,--logging_level <logging_level>                                                       Set the minimum level of 
                                                                                          logging
 -log,--log_to_file <log_to_file>                                                         Set the logging location
 -h,--help                                                                                Generate the help message
 -version,--version                                                                       Output version information

 annotator                       
   VariantAnnotator              Annotate variant calls with context information
                                 
 bqsr                            
   AnalyzeCovariates             Create plots to visualize base recalibration results
   BaseRecalibrator              Detect systematic errors in base quality scores
                                 
 cancer                          
   AssignSomaticStatus           Assigns somatic status to a set of calls
                                 
 contamination                   
   AnnotatePopulationAFWalker    Given a input VCF representing a collection of populations, split the input into each 
                                 population, and annotate each record with population allele frequencies
   ContEst                       Estimate cross-sample contamination
                                 
 coverage                        
   CallableLoci                  Collect statistics on callable, uncallable, poorly mapped, and other parts of the 
                                 genome
   CompareCallableLoci           Compare callability statistics
   DepthOfCoverage               Assess sequence coverage by a wide array of metrics, partitioned by sample, read group, 
                                 or library
   GCContentByInterval           Calculates the GC content of the reference sequence for each interval
                                 
 diagnosetargets                 
   DiagnoseTargets               Analyze coverage distribution and validate read mates per interval and per sample
                                 
 diagnostics                     
   ErrorRatePerCycle             Compute the read error rate per position
   FindCoveredIntervals          Outputs a list of intervals that are covered to or above a given threshold
   ReadGroupProperties           Collect statistics about read groups and their properties
   ReadLengthDistribution        Collect read length statistics
                                 
 diffengine                      
   DiffObjects                   A generic engine for comparing tree-structured objects
                                 
 examples                        
   GATKPaperGenotyper            Simple Bayesian genotyper used in the original GATK paper
                                 
 fasta                           
   FastaAlternateReferenceMaker  Generate an alternative reference sequence over the specified interval
   FastaReferenceMaker           Create a subset of a FASTA reference sequence
   FastaStats                    Calculate basic statistics about the reference sequence itself
                                 
 filters                         
   VariantFiltration             Filter variant calls based on INFO and FORMAT annotations
                                 
 genotyper                       
   UnifiedGenotyper              Call SNPs and indels on a per-locus basis
                                 
 haplotypecaller                 
   HaplotypeCaller               Call germline SNPs and indels via local re-assembly of haplotypes
   HaplotypeResolver             Haplotype-based resolution of variants in separate callsets.
                                 
 indels                          
   IndelRealigner                Perform local realignment of reads around indels
   LeftAlignIndels               Left-align indels within reads in a bam file
   RealignerTargetCreator        Define intervals to target for local realignment
                                 
 m2                              
   MuTect2                       Call somatic SNPs and indels via local re-assembly of haplotypes
                                 
 missing                         
   QualifyMissingIntervals       Collect quality metrics for a set of intervals
                                 
 phasing                         
   PhaseByTransmission           Compute the most likely genotype combination and phasing for trios and parent/child 
                                 pairs
   ReadBackedPhasing             Annotate physical phasing information
                                 
 qc                              
   CheckPileup                   Compare GATK's internal pileup to a reference Samtools pileup
   CountBases                    Count the number of bases in a set of reads
   CountIntervals                Count contiguous regions in an interval list
   CountLoci                     Count the total number of covered loci
   CountMales                    Count the number of reads seen from male samples
   CountReadEvents               Count the number of read events
   CountReads                    Count the number of reads
   CountRODs                     Count the number of ROD objects encountered
   CountRODsByRef                Count the number of ROD objects encountered along the reference
   CountTerminusEvent            Count the number of reads ending in insertions, deletions or soft-clips
   ErrorThrowing                 A walker that simply throws errors.
   FlagStat                      Collect statistics about sequence reads based on their SAM flags
   Pileup                        Print read alignments in Pileup-style format
   PrintRODs                     Print out all of the RODs in the input data set
   QCRef                         Quality control for the reference fasta
   ReadClippingStats             Collect read clipping statistics
                                 
 readutils                       
   ClipReads                     Read clipping based on quality, position or sequence matching
   PrintReads                    Write out sequence read data (for filtering, merging, subsetting etc)
   SplitSamFile                  Split a BAM file by sample
                                 
 rnaseq                          
   ASEReadCounter                Calculate read counts per allele for allele-specific expression analysis
   SplitNCigarReads              Splits reads that contain Ns in their CIGAR string
                                 
 simulatereads                   
   SimulateReadsForVariants      Generate simulated reads for variants
                                 
 validationsiteselector          
   ValidationSiteSelector        Randomly select variant records according to specified options
                                 
 varianteval                     
   VariantEval                   General-purpose tool for variant evaluation (% in dbSNP, genotype concordance, Ti/Tv 
                                 ratios, and a lot more)
                                 
 variantrecalibration            
   ApplyRecalibration            Apply a score cutoff to filter variants based on a recalibration table
   VariantRecalibrator           Build a recalibration model to score variant quality for filtering purposes
                                 
 variantutils                    
   CalculateGenotypePosteriors   Calculate genotype posterior likelihoods given panel data
   CombineGVCFs                  Combine per-sample gVCF files produced by HaplotypeCaller into a multi-sample gVCF file
   CombineVariants               Combine variant records from different sources
   GenotypeConcordance           Genotype concordance between two callsets
   GenotypeGVCFs                 Perform joint genotyping on gVCF files produced by HaplotypeCaller
   LeftAlignAndTrimVariants      Left-align indels in a variant callset
   RandomlySplitVariants         Randomly split variants into different sets
   RegenotypeVariants            Regenotypes the variants from a VCF containing PLs or GLs.
   SelectHeaders                 Selects headers from a VCF source
   SelectVariants                Select a subset of variants from a larger callset
   ValidateVariants              Validate a VCF file with an extra strict set of criteria
   VariantsToAllelicPrimitives   Simplify multi-nucleotide variants (MNPs) into more basic/primitive alleles.
   VariantsToBinaryPed           Convert VCF to binary pedigree file
   VariantsToTable               Extract specific fields from a VCF file to a tab-delimited table
   VariantsToVCF                 Convert variants from other file formats to VCF format
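The tools listed above are usually chained together. The following is a minimal sketch of one per-sample germline sequence: BaseRecalibrator, then PrintReads with -BQSR, then HaplotypeCaller in GVCF mode. It finishes with joint genotyping using GenotypeGVCFs. The function names and file names are hypothetical, and you must supply the known-sites VCF yourself (for example dbSNP). This is not the complete GATK Best Practices pipeline. It leaves out steps such as duplicate marking (done outside GATK, e.g. with Picard) and variant filtering.

```bash
#!/bin/bash
# Hypothetical sketch of a minimal germline workflow with GATK 3.8 tools.
# Assumes the GATK module is loaded (EBROOTGATK set), that the reference has
# its .fai and .dict files, and that each BAM is sorted and indexed.

# Run a GATK 3 tool; -Xmx8g assumes the 10gb memory request used in sub.sh.
gatk3() {
  java -Xmx8g -jar "$EBROOTGATK/GenomeAnalysisTK.jar" "$@"
}

# Per-sample steps: base quality score recalibration, then GVCF calling.
process_sample() {
  local ref=$1 known=$2 bam=$3
  local base=${bam%.bam}
  # 1. Model systematic base quality errors, excluding known variant sites.
  gatk3 -T BaseRecalibrator -R "$ref" -I "$bam" \
    -knownSites "$known" -o "$base.recal.table" || return 1
  # 2. Write a new BAM with the recalibrated base qualities applied.
  gatk3 -T PrintReads -R "$ref" -I "$bam" \
    -BQSR "$base.recal.table" -o "$base.recal.bam" || return 1
  # 3. Call variants per sample in GVCF mode for later joint genotyping.
  gatk3 -T HaplotypeCaller -R "$ref" -I "$base.recal.bam" \
    -ERC GVCF -o "$base.g.vcf"
}

# Joint genotyping: combine one or more per-sample GVCFs into a final VCF.
joint_genotype() {
  local ref=$1 out=$2
  shift 2
  local gvcf args=()
  for gvcf in "$@"; do
    args+=(-V "$gvcf")
  done
  gatk3 -T GenotypeGVCFs -R "$ref" "${args[@]}" -o "$out"
}
```

For example, after ml GATK/3.8-0-Java-1.8.0_144, a job could run process_sample ref.fasta dbsnp.vcf sample1.bam for each sample. It would then run joint_genotype ref.fasta cohort.vcf sample1.g.vcf sample2.g.vcf. To list the arguments a single tool accepts, run java -jar $EBROOTGATK/GenomeAnalysisTK.jar -T HaplotypeCaller -h.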
                                 



Installation

Source code is obtained from GATK

System

64-bit Linux