MuTect-Teaching
Category
Bioinformatics
Program On
Teaching
Version
1.1.7
Author / Distributor
Broad Institute
Description
MuTect is a method developed at the Broad Institute for the reliable and accurate identification of somatic point mutations in next generation sequencing data of cancer genomes. More information: http://www.broadinstitute.org/cancer/cga/mutect
Running Program
Also refer to Running Jobs on the teaching cluster
- Version 1.1.7, installed in /usr/local/apps/eb/MuTect/1.1.7-Java-1.7.0_80
To use this version of MuTect, please first load the module with
module load MuTect/1.1.7-Java-1.7.0_80
Sample job submission script (sub.sh) to run the module:
#!/bin/bash
#SBATCH --job-name=j_mutect
#SBATCH --partition=batch
#SBATCH --mail-type=ALL
#SBATCH --mail-user=username@uga.edu
#SBATCH --ntasks=1
#SBATCH --mem=2gb
#SBATCH --time=08:00:00
#SBATCH --output=mutect.%j.out
#SBATCH --error=mutect.%j.err
cd "$SLURM_SUBMIT_DIR"
ml MuTect/1.1.7-Java-1.7.0_80
java -jar /usr/local/apps/eb/MuTect/1.1.7-Java-1.7.0_80/mutect-1.1.7.jar [options]
In a real submission script, review at least the values shown above, such as the job name, e-mail address, memory, time limit, and the MuTect [options], and replace them with values appropriate for your job.
Please refer to Running Jobs on the teaching cluster, Run X window Jobs, and Run interactive Jobs for more details on running jobs on the teaching cluster.
Here is an example of a job submission command:
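As a rough illustration of what might replace [options], the sketch below assembles a tumor/normal MuTect command line and prints it for review. The file names (reference.fasta, normal.bam, tumor.bam, call_stats.txt) and the helper name mutect_command are hypothetical placeholders. The --input_file:normal, --input_file:tumor and --out arguments come from MuTect's upstream usage notes, not from this page, so check them against the installed build before relying on them.

```shell
#!/bin/sh
# Path taken from the module installation described above.
MUTECT_JAR=/usr/local/apps/eb/MuTect/1.1.7-Java-1.7.0_80/mutect-1.1.7.jar

# Hypothetical helper: prints a MuTect command line instead of running it.
#   $1 = reference FASTA   $2 = matched-normal BAM
#   $3 = tumor BAM         $4 = call-stats output file
mutect_command() {
  echo java -jar "$MUTECT_JAR" \
    --analysis_type MuTect \
    --reference_sequence "$1" \
    --input_file:normal "$2" \
    --input_file:tumor "$3" \
    --out "$4"
}

# Placeholder file names; substitute your own data.
mutect_command reference.fasta normal.bam tumor.bam call_stats.txt
```

Printing the command first makes it easy to confirm paths before spending queue time. Once it looks right, the same java line (without the echo) can take the place of the last line of sub.sh.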
sbatch ./sub.sh
Documentation
module load MuTect/1.1.7-Java-1.7.0_80
java -jar /usr/local/apps/eb/MuTect/1.1.7-Java-1.7.0_80/mutect-1.1.7.jar --help
--------------------------------------------------------------------------------
The Genome Analysis Toolkit (GATK) v3.1-0-g72492bb, Compiled 2015/01/21 17:10:56
Copyright (c) 2010 The Broad Institute
For support and documentation go to http://www.broadinstitute.org/gatk
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
usage: java -jar mutect-1.1.7.jar -T <analysis_type> [-args <arg_file>] [-I <input_file>] [-rbs <read_buffer_size>] [-et
<phone_home>] [-K <gatk_key>] [-tag <tag>] [-rf <read_filter>] [-L <intervals>] [-XL <excludeIntervals>] [-isr
<interval_set_rule>] [-im <interval_merging>] [-ip <interval_padding>] [-R <reference_sequence>] [-ndrs] [-maxRuntime
<maxRuntime>] [-maxRuntimeUnits <maxRuntimeUnits>] [-dt <downsampling_type>] [-dfrac <downsample_to_fraction>] [-dcov
<downsample_to_coverage>] [-baq <baq>] [-baqGOP <baqGapOpenPenalty>] [-fixMisencodedQuals]
[-allowPotentiallyMisencodedQuals] [-OQ] [-DBQ <defaultBaseQualities>] [-PF <performanceLog>] [-BQSR <BQSR>] [-DIQ]
[-EOQ] [-preserveQ <preserve_qscores_less_than>] [-globalQScorePrior <globalQScorePrior>] [-S <validation_strictness>]
[-rpr] [-kpr] [-sample_rename_mapping_file <sample_rename_mapping_file>] [-U <unsafe>] [-nt <num_threads>] [-nct
<num_cpu_threads_per_data_thread>] [-mte] [-bfh <num_bam_file_handles>] [-rgbl <read_group_black_list>] [-ped
<pedigree>] [-pedString <pedigreeString>] [-pedValidationType <pedigreeValidationType>] [-variant_index_type
<variant_index_type>] [-variant_index_parameter <variant_index_parameter>] [-l <logging_level>] [-log <log_to_file>]
[-h] [-version]
-T,--analysis_type <analysis_type> Name of the tool to run
-args,--arg_file <arg_file> Reads arguments from the
specified file
-I,--input_file <input_file> Input file containing sequence
data (SAM or BAM)
-rbs,--read_buffer_size <read_buffer_size> Number of reads per SAM file to
buffer in memory
-et,--phone_home <phone_home> Run reporting mode (NO_ET|AWS|
STDOUT)
-K,--gatk_key <gatk_key> GATK key file required to run
with -et NO_ET
-tag,--tag <tag> Tag to identify this GATK run
as part of a group of runs
-rf,--read_filter <read_filter> Filters to apply to reads
before analysis
-L,--intervals <intervals> One or more genomic intervals
over which to operate
-XL,--excludeIntervals <excludeIntervals> One or more genomic intervals
to exclude from processing
-isr,--interval_set_rule <interval_set_rule> Set merging approach to use for
combining interval inputs
(UNION|INTERSECTION)
-im,--interval_merging <interval_merging> Interval merging rule for
abutting intervals (ALL|
OVERLAPPING_ONLY)
-ip,--interval_padding <interval_padding> Amount of padding (in bp) to
add to each interval
-R,--reference_sequence <reference_sequence> Reference sequence file
-ndrs,--nonDeterministicRandomSeed Use a non-deterministic random
seed
-maxRuntime,--maxRuntime <maxRuntime> Stop execution cleanly as soon
as maxRuntime has been reached
-maxRuntimeUnits,--maxRuntimeUnits <maxRuntimeUnits> Unit of time used by maxRuntime
(NANOSECONDS|MICROSECONDS|
MILLISECONDS|SECONDS|MINUTES|
HOURS|DAYS)
-dt,--downsampling_type <downsampling_type> Type of read downsampling to
employ at a given locus (NONE|
ALL_READS|BY_SAMPLE)
-dfrac,--downsample_to_fraction <downsample_to_fraction> Fraction of reads to downsample
to
-dcov,--downsample_to_coverage <downsample_to_coverage> Target coverage threshold for
downsampling to coverage
-baq,--baq <baq> Type of BAQ calculation to
apply in the engine (OFF|
CALCULATE_AS_NECESSARY|
RECALCULATE)
-baqGOP,--baqGapOpenPenalty <baqGapOpenPenalty> BAQ gap open penalty
-fixMisencodedQuals,--fix_misencoded_quality_scores Fix mis-encoded base quality
scores
-allowPotentiallyMisencodedQuals,--allow_potentially_misencoded_quality_scores Ignore warnings about base
quality score encoding
-OQ,--useOriginalQualities Use the base quality scores
from the OQ tag
-DBQ,--defaultBaseQualities <defaultBaseQualities> Assign a default base quality
-PF,--performanceLog <performanceLog> Write GATK runtime performance
log to this file
-BQSR,--BQSR <BQSR> Input covariates table file for
on-the-fly base quality score
recalibration
-DIQ,--disable_indel_quals Disable printing of base
insertion and deletion tags
(with -BQSR)
-EOQ,--emit_original_quals Emit the OQ tag with the
original base qualities (with
-BQSR)
-preserveQ,--preserve_qscores_less_than <preserve_qscores_less_than> Don't recalibrate bases with
quality scores less than this
threshold (with -BQSR)
-globalQScorePrior,--globalQScorePrior <globalQScorePrior> Global Qscore Bayesian prior to
use for BQSR
-S,--validation_strictness <validation_strictness> How strict should we be with
validation (STRICT|LENIENT|
SILENT)
-rpr,--remove_program_records Remove program records from the
SAM header
-kpr,--keep_program_records Keep program records in the SAM
header
-sample_rename_mapping_file,--sample_rename_mapping_file <sample_rename_mapping_file> Rename sample IDs on-the-fly at
runtime using the provided
mapping file
-U,--unsafe <unsafe> Enable unsafe operations:
nothing will be checked at
runtime (ALLOW_N_CIGAR_READS|
ALLOW_UNINDEXED_BAM|
ALLOW_UNSET_BAM_SORT_ORDER|
NO_READ_ORDER_VERIFICATION|
ALLOW_SEQ_DICT_INCOMPATIBILITY|
LENIENT_VCF_PROCESSING|ALL)
-nt,--num_threads <num_threads> Number of data threads to
allocate to this analysis
-nct,--num_cpu_threads_per_data_thread <num_cpu_threads_per_data_thread> Number of CPU threads to
allocate per data thread
-mte,--monitorThreadEfficiency Enable threading efficiency
monitoring
-bfh,--num_bam_file_handles <num_bam_file_handles> Total number of BAM file
handles to keep open
simultaneously
-rgbl,--read_group_black_list <read_group_black_list> Exclude read groups based on
tags
-ped,--pedigree <pedigree> Pedigree files for samples
-pedString,--pedigreeString <pedigreeString> Pedigree string for samples
-pedValidationType,--pedigreeValidationType <pedigreeValidationType> Validation strictness for
pedigree information (STRICT|
SILENT)
-variant_index_type,--variant_index_type <variant_index_type> Type of IndexCreator to use for
VCF/BCF indices (DYNAMIC_SEEK|
DYNAMIC_SIZE|LINEAR|INTERVAL)
-variant_index_parameter,--variant_index_parameter <variant_index_parameter> Parameter to pass to the
VCF/BCF IndexCreator
-l,--logging_level <logging_level> Set the minimum level of
logging
-log,--log_to_file <log_to_file> Set the logging location
-h,--help Generate the help message
-version,--version Output version information
alignment
CheckAlignment Validates consistency of the aligner interface
annotator
VariantAnnotator Annotates variant calls with context information.
beagle
BeagleOutputToVCF Takes files produced by Beagle imputation engine and creates a vcf with modified
annotations.
ProduceBeagleInput Converts the input VCF into a format accepted by the Beagle imputation/analysis
program.
VariantsToBeagleUnphased Produces an input file to Beagle imputation engine, listing unphased, hard-called
genotypes for a single sample in input variant file.
bqsr
AnalyzeCovariates Tool to analyze and evaluate base recalibration ables.
BaseRecalibrator First pass of the base quality score recalibration -- Generates recalibration table
based on various user-specified covariates (such as read group, reported quality score,
machine cycle, and nucleotide context).
coverage
CallableLoci Emits a data file containing information about callable, uncallable, poorly mapped, and
other parts of the genome <p/>
CompareCallableLoci Test routine for new VariantContext object
DepthOfCoverage Assess sequence coverage by a wide array of metrics, partitioned by sample, read group,
or library
GCContentByInterval Walks along reference and calculates the GC content for each interval.
diagnosetargets
DiagnoseTargets Analyzes coverage distribution and validates read mates for a given interval and
sample.
diagnostics
BaseCoverageDistribution Simple walker to plot the coverage distribution per base
CoveredByNSamplesSites Print intervals file with all the variant sites for which most of the samples have good
coverage
ErrorRatePerCycle Compute the read error rate per position
FindCoveredIntervals Outputs a list of intervals that are covered above a given threshold.
ReadGroupProperties Emits a GATKReport containing read group, sample, library, platform, center, sequencing
data, paired end status, simple read type name (e.g.
ReadLengthDistribution Outputs the read lengths of all the reads in a file.
diffengine
DiffObjects A generic engine for comparing tree-structured objects
examples
GATKPaperGenotyper A simple Bayesian genotyper, that outputs a text based call format.
fasta
FastaAlternateReferenceMaker Generates an alternative reference sequence over the specified interval.
FastaReferenceMaker Renders a new reference in FASTA format consisting of only those loci provided in the
input data set.
FastaStats Calculate basic statistics about the reference sequence itself
filters
VariantFiltration Filters variant calls using a number of user-selectable, parameterizable criteria.
genotyper
UnifiedGenotyper A variant caller which unifies the approaches of several disparate callers -- Works for
single-sample and multi-sample data.
haplotypecaller
HaplotypeCaller Call SNPs and indels simultaneously via local de-novo assembly of haplotypes in an
active region.
HaplotypeResolver Haplotype-based resolution of variants in 2 different eval files.
indels
IndelRealigner Performs local realignment of reads to correct misalignments due to the presence of
indels.
LeftAlignIndels Left-aligns indels from reads in a bam file.
RealignerTargetCreator Emits intervals for the Local Indel Realigner to target for realignment.
missing
QualifyMissingIntervals Walks along reference and calculates a few metrics for each interval.
mutect
MuTect
phasing
PhaseByTransmission Computes the most likely genotype combination and phases trios and parent/child pairs
ReadBackedPhasing Walks along all variant ROD loci, caching a user-defined window of VariantContext
sites, and then finishes phasing them when they go out of range (using upstream and
downstream reads).
qc
CheckPileup Compare GATK's internal pileup to a reference Samtools pileup
CountBases Walks over the input data set, calculating the number of bases seen for diagnostic
purposes.
CountIntervals Count contiguous regions in an interval list.
CountLoci Walks over the input data set, calculating the total number of covered loci for
diagnostic purposes.
CountMales Walks over the input data set, calculating the number of reads seen from male samples
for diagnostic purposes.
CountReadEvents Walks over the input data set, counting the number of read events (from the CIGAR
operator)
CountReads Walks over the input data set, calculating the number of reads seen for diagnostic
purposes.
CountRODs Prints out counts of the number of reference ordered data objects encountered.
CountRODsByRef Prints out counts of the number of reference ordered data objects encountered along the
reference.
CountTerminusEvent Walks over the input data set, counting the number of reads ending in
insertions/deletions or soft-clips
ErrorThrowing A walker that simply throws errors.
FlagStat A reimplementation of the 'samtools flagstat' subcommand in the GATK
Pileup Emulates the samtools pileup command to print aligned reads
PrintRODs Prints out all of the RODs in the input data set.
QCRef Quality control for the reference fasta
ReadClippingStats Read clipping statistics for all reads.
readutils
ClipReads Read clipping based on quality, position or sequence matching
PrintReads Renders, in SAM/BAM format, all reads from the input data set in the order in which
they appear in the input file.
ReadAdaptorTrimmer Utility tool to blindly strip base adaptors.
SplitSamFile Divides the input data set into separate BAM files, one for each sample in the input
data set.
rnaseq
SplitNCigarReads Splits reads that contain Ns in their cigar string (e.g.
simulatereads
SimulateReadsForVariants Generates simulated reads for variants
validation
GenotypeAndValidate Genotypes a dataset and validates the calls of another dataset using the Unified
Genotyper.
ValidationAmplicons Creates FASTA sequences for use in Seqenom or PCR utilities for site amplification and
subsequent validation
validationsiteselector
ValidationSiteSelector Randomly selects VCF records according to specified options.
varianteval
VariantEval General-purpose tool for variant evaluation (% in dbSNP, genotype concordance, Ti/Tv
ratios, and a lot more)
variantrecalibration
ApplyRecalibration Applies cuts to the input vcf file (by adding filter lines) to achieve the desired
novel truth sensitivity levels which were specified during VariantRecalibration
VariantRecalibrator Create a Gaussian mixture model by looking at the annotations values over a high
quality subset of the input call set and then evaluate all input variants.
variantutils
CalculateGenotypePosteriors Calculates genotype posterior likelihoods given panel data
CombineGVCFs Combines any number of gVCF files that were produced by the Haplotype Caller into a
single joint gVCF file.
CombineVariants Combines VCF records from different sources.
FilterLiftedVariants Filters a lifted-over VCF file for ref bases that have been changed.
GenotypeConcordance Genotype concordance (per-sample and aggregate counts and frequencies, NRD/NRS and site
allele overlaps) between two callsets
GenotypeGVCFs Genotypes any number of gVCF files that were produced by the Haplotype Caller into a
single joint VCF file.
LeftAlignAndTrimVariants Left-aligns indels from a variants file.
LiftoverVariants Lifts a VCF file over from one build to another.
RandomlySplitVariants Takes a VCF file, randomly splits variants into two different sets, and outputs 2 new
VCFs with the results.
RegenotypeVariants Regenotypes the variants from a VCF.
SelectHeaders Selects headers from a VCF source.
SelectVariants Selects variants from a VCF source.
ValidateVariants Validates a VCF file with an extra strict set of criteria.
VariantsToAllelicPrimitives Takes alleles from a variants file and breaks them up (if possible) into more
basic/primitive alleles.
VariantsToBinaryPed Converts a VCF file to a binary plink Ped file (.bed/.bim/.fam)
VariantsToTable Emits specific fields from a VCF file to a tab-deliminated table
VariantsToVCF Converts variants from other file formats to VCF format.
VariantValidationAssessor Annotates a validation (from Sequenom for example) VCF with QC metrics (HW-equilibrium,
% failed probes)
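The listing above covers the engine-level arguments and the available tools, but not the arguments specific to MuTect itself. In GATK 3.x builds, naming a tool with -T alongside --help normally prints that tool's own arguments as well. The sketch below assumes this build behaves the same way. The helper name show_tool_help is hypothetical, and the function only calls java if the jar is actually present.

```shell
#!/bin/sh
# Path taken from the module installation described above.
MUTECT_JAR=/usr/local/apps/eb/MuTect/1.1.7-Java-1.7.0_80/mutect-1.1.7.jar

# Hypothetical helper: print help for one tool from the list above.
#   $1 = tool name, e.g. MuTect
show_tool_help() {
  if [ -f "$MUTECT_JAR" ] && command -v java >/dev/null 2>&1; then
    java -jar "$MUTECT_JAR" -T "$1" --help
  else
    # Typically means the module has not been loaded in this shell.
    echo "MuTect jar or java not found; load the module first" >&2
    return 1
  fi
}

show_tool_help MuTect || true
```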
Installation
System
64-bit Linux