MuTect-Teaching
Category
Bioinformatics
Program On
Teaching
Version
1.1.7
Author / Distributor
Description
MuTect is a method developed at the Broad Institute for the reliable and accurate identification of somatic point mutations in next generation sequencing data of cancer genomes. More information: http://www.broadinstitute.org/cancer/cga/mutect
Running Program
Also refer to Running Jobs on the teaching cluster
- Version 1.1.7, installed in /usr/local/apps/eb/MuTect/1.1.7-Java-1.7.0_80
To use this version of MuTect, please first load the module with
module load MuTect/1.1.7-Java-1.7.0_80
Sample job submission script (sub.sh) to run the module:
#!/bin/bash
#SBATCH --job-name=j_mutect
#SBATCH --partition=batch
#SBATCH --mail-type=ALL
#SBATCH --mail-user=username@uga.edu
#SBATCH --ntasks=1
#SBATCH --mem=2gb
#SBATCH --time=08:00:00
#SBATCH --output=mutect.%j.out
#SBATCH --error=mutect.%j.err
cd "$SLURM_SUBMIT_DIR"
ml MuTect/1.1.7-Java-1.7.0_80
java -jar /usr/local/apps/eb/MuTect/1.1.7-Java-1.7.0_80/mutect-1.1.7.jar [options]
In a real submission script, review the values above (such as the job name, partition, email address, memory, time limit, and the [options] placeholder) and replace them with values appropriate for your job.
Please refer to Running Jobs on the teaching cluster, Run X window Jobs, and Run interactive Jobs for more details on running jobs on the teaching cluster.
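As one way to fill in the [options] placeholder, the sketch below builds a tumor/normal MuTect command as a bash array and prints it before running anything. It is an illustration under assumptions, not a site-endorsed recipe: the input and output file names are hypothetical, the `--input_file:normal`, `--input_file:tumor`, `--out`, and `--coverage_file` arguments follow the usage published by the Broad Institute for MuTect 1.x and should be checked against the tool's own help output, and sizing the Java heap at three quarters of the Slurm memory request is only a rule of thumb. `SLURM_MEM_PER_NODE` is assumed to be set (in MB) because the script requests memory with `--mem`.

```shell
#!/bin/bash
# Sketch only: file names and the heap-sizing rule are assumptions.
MUTECT_JAR=/usr/local/apps/eb/MuTect/1.1.7-Java-1.7.0_80/mutect-1.1.7.jar

# Slurm sets SLURM_MEM_PER_NODE (in MB) when --mem is requested.
# Fall back to 2048 MB, matching --mem=2gb in the sample script.
mem_mb=${SLURM_MEM_PER_NODE:-2048}
# Leave about a quarter of the request for JVM overhead (rule of thumb).
heap_mb=$(( mem_mb * 3 / 4 ))

build_mutect_cmd() {
    # Arguments: reference FASTA, normal BAM, tumor BAM, output prefix
    local ref=$1 normal=$2 tumor=$3 prefix=$4
    MUTECT_CMD=(
        java "-Xmx${heap_mb}m" -jar "$MUTECT_JAR"
        --analysis_type MuTect
        --reference_sequence "$ref"
        --input_file:normal "$normal"
        --input_file:tumor "$tumor"
        --out "${prefix}.call_stats.txt"
        --coverage_file "${prefix}.coverage.wig.txt"
    )
}

# Hypothetical input files; replace with your own data.
build_mutect_cmd ref.fasta normal.bam tumor.bam sample1

# Print the command for review; set DRY_RUN=0 to actually run it.
printf '%s ' "${MUTECT_CMD[@]}"; echo
if [[ "${DRY_RUN:-1}" == "0" ]]; then
    "${MUTECT_CMD[@]}"
fi
```

Building the command as an array keeps paths with spaces intact, and the default dry run makes it easy to inspect the final command line in the job's output file before committing cluster time.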
Here is an example of a job submission command:
sbatch ./sub.sh
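Because the sample script names its log files with %j (the job ID), it can help to capture the job ID at submission time. The following is a minimal sketch, assuming the `--output=mutect.%j.out` and `--error=mutect.%j.err` settings from the sample script are unchanged; the helper name `log_files_for_job` is hypothetical. It relies on `sbatch --parsable`, which prints only the job ID, optionally followed by a semicolon and the cluster name.

```shell
#!/bin/bash
# Sketch: map a job ID to the log files named by --output/--error in sub.sh.
log_files_for_job() {
    local jobid=${1%%;*}   # --parsable may print "jobid;cluster"
    echo "mutect.${jobid}.out mutect.${jobid}.err"
}

# Only submit when sbatch is available (i.e. on the cluster).
if command -v sbatch >/dev/null 2>&1; then
    jobid=$(sbatch --parsable ./sub.sh)
    echo "Submitted job ${jobid%%;*}"
    echo "Logs: $(log_files_for_job "$jobid")"
fi
```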
Documentation
module load MuTect/1.1.7-Java-1.7.0_80
java -jar /usr/local/apps/eb/MuTect/1.1.7-Java-1.7.0_80/mutect-1.1.7.jar --help
--------------------------------------------------------------------------------
The Genome Analysis Toolkit (GATK) v3.1-0-g72492bb, Compiled 2015/01/21 17:10:56
Copyright (c) 2010 The Broad Institute
For support and documentation go to http://www.broadinstitute.org/gatk
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
usage: java -jar mutect-1.1.7.jar -T <analysis_type> [-args <arg_file>] [-I <input_file>] [-rbs <read_buffer_size>]
       [-et <phone_home>] [-K <gatk_key>] [-tag <tag>] [-rf <read_filter>] [-L <intervals>] [-XL <excludeIntervals>]
       [-isr <interval_set_rule>] [-im <interval_merging>] [-ip <interval_padding>] [-R <reference_sequence>] [-ndrs]
       [-maxRuntime <maxRuntime>] [-maxRuntimeUnits <maxRuntimeUnits>] [-dt <downsampling_type>]
       [-dfrac <downsample_to_fraction>] [-dcov <downsample_to_coverage>] [-baq <baq>] [-baqGOP <baqGapOpenPenalty>]
       [-fixMisencodedQuals] [-allowPotentiallyMisencodedQuals] [-OQ] [-DBQ <defaultBaseQualities>]
       [-PF <performanceLog>] [-BQSR <BQSR>] [-DIQ] [-EOQ] [-preserveQ <preserve_qscores_less_than>]
       [-globalQScorePrior <globalQScorePrior>] [-S <validation_strictness>] [-rpr] [-kpr]
       [-sample_rename_mapping_file <sample_rename_mapping_file>] [-U <unsafe>] [-nt <num_threads>]
       [-nct <num_cpu_threads_per_data_thread>] [-mte] [-bfh <num_bam_file_handles>] [-rgbl <read_group_black_list>]
       [-ped <pedigree>] [-pedString <pedigreeString>] [-pedValidationType <pedigreeValidationType>]
       [-variant_index_type <variant_index_type>] [-variant_index_parameter <variant_index_parameter>]
       [-l <logging_level>] [-log <log_to_file>] [-h] [-version]

 -T,--analysis_type <analysis_type>    Name of the tool to run
 -args,--arg_file <arg_file>    Reads arguments from the specified file
 -I,--input_file <input_file>    Input file containing sequence data (SAM or BAM)
 -rbs,--read_buffer_size <read_buffer_size>    Number of reads per SAM file to buffer in memory
 -et,--phone_home <phone_home>    Run reporting mode (NO_ET|AWS|STDOUT)
 -K,--gatk_key <gatk_key>    GATK key file required to run with -et NO_ET
 -tag,--tag <tag>    Tag to identify this GATK run as part of a group of runs
 -rf,--read_filter <read_filter>    Filters to apply to reads before analysis
 -L,--intervals <intervals>    One or more genomic intervals over which to operate
 -XL,--excludeIntervals <excludeIntervals>    One or more genomic intervals to exclude from processing
 -isr,--interval_set_rule <interval_set_rule>    Set merging approach to use for combining interval inputs (UNION|INTERSECTION)
 -im,--interval_merging <interval_merging>    Interval merging rule for abutting intervals (ALL|OVERLAPPING_ONLY)
 -ip,--interval_padding <interval_padding>    Amount of padding (in bp) to add to each interval
 -R,--reference_sequence <reference_sequence>    Reference sequence file
 -ndrs,--nonDeterministicRandomSeed    Use a non-deterministic random seed
 -maxRuntime,--maxRuntime <maxRuntime>    Stop execution cleanly as soon as maxRuntime has been reached
 -maxRuntimeUnits,--maxRuntimeUnits <maxRuntimeUnits>    Unit of time used by maxRuntime (NANOSECONDS|MICROSECONDS|MILLISECONDS|SECONDS|MINUTES|HOURS|DAYS)
 -dt,--downsampling_type <downsampling_type>    Type of read downsampling to employ at a given locus (NONE|ALL_READS|BY_SAMPLE)
 -dfrac,--downsample_to_fraction <downsample_to_fraction>    Fraction of reads to downsample to
 -dcov,--downsample_to_coverage <downsample_to_coverage>    Target coverage threshold for downsampling to coverage
 -baq,--baq <baq>    Type of BAQ calculation to apply in the engine (OFF|CALCULATE_AS_NECESSARY|RECALCULATE)
 -baqGOP,--baqGapOpenPenalty <baqGapOpenPenalty>    BAQ gap open penalty
 -fixMisencodedQuals,--fix_misencoded_quality_scores    Fix mis-encoded base quality scores
 -allowPotentiallyMisencodedQuals,--allow_potentially_misencoded_quality_scores    Ignore warnings about base quality score encoding
 -OQ,--useOriginalQualities    Use the base quality scores from the OQ tag
 -DBQ,--defaultBaseQualities <defaultBaseQualities>    Assign a default base quality
 -PF,--performanceLog <performanceLog>    Write GATK runtime performance log to this file
 -BQSR,--BQSR <BQSR>    Input covariates table file for on-the-fly base quality score recalibration
 -DIQ,--disable_indel_quals    Disable printing of base insertion and deletion tags (with -BQSR)
 -EOQ,--emit_original_quals    Emit the OQ tag with the original base qualities (with -BQSR)
 -preserveQ,--preserve_qscores_less_than <preserve_qscores_less_than>    Don't recalibrate bases with quality scores less than this threshold (with -BQSR)
 -globalQScorePrior,--globalQScorePrior <globalQScorePrior>    Global Qscore Bayesian prior to use for BQSR
 -S,--validation_strictness <validation_strictness>    How strict should we be with validation (STRICT|LENIENT|SILENT)
 -rpr,--remove_program_records    Remove program records from the SAM header
 -kpr,--keep_program_records    Keep program records in the SAM header
 -sample_rename_mapping_file,--sample_rename_mapping_file <sample_rename_mapping_file>    Rename sample IDs on-the-fly at runtime using the provided mapping file
 -U,--unsafe <unsafe>    Enable unsafe operations: nothing will be checked at runtime (ALLOW_N_CIGAR_READS|ALLOW_UNINDEXED_BAM|ALLOW_UNSET_BAM_SORT_ORDER|NO_READ_ORDER_VERIFICATION|ALLOW_SEQ_DICT_INCOMPATIBILITY|LENIENT_VCF_PROCESSING|ALL)
 -nt,--num_threads <num_threads>    Number of data threads to allocate to this analysis
 -nct,--num_cpu_threads_per_data_thread <num_cpu_threads_per_data_thread>    Number of CPU threads to allocate per data thread
 -mte,--monitorThreadEfficiency    Enable threading efficiency monitoring
 -bfh,--num_bam_file_handles <num_bam_file_handles>    Total number of BAM file handles to keep open simultaneously
 -rgbl,--read_group_black_list <read_group_black_list>    Exclude read groups based on tags
 -ped,--pedigree <pedigree>    Pedigree files for samples
 -pedString,--pedigreeString <pedigreeString>    Pedigree string for samples
 -pedValidationType,--pedigreeValidationType <pedigreeValidationType>    Validation strictness for pedigree information (STRICT|SILENT)
 -variant_index_type,--variant_index_type <variant_index_type>    Type of IndexCreator to use for VCF/BCF indices (DYNAMIC_SEEK|DYNAMIC_SIZE|LINEAR|INTERVAL)
 -variant_index_parameter,--variant_index_parameter <variant_index_parameter>    Parameter to pass to the VCF/BCF IndexCreator
 -l,--logging_level <logging_level>    Set the minimum level of logging
 -log,--log_to_file <log_to_file>    Set the logging location
 -h,--help    Generate the help message
 -version,--version    Output version information

 alignment
   CheckAlignment    Validates consistency of the aligner interface
 annotator
   VariantAnnotator    Annotates variant calls with context information.
 beagle
   BeagleOutputToVCF    Takes files produced by Beagle imputation engine and creates a vcf with modified annotations.
   ProduceBeagleInput    Converts the input VCF into a format accepted by the Beagle imputation/analysis program.
   VariantsToBeagleUnphased    Produces an input file to Beagle imputation engine, listing unphased, hard-called genotypes for a single sample in input variant file.
 bqsr
   AnalyzeCovariates    Tool to analyze and evaluate base recalibration ables.
   BaseRecalibrator    First pass of the base quality score recalibration -- Generates recalibration table based on various user-specified covariates (such as read group, reported quality score, machine cycle, and nucleotide context).
 coverage
   CallableLoci    Emits a data file containing information about callable, uncallable, poorly mapped, and other parts of the genome <p/>
   CompareCallableLoci    Test routine for new VariantContext object
   DepthOfCoverage    Assess sequence coverage by a wide array of metrics, partitioned by sample, read group, or library
   GCContentByInterval    Walks along reference and calculates the GC content for each interval.
 diagnosetargets
   DiagnoseTargets    Analyzes coverage distribution and validates read mates for a given interval and sample.
 diagnostics
   BaseCoverageDistribution    Simple walker to plot the coverage distribution per base
   CoveredByNSamplesSites    Print intervals file with all the variant sites for which most of the samples have good coverage
   ErrorRatePerCycle    Compute the read error rate per position
   FindCoveredIntervals    Outputs a list of intervals that are covered above a given threshold.
   ReadGroupProperties    Emits a GATKReport containing read group, sample, library, platform, center, sequencing data, paired end status, simple read type name (e.g.
   ReadLengthDistribution    Outputs the read lengths of all the reads in a file.
 diffengine
   DiffObjects    A generic engine for comparing tree-structured objects
 examples
   GATKPaperGenotyper    A simple Bayesian genotyper, that outputs a text based call format.
 fasta
   FastaAlternateReferenceMaker    Generates an alternative reference sequence over the specified interval.
   FastaReferenceMaker    Renders a new reference in FASTA format consisting of only those loci provided in the input data set.
   FastaStats    Calculate basic statistics about the reference sequence itself
 filters
   VariantFiltration    Filters variant calls using a number of user-selectable, parameterizable criteria.
 genotyper
   UnifiedGenotyper    A variant caller which unifies the approaches of several disparate callers -- Works for single-sample and multi-sample data.
 haplotypecaller
   HaplotypeCaller    Call SNPs and indels simultaneously via local de-novo assembly of haplotypes in an active region.
   HaplotypeResolver    Haplotype-based resolution of variants in 2 different eval files.
 indels
   IndelRealigner    Performs local realignment of reads to correct misalignments due to the presence of indels.
   LeftAlignIndels    Left-aligns indels from reads in a bam file.
   RealignerTargetCreator    Emits intervals for the Local Indel Realigner to target for realignment.
 missing
   QualifyMissingIntervals    Walks along reference and calculates a few metrics for each interval.
 mutect
   MuTect
 phasing
   PhaseByTransmission    Computes the most likely genotype combination and phases trios and parent/child pairs
   ReadBackedPhasing    Walks along all variant ROD loci, caching a user-defined window of VariantContext sites, and then finishes phasing them when they go out of range (using upstream and downstream reads).
 qc
   CheckPileup    Compare GATK's internal pileup to a reference Samtools pileup
   CountBases    Walks over the input data set, calculating the number of bases seen for diagnostic purposes.
   CountIntervals    Count contiguous regions in an interval list.
   CountLoci    Walks over the input data set, calculating the total number of covered loci for diagnostic purposes.
   CountMales    Walks over the input data set, calculating the number of reads seen from male samples for diagnostic purposes.
   CountReadEvents    Walks over the input data set, counting the number of read events (from the CIGAR operator)
   CountReads    Walks over the input data set, calculating the number of reads seen for diagnostic purposes.
   CountRODs    Prints out counts of the number of reference ordered data objects encountered.
   CountRODsByRef    Prints out counts of the number of reference ordered data objects encountered along the reference.
   CountTerminusEvent    Walks over the input data set, counting the number of reads ending in insertions/deletions or soft-clips
   ErrorThrowing    A walker that simply throws errors.
   FlagStat    A reimplementation of the 'samtools flagstat' subcommand in the GATK
   Pileup    Emulates the samtools pileup command to print aligned reads
   PrintRODs    Prints out all of the RODs in the input data set.
   QCRef    Quality control for the reference fasta
   ReadClippingStats    Read clipping statistics for all reads.
 readutils
   ClipReads    Read clipping based on quality, position or sequence matching
   PrintReads    Renders, in SAM/BAM format, all reads from the input data set in the order in which they appear in the input file.
   ReadAdaptorTrimmer    Utility tool to blindly strip base adaptors.
   SplitSamFile    Divides the input data set into separate BAM files, one for each sample in the input data set.
 rnaseq
   SplitNCigarReads    Splits reads that contain Ns in their cigar string (e.g.
 simulatereads
   SimulateReadsForVariants    Generates simulated reads for variants
 validation
   GenotypeAndValidate    Genotypes a dataset and validates the calls of another dataset using the Unified Genotyper.
   ValidationAmplicons    Creates FASTA sequences for use in Seqenom or PCR utilities for site amplification and subsequent validation
 validationsiteselector
   ValidationSiteSelector    Randomly selects VCF records according to specified options.
 varianteval
   VariantEval    General-purpose tool for variant evaluation (% in dbSNP, genotype concordance, Ti/Tv ratios, and a lot more)
 variantrecalibration
   ApplyRecalibration    Applies cuts to the input vcf file (by adding filter lines) to achieve the desired novel truth sensitivity levels which were specified during VariantRecalibration
   VariantRecalibrator    Create a Gaussian mixture model by looking at the annotations values over a high quality subset of the input call set and then evaluate all input variants.
 variantutils
   CalculateGenotypePosteriors    Calculates genotype posterior likelihoods given panel data
   CombineGVCFs    Combines any number of gVCF files that were produced by the Haplotype Caller into a single joint gVCF file.
   CombineVariants    Combines VCF records from different sources.
   FilterLiftedVariants    Filters a lifted-over VCF file for ref bases that have been changed.
   GenotypeConcordance    Genotype concordance (per-sample and aggregate counts and frequencies, NRD/NRS and site allele overlaps) between two callsets
   GenotypeGVCFs    Genotypes any number of gVCF files that were produced by the Haplotype Caller into a single joint VCF file.
   LeftAlignAndTrimVariants    Left-aligns indels from a variants file.
   LiftoverVariants    Lifts a VCF file over from one build to another.
   RandomlySplitVariants    Takes a VCF file, randomly splits variants into two different sets, and outputs 2 new VCFs with the results.
   RegenotypeVariants    Regenotypes the variants from a VCF.
   SelectHeaders    Selects headers from a VCF source.
   SelectVariants    Selects variants from a VCF source.
   ValidateVariants    Validates a VCF file with an extra strict set of criteria.
   VariantsToAllelicPrimitives    Takes alleles from a variants file and breaks them up (if possible) into more basic/primitive alleles.
   VariantsToBinaryPed    Converts a VCF file to a binary plink Ped file (.bed/.bim/.fam)
   VariantsToTable    Emits specific fields from a VCF file to a tab-deliminated table
   VariantsToVCF    Converts variants from other file formats to VCF format.
   VariantValidationAssessor    Annotates a validation (from Sequenom for example) VCF with QC metrics (HW-equilibrium, % failed probes)
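Since the help output is long, one option is to save it to a file once and search it for the argument of interest. The sketch below assumes a hypothetical file name (`mutect_help.txt`) and a hypothetical helper (`find_option`); it redirects both stdout and stderr because it is not confirmed here which stream the help text is written to.

```shell
#!/bin/bash
# Sketch: save the --help output once, then search it for an option.
MUTECT_JAR=/usr/local/apps/eb/MuTect/1.1.7-Java-1.7.0_80/mutect-1.1.7.jar
HELP_FILE=mutect_help.txt   # hypothetical file name

# Print the help lines that mention a given option, with line numbers.
# -F treats the pattern as a fixed string; -e keeps a leading "-" from
# being parsed as a grep option.
find_option() {
    grep -n -F -e "$1" "$HELP_FILE"
}

# Only regenerate the help file where Java and the jar are available.
if command -v java >/dev/null 2>&1 && [[ -f "$MUTECT_JAR" ]]; then
    java -jar "$MUTECT_JAR" --help > "$HELP_FILE" 2>&1
    find_option "--intervals"
fi
```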
Installation
System
64-bit Linux