Picard-Teaching

From Research Computing Center Wiki

Latest revision as of 14:43, 15 August 2018

Category

Bioinformatics

Program On

Teaching

Version

2.16.0

Author / Distributor

picard

Description

"A set of tools (in Java) for working with next generation sequencing data in the BAM format." More details are at picard

Running Program

The latest version of this application is installed at /usr/local/apps/eb/picard/2.16.0-Java-1.8.0_144

To use this version, please load the module with

ml picard/2.16.0-Java-1.8.0_144 

Here is an example of a shell script, sub.sh, to run on the batch queue:

#!/bin/bash
#SBATCH --job-name=j_picard
#SBATCH --partition=batch
#SBATCH --mail-type=ALL
#SBATCH --mail-user=username@uga.edu
#SBATCH --ntasks=1
#SBATCH --mem=10gb
#SBATCH --time=08:00:00
#SBATCH --output=picard.%j.out
#SBATCH --error=picard.%j.err

cd $SLURM_SUBMIT_DIR
ml picard/2.16.0-Java-1.8.0_144
java -jar /usr/local/apps/eb/picard/2.16.0-Java-1.8.0_144/picard.jar [options]

In a real submission script, at least the job-specific values above, such as the time limit (--time) and the Picard [options], need to be reviewed and replaced with values appropriate to your job.
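The script above requests 10 GB from Slurm with --mem, but the Java command itself sets no heap limit. The following is a minimal sketch, not part of the original page, of one way to derive a heap size from the Slurm request. The function names picard_heap_mb and run_picard, the 80% fraction, and the 10240 MB fallback are assumptions chosen for illustration. SLURM_MEM_PER_NODE is the variable Slurm sets (in MB) when --mem is given, and $EBROOTPICARD is the variable the module load message refers to.

```shell
# Print a JVM heap size in MB: 80% of the Slurm memory request.
# Falls back to 10240 MB (the --mem=10gb used in sub.sh) when the
# variable is not set, e.g. outside a job. The 80% is an assumption
# that leaves some room for JVM overhead.
picard_heap_mb() {
    mem_mb="${SLURM_MEM_PER_NODE:-10240}"
    echo $(( mem_mb * 80 / 100 ))
}

# Hypothetical wrapper: run Picard with that heap limit.
# All arguments are passed through to picard.jar unchanged.
run_picard() {
    java -Xmx"$(picard_heap_mb)m" -jar "$EBROOTPICARD/picard.jar" "$@"
}
```

In sub.sh, these definitions could be placed after the ml line, and the java -jar line replaced by run_picard [options].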

Please refer to Running_Jobs_on_the_teaching_cluster, Run X window Jobs and Run interactive Jobs for more details on running jobs on the Teaching cluster.


Here is an example of a job submission command:

sbatch ./sub.sh 
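On success, sbatch prints a confirmation line of the form "Submitted batch job <jobid>", and the %j in the --output and --error lines of sub.sh is replaced by that job ID. The sketch below shows one way to capture the ID and work out the log file names. The helper names job_id_from_sbatch and picard_log_names are hypothetical.

```shell
# Read sbatch's confirmation line on stdin and print the job ID,
# e.g. "Submitted batch job 12345" -> 12345
job_id_from_sbatch() {
    awk '/^Submitted batch job/ { print $4 }'
}

# Print the log file names that the --output=picard.%j.out and
# --error=picard.%j.err lines in sub.sh produce for a given job ID.
picard_log_names() {
    printf 'picard.%s.out\npicard.%s.err\n' "$1" "$1"
}
```

For example, jobid=$(sbatch ./sub.sh | job_id_from_sbatch) stores the ID, after which squeue -j "$jobid" shows the job status and picard_log_names "$jobid" lists the files to inspect once the job has run.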

Documentation

ml picard/2.16.0-Java-1.8.0_144
To execute picard run: java -jar $EBROOTPICARD/picard.jar
java -jar /usr/local/apps/eb/picard/2.16.0-Java-1.8.0_144/picard.jar -h
USAGE: PicardCommandLine <program name> [-h]

Available Programs:
--------------------------------------------------------------------------------------
Alpha Tools:                                     Tools that are currently UNSUPPORTED until further testing and maturation.
    CollectIndependentReplicateMetrics           (Experimental) Estimates the rate of independent replication of reads within a bam.
    CollectWgsMetricsWithNonZeroCoverage         (Experimental) Collect metrics about coverage and performance of whole genome sequencing (WGS) experiments.  
    UmiAwareMarkDuplicatesWithMateCigar          Identifies duplicate reads using information from read positions and UMIs. 

--------------------------------------------------------------------------------------
Fasta:                                           Tools for manipulating FASTA, or related data.
    CreateSequenceDictionary                     Creates a sequence dictionary for a reference sequence.  
    ExtractSequences                             Subsets intervals from a reference sequence to a new FASTA file.
    NonNFastaSize                                Counts the number of non-N bases in a fasta file.
    NormalizeFasta                               Normalizes lines of sequence in a FASTA file to be of the same length.

--------------------------------------------------------------------------------------
Fingerprinting Tools:                            Tools for manipulating fingerprints, or related data.
    CheckFingerprint                             Computes a fingerprint from the supplied input (SAM/BAM or VCF) file and compares it to the provided genotypes
    ClusterCrosscheckMetrics                     Clusters the results of a CrosscheckFingerprints run by LOD score.
    CrosscheckFingerprints                       Checks if all fingerprints appear to come from the same individual.
    CrosscheckReadGroupFingerprints              DEPRECATED: USE CrosscheckFingerprints. Checks if all read groups appear to come from the same individual.

--------------------------------------------------------------------------------------
Illumina Tools:                                  Tools for manipulating data specific to Illumina sequencers.
    CheckIlluminaDirectory                       Asserts the validity for specified Illumina basecalling data.  
    CollectIlluminaBasecallingMetrics            Collects Illumina Basecalling metrics for a sequencing run.  
    CollectIlluminaLaneMetrics                   Collects Illumina lane metrics for the given BaseCalling analysis directory.  
    ExtractIlluminaBarcodes                      Tool determines the barcode for each read in an Illumina lane.  
    IlluminaBasecallsToFastq                     Generate FASTQ file(s) from Illumina basecall read data.  
    IlluminaBasecallsToSam                       Transforms raw Illumina sequencing data into an unmapped SAM or BAM file.
    MarkIlluminaAdapters                         Reads a SAM or BAM file and rewrites it with new adapter-trimming tags.  

--------------------------------------------------------------------------------------
Interval Tools:                                  Tools for manipulating Picard interval lists.
    BedToIntervalList                            Converts a BED file to a Picard Interval List.  
    IntervalListToBed                            Converts an Picard IntervalList file to a BED file.
    IntervalListTools                            Manipulates interval lists.  
    LiftOverIntervalList                         Lifts over an interval list from one reference build to another.  
    ScatterIntervalsByNs                         Writes an interval list based on splitting a reference by Ns.  

--------------------------------------------------------------------------------------
Metrics:                                         Tools for reporting metrics on various data types.
    AccumulateVariantCallingMetrics              Combines multiple Variant Calling Metrics files into a single file
    CollectAlignmentSummaryMetrics               <b>Produces a summary of alignment metrics from a SAM or BAM file.</b>  
    CollectBaseDistributionByCycle               Chart the nucleotide distribution per cycle in a SAM or BAM file
    CollectGcBiasMetrics                         Collect metrics regarding GC bias. 
    CollectHiSeqXPfFailMetrics                   Classify PF-Failing reads in a HiSeqX Illumina Basecalling directory into various categories.
    CollectHsMetrics                             Collects hybrid-selection (HS) metrics for a SAM or BAM file.  
    CollectInsertSizeMetrics                     Collect metrics about the insert size distribution of a paired-end library.
    CollectJumpingLibraryMetrics                 Collect jumping library metrics. 
    CollectMultipleMetrics                       Collect multiple classes of metrics.  
    CollectOxoGMetrics                           Collect metrics to assess oxidative artifacts.
    CollectQualityYieldMetrics                   Collect metrics about reads that pass quality thresholds and Illumina-specific filters.  
    CollectRawWgsMetrics                         Collect whole genome sequencing-related metrics.  
    CollectRnaSeqMetrics                         Produces RNA alignment metrics for a SAM or BAM file.  
    CollectRrbsMetrics                           <b>Collects metrics from reduced representation bisulfite sequencing (Rrbs) data.</b>  
    CollectSequencingArtifactMetrics             Collect metrics to quantify single-base sequencing artifacts.  
    CollectTargetedPcrMetrics                    Calculate PCR-related metrics from targeted sequencing data. 
    CollectVariantCallingMetrics                 Collects per-sample and aggregate (spanning all samples) metrics from the provided VCF file
    CollectWgsMetrics                            Collect metrics about coverage and performance of whole genome sequencing (WGS) experiments.
    CompareMetrics                               Compare two metrics files.
    ConvertSequencingArtifactToOxoG              Extract OxoG metrics from generalized artifacts metrics.  
    EstimateLibraryComplexity                    Estimates the numbers of unique molecules in a sequencing library.  
    MeanQualityByCycle                           Collect mean quality by cycle.
    QualityScoreDistribution                     Chart the distribution of quality scores.  

--------------------------------------------------------------------------------------
Miscellaneous Tools:                             A set of miscellaneous tools.                
    BaitDesigner                                 Designs oligonucleotide baits for hybrid selection reactions.
    FifoBuffer                                   FIFO buffer used to buffer input and output streams with a customizable buffer size 

--------------------------------------------------------------------------------------
SAM/BAM:                                         Tools for manipulating SAM, BAM, or related data.
    AddCommentsToBam                             Adds comments to the header of a BAM file.
    AddOrReplaceReadGroups                       Replace read groups in a BAM file.
    BamIndexStats                                Generate index statistics from a BAM file
    BamToBfq                                     Create BFQ files from a BAM file for use by the maq aligner.  
    BuildBamIndex                                Generates a BAM index ".bai" file.  
    CalculateReadGroupChecksum                   Creates a hash code based on the read groups (RG).  
    CheckTerminatorBlock                         Asserts the provided gzip file's (e.g., BAM) last block is well-formed; RC 100 otherwise
    CleanSam                                     Cleans the provided SAM/BAM, soft-clipping beyond-end-of-reference alignments and setting MAPQ to 0 for unmapped reads
    CompareSAMs                                  Compare two input ".sam" or ".bam" files.  
    DownsampleSam                                Downsample a SAM or BAM file.  
    FastqToSam                                   Converts a FASTQ file to an unaligned BAM or SAM file.  
    FilterSamReads                               Subset read data from a SAM or BAM file
    FixMateInformation                           Verify mate-pair information between mates and fix if needed.
    GatherBamFiles                               Concatenate one or more BAM files as efficiently as possible
    MarkDuplicates                               Identifies duplicate reads.  
    MarkDuplicatesWithMateCigar                  Identifies duplicate reads, accounting for mate CIGAR.  
    MergeBamAlignment                            Merge alignment data from a SAM or BAM with data in an unmapped BAM file.  
    MergeSamFiles                                Merges multiple SAM and/or BAM files into a single file.  
    PositionBasedDownsampleSam                   Downsample a SAM or BAM file to retain a subset of the reads based on the reads location in each tile in the flowcell.
    ReorderSam                                   Reorders reads in a SAM or BAM file to match ordering in reference
    ReplaceSamHeader                             Replaces the SAMFileHeader in a SAM or BAM file.  
    RevertOriginalBaseQualitiesAndAddMateCigar   Reverts the original base qualities and adds the mate cigar tag to read-group BAMs
    RevertSam                                    Reverts SAM or BAM files to a previous state.  
    SamFormatConverter                           Convert a BAM file to a SAM file, or a SAM to a BAM
    SamToFastq                                   Converts a SAM or BAM file to FASTQ.  
    SetNmAndUqTags                               DEPRECATED: Use SetNmMdAndUqTags instead.
    SetNmMdAndUqTags                             Fixes the NM, MD, and UQ tags in a SAM file.  
    SortSam                                      Sorts a SAM or BAM file.  
    SplitSamByLibrary                            Splits a SAM or BAM file into individual files by library
    SplitSamByNumberOfReads                      Splits a SAM or BAM file to multiple BAMs.
    ValidateSamFile                              Validates a SAM or BAM file.  
    ViewSam                                      Prints a SAM or BAM file to the screen

--------------------------------------------------------------------------------------
Unit Testing:                                    Unit testing                                 
    SimpleMarkDuplicatesWithMateCigar            (Experimental) Examines aligned records in the supplied SAM or BAM file to locate duplicate molecules.

--------------------------------------------------------------------------------------
VCF/BCF:                                         Tools for manipulating VCF, BCF, or related data.
    FilterVcf                                    Hard filters a VCF.
    FindMendelianViolations                      Finds mendelian violations of all types within a VCF
    FixVcfHeader                                 Replaces or fixes a VCF header.
    GatherVcfs                                   Gathers multiple VCF files from a scatter operation into a single VCF file
    GenotypeConcordance                          Evaluate genotype concordance between callsets.
    LiftoverVcf                                  Lifts over a VCF file from one reference build to another.  
    MakeSitesOnlyVcf                             Creates a VCF bereft of genotype information from an input VCF or BCF
    MergeVcfs                                    Merges multiple VCF or BCF files into one VCF file or BCF
    RenameSampleInVcf                            Renames a sample within a VCF or BCF.  
    SortVcf                                      Sorts one or more VCF files.  
    SplitVcfs                                    Splits SNPs and INDELs into separate files.  
    UpdateVcfSequenceDictionary                  Takes a VCF and a second file that contains a sequence dictionary and updates the VCF with the new sequence dictionary.
    VcfFormatConverter                           Converts VCF to BCF or BCF to VCF.  
    VcfToIntervalList                            Converts a VCF or BCF file to a Picard Interval List.

--------------------------------------------------------------------------------------
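The listing above is long, so it can help to reduce it to bare program names and search those. The following is a small sketch that assumes the help text has the layout shown above, with each program name indented by four spaces. The function name list_picard_tools is hypothetical, and the removal of terminal color codes is a precaution in case the help output contains them.

```shell
# Read Picard's top-level help on stdin and print one program name per line.
# Category headers and separator lines start in column 1 and are skipped.
list_picard_tools() {
    awk '
        { gsub(/\033\[[0-9;]*m/, "") }      # drop ANSI color codes, if any
        /^    [A-Za-z]/ { print $1 }        # indented lines: first word is the name
    '
}
```

A possible use, with 2>&1 so that the help is captured whether it is written to standard output or standard error:

```shell
java -jar "$EBROOTPICARD/picard.jar" -h 2>&1 | list_picard_tools | grep -i vcf
```

Following the USAGE line above, help for a single program should then be available with java -jar $EBROOTPICARD/picard.jar <program name> -h.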


Installation

Source code is obtained from picard

System

64-bit Linux