Pilon-Teaching
Latest revision as of 14:24, 15 August 2018
Category
Bioinformatics
Program On
Teaching
Version
1.22
Author / Distributor
Description
"Pilon is a software tool which can be used to: (i) Automatically improve draft assemblies, (ii) Find variation among strains, including large event detection." More details are at pilon
Running Program
The latest version of this application is at /usr/local/apps/eb/pilon/1.22-Java-1.8.0_144
To use this version, please load the module with
ml pilon/1.22-Java-1.8.0_144
Here is an example of a shell script, sub.sh, to run on the batch queue:
#!/bin/bash
#SBATCH --job-name=j_pilon
#SBATCH --partition=batch
#SBATCH --mail-type=ALL
#SBATCH --mail-user=username@uga.edu
#SBATCH --ntasks=1
#SBATCH --mem=10gb
#SBATCH --time=08:00:00
#SBATCH --output=pilon.%j.out
#SBATCH --error=pilon.%j.err
cd $SLURM_SUBMIT_DIR
ml pilon/1.22-Java-1.8.0_144
java -jar /usr/local/apps/eb/pilon/1.22-Java-1.8.0_144/pilon-1.22.jar [options]
In a real submission script, at least the placeholder values above, such as the time limit 08:00:00 and [options], need to be reviewed or replaced with proper values.
Please refer to Running_Jobs_on_the_teaching_cluster, Run X window Jobs and Run interactive Jobs for more details on running jobs on the teaching cluster.
Here is an example of a job submission command:
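As an illustration of what could replace [options], here is a minimal sketch of a polishing run. The file names draft.fasta and frags.sorted.bam, the output names, the DRY_RUN switch and the run_pilon helper are all hypothetical and not part of the original page. The -Xmx8G heap size is likewise an assumption, chosen only to stay below the 10gb requested with --mem. The Pilon flags used (--genome, --frags, --output, --outdir, --changes) are the ones listed in the help output under Documentation.

```shell
#!/bin/bash
# Hypothetical inputs: replace with your own files.
GENOME=draft.fasta        # assembly to improve; must be the reference used for the BAM
FRAGS=frags.sorted.bam    # paired-end alignments made with bwa or bowtie2
PILON_JAR=/usr/local/apps/eb/pilon/1.22-Java-1.8.0_144/pilon-1.22.jar

# run_pilon is a hypothetical helper: it prints the command when DRY_RUN=1
# (the default in this sketch) and executes it when DRY_RUN=0.
run_pilon() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo java -Xmx8G -jar "$PILON_JAR" "$@"
    else
        # -Xmx8G is an assumed JVM heap size, kept under --mem=10gb
        java -Xmx8G -jar "$PILON_JAR" "$@"
    fi
}

run_pilon --genome "$GENOME" --frags "$FRAGS" \
          --output pilon_round1 --outdir pilon_out --changes
```

With DRY_RUN left at its default, the script only prints the command line, which is a cheap way to check the options before submitting. Inside sub.sh, the last two lines would replace the java line and DRY_RUN would be set to 0.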
sbatch ./sub.sh
Documentation
ml pilon/1.22-Java-1.8.0_144
java -jar /usr/local/apps/eb/pilon/1.22-Java-1.8.0_144/pilon-1.22.jar --help
Pilon version 1.22 Wed Mar 15 16:38:30 2017 -0400
Usage: pilon --genome genome.fasta [--frags frags.bam] [--jumps jumps.bam] [--unpaired unpaired.bam]
             [...other options...]
       pilon --help for option details
INPUTS:
  --genome genome.fasta
    The input genome we are trying to improve, which must be the reference used
    for the bam alignments. At least one of --frags or --jumps must also be given.
  --frags frags.bam
    A bam file consisting of fragment paired-end alignments, aligned to the --genome
    argument using bwa or bowtie2. This argument may be specifed more than once.
  --jumps jumps.bam
    A bam file consisting of jump (mate pair) paired-end alignments, aligned to the
    --genome argument using bwa or bowtie2. This argument may be specifed more than once.
  --unpaired unpaired.bam
    A bam file consisting of unpaired alignments, aligned to the --genome argument
    using bwa or bowtie2. This argument may be specifed more than once.
  --bam any.bam
    A bam file of unknown type; Pilon will scan it and attempt to classify it as one
    of the above bam types.
OUTPUTS:
  --output prefix
    Prefix for output files
  --outdir directory
    Use this directory for all output files.
  --changes
    If specified, a file listing changes in the <output>.fasta will be generated.
  --vcf
    If specified, a vcf file will be generated
  --vcfqe
    If specified, the VCF will contain a QE (quality-weighted evidence) field rather
    than the default QP (quality-weighted percentage of evidence) field.
  --tracks
    This options will cause many track files (*.bed, *.wig) suitable for viewing in
    a genome browser to be written.
CONTROL:
  --variant
    Sets up heuristics for variant calling, as opposed to assembly improvement;
    equivalent to "--vcf --fix all,breaks".
  --chunksize
    Input FASTA elements larger than this will be processed in smaller pieces not to
    exceed this size (default 10000000).
  --diploid
    Sample is from diploid organism; will eventually affect calling of heterozygous SNPs
  --fix fixlist
    A comma-separated list of categories of issues to try to fix:
      "snps": try to fix individual base errors;
      "indels": try to fix small indels;
      "gaps": try to fill gaps;
      "local": try to detect and fix local misassemblies;
      "all": all of the above (default);
      "bases": shorthand for "snps" and "indels" (for back compatibility);
      "none": none of the above; new fasta file will not be written.
    The following are experimental fix types:
      "amb": fix ambiguous bases in fasta output (to most likely alternative);
      "breaks": allow local reassembly to open new gaps (with "local");
      "circles": try to close circlar elements when used with long corrected reads;
      "novel": assemble novel sequence from unaligned non-jump reads.
  --dumpreads
    Dump reads for local re-assemblies.
  --duplicates
    Use reads marked as duplicates in the input BAMs (ignored by default).
  --iupac
    Output IUPAC ambiguous base codes in the output FASTA file when appropriate.
  --nonpf
    Use reads which failed sequencer quality filtering (ignored by default).
  --targets targetlist
    Only process the specified target(s). Targets are comma-separated, and each target
    is a fasta element name optionally followed by a base range.
    Example: "scaffold00001,scaffold00002:10000-20000" would result in processing all of
    scaffold00001 and coordinates 10000-20000 of scaffold00002.
    If "targetlist" is the name of a file, each line will be treated as a target
    specification.
  --threads
    Degree of parallelism to use for certain processing (default 1). Experimental.
  --verbose
    More verbose output.
  --debug
    Debugging output (implies verbose).
  --version
    Print version string and exit.
HEURISTICS:
  --defaultqual qual
    Assumes bases are of this quality if quals are no present in input BAMs (default 15).
  --flank nbases
    Controls how much of the well-aligned reads will be used; this many bases at each
    end of the good reads will be ignored (default 10).
  --gapmargin
    Closed gaps must be within this number of bases of true size to be closed (100000)
  --K
    Kmer size used by internal assembler (default 47).
  --mindepth depth
    Variants (snps and indels) will only be called if there is coverage of good pairs
    at this depth or more; if this value is >= 1, it is an absolute depth, if it is a
    fraction < 1, then minimum depth is computed by multiplying this value by the mean
    coverage for the region, with a minumum value of 5 (default 0.1: min depth to call
    is 10% of mean coverage or 5, whichever is greater).
  --mingap
    Minimum size for unclosed gaps (default 10)
  --minmq
    Minimum alignment mapping quality for a read to count in pileups (default 0)
  --minqual
    Minimum base quality to consider for pileups (default 0)
  --nostrays
    Skip making a pass through the input BAM files to identify stray pairs, that is,
    those pairs in which both reads are aligned but not marked valid because they have
    inconsistent orientation or separation. Identifying stray pairs can help fill gaps
    and assemble larger insertions, especially of repeat content. However, doing so
    sometimes consumes considerable memory.
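The --mindepth rule in the help text above can be worked through with a small calculation. The sketch below only restates that rule in shell. The function name effective_mindepth is hypothetical, and how Pilon rounds fractional depths internally is not stated in the help output, so no rounding is applied here.

```shell
#!/bin/bash
# effective_mindepth VALUE MEAN_COVERAGE
# Restates the --mindepth rule from the help text; not Pilon's own code.
effective_mindepth() {
    awk -v v="$1" -v cov="$2" 'BEGIN {
        if (v >= 1) {
            d = v                 # value >= 1: used as an absolute depth
        } else {
            d = v * cov           # fraction < 1: multiplied by the mean coverage
            if (d < 5) d = 5      # with a minimum value of 5
        }
        printf "%g\n", d
    }'
}

effective_mindepth 0.1 100   # default 0.1 at 100x mean coverage: prints 10
effective_mindepth 0.1 20    # 10% of 20x is 2, raised to the minimum: prints 5
effective_mindepth 8 100     # absolute depth: prints 8
```

For low-coverage regions the floor of 5 dominates, so raising --mindepth above 1 is the way to demand a fixed absolute depth regardless of coverage.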
Installation
Source code is obtained from pilon
System
64-bit Linux