Ont-Guppy-Sapelo2

Revision as of 10:45, 25 March 2021

Program On

Sapelo2

Version

4.4.2

Author / Distributor

Oxford Nanopore Technologies, Limited.

Description

Ont-Guppy is the basecalling software from Oxford Nanopore Technologies. For more information, please see https://nanoporetech.com/

Running Program

Also refer to Running Jobs on Sapelo2

For more information on Environment Modules on Sapelo2 please see the Lmod page.

Version 4.4.2, for GPU

Version 4.4.2, for GPU is installed in /apps/eb/ont-guppy/4.4.2-GPU and it can be run on a P100 or a V100 GPU device. This version does not work on the K20 or K40 GPU devices.

To use this version of Guppy, please first load the module with

module load ont-guppy/4.4.2-GPU

Version 4.4.2, for CPU

Version 4.4.2, for CPU is installed in /apps/eb/ont-guppy/4.4.2-CPU

To use this version of Guppy, please first load the module with

module load ont-guppy/4.4.2-CPU

Sample job submission script (sub.sh) to run guppy_basecaller version 4.4.2 on a GPU node:

#!/bin/bash
#SBATCH --partition=gpu_p
#SBATCH --job-name=guppyjobname
#SBATCH --gres=gpu:P100:1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --time=48:00:00
#SBATCH --mem=10G

cd $SLURM_SUBMIT_DIR

ml ont-guppy/4.4.2-GPU

guppy_basecaller [options]

Here [options] must be replaced with the guppy_basecaller options and arguments you want to use. Other job parameters, such as the maximum wall clock time, the maximum memory, the number of cores, and the job name, should be adjusted to fit your run as well.
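As an illustration only, the [options] placeholder in the GPU script might be filled in along the following lines. The directory names and the config file name (dna_r9.4.1_450bps_hac.cfg) are assumptions made for this sketch; the flags themselves (-i, -s, -c, -x, -r, --compress_fastq) are taken from the guppy_basecaller -h output shown under Documentation. The sketch prints the command rather than running it when guppy_basecaller is not on the PATH.

```shell
#!/bin/bash
# Sketch only: the paths and the config file name below are placeholders.
input_dir="fast5_input"              # hypothetical directory holding fast5 files
output_dir="basecalled_gpu"          # hypothetical output directory
config="dna_r9.4.1_450bps_hac.cfg"   # assumed config; choose one that matches your flowcell and kit

# Build the command as an array so that paths containing spaces stay intact.
guppy_cmd=(guppy_basecaller
    -i "$input_dir"
    -s "$output_dir"
    -c "$config"
    -x auto             # 'auto' or 'cuda:<device_id>', per the help text
    -r                  # search the input path recursively
    --compress_fastq)   # gzip the fastq output

if command -v guppy_basecaller >/dev/null 2>&1; then
    "${guppy_cmd[@]}"
else
    # Dry run when the ont-guppy module has not been loaded.
    echo "Would run: ${guppy_cmd[*]}"
fi
```

In a job script, the array and the final call would take the place of the guppy_basecaller [options] line, after the ml ont-guppy/4.4.2-GPU line.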

Sample job submission script (sub.sh) to run guppy_basecaller version 4.4.2 on a CPU node:

#!/bin/bash
#SBATCH --partition=batch
#SBATCH --job-name=guppyjobname
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --time=48:00:00
#SBATCH --mem=10G

cd $SLURM_SUBMIT_DIR

ml ont-guppy/4.4.2-CPU

guppy_basecaller [options]

Here [options] must be replaced with the guppy_basecaller options and arguments you want to use. Other job parameters, such as the maximum wall clock time, the maximum memory, the number of cores, and the job name, should be adjusted to fit your run as well.
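For the CPU script, one plausible way to match Guppy's threading to the cores requested with --cpus-per-task is sketched below. The split (one thread per caller), the directory names, and the flowcell and kit names (FLO-MIN106, SQK-LSK109) are assumptions for illustration; check guppy_basecaller --print_workflows for the combinations that are actually supported. The flags --num_callers and --cpu_threads_per_caller come from the help output under Documentation.

```shell
#!/bin/bash
# Slurm sets SLURM_CPUS_PER_TASK when --cpus-per-task is given;
# fall back to 4 (the value in the sample script) outside of a job.
cores="${SLURM_CPUS_PER_TASK:-4}"
threads_per_caller=1                            # assumed split, not a tuned value
num_callers=$(( cores / threads_per_caller ))   # callers x threads stays within the allocation

guppy_cmd=(guppy_basecaller
    -i fast5_input                  # hypothetical input directory
    -s basecalled_cpu               # hypothetical output directory
    --flowcell FLO-MIN106           # assumed flowcell name
    --kit SQK-LSK109                # assumed kit name
    --num_callers "$num_callers"
    --cpu_threads_per_caller "$threads_per_caller")

if command -v guppy_basecaller >/dev/null 2>&1; then
    "${guppy_cmd[@]}"
else
    # Dry run when the ont-guppy module has not been loaded.
    echo "Would run: ${guppy_cmd[*]}"
fi
```

Deriving the caller count from SLURM_CPUS_PER_TASK keeps the command in step with the #SBATCH --cpus-per-task line if that value is changed later.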

Submit the job to the queue with

sbatch sub.sh
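If you want to keep track of the job after submitting it, a minimal sketch using standard Slurm commands is shown here. It assumes the script is named sub.sh as above; --parsable makes sbatch print only the job ID (optionally followed by a cluster name), and squeue -j shows the state of that job.

```shell
#!/bin/bash
script="sub.sh"   # job script name used in the examples on this page

if command -v sbatch >/dev/null 2>&1; then
    jobid=$(sbatch --parsable "$script")
    jobid=${jobid%%;*}            # drop a trailing ";cluster" field if present
    echo "Submitted job $jobid"
    squeue -j "$jobid"            # show the job's current state in the queue
else
    echo "sbatch not found; run this on a host where Slurm is available"
fi
```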

Documentation

[shtsai@b1-24 ~]$ ml ont-guppy/4.4.2-GPU
[shtsai@b1-24 ~]$ guppy_basecaller -h
: Guppy Basecalling Software, (C) Oxford Nanopore Technologies, Limited. Version 4.4.2+9623c16

Usage:

With config file:"
  guppy_basecaller -i <input path> -s <save path> -c <config file> [options]
With flowcell and kit name:
  guppy_basecaller -i <input path> -s <save path> --flowcell <flowcell name>
    --kit <kit name>
List supported flowcells and kits:
  guppy_basecaller --print_workflows

Use GPU for basecalling:
  guppy_basecaller -i <input path> -s <save path> -c <config file>
    --device <cuda device name> [options]
Command line parameters:
  --trim_threshold arg              Threshold above which data will be trimmed
                                    (in standard deviations of current level
                                    distribution).
  --trim_min_events arg             Adapter trimmer minimum stride intervals
                                    after stall that must be seen.
  --max_search_len arg              Maximum number of samples to search through
                                    for the stall
  --override_scaling                Manually provide scaling parameters rather
                                    than estimating them from each read.
  --scaling_med arg                 Median current value to use for manual
                                    scaling.
  --scaling_mad arg                 Median absolute deviation to use for manual
                                    scaling.
  --trim_strategy arg               Trimming strategy to apply: 'dna' or 'rna'
                                    (or 'none' to disable trimming)
  --dmean_win_size arg              Window size for coarse stall event
                                    detection
  --dmean_threshold arg             Threshold for coarse stall event detection
  --jump_threshold arg              Threshold level for rna stall detection
  --pt_scaling                      Enable polyT/adapter max detection for read
                                    scaling.
  --pt_median_offset arg            Set polyT median offset for setting read
                                    scaling median (default 2.5)
  --adapter_pt_range_scale arg      Set polyT/adapter range scale for setting
                                    read scaling median absolute deviation
                                    (default 5.2)
  --pt_required_adapter_drop arg    Set minimum required current drop from
                                    adapter max to polyT detection. (default
                                    30.0)
  --pt_minimum_read_start_index arg Set minimum index for read start sample
                                    required to attempt polyT scaling. (default
                                    30)
  --as_model_file arg               Path to JSON model file for adapter
                                    scaling.
  --as_gpu_runners_per_device arg   Number of runners per GPU device for
                                    adapter scaling.
  --as_cpu_threads_per_scaler arg   Number of CPU worker threads per adapter
                                    scaler
  --as_reads_per_runner arg         Maximum reads per runner for adapter
                                    scaling.
  --as_num_scalers arg              Number of parallel scalers for adapter
                                    scaling.
  -m [ --model_file ] arg           Path to JSON model file.
  -k [ --kernel_path ] arg          Path to GPU kernel files location (only
                                    needed if builtin_scripts is false).
  -x [ --device ] arg               Specify basecalling device: 'auto', or
                                    'cuda:<device_id>'.
  --builtin_scripts arg             Whether to use GPU kernels that were
                                    included at compile-time.
  --chunk_size arg                  Stride intervals per chunk.
  --chunks_per_runner arg           Maximum chunks per runner.
  --chunks_per_caller arg           Soft limit on number of chunks in each
                                    caller's queue. New reads will not be
                                    queued while this is exceeded.
  --high_priority_threshold arg     Number of high priority chunks to process
                                    for each medium priority chunk.
  --medium_priority_threshold arg   Number of medium priority chunks to process
                                    for each low priority chunk.
  --overlap arg                     Overlap between chunks (in stride
                                    intervals).
  --gpu_runners_per_device arg      Number of runners per GPU device.
  --cpu_threads_per_caller arg      Number of CPU worker threads per
                                    basecaller.
  --num_callers arg                 Number of parallel basecallers to create.
  --post_out                        Return full posterior matrix in output
                                    fast5 file and/or called read message from
                                    server.
  --stay_penalty arg                Scaling factor to apply to stay probability
                                    calculation during transducer decode.
  --qscore_offset arg               Qscore calibration offset.
  --qscore_scale arg                Qscore calibration scale factor.
  --temp_weight arg                 Temperature adjustment for weight matrix in
                                    softmax layer of RNN.
  --temp_bias arg                   Temperature adjustment for bias vector in
                                    softmax layer of RNN.
  --beam_cut arg                    Beam score cutoff for beam search decoding.
  --beam_width arg                  Beam score cutoff for beam search decoding.
  --qscore_filtering                Enable filtering of reads into PASS/FAIL
                                    folders based on min qscore.
  --min_qscore arg                  Minimum acceptable qscore for a read to be
                                    filtered into the PASS folder
  --reverse_sequence arg            Reverse the called sequence (for RNA
                                    sequencing).
  --u_substitution arg              Substitute 'U' for 'T' in the called
                                    sequence (for RNA sequencing).
  --log_speed_frequency arg         How often to print out basecalling speed.
  --barcode_kits arg                Space separated list of barcoding kit(s) or
                                    expansion kit(s) to detect against. Must be
                                    in double quotes.
  --trim_barcodes                   Trim the barcodes from the output sequences
                                    in the FastQ files.
  --num_extra_bases_trim arg        How vigorous to be in trimming the barcode.
                                    Default is 0 i.e. the length of the
                                    detected barcode. A positive integer means
                                    extra bases will be trimmed, a negative
                                    number is how many fewer bases (less
                                    vigorous) will be trimmed.
  --arrangements_files arg          Files containing arrangements.
  --lamp_arrangements_files arg     Files containing lamp arrangements.
  --score_matrix_filename arg       File containing mismatch score matrix.
  --start_gap1 arg                  Gap penalty for aligning before the
                                    reference.
  --end_gap1 arg                    Gap penalty for aligning after the
                                    reference.
  --open_gap1 arg                   Penalty for opening a new gap in the
                                    reference.
  --extend_gap1 arg                 Penalty for extending a gap in the
                                    reference.
  --start_gap2 arg                  Gap penalty for aligning before the query.
  --end_gap2 arg                    Gap penalty for aligning after the query.
  --open_gap2 arg                   Penalty for opening a new gap in the query.
  --extend_gap2 arg                 Penalty for extending a gap in the query.
  --min_score arg                   Minimum score to consider a valid
                                    alignment.
  --min_score_rear_override arg     Minimum score to consider a valid alignment
                                    for the rear barcode only (and min_score
                                    will then be used for the front only when
                                    this is set).
  --min_score_mask arg              Minimum score for a barcode context to
                                    consider a valid alignment.
  --front_window_size arg           Window size for the beginning barcode.
  --rear_window_size arg            Window size for the ending barcode.
  --require_barcodes_both_ends      Reads will only be classified if there is a
                                    barcode above the min_score at both ends of
                                    the read.
  --allow_inferior_barcodes         Reads will still be classified even if both
                                    the barcodes at the front and rear (if
                                    applicable) were not the best scoring
                                    barcodes above the min_score.
  --detect_mid_strand_barcodes      Search for barcodes through the entire
                                    length of the read.
  --min_score_mid_barcodes arg      Minimum score for a barcode to be detected
                                    in the middle of a read.
  --lamp_kit arg                    LAMP barcoding kit to perform LAMP
                                    detection against.
  --min_score_lamp arg              Minimum score for a LAMP barcode to be
                                    classified.
  --min_score_lamp_mask arg         Minimum score for a LAMP barcode mask
                                    context to be classified.
  --min_score_lamp_target arg       Minimum score for a LAMP target to be
                                    classified.
  --additional_context_bases arg    Number of bases from a lamp FIP barcode
                                    context to append to the front and rear of
                                    the FIP barcode before performing matching.
                                    Default is 2.
  --min_length_lamp_context arg     Minimum align length for a LAMP barcode
                                    mask context to be classified.
  --min_length_lamp_target arg      Minimum align length for a LAMP target to
                                    be classified.
  --num_barcoding_buffers arg       Number of GPU memory buffers to allocate to
                                    perform barcoding into. Controls level of
                                    parallelism on GPU for barcoding.
  --num_mid_barcoding_buffers arg   Number of GPU memory buffers to allocate to
                                    perform barcoding into. Controls level of
                                    parallelism on GPU for mid barcoding.
  --num_barcode_threads arg         Number of worker threads to use for
                                    barcoding.
  --calib_detect                    Enable calibration strand detection and
                                    filtering.
  --calib_reference arg             Reference FASTA file containing calibration
                                    strand.
  --calib_min_sequence_length arg   Minimum sequence length for reads to be
                                    considered candidate calibration strands.
  --calib_max_sequence_length arg   Maximum sequence length for reads to be
                                    considered candidate calibration strands.
  --calib_min_coverage arg          Minimum reference coverage to pass
                                    calibration strand detection.
  --print_workflows                 Output available workflows.
  --flowcell arg                    Flowcell to find a configuration for
  --kit arg                         Kit to find a configuration for
  -a [ --align_ref ] arg            Path to alignment reference.
  --bed_file arg                    Path to .bed file containing areas of
                                    interest in reference genome.
  --num_alignment_threads arg       Number of worker threads to use for
                                    alignment.
  -z [ --quiet ]                    Quiet mode. Nothing will be output to
                                    STDOUT if this option is set.
  --trace_categories_logs arg       Enable trace logs - list of strings with
                                    the desired names.
  --verbose_logs                    Enable verbose logs.
  --trace_domains_log arg           List of trace domains to include in verbose
                                    logging (if enabled), '*' for all.
  --trace_domains_config arg        Configuration file containing list of trace
                                    domains to include in verbose logging (if
                                    enabled), this will override
                                    --trace_domain_logs
  --disable_pings                   Disable the transmission of telemetry
                                    pings.
  --ping_url arg                    URL to send pings to
  --ping_segment_duration arg       Duration in minutes of each ping segment.
  --progress_stats_frequency arg    Frequency in seconds in which to report
                                    progress statistics, if supplied will
                                    replace the default progress display.
  -q [ --records_per_fastq ] arg    Maximum number of records per fastq file, 0
                                    means use a single file (per worker, per
                                    run id).
  --read_batch_size arg             Maximum batch size, in reads, for grouping
                                    input files.
  --compress_fastq                  Compress fastq output files with gzip.
  -i [ --input_path ] arg           Path to input fast5 files.
  --input_file_list arg             Optional file containing list of input
                                    fast5 files to process from the input_path.
  -s [ --save_path ] arg            Path to save fastq files.
  -l [ --read_id_list ] arg         File containing list of read ids to filter
                                    to
  -r [ --recursive ]                Search for input files recursively.
  --fast5_out                       Choice of whether to do fast5 output.
  --bam_out                         Choice of whether to do BAM file output.
  --bam_methylation_threshold arg   The value below which a predicted
                                    methylation probability will not be emitted
                                    into a BAM file, expressed as a percentage.
                                    Default is 5.0(%).
  --resume                          Resume a previous basecall run using the
                                    same output folder.
  --client_id arg                   Optional unique identifier (non-negative
                                    integer) for this instance of the Guppy
                                    Client Basecaller, if supplied will form
                                    part of the output filenames.
  --nested_output_folder            If flagged output fastq files will be
                                    written to a nested folder structure, based
                                    on: protocol_group/sample/protocol/
                                    qscore_pass_fail/barcode_arrangement/
  --max_queued_reads arg            Maximum number of reads to be submitted for
                                    processing at any one time.
  -h [ --help ]                     produce help message
  -v [ --version ]                  print version number
  -c [ --config ] arg               Config file to use
  -d [ --data_path ] arg            Path to use for loading any data files the
                                    application requires.
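The help text above lists --print_workflows as the way to see supported flowcells and kits. A small sketch of how that listing might be narrowed down to one combination is given here; the flowcell and kit names are assumptions for illustration, and the exact column layout of the listing is not shown on this page, so the filter simply matches lines that contain both names.

```shell
#!/bin/bash
flowcell="FLO-MIN106"   # assumed flowcell name
kit="SQK-LSK109"        # assumed kit name

if command -v guppy_basecaller >/dev/null 2>&1; then
    # Keep only the workflow lines that mention both names.
    # grep exits non-zero when nothing matches, which signals an unsupported pair.
    guppy_basecaller --print_workflows | grep -i -- "$flowcell" | grep -i -- "$kit"
else
    echo "guppy_basecaller not found; load the module first (ml ont-guppy/4.4.2-GPU)"
fi
```

A matching line indicates that the pair can be passed to guppy_basecaller with --flowcell and --kit in place of an explicit -c config file.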

Installation

Binaries downloaded from the Oxford Nanopore Technologies site.

System

64-bit Linux