Ont-Guppy-Sapelo2: Difference between revisions

Latest revision as of 13:53, 13 May 2024

Program On

Sapelo2

Version

6.5.7

Author / Distributor

Oxford Nanopore Technologies, Limited.

Description

Ont-Guppy is a basecalling software. For more information, please see https://nanoporetech.com/

Running Program

Also refer to Running Jobs on Sapelo2

For more information on Environment Modules on Sapelo2 please see the Lmod page.

Version 6.5.7, for GPU

Version 4.4.2, for GPU is installed in /apps/eb/ont-guppy/6.5.7-CUDA-11.7.0 and it can be run an a GPU device.

To use this version of Guppy, please first load the module with

ml ont-guppy/6.5.7-CUDA-11.7.0

Sample job submission script (sub.sh) to run guppy_basecaller version 4.4.2 on a GPU node:

#!/bin/bash
#SBATCH --partition=gpu_p
#SBATCH --job-name=guppyjobname
#SBATCH --gres=gpu:P100:1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --time=48:00:00
#SBATCH --mem=10G

cd $SLURM_SUBMIT_DIR

ml ont-guppy/6.5.7-CUDA-11.7.0

guppy_basecaller -x "cuda:0" [options]

where [options] need to be replaced by the options (command and arguments) you want to use. Other parameters of the job, such as the maximum wall clock time, maximum memory, the number of cores per node, and the job name need to be modified appropriately as well. If the -x "cuda:0" option is not included, guppy_basecaller will default to only use the CPUs.

where [options] need to be replaced by the options (command and arguments) you want to use. Other parameters of the job, such as the maximum wall clock time, maximum memory, the number of cores per node, and the job name need to be modified appropriately as well.

Submit the job to the queue with

sbatch sub.sh

Documentation

[cft07037@b1-24 ~]$ ml ont-guppy/6.5.7-CUDA-11.7.0
[cft07037@b1-24 ~]$ guppy_basecaller -h
: Guppy Basecalling Software, (C) Oxford Nanopore Technologies plc. Version 6.5.7+ca6d6af, minimap2 version 2.24-r1122

Use of this software is permitted solely under the terms of the end user license agreement (EULA).
By running, copying or accessing this software, you are demonstrating your acceptance of the EULA.
The EULA may be found in /apps/eb/ont-guppy/6.5.7-CUDA-11.7.0/bin

Usage:

With config file:
guppy_basecaller -i <input path> -s <save path> -c <config file> [options]
With flowcell and kit name:
guppy_basecaller -i <input path> -s <save path> --flowcell <flowcell name>
--kit <kit name>
List supported flowcells and kits:
guppy_basecaller --print_workflows

Use GPU for basecalling:
guppy_basecaller -i <input path> -s <save path> -c <config file>
--device <cuda device name> [options]

Command line parameters:
--adapter_pt_range_scale
Set polyT/adapter range scale for setting read scaling median absolute deviation.
--as_cpu_threads_per_scaler
Number of CPU worker threads per adapter scaler.
--dmean_threshold
Threshold for coarse stall event detection
--dmean_win_size
Window size for coarse stall event detection
--as_gpu_runners_per_device
Number of runners per GPU device for adapter scaling.
--jump_threshold
Threshold level for rna stall detection
--max_search_len
Maximum number of samples to search through for the stall
--as_model_file
Path to JSON model file for adapter scaling.
--noisiest_section_scaling_max_size
Threshold read size in samples under which nosiest-section scaling will be performed.
--as_num_scalers
Number of parallel scalers for adapter scaling.
--override_scaling
Manually provide scaling parameters rather than estimating them from each read.
--pt_median_offset
Set polyT median offset for setting read scaling median.
--pt_minimum_read_start_index
Set minimum index for read start sample required to attempt polyT scaling.
--pt_required_adapter_drop
Set minimum required current drop from adapter max to polyT detection.
--pt_scaling
Enable polyT/adapter max detection for read scaling.
--as_reads_per_runner
Maximum reads per runner for adapter scaling.
--scaling_mad
Median absolute deviation to use for manual scaling.
--scaling_med
Median current value to use for manual scaling.
--trim_min_events
Adapter trimmer minimum stride intervals after stall that must be seen.
--trim_strategy
Trimming strategy to apply: 'dna' or 'rna' (or 'none' to disable trimming)
--trim_threshold
Threshold above which data will be trimmed (in standard deviations of current level distribution).
--use_quantile_scaling
Use quantiles to calculate scaling values when basecalling
--alignment_filtering
Specify whether to filter reads based on their alignment status
--align_type
Specify whether you want full or coarse alignment. Valid values are (auto/full/coarse).
--bed_file
Path to .bed file containing areas of interest in reference genome.
-a [ --align_ref ]
Reference FASTA or index file.
--minimap_opt_string
Specify minimap2 options. See `guppy_basecaller --minimap_opt_string --help` for details).
--num_alignment_threads
Number of worker threads to use for alignment.
--allow_inferior_barcodes
Reads will still be classified even if both the barcodes at the front and rear (if applicable) were not the best scoring barcodes above the min_score.
--barcode_kits
Space separated list of barcoding kit(s) or expansion kit(s) to detect against. Must be in double quotes.
--barcode_list
Optional list of barcodes to look for.
--detect_adapter
Detect adapter sequences at the front and rear of the read.
--detect_barcodes
Detect barcode sequences at the front and rear of the read.
--detect_mid_strand_adapter
Detect adapter sequences within reads.
--detect_mid_strand_barcodes
Search for barcodes through the entire length of the read.
--detect_primer
Detect primer sequences at the front and rear of the read.
--disable_barcode_sample_sheet_restricting
Disable filtering of barcodes based on the sample sheet in use.
--enable_trim_barcodes
Enable trimming of barcodes from the sequences in the output files. By default is false, barcodes will not be trimmed.
--front_window_size
Window size for the beginning barcode.
--min_score_adapter
Minimum score for an adapter to be considered a valid alignment.
--min_score_adapter_mid
Minimum score for a mid-strand adapter to be considered a valid alignment.
--min_score_barcode_front
Minimum score to consider a front barcode to be a valid barcode alignment.
--min_score_barcode_mask
Minimum score for a barcode context to be considered a valid alignment.
--min_score_barcode_mid
Minimum score for a barcode to be detected in the middle of a read.
--min_score_barcode_rear
Minimum score to consider a rear barcode to be a valid alignment (and min_score_front will then be used for the front only when this is set).
--min_score_primer
Minimum score for a primer to be considered to be a valid alignment.
--num_barcoding_buffers
Number of GPU memory buffers to allocate to perform barcoding into. Controls level of parallelism on GPU for barcoding.
--num_barcoding_threads
Number of worker threads to use for barcoding.
--num_extra_bases_trim
How vigorous to be in trimming the barcode. Default is 0 i.e. the length of the detected barcode. A positive integer means extra bases will be trimmed, a negative number is how many fewer bases (less vigorous) will be trimmed.
--num_mid_barcoding_buffers
Number of GPU memory buffers to allocate to perform barcoding into. Controls level of parallelism on GPU for mid barcoding.
--num_reads_per_barcoding_buffer
The maximum number of reads to process at once in each barcoding buffer.
--rear_window_size
Window size for the ending barcode.
--require_barcodes_both_ends
Reads will only be classified if there is a barcode above the min_score at both ends of the read.
--trim_adapters
Trim the adapters from the sequences in the output files.
--trim_primers
Trim the primers from the sequences in the output files.
--beam_cut
Beam score cutoff for beam search decoding.
--beam_width
Beam width to use in beam search decode.
--builtin_scripts
Whether to use GPU kernels that were included at compile-time.
--chunk_size
Stride intervals per chunk.
--chunks_per_caller
Soft limit on number of chunks in each caller's queue. New reads will not be queued while this is exceeded.
--chunks_per_runner
Maximum chunks per runner.
--cpu_threads_per_caller
Number of CPU worker threads per basecaller.
--disable_qscore_filtering
Disable filtering of reads into PASS/FAIL folders based on min qscore.
--dorado_model_path
Path to dorado model folder.
--dorado_modbase_models
Names of Remora models for modified base detection.
--duplex_window_size_max
Maximum window size to use for prefix search in duplex decoding.
--duplex_window_size_min
Minimum window size to use for prefix search in duplex decoding.
--gpu_runners_per_device
Number of runners per GPU device.
--high_priority_threshold
Number of high priority chunks to process for each medium priority chunk.
--int8_mode
Enable quantised int8 mode for kernels which support it.
-k [ --kernel_path ]
Path to GPU kernel files location (only needed if builtin_scripts is false).
--log_speed_frequency
How often to print out basecalling speed.
--medium_priority_threshold
Number of medium priority chunks to process for each low priority chunk.
--min_qscore
Minimum acceptable qscore for a read to be filtered into the PASS folder.
-m [ --model_file ]
Path to JSON model file.
--num_base_mod_threads
The number of threads to use for Remora modified base detection in GPU basecalling mode.
--num_callers
Number of parallel basecallers to create.
--overlap
Overlap between chunks (in stride intervals).
--post_out
Return full posterior matrix in output fast5 file and/or called read message from server.
--qscore_offset
Qscore calibration offset.
--qscore_scale
Qscore calibration scale factor.
--reverse_sequence
Reverse the called sequence (for RNA sequencing).
--stay_penalty
Scaling factor to apply to stay probability calculation during transducer decode.
--temp_bias
Temperature adjustment for bias vector in softmax layer of RNN.
--temp_weight
Temperature adjustment for weight matrix in softmax layer of RNN.
--u_substitution
Substitute 'U' for 'T' in the called sequence (for RNA sequencing).
--calib_detect
Enable calibration strand detection and filtering.
--calib_reference
Reference FASTA file containing calibration strand.
--additional_lamp_context_bases
Number of bases from a lamp FIP barcode context to append to the front and read of the FIP barcode before performing matching. Default is 2.
--lamp_kit
LAMP barcoding kit to perform LAMP detection against.
--min_length_lamp_context
Minimum align length for a LAMP barcode mask context to be classified.
--min_length_lamp_target
Minimum align length for a LAMP target to be classified.
--min_score_lamp
Minimum score for a LAMP barcode to be classified.
--min_score_lamp_mask
Minimum score for a LAMP barcode mask context to be classified.
--min_score_lamp_target
Minimum score for a LAMP target to be classified.
--max_pipeline_reads
Maximum number of reads that can be processed by the pipeline at any one time.
--index
Output BAM index file.
--bam_methylation_threshold
The value below which a predicted methylation probability will not be emitted into a BAM file, expressed as a percentage.
--bam_out
Output BAM files.
--barcode_nested_output_folder
If flagged output FastQ/BAM files will be written to a nested folder structure, based on: protocol_group/barcode_arrangement/sample/protocol/qscore_pass_fail/
--compress_fastq
Compress fastq output files with gzip.
-c [ --config ]
Configuration file for application.
-d [ --data_path ]
Path to use for loading any data files the application requires.
-x [ --device ]
Specify GPU device: 'auto', or 'cuda:<device_id>'.
--flowcell
Flowcell to find a configuration for.
--input_file_list
Optional file containing list of input fast5/pod5 files to process from the input_path.
-i [ --input_path ]
Path to input files.
--kit
Kit to find a configuration for.
--load_scaling_info_from_read_files
If flagged, scaling information in source fast5 or pod5 files will read and used if present.
--moves_out
Return move table in output BAM file.
--nested_output_folder
If flagged output FastQ/BAM files will be written to a nested folder structure, based on: protocol_group/sample/protocol/qscore_pass_fail/barcode_arrangement/
--print_workflows
Output available workflows.
--progress_stats_frequency
Frequency in seconds in which to report progress statistics, if supplied will replace the default progress display.
-z [ --quiet ]
Quiet mode. Nothing will be output to STDOUT if this option is set.
--read_batch_size
Maximum batch size, in reads, for grouping input files.
-l [ --read_id_list ]
File containing list of read ids to filter to.
-q [ --records_per_fastq ]
Maximum number of records per fastq file, 0 means use a single file (per run id, per batch).
-r [ --recursive ]
Search for input file recursively.
--resume
Resume a previous basecall run using the same output folder.
-s [ --save_path ]
Path to save output files.
-h [ --help ]
Display the application usage help.
-v [ --version ]
Display the application version information.
--skip_model_versions
Skip display of model versions in output of available workflows when using --print_workflows.
--trace_category_logs
Enable trace logs - list of strings with the desired names.
--trace_domains_config
Configuration file containing list of trace domains to include in verbose logging (if enabled)
--verbose_logs
Enable verbose logs.
--do_read_splitting
Perform read splitting based on mid-strand adapter detection.
--max_read_split_depth
The maximum number of iterations of read splitting that should be performed.
--min_score_read_splitting
Minimum alignment score for the mid adapter on which to split the read.
--num_read_splitting_buffers
Number of GPU memory buffers to allocate to perform read splitting. Controls level of parallelism on GPU for read splitting using mid adapter detection.
--num_read_splitting_threads
Number of worker threads to use for read splitting.
--sample_sheet
Optional file containing sample sheet. Used to provide an alias for barcode results.
--disable_pings
Disable the transmission of telemetry pings.
--ping_segment_duration
Duration in minutes of each ping segment.
--ping_url
URL to send pings to.

Installation

Binaries downloaded from the Oxford Nanopore Technologies site.

System

64-bit Linux