Ont-Guppy-Sapelo2
Category
Bioinformatics
Program On
Sapelo2
Version
6.5.7
Author / Distributor
Oxford Nanopore Technologies, Limited.
Description
Ont-Guppy is basecalling software from Oxford Nanopore Technologies. For more information, please see https://nanoporetech.com/
Running Program
Also refer to Running Jobs on Sapelo2
For more information on Environment Modules on Sapelo2 please see the Lmod page.
Version 6.5.7, for GPU
- Version 6.5.7, for GPU, is installed in /apps/eb/ont-guppy/6.5.7-CUDA-11.7.0 and it can be run on a GPU device.
To use this version of Guppy, please first load the module with
ml ont-guppy/6.5.7-CUDA-11.7.0
Sample job submission script (sub.sh) to run guppy_basecaller version 6.5.7 on a GPU node:
#!/bin/bash
#SBATCH --partition=gpu_p
#SBATCH --job-name=guppyjobname
#SBATCH --gres=gpu:P100:1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --time=48:00:00
#SBATCH --mem=10G

cd $SLURM_SUBMIT_DIR

ml ont-guppy/6.5.7-CUDA-11.7.0

guppy_basecaller -x "cuda:0" [options]
where [options] needs to be replaced by the options (command and arguments) you want to use. Other parameters of the job, such as the maximum wall clock time, maximum memory, the number of cores per node, and the job name, need to be modified appropriately as well. If the -x "cuda:0" option is not included, guppy_basecaller will default to using only the CPUs.
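As an illustration of what a filled-in [options] might look like, here is a minimal sketch that could replace the last line of sub.sh. The input and output directories and the config file name are assumptions chosen for the example, not values from this page; the flags themselves (-i, -s, -c, -x, --recursive, --compress_fastq) are taken from the guppy_basecaller help text.

```shell
#!/bin/bash
# Hypothetical paths and config name; replace them with your own.
INPUT_DIR="./fast5"                    # assumed directory of raw read files
OUTPUT_DIR="./basecalls"               # assumed directory for the results
CONFIG="dna_r9.4.1_450bps_hac.cfg"     # assumed config; pick one that matches your flowcell and kit

# Build the command as an array so that paths containing spaces stay intact.
cmd=(guppy_basecaller -x "cuda:0" -i "$INPUT_DIR" -s "$OUTPUT_DIR" -c "$CONFIG" --recursive --compress_fastq)

# Print the full command so it is recorded in the job output file.
printf '%s ' "${cmd[@]}"
echo

# Run it only when guppy_basecaller is on the PATH (that is, the module is loaded).
if command -v guppy_basecaller >/dev/null 2>&1; then
    "${cmd[@]}"
fi
```

Printing the command before running it makes it easier to check afterwards which options a given job actually used.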
Submit the job to the queue with
sbatch sub.sh
Documentation
[cft07037@b1-24 ~]$ ml ont-guppy/6.5.7-CUDA-11.7.0
[cft07037@b1-24 ~]$ guppy_basecaller -h
: Guppy Basecalling Software, (C) Oxford Nanopore Technologies plc. Version 6.5.7+ca6d6af, minimap2 version 2.24-r1122

Use of this software is permitted solely under the terms of the end user license agreement (EULA).
By running, copying or accessing this software, you are demonstrating your acceptance of the EULA.
The EULA may be found in /apps/eb/ont-guppy/6.5.7-CUDA-11.7.0/bin

Usage:

With config file:
  guppy_basecaller -i <input path> -s <save path> -c <config file> [options]
With flowcell and kit name:
  guppy_basecaller -i <input path> -s <save path> --flowcell <flowcell name> --kit <kit name>
List supported flowcells and kits:
  guppy_basecaller --print_workflows
Use GPU for basecalling:
  guppy_basecaller -i <input path> -s <save path> -c <config file> --device <cuda device name> [options]

Command line parameters:
  --adapter_pt_range_scale  Set polyT/adapter range scale for setting read scaling median absolute deviation.
  --as_cpu_threads_per_scaler  Number of CPU worker threads per adapter scaler.
  --dmean_threshold  Threshold for coarse stall event detection
  --dmean_win_size  Window size for coarse stall event detection
  --as_gpu_runners_per_device  Number of runners per GPU device for adapter scaling.
  --jump_threshold  Threshold level for rna stall detection
  --max_search_len  Maximum number of samples to search through for the stall
  --as_model_file  Path to JSON model file for adapter scaling.
  --noisiest_section_scaling_max_size  Threshold read size in samples under which nosiest-section scaling will be performed.
  --as_num_scalers  Number of parallel scalers for adapter scaling.
  --override_scaling  Manually provide scaling parameters rather than estimating them from each read.
  --pt_median_offset  Set polyT median offset for setting read scaling median.
  --pt_minimum_read_start_index  Set minimum index for read start sample required to attempt polyT scaling.
  --pt_required_adapter_drop  Set minimum required current drop from adapter max to polyT detection.
  --pt_scaling  Enable polyT/adapter max detection for read scaling.
  --as_reads_per_runner  Maximum reads per runner for adapter scaling.
  --scaling_mad  Median absolute deviation to use for manual scaling.
  --scaling_med  Median current value to use for manual scaling.
  --trim_min_events  Adapter trimmer minimum stride intervals after stall that must be seen.
  --trim_strategy  Trimming strategy to apply: 'dna' or 'rna' (or 'none' to disable trimming)
  --trim_threshold  Threshold above which data will be trimmed (in standard deviations of current level distribution).
  --use_quantile_scaling  Use quantiles to calculate scaling values when basecalling
  --alignment_filtering  Specify whether to filter reads based on their alignment status
  --align_type  Specify whether you want full or coarse alignment. Valid values are (auto/full/coarse).
  --bed_file  Path to .bed file containing areas of interest in reference genome.
  -a [ --align_ref ]  Reference FASTA or index file.
  --minimap_opt_string  Specify minimap2 options. See `guppy_basecaller --minimap_opt_string --help` for details).
  --num_alignment_threads  Number of worker threads to use for alignment.
  --allow_inferior_barcodes  Reads will still be classified even if both the barcodes at the front and rear (if applicable) were not the best scoring barcodes above the min_score.
  --barcode_kits  Space separated list of barcoding kit(s) or expansion kit(s) to detect against. Must be in double quotes.
  --barcode_list  Optional list of barcodes to look for.
  --detect_adapter  Detect adapter sequences at the front and rear of the read.
  --detect_barcodes  Detect barcode sequences at the front and rear of the read.
  --detect_mid_strand_adapter  Detect adapter sequences within reads.
  --detect_mid_strand_barcodes  Search for barcodes through the entire length of the read.
  --detect_primer  Detect primer sequences at the front and rear of the read.
  --disable_barcode_sample_sheet_restricting  Disable filtering of barcodes based on the sample sheet in use.
  --enable_trim_barcodes  Enable trimming of barcodes from the sequences in the output files. By default is false, barcodes will not be trimmed.
  --front_window_size  Window size for the beginning barcode.
  --min_score_adapter  Minimum score for an adapter to be considered a valid alignment.
  --min_score_adapter_mid  Minimum score for a mid-strand adapter to be considered a valid alignment.
  --min_score_barcode_front  Minimum score to consider a front barcode to be a valid barcode alignment.
  --min_score_barcode_mask  Minimum score for a barcode context to be considered a valid alignment.
  --min_score_barcode_mid  Minimum score for a barcode to be detected in the middle of a read.
  --min_score_barcode_rear  Minimum score to consider a rear barcode to be a valid alignment (and min_score_front will then be used for the front only when this is set).
  --min_score_primer  Minimum score for a primer to be considered to be a valid alignment.
  --num_barcoding_buffers  Number of GPU memory buffers to allocate to perform barcoding into. Controls level of parallelism on GPU for barcoding.
  --num_barcoding_threads  Number of worker threads to use for barcoding.
  --num_extra_bases_trim  How vigorous to be in trimming the barcode. Default is 0 i.e. the length of the detected barcode. A positive integer means extra bases will be trimmed, a negative number is how many fewer bases (less vigorous) will be trimmed.
  --num_mid_barcoding_buffers  Number of GPU memory buffers to allocate to perform barcoding into. Controls level of parallelism on GPU for mid barcoding.
  --num_reads_per_barcoding_buffer  The maximum number of reads to process at once in each barcoding buffer.
  --rear_window_size  Window size for the ending barcode.
  --require_barcodes_both_ends  Reads will only be classified if there is a barcode above the min_score at both ends of the read.
  --trim_adapters  Trim the adapters from the sequences in the output files.
  --trim_primers  Trim the primers from the sequences in the output files.
  --beam_cut  Beam score cutoff for beam search decoding.
  --beam_width  Beam width to use in beam search decode.
  --builtin_scripts  Whether to use GPU kernels that were included at compile-time.
  --chunk_size  Stride intervals per chunk.
  --chunks_per_caller  Soft limit on number of chunks in each caller's queue. New reads will not be queued while this is exceeded.
  --chunks_per_runner  Maximum chunks per runner.
  --cpu_threads_per_caller  Number of CPU worker threads per basecaller.
  --disable_qscore_filtering  Disable filtering of reads into PASS/FAIL folders based on min qscore.
  --dorado_model_path  Path to dorado model folder.
  --dorado_modbase_models  Names of Remora models for modified base detection.
  --duplex_window_size_max  Maximum window size to use for prefix search in duplex decoding.
  --duplex_window_size_min  Minimum window size to use for prefix search in duplex decoding.
  --gpu_runners_per_device  Number of runners per GPU device.
  --high_priority_threshold  Number of high priority chunks to process for each medium priority chunk.
  --int8_mode  Enable quantised int8 mode for kernels which support it.
  -k [ --kernel_path ]  Path to GPU kernel files location (only needed if builtin_scripts is false).
  --log_speed_frequency  How often to print out basecalling speed.
  --medium_priority_threshold  Number of medium priority chunks to process for each low priority chunk.
  --min_qscore  Minimum acceptable qscore for a read to be filtered into the PASS folder.
  -m [ --model_file ]  Path to JSON model file.
  --num_base_mod_threads  The number of threads to use for Remora modified base detection in GPU basecalling mode.
  --num_callers  Number of parallel basecallers to create.
  --overlap  Overlap between chunks (in stride intervals).
  --post_out  Return full posterior matrix in output fast5 file and/or called read message from server.
  --qscore_offset  Qscore calibration offset.
  --qscore_scale  Qscore calibration scale factor.
  --reverse_sequence  Reverse the called sequence (for RNA sequencing).
  --stay_penalty  Scaling factor to apply to stay probability calculation during transducer decode.
  --temp_bias  Temperature adjustment for bias vector in softmax layer of RNN.
  --temp_weight  Temperature adjustment for weight matrix in softmax layer of RNN.
  --u_substitution  Substitute 'U' for 'T' in the called sequence (for RNA sequencing).
  --calib_detect  Enable calibration strand detection and filtering.
  --calib_reference  Reference FASTA file containing calibration strand.
  --additional_lamp_context_bases  Number of bases from a lamp FIP barcode context to append to the front and read of the FIP barcode before performing matching. Default is 2.
  --lamp_kit  LAMP barcoding kit to perform LAMP detection against.
  --min_length_lamp_context  Minimum align length for a LAMP barcode mask context to be classified.
  --min_length_lamp_target  Minimum align length for a LAMP target to be classified.
  --min_score_lamp  Minimum score for a LAMP barcode to be classified.
  --min_score_lamp_mask  Minimum score for a LAMP barcode mask context to be classified.
  --min_score_lamp_target  Minimum score for a LAMP target to be classified.
  --max_pipeline_reads  Maximum number of reads that can be processed by the pipeline at any one time.
  --index  Output BAM index file.
  --bam_methylation_threshold  The value below which a predicted methylation probability will not be emitted into a BAM file, expressed as a percentage.
  --bam_out  Output BAM files.
  --barcode_nested_output_folder  If flagged output FastQ/BAM files will be written to a nested folder structure, based on: protocol_group/barcode_arrangement/sample/protocol/qscore_pass_fail/
  --compress_fastq  Compress fastq output files with gzip.
  -c [ --config ]  Configuration file for application.
  -d [ --data_path ]  Path to use for loading any data files the application requires.
  -x [ --device ]  Specify GPU device: 'auto', or 'cuda:<device_id>'.
  --flowcell  Flowcell to find a configuration for.
  --input_file_list  Optional file containing list of input fast5/pod5 files to process from the input_path.
  -i [ --input_path ]  Path to input files.
  --kit  Kit to find a configuration for.
  --load_scaling_info_from_read_files  If flagged, scaling information in source fast5 or pod5 files will read and used if present.
  --moves_out  Return move table in output BAM file.
  --nested_output_folder  If flagged output FastQ/BAM files will be written to a nested folder structure, based on: protocol_group/sample/protocol/qscore_pass_fail/barcode_arrangement/
  --print_workflows  Output available workflows.
  --progress_stats_frequency  Frequency in seconds in which to report progress statistics, if supplied will replace the default progress display.
  -z [ --quiet ]  Quiet mode. Nothing will be output to STDOUT if this option is set.
  --read_batch_size  Maximum batch size, in reads, for grouping input files.
  -l [ --read_id_list ]  File containing list of read ids to filter to.
  -q [ --records_per_fastq ]  Maximum number of records per fastq file, 0 means use a single file (per run id, per batch).
  -r [ --recursive ]  Search for input file recursively.
  --resume  Resume a previous basecall run using the same output folder.
  -s [ --save_path ]  Path to save output files.
  -h [ --help ]  Display the application usage help.
  -v [ --version ]  Display the application version information.
  --skip_model_versions  Skip display of model versions in output of available workflows when using --print_workflows.
  --trace_category_logs  Enable trace logs - list of strings with the desired names.
  --trace_domains_config  Configuration file containing list of trace domains to include in verbose logging (if enabled)
  --verbose_logs  Enable verbose logs.
  --do_read_splitting  Perform read splitting based on mid-strand adapter detection.
  --max_read_split_depth  The maximum number of iterations of read splitting that should be performed.
  --min_score_read_splitting  Minimum alignment score for the mid adapter on which to split the read.
  --num_read_splitting_buffers  Number of GPU memory buffers to allocate to perform read splitting. Controls level of parallelism on GPU for read splitting using mid adapter detection.
  --num_read_splitting_threads  Number of worker threads to use for read splitting.
  --sample_sheet  Optional file containing sample sheet. Used to provide an alias for barcode results.
  --disable_pings  Disable the transmission of telemetry pings.
  --ping_segment_duration  Duration in minutes of each ping segment.
  --ping_url  URL to send pings to.
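The help text lists --print_workflows as the way to see supported flowcells and kits. A rough sketch of narrowing that list down to one combination follows; the flowcell and kit names are placeholder values assumed for illustration, so substitute the ones from your own sequencing run.

```shell
#!/bin/bash
# Placeholder flowcell and kit names, assumed for this example only.
FLOWCELL="FLO-MIN106"
KIT="SQK-LSK109"

if command -v guppy_basecaller >/dev/null 2>&1; then
    # Keep only the workflow lines that mention both the flowcell and the kit.
    guppy_basecaller --print_workflows | grep -F "$FLOWCELL" | grep -F "$KIT"
else
    # The command is only available after the module has been loaded.
    echo "guppy_basecaller not found; load the ont-guppy module first" >&2
fi
```

The matching lines can help you decide between passing a config file with -c and passing --flowcell and --kit directly.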
Installation
- Binaries downloaded from the Oxford Nanopore Technologies site.
System
64-bit Linux