Ont-Guppy-Sapelo2: Difference between revisions

From Research Computing Center Wiki
 
(2 intermediate revisions by one other user not shown)
Line 5: Line 5:
Sapelo2
Sapelo2
=== Version ===
=== Version ===
4.4.2
6.5.7
===Author / Distributor===
===Author / Distributor===
Oxford Nanopore Technologies, Limited.
Oxford Nanopore Technologies, Limited.
Line 17: Line 17:




'''Version 4.4.2, for GPU'''
'''Version 6.5.7, for GPU'''


*Version 4.4.2, for GPU is installed in /apps/eb/ont-guppy/4.4.2-GPU and it can be run an a P100 or a V100 GPU device. This version does not work on the K20 or K40 GPU devices.
*Version 4.4.2, for GPU is installed in /apps/eb/ont-guppy/6.5.7-CUDA-11.7.0 and it can be run an a GPU device.


To use this version of Guppy, please first load the module with
To use this version of Guppy, please first load the module with
<pre class="gscript">
<pre class="gscript">
module load ont-guppy/4.4.2-GPU
ml ont-guppy/6.5.7-CUDA-11.7.0
</pre>
 
 
'''Version 4.4.2, for CPU'''
 
*Version 4.4.2, for CPU is installed in /apps/eb/ont-guppy/4.4.2-CPU
 
To use this version of Guppy, please first load the module with
<pre class="gscript">
module load ont-guppy/4.4.2-CPU
</pre>  
</pre>  


Line 51: Line 41:
cd $SLURM_SUBMIT_DIR
cd $SLURM_SUBMIT_DIR


ml ont-guppy/4.4.2-GPU
ml ont-guppy/6.5.7-CUDA-11.7.0


guppy_basecaller -x "cuda:0" [options]
guppy_basecaller -x "cuda:0" [options]


</pre>
</pre>
where [options] need to be replaced by the options (command and arguments) you want to use.  Other parameters of the job, such as the maximum wall clock time, maximum memory, the number of cores per node, and the job name need to be modified appropriately as well.  
where [options] need to be replaced by the options (command and arguments) you want to use.  Other parameters of the job, such as the maximum wall clock time, maximum memory, the number of cores per node, and the job name need to be modified appropriately as well. If the <code> -x "cuda:0"</code>  option is not included, guppy_basecaller will default to only use the CPUs.
 
 
Sample job submission script (sub.sh) to run guppy_basecaller version 4.4.2 on a CPU node:


<pre class="gscript">
#!/bin/bash
#SBATCH --partition=batch
#SBATCH --job-name=guppyjobname
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --time=48:00:00
#SBATCH --mem=10G
cd $SLURM_SUBMIT_DIR
ml ont-guppy/4.4.2-CPU
guppy_basecaller [options]
</pre>
where [options] need to be replaced by the options (command and arguments) you want to use.  Other parameters of the job, such as the maximum wall clock time, maximum memory, the number of cores per node, and the job name need to be modified appropriately as well.  
where [options] need to be replaced by the options (command and arguments) you want to use.  Other parameters of the job, such as the maximum wall clock time, maximum memory, the number of cores per node, and the job name need to be modified appropriately as well.  


Line 88: Line 59:


<pre class="gcomment">
<pre class="gcomment">
[cft07037@b1-24 ~]$ ml ont-guppy/6.5.7-CUDA-11.7.0
[shtsai@b1-24 ~]$ ml ont-guppy/4.4.2-GPU
[cft07037@b1-24 ~]$ guppy_basecaller -h
[shtsai@b1-24 ~]$ guppy_basecaller -h
: Guppy Basecalling Software, (C) Oxford Nanopore Technologies plc. Version 6.5.7+ca6d6af, minimap2 version 2.24-r1122
: Guppy Basecalling Software, (C) Oxford Nanopore Technologies, Limited. Version 4.4.2+9623c16
 
Use of this software is permitted solely under the terms of the end user license agreement (EULA).
By running, copying or accessing this software, you are demonstrating your acceptance of the EULA.
The EULA may be found in /apps/eb/ont-guppy/6.5.7-CUDA-11.7.0/bin


Usage:
Usage:


With config file:"
With config file:
   guppy_basecaller -i <input path> -s <save path> -c <config file> [options]
   guppy_basecaller -i <input path> -s <save path> -c <config file> [options]
With flowcell and kit name:
With flowcell and kit name:
Line 106: Line 80:
   guppy_basecaller -i <input path> -s <save path> -c <config file>
   guppy_basecaller -i <input path> -s <save path> -c <config file>
     --device <cuda device name> [options]
     --device <cuda device name> [options]
Command line parameters:
Command line parameters:
  --trim_threshold arg              Threshold above which data will be trimmed
--adapter_pt_range_scale
                                    (in standard deviations of current level
Set polyT/adapter range scale for setting read scaling median absolute deviation.
                                    distribution).
--as_cpu_threads_per_scaler
  --trim_min_events arg            Adapter trimmer minimum stride intervals
Number of CPU worker threads per adapter scaler.
                                    after stall that must be seen.
--dmean_threshold
  --max_search_len arg              Maximum number of samples to search through
Threshold for coarse stall event detection
                                    for the stall
--dmean_win_size
  --override_scaling                Manually provide scaling parameters rather
Window size for coarse stall event detection
                                    than estimating them from each read.
--as_gpu_runners_per_device
  --scaling_med arg                Median current value to use for manual
Number of runners per GPU device for adapter scaling.
                                    scaling.
--jump_threshold
  --scaling_mad arg                Median absolute deviation to use for manual
Threshold level for rna stall detection
                                    scaling.
--max_search_len
  --trim_strategy arg              Trimming strategy to apply: 'dna' or 'rna'
Maximum number of samples to search through for the stall
                                    (or 'none' to disable trimming)
--as_model_file
  --dmean_win_size arg              Window size for coarse stall event
Path to JSON model file for adapter scaling.
                                    detection
--noisiest_section_scaling_max_size
  --dmean_threshold arg            Threshold for coarse stall event detection
Threshold read size in samples under which nosiest-section scaling will be performed.
  --jump_threshold arg              Threshold level for rna stall detection
--as_num_scalers
  --pt_scaling                      Enable polyT/adapter max detection for read
Number of parallel scalers for adapter scaling.
                                    scaling.
--override_scaling
  --pt_median_offset arg            Set polyT median offset for setting read  
Manually provide scaling parameters rather than estimating them from each read.
                                    scaling median (default 2.5)
--pt_median_offset
  --adapter_pt_range_scale arg      Set polyT/adapter range scale for setting
Set polyT median offset for setting read scaling median.
                                    read scaling median absolute deviation
--pt_minimum_read_start_index
                                    (default 5.2)
Set minimum index for read start sample required to attempt polyT scaling.
  --pt_required_adapter_drop arg    Set minimum required current drop from  
--pt_required_adapter_drop
                                    adapter max to polyT detection. (default
Set minimum required current drop from adapter max to polyT detection.
                                    30.0)
--pt_scaling
  --pt_minimum_read_start_index arg Set minimum index for read start sample
Enable polyT/adapter max detection for read scaling.
                                    required to attempt polyT scaling. (default
--as_reads_per_runner
                                    30)
Maximum reads per runner for adapter scaling.
  --as_model_file arg              Path to JSON model file for adapter
--scaling_mad
                                    scaling.
Median absolute deviation to use for manual scaling.
  --as_gpu_runners_per_device arg  Number of runners per GPU device for  
--scaling_med
                                    adapter scaling.
Median current value to use for manual scaling.
  --as_cpu_threads_per_scaler arg  Number of CPU worker threads per adapter
--trim_min_events
                                    scaler
Adapter trimmer minimum stride intervals after stall that must be seen.
  --as_reads_per_runner arg        Maximum reads per runner for adapter
--trim_strategy
                                    scaling.
Trimming strategy to apply: 'dna' or 'rna' (or 'none' to disable trimming)
  --as_num_scalers arg              Number of parallel scalers for adapter
--trim_threshold
                                    scaling.
Threshold above which data will be trimmed (in standard deviations of current level distribution).
  -m [ --model_file ] arg          Path to JSON model file.
--use_quantile_scaling
  -k [ --kernel_path ] arg          Path to GPU kernel files location (only
Use quantiles to calculate scaling values when basecalling
                                    needed if builtin_scripts is false).
--alignment_filtering
  -x [ --device ] arg              Specify basecalling device: 'auto', or
Specify whether to filter reads based on their alignment status
                                    'cuda:<device_id>'.
--align_type
  --builtin_scripts arg            Whether to use GPU kernels that were
Specify whether you want full or coarse alignment. Valid values are (auto/full/coarse).
                                    included at compile-time.
--bed_file
  --chunk_size arg                  Stride intervals per chunk.
Path to .bed file containing areas of interest in reference genome.
  --chunks_per_runner arg          Maximum chunks per runner.
-a [ --align_ref ]
  --chunks_per_caller arg          Soft limit on number of chunks in each
Reference FASTA or index file.
                                    caller's queue. New reads will not be
--minimap_opt_string
                                    queued while this is exceeded.
Specify minimap2 options. See `guppy_basecaller --minimap_opt_string --help` for details).
  --high_priority_threshold arg    Number of high priority chunks to process
--num_alignment_threads
                                    for each medium priority chunk.
Number of worker threads to use for alignment.
  --medium_priority_threshold arg  Number of medium priority chunks to process
--allow_inferior_barcodes
                                    for each low priority chunk.
Reads will still be classified even if both the barcodes at the front and rear (if applicable) were not the best scoring barcodes above the min_score.
  --overlap arg                    Overlap between chunks (in stride
--barcode_kits
                                    intervals).
Space separated list of barcoding kit(s) or expansion kit(s) to detect against. Must be in double quotes.
  --gpu_runners_per_device arg      Number of runners per GPU device.
--barcode_list
  --cpu_threads_per_caller arg      Number of CPU worker threads per
Optional list of barcodes to look for.
                                    basecaller.
--detect_adapter
  --num_callers arg                Number of parallel basecallers to create.
Detect adapter sequences at the front and rear of the read.
  --post_out                        Return full posterior matrix in output
--detect_barcodes
                                    fast5 file and/or called read message from
Detect barcode sequences at the front and rear of the read.
                                    server.
--detect_mid_strand_adapter
  --stay_penalty arg                Scaling factor to apply to stay probability
Detect adapter sequences within reads.
                                    calculation during transducer decode.
--detect_mid_strand_barcodes
  --qscore_offset arg              Qscore calibration offset.
Search for barcodes through the entire length of the read.
  --qscore_scale arg                Qscore calibration scale factor.
--detect_primer
  --temp_weight arg                Temperature adjustment for weight matrix in
Detect primer sequences at the front and rear of the read.
                                    softmax layer of RNN.
--disable_barcode_sample_sheet_restricting
  --temp_bias arg                  Temperature adjustment for bias vector in
Disable filtering of barcodes based on the sample sheet in use.
                                    softmax layer of RNN.
--enable_trim_barcodes
  --beam_cut arg                    Beam score cutoff for beam search decoding.
Enable trimming of barcodes from the sequences in the output files. By default is false, barcodes will not be trimmed.
  --beam_width arg                  Beam score cutoff for beam search decoding.
--front_window_size
  --qscore_filtering                Enable filtering of reads into PASS/FAIL
Window size for the beginning barcode.
                                    folders based on min qscore.
--min_score_adapter
  --min_qscore arg                  Minimum acceptable qscore for a read to be
Minimum score for an adapter to be considered a valid alignment.
                                    filtered into the PASS folder
--min_score_adapter_mid
  --reverse_sequence arg            Reverse the called sequence (for RNA
Minimum score for a mid-strand adapter to be considered a valid alignment.
                                    sequencing).
--min_score_barcode_front
  --u_substitution arg              Substitute 'U' for 'T' in the called
Minimum score to consider a front barcode to be a valid barcode alignment.
                                    sequence (for RNA sequencing).
--min_score_barcode_mask
  --log_speed_frequency arg        How often to print out basecalling speed.
Minimum score for a barcode context to be considered a valid alignment.
  --barcode_kits arg                Space separated list of barcoding kit(s) or
--min_score_barcode_mid
                                    expansion kit(s) to detect against. Must be
Minimum score for a barcode to be detected in the middle of a read.
                                    in double quotes.
--min_score_barcode_rear
  --trim_barcodes                  Trim the barcodes from the output sequences
Minimum score to consider a rear barcode to be a valid alignment (and min_score_front will then be used for the front only when this is set).
                                    in the FastQ files.
--min_score_primer
  --num_extra_bases_trim arg        How vigorous to be in trimming the barcode.
Minimum score for a primer to be considered to be a valid alignment.
                                    Default is 0 i.e. the length of the
--num_barcoding_buffers
                                    detected barcode. A positive integer means
Number of GPU memory buffers to allocate to perform barcoding into. Controls level of parallelism on GPU for barcoding.
                                    extra bases will be trimmed, a negative
--num_barcoding_threads
                                    number is how many fewer bases (less
Number of worker threads to use for barcoding.
                                    vigorous) will be trimmed.
--num_extra_bases_trim
  --arrangements_files arg          Files containing arrangements.
How vigorous to be in trimming the barcode. Default is 0 i.e. the length of the detected barcode. A positive integer means extra bases will be trimmed, a negative number is how many fewer bases (less vigorous) will be trimmed.
  --lamp_arrangements_files arg    Files containing lamp arrangements.
--num_mid_barcoding_buffers
  --score_matrix_filename arg      File containing mismatch score matrix.
Number of GPU memory buffers to allocate to perform barcoding into. Controls level of parallelism on GPU for mid barcoding.
  --start_gap1 arg                  Gap penalty for aligning before the
--num_reads_per_barcoding_buffer
                                    reference.
The maximum number of reads to process at once in each barcoding buffer.
  --end_gap1 arg                    Gap penalty for aligning after the
--rear_window_size
                                    reference.
Window size for the ending barcode.
  --open_gap1 arg                  Penalty for opening a new gap in the
--require_barcodes_both_ends
                                    reference.
Reads will only be classified if there is a barcode above the min_score at both ends of the read.
  --extend_gap1 arg                Penalty for extending a gap in the  
--trim_adapters
                                    reference.
Trim the adapters from the sequences in the output files.
  --start_gap2 arg                  Gap penalty for aligning before the query.
--trim_primers
  --end_gap2 arg                    Gap penalty for aligning after the query.
Trim the primers from the sequences in the output files.
  --open_gap2 arg                  Penalty for opening a new gap in the query.
--beam_cut
  --extend_gap2 arg                Penalty for extending a gap in the query.
Beam score cutoff for beam search decoding.
  --min_score arg                  Minimum score to consider a valid
--beam_width
                                    alignment.
Beam width to use in beam search decode.
  --min_score_rear_override arg    Minimum score to consider a valid alignment
--builtin_scripts
                                    for the rear barcode only (and min_score
Whether to use GPU kernels that were included at compile-time.
                                    will then be used for the front only when
--chunk_size
                                    this is set).
Stride intervals per chunk.
  --min_score_mask arg              Minimum score for a barcode context to
--chunks_per_caller
                                    consider a valid alignment.
Soft limit on number of chunks in each caller's queue. New reads will not be queued while this is exceeded.
  --front_window_size arg          Window size for the beginning barcode.
--chunks_per_runner
  --rear_window_size arg            Window size for the ending barcode.
Maximum chunks per runner.
  --require_barcodes_both_ends      Reads will only be classified if there is a
--cpu_threads_per_caller
                                    barcode above the min_score at both ends of
Number of CPU worker threads per basecaller.
                                    the read.
--disable_qscore_filtering
  --allow_inferior_barcodes        Reads will still be classified even if both
Disable filtering of reads into PASS/FAIL folders based on min qscore.
                                    the barcodes at the front and rear (if
--dorado_model_path
                                    applicable) were not the best scoring
Path to dorado model folder.
                                    barcodes above the min_score.
--dorado_modbase_models
  --detect_mid_strand_barcodes      Search for barcodes through the entire
Names of Remora models for modified base detection.
                                    length of the read.
--duplex_window_size_max
  --min_score_mid_barcodes arg      Minimum score for a barcode to be detected
Maximum window size to use for prefix search in duplex decoding.
                                    in the middle of a read.
--duplex_window_size_min
  --lamp_kit arg                    LAMP barcoding kit to perform LAMP
Minimum window size to use for prefix search in duplex decoding.
                                    detection against.
--gpu_runners_per_device
  --min_score_lamp arg              Minimum score for a LAMP barcode to be
Number of runners per GPU device.
                                    classified.
--high_priority_threshold
  --min_score_lamp_mask arg        Minimum score for a LAMP barcode mask
Number of high priority chunks to process for each medium priority chunk.
                                    context to be classified.
--int8_mode
  --min_score_lamp_target arg      Minimum score for a LAMP target to be
Enable quantised int8 mode for kernels which support it.
                                    classified.
-k [ --kernel_path ]
  --additional_context_bases arg    Number of bases from a lamp FIP barcode
Path to GPU kernel files location (only needed if builtin_scripts is false).
                                    context to append to the front and rear of
--log_speed_frequency
                                    the FIP barcode before performing matching.
How often to print out basecalling speed.
                                    Default is 2.
--medium_priority_threshold
  --min_length_lamp_context arg    Minimum align length for a LAMP barcode
Number of medium priority chunks to process for each low priority chunk.
                                    mask context to be classified.
--min_qscore
  --min_length_lamp_target arg      Minimum align length for a LAMP target to
Minimum acceptable qscore for a read to be filtered into the PASS folder.
                                    be classified.
-m [ --model_file ]
  --num_barcoding_buffers arg      Number of GPU memory buffers to allocate to
Path to JSON model file.
                                    perform barcoding into. Controls level of
--num_base_mod_threads
                                    parallelism on GPU for barcoding.
The number of threads to use for Remora modified base detection in GPU basecalling mode.
  --num_mid_barcoding_buffers arg  Number of GPU memory buffers to allocate to
--num_callers
                                    perform barcoding into. Controls level of
Number of parallel basecallers to create.
                                    parallelism on GPU for mid barcoding.
--overlap
  --num_barcode_threads arg        Number of worker threads to use for  
Overlap between chunks (in stride intervals).
                                    barcoding.
--post_out
  --calib_detect                    Enable calibration strand detection and
Return full posterior matrix in output fast5 file and/or called read message from server.
                                    filtering.
--qscore_offset
  --calib_reference arg            Reference FASTA file containing calibration
Qscore calibration offset.
                                    strand.
--qscore_scale
  --calib_min_sequence_length arg  Minimum sequence length for reads to be
Qscore calibration scale factor.
                                    considered candidate calibration strands.
--reverse_sequence
  --calib_max_sequence_length arg  Maximum sequence length for reads to be
Reverse the called sequence (for RNA sequencing).
                                    considered candidate calibration strands.
--stay_penalty
  --calib_min_coverage arg          Minimum reference coverage to pass
Scaling factor to apply to stay probability calculation during transducer decode.
                                    calibration strand detection.
--temp_bias
  --print_workflows                Output available workflows.
Temperature adjustment for bias vector in softmax layer of RNN.
  --flowcell arg                    Flowcell to find a configuration for
--temp_weight
  --kit arg                        Kit to find a configuration for
Temperature adjustment for weight matrix in softmax layer of RNN.
  -a [ --align_ref ] arg            Path to alignment reference.
--u_substitution
  --bed_file arg                    Path to .bed file containing areas of
Substitute 'U' for 'T' in the called sequence (for RNA sequencing).
                                    interest in reference genome.
--calib_detect
  --num_alignment_threads arg      Number of worker threads to use for
Enable calibration strand detection and filtering.
                                    alignment.
--calib_reference
  -z [ --quiet ]                    Quiet mode. Nothing will be output to  
Reference FASTA file containing calibration strand.
                                    STDOUT if this option is set.
--additional_lamp_context_bases
  --trace_categories_logs arg      Enable trace logs - list of strings with
Number of bases from a lamp FIP barcode context to append to the front and read of the FIP barcode before performing matching. Default is 2.
                                    the desired names.
--lamp_kit
  --verbose_logs                    Enable verbose logs.
LAMP barcoding kit to perform LAMP detection against.
  --trace_domains_log arg          List of trace domains to include in verbose
--min_length_lamp_context
                                    logging (if enabled),  '*' for all.
Minimum align length for a LAMP barcode mask context to be classified.
  --trace_domains_config arg        Configuration file containing list of trace
--min_length_lamp_target
                                    domains to include in verbose logging (if
Minimum align length for a LAMP target to be classified.
                                    enabled), this will override
--min_score_lamp
                                    --trace_domain_logs
Minimum score for a LAMP barcode to be classified.
  --disable_pings                  Disable the transmission of telemetry
--min_score_lamp_mask
                                    pings.
Minimum score for a LAMP barcode mask context to be classified.
  --ping_url arg                    URL to send pings to
--min_score_lamp_target
  --ping_segment_duration arg      Duration in minutes of each ping segment.
Minimum score for a LAMP target to be classified.
  --progress_stats_frequency arg    Frequency in seconds in which to report
--max_pipeline_reads
                                    progress statistics, if supplied will
Maximum number of reads that can be processed by the pipeline at any one time.
                                    replace the default progress display.
--index
  -q [ --records_per_fastq ] arg    Maximum number of records per fastq file, 0
Output BAM index file.
                                    means use a single file (per worker, per
--bam_methylation_threshold
                                    run id).
The value below which a predicted methylation probability will not be emitted into a BAM file, expressed as a percentage.
  --read_batch_size arg            Maximum batch size, in reads, for grouping
--bam_out
                                    input files.
Output BAM files.
  --compress_fastq                  Compress fastq output files with gzip.
--barcode_nested_output_folder
  -i [ --input_path ] arg          Path to input fast5 files.
If flagged output FastQ/BAM files will be written to a nested folder structure, based on: protocol_group/barcode_arrangement/sample/protocol/qscore_pass_fail/
  --input_file_list arg            Optional file containing list of input
--compress_fastq
                                    fast5 files to process from the input_path.
Compress fastq output files with gzip.
  -s [ --save_path ] arg            Path to save fastq files.
-c [ --config ]
  -l [ --read_id_list ] arg        File containing list of read ids to filter
Configuration file for application.
                                    to
-d [ --data_path ]
  -r [ --recursive ]                Search for input files recursively.
Path to use for loading any data files the application requires.
  --fast5_out                      Choice of whether to do fast5 output.
-x [ --device ]
  --bam_out                        Choice of whether to do BAM file output.
Specify GPU device: 'auto', or 'cuda:<device_id>'.
  --bam_methylation_threshold arg  The value below which a predicted  
--flowcell
                                    methylation probability will not be emitted
Flowcell to find a configuration for.
                                    into a BAM file, expressed as a percentage.
--input_file_list
                                    Default is 5.0(%).
Optional file containing list of input fast5/pod5 files to process from the input_path.
  --resume                          Resume a previous basecall run using the
-i [ --input_path ]
                                    same output folder.
Path to input files.
  --client_id arg                  Optional unique identifier (non-negative
--kit
                                    integer) for this instance of the Guppy
Kit to find a configuration for.
                                    Client Basecaller, if supplied will form
--load_scaling_info_from_read_files
                                    part of the output filenames.
If flagged, scaling information in source fast5 or pod5 files will read and used if present.
  --nested_output_folder            If flagged output fastq files will be
--moves_out
                                    written to a nested folder structure, based
Return move table in output BAM file.
                                    on: protocol_group/sample/protocol/qscore_p
--nested_output_folder
                                    ass_fail/barcode_arrangement/
If flagged output FastQ/BAM files will be written to a nested folder structure, based on: protocol_group/sample/protocol/qscore_pass_fail/barcode_arrangement/
  --max_queued_reads arg            Maximum number of reads to be submitted for
--print_workflows
                                    processing at any one time.
Output available workflows.
  -h [ --help ]                     produce help message
--progress_stats_frequency
  -v [ --version ]                  print version number
Frequency in seconds in which to report progress statistics, if supplied will replace the default progress display.
  -c [ --config ] arg              Config file to use
-z [ --quiet ]
  -d [ --data_path ] arg            Path to use for loading any data files the
Quiet mode. Nothing will be output to STDOUT if this option is set.
                                    application requires.
--read_batch_size
 
Maximum batch size, in reads, for grouping input files.
-l [ --read_id_list ]
File containing list of read ids to filter to.
-q [ --records_per_fastq ]
Maximum number of records per fastq file, 0 means use a single file (per run id, per batch).
-r [ --recursive ]
Search for input file recursively.
--resume
Resume a previous basecall run using the same output folder.
-s [ --save_path ]
Path to save output files.
-h [ --help ]
Display the application usage help.
-v [ --version ]
Display the application version information.
--skip_model_versions
Skip display of model versions in output of available workflows when using --print_workflows.
--trace_category_logs
Enable trace logs - list of strings with the desired names.
--trace_domains_config
Configuration file containing list of trace domains to include in verbose logging (if enabled)
--verbose_logs
Enable verbose logs.
--do_read_splitting
Perform read splitting based on mid-strand adapter detection.
--max_read_split_depth
The maximum number of iterations of read splitting that should be performed.
--min_score_read_splitting
Minimum alignment score for the mid adapter on which to split the read.
--num_read_splitting_buffers
Number of GPU memory buffers to allocate to perform read splitting. Controls level of parallelism on GPU for read splitting using mid adapter detection.
--num_read_splitting_threads
Number of worker threads to use for read splitting.
--sample_sheet
Optional file containing sample sheet. Used to provide an alias for barcode results.
--disable_pings
Disable the transmission of telemetry pings.
--ping_segment_duration
Duration in minutes of each ping segment.
--ping_url
URL to send pings to.
</pre>
</pre>



Latest revision as of 13:53, 13 May 2024

Category

Bioinformatics

Program On

Sapelo2

Version

6.5.7

Author / Distributor

Oxford Nanopore Technologies, Limited.

Description

Ont-Guppy is basecalling software for Oxford Nanopore sequencing data. For more information, please see https://nanoporetech.com/

Running Program

Also refer to Running Jobs on Sapelo2

For more information on Environment Modules on Sapelo2 please see the Lmod page.


Version 6.5.7, for GPU

  • Version 6.5.7, for GPU is installed in /apps/eb/ont-guppy/6.5.7-CUDA-11.7.0 and can be run on a GPU device.

To use this version of Guppy, please first load the module with

ml ont-guppy/6.5.7-CUDA-11.7.0
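
After loading the module, it can be worth confirming that the executable is actually on the PATH before submitting a long job. The following is a minimal sketch; the helper name have_tool is hypothetical and not part of Guppy or Lmod.

```shell
#!/bin/bash
# Hypothetical helper: returns 0 if the named executable is on PATH,
# non-zero otherwise.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

if have_tool guppy_basecaller; then
    # -v / --version is listed in the help output on this page.
    guppy_basecaller --version
else
    echo "guppy_basecaller not found; run 'ml ont-guppy/6.5.7-CUDA-11.7.0' first" >&2
fi
```

If the message on the else branch appears, the module was most likely not loaded in the current shell.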


Sample job submission script (sub.sh) to run guppy_basecaller version 6.5.7 on a GPU node:

#!/bin/bash
#SBATCH --partition=gpu_p
#SBATCH --job-name=guppyjobname
#SBATCH --gres=gpu:P100:1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --time=48:00:00
#SBATCH --mem=10G

cd $SLURM_SUBMIT_DIR

ml ont-guppy/6.5.7-CUDA-11.7.0

guppy_basecaller -x "cuda:0" [options]

where [options] needs to be replaced by the options (command and arguments) you want to use. Other parameters of the job, such as the maximum wall clock time, maximum memory, the number of CPU cores per task, and the job name, need to be modified appropriately as well. If the -x "cuda:0" option is not included, guppy_basecaller defaults to using only the CPUs.


Submit the job to the queue with

sbatch sub.sh
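
As an illustration of what could replace the [options] placeholder in the sample script, the sketch below only prints a candidate guppy_basecaller command line so that it can be inspected before it is pasted into sub.sh. The directory names and the configuration file name are assumptions chosen for the example, and the helper guppy_cmd is hypothetical; the flags themselves (-i, -s, -c, -x, --recursive, --compress_fastq) are taken from the help output on this page.

```shell
#!/bin/bash
# Hypothetical locations; replace with your own directories
# (this simple sketch assumes the paths contain no spaces).
INPUT_DIR="fast5_in"
OUTPUT_DIR="basecalls_out"
# Assumed configuration name; check `guppy_basecaller --print_workflows`
# for the one matching your flowcell and kit.
CONFIG="dna_r9.4.1_450bps_hac.cfg"

# Print the guppy_basecaller command line for the given device.
# An empty device string leaves out -x, in which case only CPUs are used.
guppy_cmd() {
    device="$1"
    cmd="guppy_basecaller -i $INPUT_DIR -s $OUTPUT_DIR -c $CONFIG --recursive --compress_fastq"
    if [ -n "$device" ]; then
        cmd="$cmd -x $device"
    fi
    echo "$cmd"
}

# Dry run: show the GPU command that would go into sub.sh.
guppy_cmd "cuda:0"
```

Calling guppy_cmd "" instead prints the same command without the -x option, which corresponds to a CPU-only run.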

Documentation

[cft07037@b1-24 ~]$ ml ont-guppy/6.5.7-CUDA-11.7.0
[cft07037@b1-24 ~]$ guppy_basecaller -h
: Guppy Basecalling Software, (C) Oxford Nanopore Technologies plc. Version 6.5.7+ca6d6af, minimap2 version 2.24-r1122

Use of this software is permitted solely under the terms of the end user license agreement (EULA).
By running, copying or accessing this software, you are demonstrating your acceptance of the EULA.
The EULA may be found in /apps/eb/ont-guppy/6.5.7-CUDA-11.7.0/bin

Usage:

With config file:
  guppy_basecaller -i <input path> -s <save path> -c <config file> [options]
With flowcell and kit name:
  guppy_basecaller -i <input path> -s <save path> --flowcell <flowcell name>
    --kit <kit name>
List supported flowcells and kits:
  guppy_basecaller --print_workflows

Use GPU for basecalling:
  guppy_basecaller -i <input path> -s <save path> -c <config file>
    --device <cuda device name> [options]

Command line parameters:
--adapter_pt_range_scale
	Set polyT/adapter range scale for setting read scaling median absolute deviation.
--as_cpu_threads_per_scaler
	Number of CPU worker threads per adapter scaler.
--dmean_threshold
	Threshold for coarse stall event detection
--dmean_win_size
	Window size for coarse stall event detection
--as_gpu_runners_per_device
	Number of runners per GPU device for adapter scaling.
--jump_threshold
	Threshold level for rna stall detection
--max_search_len
	Maximum number of samples to search through for the stall
--as_model_file
	Path to JSON model file for adapter scaling.
--noisiest_section_scaling_max_size
	Threshold read size in samples under which nosiest-section scaling will be performed.
--as_num_scalers
	Number of parallel scalers for adapter scaling.
--override_scaling
	Manually provide scaling parameters rather than estimating them from each read.
--pt_median_offset
	Set polyT median offset for setting read scaling median.
--pt_minimum_read_start_index
	Set minimum index for read start sample required to attempt polyT scaling.
--pt_required_adapter_drop
	Set minimum required current drop from adapter max to polyT detection.
--pt_scaling
	Enable polyT/adapter max detection for read scaling.
--as_reads_per_runner
	Maximum reads per runner for adapter scaling.
--scaling_mad
	Median absolute deviation to use for manual scaling.
--scaling_med
	Median current value to use for manual scaling.
--trim_min_events
	Adapter trimmer minimum stride intervals after stall that must be seen.
--trim_strategy
	Trimming strategy to apply: 'dna' or 'rna' (or 'none' to disable trimming)
--trim_threshold
	Threshold above which data will be trimmed (in standard deviations of current level distribution).
--use_quantile_scaling
	Use quantiles to calculate scaling values when basecalling
--alignment_filtering
	Specify whether to filter reads based on their alignment status
--align_type
	Specify whether you want full or coarse alignment. Valid values are (auto/full/coarse).
--bed_file
	Path to .bed file containing areas of interest in reference genome.
-a [ --align_ref ]
	Reference FASTA or index file.
--minimap_opt_string
	Specify minimap2 options. See `guppy_basecaller --minimap_opt_string --help` for details).
--num_alignment_threads
	Number of worker threads to use for alignment.
--allow_inferior_barcodes
	Reads will still be classified even if both the barcodes at the front and rear (if applicable) were not the best scoring barcodes above the min_score.
--barcode_kits
	Space separated list of barcoding kit(s) or expansion kit(s) to detect against. Must be in double quotes.
--barcode_list
	Optional list of barcodes to look for.
--detect_adapter
	Detect adapter sequences at the front and rear of the read.
--detect_barcodes
	Detect barcode sequences at the front and rear of the read.
--detect_mid_strand_adapter
	Detect adapter sequences within reads.
--detect_mid_strand_barcodes
	Search for barcodes through the entire length of the read.
--detect_primer
	Detect primer sequences at the front and rear of the read.
--disable_barcode_sample_sheet_restricting
	Disable filtering of barcodes based on the sample sheet in use.
--enable_trim_barcodes
	Enable trimming of barcodes from the sequences in the output files. By default is false, barcodes will not be trimmed.
--front_window_size
	Window size for the beginning barcode.
--min_score_adapter
	Minimum score for an adapter to be considered a valid alignment.
--min_score_adapter_mid
	Minimum score for a mid-strand adapter to be considered a valid alignment.
--min_score_barcode_front
	Minimum score to consider a front barcode to be a valid barcode alignment.
--min_score_barcode_mask
	Minimum score for a barcode context to be considered a valid alignment.
--min_score_barcode_mid
	Minimum score for a barcode to be detected in the middle of a read.
--min_score_barcode_rear
	Minimum score to consider a rear barcode to be a valid alignment (and min_score_front will then be used for the front only when this is set).
--min_score_primer
	Minimum score for a primer to be considered to be a valid alignment.
--num_barcoding_buffers
	Number of GPU memory buffers to allocate to perform barcoding into. Controls level of parallelism on GPU for barcoding.
--num_barcoding_threads
	Number of worker threads to use for barcoding.
--num_extra_bases_trim
	How vigorous to be in trimming the barcode. Default is 0 i.e. the length of the detected barcode. A positive integer means extra bases will be trimmed, a negative number is how many fewer bases (less vigorous) will be trimmed.
--num_mid_barcoding_buffers
	Number of GPU memory buffers to allocate to perform barcoding into. Controls level of parallelism on GPU for mid barcoding.
--num_reads_per_barcoding_buffer
	The maximum number of reads to process at once in each barcoding buffer.
--rear_window_size
	Window size for the ending barcode.
--require_barcodes_both_ends
	Reads will only be classified if there is a barcode above the min_score at both ends of the read.
--trim_adapters
	Trim the adapters from the sequences in the output files.
--trim_primers
	Trim the primers from the sequences in the output files.
--beam_cut
	Beam score cutoff for beam search decoding.
--beam_width
	Beam width to use in beam search decode.
--builtin_scripts
	Whether to use GPU kernels that were included at compile-time.
--chunk_size
	Stride intervals per chunk.
--chunks_per_caller
	Soft limit on number of chunks in each caller's queue. New reads will not be queued while this is exceeded.
--chunks_per_runner
	Maximum chunks per runner.
--cpu_threads_per_caller
	Number of CPU worker threads per basecaller.
--disable_qscore_filtering
	Disable filtering of reads into PASS/FAIL folders based on min qscore.
--dorado_model_path
	Path to dorado model folder.
--dorado_modbase_models
	Names of Remora models for modified base detection.
--duplex_window_size_max
	Maximum window size to use for prefix search in duplex decoding.
--duplex_window_size_min
	Minimum window size to use for prefix search in duplex decoding.
--gpu_runners_per_device
	Number of runners per GPU device.
--high_priority_threshold
	Number of high priority chunks to process for each medium priority chunk.
--int8_mode
	Enable quantised int8 mode for kernels which support it.
-k [ --kernel_path ]
	Path to GPU kernel files location (only needed if builtin_scripts is false).
--log_speed_frequency
	How often to print out basecalling speed.
--medium_priority_threshold
	Number of medium priority chunks to process for each low priority chunk.
--min_qscore
	Minimum acceptable qscore for a read to be filtered into the PASS folder.
-m [ --model_file ]
	Path to JSON model file.
--num_base_mod_threads
	The number of threads to use for Remora modified base detection in GPU basecalling mode.
--num_callers
	Number of parallel basecallers to create.
--overlap
	Overlap between chunks (in stride intervals).
--post_out
	Return full posterior matrix in output fast5 file and/or called read message from server.
--qscore_offset
	Qscore calibration offset.
--qscore_scale
	Qscore calibration scale factor.
--reverse_sequence
	Reverse the called sequence (for RNA sequencing).
--stay_penalty
	Scaling factor to apply to stay probability calculation during transducer decode.
--temp_bias
	Temperature adjustment for bias vector in softmax layer of RNN.
--temp_weight
	Temperature adjustment for weight matrix in softmax layer of RNN.
--u_substitution
	Substitute 'U' for 'T' in the called sequence (for RNA sequencing).
--calib_detect
	Enable calibration strand detection and filtering.
--calib_reference
	Reference FASTA file containing calibration strand.
--additional_lamp_context_bases
	Number of bases from a lamp FIP barcode context to append to the front and read of the FIP barcode before performing matching. Default is 2.
--lamp_kit
	LAMP barcoding kit to perform LAMP detection against.
--min_length_lamp_context
	Minimum align length for a LAMP barcode mask context to be classified.
--min_length_lamp_target
	Minimum align length for a LAMP target to be classified.
--min_score_lamp
	Minimum score for a LAMP barcode to be classified.
--min_score_lamp_mask
	Minimum score for a LAMP barcode mask context to be classified.
--min_score_lamp_target
	Minimum score for a LAMP target to be classified.
--max_pipeline_reads
	Maximum number of reads that can be processed by the pipeline at any one time.
--index
	Output BAM index file.
--bam_methylation_threshold
	The value below which a predicted methylation probability will not be emitted into a BAM file, expressed as a percentage.
--bam_out
	Output BAM files.
--barcode_nested_output_folder
	If flagged output FastQ/BAM files will be written to a nested folder structure, based on: protocol_group/barcode_arrangement/sample/protocol/qscore_pass_fail/
--compress_fastq
	Compress fastq output files with gzip.
-c [ --config ]
	Configuration file for application.
-d [ --data_path ]
	Path to use for loading any data files the application requires.
-x [ --device ]
	Specify GPU device: 'auto', or 'cuda:<device_id>'.
--flowcell
	Flowcell to find a configuration for.
--input_file_list
	Optional file containing list of input fast5/pod5 files to process from the input_path.
-i [ --input_path ]
	Path to input files.
--kit
	Kit to find a configuration for.
--load_scaling_info_from_read_files
	If flagged, scaling information in source fast5 or pod5 files will read and used if present.
--moves_out
	Return move table in output BAM file.
--nested_output_folder
	If flagged output FastQ/BAM files will be written to a nested folder structure, based on: protocol_group/sample/protocol/qscore_pass_fail/barcode_arrangement/
--print_workflows
	Output available workflows.
--progress_stats_frequency
	Frequency in seconds in which to report progress statistics, if supplied will replace the default progress display.
-z [ --quiet ]
	Quiet mode. Nothing will be output to STDOUT if this option is set.
--read_batch_size
	Maximum batch size, in reads, for grouping input files.
-l [ --read_id_list ]
	File containing list of read ids to filter to.
-q [ --records_per_fastq ]
	Maximum number of records per fastq file, 0 means use a single file (per run id, per batch).
-r [ --recursive ]
	Search for input file recursively.
--resume
	Resume a previous basecall run using the same output folder.
-s [ --save_path ]
	Path to save output files.
-h [ --help ]
	Display the application usage help.
-v [ --version ]
	Display the application version information.
--skip_model_versions
	Skip display of model versions in output of available workflows when using --print_workflows.
--trace_category_logs
	Enable trace logs - list of strings with the desired names.
--trace_domains_config
	Configuration file containing list of trace domains to include in verbose logging (if enabled)
--verbose_logs
	Enable verbose logs.
--do_read_splitting
	Perform read splitting based on mid-strand adapter detection.
--max_read_split_depth
	The maximum number of iterations of read splitting that should be performed.
--min_score_read_splitting
	Minimum alignment score for the mid adapter on which to split the read.
--num_read_splitting_buffers
	Number of GPU memory buffers to allocate to perform read splitting. Controls level of parallelism on GPU for read splitting using mid adapter detection.
--num_read_splitting_threads
	Number of worker threads to use for read splitting.
--sample_sheet
	Optional file containing sample sheet. Used to provide an alias for barcode results.
--disable_pings
	Disable the transmission of telemetry pings.
--ping_segment_duration
	Duration in minutes of each ping segment.
--ping_url
	URL to send pings to.
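
Because the help listing above is long, it can be convenient to look up a single option from the command line. The sketch below filters the help text with grep and prints the option line together with the description line that follows it. The helper name find_option is hypothetical, and redirecting stderr with 2>&1 is a precaution in case the help text is not written to stdout.

```shell
#!/bin/bash
# Hypothetical helper: read help text on stdin and print each line that
# contains the given option name, plus the line after it (the description).
find_option() {
    grep -F -A1 -e "$1"
}

# Only query Guppy if the module has been loaded in this shell.
if command -v guppy_basecaller >/dev/null 2>&1; then
    guppy_basecaller -h 2>&1 | find_option "--barcode_kits"
fi
```

The -F flag makes grep treat the option name as a fixed string, and -e keeps a pattern that starts with dashes from being parsed as a grep option.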

Installation

  • Binaries downloaded from the Oxford Nanopore Technologies site.

System

64-bit Linux