Ont-Guppy-Sapelo2

From Research Computing Center Wiki

Category

Bioinformatics

Program On

Sapelo2

Version

4.4.2

Author / Distributor

Oxford Nanopore Technologies, Limited.

Description

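Ont-Guppy is basecalling software from Oxford Nanopore Technologies. It converts the raw signal stored in nanopore fast5 files into nucleotide sequences, which it writes as FASTQ files. For more information, please see https://nanoporetech.com/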

Running Program

Also refer to Running Jobs on Sapelo2

For more information on Environment Modules on Sapelo2, please see the Lmod page.


Version 4.4.2, for GPU

  • Version 4.4.2, for GPU is installed in /apps/eb/ont-guppy/4.4.2-GPU and can be run on a P100 or V100 GPU device. This version does not work on the K20 or K40 GPU devices.

To use this version of Guppy, please first load the module with

module load ont-guppy/4.4.2-GPU
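Because this build runs only on P100 or V100 devices, a job script could check the GPU model before requesting it. The sketch below uses two hypothetical helpers (`guppy_gpu_supported` and `gres_for_model` are not part of Guppy or Slurm). It assumes that the V100 request on Sapelo2 follows the same `gpu:<TYPE>:1` pattern as the P100 example in the sample script.

```bash
#!/bin/bash
# Hypothetical helpers, not part of Guppy or Slurm.

# Succeed (exit status 0) only for GPU models that the ont-guppy/4.4.2-GPU
# build is documented to run on: P100 and V100.
guppy_gpu_supported() {
    local model
    model=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')   # "p100" -> "P100"
    case "$model" in
        P100|V100) return 0 ;;
        *)         return 1 ;;   # e.g. K20 or K40
    esac
}

# Print a --gres request for one GPU of the given model, or print a message
# on stderr and fail if this Guppy build cannot use that model.
gres_for_model() {
    local model
    model=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
    if guppy_gpu_supported "$model"; then
        echo "--gres=gpu:${model}:1"
    else
        echo "unsupported GPU model for Guppy 4.4.2: $1" >&2
        return 1
    fi
}
```

For instance, `gres_for_model v100` prints `--gres=gpu:V100:1`, and `gres_for_model K40` fails.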


Version 4.4.2, for CPU

  • Version 4.4.2, for CPU is installed in /apps/eb/ont-guppy/4.4.2-CPU

To use this version of Guppy, please first load the module with

module load ont-guppy/4.4.2-CPU
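If the same script may run on either kind of node, one option is to pick the module at run time. The sketch below is a hypothetical helper that assumes Slurm sets `CUDA_VISIBLE_DEVICES` only when the job was granted a GPU through `--gres`, and leaves it unset or empty otherwise. Verify this assumption on Sapelo2 before relying on it.

```bash
#!/bin/bash
# Hypothetical helper, not provided by Guppy or the cluster: print the
# ont-guppy 4.4.2 module that matches the current node.
# Assumption: CUDA_VISIBLE_DEVICES is non-empty only in jobs granted a GPU.
guppy_module_for_node() {
    # An optional first argument overrides the environment (useful for testing).
    local visible="${1-${CUDA_VISIBLE_DEVICES:-}}"
    if [[ -n "$visible" ]]; then
        echo "ont-guppy/4.4.2-GPU"
    else
        echo "ont-guppy/4.4.2-CPU"
    fi
}
```

In a job script this would be used as `module load "$(guppy_module_for_node)"`. Remember that the GPU build still needs a device option such as `-x "cuda:0"`.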


Sample job submission script (sub.sh) to run guppy_basecaller version 4.4.2 on a GPU node:

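#!/bin/bash
#SBATCH --partition=gpu_p
#SBATCH --job-name=guppyjobname
#SBATCH --gres=gpu:P100:1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --time=48:00:00
#SBATCH --mem=10G

# Run from the directory where the job was submitted
cd "$SLURM_SUBMIT_DIR"

# Load the GPU build of Guppy (runs on P100 or V100 only)
ml ont-guppy/4.4.2-GPU

# cuda:0 is the first GPU allocated to this job; replace [options] with your arguments
guppy_basecaller -x "cuda:0" [options]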

where [options] must be replaced with the guppy_basecaller options and arguments you want to use. Adjust the other job parameters as needed as well, such as the job name, the maximum wall-clock time (--time), the maximum memory (--mem), and the number of CPU cores (--cpus-per-task).
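For example, the help text in the Documentation section shows that a GPU run takes an input path (`-i`), a save path (`-s`), a config file (`-c`) and a device (`-x`/`--device`). The sketch below assembles such a command in a bash array so that paths containing spaces stay intact. The function `guppy_gpu_cmd`, the array `GUPPY_CMD`, the directory names and the config name `dna_r9.4.1_450bps_hac.cfg` are illustrative placeholders. Pick the config that matches your flowcell and kit; `guppy_basecaller --print_workflows` lists them.

```bash
#!/bin/bash
# Hypothetical helper: store a guppy_basecaller GPU command in the global
# array GUPPY_CMD so that paths containing spaces stay intact.
# Arguments: input dir, save dir, config file, optional device (default
# cuda:0, i.e. the first GPU that Slurm made visible to the job).
guppy_gpu_cmd() {
    local input="$1" save="$2" config="$3" device="${4:-cuda:0}"
    # -i: fast5 input directory    -s: fastq output directory
    # -c: basecalling config file  -x: CUDA device
    # -r: also search subdirectories of the input directory
    GUPPY_CMD=(guppy_basecaller -i "$input" -s "$save" -c "$config" -x "$device" -r)
}
```

In sub.sh, after `ml ont-guppy/4.4.2-GPU`, you could call `guppy_gpu_cmd ./fast5 ./basecalled dna_r9.4.1_450bps_hac.cfg` and then run `"${GUPPY_CMD[@]}"`.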


Sample job submission script (sub.sh) to run guppy_basecaller version 4.4.2 on a CPU node:

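#!/bin/bash
#SBATCH --partition=batch
#SBATCH --job-name=guppyjobname
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --time=48:00:00
#SBATCH --mem=10G

# Run from the directory where the job was submitted
cd "$SLURM_SUBMIT_DIR"

# Load the CPU build of Guppy
ml ont-guppy/4.4.2-CPU

# Replace [options] with your guppy_basecaller arguments
guppy_basecaller [options]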

As with the GPU script, replace [options] with the guppy_basecaller options and arguments you want to use, and adjust the job name, maximum wall-clock time (--time), maximum memory (--mem), and number of CPU cores (--cpus-per-task) as needed.
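On the CPU build, Guppy's parallelism is set with `--num_callers` and `--cpu_threads_per_caller`, both listed in the help text below. A reasonable starting point is to keep callers × threads per caller close to the cores requested with `--cpus-per-task`. This is an assumption, not ONT guidance. The hypothetical helper below derives the per-caller thread count from `SLURM_CPUS_PER_TASK`, which Slurm sets when `--cpus-per-task` is given.

```bash
#!/bin/bash
# Hypothetical helper, not part of Guppy: print a --cpu_threads_per_caller
# value aiming for callers x threads <= cores given to the job.
# Arguments: number of callers (default 1) and number of cores
# (default $SLURM_CPUS_PER_TASK, falling back to 1 if it is unset).
guppy_cpu_threads_per_caller() {
    local callers="${1:-1}"
    local cores="${2:-${SLURM_CPUS_PER_TASK:-1}}"
    if (( callers < 1 )); then
        callers=1                          # avoid dividing by zero
    fi
    local threads=$(( cores / callers ))   # integer division rounds down
    if (( threads < 1 )); then
        threads=1                          # every caller gets at least one thread
    fi
    echo "$threads"
}
```

With the 4 cores requested above, `guppy_basecaller --num_callers 2 --cpu_threads_per_caller "$(guppy_cpu_threads_per_caller 2)" [options]` would give each of the two callers 2 threads.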


Submit the job to the queue with

sbatch sub.sh
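sbatch replies with a line of the form `Submitted batch job <jobid>`. If you want to keep the job ID for later commands such as `squeue -j` or `scancel`, a small sketch like the following extracts it. The function name is hypothetical. Alternatively, `sbatch --parsable` prints only the job ID, followed by the cluster name if there is one.

```bash
#!/bin/bash
# Hypothetical helper: read sbatch's normal output on stdin and print the job ID.
# Expected input line: "Submitted batch job 123456"
sbatch_job_id() {
    local line
    local re='^Submitted batch job ([0-9]+)'
    while IFS= read -r line; do
        if [[ $line =~ $re ]]; then
            echo "${BASH_REMATCH[1]}"
            return 0
        fi
    done
    return 1   # no job ID found (e.g. the submission was rejected)
}
```

Usage: `jobid=$(sbatch sub.sh | sbatch_job_id) && echo "Guppy job $jobid queued"`.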

Documentation

 
[shtsai@b1-24 ~]$ ml ont-guppy/4.4.2-GPU
[shtsai@b1-24 ~]$ guppy_basecaller -h
: Guppy Basecalling Software, (C) Oxford Nanopore Technologies, Limited. Version 4.4.2+9623c16

Usage:

With config file:"
  guppy_basecaller -i <input path> -s <save path> -c <config file> [options]
With flowcell and kit name:
  guppy_basecaller -i <input path> -s <save path> --flowcell <flowcell name>
    --kit <kit name>
List supported flowcells and kits:
  guppy_basecaller --print_workflows

Use GPU for basecalling:
  guppy_basecaller -i <input path> -s <save path> -c <config file>
    --device <cuda device name> [options]
Command line parameters:
  --trim_threshold arg              Threshold above which data will be trimmed 
                                    (in standard deviations of current level 
                                    distribution).
  --trim_min_events arg             Adapter trimmer minimum stride intervals 
                                    after stall that must be seen.
  --max_search_len arg              Maximum number of samples to search through
                                    for the stall
  --override_scaling                Manually provide scaling parameters rather 
                                    than estimating them from each read.
  --scaling_med arg                 Median current value to use for manual 
                                    scaling.
  --scaling_mad arg                 Median absolute deviation to use for manual
                                    scaling.
  --trim_strategy arg               Trimming strategy to apply: 'dna' or 'rna' 
                                    (or 'none' to disable trimming)
  --dmean_win_size arg              Window size for coarse stall event 
                                    detection
  --dmean_threshold arg             Threshold for coarse stall event detection
  --jump_threshold arg              Threshold level for rna stall detection
  --pt_scaling                      Enable polyT/adapter max detection for read
                                    scaling.
  --pt_median_offset arg            Set polyT median offset for setting read 
                                    scaling median (default 2.5)
  --adapter_pt_range_scale arg      Set polyT/adapter range scale for setting 
                                    read scaling median absolute deviation 
                                    (default 5.2)
  --pt_required_adapter_drop arg    Set minimum required current drop from 
                                    adapter max to polyT detection. (default 
                                    30.0)
  --pt_minimum_read_start_index arg Set minimum index for read start sample 
                                    required to attempt polyT scaling. (default
                                    30)
  --as_model_file arg               Path to JSON model file for adapter 
                                    scaling.
  --as_gpu_runners_per_device arg   Number of runners per GPU device for 
                                    adapter scaling.
  --as_cpu_threads_per_scaler arg   Number of CPU worker threads per adapter 
                                    scaler
  --as_reads_per_runner arg         Maximum reads per runner for adapter 
                                    scaling.
  --as_num_scalers arg              Number of parallel scalers for adapter 
                                    scaling.
  -m [ --model_file ] arg           Path to JSON model file.
  -k [ --kernel_path ] arg          Path to GPU kernel files location (only 
                                    needed if builtin_scripts is false).
  -x [ --device ] arg               Specify basecalling device: 'auto', or 
                                    'cuda:<device_id>'.
  --builtin_scripts arg             Whether to use GPU kernels that were 
                                    included at compile-time.
  --chunk_size arg                  Stride intervals per chunk.
  --chunks_per_runner arg           Maximum chunks per runner.
  --chunks_per_caller arg           Soft limit on number of chunks in each 
                                    caller's queue. New reads will not be 
                                    queued while this is exceeded.
  --high_priority_threshold arg     Number of high priority chunks to process 
                                    for each medium priority chunk.
  --medium_priority_threshold arg   Number of medium priority chunks to process
                                    for each low priority chunk.
  --overlap arg                     Overlap between chunks (in stride 
                                    intervals).
  --gpu_runners_per_device arg      Number of runners per GPU device.
  --cpu_threads_per_caller arg      Number of CPU worker threads per 
                                    basecaller.
  --num_callers arg                 Number of parallel basecallers to create.
  --post_out                        Return full posterior matrix in output 
                                    fast5 file and/or called read message from 
                                    server.
  --stay_penalty arg                Scaling factor to apply to stay probability
                                    calculation during transducer decode.
  --qscore_offset arg               Qscore calibration offset.
  --qscore_scale arg                Qscore calibration scale factor.
  --temp_weight arg                 Temperature adjustment for weight matrix in
                                    softmax layer of RNN.
  --temp_bias arg                   Temperature adjustment for bias vector in 
                                    softmax layer of RNN.
  --beam_cut arg                    Beam score cutoff for beam search decoding.
  --beam_width arg                  Beam score cutoff for beam search decoding.
  --qscore_filtering                Enable filtering of reads into PASS/FAIL 
                                    folders based on min qscore.
  --min_qscore arg                  Minimum acceptable qscore for a read to be 
                                    filtered into the PASS folder
  --reverse_sequence arg            Reverse the called sequence (for RNA 
                                    sequencing).
  --u_substitution arg              Substitute 'U' for 'T' in the called 
                                    sequence (for RNA sequencing).
  --log_speed_frequency arg         How often to print out basecalling speed.
  --barcode_kits arg                Space separated list of barcoding kit(s) or
                                    expansion kit(s) to detect against. Must be
                                    in double quotes.
  --trim_barcodes                   Trim the barcodes from the output sequences
                                    in the FastQ files.
  --num_extra_bases_trim arg        How vigorous to be in trimming the barcode.
                                    Default is 0 i.e. the length of the 
                                    detected barcode. A positive integer means 
                                    extra bases will be trimmed, a negative 
                                    number is how many fewer bases (less 
                                    vigorous) will be trimmed.
  --arrangements_files arg          Files containing arrangements.
  --lamp_arrangements_files arg     Files containing lamp arrangements.
  --score_matrix_filename arg       File containing mismatch score matrix.
  --start_gap1 arg                  Gap penalty for aligning before the 
                                    reference.
  --end_gap1 arg                    Gap penalty for aligning after the 
                                    reference.
  --open_gap1 arg                   Penalty for opening a new gap in the 
                                    reference.
  --extend_gap1 arg                 Penalty for extending a gap in the 
                                    reference.
  --start_gap2 arg                  Gap penalty for aligning before the query.
  --end_gap2 arg                    Gap penalty for aligning after the query.
  --open_gap2 arg                   Penalty for opening a new gap in the query.
  --extend_gap2 arg                 Penalty for extending a gap in the query.
  --min_score arg                   Minimum score to consider a valid 
                                    alignment.
  --min_score_rear_override arg     Minimum score to consider a valid alignment
                                    for the rear barcode only (and min_score 
                                    will then be used for the front only when 
                                    this is set).
  --min_score_mask arg              Minimum score for a barcode context to 
                                    consider a valid alignment.
  --front_window_size arg           Window size for the beginning barcode.
  --rear_window_size arg            Window size for the ending barcode.
  --require_barcodes_both_ends      Reads will only be classified if there is a
                                    barcode above the min_score at both ends of
                                    the read.
  --allow_inferior_barcodes         Reads will still be classified even if both
                                    the barcodes at the front and rear (if 
                                    applicable) were not the best scoring 
                                    barcodes above the min_score.
  --detect_mid_strand_barcodes      Search for barcodes through the entire 
                                    length of the read.
  --min_score_mid_barcodes arg      Minimum score for a barcode to be detected 
                                    in the middle of a read.
  --lamp_kit arg                    LAMP barcoding kit to perform LAMP 
                                    detection against.
  --min_score_lamp arg              Minimum score for a LAMP barcode to be 
                                    classified.
  --min_score_lamp_mask arg         Minimum score for a LAMP barcode mask 
                                    context to be classified.
  --min_score_lamp_target arg       Minimum score for a LAMP target to be 
                                    classified.
  --additional_context_bases arg    Number of bases from a lamp FIP barcode 
                                    context to append to the front and rear of 
                                    the FIP barcode before performing matching.
                                    Default is 2.
  --min_length_lamp_context arg     Minimum align length for a LAMP barcode 
                                    mask context to be classified.
  --min_length_lamp_target arg      Minimum align length for a LAMP target to 
                                    be classified.
  --num_barcoding_buffers arg       Number of GPU memory buffers to allocate to
                                    perform barcoding into. Controls level of 
                                    parallelism on GPU for barcoding.
  --num_mid_barcoding_buffers arg   Number of GPU memory buffers to allocate to
                                    perform barcoding into. Controls level of 
                                    parallelism on GPU for mid barcoding.
  --num_barcode_threads arg         Number of worker threads to use for 
                                    barcoding.
  --calib_detect                    Enable calibration strand detection and 
                                    filtering.
  --calib_reference arg             Reference FASTA file containing calibration
                                    strand.
  --calib_min_sequence_length arg   Minimum sequence length for reads to be 
                                    considered candidate calibration strands.
  --calib_max_sequence_length arg   Maximum sequence length for reads to be 
                                    considered candidate calibration strands.
  --calib_min_coverage arg          Minimum reference coverage to pass 
                                    calibration strand detection.
  --print_workflows                 Output available workflows.
  --flowcell arg                    Flowcell to find a configuration for
  --kit arg                         Kit to find a configuration for
  -a [ --align_ref ] arg            Path to alignment reference.
  --bed_file arg                    Path to .bed file containing areas of 
                                    interest in reference genome.
  --num_alignment_threads arg       Number of worker threads to use for 
                                    alignment.
  -z [ --quiet ]                    Quiet mode. Nothing will be output to 
                                    STDOUT if this option is set.
  --trace_categories_logs arg       Enable trace logs - list of strings with 
                                    the desired names.
  --verbose_logs                    Enable verbose logs.
  --trace_domains_log arg           List of trace domains to include in verbose
                                    logging (if enabled),  '*' for all.
  --trace_domains_config arg        Configuration file containing list of trace
                                    domains to include in verbose logging (if 
                                    enabled), this will override 
                                    --trace_domain_logs
  --disable_pings                   Disable the transmission of telemetry 
                                    pings.
  --ping_url arg                    URL to send pings to
  --ping_segment_duration arg       Duration in minutes of each ping segment.
  --progress_stats_frequency arg    Frequency in seconds in which to report 
                                    progress statistics, if supplied will 
                                    replace the default progress display.
  -q [ --records_per_fastq ] arg    Maximum number of records per fastq file, 0
                                    means use a single file (per worker, per 
                                    run id).
  --read_batch_size arg             Maximum batch size, in reads, for grouping 
                                    input files.
  --compress_fastq                  Compress fastq output files with gzip.
  -i [ --input_path ] arg           Path to input fast5 files.
  --input_file_list arg             Optional file containing list of input 
                                    fast5 files to process from the input_path.
  -s [ --save_path ] arg            Path to save fastq files.
  -l [ --read_id_list ] arg         File containing list of read ids to filter 
                                    to
  -r [ --recursive ]                Search for input files recursively.
  --fast5_out                       Choice of whether to do fast5 output.
  --bam_out                         Choice of whether to do BAM file output.
  --bam_methylation_threshold arg   The value below which a predicted 
                                    methylation probability will not be emitted
                                    into a BAM file, expressed as a percentage.
                                    Default is 5.0(%).
  --resume                          Resume a previous basecall run using the 
                                    same output folder.
  --client_id arg                   Optional unique identifier (non-negative 
                                    integer) for this instance of the Guppy 
                                    Client Basecaller, if supplied will form 
                                    part of the output filenames.
  --nested_output_folder            If flagged output fastq files will be 
                                    written to a nested folder structure, based
                                    on: protocol_group/sample/protocol/qscore_p
                                    ass_fail/barcode_arrangement/
  --max_queued_reads arg            Maximum number of reads to be submitted for
                                    processing at any one time.
  -h [ --help ]                     produce help message
  -v [ --version ]                  print version number
  -c [ --config ] arg               Config file to use
  -d [ --data_path ] arg            Path to use for loading any data files the 
                                    application requires.
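The Usage lines at the top of this help output show two alternative ways to choose a basecalling model: a config file (`-c`) or a flowcell and kit pair (`--flowcell`, `--kit`). The hypothetical helper below stores whichever form you supply in a bash array. It can also add barcode detection with `--barcode_kits` and `--trim_barcodes`, driven by a hypothetical `BARCODE_KITS` variable. The flowcell, kit and barcode-kit names in the usage example (`FLO-MIN106`, `SQK-LSK109`, `EXP-NBD104`) are illustrative. Check `guppy_basecaller --print_workflows` for the combinations your version supports.

```bash
#!/bin/bash
# Hypothetical helper: build model-selection arguments for guppy_basecaller
# in the global array GUPPY_MODEL_ARGS.
#   guppy_model_args <config>          -> -c <config>
#   guppy_model_args <flowcell> <kit>  -> --flowcell <flowcell> --kit <kit>
# If BARCODE_KITS (space-separated kit names) is set, barcode detection and
# trimming are requested as well.
guppy_model_args() {
    GUPPY_MODEL_ARGS=()
    case $# in
        1) GUPPY_MODEL_ARGS+=(-c "$1") ;;
        2) GUPPY_MODEL_ARGS+=(--flowcell "$1" --kit "$2") ;;
        *) echo "usage: guppy_model_args <config> | <flowcell> <kit>" >&2
           return 1 ;;
    esac
    if [[ -n "${BARCODE_KITS:-}" ]]; then
        # The whole kit list is passed as ONE argument, which is what the
        # double quotes required by the help text achieve on a command line.
        GUPPY_MODEL_ARGS+=(--barcode_kits "$BARCODE_KITS" --trim_barcodes)
    fi
}
```

For example, `BARCODE_KITS="EXP-NBD104" guppy_model_args FLO-MIN106 SQK-LSK109` followed by `guppy_basecaller -i ./fast5 -s ./basecalled "${GUPPY_MODEL_ARGS[@]}"` requests a flowcell/kit model with barcode demultiplexing.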

Installation

  • Binaries downloaded from the Oxford Nanopore Technologies site.

System

64-bit Linux