Ont-Guppy-Sapelo2
Category
Bioinformatics
Program On
Sapelo2
Version
4.4.2
Author / Distributor
Oxford Nanopore Technologies, Limited.
Description
Ont-Guppy is basecalling software from Oxford Nanopore Technologies. For more information, please see https://nanoporetech.com/
Running Program
Also refer to Running Jobs on Sapelo2
For more information on Environment Modules on Sapelo2 please see the Lmod page.
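For example, the standard Lmod commands below can be used to check which ont-guppy modules are installed; the versions shown will depend on what is currently available on the cluster:

module avail ont-guppy
module spider ont-guppy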
Version 4.4.2, for GPU
- Version 4.4.2, for GPU is installed in /apps/eb/ont-guppy/4.4.2-GPU and it can be run on a P100 or a V100 GPU device. This version does not work on the K20 or K40 GPU devices.
To use this version of Guppy, please first load the module with
module load ont-guppy/4.4.2-GPU
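As an optional quick check after loading the module, you can confirm which guppy_basecaller executable is on your path and its version:

which guppy_basecaller
guppy_basecaller --version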
Version 4.4.2, for CPU
- Version 4.4.2, for CPU is installed in /apps/eb/ont-guppy/4.4.2-CPU
To use this version of Guppy, please first load the module with
module load ont-guppy/4.4.2-CPU
Sample job submission script (sub.sh) to run guppy_basecaller version 4.4.2 on a GPU node:
#!/bin/bash
#SBATCH --partition=gpu_p
#SBATCH --job-name=guppyjobname
#SBATCH --gres=gpu:P100:1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --time=48:00:00
#SBATCH --mem=10G

cd $SLURM_SUBMIT_DIR

ml ont-guppy/4.4.2-GPU

guppy_basecaller -x "cuda:0" [options]
where [options] need to be replaced by the options (command and arguments) you want to use. Other parameters of the job, such as the maximum wall clock time, maximum memory, the number of cores per node, and the job name need to be modified appropriately as well. If the -x "cuda:0" option is not included, guppy_basecaller will default to using only the CPUs.
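As an illustration only, a filled-in GPU command could look like the example below. The input and output paths are placeholders to be replaced with your own directories, and the flowcell and kit names must match your sequencing run:

guppy_basecaller -x "cuda:0" \
    -i /scratch/MyID/fast5 \
    -s /scratch/MyID/basecalled \
    --flowcell FLO-MIN106 --kit SQK-LSK109 \
    --recursive --compress_fastq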
Sample job submission script (sub.sh) to run guppy_basecaller version 4.4.2 on a CPU node:
#!/bin/bash
#SBATCH --partition=batch
#SBATCH --job-name=guppyjobname
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --time=48:00:00
#SBATCH --mem=10G

cd $SLURM_SUBMIT_DIR

ml ont-guppy/4.4.2-CPU

guppy_basecaller [options]
where [options] need to be replaced by the options (command and arguments) you want to use. Other parameters of the job, such as the maximum wall clock time, maximum memory, the number of cores per node, and the job name need to be modified appropriately as well.
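For the CPU version it is usually helpful to match the basecalling threads to the cores requested in the script. The sketch below assumes the --cpus-per-task=4 setting used above; the paths and the flowcell/kit names are placeholders for your own data:

guppy_basecaller -i /scratch/MyID/fast5 \
    -s /scratch/MyID/basecalled \
    --flowcell FLO-MIN106 --kit SQK-LSK109 \
    --num_callers 4 --cpu_threads_per_caller 1

Here --num_callers multiplied by --cpu_threads_per_caller should not exceed the number of cores requested with --cpus-per-task.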
Submit the job to the queue with
sbatch sub.sh
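You can then monitor the job with standard Slurm commands, for example (replace MyID with your cluster user name and <jobid> with the job ID returned by sbatch):

squeue -u MyID
sacct -j <jobid>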
Documentation
[shtsai@b1-24 ~]$ ml ont-guppy/4.4.2-GPU
[shtsai@b1-24 ~]$ guppy_basecaller -h
: Guppy Basecalling Software, (C) Oxford Nanopore Technologies, Limited. Version 4.4.2+9623c16

Usage:

With config file:
  guppy_basecaller -i <input path> -s <save path> -c <config file> [options]
With flowcell and kit name:
  guppy_basecaller -i <input path> -s <save path> --flowcell <flowcell name> --kit <kit name>
List supported flowcells and kits:
  guppy_basecaller --print_workflows
Use GPU for basecalling:
  guppy_basecaller -i <input path> -s <save path> -c <config file> --device <cuda device name> [options]

Command line parameters:
  --trim_threshold arg              Threshold above which data will be trimmed (in standard deviations of current level distribution).
  --trim_min_events arg             Adapter trimmer minimum stride intervals after stall that must be seen.
  --max_search_len arg              Maximum number of samples to search through for the stall
  --override_scaling                Manually provide scaling parameters rather than estimating them from each read.
  --scaling_med arg                 Median current value to use for manual scaling.
  --scaling_mad arg                 Median absolute deviation to use for manual scaling.
  --trim_strategy arg               Trimming strategy to apply: 'dna' or 'rna' (or 'none' to disable trimming)
  --dmean_win_size arg              Window size for coarse stall event detection
  --dmean_threshold arg             Threshold for coarse stall event detection
  --jump_threshold arg              Threshold level for rna stall detection
  --pt_scaling                      Enable polyT/adapter max detection for read scaling.
  --pt_median_offset arg            Set polyT median offset for setting read scaling median (default 2.5)
  --adapter_pt_range_scale arg      Set polyT/adapter range scale for setting read scaling median absolute deviation (default 5.2)
  --pt_required_adapter_drop arg    Set minimum required current drop from adapter max to polyT detection. (default 30.0)
  --pt_minimum_read_start_index arg Set minimum index for read start sample required to attempt polyT scaling. (default 30)
  --as_model_file arg               Path to JSON model file for adapter scaling.
  --as_gpu_runners_per_device arg   Number of runners per GPU device for adapter scaling.
  --as_cpu_threads_per_scaler arg   Number of CPU worker threads per adapter scaler
  --as_reads_per_runner arg         Maximum reads per runner for adapter scaling.
  --as_num_scalers arg              Number of parallel scalers for adapter scaling.
  -m [ --model_file ] arg           Path to JSON model file.
  -k [ --kernel_path ] arg          Path to GPU kernel files location (only needed if builtin_scripts is false).
  -x [ --device ] arg               Specify basecalling device: 'auto', or 'cuda:<device_id>'.
  --builtin_scripts arg             Whether to use GPU kernels that were included at compile-time.
  --chunk_size arg                  Stride intervals per chunk.
  --chunks_per_runner arg           Maximum chunks per runner.
  --chunks_per_caller arg           Soft limit on number of chunks in each caller's queue. New reads will not be queued while this is exceeded.
  --high_priority_threshold arg     Number of high priority chunks to process for each medium priority chunk.
  --medium_priority_threshold arg   Number of medium priority chunks to process for each low priority chunk.
  --overlap arg                     Overlap between chunks (in stride intervals).
  --gpu_runners_per_device arg      Number of runners per GPU device.
  --cpu_threads_per_caller arg      Number of CPU worker threads per basecaller.
  --num_callers arg                 Number of parallel basecallers to create.
  --post_out                        Return full posterior matrix in output fast5 file and/or called read message from server.
  --stay_penalty arg                Scaling factor to apply to stay probability calculation during transducer decode.
  --qscore_offset arg               Qscore calibration offset.
  --qscore_scale arg                Qscore calibration scale factor.
  --temp_weight arg                 Temperature adjustment for weight matrix in softmax layer of RNN.
  --temp_bias arg                   Temperature adjustment for bias vector in softmax layer of RNN.
  --beam_cut arg                    Beam score cutoff for beam search decoding.
  --beam_width arg                  Beam score cutoff for beam search decoding.
  --qscore_filtering                Enable filtering of reads into PASS/FAIL folders based on min qscore.
  --min_qscore arg                  Minimum acceptable qscore for a read to be filtered into the PASS folder
  --reverse_sequence arg            Reverse the called sequence (for RNA sequencing).
  --u_substitution arg              Substitute 'U' for 'T' in the called sequence (for RNA sequencing).
  --log_speed_frequency arg         How often to print out basecalling speed.
  --barcode_kits arg                Space separated list of barcoding kit(s) or expansion kit(s) to detect against. Must be in double quotes.
  --trim_barcodes                   Trim the barcodes from the output sequences in the FastQ files.
  --num_extra_bases_trim arg        How vigorous to be in trimming the barcode. Default is 0 i.e. the length of the detected barcode. A positive integer means extra bases will be trimmed, a negative number is how many fewer bases (less vigorous) will be trimmed.
  --arrangements_files arg          Files containing arrangements.
  --lamp_arrangements_files arg     Files containing lamp arrangements.
  --score_matrix_filename arg       File containing mismatch score matrix.
  --start_gap1 arg                  Gap penalty for aligning before the reference.
  --end_gap1 arg                    Gap penalty for aligning after the reference.
  --open_gap1 arg                   Penalty for opening a new gap in the reference.
  --extend_gap1 arg                 Penalty for extending a gap in the reference.
  --start_gap2 arg                  Gap penalty for aligning before the query.
  --end_gap2 arg                    Gap penalty for aligning after the query.
  --open_gap2 arg                   Penalty for opening a new gap in the query.
  --extend_gap2 arg                 Penalty for extending a gap in the query.
  --min_score arg                   Minimum score to consider a valid alignment.
  --min_score_rear_override arg     Minimum score to consider a valid alignment for the rear barcode only (and min_score will then be used for the front only when this is set).
  --min_score_mask arg              Minimum score for a barcode context to consider a valid alignment.
  --front_window_size arg           Window size for the beginning barcode.
  --rear_window_size arg            Window size for the ending barcode.
  --require_barcodes_both_ends      Reads will only be classified if there is a barcode above the min_score at both ends of the read.
  --allow_inferior_barcodes         Reads will still be classified even if both the barcodes at the front and rear (if applicable) were not the best scoring barcodes above the min_score.
  --detect_mid_strand_barcodes      Search for barcodes through the entire length of the read.
  --min_score_mid_barcodes arg      Minimum score for a barcode to be detected in the middle of a read.
  --lamp_kit arg                    LAMP barcoding kit to perform LAMP detection against.
  --min_score_lamp arg              Minimum score for a LAMP barcode to be classified.
  --min_score_lamp_mask arg         Minimum score for a LAMP barcode mask context to be classified.
  --min_score_lamp_target arg       Minimum score for a LAMP target to be classified.
  --additional_context_bases arg    Number of bases from a lamp FIP barcode context to append to the front and rear of the FIP barcode before performing matching. Default is 2.
  --min_length_lamp_context arg     Minimum align length for a LAMP barcode mask context to be classified.
  --min_length_lamp_target arg      Minimum align length for a LAMP target to be classified.
  --num_barcoding_buffers arg       Number of GPU memory buffers to allocate to perform barcoding into. Controls level of parallelism on GPU for barcoding.
  --num_mid_barcoding_buffers arg   Number of GPU memory buffers to allocate to perform barcoding into. Controls level of parallelism on GPU for mid barcoding.
  --num_barcode_threads arg         Number of worker threads to use for barcoding.
  --calib_detect                    Enable calibration strand detection and filtering.
  --calib_reference arg             Reference FASTA file containing calibration strand.
  --calib_min_sequence_length arg   Minimum sequence length for reads to be considered candidate calibration strands.
  --calib_max_sequence_length arg   Maximum sequence length for reads to be considered candidate calibration strands.
  --calib_min_coverage arg          Minimum reference coverage to pass calibration strand detection.
  --print_workflows                 Output available workflows.
  --flowcell arg                    Flowcell to find a configuration for
  --kit arg                         Kit to find a configuration for
  -a [ --align_ref ] arg            Path to alignment reference.
  --bed_file arg                    Path to .bed file containing areas of interest in reference genome.
  --num_alignment_threads arg       Number of worker threads to use for alignment.
  -z [ --quiet ]                    Quiet mode. Nothing will be output to STDOUT if this option is set.
  --trace_categories_logs arg       Enable trace logs - list of strings with the desired names.
  --verbose_logs                    Enable verbose logs.
  --trace_domains_log arg           List of trace domains to include in verbose logging (if enabled), '*' for all.
  --trace_domains_config arg        Configuration file containing list of trace domains to include in verbose logging (if enabled), this will override --trace_domain_logs
  --disable_pings                   Disable the transmission of telemetry pings.
  --ping_url arg                    URL to send pings to
  --ping_segment_duration arg       Duration in minutes of each ping segment.
  --progress_stats_frequency arg    Frequency in seconds in which to report progress statistics, if supplied will replace the default progress display.
  -q [ --records_per_fastq ] arg    Maximum number of records per fastq file, 0 means use a single file (per worker, per run id).
  --read_batch_size arg             Maximum batch size, in reads, for grouping input files.
  --compress_fastq                  Compress fastq output files with gzip.
  -i [ --input_path ] arg           Path to input fast5 files.
  --input_file_list arg             Optional file containing list of input fast5 files to process from the input_path.
  -s [ --save_path ] arg            Path to save fastq files.
  -l [ --read_id_list ] arg         File containing list of read ids to filter to
  -r [ --recursive ]                Search for input files recursively.
  --fast5_out                       Choice of whether to do fast5 output.
  --bam_out                         Choice of whether to do BAM file output.
  --bam_methylation_threshold arg   The value below which a predicted methylation probability will not be emitted into a BAM file, expressed as a percentage. Default is 5.0(%).
  --resume                          Resume a previous basecall run using the same output folder.
  --client_id arg                   Optional unique identifier (non-negative integer) for this instance of the Guppy Client Basecaller, if supplied will form part of the output filenames.
  --nested_output_folder            If flagged output fastq files will be written to a nested folder structure, based on: protocol_group/sample/protocol/qscore_pass_fail/barcode_arrangement/
  --max_queued_reads arg            Maximum number of reads to be submitted for processing at any one time.
  -h [ --help ]                     produce help message
  -v [ --version ]                  print version number
  -c [ --config ] arg               Config file to use
  -d [ --data_path ] arg            Path to use for loading any data files the application requires.
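For example, to list the flowcell and kit combinations (and their corresponding configuration files) supported by this version, as described in the usage section above:

ml ont-guppy/4.4.2-GPU
guppy_basecaller --print_workflows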
Installation
- Binaries downloaded from the Oxford Nanopore Technologies site.
System
64-bit Linux