Minimap2-Teaching

From Research Computing Center Wiki

Category: Bioinformatics
Program On: Teaching
Version: 2.10, 2.13, 2.17
Author / Distributor: minimap2

Description

"Minimap2 is a versatile sequence alignment program that aligns DNA or mRNA sequences against a large reference database." More details are available at minimap2.

Running Program

Also refer to Running Jobs on the teaching cluster.


  • version 2.10, installed in /usr/local/apps/eb/minimap2/2.10-foss-2016b.

To use this version of minimap2, please first load the module with

ml minimap2/2.10-foss-2016b 
  • version 2.13, installed in /usr/local/apps/eb/minimap2/2.13-foss-2016b.

To use this version of minimap2, please first load the module with

ml minimap2/2.13-foss-2016b 
  • version 2.17, installed in /usr/local/apps/eb/minimap2/2.17-foss-2018a.

To use this version of minimap2, please first load the module with

ml minimap2/2.17-foss-2018a

Sample job submission script (sub.sh) to run minimap2 v. 2.10 in a batch job:

#!/bin/bash
#SBATCH --job-name=j_minimap2
#SBATCH --partition=batch
#SBATCH --mail-type=ALL
#SBATCH --mail-user=username@uga.edu
#SBATCH --ntasks=1
#SBATCH --mem=2gb
#SBATCH --time=08:00:00
#SBATCH --output=minimap2.%j.out
#SBATCH --error=minimap2.%j.err

cd "$SLURM_SUBMIT_DIR"
ml minimap2/2.10-foss-2016b
minimap2 [options]

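In a real submission script, review at least the placeholder values above (the job name, partition, email address, resource requests such as --ntasks, --mem and --time, and the minimap2 options and input files) and replace them with values suited to your job.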
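As a sketch, the template might be filled in as follows to map Oxford Nanopore reads with the map-ont preset. The file names ref.fa, reads.fastq.gz and aln.sam are hypothetical, and the --cpus-per-task=4 request is an assumption added so that minimap2's thread count (-t, default 3 in the help output below) matches the cores Slurm allocates; memory and time still need adjusting to your data. The snippet writes the script to sub.sh with a quoted heredoc, so $SLURM_SUBMIT_DIR and $SLURM_CPUS_PER_TASK are expanded when the job runs, not when the file is written.

```shell
# Write a filled-in copy of the template to sub.sh.
# All file names below are hypothetical placeholders.
cat > sub.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=j_minimap2
#SBATCH --partition=batch
#SBATCH --mail-type=ALL
#SBATCH --mail-user=username@uga.edu
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=2gb
#SBATCH --time=08:00:00
#SBATCH --output=minimap2.%j.out
#SBATCH --error=minimap2.%j.err

cd "$SLURM_SUBMIT_DIR"
ml minimap2/2.17-foss-2018a

# -a: SAM output; -x map-ont: Nanopore-vs-reference preset;
# -t: one worker thread per core requested with --cpus-per-task
minimap2 -a -x map-ont -t "$SLURM_CPUS_PER_TASK" ref.fa reads.fastq.gz > aln.sam
EOF
```

The generated sub.sh is submitted with sbatch in the same way as the template.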


Please refer to Running Jobs on the teaching cluster, Run X window Jobs, and Run interactive Jobs for more details on running jobs on the teaching cluster.


Here is an example job submission command:

sbatch ./sub.sh 
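To follow a job after submitting it, it helps to capture its job ID. The sketch below relies on Slurm's standard sbatch --parsable option, which prints only the job ID (optionally followed by ";cluster") instead of "Submitted batch job <id>". The function name submit_job is hypothetical, not a command provided on the cluster.

```shell
# Submit a job script and print only its numeric job ID.
# sbatch --parsable prints "jobid" or "jobid;cluster".
submit_job() {
    local out
    out=$(sbatch --parsable "$@") || return 1
    printf '%s\n' "${out%%;*}"   # drop an optional ";cluster" suffix
}

# Usage on the teaching cluster:
#   jobid=$(submit_job ./sub.sh)
#   squeue -j "$jobid"    # show the job while it is pending or running
```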


Documentation

ml minimap2/2.10-foss-2016b 
minimap2 -h
Usage: minimap2 [options] <target.fa>|<target.idx> [query.fa] [...]
Options:
  Indexing:
    -H           use homopolymer-compressed k-mer
    -k INT       k-mer size (no larger than 28) [15]
    -w INT       minizer window size [10]
    -I NUM       split index for every ~NUM input bases [4G]
    -d FILE      dump index to FILE []
  Mapping:
    -f FLOAT     filter out top FLOAT fraction of repetitive minimizers [0.0002]
    -g NUM       stop chain enlongation if there are no minimizers in INT-bp [5000]
    -G NUM       max intron length (effective with -xsplice; changing -r) [200k]
    -F NUM       max fragment length (effective with -xsr or in the fragment mode) [800]
    -r NUM       bandwidth used in chaining and DP-based alignment [500]
    -n INT       minimal number of minimizers on a chain [3]
    -m INT       minimal chaining score (matching bases minus log gap penalty) [40]
    -X           skip self and dual mappings (for the all-vs-all mode)
    -p FLOAT     min secondary-to-primary score ratio [0.8]
    -N INT       retain at most INT secondary alignments [5]
  Alignment:
    -A INT       matching score [2]
    -B INT       mismatch penalty [4]
    -O INT[,INT] gap open penalty [4,24]
    -E INT[,INT] gap extension penalty; a k-long gap costs min{O1+k*E1,O2+k*E2} [2,1]
    -z INT[,INT] Z-drop score and inversion Z-drop score [400,200]
    -s INT       minimal peak DP alignment score [80]
    -u CHAR      how to find GT-AG. f:transcript strand, b:both strands, n:don't match GT-AG [n]
  Input/Output:
    -a           output in the SAM format (PAF by default)
    -Q           don't output base quality in SAM
    -L           write CIGAR with >65535 ops at the CG tag
    -R STR       SAM read group line in a format like '@RG\tID:foo\tSM:bar' []
    -c           output CIGAR in PAF
    --cs[=STR]   output the cs tag; STR is 'short' (if absent) or 'long' [none]
    --MD         output the MD tag
    -Y           use soft clipping for supplementary alignments
    -t INT       number of threads [3]
    -K NUM       minibatch size for mapping [500M]
    --version    show version number
  Preset:
    -x STR       preset (always applied before other options) []
                 map-pb: -Hk19 (PacBio vs reference mapping)
                 map-ont: -k15 (Oxford Nanopore vs reference mapping)
                 asm5: -k19 -w19 -A1 -B19 -O39,81 -E3,1 -s200 -z200 (asm to ref mapping; break at 5% div.)
                 asm10: -k19 -w19 -A1 -B9 -O16,41 -E2,1 -s200 -z200 (asm to ref mapping; break at 10% div.)
                 ava-pb: -Hk19 -Xw5 -m100 -g10000 --max-chain-skip 25 (PacBio read overlap)
                 ava-ont: -k15 -Xw5 -m100 -g10000 -r2000 --max-chain-skip 25 (ONT read overlap)
                 splice: long-read spliced alignment (see minimap2.1 for details)
                 sr: short single-end reads without splicing (see minimap2.1 for details)

See `man ./minimap2.1' for detailed description of command-line options.
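The -E line above describes a two-piece affine gap cost: a gap of length k costs min{O1+k*E1, O2+k*E2}. With the defaults -O4,24 -E2,1, short gaps are scored by the first piece and long gaps by the second, which has a higher open cost but a cheaper per-base extension; the two pieces are equal at k = 20. The small shell function below (gap_cost is a hypothetical name) only evaluates that formula, so the effect of the defaults or of a preset such as asm5 (-O39,81 -E3,1) can be checked by hand.

```shell
# Two-piece affine gap cost from the -O/-E help text: min{O1+k*E1, O2+k*E2}.
# Arguments: k [O1 E1 O2 E2]; defaults are the 2.10 values -O4,24 -E2,1.
gap_cost() {
    local k=$1 o1=${2:-4} e1=${3:-2} o2=${4:-24} e2=${5:-1}
    local a=$(( o1 + k * e1 )) b=$(( o2 + k * e2 ))
    echo $(( a < b ? a : b ))
}

for k in 1 10 20 30; do
    echo "gap length $k: cost $(gap_cost "$k")"
done
```

With the defaults this gives costs of 6, 24, 44 and 54 for gaps of length 1, 10, 20 and 30: past 20 bases the second piece is cheaper, so long gaps (for example, deletions in long reads) are penalized less than a single affine cost would penalize them.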


Installation

source code from minimap2

System

64-bit Linux