STAR-Teaching: Difference between revisions

From Research Computing Center Wiki
Line 9: Line 9:


=== Version ===
=== Version ===
2.6.0c
2.5.3a
   
   
=== Author / Distributor ===
=== Author / Distributor ===
Line 21: Line 21:
=== Running Program ===
=== Running Program ===


The last version of this application is at /usr/local/apps/eb/STAR/2.6.0c-foss-2016b
The last version of this application is at /usr/local/apps/eb/STAR/2.5.3a-foss-2016b


To use this version, please load the module with
To use this version, please load the module with
<pre class="gscript">
<pre class="gscript">
ml STAR/2.6.0c-foss-2016b  
ml STAR/2.5.3a-foss-2016b  
</pre>  
</pre>  


Line 43: Line 43:
   
   
cd $SLURM_SUBMIT_DIR<br>
cd $SLURM_SUBMIT_DIR<br>
ml STAR/2.6.0c-foss-2016b<br>     
ml STAR/2.5.3a-foss-2016b<br>     
STAR <u>[options]</u><br>   
STAR <u>[options]</u><br>   
</div>
</div>
Line 59: Line 59:
   
   
<pre  class="gcommand">
<pre  class="gcommand">
ml STAR/2.6.0c-foss-2016b  
ml STAR/2.5.3a-foss-2016b  
STAR  
STAR  
Usage: STAR  [options]... --genomeDir REFERENCE  --readFilesIn R1.fq R2.fq
Usage: STAR  [options]... --genomeDir REFERENCE  --readFilesIn R1.fq R2.fq
Line 76: Line 76:
### System
### System
sysShell            -
sysShell            -
     string: path to the shell binary, preferably bash, e.g. /bin/bash.
     string: path to the shell binary, preferrably bash, e.g. /bin/bash.
                     - ... the default shell is executed, typically /bin/sh. This was reported to fail on some Ubuntu systems - then you need to specify path to bash.
                     - ... the default shell is executed, typically /bin/sh. This was reported to fail on some Ubuntu systems - then you need to specify path to bash.


### Run Parameters
### Run Parameters
runMode                        alignReads
runMode                        alignReads
     string: type of the run.
     string: type of the run:
 
                                 alignReads            ... map reads
                                 alignReads            ... map reads
                                 genomeGenerate        ... generate genome files
                                 genomeGenerate        ... generate genome files
Line 99: Line 99:
     int: random number generator seed.
     int: random number generator seed.


### Genome Parameters


### Genome Parameters
genomeDir                  ./GenomeDir/
genomeDir                  ./GenomeDir/
     string: path to the directory where genome files are stored (for --runMode alignReads) or will be generated (for --runMode genomeGenerate)
     string: path to the directory where genome files are stored (if runMode!=genomeGenerate) or will be generated (if runMode==genomeGenerate)


genomeLoad                NoSharedMemory
genomeLoad                NoSharedMemory
     string: mode of shared memory usage for the genome files. Only used with --runMode alignReads.
     string: mode of shared memory usage for the genome files
                           LoadAndKeep    ... load genome into shared and keep it in memory after run
                           LoadAndKeep    ... load genome into shared and keep it in memory after run
                           LoadAndRemove  ... load genome into shared but remove it after run
                           LoadAndRemove  ... load genome into shared but remove it after run
Line 112: Line 112:
                           NoSharedMemory  ... do not use shared memory, each job will have its own private copy of the genome
                           NoSharedMemory  ... do not use shared memory, each job will have its own private copy of the genome


genomeFastaFiles            -
    string(s): path(s) to the fasta files with the genome sequences, separated by spaces. These files should be plain text FASTA files, they *cannot* be zipped.
                            Required for the genome generation (--runMode genomeGenerate). Can also be used in the mapping (--runMode alignReads) to add extra (new) sequences to the genome (e.g. spike-ins).


genomeChainFiles            -
    string: chain files for genomic liftover. Only used with --runMode liftOver .


genomeFileSizes            0
### Genome Generation Parameters
    uint(s)>0: genome files exact sizes in bytes. Typically, this should not be defined by the user.


genomeConsensusFile        -
genomeFastaFiles            -
     string: VCF file with consensus SNPs (i.e. alternative allele is the major (AF>0.5) allele)
     string(s): path(s) to the fasta files with genomic sequences for genome generation, separated by spaces. Only used if runMode==genomeGenerate. These files should be plain text FASTA files, they *cannot* be zipped.


### Genome Indexing Parameters - only used with --runMode genomeGenerate
genomeChrBinNbits          18
genomeChrBinNbits          18
     int: =log2(chrBin), where chrBin is the size of the bins for genome storage: each chromosome will occupy an integer number of bins. For a genome with large number of contigs, it is recommended to scale this parameter as min(18, log2[max(GenomeLength/NumberOfReferences,ReadLength)]).
     int: =log2(chrBin), where chrBin is the size of the bins for genome storage: each chromosome will occupy an integer number of bins


genomeSAindexNbases        14
genomeSAindexNbases        14
     int: length (bases) of the SA pre-indexing string. Typically between 10 and 15. Longer strings will use much more memory, but allow faster searches. For small genomes, the parameter --genomeSAindexNbases must be scaled down to min(14, log2(GenomeLength)/2 - 1).
     int: length (bases) of the SA pre-indexing string. Typically between 10 and 15. Longer strings will use much more memory, but allow faster searches.


genomeSAsparseD            1
genomeSAsparseD            1
Line 138: Line 131:
     int: maximum length of the suffixes, has to be longer than read length. -1 = infinite.
     int: maximum length of the suffixes, has to be longer than read length. -1 = infinite.


genomeChainFiles            -
    string: chain files for genomic liftover
genomeFileSizes            0
    uint(s)>0: genome files exact sizes in bytes. Typically, this should not be defined by the user.


### Splice Junctions Database
### Splice Junctions Database
Line 168: Line 166:
Basic ... only small junction / transcript files
Basic ... only small junction / transcript files
All  ... all files including big Genome, SA and SAindex - this will create a complete genome directory
All  ... all files including big Genome, SA and SAindex - this will create a complete genome directory
### Variation parameters
varVCFfile                              -
    string: path to the VCF file that contains variation data.


### Input Files
### Input Files
Line 178: Line 172:


### Read Parameters
### Read Parameters
readFilesType              Fastx
    string: format of input read files
                            Fastx      ... FASTA or FASTQ
                            SAM SE      ... SAM or BAM single-end reads; for BAM use --readFilesCommand samtools view
                            SAM PE      ... SAM or BAM paired-end reads; for BAM use --readFilesCommand samtools view


readFilesIn                Read1 Read2
readFilesIn                Read1 Read2
     string(s): paths to files that contain input read1 (and, if needed,  read2)
     string(s): paths to files that contain input read1 (and, if needed,  read2)
readFilesPrefix            -
    string: prefix for the read file names, i.e. it will be added in front of the strings in --readFilesIn
                            -: no prefix


readFilesCommand            -
readFilesCommand            -
Line 222: Line 207:


### Limits
### Limits
limitGenomeGenerateRAM              31000000000
limitGenomeGenerateRAM              31000000000
     int>0: maximum available RAM (bytes) for genome generation
     int>0: maximum available RAM (bytes) for genome generation
Line 229: Line 215:


limitOutSAMoneReadBytes              100000
limitOutSAMoneReadBytes              100000
     int>0: max size of the SAM record (bytes) for one read. Recommended value: >(2*(LengthMate1+LengthMate2+100)*outFilterMultimapNmax
     int>0: max size of the SAM record for one read. Recommended value: >(2*(LengthMate1+LengthMate2+100)*outFilterMultimapNmax


limitOutSJoneRead                    1000
limitOutSJoneRead                    1000
Line 238: Line 224:


limitBAMsortRAM                        0
limitBAMsortRAM                        0
     int>=0: maximum available RAM (bytes) for sorting BAM. If =0, it will be set to the genome index size. 0 value can only be used with --genomeLoad NoSharedMemory option.
     int>=0: maximum available RAM for sorting BAM. If =0, it will be set to the genome index size. 0 value can only be used with --genomeLoad NoSharedMemory option.


limitSjdbInsertNsj                    1000000
limitSjdbInsertNsj                    1000000
Line 302: Line 288:
outSAMattributes                Standard
outSAMattributes                Standard
     string: a string of desired SAM attributes, in the order desired for the output SAM
     string: a string of desired SAM attributes, in the order desired for the output SAM
                                 NH HI AS nM NM MD jM jI XS MC ch ... any combination in any order
                                 NH HI AS nM NM MD jM jI XS ch ... any combination in any order
                                None        ... no attributes
                                 Standard   ... NH HI AS nM
                                 Standard   ... NH HI AS nM
                                 All       ... NH HI AS nM NM MD jM jI ch
                                 All         ... NH HI AS nM NM MD jM jI MC ch
                                 None      ... no attributes
                                 vA          ... variant allele
                                vG          ... genomic coordinate of the variant overlapped by the read
                                vW          ... 0/1 - alignment does not pass / passes WASP filtering. Requires --waspOutputMode SAMtag .
                                Unsupported/undocumented:
                                rB          ... alignment block read/genomic coordinates
                                vR          ... read coordinate of the variant


outSAMattrIHstart              1
outSAMattrIHstart              1
Line 322: Line 302:
                                 Within ... output unmapped reads within the main SAM file (i.e. Aligned.out.sam)
                                 Within ... output unmapped reads within the main SAM file (i.e. Aligned.out.sam)
                                 2nd word:
                                 2nd word:
                                 KeepPairs ... record unmapped mate for each alignment, and, in case of unsorted output, keep it adjacent to its mapped mate. Only affects multi-mapping reads.
                                 KeepPairs ... record unmapped mate for each alignment, and, in case of unsorted output, keep it adjacent to its mapped mate.
                                              Only affects multi-mapping reads


outSAMorder                    Paired
outSAMorder                    Paired
Line 372: Line 353:
     int: max number of multiple alignments for a read that will be output to the SAM/BAM files.
     int: max number of multiple alignments for a read that will be output to the SAM/BAM files.
                         -1 ... all alignments (up to --outFilterMultimapNmax) will be output
                         -1 ... all alignments (up to --outFilterMultimapNmax) will be output
outSAMtlen              1
    int: calculation method for the TLEN field in the SAM/BAM files
                        1 ... leftmost base of the (+)strand mate to rightmost base of the (-)mate. (+)sign for the (+)strand mate
                        2 ... leftmost base of any mate to rightmost base of any mate. (+)sign for the mate with the leftmost base. This is different from 1 for overlapping mates with protruding ends


outBAMcompression      1
outBAMcompression      1
Line 384: Line 360:
     int: >=0: number of threads for BAM sorting. 0 will default to min(6,--runThreadN).
     int: >=0: number of threads for BAM sorting. 0 will default to min(6,--runThreadN).


outBAMsortingBinsN      50
### BAM processing
    int: >0:  number of genome bins for coordinate-sorting


### BAM processing
bamRemoveDuplicatesType  -
bamRemoveDuplicatesType  -
     string: mark duplicates in the BAM file, for now only works with (i) sorted BAM fed with inputBAMfile, and (ii) for paired-end alignments only
     string: mark duplicates in the BAM file, for now only works with (i) sorted BAM feeded with inputBAMfile, and (ii) for paired-end alignments only
                         -                      ... no duplicate removal/marking
                         -                      ... no duplicate removal/marking
                         UniqueIdentical        ... mark all multimappers, and duplicate unique mappers. The coordinates, FLAG, CIGAR must be identical
                         UniqueIdentical        ... mark all multimappers, and duplicate unique mappers. The coordinates, FLAG, CIGAR must be identical
Line 438: Line 412:


outFilterMismatchNoverLmax      0.3
outFilterMismatchNoverLmax      0.3
     real: alignment will be output only if its ratio of mismatches to *mapped* length is less than or equal to this value.
     float: alignment will be output only if its ratio of mismatches to *mapped* length is less than or equal to this value.


outFilterMismatchNoverReadLmax  1.0
outFilterMismatchNoverReadLmax  1.0
     real: alignment will be output only if its ratio of mismatches to *read* length is less than or equal to this value.
     float: alignment will be output only if its ratio of mismatches to *read* length is less than or equal to this value.




Line 448: Line 422:


outFilterScoreMinOverLread      0.66
outFilterScoreMinOverLread      0.66
     real: same as outFilterScoreMin, but  normalized to read length (sum of mates' lengths for paired-end reads)
     float: same as outFilterScoreMin, but  normalized to read length (sum of mates' lengths for paired-end reads)


outFilterMatchNmin              0
outFilterMatchNmin              0
Line 454: Line 428:


outFilterMatchNminOverLread    0.66
outFilterMatchNminOverLread    0.66
     real: same as outFilterMatchNmin, but normalized to the read length (sum of mates' lengths for paired-end reads).
     float: same as outFilterMatchNmin, but normalized to the read length (sum of mates' lengths for paired-end reads).


outFilterIntronMotifs          None
outFilterIntronMotifs          None
Line 462: Line 436:
RemoveNoncanonicalUnannotated  ... filter out alignments that contain non-canonical unannotated junctions when using annotated splice junctions database. The annotated non-canonical junctions will be kept.
RemoveNoncanonicalUnannotated  ... filter out alignments that contain non-canonical unannotated junctions when using annotated splice junctions database. The annotated non-canonical junctions will be kept.


outFilterIntronStrands          RemoveInconsistentStrands
 
    string: filter alignments
                RemoveInconsistentStrands      ... remove alignments that have junctions with inconsistent strands
                None                          ... no filtering


### Output Filtering: Splice Junctions
### Output Filtering: Splice Junctions
Line 534: Line 505:


seedSearchStartLmaxOverLread    1.0
seedSearchStartLmaxOverLread    1.0
     real: seedSearchStartLmax normalized to read length (sum of mates' lengths for paired-end reads)
     float: seedSearchStartLmax normalized to read length (sum of mates' lengths for paired-end reads)


seedSearchLmax      0
seedSearchLmax      0
Line 550: Line 521:
seedNoneLociPerWindow    10
seedNoneLociPerWindow    10
     int>0: max number of one seed loci per window
     int>0: max number of one seed loci per window
seedSplitMin                12
    int>0: min length of the seed sequences split by Ns or mate gap


alignIntronMin              21
alignIntronMin              21
Line 577: Line 545:


alignSplicedMateMapLminOverLmate 0.66
alignSplicedMateMapLminOverLmate 0.66
     real>0: alignSplicedMateMapLmin normalized to mate length
     float>0: alignSplicedMateMapLmin normalized to mate length


alignWindowsPerReadNmax    10000
alignWindowsPerReadNmax    10000
Line 602: Line 570:
                                             DiscordantPair ... report alignments with non-zero protrusion as discordant pairs
                                             DiscordantPair ... report alignments with non-zero protrusion as discordant pairs


alignSoftClipAtReferenceEnds   Yes
 
alignSoftClipAtReferenceEnds Yes
     string: allow the soft-clipping of the alignments past the end of the chromosomes
     string: allow the soft-clipping of the alignments past the end of the chromosomes
                                Yes ... allow
                        Yes ... allow
                                No  ... prohibit, useful for compatibility with Cufflinks
                        No  ... prohibit, useful for compatibility with Cufflinks
 
alignInsertionFlush    None
    string: how to flush ambiguous insertion positions
                        None    ... insertions are not flushed
                        Right  ... insertions are flushed to the right
 
### Paired-End reads: presently unsupported/undocumented
peOverlapNbasesMin          0
    int>=0:            minimum number of overlap bases to trigger mates merging and realignment
 
peOverlapMMp                0.1
    real, >=0 & <1:    maximum proportion of mismatched bases in the overlap area


### Windows, Anchors, Binning
### Windows, Anchors, Binning
Line 634: Line 591:


winReadCoverageRelativeMin      0.5
winReadCoverageRelativeMin      0.5
     real>=0: minimum relative coverage of the read sequence by the seeds in a window, for STARlong algorithm only.
     float>=0: minimum relative coverage of the read sequence by the seeds in a window, for STARlong algorithm only.


winReadCoverageBasesMin      0
winReadCoverageBasesMin      0
Line 640: Line 597:


### Chimeric Alignments
### Chimeric Alignments
chimOutType                Junctions
chimOutType                SeparateSAMold
     string(s): type of chimeric output
     string(s): type of chimeric output
                             Junctions      ... Chimeric.out.junction
                             1st word:
                             SeparateSAMold  ... output old SAM into separate Chimeric.out.sam file
                             SeparateSAMold  ... output old SAM into separate Chimeric.out.sam file
                             WithinBAM      ... output into main aligned BAM files (Aligned.*.bam)
                             WithinBAM      ... output into main aligned BAM files (Aligned.*.bam)
                             WithinBAM HardClip  ... (default) hard-clipping in the CIGAR for supplemental chimeric alignments (default if no 2nd word is present)
                            2nd word:
                             WithinBAM HardClip  ... hard-clipping in the CIGAR for supplemental chimeric alignments (default if no 2nd word is present)
                             WithinBAM SoftClip  ... soft-clipping in the CIGAR for supplemental chimeric alignments
                             WithinBAM SoftClip  ... soft-clipping in the CIGAR for supplemental chimeric alignments


Line 655: Line 613:


chimScoreDropMax            20
chimScoreDropMax            20
     int>=0: max drop (difference) of chimeric score (the sum of scores of all chimeric segments) from the read length
     int>=0: max drop (difference) of chimeric score (the sum of scores of all chimeric segements) from the read length


chimScoreSeparation        10
chimScoreSeparation        10
Line 676: Line 634:
chimMainSegmentMultNmax        10
chimMainSegmentMultNmax        10
     int>=1: maximum number of multi-alignments for the main chimeric segment. =1 will prohibit multimapping main segments.
     int>=1: maximum number of multi-alignments for the main chimeric segment. =1 will prohibit multimapping main segments.
chimMultimapNmax                    0
    int>=0: maximum number of chimeric multi-alignments
                                0 ... use the old scheme for chimeric detection which only considered unique alignments
chimMultimapScoreRange          1
    int>=0: the score range for multi-mapping chimeras below the best chimeric score. Only works with --chimMultimapNmax > 1
chimNonchimScoreDropMin        20
    int>=0: to trigger chimeric detection, the drop in the best non-chimeric alignment score with respect to the read length has to be smaller than this value


### Quantification of Annotations
### Quantification of Annotations
Line 711: Line 659:
     int: number of reads to process for the 1st step. Use very large number (or default -1) to map all reads in the first step.
     int: number of reads to process for the 1st step. Use very large number (or default -1) to map all reads in the first step.


 
For more details see:
### WASP parameters
waspOutputMode              None
    string: WASP allele-specific output type. This is a re-implementation of the original WASP mappability filtering by Bryce van de Geijn, Graham McVicker, Yoav Gilad & Jonathan K Pritchard. Please cite the original WASP paper: Nature Methods 12, 1061–1063 (2015), https://www.nature.com/articles/nmeth.3582 .
                            SAMtag      ... add WASP tags to the alignments that pass WASP filtering
 
 
For more details see:
<https://github.com/alexdobin/STAR>
<https://github.com/alexdobin/STAR>
<https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf>
<https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf>

Revision as of 14:45, 15 August 2018

Category

Bioinformatics

Program On

Teaching

Version

2.5.3a

Author / Distributor

STAR

Description

"STAR aligns RNA-seq reads to a reference genome using uncompressed suffix arrays." More details are at STAR

Running Program

The latest version of this application is at /usr/local/apps/eb/STAR/2.5.3a-foss-2016b

To use this version, please load the module with

ml STAR/2.5.3a-foss-2016b 

Here is an example of a shell script, sub.sh, to run on the batch queue:

#!/bin/bash
#SBATCH --job-name=j_STAR
#SBATCH --partition=batch
#SBATCH --mail-type=ALL
#SBATCH --mail-user=username@uga.edu
#SBATCH --ntasks=1
#SBATCH --mem=10gb
#SBATCH --time=08:00:00
#SBATCH --output=STAR.%j.out
#SBATCH --error=STAR.%j.err

cd $SLURM_SUBMIT_DIR
ml STAR/2.5.3a-foss-2016b
STAR [options]

In a real submission script, at least the underlined values above need to be reviewed or replaced with proper values.

Please refer to Running_Jobs_on_the_teaching_cluster, Run X window Jobs and Run interactive Jobs for more details on running jobs on the Teaching cluster.


Here is an example of a job submission command:

sbatch ./sub.sh 

Documentation

ml STAR/2.5.3a-foss-2016b 
STAR 
Usage: STAR  [options]... --genomeDir REFERENCE   --readFilesIn R1.fq R2.fq
Spliced Transcripts Alignment to a Reference (c) Alexander Dobin, 2009-2015

### versions
versionSTAR             020201
    int>0: STAR release numeric ID. Please do not change this value!
versionGenome           020101 020200
    int>0: oldest value of the Genome version compatible with this STAR release. Please do not change this value!

### Parameter Files
parametersFiles          -
    string: name of a user-defined parameters file, "-": none. Can only be defined on the command line.

### System
sysShell            -
    string: path to the shell binary, preferably bash, e.g. /bin/bash.
                    - ... the default shell is executed, typically /bin/sh. This was reported to fail on some Ubuntu systems - then you need to specify path to bash.

### Run Parameters

runMode                         alignReads
    string: type of the run:
                                alignReads             ... map reads
                                genomeGenerate         ... generate genome files
                                inputAlignmentsFromBAM ... input alignments from BAM. Presently only works with --outWigType and --bamRemoveDuplicates.
                                liftOver               ... lift-over of GTF files (--sjdbGTFfile) between genome assemblies using chain file(s) from --genomeChainFiles.

runThreadN                      1
    int: number of threads to run STAR
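
Since runThreadN defaults to 1, a batch job only uses several cores if the option is set explicitly. The sketch below shows one possible way to tie it to the Slurm allocation. It assumes the job requests cores with `#SBATCH --cpus-per-task=N`, which the example script on this page does not do; in that case Slurm exports `SLURM_CPUS_PER_TASK`. The helper name `star_thread_count` is hypothetical.

```shell
# Hypothetical helper: use the CPUs Slurm allocated to this task,
# falling back to STAR's default of 1 thread when the variable is unset.
star_thread_count() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

# Print (not run) the command line that would be used in the job script.
echo "STAR --runThreadN $(star_thread_count) [options]"
```

Inside a job script, the same expression can be placed directly on the STAR command line in place of a hard-coded thread count.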

runDirPerm                      User_RWX
    string: permissions for the directories created at the run-time.
                                User_RWX ... user-read/write/execute
                                All_RWX  ... all-read/write/execute (same as chmod 777)

runRNGseed                      777
    int: random number generator seed.

### Genome Parameters

genomeDir                   ./GenomeDir/
    string: path to the directory where genome files are stored (if runMode!=genomeGenerate) or will be generated (if runMode==genomeGenerate)

genomeLoad                NoSharedMemory
    string: mode of shared memory usage for the genome files
                          LoadAndKeep     ... load genome into shared and keep it in memory after run
                          LoadAndRemove   ... load genome into shared but remove it after run
                          LoadAndExit     ... load genome into shared memory and exit, keeping the genome in memory for future runs
                          Remove          ... do not map anything, just remove loaded genome from memory
                          NoSharedMemory  ... do not use shared memory, each job will have its own private copy of the genome



### Genome Generation Parameters

genomeFastaFiles            -
    string(s): path(s) to the fasta files with genomic sequences for genome generation, separated by spaces. Only used if runMode==genomeGenerate. These files should be plain text FASTA files, they *cannot* be zipped.

genomeChrBinNbits           18
    int: =log2(chrBin), where chrBin is the size of the bins for genome storage: each chromosome will occupy an integer number of bins

genomeSAindexNbases         14
    int: length (bases) of the SA pre-indexing string. Typically between 10 and 15. Longer strings will use much more memory, but allow faster searches.
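
The 2.6.0c help text shown in the revision comparison adds that, for small genomes, genomeSAindexNbases must be scaled down to min(14, log2(GenomeLength)/2 - 1). The following is a minimal sketch of that arithmetic in integer-only shell. Rounding the result down is an assumption made here, and `sa_index_nbases` is a hypothetical helper name, not part of STAR.

```shell
# Hypothetical helper: min(14, floor(log2(GenomeLength)/2 - 1)).
# floor(log2(n)) is obtained by counting how often n can be halved.
sa_index_nbases() {
    n=$1
    log2=0
    while [ "$n" -gt 1 ]; do
        n=$(( n / 2 ))
        log2=$(( log2 + 1 ))
    done
    value=$(( log2 / 2 - 1 ))
    if [ "$value" -gt 14 ]; then
        value=14
    fi
    echo "$value"
}

# Illustrative genome length of 100,000 bases.
sa_index_nbases 100000
```

For a genome of a few billion bases the formula is capped at 14, which is the default value listed above.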

genomeSAsparseD             1
    int>0: suffix array sparsity, i.e. distance between indices: use bigger numbers to decrease needed RAM at the cost of mapping speed reduction

genomeSuffixLengthMax       -1
    int: maximum length of the suffixes, has to be longer than read length. -1 = infinite.

genomeChainFiles            -
    string: chain files for genomic liftover

genomeFileSizes             0
    uint(s)>0: genome files exact sizes in bytes. Typically, this should not be defined by the user.

### Splice Junctions Database
sjdbFileChrStartEnd                     -
    string(s): path to the files with genomic coordinates (chr <tab> start <tab> end <tab> strand) for the splice junction introns. Multiple files can be supplied and will be concatenated.

sjdbGTFfile                             -
    string: path to the GTF file with annotations

sjdbGTFchrPrefix                        -
    string: prefix for chromosome names in a GTF file (e.g. 'chr' for using ENSEMBL annotations with UCSC genomes)

sjdbGTFfeatureExon                      exon
    string: feature type in GTF file to be used as exons for building transcripts

sjdbGTFtagExonParentTranscript          transcript_id
    string: tag name to be used as exons' transcript-parents (default "transcript_id" works for GTF files)

sjdbGTFtagExonParentGene                gene_id
    string: tag name to be used as exons' gene-parents (default "gene_id" works for GTF files)

sjdbOverhang                            100
    int>0: length of the donor/acceptor sequence on each side of the junctions, ideally = (mate_length - 1)
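
To illustrate the "mate_length - 1" rule for sjdbOverhang, the sketch below prints (without running) a genome-generation command built only from options listed in this help text. The helper name `print_genome_generate_cmd` and all file names are hypothetical placeholders.

```shell
# Hypothetical helper: print a genomeGenerate command line in which
# --sjdbOverhang is derived from the mate length as (mate_length - 1).
print_genome_generate_cmd() {
    genome_dir=$1
    fasta=$2
    gtf=$3
    mate_len=$4
    overhang=$(( mate_len - 1 ))
    echo "STAR --runMode genomeGenerate --genomeDir $genome_dir" \
         "--genomeFastaFiles $fasta --sjdbGTFfile $gtf --sjdbOverhang $overhang"
}

# Illustrative call for 101-base mates.
print_genome_generate_cmd ./GenomeDir genome.fa annotation.gtf 101
```

Printing the command first makes it easy to review the derived overhang before pasting the line into a submission script.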

sjdbScore                               2
    int: extra alignment score for alignments that cross database junctions

sjdbInsertSave                          Basic
    string: which files to save when sjdb junctions are inserted on the fly at the mapping step
					Basic ... only small junction / transcript files
					All   ... all files including big Genome, SA and SAindex - this will create a complete genome directory

### Input Files
inputBAMfile                -
    string: path to BAM input file, to be used with --runMode inputAlignmentsFromBAM

### Read Parameters

readFilesIn                 Read1 Read2
    string(s): paths to files that contain input read1 (and, if needed,  read2)

readFilesCommand             -
    string(s): command line to execute for each of the input files. This command should generate FASTA or FASTQ text and send it to stdout
               For example: zcat - to uncompress .gz files, bzcat - to uncompress .bz2 files, etc.
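
As a small illustration of the readFilesCommand examples above, the following sketch picks a decompression command from the read file's extension. The helper name `pick_read_files_command` and the file name are hypothetical; only the zcat and bzcat cases named in the help text are covered.

```shell
# Hypothetical helper: choose a value for --readFilesCommand based on
# the file extension; "-" is the default and means no command.
pick_read_files_command() {
    case "$1" in
        *.gz)  echo "zcat" ;;
        *.bz2) echo "bzcat" ;;
        *)     echo "-" ;;
    esac
}

# Illustrative call with a gzip-compressed FASTQ file.
pick_read_files_command sample_R1.fastq.gz
```

The printed value can then be passed as `--readFilesCommand` alongside `--readFilesIn`.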

readMapNumber               -1
    int: number of reads to map from the beginning of the file
                            -1: map all reads

readMatesLengthsIn          NotEqual
    string: Equal/NotEqual - lengths of names,sequences,qualities for both mates are the same  / not the same. NotEqual is safe in all situations.

readNameSeparator           /
    string(s): character(s) separating the part of the read names that will be trimmed in output (read name after space is always trimmed)

clip3pNbases                 0
    int(s): number(s) of bases to clip from 3p of each mate. If one value is given, it will be assumed the same for both mates.

clip5pNbases                 0
    int(s): number(s) of bases to clip from 5p of each mate. If one value is given, it will be assumed the same for both mates.

clip3pAdapterSeq            -
    string(s): adapter sequences to clip from 3p of each mate.  If one value is given, it will be assumed the same for both mates.

clip3pAdapterMMp            0.1
    double(s): max proportion of mismatches for 3p adapter clipping for each mate.  If one value is given, it will be assumed the same for both mates.

clip3pAfterAdapterNbases    0
    int(s): number of bases to clip from 3p of each mate after the adapter clipping. If one value is given, it will be assumed the same for both mates.


### Limits

limitGenomeGenerateRAM               31000000000
    int>0: maximum available RAM (bytes) for genome generation

limitIObufferSize                    150000000
    int>0: max available buffers size (bytes) for input/output, per thread

limitOutSAMoneReadBytes              100000
    int>0: max size of the SAM record (bytes) for one read. Recommended value: >2*(LengthMate1+LengthMate2+100)*outFilterMultimapNmax
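
The recommended lower bound for limitOutSAMoneReadBytes can be worked out with simple shell arithmetic. This is a sketch of the formula quoted above; the helper name `min_sam_record_bytes` and the read lengths are illustrative assumptions.

```shell
# Hypothetical helper implementing the recommendation from the help text:
#   2 * (LengthMate1 + LengthMate2 + 100) * outFilterMultimapNmax
min_sam_record_bytes() {
    mate1_len=$1
    mate2_len=$2
    multimap_nmax=$3
    echo $(( 2 * (mate1_len + mate2_len + 100) * multimap_nmax ))
}

# Illustrative values: 2x150-base mates, up to 20 multimapping alignments.
min_sam_record_bytes 150 150 20
```

For these values the bound is 16000 bytes, well under the listed default of 100000, so the limit would only need raising for much longer reads or a much larger outFilterMultimapNmax.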

limitOutSJoneRead                    1000
    int>0: max number of junctions for one read (including all multi-mappers)

limitOutSJcollapsed                  1000000
    int>0: max number of collapsed junctions

limitBAMsortRAM                         0
    int>=0: maximum available RAM (bytes) for sorting BAM. If =0, it will be set to the genome index size. 0 value can only be used with --genomeLoad NoSharedMemory option.

limitSjdbInsertNsj                     1000000
    int>=0: maximum number of junctions to be inserted to the genome on the fly at the mapping stage, including those from annotations and those detected in the 1st step of the 2-pass run


### Output: general
outFileNamePrefix               ./
    string: output files name prefix (including full or relative path). Can only be defined on the command line.

outTmpDir                       -
    string: path to a directory that will be used as temporary by STAR. All contents of this directory will be removed!
            - the temp directory will default to outFileNamePrefix_STARtmp

outTmpKeep                      None
    string: whether to keep the temporary files after the STAR run is finished
                                None ... remove all temporary files
                                All  ... keep all files

outStd                          Log
    string: which output will be directed to stdout (standard out)
                                Log                    ... log messages
                                SAM                    ... alignments in SAM format (which normally are output to Aligned.out.sam file), normal standard output will go into Log.std.out
                                BAM_Unsorted           ... alignments in BAM format, unsorted. Requires --outSAMtype BAM Unsorted
                                BAM_SortedByCoordinate ... alignments in BAM format, sorted by coordinate. Requires --outSAMtype BAM SortedByCoordinate
                                BAM_Quant              ... alignments to transcriptome in BAM format, unsorted. Requires --quantMode TranscriptomeSAM

outReadsUnmapped                None
    string: output of unmapped and partially mapped (i.e. only one mate of a paired-end read mapped) reads in separate file(s).
                                None    ... no output
                                Fastx   ... output in separate fasta/fastq files, Unmapped.out.mate1/2

outQSconversionAdd              0
    int: add this number to the quality score (e.g. to convert from Illumina to Sanger, use -31)

outMultimapperOrder             Old_2.4
    string: order of multimapping alignments in the output files
                                Old_2.4             ... quasi-random order used before 2.5.0
                                Random              ... random order of alignments for each multi-mapper. Read mates (pairs) are always adjacent, all alignments for each read stay together. This option will become the default in future releases.

### Output: SAM and BAM
outSAMtype                      SAM
    strings: type of SAM/BAM output
                                1st word:
                                BAM  ... output BAM without sorting
                                SAM  ... output SAM without sorting
                                None ... no SAM/BAM output
                                2nd, 3rd:
                                Unsorted           ... standard unsorted
                                SortedByCoordinate ... sorted by coordinate. This option will allocate extra memory for sorting which can be specified by --limitBAMsortRAM.

outSAMmode                      Full
    string: mode of SAM output
                                None ... no SAM output
                                Full ... full SAM output
                                NoQS ... full SAM but without quality scores

outSAMstrandField                               None
    string: Cufflinks-like strand field flag
                                None        ... not used
                                intronMotif ... strand derived from the intron motif. Reads with inconsistent and/or non-canonical introns are filtered out.

outSAMattributes                Standard
    string: a string of desired SAM attributes, in the order desired for the output SAM
                                NH HI AS nM NM MD jM jI XS ch ... any combination in any order
                                Standard   ... NH HI AS nM
                                All        ... NH HI AS nM NM MD jM jI ch
                                None       ... no attributes

outSAMattrIHstart               1
    int>=0: start value for the IH attribute. 0 may be required by some downstream software, such as Cufflinks or StringTie.

outSAMunmapped                  None
    string(s): output of unmapped reads in the SAM format
                                1st word:
                                None   ... no output
                                Within ... output unmapped reads within the main SAM file (i.e. Aligned.out.sam)
                                2nd word:
                                KeepPairs ... record unmapped mate for each alignment, and, in case of unsorted output, keep it adjacent to its mapped mate.
                                              Only affects multi-mapping reads

outSAMorder                     Paired
    string: type of sorting for the SAM output
                                Paired: one mate after the other for all paired alignments
                                PairedKeepInputOrder: one mate after the other for all paired alignments, the order is kept the same as in the input FASTQ files

outSAMprimaryFlag		OneBestScore
    string: which alignments are considered primary - all others will be marked with 0x100 bit in the FLAG
                                OneBestScore ... only one alignment with the best score is primary
                                AllBestScore ... all alignments with the best score are primary

outSAMreadID			Standard
    string: read ID record type
                                Standard ... first word (until space) from the FASTx read ID line, removing /1,/2 from the end
                                Number   ... read number (index) in the FASTx file

outSAMmapqUnique        255
    int: 0 to 255: the MAPQ value for unique mappers

outSAMflagOR           0
    int: 0 to 65535: SAM FLAG will be bitwise OR'd with this value, i.e. FLAG=FLAG | outSAMflagOR. This is applied after all flags have been set by STAR, and after outSAMflagAND. Can be used to set specific bits that are not set otherwise.

outSAMflagAND           65535
    int: 0 to 65535: SAM FLAG will be bitwise AND'd with this value, i.e. FLAG=FLAG & outSAMflagAND. This is applied after all flags have been set by STAR, but before outSAMflagOR. Can be used to unset specific bits.

outSAMattrRGline        -
    string(s): SAM/BAM read group line. The first word contains the read group identifier and must start with "ID:", e.g. --outSAMattrRGline ID:xxx CN:yy "DS:z z z".
            xxx will be added as RG tag to each output alignment. Tag values that contain spaces have to be double quoted.
            Comma separated RG lines correspond to different (comma separated) input files in --readFilesIn. Commas have to be surrounded by spaces, e.g.
            --outSAMattrRGline ID:xxx , ID:zzz "DS:z z" , ID:yyy DS:yyyy

outSAMheaderHD          -
    strings: @HD (header) line of the SAM header

outSAMheaderPG          -
    strings: extra @PG (software) line of the SAM header (in addition to STAR)

outSAMheaderCommentFile -
    string: path to the file with @CO (comment) lines of the SAM header

outSAMfilter            None
    string(s): filter the output into main SAM/BAM files
                        KeepOnlyAddedReferences ... only keep the reads for which all alignments are to the extra reference sequences added with --genomeFastaFiles at the mapping stage.
                        KeepAllAddedReferences ...  keep all alignments to the extra reference sequences added with --genomeFastaFiles at the mapping stage.


outSAMmultNmax          -1
    int: max number of multiple alignments for a read that will be output to the SAM/BAM files.
                        -1 ... all alignments (up to --outFilterMultimapNmax) will be output

outBAMcompression       1
    int: -1 to 10  BAM compression level, -1=default compression (6?), 0=no compression, 10=maximum compression

outBAMsortingThreadN    0
    int: >=0: number of threads for BAM sorting. 0 will default to min(6,--runThreadN).
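
The combined effect of outSAMflagAND and outSAMflagOR can be checked with shell arithmetic before a run. This is a minimal sketch of the order stated in their descriptions (AND first, then OR); the helper name apply_flag_masks and the example FLAG value are assumptions for illustration and are not part of STAR.

```shell
# Order stated in the help text: outSAMflagAND is applied first, then outSAMflagOR.
# apply_flag_masks is a hypothetical helper name, not part of STAR.
apply_flag_masks() {
    local flag=$1 and_mask=$2 or_mask=$3
    echo $(( (flag & and_mask) | or_mask ))
}

# Clear the 0x100 (256) bit of FLAG 355 and set nothing extra.
apply_flag_masks 355 $(( 65535 & ~256 )) 0

# Keep every bit of FLAG 99 and additionally set the 0x400 (1024) bit.
apply_flag_masks 99 65535 1024
```

The first call prints 99 and the second prints 1123. The AND mask used in the first call is 65279, which is the value that would be passed to --outSAMflagAND.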

### BAM processing

bamRemoveDuplicatesType  -
    string: mark duplicates in the BAM file, for now only works with (i) sorted BAM fed in with inputBAMfile, and (ii) for paired-end alignments only
                        -                       ... no duplicate removal/marking
                        UniqueIdentical         ... mark all multimappers, and duplicate unique mappers. The coordinates, FLAG, CIGAR must be identical
                        UniqueIdenticalNotMulti  ... mark duplicate unique mappers but not multimappers.

bamRemoveDuplicatesMate2basesN   0
    int>0: number of bases from the 5' of mate 2 to use in collapsing (e.g. for RAMPAGE)

### Output Wiggle
outWigType          None
    string(s): type of signal output, e.g. "bedGraph" OR "bedGraph read1_5p". Requires sorted BAM: --outSAMtype BAM SortedByCoordinate .
                    1st word:
                    None       ... no signal output
                    bedGraph   ... bedGraph format
                    wiggle     ... wiggle format
                    2nd word:
                    read1_5p   ... signal from only 5' of the 1st read, useful for CAGE/RAMPAGE etc
                    read2      ... signal from only 2nd read

outWigStrand        Stranded
    string: strandedness of wiggle/bedGraph output
                    Stranded   ...  separate strands, str1 and str2
                    Unstranded ...  collapsed strands

outWigReferencesPrefix    -
    string: prefix matching reference names to include in the output wiggle file, e.g. "chr", default "-" - include all references

outWigNorm              RPM
    string: type of normalization for the signal
                        RPM    ... reads per million of mapped reads
                        None   ... no normalization, "raw" counts

### Output Filtering
outFilterType                   Normal
    string: type of filtering
                                Normal  ... standard filtering using only current alignment
                                BySJout ... keep only those reads that contain junctions that passed filtering into SJ.out.tab

outFilterMultimapScoreRange     1
    int: the score range below the maximum score for multimapping alignments

outFilterMultimapNmax           10
    int: maximum number of loci the read is allowed to map to. Alignments (all of them) will be output only if the read maps to no more loci than this value.
         Otherwise no alignments will be output, and the read will be counted as "mapped to too many loci" in the Log.final.out .

outFilterMismatchNmax           10
    int: alignment will be output only if it has no more mismatches than this value.

outFilterMismatchNoverLmax      0.3
    float: alignment will be output only if its ratio of mismatches to *mapped* length is less than or equal to this value.

outFilterMismatchNoverReadLmax  1.0
    float: alignment will be output only if its ratio of mismatches to *read* length is less than or equal to this value.


outFilterScoreMin               0
    int: alignment will be output only if its score is higher than or equal to this value.

outFilterScoreMinOverLread      0.66
    float: same as outFilterScoreMin, but  normalized to read length (sum of mates' lengths for paired-end reads)

outFilterMatchNmin              0
    int: alignment will be output only if the number of matched bases is higher than or equal to this value.

outFilterMatchNminOverLread     0.66
    float: same as outFilterMatchNmin, but normalized to the read length (sum of mates' lengths for paired-end reads).

outFilterIntronMotifs           None
    string: filter alignments using their intron motifs
				None                           ... no filtering
				RemoveNoncanonical             ... filter out alignments that contain non-canonical junctions
				RemoveNoncanonicalUnannotated  ... filter out alignments that contain non-canonical unannotated junctions when using annotated splice junctions database. The annotated non-canonical junctions will be kept.



### Output Filtering: Splice Junctions
outSJfilterReads                All
    string: which reads to consider for collapsed splice junctions output
                All: all reads, unique- and multi-mappers
                Unique: uniquely mapping reads only

outSJfilterOverhangMin          30  12  12  12
    4 integers:    minimum overhang length for splice junctions on both sides for: (1) non-canonical motifs, (2) GT/AG and CT/AC motif, (3) GC/AG and CT/GC motif, (4) AT/AC and GT/AT motif. -1 means no output for that motif
                                does not apply to annotated junctions

outSJfilterCountUniqueMin       3   1   1   1
    4 integers: minimum uniquely mapping read count per junction for: (1) non-canonical motifs, (2) GT/AG and CT/AC motif, (3) GC/AG and CT/GC motif, (4) AT/AC and GT/AT motif. -1 means no output for that motif
                                Junctions are output if either the outSJfilterCountUniqueMin OR the outSJfilterCountTotalMin condition is satisfied
                                does not apply to annotated junctions

outSJfilterCountTotalMin     3   1   1   1
    4 integers: minimum total (multi-mapping+unique) read count per junction for: (1) non-canonical motifs, (2) GT/AG and CT/AC motif, (3) GC/AG and CT/GC motif, (4) AT/AC and GT/AT motif. -1 means no output for that motif
                                Junctions are output if either the outSJfilterCountUniqueMin OR the outSJfilterCountTotalMin condition is satisfied
                                does not apply to annotated junctions

outSJfilterDistToOtherSJmin     10  0   5   10
    4 integers>=0: minimum allowed distance to other junctions' donor/acceptor
                                does not apply to annotated junctions

outSJfilterIntronMaxVsReadN        50000 100000 200000
    N integers>=0: maximum gap allowed for junctions supported by 1,2,3,...,N reads
                                i.e. by default junctions supported by 1 read can have gaps <=50000b, by 2 reads: <=100000b, by 3 reads: <=200000b, by >=4 reads: any gap <=alignIntronMax
                                does not apply to annotated junctions
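
The lookup described for outSJfilterIntronMaxVsReadN can be sketched in shell as follows. The helper name max_gap_for_support is hypothetical, the read count is assumed to be at least 1, and the alignIntronMax value of 500000 in the example is an arbitrary illustration rather than a STAR default.

```shell
# max_gap_for_support is a hypothetical helper, not part of STAR.
# Arguments: number of supporting reads (>=1), alignIntronMax,
# then the list of outSJfilterIntronMaxVsReadN values.
max_gap_for_support() {
    local n_reads=$1 align_intron_max=$2
    shift 2
    local i=1 limit
    for limit in "$@"; do
        # The i-th value applies to junctions supported by exactly i reads.
        if [ "$i" -eq "$n_reads" ]; then
            echo "$limit"
            return
        fi
        i=$(( i + 1 ))
    done
    # More supporting reads than listed values: only alignIntronMax applies.
    echo "$align_intron_max"
}

# Default list 50000 100000 200000, with an illustrative alignIntronMax of 500000.
max_gap_for_support 2 500000 50000 100000 200000
max_gap_for_support 4 500000 50000 100000 200000
```

The two calls print 100000 and 500000 respectively. As the parameter description notes, these limits do not apply to annotated junctions.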

### Scoring
scoreGap                     0
    int: splice junction penalty (independent of intron motif)

scoreGapNoncan               -8
    int: non-canonical junction penalty (in addition to scoreGap)

scoreGapGCAG                 -4
    GC/AG and CT/GC junction penalty (in addition to scoreGap)

scoreGapATAC                 -8
    AT/AC  and GT/AT junction penalty  (in addition to scoreGap)

scoreGenomicLengthLog2scale   -0.25
    extra score logarithmically scaled with genomic length of the alignment: scoreGenomicLengthLog2scale*log2(genomicLength)

scoreDelOpen                 -2
    deletion open penalty

scoreDelBase                 -2
    deletion extension penalty per base (in addition to scoreDelOpen)

scoreInsOpen                 -2
    insertion open penalty

scoreInsBase                 -2
    insertion extension penalty per base (in addition to scoreInsOpen)

scoreStitchSJshift           1
    maximum score reduction while searching for SJ boundaries in the stitching step


### Alignments and Seeding

seedSearchStartLmax             50
    int>0: defines the search start point through the read - the read is split into pieces no longer than this value

seedSearchStartLmaxOverLread    1.0
    float: seedSearchStartLmax normalized to read length (sum of mates' lengths for paired-end reads)

seedSearchLmax       0
    int>=0: defines the maximum length of the seeds, if =0 max seed length is infinite

seedMultimapNmax      10000
    int>0: only pieces that map fewer than this number of times are utilized in the stitching procedure

seedPerReadNmax       1000
    int>0: max number of seeds per read

seedPerWindowNmax     50
    int>0: max number of seeds per window

seedNoneLociPerWindow    10
    int>0: max number of one seed loci per window

alignIntronMin              21
    minimum intron size: genomic gap is considered intron if its length>=alignIntronMin, otherwise it is considered Deletion

alignIntronMax              0
    maximum intron size, if 0, max intron size will be determined by (2^winBinNbits)*winAnchorDistNbins

alignMatesGapMax            0
    maximum gap between two mates, if 0, max intron gap will be determined by (2^winBinNbits)*winAnchorDistNbins

alignSJoverhangMin          5
    int>0: minimum overhang (i.e. block size) for spliced alignments

alignSJstitchMismatchNmax   0 -1 0 0
    4*int>=0: maximum number of mismatches for stitching of the splice junctions (-1: no limit).
                            (1) non-canonical motifs, (2) GT/AG and CT/AC motif, (3) GC/AG and CT/GC motif, (4) AT/AC and GT/AT motif.

alignSJDBoverhangMin        3
    int>0: minimum overhang (i.e. block size) for annotated (sjdb) spliced alignments

alignSplicedMateMapLmin     0
    int>0: minimum mapped length for a read mate that is spliced

alignSplicedMateMapLminOverLmate 0.66
    float>0: alignSplicedMateMapLmin normalized to mate length

alignWindowsPerReadNmax     10000
    int>0: max number of windows per read

alignTranscriptsPerWindowNmax       100
    int>0: max number of transcripts per window

alignTranscriptsPerReadNmax               10000
    int>0: max number of different alignments per read to consider

alignEndsType           Local
    string: type of read ends alignment
                        Local             ... standard local alignment with soft-clipping allowed
                        EndToEnd          ... force end-to-end read alignment, do not soft-clip
                        Extend5pOfRead1   ... fully extend only the 5p of the read1, all other ends: local alignment
                        Extend5pOfReads12 ... fully extend only the 5p of both read1 and read2, all other ends: local alignment

alignEndsProtrude       0    ConcordantPair
    int, string:        allow protrusion of alignment ends, i.e. start (end) of the +strand mate downstream of the start (end) of the -strand mate
                        1st word: int: maximum number of protrusion bases allowed
                        2nd word: string: 
                                            ConcordantPair ... report alignments with non-zero protrusion as concordant pairs
                                            DiscordantPair ... report alignments with non-zero protrusion as discordant pairs


alignSoftClipAtReferenceEnds Yes
    string: allow the soft-clipping of the alignments past the end of the chromosomes
                        Yes ... allow
                        No  ... prohibit, useful for compatibility with Cufflinks

### Windows, Anchors, Binning

winAnchorMultimapNmax           50
    int>0: max number of loci anchors are allowed to map to

winBinNbits                     16
    int>0: =log2(winBin), where winBin is the size of the bin for the windows/clustering, each window will occupy an integer number of bins.

winAnchorDistNbins              9
    int>0: max number of bins between two anchors that allows aggregation of anchors into one window

winFlankNbins                   4
    int>0: log2(winFlank), where winFlank is the size of the left and right flanking regions for each window

winReadCoverageRelativeMin      0.5
    float>=0: minimum relative coverage of the read sequence by the seeds in a window, for STARlong algorithm only.

winReadCoverageBasesMin      0
    int>0: minimum number of bases covered by the seeds in a window, for STARlong algorithm only.
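
When alignIntronMax or alignMatesGapMax is left at 0, the effective maximum is given by (2^winBinNbits)*winAnchorDistNbins, as stated in their descriptions. The sketch below evaluates that expression; the helper name max_window_span is an assumption for illustration and is not part of STAR.

```shell
# Evaluates (2^winBinNbits) * winAnchorDistNbins from the help text.
# max_window_span is a hypothetical helper name, not part of STAR.
max_window_span() {
    local win_bin_nbits=$1 win_anchor_dist_nbins=$2
    echo $(( (1 << win_bin_nbits) * win_anchor_dist_nbins ))
}

# Defaults listed above: winBinNbits=16, winAnchorDistNbins=9.
max_window_span 16 9
```

With the default values this prints 589824, i.e. 65536 bases per bin times 9 bins.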

### Chimeric Alignments
chimOutType                 SeparateSAMold
    string(s): type of chimeric output
                            1st word:
                            SeparateSAMold  ... output old SAM into separate Chimeric.out.sam file
                            WithinBAM       ... output into main aligned BAM files (Aligned.*.bam)
                            2nd word:
                            WithinBAM HardClip  ... hard-clipping in the CIGAR for supplemental chimeric alignments (default if no 2nd word is present)
                            WithinBAM SoftClip  ... soft-clipping in the CIGAR for supplemental chimeric alignments

chimSegmentMin              0
    int>=0: minimum length of chimeric segment, if ==0, no chimeric output

chimScoreMin                0
    int>=0: minimum total (summed) score of the chimeric segments

chimScoreDropMax            20
    int>=0: max drop (difference) of chimeric score (the sum of scores of all chimeric segments) from the read length

chimScoreSeparation         10
    int>=0: minimum difference (separation) between the best chimeric score and the next one

chimScoreJunctionNonGTAG    -1
    int: penalty for a non-GT/AG chimeric junction

chimJunctionOverhangMin     20
    int>=0: minimum overhang for a chimeric junction

chimSegmentReadGapMax       0
    int>=0: maximum gap in the read sequence between chimeric segments

chimFilter                  banGenomicN
    string(s): different filters for chimeric alignments
                            None ... no filtering
                            banGenomicN ... Ns are not allowed in the genome sequence around the chimeric junction

chimMainSegmentMultNmax        10
    int>=1: maximum number of multi-alignments for the main chimeric segment. =1 will prohibit multimapping main segments.

### Quantification of Annotations
quantMode                   -
    string(s): types of quantification requested
                            -                ... none
                            TranscriptomeSAM ... output SAM/BAM alignments to transcriptome into a separate file
                            GeneCounts       ... count reads per gene

quantTranscriptomeBAMcompression    1       1
    int: -1 to 10  transcriptome BAM compression level, -1=default compression (6?), 0=no compression, 10=maximum compression

quantTranscriptomeBan       IndelSoftclipSingleend
    string: prohibit various alignment types
                            IndelSoftclipSingleend  ... prohibit indels, soft clipping and single-end alignments - compatible with RSEM
                            Singleend               ... prohibit single-end alignments

### 2-pass Mapping
twopassMode                 None
    string: 2-pass mapping mode.
                            None        ... 1-pass mapping
                            Basic       ... basic 2-pass mapping, with all 1st pass junctions inserted into the genome indices on the fly

twopass1readsN              -1
    int: number of reads to process for the 1st step. Use a very large number (or default -1) to map all reads in the first step.
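
Several of the options listed above are commonly combined in a single run. The following shell sketch only assembles and prints such a command line so it can be reviewed before being added to a job script; it does not run STAR. The helper name print_star_command, the index path, the FASTQ file names, the thread count and the output prefix are all hypothetical placeholders.

```shell
# Hypothetical inputs; replace them with your own index directory and FASTQ files.
genome_dir=/path/to/genome_index
reads_1=sample_R1.fq
reads_2=sample_R2.fq

# print_star_command is an illustrative helper, not part of STAR. It only prints
# the command line; nothing is executed.
print_star_command() {
    local threads=$1 prefix=$2
    echo STAR \
        --runThreadN "$threads" \
        --genomeDir "$genome_dir" \
        --readFilesIn "$reads_1" "$reads_2" \
        --outFileNamePrefix "$prefix" \
        --outSAMtype BAM SortedByCoordinate \
        --quantMode GeneCounts \
        --twopassMode Basic
}

print_star_command 4 sample_
```

The printed line requests coordinate-sorted BAM output, per-gene read counts and basic 2-pass mapping. Because sorting allocates extra memory, --limitBAMsortRAM may also need to be set, as noted under outSAMtype.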

For more details see:
<https://github.com/alexdobin/STAR>
<https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf>


Installation

Source code is obtained from STAR

System

64-bit Linux