STAR-Teaching
Revision as of 14:10, 10 August 2018
Category
Bioinformatics
Program On
Teaching
Version
2.5.3a
Author / Distributor
Description
"STAR aligns RNA-seq reads to a reference genome using uncompressed suffix arrays." More details are at STAR
Running Program
The latest version of this application is installed at /usr/local/apps/eb/STAR/2.5.3a-foss-2016b
To use this version, please load the module with
ml STAR/2.5.3a-foss-2016b
Here is an example of a shell script, sub.sh, to run on the batch queue:
#!/bin/bash
#SBATCH --job-name=j_STAR
#SBATCH --partition=batch
#SBATCH --mail-type=ALL
#SBATCH --mail-user=username@uga.edu
#SBATCH --ntasks=1
#SBATCH --mem=10gb
#SBATCH --time=08:00:00
#SBATCH --output=STAR.%j.out
#SBATCH --error=STAR.%j.err
cd $SLURM_SUBMIT_DIR
ml STAR/2.5.3a-foss-2016b
STAR [options]
In a real submission script, at least the job-specific values above, such as the job name, e-mail address, memory, time limit, and STAR [options], need to be reviewed or replaced with proper values.
Please refer to Running_Jobs_on_the_teaching_cluster, Run X window Jobs, and Run interactive Jobs for more details on running jobs on the Teaching cluster.
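As a rough illustration of what might replace the `STAR [options]` line, the sketch below builds a genome index and then aligns paired-end reads, using only options that appear in the Documentation section. All file names, the mate length, the output prefix, and the `RUN_STAR` switch are hypothetical placeholders introduced for this example. By default it only prints the two commands so they can be checked before a job is submitted.

```shell
#!/bin/bash
# Sketch only: every file name and value below is a hypothetical placeholder.
GENOME_DIR=./GenomeDir              # STAR's default --genomeDir
GENOME_FASTA=genome.fa              # plain-text FASTA; the help text says it cannot be zipped
ANNOTATION_GTF=annotation.gtf       # annotations passed to --sjdbGTFfile
READ1=R1.fq.gz                      # gzipped reads, decompressed via --readFilesCommand zcat
READ2=R2.fq.gz
MATE_LENGTH=100                     # length of one mate, used for --sjdbOverhang
# SLURM_CPUS_PER_TASK is only set when --cpus-per-task is requested;
# otherwise fall back to STAR's default of one thread.
THREADS=${SLURM_CPUS_PER_TASK:-1}

# Step 1: generate the genome index (needed once per genome and annotation).
# The help text suggests sjdbOverhang = mate_length - 1.
generate_cmd=(STAR --runMode genomeGenerate
    --runThreadN "$THREADS"
    --genomeDir "$GENOME_DIR"
    --genomeFastaFiles "$GENOME_FASTA"
    --sjdbGTFfile "$ANNOTATION_GTF"
    --sjdbOverhang "$((MATE_LENGTH - 1))")

# Step 2: align the reads and write a coordinate-sorted BAM file.
align_cmd=(STAR --runMode alignReads
    --runThreadN "$THREADS"
    --genomeDir "$GENOME_DIR"
    --readFilesIn "$READ1" "$READ2"
    --readFilesCommand zcat
    --outSAMtype BAM SortedByCoordinate
    --outFileNamePrefix sample1_)

# Print a command, and execute it only when RUN_STAR=1 is set.
run_or_print() {
    echo "$*"
    if [ "${RUN_STAR:-0}" = "1" ]; then
        "$@"
    fi
}

if [ "${RUN_STAR:-0}" = "1" ]; then
    mkdir -p "$GENOME_DIR"          # make sure the index directory exists
fi
run_or_print "${generate_cmd[@]}"
run_or_print "${align_cmd[@]}"
```

Setting `RUN_STAR=1` in the job script, after the module is loaded, would run both steps. Note that `limitGenomeGenerateRAM` defaults to 31000000000 bytes, so the `--mem` request in the submission script may need to be adjusted for the genome being indexed.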
Here is an example of a job submission command:
sbatch ./sub.sh
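Before submitting, it can help to work out the few option values that depend on read length. The Documentation section states rules for `sjdbOverhang`, `limitOutSAMoneReadBytes`, and the default maximum intron size. The sketch below turns those three rules into shell arithmetic. The function names are hypothetical helpers made up for this example and are not part of STAR.

```shell
#!/bin/bash
# Sketch only: these helper functions are hypothetical, not part of STAR.

# --sjdbOverhang: the help text suggests mate_length - 1.
# Argument: mate length in bases.
sjdb_overhang() {
    echo $(( $1 - 1 ))
}

# --limitOutSAMoneReadBytes: the help text recommends a value larger than
# 2*(LengthMate1+LengthMate2+100)*outFilterMultimapNmax; this prints that bound.
# Arguments: mate 1 length, mate 2 length, outFilterMultimapNmax.
sam_record_bytes_bound() {
    echo $(( 2 * ($1 + $2 + 100) * $3 ))
}

# Span used when --alignIntronMax or --alignMatesGapMax is 0:
# (2^winBinNbits)*winAnchorDistNbins.
# Arguments: winBinNbits, winAnchorDistNbins.
window_span() {
    echo $(( (1 << $1) * $2 ))
}

echo "sjdbOverhang for 100-base mates: $(sjdb_overhang 100)"
echo "limitOutSAMoneReadBytes bound (2x100 bases, 10 loci): $(sam_record_bytes_bound 100 100 10)"
echo "span with winBinNbits=16, winAnchorDistNbins=9: $(window_span 16 9)"
```

For 100-base paired-end reads with the default `outFilterMultimapNmax` of 10, the bound works out to 6000 bytes, which is well under the default `limitOutSAMoneReadBytes` of 100000, so that limit would only need raising for much longer reads or a much higher multimapping cap.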
Documentation
ml STAR/2.5.3a-foss-2016b
STAR
Usage: STAR [options]... --genomeDir REFERENCE --readFilesIn R1.fq R2.fq
Spliced Transcripts Alignment to a Reference (c) Alexander Dobin, 2009-2015

### versions
versionSTAR 020201
    int>0: STAR release numeric ID. Please do not change this value!

versionGenome 020101 020200
    int>0: oldest value of the Genome version compatible with this STAR release. Please do not change this value!

### Parameter Files
parametersFiles -
    string: name of a user-defined parameters file, "-": none. Can only be defined on the command line.

### System
sysShell -
    string: path to the shell binary, preferrably bash, e.g. /bin/bash.
        - ... the default shell is executed, typically /bin/sh. This was reported to fail on some Ubuntu systems - then you need to specify path to bash.

### Run Parameters
runMode alignReads
    string: type of the run:
        alignReads ... map reads
        genomeGenerate ... generate genome files
        inputAlignmentsFromBAM ... input alignments from BAM. Presently only works with --outWigType and --bamRemoveDuplicates.
        liftOver ... lift-over of GTF files (--sjdbGTFfile) between genome assemblies using chain file(s) from --genomeChainFiles.

runThreadN 1
    int: number of threads to run STAR

runDirPerm User_RWX
    string: permissions for the directories created at the run-time.
        User_RWX ... user-read/write/execute
        All_RWX ... all-read/write/execute (same as chmod 777)

runRNGseed 777
    int: random number generator seed.

### Genome Parameters
genomeDir ./GenomeDir/
    string: path to the directory where genome files are stored (if runMode!=generateGenome) or will be generated (if runMode==generateGenome)

genomeLoad NoSharedMemory
    string: mode of shared memory usage for the genome files
        LoadAndKeep ... load genome into shared and keep it in memory after run
        LoadAndRemove ... load genome into shared but remove it after run
        LoadAndExit ... load genome into shared memory and exit, keeping the genome in memory for future runs
        Remove ... do not map anything, just remove loaded genome from memory
        NoSharedMemory ... do not use shared memory, each job will have its own private copy of the genome

### Genome Generation Parameters
genomeFastaFiles -
    string(s): path(s) to the fasta files with genomic sequences for genome generation, separated by spaces. Only used if runMode==genomeGenerate. These files should be plain text FASTA files, they *cannot* be zipped.

genomeChrBinNbits 18
    int: =log2(chrBin), where chrBin is the size of the bins for genome storage: each chromosome will occupy an integer number of bins

genomeSAindexNbases 14
    int: length (bases) of the SA pre-indexing string. Typically between 10 and 15. Longer strings will use much more memory, but allow faster searches.

genomeSAsparseD 1
    int>0: suffux array sparsity, i.e. distance between indices: use bigger numbers to decrease needed RAM at the cost of mapping speed reduction

genomeSuffixLengthMax -1
    int: maximum length of the suffixes, has to be longer than read length. -1 = infinite.

genomeChainFiles -
    string: chain files for genomic liftover

genomeFileSizes 0
    uint(s)>0: genome files exact sizes in bytes. Typically, this should not be defined by the user.

### Splice Junctions Database
sjdbFileChrStartEnd -
    string(s): path to the files with genomic coordinates (chr <tab> start <tab> end <tab> strand) for the splice junction introns. Multiple files can be supplied wand will be concatenated.

sjdbGTFfile -
    string: path to the GTF file with annotations

sjdbGTFchrPrefix -
    string: prefix for chromosome names in a GTF file (e.g. 'chr' for using ENSMEBL annotations with UCSC genomes)

sjdbGTFfeatureExon exon
    string: feature type in GTF file to be used as exons for building transcripts

sjdbGTFtagExonParentTranscript transcript_id
    string: tag name to be used as exons' transcript-parents (default "transcript_id" works for GTF files)

sjdbGTFtagExonParentGene gene_id
    string: tag name to be used as exons' gene-parents (default "gene_id" works for GTF files)

sjdbOverhang 100
    int>0: length of the donor/acceptor sequence on each side of the junctions, ideally = (mate_length - 1)

sjdbScore 2
    int: extra alignment score for alignmets that cross database junctions

sjdbInsertSave Basic
    string: which files to save when sjdb junctions are inserted on the fly at the mapping step
        Basic ... only small junction / transcript files
        All ... all files including big Genome, SA and SAindex - this will create a complete genome directory

### Input Files
inputBAMfile -
    string: path to BAM input file, to be used with --runMode inputAlignmentsFromBAM

### Read Parameters
readFilesIn Read1 Read2
    string(s): paths to files that contain input read1 (and, if needed, read2)

readFilesCommand -
    string(s): command line to execute for each of the input file. This command should generate FASTA or FASTQ text and send it to stdout
    For example: zcat - to uncompress .gz files, bzcat - to uncompress .bz2 files, etc.

readMapNumber -1
    int: number of reads to map from the beginning of the file
        -1: map all reads

readMatesLengthsIn NotEqual
    string: Equal/NotEqual - lengths of names,sequences,qualities for both mates are the same / not the same. NotEqual is safe in all situations.

readNameSeparator /
    string(s): character(s) separating the part of the read names that will be trimmed in output (read name after space is always trimmed)

clip3pNbases 0
    int(s): number(s) of bases to clip from 3p of each mate. If one value is given, it will be assumed the same for both mates.

clip5pNbases 0
    int(s): number(s) of bases to clip from 5p of each mate. If one value is given, it will be assumed the same for both mates.

clip3pAdapterSeq -
    string(s): adapter sequences to clip from 3p of each mate. If one value is given, it will be assumed the same for both mates.

clip3pAdapterMMp 0.1
    double(s): max proportion of mismatches for 3p adpater clipping for each mate. If one value is given, it will be assumed the same for both mates.

clip3pAfterAdapterNbases 0
    int(s): number of bases to clip from 3p of each mate after the adapter clipping. If one value is given, it will be assumed the same for both mates.

### Limits
limitGenomeGenerateRAM 31000000000
    int>0: maximum available RAM (bytes) for genome generation

limitIObufferSize 150000000
    int>0: max available buffers size (bytes) for input/output, per thread

limitOutSAMoneReadBytes 100000
    int>0: max size of the SAM record for one read. Recommended value: >(2*(LengthMate1+LengthMate2+100)*outFilterMultimapNmax

limitOutSJoneRead 1000
    int>0: max number of junctions for one read (including all multi-mappers)

limitOutSJcollapsed 1000000
    int>0: max number of collapsed junctions

limitBAMsortRAM 0
    int>=0: maximum available RAM for sorting BAM. If =0, it will be set to the genome index size. 0 value can only be used with --genomeLoad NoSharedMemory option.

limitSjdbInsertNsj 1000000
    int>=0: maximum number of junction to be inserted to the genome on the fly at the mapping stage, including those from annotations and those detected in the 1st step of the 2-pass run

### Output: general
outFileNamePrefix ./
    string: output files name prefix (including full or relative path). Can only be defined on the command line.

outTmpDir -
    string: path to a directory that will be used as temporary by STAR. All contents of this directory will be removed!
        - the temp directory will default to outFileNamePrefix_STARtmp

outTmpKeep None
    string: whether to keep the tempporary files after STAR runs is finished
        None ... remove all temporary files
        All .. keep all files

outStd Log
    string: which output will be directed to stdout (standard out)
        Log ... log messages
        SAM ... alignments in SAM format (which normally are output to Aligned.out.sam file), normal standard output will go into Log.std.out
        BAM_Unsorted ... alignments in BAM format, unsorted. Requires --outSAMtype BAM Unsorted
        BAM_SortedByCoordinate ... alignments in BAM format, unsorted. Requires --outSAMtype BAM SortedByCoordinate
        BAM_Quant ... alignments to transcriptome in BAM format, unsorted. Requires --quantMode TranscriptomeSAM

outReadsUnmapped None
    string: output of unmapped and partially mapped (i.e. mapped only one mate of a paired end read) reads in separate file(s).
        None ... no output
        Fastx ... output in separate fasta/fastq files, Unmapped.out.mate1/2

outQSconversionAdd 0
    int: add this number to the quality score (e.g. to convert from Illumina to Sanger, use -31)

outMultimapperOrder Old_2.4
    string: order of multimapping alignments in the output files
        Old_2.4 ... quasi-random order used before 2.5.0
        Random ... random order of alignments for each multi-mapper. Read mates (pairs) are always adjacent, all alignment for each read stay together. This option will become default in the future releases.

### Output: SAM and BAM
outSAMtype SAM
    strings: type of SAM/BAM output
    1st word:
        BAM ... output BAM without sorting
        SAM ... output SAM without sorting
        None ... no SAM/BAM output
    2nd, 3rd:
        Unsorted ... standard unsorted
        SortedByCoordinate ... sorted by coordinate. This option will allocate extra memory for sorting which can be specified by --limitBAMsortRAM.

outSAMmode Full
    string: mode of SAM output
        None ... no SAM output
        Full ... full SAM output
        NoQS ... full SAM but without quality scores

outSAMstrandField None
    string: Cufflinks-like strand field flag
        None ... not used
        intronMotif ... strand derived from the intron motif. Reads with inconsistent and/or non-canonical introns are filtered out.
outSAMattributes Standard
    string: a string of desired SAM attributes, in the order desired for the output SAM
        NH HI AS nM NM MD jM jI XS ch ... any combination in any order
        Standard ... NH HI AS nM
        All ... NH HI AS nM NM MD jM jI ch
        None ... no attributes

outSAMattrIHstart 1
    int>=0: start value for the IH attribute. 0 may be required by some downstream software, such as Cufflinks or StringTie.

outSAMunmapped None
    string(s): output of unmapped reads in the SAM format
    1st word:
        None ... no output
        Within ... output unmapped reads within the main SAM file (i.e. Aligned.out.sam)
    2nd word:
        KeepPairs ... record unmapped mate for each alignment, and, in case of unsorted output, keep it adjacent to its mapped mate. Only affects multi-mapping reads

outSAMorder Paired
    string: type of sorting for the SAM output
        Paired: one mate after the other for all paired alignments
        PairedKeepInputOrder: one mate after the other for all paired alignments, the order is kept the same as in the input FASTQ files

outSAMprimaryFlag OneBestScore
    string: which alignments are considered primary - all others will be marked with 0x100 bit in the FLAG
        OneBestScore ... only one alignment with the best score is primary
        AllBestScore ... all alignments with the best score are primary

outSAMreadID Standard
    string: read ID record type
        Standard ... first word (until space) from the FASTx read ID line, removing /1,/2 from the end
        Number ... read number (index) in the FASTx file

outSAMmapqUnique 255
    int: 0 to 255: the MAPQ value for unique mappers

outSAMflagOR 0
    int: 0 to 65535: sam FLAG will be bitwise OR'd with this value, i.e. FLAG=FLAG | outSAMflagOR. This is applied after all flags have been set by STAR, and after outSAMflagAND. Can be used to set specific bits that are not set otherwise.

outSAMflagAND 65535
    int: 0 to 65535: sam FLAG will be bitwise AND'd with this value, i.e. FLAG=FLAG & outSAMflagOR. This is applied after all flags have been set by STAR, but before outSAMflagOR. Can be used to unset specific bits that are not set otherwise.

outSAMattrRGline -
    string(s): SAM/BAM read group line. The first word contains the read group identifier and must start with "ID:", e.g. --outSAMattrRGline ID:xxx CN:yy "DS:z z z".
    xxx will be added as RG tag to each output alignment. Any spaces in the tag values have to be double quoted.
    Comma separated RG lines correspons to different (comma separated) input files in --readFilesIn. Commas have to be surrounded by spaces, e.g.
    --outSAMattrRGline ID:xxx , ID:zzz "DS:z z" , ID:yyy DS:yyyy

outSAMheaderHD -
    strings: @HD (header) line of the SAM header

outSAMheaderPG -
    strings: extra @PG (software) line of the SAM header (in addition to STAR)

outSAMheaderCommentFile -
    string: path to the file with @CO (comment) lines of the SAM header

outSAMfilter None
    string(s): filter the output into main SAM/BAM files
        KeepOnlyAddedReferences ... only keep the reads for which all alignments are to the extra reference sequences added with --genomeFastaFiles at the mapping stage.
        KeepAllAddedReferences ... keep all alignments to the extra reference sequences added with --genomeFastaFiles at the mapping stage.

outSAMmultNmax -1
    int: max number of multiple alignments for a read that will be output to the SAM/BAM files.
        -1 ... all alignments (up to --outFilterMultimapNmax) will be output

outBAMcompression 1
    int: -1 to 10 BAM compression level, -1=default compression (6?), 0=no compression, 10=maximum compression

outBAMsortingThreadN 0
    int: >=0: number of threads for BAM sorting. 0 will default to min(6,--runThreadN).

### BAM processing
bamRemoveDuplicatesType -
    string: mark duplicates in the BAM file, for now only works with (i) sorted BAM feeded with inputBAMfile, and (ii) for paired-end alignments only
        - ... no duplicate removal/marking
        UniqueIdentical ... mark all multimappers, and duplicate unique mappers. The coordinates, FLAG, CIGAR must be identical
        UniqueIdenticalNotMulti ... mark duplicate unique mappers but not multimappers.

bamRemoveDuplicatesMate2basesN 0
    int>0: number of bases from the 5' of mate 2 to use in collapsing (e.g. for RAMPAGE)

### Output Wiggle
outWigType None
    string(s): type of signal output, e.g. "bedGraph" OR "bedGraph read1_5p". Requires sorted BAM: --outSAMtype BAM SortedByCoordinate .
    1st word:
        None ... no signal output
        bedGraph ... bedGraph format
        wiggle ... wiggle format
    2nd word:
        read1_5p ... signal from only 5' of the 1st read, useful for CAGE/RAMPAGE etc
        read2 ... signal from only 2nd read

outWigStrand Stranded
    string: strandedness of wiggle/bedGraph output
        Stranded ... separate strands, str1 and str2
        Unstranded ... collapsed strands

outWigReferencesPrefix -
    string: prefix matching reference names to include in the output wiggle file, e.g. "chr", default "-" - include all references

outWigNorm RPM
    string: type of normalization for the signal
        RPM ... reads per million of mapped reads
        None ... no normalization, "raw" counts

### Output Filtering
outFilterType Normal
    string: type of filtering
        Normal ... standard filtering using only current alignment
        BySJout ... keep only those reads that contain junctions that passed filtering into SJ.out.tab

outFilterMultimapScoreRange 1
    int: the score range below the maximum score for multimapping alignments

outFilterMultimapNmax 10
    int: maximum number of loci the read is allowed to map to. Alignments (all of them) will be output only if the read maps to no more loci than this value.
    Otherwise no alignments will be output, and the read will be counted as "mapped to too many loci" in the Log.final.out .

outFilterMismatchNmax 10
    int: alignment will be output only if it has no more mismatches than this value.

outFilterMismatchNoverLmax 0.3
    float: alignment will be output only if its ratio of mismatches to *mapped* length is less than or equal to this value.
outFilterMismatchNoverReadLmax 1.0
    float: alignment will be output only if its ratio of mismatches to *read* length is less than or equal to this value.

outFilterScoreMin 0
    int: alignment will be output only if its score is higher than or equal to this value.

outFilterScoreMinOverLread 0.66
    float: same as outFilterScoreMin, but normalized to read length (sum of mates' lengths for paired-end reads)

outFilterMatchNmin 0
    int: alignment will be output only if the number of matched bases is higher than or equal to this value.

outFilterMatchNminOverLread 0.66
    float: sam as outFilterMatchNmin, but normalized to the read length (sum of mates' lengths for paired-end reads).

outFilterIntronMotifs None
    string: filter alignment using their motifs
        None ... no filtering
        RemoveNoncanonical ... filter out alignments that contain non-canonical junctions
        RemoveNoncanonicalUnannotated ... filter out alignments that contain non-canonical unannotated junctions when using annotated splice junctions database. The annotated non-canonical junctions will be kept.

### Output Filtering: Splice Junctions
outSJfilterReads All
    string: which reads to consider for collapsed splice junctions output
        All: all reads, unique- and multi-mappers
        Unique: uniquely mapping reads only

outSJfilterOverhangMin 30 12 12 12
    4 integers: minimum overhang length for splice junctions on both sides for: (1) non-canonical motifs, (2) GT/AG and CT/AC motif, (3) GC/AG and CT/GC motif, (4) AT/AC and GT/AT motif. -1 means no output for that motif
    does not apply to annotated junctions

outSJfilterCountUniqueMin 3 1 1 1
    4 integers: minimum uniquely mapping read count per junction for: (1) non-canonical motifs, (2) GT/AG and CT/AC motif, (3) GC/AG and CT/GC motif, (4) AT/AC and GT/AT motif. -1 means no output for that motif
    Junctions are output if one of outSJfilterCountUniqueMin OR outSJfilterCountTotalMin conditions are satisfied
    does not apply to annotated junctions

outSJfilterCountTotalMin 3 1 1 1
    4 integers: minimum total (multi-mapping+unique) read count per junction for: (1) non-canonical motifs, (2) GT/AG and CT/AC motif, (3) GC/AG and CT/GC motif, (4) AT/AC and GT/AT motif. -1 means no output for that motif
    Junctions are output if one of outSJfilterCountUniqueMin OR outSJfilterCountTotalMin conditions are satisfied
    does not apply to annotated junctions

outSJfilterDistToOtherSJmin 10 0 5 10
    4 integers>=0: minimum allowed distance to other junctions' donor/acceptor
    does not apply to annotated junctions

outSJfilterIntronMaxVsReadN 50000 100000 200000
    N integers>=0: maximum gap allowed for junctions supported by 1,2,3,,,N reads
    i.e. by default junctions supported by 1 read can have gaps <=50000b, by 2 reads: <=100000b, by 3 reads: <=200000. by >=4 reads any gap <=alignIntronMax
    does not apply to annotated junctions

### Scoring
scoreGap 0
    int: splice junction penalty (independent on intron motif)

scoreGapNoncan -8
    int: non-canonical junction penalty (in addition to scoreGap)

scoreGapGCAG -4
    GC/AG and CT/GC junction penalty (in addition to scoreGap)

scoreGapATAC -8
    AT/AC and GT/AT junction penalty (in addition to scoreGap)

scoreGenomicLengthLog2scale -0.25
    extra score logarithmically scaled with genomic length of the alignment: scoreGenomicLengthLog2scale*log2(genomicLength)

scoreDelOpen -2
    deletion open penalty

scoreDelBase -2
    deletion extension penalty per base (in addition to scoreDelOpen)

scoreInsOpen -2
    insertion open penalty

scoreInsBase -2
    insertion extension penalty per base (in addition to scoreInsOpen)

scoreStitchSJshift 1
    maximum score reduction while searching for SJ boundaries inthe stitching step

### Alignments and Seeding
seedSearchStartLmax 50
    int>0: defines the search start point through the read - the read is split into pieces no longer than this value

seedSearchStartLmaxOverLread 1.0
    float: seedSearchStartLmax normalized to read length (sum of mates' lengths for paired-end reads)

seedSearchLmax 0
    int>=0: defines the maximum length of the seeds, if =0 max seed lengthis infinite

seedMultimapNmax 10000
    int>0: only pieces that map fewer than this value are utilized in the stitching procedure

seedPerReadNmax 1000
    int>0: max number of seeds per read

seedPerWindowNmax 50
    int>0: max number of seeds per window

seedNoneLociPerWindow 10
    int>0: max number of one seed loci per window

alignIntronMin 21
    minimum intron size: genomic gap is considered intron if its length>=alignIntronMin, otherwise it is considered Deletion

alignIntronMax 0
    maximum intron size, if 0, max intron size will be determined by (2^winBinNbits)*winAnchorDistNbins

alignMatesGapMax 0
    maximum gap between two mates, if 0, max intron gap will be determined by (2^winBinNbits)*winAnchorDistNbins

alignSJoverhangMin 5
    int>0: minimum overhang (i.e. block size) for spliced alignments

alignSJstitchMismatchNmax 0 -1 0 0
    4*int>=0: maximum number of mismatches for stitching of the splice junctions (-1: no limit).
    (1) non-canonical motifs, (2) GT/AG and CT/AC motif, (3) GC/AG and CT/GC motif, (4) AT/AC and GT/AT motif.

alignSJDBoverhangMin 3
    int>0: minimum overhang (i.e. block size) for annotated (sjdb) spliced alignments

alignSplicedMateMapLmin 0
    int>0: minimum mapped length for a read mate that is spliced

alignSplicedMateMapLminOverLmate 0.66
    float>0: alignSplicedMateMapLmin normalized to mate length

alignWindowsPerReadNmax 10000
    int>0: max number of windows per read

alignTranscriptsPerWindowNmax 100
    int>0: max number of transcripts per window

alignTranscriptsPerReadNmax 10000
    int>0: max number of different alignments per read to consider

alignEndsType Local
    string: type of read ends alignment
        Local ... standard local alignment with soft-clipping allowed
        EndToEnd ... force end-to-end read alignment, do not soft-clip
        Extend5pOfRead1 ... fully extend only the 5p of the read1, all other ends: local alignment
        Extend5pOfReads12 ... fully extend only the 5p of the both read1 and read2, all other ends: local alignment

alignEndsProtrude 0 ConcordantPair
    int, string: allow protrusion of alignment ends, i.e. start (end) of the +strand mate downstream of the start (end) of the -strand mate
    1st word: int: maximum number of protrusion bases allowed
    2nd word: string:
        ConcordantPair ... report alignments with non-zero protrusion as concordant pairs
        DiscordantPair ... report alignments with non-zero protrusion as discordant pairs

alignSoftClipAtReferenceEnds Yes
    string: allow the soft-clipping of the alignments past the end of the chromosomes
        Yes ... allow
        No ... prohibit, useful for compatibility with Cufflinks

### Windows, Anchors, Binning
winAnchorMultimapNmax 50
    int>0: max number of loci anchors are allowed to map to

winBinNbits 16
    int>0: =log2(winBin), where winBin is the size of the bin for the windows/clustering, each window will occupy an integer number of bins.

winAnchorDistNbins 9
    int>0: max number of bins between two anchors that allows aggregation of anchors into one window

winFlankNbins 4
    int>0: log2(winFlank), where win Flank is the size of the left and right flanking regions for each window

winReadCoverageRelativeMin 0.5
    float>=0: minimum relative coverage of the read sequence by the seeds in a window, for STARlong algorithm only.

winReadCoverageBasesMin 0
    int>0: minimum number of bases covered by the seeds in a window , for STARlong algorithm only.

### Chimeric Alignments
chimOutType SeparateSAMold
    string(s): type of chimeric output
    1st word:
        SeparateSAMold ... output old SAM into separate Chimeric.out.sam file
        WithinBAM ... output into main aligned BAM files (Aligned.*.bam)
    2nd word:
        WithinBAM HardClip ... hard-clipping in the CIGAR for supplemental chimeric alignments (defaultif no 2nd word is present)
        WithinBAM SoftClip ...
soft-clipping in the CIGAR for supplemental chimeric alignments chimSegmentMin 0 int>=0: minimum length of chimeric segment length, if ==0, no chimeric output chimScoreMin 0 int>=0: minimum total (summed) score of the chimeric segments chimScoreDropMax 20 int>=0: max drop (difference) of chimeric score (the sum of scores of all chimeric segements) from the read length chimScoreSeparation 10 int>=0: minimum difference (separation) between the best chimeric score and the next one chimScoreJunctionNonGTAG -1 int: penalty for a non-GT/AG chimeric junction chimJunctionOverhangMin 20 int>=0: minimum overhang for a chimeric junction chimSegmentReadGapMax 0 int>=0: maximum gap in the read sequence between chimeric segments chimFilter banGenomicN string(s): different filters for chimeric alignments None ... no filtering banGenomicN ... Ns are not allowed in the genome sequence around the chimeric junction chimMainSegmentMultNmax 10 int>=1: maximum number of multi-alignments for the main chimeric segment. =1 will prohibit multimapping main segments. ### Quantification of Annotations quantMode - string(s): types of quantification requested - ... none TranscriptomeSAM ... output SAM/BAM alignments to transcriptome into a separate file GeneCounts ... count reads per gene quantTranscriptomeBAMcompression 1 1 int: -1 to 10 transcriptome BAM compression level, -1=default compression (6?), 0=no compression, 10=maximum compression quantTranscriptomeBan IndelSoftclipSingleend string: prohibit various alignment type IndelSoftclipSingleend ... prohibit indels, soft clipping and single-end alignments - compatible with RSEM Singleend ... prohibit single-end alignments ### 2-pass Mapping twopassMode None string: 2-pass mapping mode. None ... 1-pass mapping Basic ... basic 2-pass mapping, with all 1st pass junctions inserted into the genome indices on the fly twopass1readsN -1 int: number of reads to process for the 1st step. 
Use very large number (or default -1) to map all reads in the first step. For more details see: <https://github.com/alexdobin/STAR> <https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf>
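The alignIntronMax and alignMatesGapMax entries in the help output state that, when they are left at 0, the limit is determined by (2^winBinNbits)*winAnchorDistNbins. The arithmetic can be checked with a small bash sketch. The function name star_default_max_gap is a hypothetical helper written for this page and is not part of STAR; it only evaluates the formula quoted in the help text.

```shell
#!/bin/bash
# Hypothetical helper (not part of STAR): evaluates the formula from the help text,
# (2^winBinNbits) * winAnchorDistNbins, which applies when alignIntronMax or
# alignMatesGapMax is left at 0.
star_default_max_gap() {
    local win_bin_nbits=${1:-16}         # winBinNbits, default 16
    local win_anchor_dist_nbins=${2:-9}  # winAnchorDistNbins, default 9
    # 1 << n is 2^n in bash integer arithmetic
    echo $(( (1 << win_bin_nbits) * win_anchor_dist_nbins ))
}

star_default_max_gap        # with the defaults listed above: 589824
star_default_max_gap 14 9   # with a smaller bin size: 147456
```

With the default values the formula gives 589824 bases. Setting --alignIntronMax or --alignMatesGapMax to a non-zero value replaces this computed limit, as described in the help text.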
Installation
Source code is obtained from STAR
System
64-bit Linux