HTSeq-Teaching
Category
Bioinformatics
Program On
Teaching
Version
0.9.1
Author / Distributor
Simon Anders
Description
A framework to process and analyze data from high-throughput sequencing (HTS) assays. More information: http://www-huber.embl.de/users/anders/HTSeq/
Running Program
- Version 0.9.1, installed in /usr/local/apps/eb/HTSeq/0.9.1-foss-2016b-Python-2.7.14
To use this version of HTSeq, please first load the module with
ml HTSeq/0.9.1-foss-2016b-Python-2.7.14
Here is an example of a shell script, sub.sh, to run on the batch queue:
#!/bin/bash
#SBATCH --job-name=j_HTSeq
#SBATCH --partition=batch
#SBATCH --mail-type=ALL
#SBATCH --mail-user=username@uga.edu
#SBATCH --ntasks=1
#SBATCH --mem=10gb
#SBATCH --time=08:00:00
#SBATCH --output=HTSeq.%j.out
#SBATCH --error=HTSeq.%j.err
cd $SLURM_SUBMIT_DIR
ml HTSeq/0.9.1-foss-2016b-Python-2.7.14
htseq-count [options]
In a real submission script, review the values above (for example, the job name, e-mail address, memory, time limit, and htseq-count options) and replace them with values appropriate for your job.
Please refer to Running_Jobs_on_the_teaching_cluster, Run X window Jobs and Run interactive Jobs for more details on running jobs on the teaching cluster.
Here is an example of a job submission command:
sbatch ./sub.sh
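As an illustration of what might replace the htseq-count [options] placeholder in sub.sh, the following is a minimal sketch rather than a recommended setting. It assumes a position-sorted, paired-end BAM file from an unstranded library and an Ensembl-style GTF annotation. The file names sample.sorted.bam, annotation.gtf and sample.counts.txt are hypothetical. The options are taken from the htseq-count -h text in the Documentation section.

```shell
#!/bin/bash
# Sketch only: all file names below are hypothetical placeholders.
BAM=sample.sorted.bam       # position-sorted, paired-end alignments (assumed)
GTF=annotation.gtf          # Ensembl-style GTF annotation (assumed)
OUT=sample.counts.txt       # htseq-count prints its count table to standard output

# Options as listed by htseq-count -h for version 0.9.1:
#   -f bam   input is BAM            -r pos    input is sorted by position
#   -s no    library is unstranded   -t exon   count reads over exon features
#   -i gene_id  report per gene      -m union  default overlap mode
CMD=(htseq-count -f bam -r pos -s no -t exon -i gene_id -m union "$BAM" "$GTF")

# Print the command so it can be checked in the job's .out file.
echo "${CMD[*]} > $OUT"

# Run only when the tool and both input files are actually present.
if command -v htseq-count >/dev/null 2>&1 && [ -f "$BAM" ] && [ -f "$GTF" ]; then
    "${CMD[@]}" > "$OUT"
fi
```

Note that -s defaults to yes, so for an unstranded library leaving it out would likely discard about half of the reads. Check the library protocol before choosing yes, no or reverse.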
Documentation
module load HTSeq/0.9.1-foss-2016b-Python-2.7.14
htseq-count -h
usage: htseq-count [options] alignment_file gff_file

This script takes one or more alignment files in SAM/BAM format and a feature
file in GFF format and calculates for each feature the number of reads mapping
to it. See http://htseq.readthedocs.io/en/master/count.html for details.

positional arguments:
  samfilenames          Path to the SAM/BAM files containing the mapped reads.
                        If '-' is selected, read from standard input
  featuresfilename      Path to the file containing the features

optional arguments:
  -h, --help            show this help message and exit
  -f {sam,bam}, --format {sam,bam}
                        type of <alignment_file> data, either 'sam' or 'bam'
                        (default: sam)
  -r {pos,name}, --order {pos,name}
                        'pos' or 'name'. Sorting order of <alignment_file>
                        (default: name). Paired-end sequencing data must be
                        sorted either by position or by read name, and the
                        sorting order must be specified. Ignored for
                        single-end data.
  --max-reads-in-buffer MAX_BUFFER_SIZE
                        When <alignment_file> is paired end sorted by
                        position, allow only so many reads to stay in memory
                        until the mates are found (raising this number will
                        use more memory). Has no effect for single end or
                        paired end sorted by name
  -s {yes,no,reverse}, --stranded {yes,no,reverse}
                        whether the data is from a strand-specific assay.
                        Specify 'yes', 'no', or 'reverse' (default: yes).
                        'reverse' means 'yes' with reversed strand
                        interpretation
  -a MINAQUAL, --minaqual MINAQUAL
                        skip all reads with alignment quality lower than the
                        given minimum value (default: 10)
  -t FEATURETYPE, --type FEATURETYPE
                        feature type (3rd column in GFF file) to be used, all
                        features of other type are ignored (default, suitable
                        for Ensembl GTF files: exon)
  -i IDATTR, --idattr IDATTR
                        GFF attribute to be used as feature ID (default,
                        suitable for Ensembl GTF files: gene_id)
  --additional-attr ADDITIONAL_ATTR [ADDITIONAL_ATTR ...]
                        Additional feature attributes (default: none, suitable
                        for Ensembl GTF files: gene_name)
  -m {union,intersection-strict,intersection-nonempty}, --mode {union,intersection-strict,intersection-nonempty}
                        mode to handle reads overlapping more than one feature
                        (choices: union, intersection-strict,
                        intersection-nonempty; default: union)
  --nonunique {none,all}
                        Whether to score reads that are not uniquely aligned
                        or ambiguously assigned to features
  --secondary-alignments {score,ignore}
                        Whether to score secondary alignments (0x100 flag)
  --supplementary-alignments {score,ignore}
                        Whether to score supplementary alignments (0x800 flag)
  -o SAMOUTS [SAMOUTS ...], --samout SAMOUTS [SAMOUTS ...]
                        write out all SAM alignment records into an output SAM
                        file called SAMOUT, annotating each line with its
                        feature assignment (as an optional field with tag
                        'XF')
  -q, --quiet           suppress progress report

Written by Simon Anders (sanders@fs.tum.de), European Molecular Biology
Laboratory (EMBL). (c) 2010. Released under the terms of the GNU General
Public License v3. Part of the 'HTSeq' framework, version 0.9.1.

htseq-qa -h
Usage: htseq-qa [options] read_file

This script take a file with high-throughput sequencing reads (supported
formats: SAM, Solexa _export.txt, FASTQ, Solexa _sequence.txt) and performs a
simply quality assessment by producing plots showing the distribution of
called bases and base-call quality scores by position within the reads. The
plots are output as a PDF file.

Options:
  -h, --help            show this help message and exit
  -t TYPE, --type=TYPE  type of read_file (one of: sam [default], bam,
                        solexa-export, fastq, solexa-fastq)
  -o OUTFILE, --outfile=OUTFILE
                        output filename (default is <read_file>.pdf)
  -r READLEN, --readlength=READLEN
                        the maximum read length (when not specified, the
                        script guesses from the file
  -g GAMMA, --gamma=GAMMA
                        the gamma factor for the contrast adjustment of the
                        quality score plot
  -n, --nosplit         do not split reads in unaligned and aligned ones
  -m MAXQUAL, --maxqual=MAXQUAL
                        the maximum quality score that appears in the data
                        (default: 41)

Written by Simon Anders (sanders@fs.tum.de), European Molecular Biology
Laboratory (EMBL). (c) 2010. Released under the terms of the GNU General
Public License v3. Part of the 'HTSeq' framework, version 0.9.1.
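Once htseq-count has finished, its count table can be checked quickly from the shell. The sketch below assumes the usual htseq-count output layout: one tab-separated line per feature (feature ID, then read count), followed by special counters whose names start with two underscores, such as __no_feature and __ambiguous. The gene names and numbers are made up for illustration. In practice COUNTS would point at the file the job wrote.

```shell
#!/bin/bash
# Sketch only: create a tiny made-up count table in the layout htseq-count prints
# (feature ID, tab, count; special counters start with "__").
COUNTS=$(mktemp)
printf 'geneA\t10\ngeneB\t0\ngeneC\t5\n__no_feature\t3\n__ambiguous\t1\n' > "$COUNTS"

# Sum the reads assigned to features, skipping the "__" counters.
ASSIGNED=$(awk -F'\t' '$1 !~ /^__/ { n += $2 } END { print n + 0 }' "$COUNTS")

# Count the features that received at least one read.
EXPRESSED=$(awk -F'\t' '$1 !~ /^__/ && $2 > 0 { n++ } END { print n + 0 }' "$COUNTS")

echo "assigned reads: $ASSIGNED"
echo "features with reads: $EXPRESSED"

# Remove the temporary demonstration file.
rm -f "$COUNTS"
```

A very low assigned-read total compared with __no_feature often points to a wrong -s (strandedness) setting or a mismatch between the annotation and the reference used for alignment.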
Installation
- Version 0.9.1, source code downloaded from http://www-huber.embl.de/users/anders/HTSeq/, installed in /usr/local/apps/eb/HTSeq/0.9.1-foss-2016b-Python-2.7.14
System
64-bit Linux