Homer-Teaching
Category
Bioinformatics
Program On
Teaching
Version
4.9.1, 4.10, 4.11
Author / Distributor
Description
"HOMER (Hypergeometric Optimization of Motif EnRichment) is a suite of tools for Motif Discovery and ChIP-Seq analysis. It is a collection of command line programs for unix-style operating systems written in mostly perl and c++. Homer was primarily written as a de novo motif discovery algorithm that is well suited for finding 8-12 bp motifs in large scale genomics data." Homer
Running Program
Also refer to Running Jobs on the teaching cluster
- Version 4.9.1 is installed in /usr/local/apps/eb/Homer/4.9.1-foss-2016b. To use this version of Homer, first load the module with:
module load Homer/4.9.1-foss-2016b
- Version 4.10 is installed in /usr/local/apps/eb/Homer/4.10-foss-2016b. To use this version of Homer, first load the module with:
module load Homer/4.10-foss-2016b
- Version 4.11 is installed in /usr/local/apps/eb/Homer/4.11-foss-2016b. To use this version of Homer, first load the module with:
module load Homer/4.11-foss-2016b
An example of script sub.sh to run findMotifs.pl:
#!/bin/bash
#SBATCH --job-name=j_HOMER
#SBATCH --partition=batch
#SBATCH --mail-type=ALL
#SBATCH --mail-user=username@uga.edu
#SBATCH --ntasks=1
#SBATCH --mem=2gb
#SBATCH --time=08:00:00
#SBATCH --output=HOMER.%j.out
#SBATCH --error=HOMER.%j.err
cd $SLURM_SUBMIT_DIR
ml Homer/4.11-foss-2016b
findMotifs.pl <input list> <promoter set> <output directory> [additional options]
In a real submission script, at least the job-specific values above (such as the job name, e-mail address, resource requests, and the findMotifs.pl arguments in angle brackets) need to be reviewed and replaced with proper values.
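As a concrete illustration, the placeholder command line can be filled in with the example that the findMotifs.pl help text itself gives (genelist.txt, the mouse promoter set, motifResults/ and -len 10). The sketch below is not part of HOMER: the function name findmotifs_cmd is hypothetical, and it only prints the assembled command so that it can be checked before being placed in sub.sh.

```shell
#!/bin/bash
# Hypothetical helper (not part of HOMER): assemble a findMotifs.pl command
# line from its three required arguments plus any extra options, and print
# it for inspection instead of running it.
findmotifs_cmd() {
    if [ "$#" -lt 3 ]; then
        echo "usage: findmotifs_cmd <input list> <promoter set> <output directory> [options]" >&2
        return 1
    fi
    local cmd="findMotifs.pl $1 $2 $3"
    shift 3
    # Append any additional options exactly as given.
    if [ "$#" -gt 0 ]; then
        cmd="$cmd $*"
    fi
    echo "$cmd"
}

# Values taken from the example in the findMotifs.pl help text.
findmotifs_cmd genelist.txt mouse motifResults/ -len 10
```

Once the printed line looks right, it can replace the placeholder findMotifs.pl line in the submission script.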
An example of script sub.sh to run makeTagDirectory:
#!/bin/bash
#SBATCH --job-name=j_HOMER
#SBATCH --partition=batch
#SBATCH --mail-type=ALL
#SBATCH --mail-user=username@uga.edu
#SBATCH --ntasks=1
#SBATCH --mem=2gb
#SBATCH --time=08:00:00
#SBATCH --output=HOMER.%j.out
#SBATCH --error=HOMER.%j.err
cd $SLURM_SUBMIT_DIR
ml Homer/4.11-foss-2016b
makeTagDirectory <directory> <alignment file 1> [file 2] ... [options]
In a real submission script, at least the job-specific values above (such as the job name, e-mail address, resource requests, and the makeTagDirectory arguments in angle brackets) need to be reviewed and replaced with proper values.
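Because a batch job may wait in the queue before it starts, it can help to fail early when an alignment file is missing. The following is a minimal sketch under stated assumptions: check_alignments is a hypothetical helper, and the file and directory names in the commented usage line are placeholders.

```shell
#!/bin/bash
# Hypothetical pre-flight check (not part of HOMER): confirm every alignment
# file exists and is non-empty before makeTagDirectory is started.
check_alignments() {
    local f missing=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty alignment file: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example use inside sub.sh (placeholder names):
# check_alignments sample1.sam sample2.sam && \
#     makeTagDirectory sample_tags/ sample1.sam sample2.sam
```

The function returns a non-zero status if any file is absent or empty, so the makeTagDirectory call after && is skipped in that case.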
Please refer to Running Jobs on the teaching cluster, Run X window Jobs, and Run interactive Jobs for more details on running jobs on the teaching cluster.
Example of submission to the queue:
sbatch sub.sh
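On success, sbatch normally prints a line of the form "Submitted batch job <id>". The sketch below shows one way to capture that ID for later monitoring. job_id_from_sbatch is a hypothetical helper, and it assumes the default single-cluster output format.

```shell
#!/bin/bash
# Hypothetical helper: extract the numeric job ID from the line sbatch prints
# on success, e.g. "Submitted batch job 123456".
job_id_from_sbatch() {
    local line=$1
    case "$line" in
        "Submitted batch job "*)
            # Keep only the last whitespace-separated word.
            echo "${line##* }"
            ;;
        *)
            echo "unexpected sbatch output: $line" >&2
            return 1
            ;;
    esac
}

# Example use on the cluster:
# jobid=$(job_id_from_sbatch "$(sbatch sub.sh)") && squeue -j "$jobid"
job_id_from_sbatch "Submitted batch job 123456"
```

The job ID also appears in the output file names, since %j in the --output and --error patterns is replaced by it.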
Documentation
module load Homer/4.11-foss-2016b
perl $EBROOTHOMER/configureHomer.pl -list

Current base directory for HOMER is /usr/local/apps/eb/Homer/4.11-foss-2016b/
--2019-11-13 16:32:15-- http://homer.ucsd.edu/homer/update.txt
Resolving homer.ucsd.edu (homer.ucsd.edu)... 169.228.63.226
Connecting to homer.ucsd.edu (homer.ucsd.edu)|169.228.63.226|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 17859 (17K) [text/plain]
Saving to: ‘/usr/local/apps/eb/Homer/4.11-foss-2016b//update.txt’
100%[================================================================================================================================================================>] 17,859 --.-K/s in 0.06s
2019-11-13 16:32:15 (277 KB/s) - ‘/usr/local/apps/eb/Homer/4.11-foss-2016b//update.txt’ saved [17859/17859]
Updating Settings...
Packages with name conflicts have a trailing -o, -p, or -g
Version Installed Package Version Description

SOFTWARE
+ homer v4.11.1 Code/Executables, ontologies, motifs for HOMER

ORGANISMS
+ zebrafish-o v6.3 Danio rerio (zebrafish) accession and ontology information
+ lamprey v6.3 Petromyzon marinus (lamprey) accession and ontology information
+ dog v6.3 Canis lupus familiaris (dog) accession and ontology information
+ human-o v6.3 Homo sapiens (human) accession and ontology information
+ anemone v6.3 Nematostella vectensis (anemone) accession and ontology information
+ pig v6.3 Sus scrofa (pig) accession and ontology information
+ chicken-o v6.3 Gallus gallus (chicken) accession and ontology information
+ corn v6.3 Zea mays (corn) accession and ontology information
+ urchin v6.3 Strongylocentrotus purpuratus (urchin) accession and ontology information
+ rice v6.3 Oryza sativa (rice) accession and ontology information
+ fugu v6.3 Takifugu rubripes (fugu) accession and ontology information
+ mosquito v6.3 Anopheles gambiae (mosquito) accession and ontology information
+ chlamy v6.3 Chlamydomonas reinhardtii (chlamy) accession and ontology information
+ cocci v6.3 Coccidioides immitis RS (cocci) accession and ontology information
+ selaginella v6.3 Selaginella moellendorffii (selaginella) accession and ontology information
+ tomato v6.3 Solanum lycopersicum (tomato) accession and ontology information
+ fly-o v6.3 Drosophila melanogaster (fly) accession and ontology information
+ ciona v6.3 Ciona intestinalis (ciona) accession and ontology information
+ rat-o v6.3 Rattus norvegicus (rat) accession and ontology information
+ volvox v6.3 Volvox carteri (volvox) accession and ontology information
+ mushroom v6.3 Agaricus bisporus (mushroom) accession and ontology information
+ cow v5.4 Bos taurus (cow) accession and ontology information
+ frog-o v6.3 Xenopus tropicalis (frog) accession and ontology information
+ worm-o v6.3 Caenorhabditis elegans (worm) accession and ontology information
+ ciliate v6.3 Tetrahymena thermophila (ciliate) accession and ontology information
+ ascomycetes v6.0 Neurospora crassa (ascomycetes) accession and ontology information
+ balbc_cocci v5.2 Balbc/J mouse and Coccidioides genome combined
+ bee v6.3 Apis mellifera (bee) accession and ontology information
+ rhesus v6.3 Macaca mulatta (rhesus) accession and ontology information
+ seahare v6.3 Aplysia californica (seahare) accession and ontology information
+ mouse-o v6.3 Mus musculus (mouse) accession and ontology information
+ patens v6.3 Physcomitrella patens (patens) accession and ontology information
+ ncrassa v6.3 Neurospora crassa (ncrassa) accession and ontology information
+ diatom v6.3 Phaeodactylum tricornutum (diatom) accession and ontology information
+ pseudonana v6.3 Thalassiosira pseudonana (pseudonana) accession and ontology information
+ pombe v6.3 Schizosaccharomyces pombe (pombe) accession and ontology information
+ arabidopsis-o v6.3 Arabidopsis thaliana (arabidopsis) accession and ontology information
+ hydra v6.3 Hydra vulgaris (hydra) accession and ontology information
+ zebrafinch v6.3 Taeniopygia guttata (zebrafinch) accession and ontology information
+ laevis v6.3 Xenopus laevis (laevis) accession and ontology information
+ dicty v6.3 Dictyostelium discoideum (dicty) accession and ontology information
+ yeast-o v6.3 Saccharomyces cerevisiae (yeast) accession and ontology information

PROMOTERS
+ arabidopsis-p v6.3 arabidopsis promoters (arabidopsis)
+ yeast-p v5.5 yeast promoters (yeast)
+ chicken-p v5.5 chicken promoters (chicken)
+ mouse-p v5.5 mouse promoters (mouse)
+ rat-p v5.5 rat promoters (rat)
+ fly-p v5.5 fly promoters (fly)
+ human-p v5.5 human promoters (human)
+ worm-p v5.5 worm promoters (worm)
+ zebrafish-p v5.5 zebrafish promoters (zebrafish)
+ frog-p v5.5 frog promoters (frog)

GENOMES
+ hg17 v6.4 human genome and annotation for UCSC hg17
+ susScr3 v6.4 pig genome and annotation for UCSC susScr3
+ dm6 v6.4 fly genome and annotation for UCSC dm6
+ rheMac8 v6.4 rhesus genome and annotation for UCSC rheMac8
+ hg19 v6.4 human genome and annotation for UCSC hg19
+ sacCer2 v6.4 yeast genome and annotation for UCSC sacCer2
+ gorGor5 v6.4 human genome and annotation for UCSC gorGor5
+ AGPv3 v5.10 corn genome and annotation (AGPv3)
+ ce6 v6.4 worm genome and annotation for UCSC ce6
+ patens.ASM242v1 v5.10 patens genome and annotation (patens.ASM242v1)
+ rn5 v6.4 rat genome and annotation for UCSC rn5
+ danRer7 v6.4 zebrafish genome and annotation for UCSC danRer7
+ apiMel3 v6.4 bee genome and annotation for UCSC apiMel3
+ fr3 v6.4 fugu genome and annotation for UCSC fr3
+ ce10 v6.4 worm genome and annotation for UCSC ce10
+ xenTro3 v6.4 frog genome and annotation for UCSC xenTro3
+ strPur2 v6.0 urchin genome and annotation for UCSC strPur2
+ rice.IRGSP-1.0 v5.10 rice genome and annotation (rice.IRGSP-1.0)
+ mm9 v6.4 mouse genome and annotation for UCSC mm9
+ mm10 v6.4 mouse genome and annotation for UCSC mm10
+ panPan2 v6.4 human genome and annotation for UCSC panPan2
+ ce11 v6.4 worm genome and annotation for UCSC ce11
+ mm8 v6.4 mouse genome and annotation for UCSC mm8
+ rn6 v6.4 rat genome and annotation for UCSC rn6
+ corn.AGPv3 v5.10 corn genome and annotation (corn.AGPv3)
+ rn4 v6.4 rat genome and annotation for UCSC rn4
+ papAnu2 v6.4 human genome and annotation for UCSC papAnu2
+ gorGor3 v6.4 human genome and annotation for UCSC gorGor3
+ danRer10 v6.4 zebrafish genome and annotation for UCSC danRer10
+ petMar3 v6.4 lamprey genome and annotation for UCSC petMar3
+ galGal4 v6.4 chicken genome and annotation for UCSC galGal4
+ panTro3 v6.4 human genome and annotation for UCSC panTro3
+ hg18 v6.4 human genome and annotation for UCSC hg18
+ ci2 v6.4 ciona genome and annotation for UCSC ci2
+ canFam3 v6.4 dog genome and annotation for UCSC canFam3
+ galGal5 v6.4 chicken genome and annotation for UCSC galGal5
+ taeGut2 v6.4 zebrafinch genome and annotation for UCSC taeGut2
+ anoGam1 v6.4 mosquito genome and annotation for UCSC anoGam1
+ panTro5 v6.4 human genome and annotation for UCSC panTro5
+ galGal6 v6.4 chicken genome and annotation for UCSC galGal6
+ apiMel2 v6.4 bee genome and annotation for UCSC apiMel2
+ danRer11 v6.4 zebrafish genome and annotation for UCSC danRer11
+ xenTro2 v6.4 frog genome and annotation for UCSC xenTro2
+ hg38 v6.4 human genome and annotation for UCSC hg38
+ sacCer3 v6.4 yeast genome and annotation for UCSC sacCer3
+ petMar2 v6.4 lamprey genome and annotation for UCSC petMar2
+ dm3 v6.0 fly genome and annotation for UCSC dm3
+ panPan1 v6.4 human genome and annotation for UCSC panPan1
+ ci3 v6.4 ciona genome and annotation for UCSC ci3
+ aplCal1 v6.4 seahare genome and annotation for UCSC aplCal1
+ panTro6 v6.4 human genome and annotation for UCSC panTro6
+ panTro4 v6.4 human genome and annotation for UCSC panTro4
+ xenTro9 v6.4 frog genome and annotation for UCSC xenTro9
+ susScr11 v6.4 pig genome and annotation for UCSC susScr11
+ rheMac2 v6.4 rhesus genome and annotation for UCSC rheMac2
+ gorGor4 v6.4 human genome and annotation for UCSC gorGor4
+ tetNig2 v6.4 fugu genome and annotation for UCSC tetNig2
+ tair10 v6.0 arabidopsis genome and annotation (tair10)
+ rheMac3 v6.4 rhesus genome and annotation for UCSC rheMac3
+ xenTro7 v6.4 frog genome and annotation for UCSC xenTro7

SETTINGS
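The package listing is long, so it can be convenient to pull out only the package names in one section (for example GENOMES) before choosing a genome or promoter set. The sketch below is a hypothetical helper, not a HOMER tool. It assumes that each package is printed on its own line with the package name as the second whitespace-separated field, and that section headers are single upper-case words, as in the listing above. On the cluster the listing may be written to standard error, so the commented usage line redirects it with 2>&1.

```shell
#!/bin/bash
# Hypothetical helper: read a configureHomer.pl -list style listing on stdin
# and print the package names that appear under one section header.
# Assumes one package per line: <marker> <package> <version> <description...>
packages_in_section() {
    local section=$1
    awk -v want="$section" '
        NF == 1 && $1 ~ /^[A-Z]+$/ { current = $1; next }   # section header
        current == want && NF >= 3 { print $2 }             # package line
    '
}

# Example use on the cluster:
# perl $EBROOTHOMER/configureHomer.pl -list 2>&1 | packages_in_section GENOMES

# Self-contained demonstration with a few lines copied from the listing.
packages_in_section GENOMES <<'EOF'
ORGANISMS
+ human-o v6.3 Homo sapiens (human) accession and ontology information
GENOMES
+ hg38 v6.4 human genome and annotation for UCSC hg38
+ mm10 v6.4 mouse genome and annotation for UCSC mm10
SETTINGS
EOF
```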
findMotifs.pl

Program will find de novo and known motifs in a gene list
Usage: findMotifs.pl <input list> <promoter set> <output directory> [additoinal options]

example: findMotifs.pl genelist.txt mouse motifResults/ -len 10

FASTA example: findMotifs.pl targets.fa fasta motifResults/ -fasta background.fa

Available Promoter Sets: Add custom promoters sets with loadPromoters.pl
worm worm /usr/local/apps/eb/Homer/4.11-foss-2016b/.//data/promoters/ -2000 2000 refseq
zebrafish zebrafish /usr/local/apps/eb/Homer/4.11-foss-2016b/.//data/promoters/ -2000 2000 refseq
rat rat /usr/local/apps/eb/Homer/4.11-foss-2016b/.//data/promoters/ -2000 2000 refseq
fly fly /usr/local/apps/eb/Homer/4.11-foss-2016b/.//data/promoters/ -2000 2000 refseq
yeast yeast /usr/local/apps/eb/Homer/4.11-foss-2016b/.//data/promoters/ -2000 2000 orf
mouse mouse /usr/local/apps/eb/Homer/4.11-foss-2016b/.//data/promoters/ -2000 2000 refseq
arabidopsis arabidopsis /usr/local/apps/eb/Homer/4.11-foss-2016b/.//data/promoters/ -2000 2000 refseq
chicken chicken /usr/local/apps/eb/Homer/4.11-foss-2016b/.//data/promoters/ -2000 2000 refseq
frog frog /usr/local/apps/eb/Homer/4.11-foss-2016b/.//data/promoters/ -2000 2000 refseq
human human /usr/local/apps/eb/Homer/4.11-foss-2016b/.//data/promoters/ -2000 2000 refseq

Try typing "perl /usr/local/apps/eb/Homer/4.11-foss-2016b/.//configureHomer.pl -list" to see available promoter sets
Typing "perl /usr/local/apps/eb/Homer/4.11-foss-2016b/.//configureHomer.pl -install NNN" to install promoter set NNN

Basic options:
-len <#>[,<#>,<#>...] (motif length, default=8,10,12) [NOTE: values greater 12 may cause the program to run out of memmory - in these cases decrease the number of sequences analyzed]
-bg <background file> (ids to use as background, default: all genes)
-start <#> (offset from TSS, default=-300) [max=based on Promoter Set]
-end <#> (offset from TSS, default=50) [max=based on Promoter Set]
-rna (output RNA motif logos and compare to RNA motif database, automatically sets -norevopp)
-mask/-nomask (use/don't use repeatmasked files, default: -mask)
-S <#> (Number of motifs to optimize, default: 25)
-mis <#> (global optimization: searches for strings with # mismatches, default: 1)
-noconvert (will not worry about converting input files into unigene ids)
-norevopp (do not search the reverse strand for motifs)
-nomotif (don't search for de novo motif enrichment)

Scanning sequence for motifs
-find <motif file> (This will cause the program to only scan for motifs)

Including Enhancers - peak files of enhancer location, peak ID should be gene ID
-enhancers <peak file> <genome verion> (enhancers to include in search space, peaks/sequences should be named with a gene ID If multiple enhancers per gene, use the same gene ID, and all will be included)
-enhancersOnly (do not include promoter sequence in motif search)

FASTA files: If you prefer to use your own fasta files, place target sequences and background sequences in two separate FASTA formated files (must have unique identifiers)
Target File - use in place of <input list> (i.e. the first argument)
Background File - after output directory (with additional options) use the argument: -fastaBg <background fasta file> (This is recommended for fasta based analysis)
In place of the promoter set use "fasta", or any valid set (this parameter is ignored)
When finding motifs [-find], only the target file with be searched)
-chopify (chops up background regions to match size of target regions) i.e. if background is a full genome or all mRNAs

Known Motif Options/Visualization:
-mset <vertebrates|insects|worms|plants|yeast|all> (check against motif collects, default: auto)
-basic (don't check de novo motifs for similarity to known motifs)
-bits (scale sequence logos by information content, default: doesn't scale)
-nocheck (don't check for similarity between novo motif motifs and known motifs)
-mcheck <motif file> (known motifs to check against de novo motifs,
-noknown (don't search for known motif enrichment, default: -known)
-mknown <motif file> (known motifs to check for enrichment,
-nofacts (omit humor)
-seqlogo (uses weblogo/seqlogo/ghostscript to visualize motifs, default uses SVG)

Advanced options:
-b (use binomial distribution to calculate p-values, hypergeometric is default)
-nogo (don't search for gene ontology enrichment)
-humanGO (Convert IDs to human for GO analysis)
-ontology <ont.genes> [ont.genes] ... (custom ontologies for GO analysis)
-noweight (no CG correction)
-noredun (Don't remove predetermined redundant promoters/sequences)
-g (input file is a group file, i.e. 1st column = id, 2nd = 0 or 1 [1=target,0=back])
-cpg (use CpG% instead of GC% for sequence normalization)
-rand (randomize labels for target and backgound sequences)
-maskMotif <motif file 1> [motif file 2] ... (motifs to mask before motif finding)
-opt <motif file 1> [motif file 2] ... (motifs to optimize/change length)
-peaks (will produce peak file of promoters to use with findMotifsGenome.pl)
-nowarn (no warnings)
-keepFiles (don't delete temporary files)
-dumpFasta (create target.fa and background.fa files)
-min <#> (remove sequences shorter than #, default: 0)
-max <#> (remove sequences longer than #, default: 1e10)
-reuse (rerun homer using old seq files etc. with new options and ignores input list, organism)
-fdr <#> (Calculate empirical FDR for de novo discovery #=number of randomizations)

homer2 specific options:
-homer2 (use homer2 instead of original homer, default)
-nlen <#> (length of lower-order oligos to normalize - general sequences, default: 3)
-nmax <#> (Max normalization iterations, default: 160)
-neutral (weight sequences to neutral frequencies, i.e. 25%, 6.25%, etc.)
-olen <#> (lower-order oligo normalization for oligo table, use if -nlen isn't working well)
-p <#> (Number of processors to use, default: 1)
-e <#> (Maximum expected motif instance per bp in random sequence, default: 0.01)
-cache <#> (size in MB for statistics cache, default: 500)
-quickMask (skip full masking after finding motifs, similar to original homer)
-homer1 (to force the use of the original homer)
-minlp <#> (stop looking for motifs when seed logp score gets above #, default: -10)

Original homer specific options:
-float (allow adjustment of the degeneracy threshold for known motifs to improve p-value[dangerous])
-homer1 (to force the use of the original homer)
-depth [low|med|high|allnight] (time spent on local optimization default: med)
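The help text lists -p <#> as the number of processors, with a default of 1. If a job requests more than one CPU, the value passed to -p should match the allocation. The following is a minimal sketch, assuming the job requests CPUs with the Slurm option --cpus-per-task, which sets the SLURM_CPUS_PER_TASK environment variable. homer_threads is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: choose the value for findMotifs.pl -p from the CPUs
# Slurm allocated to the task, falling back to 1 (the documented default)
# when SLURM_CPUS_PER_TASK is unset or empty.
homer_threads() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

# Example use inside sub.sh, together with "#SBATCH --cpus-per-task=4":
# findMotifs.pl genelist.txt mouse motifResults/ -len 10 -p "$(homer_threads)"
homer_threads
```

Reading the value from the environment keeps the thread count and the resource request in one place, so changing --cpus-per-task does not require editing the findMotifs.pl line.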
Installation
Source code from Homer
System
64-bit Linux