Homer-Teaching

From Research Computing Center Wiki

Latest revision as of 14:18, 14 November 2019

Category

Bioinformatics

Program On

Teaching

Version

4.9.1, 4.10, 4.11

Author / Distributor

Homer

Description

"HOMER (Hypergeometric Optimization of Motif EnRichment) is a suite of tools for Motif Discovery and ChIP-Seq analysis. It is a collection of command line programs for unix-style operating systems written in mostly perl and c++. Homer was primarily written as a de novo motif discovery algorithm that is well suited for finding 8-12 bp motifs in large scale genomics data." Homer

Running Program

Also refer to Running Jobs on the teaching cluster

  • version 4.9.1 is installed in /usr/local/apps/eb/Homer/4.9.1-foss-2016b. To use this version of Homer, please first load the module with
module load Homer/4.9.1-foss-2016b
  • version 4.10 is installed in /usr/local/apps/eb/Homer/4.10-foss-2016b. To use this version of Homer, please first load the module with
module load Homer/4.10-foss-2016b
  • version 4.11 is installed in /usr/local/apps/eb/Homer/4.11-foss-2016b. To use this version of Homer, please first load the module with
module load Homer/4.11-foss-2016b

An example of script sub.sh to run findMotifs.pl:

#!/bin/bash
#SBATCH --job-name=j_HOMER
#SBATCH --partition=batch
#SBATCH --mail-type=ALL
#SBATCH --mail-user=username@uga.edu
#SBATCH --ntasks=1
#SBATCH --mem=2gb
#SBATCH --time=08:00:00
#SBATCH --output=HOMER.%j.out
#SBATCH --error=HOMER.%j.err

cd $SLURM_SUBMIT_DIR
ml Homer/4.11-foss-2016b
findMotifs.pl <input list> <promoter set> <output directory> [additional options]

In a real submission script, at least the placeholder values above (for example the e-mail address, number of tasks, memory and time limit) need to be reviewed and replaced with values appropriate for the job.
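As a minimal sketch of filling in those values, the snippet below writes sub.sh from a few shell variables. The helper name write_homer_job, the sample values (4gb, 02:00:00) and the findMotifs.pl arguments (taken from the usage example in the program's help text) are illustrative assumptions, not cluster recommendations.

```bash
#!/bin/bash
# Sketch: generate sub.sh with job-specific values filled in.
# write_homer_job is a hypothetical helper; adjust the defaults to your job.
write_homer_job() {
    local email="$1"                  # address for --mail-user
    local mem="${2:-2gb}"             # memory request
    local walltime="${3:-08:00:00}"   # time limit
    local out="${4:-sub.sh}"          # script file to write

    # Unquoted heredoc: ${...} is expanded now, \$ stays literal for run time.
    cat > "$out" <<EOF
#!/bin/bash
#SBATCH --job-name=j_HOMER
#SBATCH --partition=batch
#SBATCH --mail-type=ALL
#SBATCH --mail-user=${email}
#SBATCH --ntasks=1
#SBATCH --mem=${mem}
#SBATCH --time=${walltime}
#SBATCH --output=HOMER.%j.out
#SBATCH --error=HOMER.%j.err

cd \$SLURM_SUBMIT_DIR
ml Homer/4.11-foss-2016b
findMotifs.pl genelist.txt mouse motifResults/ -len 10
EOF
}

# Example call with placeholder values.
write_homer_job "username@uga.edu" 4gb 02:00:00 sub.sh
```

The generated file can be inspected with cat sub.sh before it is submitted.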


An example of script sub.sh to run makeTagDirectory:

#!/bin/bash
#SBATCH --job-name=j_HOMER
#SBATCH --partition=batch
#SBATCH --mail-type=ALL
#SBATCH --mail-user=username@uga.edu
#SBATCH --ntasks=1
#SBATCH --mem=2gb
#SBATCH --time=08:00:00
#SBATCH --output=HOMER.%j.out
#SBATCH --error=HOMER.%j.err

cd $SLURM_SUBMIT_DIR
ml Homer/4.11-foss-2016b
makeTagDirectory <directory> <alignment file 1> [file 2] ... [options]

In a real submission script, at least the placeholder values above (for example the e-mail address, number of tasks, memory and time limit) need to be reviewed and replaced with values appropriate for the job.

Please refer to Running Jobs on the teaching cluster, Run X window Jobs and Run interactive Jobs for more details on running jobs on the teaching cluster.


Example of submission to the queue:

sbatch sub.sh
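
To keep track of the submitted job, one option is to capture the job ID at submission time. The sketch below assumes Slurm's sbatch --parsable option (which prints the job ID, optionally followed by ";cluster") and squeue -j. The helper names parse_jobid and submit_job are hypothetical. If sbatch is not on the PATH, the snippet only prints a message.

```bash
# parse_jobid: strip the optional ";cluster" suffix that --parsable may append.
parse_jobid() {
    printf '%s\n' "${1%%;*}"
}

# submit_job: submit a script and print its job ID (hypothetical helper).
submit_job() {
    if ! command -v sbatch >/dev/null 2>&1; then
        echo "sbatch not found; run this on the cluster" >&2
        return 1
    fi
    raw=$(sbatch --parsable "$1") || return 1
    parse_jobid "$raw"
}

if jobid=$(submit_job sub.sh); then
    squeue -j "$jobid"    # show the job's current state in the queue
fi
```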

Documentation

Homer

module load Homer/4.11-foss-2016b

perl $EBROOTHOMER/configureHomer.pl -list

	Current base directory for HOMER is /usr/local/apps/eb/Homer/4.11-foss-2016b/

--2019-11-13 16:32:15--  http://homer.ucsd.edu/homer/update.txt
Resolving homer.ucsd.edu (homer.ucsd.edu)... 169.228.63.226
Connecting to homer.ucsd.edu (homer.ucsd.edu)|169.228.63.226|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 17859 (17K) [text/plain]
Saving to: ‘/usr/local/apps/eb/Homer/4.11-foss-2016b//update.txt’

100%[================================================================================================================================================================>] 17,859      --.-K/s   in 0.06s   

2019-11-13 16:32:15 (277 KB/s) - ‘/usr/local/apps/eb/Homer/4.11-foss-2016b//update.txt’ saved [17859/17859]

	Updating Settings...
Packages with name conflicts have a trailing -o, -p, or -g
Version Installed	Package	Version	Description
SOFTWARE
+	homer	v4.11.1	Code/Executables, ontologies, motifs for HOMER
ORGANISMS
+	zebrafish-o	v6.3	Danio rerio (zebrafish) accession and ontology information
+	lamprey	v6.3	Petromyzon marinus (lamprey) accession and ontology information
+	dog	v6.3	Canis lupus familiaris (dog) accession and ontology information
+	human-o	v6.3	Homo sapiens (human) accession and ontology information
+	anemone	v6.3	Nematostella vectensis (anemone) accession and ontology information
+	pig	v6.3	Sus scrofa (pig) accession and ontology information
+	chicken-o	v6.3	Gallus gallus (chicken) accession and ontology information
+	corn	v6.3	Zea mays (corn) accession and ontology information
+	urchin	v6.3	Strongylocentrotus purpuratus (urchin) accession and ontology information
+	rice	v6.3	Oryza sativa (rice) accession and ontology information
+	fugu	v6.3	Takifugu rubripes (fugu) accession and ontology information
+	mosquito	v6.3	Anopheles gambiae (mosquito) accession and ontology information
+	chlamy	v6.3	Chlamydomonas reinhardtii (chlamy) accession and ontology information
+	cocci	v6.3	Coccidioides immitis RS (cocci) accession and ontology information
+	selaginella	v6.3	Selaginella moellendorffii (selaginella) accession and ontology information
+	tomato	v6.3	Solanum lycopersicum (tomato) accession and ontology information
+	fly-o	v6.3	Drosophila melanogaster (fly) accession and ontology information
+	ciona	v6.3	Ciona intestinalis (ciona) accession and ontology information
+	rat-o	v6.3	Rattus norvegicus (rat) accession and ontology information
+	volvox	v6.3	Volvox carteri (volvox) accession and ontology information
+	mushroom	v6.3	Agaricus bisporus (mushroom) accession and ontology information
+	cow	v5.4	Bos taurus (cow) accession and ontology information
+	frog-o	v6.3	Xenopus tropicalis (frog) accession and ontology information
+	worm-o	v6.3	Caenorhabditis elegans (worm) accession and ontology information
+	ciliate	v6.3	Tetrahymena thermophila (ciliate) accession and ontology information
+	ascomycetes	v6.0	Neurospora crassa (ascomycetes) accession and ontology information
+	balbc_cocci	v5.2	Balbc/J mouse and Coccidioides genome combined
+	bee	v6.3	Apis mellifera (bee) accession and ontology information
+	rhesus	v6.3	Macaca mulatta (rhesus) accession and ontology information
+	seahare	v6.3	Aplysia californica (seahare) accession and ontology information
+	mouse-o	v6.3	Mus musculus (mouse) accession and ontology information
+	patens	v6.3	Physcomitrella patens (patens) accession and ontology information
+	ncrassa	v6.3	Neurospora crassa (ncrassa) accession and ontology information
+	diatom	v6.3	Phaeodactylum tricornutum (diatom) accession and ontology information
+	pseudonana	v6.3	Thalassiosira pseudonana (pseudonana) accession and ontology information
+	pombe	v6.3	Schizosaccharomyces pombe (pombe) accession and ontology information
+	arabidopsis-o	v6.3	Arabidopsis thaliana (arabidopsis) accession and ontology information
+	hydra	v6.3	Hydra vulgaris (hydra) accession and ontology information
+	zebrafinch	v6.3	Taeniopygia guttata (zebrafinch) accession and ontology information
+	laevis	v6.3	Xenopus laevis (laevis) accession and ontology information
+	dicty	v6.3	Dictyostelium discoideum (dicty) accession and ontology information
+	yeast-o	v6.3	Saccharomyces cerevisiae (yeast) accession and ontology information
PROMOTERS
+	arabidopsis-p	v6.3	arabidopsis promoters (arabidopsis)
+	yeast-p	v5.5	yeast promoters (yeast)
+	chicken-p	v5.5	chicken promoters (chicken)
+	mouse-p	v5.5	mouse promoters (mouse)
+	rat-p	v5.5	rat promoters (rat)
+	fly-p	v5.5	fly promoters (fly)
+	human-p	v5.5	human promoters (human)
+	worm-p	v5.5	worm promoters (worm)
+	zebrafish-p	v5.5	zebrafish promoters (zebrafish)
+	frog-p	v5.5	frog promoters (frog)
GENOMES
+	hg17	v6.4	human genome and annotation for UCSC hg17
+	susScr3	v6.4	pig genome and annotation for UCSC susScr3
+	dm6	v6.4	fly genome and annotation for UCSC dm6
+	rheMac8	v6.4	rhesus genome and annotation for UCSC rheMac8
+	hg19	v6.4	human genome and annotation for UCSC hg19
+	sacCer2	v6.4	yeast genome and annotation for UCSC sacCer2
+	gorGor5	v6.4	human genome and annotation for UCSC gorGor5
+	AGPv3	v5.10	corn genome and annotation (AGPv3)
+	ce6	v6.4	worm genome and annotation for UCSC ce6
+	patens.ASM242v1	v5.10	patens genome and annotation (patens.ASM242v1)
+	rn5	v6.4	rat genome and annotation for UCSC rn5
+	danRer7	v6.4	zebrafish genome and annotation for UCSC danRer7
+	apiMel3	v6.4	bee genome and annotation for UCSC apiMel3
+	fr3	v6.4	fugu genome and annotation for UCSC fr3
+	ce10	v6.4	worm genome and annotation for UCSC ce10
+	xenTro3	v6.4	frog genome and annotation for UCSC xenTro3
+	strPur2	v6.0	urchin genome and annotation for UCSC strPur2
+	rice.IRGSP-1.0	v5.10	rice genome and annotation (rice.IRGSP-1.0)
+	mm9	v6.4	mouse genome and annotation for UCSC mm9
+	mm10	v6.4	mouse genome and annotation for UCSC mm10
+	panPan2	v6.4	human genome and annotation for UCSC panPan2
+	ce11	v6.4	worm genome and annotation for UCSC ce11
+	mm8	v6.4	mouse genome and annotation for UCSC mm8
+	rn6	v6.4	rat genome and annotation for UCSC rn6
+	corn.AGPv3	v5.10	corn genome and annotation (corn.AGPv3)
+	rn4	v6.4	rat genome and annotation for UCSC rn4
+	papAnu2	v6.4	human genome and annotation for UCSC papAnu2
+	gorGor3	v6.4	human genome and annotation for UCSC gorGor3
+	danRer10	v6.4	zebrafish genome and annotation for UCSC danRer10
+	petMar3	v6.4	lamprey genome and annotation for UCSC petMar3
+	galGal4	v6.4	chicken genome and annotation for UCSC galGal4
+	panTro3	v6.4	human genome and annotation for UCSC panTro3
+	hg18	v6.4	human genome and annotation for UCSC hg18
+	ci2	v6.4	ciona genome and annotation for UCSC ci2
+	canFam3	v6.4	dog genome and annotation for UCSC canFam3
+	galGal5	v6.4	chicken genome and annotation for UCSC galGal5
+	taeGut2	v6.4	zebrafinch genome and annotation for UCSC taeGut2
+	anoGam1	v6.4	mosquito genome and annotation for UCSC anoGam1
+	panTro5	v6.4	human genome and annotation for UCSC panTro5
+	galGal6	v6.4	chicken genome and annotation for UCSC galGal6
+	apiMel2	v6.4	bee genome and annotation for UCSC apiMel2
+	danRer11	v6.4	zebrafish genome and annotation for UCSC danRer11
+	xenTro2	v6.4	frog genome and annotation for UCSC xenTro2
+	hg38	v6.4	human genome and annotation for UCSC hg38
+	sacCer3	v6.4	yeast genome and annotation for UCSC sacCer3
+	petMar2	v6.4	lamprey genome and annotation for UCSC petMar2
+	dm3	v6.0	fly genome and annotation for UCSC dm3
+	panPan1	v6.4	human genome and annotation for UCSC panPan1
+	ci3	v6.4	ciona genome and annotation for UCSC ci3
+	aplCal1	v6.4	seahare genome and annotation for UCSC aplCal1
+	panTro6	v6.4	human genome and annotation for UCSC panTro6
+	panTro4	v6.4	human genome and annotation for UCSC panTro4
+	xenTro9	v6.4	frog genome and annotation for UCSC xenTro9
+	susScr11	v6.4	pig genome and annotation for UCSC susScr11
+	rheMac2	v6.4	rhesus genome and annotation for UCSC rheMac2
+	gorGor4	v6.4	human genome and annotation for UCSC gorGor4
+	tetNig2	v6.4	fugu genome and annotation for UCSC tetNig2
+	tair10	v6.0	arabidopsis genome and annotation (tair10)
+	rheMac3	v6.4	rhesus genome and annotation for UCSC rheMac3
+	xenTro7	v6.4	frog genome and annotation for UCSC xenTro7
SETTINGS
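
The listing can also be checked from a script, for example to confirm that a genome is installed before submitting a job. This is a rough sketch that assumes, as in the output above, tab-separated columns with "+" in the first column marking an installed package. The helper name homer_has_package is hypothetical.

```bash
# Sketch: succeed if the named package is marked "+" in the output of
# `configureHomer.pl -list` read from standard input.
# Assumes tab-separated columns: status, package, version, description.
homer_has_package() {
    awk -F '\t' -v pkg="$1" \
        '$1 == "+" && $2 == pkg { found = 1 } END { exit !found }'
}

# Example with two lines copied from the listing above.
printf '+\thg38\tv6.4\thuman genome and annotation for UCSC hg38\n+\tmouse-p\tv5.5\tmouse promoters (mouse)\n' |
    homer_has_package hg38 && echo "hg38 is installed"
```

On the cluster, the same check would read the real listing: perl $EBROOTHOMER/configureHomer.pl -list | homer_has_package hg38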

 findMotifs.pl

	
	Program will find de novo and known motifs in a gene list

		Usage:  findMotifs.pl <input list> <promoter set> <output directory> [additoinal options]

		example: findMotifs.pl genelist.txt mouse motifResults/ -len 10

		FASTA example: findMotifs.pl targets.fa fasta motifResults/ -fasta background.fa

	Available Promoter Sets: Add custom promoters sets with loadPromoters.pl
		worm	worm	/usr/local/apps/eb/Homer/4.11-foss-2016b/.//data/promoters/	-2000	2000	refseq
		zebrafish	zebrafish	/usr/local/apps/eb/Homer/4.11-foss-2016b/.//data/promoters/	-2000	2000	refseq
		rat	rat	/usr/local/apps/eb/Homer/4.11-foss-2016b/.//data/promoters/	-2000	2000	refseq
		fly	fly	/usr/local/apps/eb/Homer/4.11-foss-2016b/.//data/promoters/	-2000	2000	refseq
		yeast	yeast	/usr/local/apps/eb/Homer/4.11-foss-2016b/.//data/promoters/	-2000	2000	orf
		mouse	mouse	/usr/local/apps/eb/Homer/4.11-foss-2016b/.//data/promoters/	-2000	2000	refseq
		arabidopsis	arabidopsis	/usr/local/apps/eb/Homer/4.11-foss-2016b/.//data/promoters/	-2000	2000	refseq
		chicken	chicken	/usr/local/apps/eb/Homer/4.11-foss-2016b/.//data/promoters/	-2000	2000	refseq
		frog	frog	/usr/local/apps/eb/Homer/4.11-foss-2016b/.//data/promoters/	-2000	2000	refseq
		human	human	/usr/local/apps/eb/Homer/4.11-foss-2016b/.//data/promoters/	-2000	2000	refseq

		Try typing "perl /usr/local/apps/eb/Homer/4.11-foss-2016b/.//configureHomer.pl -list" to see available promoter sets
		Typing "perl /usr/local/apps/eb/Homer/4.11-foss-2016b/.//configureHomer.pl -install NNN" to install promoter set NNN

	Basic options:
		-len <#>[,<#>,<#>...] (motif length, default=8,10,12) [NOTE: values greater 12 may cause the program
			to run out of memmory - in these cases decrease the number of sequences analyzed]
		-bg <background file> (ids to use as background, default: all genes)
		-start <#> (offset from TSS, default=-300) [max=based on Promoter Set]
		-end <#> (offset from TSS, default=50) [max=based on Promoter Set]
		-rna (output RNA motif logos and compare to RNA motif database, automatically sets -norevopp)
		-mask/-nomask (use/don't use repeatmasked files, default: -mask)
		-S <#> (Number of motifs to optimize, default: 25)
		-mis <#> (global optimization: searches for strings with # mismatches, default: 1)
		-noconvert (will not worry about converting input files into unigene ids)
		-norevopp (do not search the reverse strand for motifs)
		-nomotif (don't search for de novo motif enrichment)

	Scanning sequence for motifs
		-find <motif file> (This will cause the program to only scan for motifs)

	Including Enhancers - peak files of enhancer location, peak ID should be gene ID
		-enhancers <peak file> <genome verion>
			(enhancers to include in search space, peaks/sequences should be named with a gene ID
			If multiple enhancers per gene, use the same gene ID, and all will be included)
		-enhancersOnly (do not include promoter sequence in motif search)

	FASTA files: If you prefer to use your own fasta files, place target sequences and 
		background sequences in two separate FASTA formated files (must have unique identifiers)
		Target File - use in place of <input list> (i.e. the first argument)
		Background File - after output directory (with additional options) use the argument:
			-fastaBg <background fasta file> (This is recommended for fasta based analysis)
		In place of the promoter set use "fasta", or any valid set (this parameter is ignored)
		When finding motifs [-find], only the target file with be searched)
			-chopify (chops up background regions to match size of target regions)
				i.e. if background is a full genome or all mRNAs

	Known Motif Options/Visualization:
		-mset <vertebrates|insects|worms|plants|yeast|all> (check against motif collects, default: auto)
		-basic (don't check de novo motifs for similarity to known motifs)
		-bits (scale sequence logos by information content, default: doesn't scale)
		-nocheck (don't check for similarity between novo motif motifs and known motifs)
		-mcheck <motif file> (known motifs to check against de novo motifs,
		-noknown (don't search for known motif enrichment, default: -known)
		-mknown <motif file> (known motifs to check for enrichment,
		-nofacts (omit humor)
		-seqlogo (uses weblogo/seqlogo/ghostscript to visualize motifs, default uses SVG)

	Advanced options:
		-b (use binomial distribution to calculate p-values, hypergeometric is default)
		-nogo (don't search for gene ontology enrichment)
		-humanGO (Convert IDs to human for GO analysis)
		-ontology <ont.genes> [ont.genes] ... (custom ontologies for GO analysis)
		-noweight (no CG correction)
		-noredun (Don't remove predetermined redundant promoters/sequences)
		-g (input file is a group file, i.e. 1st column = id, 2nd = 0 or 1 [1=target,0=back])
		-cpg (use CpG% instead of GC% for sequence normalization)
		-rand (randomize labels for target and backgound sequences)
		-maskMotif <motif file 1> [motif file 2] ... (motifs to mask before motif finding)
		-opt <motif file 1> [motif file 2] ... (motifs to optimize/change length)
		-peaks (will produce peak file of promoters to use with findMotifsGenome.pl)
		-nowarn (no warnings)
		-keepFiles (don't delete temporary files)
		-dumpFasta (create target.fa and background.fa files)
		-min <#> (remove sequences shorter than #, default: 0)
		-max <#> (remove sequences longer than #, default: 1e10)
		-reuse (rerun homer using old seq files etc. with new options
			  and ignores input list, organism)
		-fdr <#> (Calculate empirical FDR for de novo discovery #=number of randomizations)

	homer2 specific options:
		-homer2 (use homer2 instead of original homer, default)
		-nlen <#> (length of lower-order oligos to normalize - general sequences, default: 3)
			-nmax <#> (Max normalization iterations, default: 160)
			-neutral (weight sequences to neutral frequencies, i.e. 25%, 6.25%, etc.)
		-olen <#> (lower-order oligo normalization for oligo table, use if -nlen isn't working well)
		-p <#> (Number of processors to use, default: 1)
		-e <#> (Maximum expected motif instance per bp in random sequence, default: 0.01)
		-cache <#> (size in MB for statistics cache, default: 500)
		-quickMask (skip full masking after finding motifs, similar to original homer)
		-homer1 (to force the use of the original homer)
		-minlp <#> (stop looking for motifs when seed logp score gets above #, default: -10)

	Original homer specific options:
		-float (allow adjustment of the degeneracy threshold for known motifs to improve p-value[dangerous])
		-homer1 (to force the use of the original homer)
		-depth [low|med|high|allnight] (time spent on local optimization default: med)


Installation

Source code from Homer

System

64-bit Linux