PfamDB-Sapelo2
Latest revision as of 13:39, 13 October 2021
A database for the HMMER and Pfam_scan programs.
The Pfam database is a large collection of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs). Some releases of Pfam include both Pfam-A and Pfam-B. Because the entries in Pfam-A do not cover all known proteins, a supplement called Pfam-B was provided. Although of lower quality, Pfam-B families can be useful when no Pfam-A families are found. Pfam-B has been discontinued and reintroduced in various releases.
More information about Pfam can be found at the Pfam website (http://pfam.xfam.org/).
There are multiple versions under /db/pfam.
The Pfam database must be formatted to work with certain software such as HMMER or Pfam_scan. The unformatted downloads are stored in directories named after the Pfam version; the currently available versions are 27.0, 33.0, and 34.0. The versions that have been formatted to work with HMMER are stored in directories named with the Pfam version followed by the HMMER version they were formatted for; the currently available versions are 27.0-hmmer3.0 and 34.0-hmmer3.3.2.
/db/pfam/
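The directory naming convention described above can be sketched as a small shell helper. The function name `pfam_db_dir` and the variable `PFAM_ROOT` are hypothetical and introduced only for illustration; the sketch assumes the directories follow the `<Pfam version>-hmmer<HMMER version>` pattern exactly as listed on this page.

```shell
# Hypothetical helper: print the directory for a given Pfam release.
# PFAM_ROOT is the location named on this page.
PFAM_ROOT=/db/pfam

pfam_db_dir() {
    pfam_version="$1"       # e.g. 34.0
    hmmer_version="${2:-}"  # e.g. 3.3.2; leave empty for the unformatted download
    if [ -n "$hmmer_version" ]; then
        printf '%s/%s-hmmer%s\n' "$PFAM_ROOT" "$pfam_version" "$hmmer_version"
    else
        printf '%s/%s\n' "$PFAM_ROOT" "$pfam_version"
    fi
}
```

For example, `ls "$(pfam_db_dir 34.0 3.3.2)"` would list the files formatted for HMMER 3.3.2, assuming that directory exists on the system.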
The commands used to generate the data are documented in the UGA file located in the same directory as the data:
formatdb -i Pfam-A.fasta -n Pfam-A -t Pfam-A -l Pfam-A.log -p T
formatdb -i Pfam-B.fasta -n Pfam-B -t Pfam-B -l Pfam-B.log -p T
ml HMMER/3.1b2-foss-2016b
hmmpress Pfam-A.hmm
hmmpress Pfam-B.hmm
ml WU-Blast/2.2.6-foss-2016b
xdformat -p -x Pfam-A.fasta
xdformat -p Pfam-B.fasta
If you would like a particular version of the Pfam database to work with a particular version of HMMER, you can download and prepare the database yourself. To format Pfam for use with HMMER, use the hmmpress program, which is included with HMMER. The hmmpress step is required for hmmscan to work. The following commands can be used to prepare the Pfam-A.hmm data:
ml HMMER/3.1b2-foss-2016b  # use whichever version of HMMER you would like to use for hmmscan
wget http://ftp.ebi.ac.uk/pub/databases/Pfam/releases/Pfam34.0/Pfam-A.hmm.gz  # wget whichever version of Pfam you would like
gunzip Pfam-A.hmm.gz
hmmpress Pfam-A.hmm
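hmmpress writes four binary index files next to the HMM file, with the suffixes .h3m, .h3i, .h3f, and .h3p, and hmmscan expects all of them to be present. A minimal sketch for checking that a database has been pressed is shown below; the function name `pfam_is_pressed` is hypothetical and not part of HMMER.

```shell
# Hypothetical helper: succeed only if all four hmmpress index files
# exist next to the given HMM file.
pfam_is_pressed() {
    hmm_file="$1"
    for ext in h3m h3i h3f h3p; do
        if [ ! -f "${hmm_file}.${ext}" ]; then
            # Report the first missing index file on stderr.
            echo "missing ${hmm_file}.${ext}; run: hmmpress ${hmm_file}" >&2
            return 1
        fi
    done
    return 0
}
```

Running `pfam_is_pressed Pfam-A.hmm` in the directory where hmmpress was run should return success once the step above has completed.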
Version built with HMMER 3.0 (Pfam_scan friendly):
/db/pfam/27.0-hmmer3.0/Pfam-A
/db/pfam/27.0-hmmer3.0/Pfam-B
/db/pfam/27.0-hmmer3.0/Pfam-A.hmm
/db/pfam/27.0-hmmer3.0/Pfam-B.hmm
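As an illustration of how one of these files might be used, the sketch below wraps an hmmscan search against the Pfam-A HMM database. The function name `scan_with_pfam`, the variable `PFAM_HMM`, and the CPU count are assumptions made for this example; a HMMER module matching the version the database was pressed with should be loaded first, and the exact module name on the cluster is not given on this page.

```shell
# Path taken from the list above; adjust to the version you need.
PFAM_HMM=/db/pfam/27.0-hmmer3.0/Pfam-A.hmm

# Hypothetical wrapper: search a protein FASTA file against Pfam-A.
#   $1  query protein sequences in FASTA format
#   $2  output file for the per-domain hit table
scan_with_pfam() {
    query_fasta="$1"
    domtbl_out="$2"
    # --cut_ga applies Pfam's curated gathering thresholds;
    # --domtblout writes a parseable per-domain table.
    hmmscan --cut_ga --cpu 4 --domtblout "$domtbl_out" "$PFAM_HMM" "$query_fasta"
}
```

A call such as `scan_with_pfam query.fasta query.domtbl` (with `query.fasta` standing in for your own sequences) would then write the domain hits to `query.domtbl`.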
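The databases are installed from files downloaded from the Sanger website (http://bio.nic.funet.fi/pub/mirrors/ftp.sanger.ac.uk/pub/databases/Pfam/releases/).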