Bioinformatics Databases

Revision as of 00:23, 24 June 2021

As part of our services, the GACRC builds and hosts local copies of frequently used application data and helps GACRC members share data. Datasets are organized by date and are uploaded at the beginning of each month. They are located in the commonly shared "/work/db" filesystem.

Datasets can be loaded in a similar way to software modules. Because each dataset module is time-stamped, users can always load the same version of a database and so replicate earlier results. To search for available dataset modules, run the command:

ml spider ncbiblastdb
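
Once a version has been chosen from the search output, it can be pinned in a job script. The following is a minimal sketch, assuming the cluster uses Lmod (which provides the ml shorthand) and that dataset modules follow the usual name/version convention. The helper names db_module_name and load_db are hypothetical, and the version argument is a placeholder rather than a real release. This page does not describe which environment variables a dataset module sets, so the sketch only inspects them with ml show.

```shell
#!/bin/bash
# Sketch only: helper names are hypothetical, not part of GACRC tooling.

# Build a "name/version" module string, or just "name" when no version is given.
db_module_name() {
    local name="$1" version="${2:-}"
    if [ -n "$version" ]; then
        printf '%s/%s\n' "$name" "$version"
    else
        printf '%s\n' "$name"
    fi
}

# Load a pinned dataset module, then print what the module defines.
load_db() {
    local mod
    mod="$(db_module_name "$@")"
    # ml exists only where Lmod is initialised (for example on cluster nodes).
    if ! type ml >/dev/null 2>&1; then
        echo "ml is not available in this shell" >&2
        return 1
    fi
    ml "$mod" && ml show "$mod"
}

# Example call; replace <version> with one listed by `ml spider ncbiblastdb`:
#   load_db ncbiblastdb <version>
```

Pinning the full name/version string in a job script, rather than loading the default, is what keeps a later rerun on the same database snapshot.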

Of the public local databases in /work/db, NCBI's nr and nt datasets are the most commonly used at the GACRC. GACRC staff update these two datasets every month, along with the cdd, human_genome, mouse_genome, nrte, refseq_protein, refseq_rna, swissprot, and taxdb datasets.
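
To see what is currently hosted, the shared directory can be listed directly. The sketch below assumes only that /work/db is readable from the node you are logged in to. The helper name list_db_entries is hypothetical, and the layout below the top level is not described on this page.

```shell
#!/bin/bash
# Sketch only: list the top-level entries under a database root.
# Defaults to /work/db; pass another directory as the first argument.
list_db_entries() {
    local root="${1:-/work/db}"
    if [ ! -d "$root" ]; then
        echo "no such directory: $root" >&2
        return 1
    fi
    # One entry per line, sorted by name.
    ls -1 "$root"
}

# Example call:
#   list_db_entries
```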

Various subject datasets, e.g. pfam, bowtie indexes of human and mouse, and NCBI bacterial datasets, are hosted as well. These datasets either don’t need updating, come from a source that is rarely updated, or are not used frequently by our users. They are updated only on user request, at which point a module is created so that the user can replicate results in the future.

For datasets requested by individual lab groups, GACRC staff will assist in setting up a group-shared environment and request that group members maintain their database files there.

For datasets frequently updated by their source, the GACRC encourages users to maintain their own copies of these public databases.

The set of regularly updated datasets is open to review and can be expanded based on available GACRC resources.



Installed Bioinformatics Databases

Name                  Version            Cluster
Bacteria NCBI         12/21/2017         Sapelo2
gss                                      Sapelo2
htgs                                     Sapelo2
hg                                       Sapelo2
NCBI BLAST Database   every other month  Sapelo2
NCBI Fasta            every other month  Sapelo2
pfam                  27.0               Sapelo2
Refseq                                   Sapelo2
TaxDB                                    Sapelo2
Uniprot               06/28/2018         Sapelo2
Uniref                06/28/2018         Sapelo2
wublast                                  Sapelo2
decontaMiner          08/30/2019         Sapelo2