SRAToolKit-Sapelo2
Revision as of 19:53, 25 September 2021
Category
BioInformatics
Program On
Sapelo2
Version
2.9.6-1, 2.10.8, 2.11.1
Author / Distributor
Please see https://github.com/ncbi/sra-tools
Description
The SRA Toolkit from NCBI is a collection of tools and libraries for using data in the INSDC Sequence Read Archives. The Sequence Read Archives (SRA) store raw sequence data from "next-generation" sequencing technologies including Illumina, 454, IonTorrent, Complete Genomics, PacBio, and Oxford Nanopore. In addition to raw sequence data, SRA now stores alignment information in the form of read placements on a reference sequence. The SRA Toolkit includes the following tools:
fastq-dump: Convert SRA data into fastq format
prefetch: Allows command-line downloading of SRA, dbGaP, and ADSP data
sam-dump: Convert SRA data to sam format
sra-pileup: Generate pileup statistics on aligned SRA data
vdb-config: Display and modify VDB configuration information
vdb-decrypt: Decrypt non-SRA dbGaP data ("phenotype data")
abi-dump: Convert SRA data into ABI format (csfasta / qual)
illumina-dump: Convert SRA data into Illumina native formats (qseq, etc.)
sff-dump: Convert SRA data to sff format
sra-stat: Generate statistics about SRA data (quality distribution, etc.)
vdb-dump: Output the native VDB format of SRA data.
vdb-encrypt: Encrypt non-SRA dbGaP data ("phenotype data")
vdb-validate: Validate the integrity of downloaded SRA data
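As a rough illustration of how a few of these tools fit together, the sketch below checks a Run with vdb-validate and then converts it to fastq with fastq-dump. The function name run_to_fastq is hypothetical, and the sketch assumes the SRA Toolkit is already on your PATH and that the Run has been downloaded or can be reached remotely; adjust the options to your data.

```shell
# run_to_fastq RUN
# Hypothetical helper: validate a Run, then convert it to gzipped fastq files.
# Assumes vdb-validate and fastq-dump from the SRA Toolkit are on PATH.
run_to_fastq() {
    for tool in vdb-validate fastq-dump; do
        # Stop early with a clear message if a tool is missing.
        command -v "$tool" >/dev/null 2>&1 || {
            echo "$tool not found on PATH; load the SRA Toolkit first" >&2
            return 127
        }
    done
    # --split-files writes paired reads to separate files; --gzip compresses the output.
    vdb-validate "$1" && fastq-dump --split-files --gzip "$1"
}

# Example call (not executed here):
#   run_to_fastq SRR390728
```

Running the validation first means a corrupted or incomplete download is reported before any conversion time is spent.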
Downloading SRA Data
You can download SRA data to a local directory with the prefetch tool. This program downloads Runs (sequence files in the compressed SRA format) and all additional data needed to convert a Run from the SRA format to a more commonly used format. To find a dataset, use the search bar at the top of the SRA homepage, https://www.ncbi.nlm.nih.gov/sra. Once you find a dataset you would like to download, look for its "Run number" in the table towards the bottom of the webpage for that dataset. Then create the folder where prefetch will deposit your files. This folder must be empty. Next, run the command:
vdb-config --interactive
This opens a screen in which you operate the buttons by pressing the letter highlighted in red, or by pressing the Tab key until the button you want is reached and then pressing the Space or Enter key. Make sure there is an X next to the "Enable Remote Access" option on the MAIN tab and an X next to the "enable local file-caching" option on the CACHE tab. Then set the "location of user-repository" to the empty folder you created. In the example screenshot, the data will be downloaded to /home/keeko/prefetchData.
Then press "s", or navigate to the save button and press Enter, to save. Press "x", or navigate to the exit button and press Enter, to exit. You can now start the download by running the prefetch command followed by the Run number. For example, the following downloads the dataset SRR390728:
prefetch SRR390728
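The steps above (an empty download folder, a Run number, then prefetch) can be wrapped in a small script. This is only a sketch: the function names are hypothetical, the accession check assumes Run numbers start with SRR, ERR, or DRR followed by digits, and the folder you pass must be the same one you set as the "location of user-repository" in vdb-config.

```shell
# Succeed only if the argument looks like a Run accession
# (assumption: SRR/ERR/DRR followed by digits only).
is_run_accession() {
    case "$1" in
        [SED]RR*[!0-9]*|[SED]RR) return 1 ;;  # non-digit after the prefix, or no digits at all
        [SED]RR[0-9]*) return 0 ;;
        *) return 1 ;;
    esac
}

# Create the folder if needed; succeed only if it is empty.
make_empty_dir() {
    mkdir -p "$1" || return 1
    [ -z "$(ls -A "$1")" ]
}

# download_run RUN DIR
# Hypothetical wrapper: check the inputs, then call prefetch on the Run.
download_run() {
    is_run_accession "$1" || { echo "not a Run accession: $1" >&2; return 2; }
    make_empty_dir "$2" || { echo "folder is not empty: $2" >&2; return 3; }
    command -v prefetch >/dev/null 2>&1 || {
        echo "prefetch not found on PATH; load the SRA Toolkit first" >&2
        return 4
    }
    prefetch "$1"
}

# Example call (not executed here):
#   download_run SRR390728 "$HOME/prefetchData"
```

Checking the accession first catches simple mistakes, such as passing a study or sample identifier instead of a Run number, before any network transfer starts.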
For more information about the prefetch command, refer to the documentation: https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=toolkit_doc&f=prefetch
Running Program
Also refer to Running Jobs on Sapelo2
For more information on Environment Modules on Sapelo, please see the Lmod page.
Documentation
Please see https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=toolkit_doc for the documentation of each tool.
Installation
System
64-bit Linux