PASA-Sapelo2
Latest revision as of 11:53, 16 June 2025
Category
Bioinformatics
Program On
Sapelo2
Version
2.5.3
Author / Distributor
https://github.com/PASApipeline/PASApipeline
https://github.com/PASApipeline/PASApipeline/wiki
Description
"PASA, acronym for Program to Assemble Spliced Alignments (and pronounced 'pass-uh'), is a eukaryotic genome annotation tool that exploits spliced alignments of expressed transcript sequences to automatically model gene structures, and to maintain gene structure annotation consistent with the most recently available experimental sequence data. PASA also identifies and classifies all splicing variations supported by the transcript alignments."
More details are at https://github.com/PASApipeline/PASApipeline/wiki
Running Program
- Version 2.5.3 with MySQL support is installed as an Apptainer container in /apps/singularity-images/pasa-2.5.3-MySQL/. The image name is pasa-2.5.3-mysql-production.sif.
To use this version of PASA in a batch job, run the following setup steps in your job working directory before submitting the job:
mkdir ./workDir
cd workDir
cp -r /apps/singularity-images/pasa-2.5.3-MySQL/{1_init.sh,2_create_user_and_db.sh,3_cleanup.sh,pasa_conf,sub.sh,sample_data} .
Note:
Running the above cp command copies the following setup files and folders to your current working directory:
- Three MySQL config scripts: 1_init.sh, 2_create_user_and_db.sh, and 3_cleanup.sh
- PASA config folder: pasa_conf/
- Sample batch job submission script: sub.sh
- Sample data folder provided by PASA: sample_data/
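As a quick sanity check before submitting, the list above can be verified with a few lines of shell. This is a minimal sketch: check_pasa_setup is a hypothetical helper name and is not part of the PASA container or of Sapelo2; it only tests that the six copied items exist.

```shell
#!/bin/bash
# Hypothetical helper (not shipped with PASA): report any setup file or
# folder from the list above that is missing in the given directory.
check_pasa_setup() {
    local dir="${1:-.}"
    local missing=0
    local item
    for item in 1_init.sh 2_create_user_and_db.sh 3_cleanup.sh \
                pasa_conf sub.sh sample_data; do
        if [ ! -e "$dir/$item" ]; then
            echo "missing: $item"
            missing=1
        fi
    done
    # Return 0 only when every item was found.
    return "$missing"
}
```

From inside workDir, running `check_pasa_setup . && echo "setup complete"` prints the confirmation only when nothing is missing.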
Please note:
- PASA requires an environment variable called PASAHOME. In this container it is defined as /opt/pasa/opt/pasa-2.5.3/.
- PASA's Perl scripts (for example, build_comprehensive_transcriptome.dbi and Load_Current_Gene_Annotations.dbi) are installed in $PASAHOME/scripts/.
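To see which scripts are available, one option is to list $PASAHOME/scripts/ from inside the container. The sketch below makes two assumptions: it runs against the image file named above rather than the running pasa-mysql instance, and it activates /opt/pasa first, in the same way as sub.sh, so that PASAHOME is set. list_pasa_scripts is a hypothetical helper name.

```shell
#!/bin/bash
# Image directory and file name as given on this page.
PASA_IMAGE=/apps/singularity-images/pasa-2.5.3-MySQL/pasa-2.5.3-mysql-production.sif

# Hypothetical helper: list the scripts under $PASAHOME/scripts in the container.
# The single quotes keep $PASAHOME from being expanded by the host shell,
# so it is resolved inside the container instead.
list_pasa_scripts() {
    apptainer exec "$PASA_IMAGE" /bin/bash -c \
        'source activate /opt/pasa && ls "$PASAHOME/scripts"'
}
```

Calling `list_pasa_scripts | grep dbi` on a node where apptainer is available would narrow the listing to the .dbi scripts mentioned above.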
Below is the sample batch job submission script (sub.sh) to run PASA (alignment assembly pipeline) in a batch job using 20 CPU cores on the batch partition:
#!/bin/bash
#SBATCH --job-name=pasa-mysql
#SBATCH --partition=batch
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=20
#SBATCH --mem=80gb
#SBATCH --time=48:00:00
#SBATCH --output=log.%j.out
#SBATCH --mail-type=ALL
#SBATCH --mail-user=username@uga.edu
cd $SLURM_SUBMIT_DIR
# Initialize and start MySQL from inside of the container
./1_init.sh && ./2_create_user_and_db.sh
# unzip the sample input data genome_sample.fasta.gz
gunzip sample_data/genome_sample.fasta.gz
# Run PASA pipeline with genome_sample.fasta
apptainer exec instance://pasa-mysql /bin/bash -c "source activate /opt/pasa && \$PASAHOME/Launch_PASA_pipeline.pl \
--CPU 20 \
--config ./sample_data/mysql.confs/alignAssembly.config --create --run \
--ALIGNER gmap --genome ./sample_data/genome_sample.fasta --transcripts ./sample_data/all_transcripts.fasta.clean"
# Clean up on the compute node and shut down MySQL
./3_cleanup.sh
Note:
- In a real submission script, review the Slurm header values and replace them as needed, in particular the email address in --mail-user and the CPU count, which should agree with the value passed to --CPU.
- In a real submission script, the PASA options shown after Launch_PASA_pipeline.pl (the --CPU, --config, and --ALIGNER lines) can be replaced by your own PASA command lines.
Please refer to Running Jobs on Sapelo2 for more information about running jobs on the Sapelo2 cluster.
Here is an example of a job submission command:
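When adapting sub.sh, two small changes may make it easier to reuse. The sketch below is not part of the provided script. pasa_cpus and unpack_genome are hypothetical helper names; SLURM_CPUS_PER_TASK is the variable Slurm sets from --cpus-per-task.

```shell
#!/bin/bash
# Hypothetical helper: report the core count Slurm granted to this task,
# falling back to 1 when run outside a Slurm job.
pasa_cpus() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

# Hypothetical helper: decompress the genome only when the uncompressed
# file is absent, so resubmitting the job does not fail at the gunzip step.
unpack_genome() {
    local fasta="$1"
    if [ ! -f "$fasta" ] && [ -f "$fasta.gz" ]; then
        gunzip "$fasta.gz"
    fi
}

# Possible use inside sub.sh (shown as comments, not executed here):
#   unpack_genome sample_data/genome_sample.fasta
#   ... Launch_PASA_pipeline.pl --CPU $(pasa_cpus) ...
```

Because the PASA command in sub.sh is wrapped in double quotes, an unescaped `$(pasa_cpus)` is expanded by the job shell before the container starts, so --CPU follows --cpus-per-task without a second edit.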
sbatch ./sub.sh
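If the job ID is needed afterwards, for example to find the log file written by --output=log.%j.out, it can be captured at submission time. A minimal sketch, assuming the standard sbatch --parsable option; submit_pasa_job is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: submit the script and print only the Slurm job ID.
# With --parsable, sbatch prints "jobid" or "jobid;cluster" and nothing else.
submit_pasa_job() {
    local script="${1:-./sub.sh}"
    local out
    out=$(sbatch --parsable "$script") || return 1
    # Drop the optional ";cluster" suffix.
    echo "${out%%;*}"
}
```

For example, `jobid=$(submit_pasa_job ./sub.sh)` followed by `squeue -j "$jobid"` shows the job state, and the job output appears in log.$jobid.out in the submission directory.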
Documentation
https://github.com/PASApipeline/PASApipeline
https://github.com/PASApipeline/PASApipeline/wiki
Installation
Source code is obtained from https://github.com/PASApipeline/PASApipeline
System
64-bit Linux