AlphaFold-Sapelo2: Difference between revisions

From Research Computing Center Wiki
Revision as of 20:22, 28 July 2021


Category

Bioinformatics

Program On

Sapelo2

Version

2.0.0

Author / Distributor

Please see https://github.com/deepmind/alphafold

Description

From https://github.com/deepmind/alphafold: "This package provides an implementation of the inference pipeline of AlphaFold v2.0. This is a completely new model that was entered in CASP14 and published in Nature."

Running Program

Also refer to Running Jobs on Sapelo2

For more information on Environment Modules on Sapelo2 please see the Lmod page.

  • Version 2.0.0, installed as a conda environment in /apps/gb/AlphaFold/2.0.0/

To use this version of AlphaFold, please first load the module with

ml AlphaFold/2.0.0_conda

Once you load the module, an environment variable called EBROOTALPHAFOLD is exported. It stores the AlphaFold installation path on the cluster, i.e., /apps/gb/AlphaFold/2.0.0. The bash script run_alphafold.sh is installed in $EBROOTALPHAFOLD/alphafold, and the 2.2 TB of database files are in /db/AlphaFold (this is the directory to pass to the -d option of run_alphafold.sh).
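As a quick sanity check after loading the module, the sketch below confirms that EBROOTALPHAFOLD is set and that run_alphafold.sh and the database directory exist where the paragraph above says they are. The function name check_alphafold_env is hypothetical and not part of the AlphaFold installation; the database directory defaults to /db/AlphaFold as documented here.

```shell
# Hypothetical helper (not shipped with the AlphaFold module): verify the
# locations described above before submitting a job.
check_alphafold_env() {
    # Database directory; defaults to the path documented on this page.
    af_db_dir="${1:-/db/AlphaFold}"

    if [ -z "${EBROOTALPHAFOLD:-}" ]; then
        echo "EBROOTALPHAFOLD is not set; load the module first (ml AlphaFold/2.0.0_conda)" >&2
        return 1
    fi
    if [ ! -f "$EBROOTALPHAFOLD/alphafold/run_alphafold.sh" ]; then
        echo "run_alphafold.sh not found under $EBROOTALPHAFOLD/alphafold" >&2
        return 1
    fi
    if [ ! -d "$af_db_dir" ]; then
        echo "Database directory $af_db_dir not found" >&2
        return 1
    fi
    echo "AlphaFold install:  $EBROOTALPHAFOLD"
    echo "Database directory: $af_db_dir"
}
```

After running ml AlphaFold/2.0.0_conda, calling check_alphafold_env with no arguments should print both paths, or an explanatory message on standard error if something is missing.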

Sample job submission script (sub.sh) to run run_alphafold.sh in a batch job (without GPU):

#!/bin/bash
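#SBATCH --job-name=alphafoldjobname     # Job name
#SBATCH --partition=batch               # Partition (queue) name
#SBATCH --ntasks=1                      # Run a single task
#SBATCH --cpus-per-task=4               # Number of CPU cores per task
#SBATCH --mem=20gb                      # Job memory request
#SBATCH --time=120:00:00                # Time limit (hours:minutes:seconds)
#SBATCH --output=%x.%j.out              # Standard output log (%x = job name, %j = job ID)
#SBATCH --error=%x.%j.err               # Standard error log
#SBATCH --mail-user=username@uga.edu    # Where to send mail
#SBATCH --mail-type=ALL                 # Mail events to report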

cd $SLURM_SUBMIT_DIR

ml AlphaFold/2.0.0_conda

bash $EBROOTALPHAFOLD/alphafold/run_alphafold.sh -d /db/AlphaFold [options]

An example that supplies all of the required options is

bash $EBROOTALPHAFOLD/alphafold/run_alphafold.sh -d /db/AlphaFold -o ./test/ -m model_1 -f ./query.fasta -t 2020-05-14


Sample job submission script (sub.sh) to run run_alphafold.sh in a batch job (with GPU):

#!/bin/bash
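#SBATCH --job-name=alphafoldjobname     # Job name
#SBATCH --partition=gpu_p               # Partition (queue) name
#SBATCH --ntasks=1                      # Run a single task
#SBATCH --cpus-per-task=4               # Number of CPU cores per task
#SBATCH --gres=gpu:1                    # Request one GPU device
#SBATCH --mem=40gb                      # Job memory request
#SBATCH --time=120:00:00                # Time limit (hours:minutes:seconds)
#SBATCH --output=%x.%j.out              # Standard output log (%x = job name, %j = job ID)
#SBATCH --error=%x.%j.err               # Standard error log
#SBATCH --mail-user=username@uga.edu    # Where to send mail
#SBATCH --mail-type=ALL                 # Mail events to report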

cd $SLURM_SUBMIT_DIR

ml AlphaFold/2.0.0_conda

bash $EBROOTALPHAFOLD/alphafold/run_alphafold.sh -d /db/AlphaFold [options]

where $EBROOTALPHAFOLD is the environment variable that stores the AlphaFold installation path on the cluster, and [options] needs to be replaced by the options and arguments you want to use. Other parameters of the job, such as the maximum wall clock time, the maximum memory, the number of cores per node, and the job name, need to be modified appropriately as well.
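When replacing [options], it can help to print the full command line before putting it in sub.sh. The following is a minimal sketch of such a dry run. The function name alphafold_cmd is hypothetical; the default model name and template date are simply the values from the example above, and naming the output directory after the FASTA file is an assumption made for illustration.

```shell
# Hypothetical dry-run helper: print (do not run) the run_alphafold.sh command
# for one FASTA file. Assumes EBROOTALPHAFOLD is set by the module.
alphafold_cmd() {
    af_fasta="$1"
    af_models="${2:-model_1}"      # default taken from the example above
    af_date="${3:-2020-05-14}"     # default taken from the example above

    # Output directory named after the FASTA file, e.g. ./query for query.fasta.
    af_name=$(basename "$af_fasta")
    af_out="./${af_name%.*}"

    # The printed line is for inspection only; paths containing spaces would
    # need quoting before being pasted into a script.
    printf 'bash %s/alphafold/run_alphafold.sh -d /db/AlphaFold -o %s -m %s -f %s -t %s\n' \
        "$EBROOTALPHAFOLD" "$af_out" "$af_models" "$af_fasta" "$af_date"
}
```

For example, with the module loaded, alphafold_cmd ./query.fasta prints a command that writes its results to ./query, which can then be copied into the job submission script.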

Example of job submission

sbatch sub.sh 

Documentation

Details and references are at https://github.com/deepmind/alphafold. The options accepted by run_alphafold.sh can be listed with the -h option:

ml AlphaFold/2.0.0_conda

bash $EBROOTALPHAFOLD/alphafold/run_alphafold.sh -h

Usage: /apps/gb/AlphaFold/2.0.0_conda/alphafold/run_alphafold.sh <OPTIONS>
Required Parameters:
-d <data_dir>     Path to directory of supporting data
-o <output_dir>   Path to a directory that will store the results.
-m <model_names>  Names of models to use (a comma separated list)
-f <fasta_path>   Path to a FASTA file containing one sequence
-t <max_template_date> Maximum template release date to consider (ISO-8601 format - i.e. YYYY-MM-DD). Important if folding historical test sets
Optional Parameters:
-b <benchmark>    Run multiple JAX model evaluations to obtain a timing that excludes the compilation time, which should be more indicative of the time required for inferencing many proteins (default: 'False')
-g <use_gpu>      Enable NVIDIA runtime to run with GPUs (default: 'True')
-a <gpu_devices>  Comma separated list of devices to pass to 'CUDA_VISIBLE_DEVICES' (default: 'all')
-p <preset>       Choose preset model configuration - no ensembling (full_dbs) or 8 model ensemblings (casp14) (default: 'full_dbs')
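The usage message above says that -f expects a FASTA file containing one sequence and that -t expects a date in YYYY-MM-DD form. A minimal pre-flight sketch that checks both before a long job is submitted might look like the following. All three function names are hypothetical and not part of run_alphafold.sh, and the date check looks at the shape of the string only, not at whether it is a valid calendar date.

```shell
# Hypothetical pre-flight checks (not part of run_alphafold.sh).

# Print the number of FASTA records, i.e. header lines starting with ">".
count_fasta_records() {
    [ -r "$1" ] || { echo "cannot read $1" >&2; return 2; }
    # grep -c exits non-zero when the count is 0, so mask its exit status.
    grep -c '^>' "$1" || true
}

# Succeed only if the argument has the shape YYYY-MM-DD (shape check only).
is_iso_date() {
    case "$1" in
        [0-9][0-9][0-9][0-9]-[0-1][0-9]-[0-3][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

# Check the values intended for -f and -t; messages go to standard error.
check_alphafold_inputs() {
    af_fasta="$1"
    af_date="$2"
    af_count=$(count_fasta_records "$af_fasta") || return 1
    if [ "$af_count" -ne 1 ]; then
        echo "$af_fasta has $af_count sequences; -f expects exactly one" >&2
        return 1
    fi
    if ! is_iso_date "$af_date"; then
        echo "-t expects YYYY-MM-DD, got: $af_date" >&2
        return 1
    fi
}
```

In a job script this could be called as check_alphafold_inputs ./query.fasta 2020-05-14 || exit 1 just before the run_alphafold.sh line, so that a malformed input fails immediately rather than partway through the run.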



Installation

Installed using a conda environment, following the steps in the Dockerfile available at https://github.com/deepmind/alphafold. The run_alphafold.sh bash script was obtained from https://github.com/kalininalab/alphafold_non_docker, and some documentation for this script is available at that URL.

The database files are installed in /db/AlphaFold/

System

64-bit Linux