AlphaFold-Sapelo2
Revision as of 20:52, 10 October 2021
Category
Bioinformatics
Program On
Sapelo2
Version
2.0.0, 2.0.1
Author / Distributor
Please see https://github.com/deepmind/alphafold
Description
From https://github.com/deepmind/alphafold: "This package provides an implementation of the inference pipeline of AlphaFold v2.0. This is a completely new model that was entered in CASP14 and published in Nature."
Running Program
Also refer to Running Jobs on Sapelo2
For more information on Environment Modules on Sapelo2 please see the Lmod page.
- Version 2.0.0, installed as a conda environment in /apps/gb/AlphaFold/2.0.0/
To use this version of AlphaFold, please first load the module with
ml AlphaFold/2.0.0_conda
Once you load the module, an environment variable called EBROOTALPHAFOLD is exported. It stores the AlphaFold installation path on the cluster, i.e., /apps/gb/AlphaFold/2.0.0. The bash script run_alphafold.sh is installed in $EBROOTALPHAFOLD/alphafold, and the 2.2TB of database files are in /apps/db/AlphaFold (this is the directory to pass to the -d option of run_alphafold.sh).
Note: This program does not work on the nodes with K20Xm GPU devices, because the CPUs on those nodes do not support AVX. If you run this program on the gpu_p partition, please request a K40 or a P100 GPU device.
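One way to confirm the AVX requirement from inside a job is to inspect the CPU flags of the allocated node. The following is a minimal sketch, assuming a Linux node that exposes /proc/cpuinfo; the function name has_avx is hypothetical and is not part of AlphaFold or the cluster tools.

```shell
# Succeed if the CPU flags in the given cpuinfo file include AVX.
# The file argument defaults to /proc/cpuinfo (an assumption: Linux only).
has_avx() {
    # -w matches "avx" as a whole word, so "avx2" alone does not count.
    grep -qw 'avx' "${1:-/proc/cpuinfo}"
}

# Example use near the top of a job script:
if has_avx; then
    echo "AVX is available on $(hostname)"
else
    echo "AVX is not available on $(hostname)" >&2
fi
```

Running this check before the alphafold command makes a wrong node assignment visible in the job's error file.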
- Version 2.0.1, installed with EasyBuild in /apps/eb/AlphaFold/2.0.1-fosscuda-2020b/
To use this version of AlphaFold, please first load the module with
ml AlphaFold/2.0.1-fosscuda-2020b
Once you load the module, an environment variable called EBROOTALPHAFOLD is exported. It stores the AlphaFold installation path on the cluster, i.e., /apps/eb/AlphaFold/2.0.1-fosscuda-2020b. The Python script run_alphafold.py is installed in $EBROOTALPHAFOLD/bin, and a symbolic link called alphafold points to it; use alphafold to run the program. The 2.2TB of database files are in /apps/db/AlphaFold. You can export the environment variable ALPHAFOLD_DATA_DIR to set the location of the database files. In bash, use
export ALPHAFOLD_DATA_DIR=/apps/db/AlphaFold
Note: This program does not work on the nodes with K20Xm GPU devices, because the CPUs on those nodes do not support AVX. If you run this program on the gpu_p partition, please request a K40 or a P100 GPU device.
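Because version 2.0.1 reads the database location from ALPHAFOLD_DATA_DIR, a job script can verify that the variable is set and points to an existing directory before starting a long run. This is a sketch only; the helper name check_data_dir is hypothetical.

```shell
# Fail early if the database directory is unset or missing.
# Uses the first argument if given, otherwise $ALPHAFOLD_DATA_DIR.
check_data_dir() {
    set -- "${1:-${ALPHAFOLD_DATA_DIR:-}}"
    if [ -z "$1" ]; then
        echo "ALPHAFOLD_DATA_DIR is not set" >&2
        return 1
    fi
    if [ ! -d "$1" ]; then
        echo "database directory not found: $1" >&2
        return 1
    fi
}

# Example use in a job script, after the export line:
#   export ALPHAFOLD_DATA_DIR=/apps/db/AlphaFold
#   check_data_dir || exit 1
```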
Sample job submission script (sub.sh) to run AlphaFold 2.0.0 using run_alphafold.sh in a batch job (without GPU):
#!/bin/bash
#SBATCH --job-name=alphafoldjobname
#SBATCH --partition=batch
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=20gb
#SBATCH --time=120:00:00
#SBATCH --output=%x.%j.out
#SBATCH --error=%x.%j.err
#SBATCH --mail-user=username@uga.edu
#SBATCH --mail-type=ALL
cd $SLURM_SUBMIT_DIR
ml AlphaFold/2.0.0_conda
bash $EBROOTALPHAFOLD/alphafold/run_alphafold.sh -d /apps/db/AlphaFold [options]
An example command with all of the required options is
bash $EBROOTALPHAFOLD/alphafold/run_alphafold.sh -d /apps/db/AlphaFold -o ./test/ -m model_1 -f ./query.fasta -t 2020-05-14
Sample job submission script (sub.sh) to run AlphaFold 2.0.0 using run_alphafold.sh in a batch job (with GPU):
#!/bin/bash
#SBATCH --job-name=alphafoldjobname
#SBATCH --partition=gpu_p
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --gres=gpu:K40:1
#SBATCH --mem=40gb
#SBATCH --time=120:00:00
#SBATCH --output=%x.%j.out
#SBATCH --error=%x.%j.err
#SBATCH --mail-user=username@uga.edu
#SBATCH --mail-type=ALL
cd $SLURM_SUBMIT_DIR
ml AlphaFold/2.0.0_conda
bash $EBROOTALPHAFOLD/alphafold/run_alphafold.sh -d /apps/db/AlphaFold [options]
where $EBROOTALPHAFOLD is the environment variable that stores the AlphaFold installation path on the cluster, and [options] needs to be replaced by the options (command and arguments) you want to use. Other parameters of the job, such as the maximum wall clock time, maximum memory, the number of cores per node, and the job name, need to be modified appropriately as well. If you submit the job to the gpu_p partition, you can also request a P100 device, using #SBATCH --gres=gpu:P100:1
Sample job submission script (sub.sh) to run AlphaFold 2.0.1 in a batch job (with GPU):
#!/bin/bash
#SBATCH --job-name=alphafoldjobname
#SBATCH --partition=gpu_p
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --gres=gpu:P100:1
#SBATCH --mem=40gb
#SBATCH --time=120:00:00
#SBATCH --output=%x.%j.out
#SBATCH --error=%x.%j.err
#SBATCH --mail-user=username@uga.edu
#SBATCH --mail-type=ALL
cd $SLURM_SUBMIT_DIR
ml AlphaFold/2.0.1-fosscuda-2020b
export ALPHAFOLD_DATA_DIR=/apps/db/AlphaFold
alphafold [options]
where [options] needs to be replaced by the options (command and arguments) you want to use. Other parameters of the job, such as the maximum wall clock time, maximum memory, the number of cores per node, and the job name, need to be modified appropriately as well.
Example of job submission
sbatch sub.sh
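On success, sbatch prints a line of the form "Submitted batch job <id>". The sketch below, which assumes that standard message format, extracts the job ID so it can be passed to squeue; the helper name job_id_from_sbatch is hypothetical.

```shell
# Print the job ID from a "Submitted batch job <id>" line.
# Returns non-zero if the line does not have that form.
job_id_from_sbatch() {
    printf '%s\n' "$1" \
        | sed -n 's/^Submitted batch job \([0-9][0-9]*\).*$/\1/p' \
        | grep .
}

# Example use on a submit node:
#   id=$(job_id_from_sbatch "$(sbatch sub.sh)") && squeue -j "$id"
```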
Documentation
Details and references are at https://github.com/deepmind/alphafold.
ml AlphaFold/2.0.0_conda
bash $EBROOTALPHAFOLD/alphafold/run_alphafold.sh -h
Usage: /apps/gb/AlphaFold/2.0.0_conda/alphafold/run_alphafold.sh <OPTIONS>
Required Parameters:
-d <data_dir> Path to directory of supporting data
-o <output_dir> Path to a directory that will store the results.
-m <model_names> Names of models to use (a comma separated list)
-f <fasta_path> Path to a FASTA file containing one sequence
-t <max_template_date> Maximum template release date to consider (ISO-8601 format - i.e. YYYY-MM-DD). Important if folding historical test sets
Optional Parameters:
-b <benchmark> Run multiple JAX model evaluations to obtain a timing that excludes the compilation time, which should be more indicative of the time required for inferencing many proteins (default: 'False')
-g <use_gpu> Enable NVIDIA runtime to run with GPUs (default: 'True')
-a <gpu_devices> Comma separated list of devices to pass to 'CUDA_VISIBLE_DEVICES' (default: 'all')
-p <preset> Choose preset model configuration - no ensembling (full_dbs) or 8 model ensemblings (casp14) (default: 'full_dbs')
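The usage text above asks for a FASTA file containing one sequence and a template date in YYYY-MM-DD format. As a rough sketch, both can be checked before a job is submitted. The helper names are hypothetical, and the sequence count assumes that each record's header line starts with ">".

```shell
# Count FASTA records by counting header lines that start with ">".
count_fasta_records() {
    grep -c '^>' "$1" || true
}

# Check only the YYYY-MM-DD shape, not calendar validity.
is_iso_date() {
    printf '%s\n' "$1" | grep -Eq '^[0-9]{4}-[0-9]{2}-[0-9]{2}$'
}

# check_inputs <fasta_path> <max_template_date>
check_inputs() {
    if [ ! -r "$1" ]; then
        echo "cannot read FASTA file: $1" >&2
        return 1
    fi
    if [ "$(count_fasta_records "$1")" -ne 1 ]; then
        echo "expected exactly one sequence in $1" >&2
        return 1
    fi
    if ! is_iso_date "$2"; then
        echo "date must be in YYYY-MM-DD format: $2" >&2
        return 1
    fi
}

# Example use before calling run_alphafold.sh:
#   check_inputs ./query.fasta 2020-05-14 || exit 1
```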
Installation
- Version 2.0.0: Installed using a conda environment following the steps in the Dockerfile available at https://github.com/deepmind/alphafold. The run_alphafold.sh bash script was obtained from https://github.com/kalininalab/alphafold_non_docker, and some documentation for this script is available at that URL.
- Version 2.0.1: Installed using EasyBuild.
- The database files are installed in /apps/db/AlphaFold/
System
64-bit Linux