CryoSPARC-Sapelo2
This page is under construction by the GACRC team (2023-09-26).
Category
Engineering
Program On
Sapelo2
Version
3.3.1
Author / Distributor
See https://guide.cryosparc.com/
Description
"CryoSPARC (Cryo-EM Single Particle Ab-Initio Reconstruction and Classification) is a state of the art HPC software solution for complete processing of single-particle cryo-electron microscopy (cryo-EM) data. CryoSPARC is useful for solving cryo-EM structures of membrane proteins, viruses, complexes, flexible molecules, small particles, phase plate data and negative stain data." For more information, please see https://guide.cryosparc.com/.
NOTE: Users are required to be added to the GACRC cryosparc group before they can run this software on Sapelo2. Please fill out the GACRC General Support form to request access. We will reach out to you after we receive your request.
Configurations
Master node VM:
- Host name: ss-cryo.gacrc.uga.edu
- Intel Xeon processors (8 cores) and 24GB of RAM
- MongoDB is installed and runs from the master node
Worker nodes:
- Two NVIDIA Tesla K40m nodes, each with Intel Xeon processors (16 cores), 128GB of RAM, and 8 NVIDIA K40m GPU cards.
- cryoSPARC recommends using an SSD for caching particle data; /lscratch/gacrc-cryo is set up on the worker nodes for this purpose.
- The amount of space that cryoSPARC can use in /lscratch/gacrc-cryo is capped at 100GB.
cryoSPARC group: cryosparc
cryoSPARC service account: gacrc-cryo
- gacrc-cryo is the service account that will run cryoSPARC workflow jobs for all cryoSPARC users.
- Some tasks can only be handled by gacrc-cryo, such as starting or stopping cryosparcm on the master node, user management, and connecting or updating worker nodes to the master.
- Regular cryoSPARC users can run cryosparcm on the master node to check cryoSPARC status, using cryosparcm status and cryosparcm checkdb (see the example below).
cryoSPARC project space: /scratch/gacrc-cryo
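For example, a regular cryoSPARC user who is logged in to the master node can check the service status and database health with the two read-only commands below; starting or stopping the service (cryosparcm start / cryosparcm stop) is reserved for gacrc-cryo:
cryosparcm status
cryosparcm checkdb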
How to run cryoSPARC from Sapelo2
User login
You need to establish an SSH tunnel to forward port 39000 from the master node to your local computer.
If you are using a Linux or Apple desktop or laptop, you can use the following command in a terminal to establish the SSH tunnel:
ssh -N -L 39000:128.192.75.59:39000 username@ss-cryo.gacrc.uga.edu
If you are using a Windows desktop or laptop, please download the plink program to use in place of the ssh client:
plink -ssh -N -L 39000:128.192.75.59:39000 username@ss-cryo.gacrc.uga.edu
Note: Please put plink.exe in the directory in which your command window is open.
Unless you have an SSH public key configured, you will be prompted for your MyID password and for Archpass Duo authentication. Once authentication is established, the session will appear to hang; leave it running, as you are now ready to access the cryoSPARC User Interface.
Once you have established the SSH tunnel by running the above command, you can open a browser (e.g., Chrome) on the local machine and navigate to http://localhost:39000. The cryoSPARC User Interface should load and present the cryoSPARC login page.
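If you connect frequently, you can optionally store the tunnel settings in your SSH client configuration instead of retyping the full command. The snippet below is a minimal sketch of a ~/.ssh/config entry for Linux/macOS; the host alias cryosparc-tunnel is an arbitrary name and MyID is a placeholder for your own username:
Host cryosparc-tunnel
    HostName ss-cryo.gacrc.uga.edu
    User MyID
    LocalForward 39000 128.192.75.59:39000
With this entry in place, running ssh -N cryosparc-tunnel establishes the same tunnel as the full command shown above.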
Run cryoSPARC workflow jobs
Project space selection
A project in cryoSPARC is a high-level container corresponding to a directory on the file system. Each project is entirely contained within its project directory, which stores all of the project's jobs along with their intermediate and output data.
- When you start a new project in the cryoSPARC GUI, please select and use /scratch/gacrc-cryo as the cryoSPARC project space.
- This folder is owned by gacrc-cryo. Regular cryoSPARC users have access and read permissions, which allow them to browse files in this folder and copy files from it to their own storage spaces on Sapelo2.
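For example, to copy the outputs of a finished job from the project space to your own scratch space (the project directory P3, the job directory J42, and the destination path below are hypothetical placeholders; substitute your actual project/job directory names and your own MyID):
cp -r /scratch/gacrc-cryo/P3/J42 /scratch/MyID/cryosparc_results/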
Run job on the master node
cryoSPARC will use the master node to run some types of workflow jobs, for example "Import Movies", "Inspect Picks", and the interactive job "Select 2D Classes". When a job is created, if cryoSPARC will use the master node to run it, you will be notified in the cryoSPARC GUI.
Run job using "Lane Sapelo2 (cluster)" (highly recommended)
- In cryoSPARC, queue a job to "Lane Sapelo2 (cluster)"; the job will be dispatched to a worker node via Slurm. Please note that gacrc-cryo, rather than your own Sapelo2 user account, is the account that owns and runs the job. We highly recommend using this method to run cryoSPARC workflow jobs on Sapelo2.
- cryoSPARC decides how many CPU cores and how much memory to use for a workflow job, depending on the job type and your data size. Currently, cryoSPARC is configured to use up to 4 CPU cores and 20GB of memory on each worker node.
- If the job needs to run on GPU devices, cryoSPARC will queue the job with a default number of GPU devices, for example 1 or 4. You can change this number yourself in the cryoSPARC GUI when you create the job. Please note that the maximum number of GPU devices installed on a worker node is 8.
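Behind the scenes, the cluster lane submits each workflow job to Slurm through a cluster submission template maintained by gacrc-cryo. The sketch below only illustrates what such a template can look like, using cryoSPARC's documented cluster-integration variables ({{ num_cpu }}, {{ num_gpu }}, {{ ram_gb }}, {{ run_cmd }}, etc.); the actual template configured on Sapelo2 may differ:
#!/usr/bin/env bash
#SBATCH --job-name=cryosparc_{{ project_uid }}_{{ job_uid }}
#SBATCH --partition=gpu_p
#SBATCH --cpus-per-task={{ num_cpu }}
#SBATCH --gres=gpu:{{ num_gpu }}
#SBATCH --mem={{ ram_gb }}G
#SBATCH --output={{ job_log_path_abs }}
{{ run_cmd }}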
Run job using "Lane default (node)"
You can select and use a worker node directly to run cryoSPARC workflow jobs. This method is convenient, for example, when you want to debug your cryoSPARC workflow. However, to use this method, you must first reserve a whole worker node via the Sapelo2 queueing system.
Please follow these steps carefully:
- From the login node, open an interactive session to reserve a worker node via Slurm: interact -p gpu_p --gres gpu:K40:8 -c 16 --mem 100gb. Please use the options "--gres gpu:K40:8 -c 16 --mem 100gb" to reserve a whole node.
- Once the interactive session is opened on a worker node, please run this command to find its short hostname: hostname -s
- In cryoSPARC, queue the job to "Lane default (node)", then click "Run on Specific GPU" to select the worker node (rb6-3 or rb6-4) and the GPU device(s) that you reserved in step 1. Please do this step carefully and do NOT select the worker node that you did not reserve in step 1. If you select GPU device(s) on the wrong node, GPU device conflict errors could occur because other users' jobs may be running on that node and using the same GPU device(s) you selected.
- Since you reserved a whole worker node, you can select and use up to 8 GPU devices.
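Putting the first two steps together, a typical reservation from the login node looks like this (your interactive session may land on either rb6-3 or rb6-4):
# From the Sapelo2 login node: reserve a whole K40 worker node
interact -p gpu_p --gres gpu:K40:8 -c 16 --mem 100gb
# Once the interactive session starts on the worker node, print its short hostname
hostname -s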
After you have killed and cleared a job, if you want to restart it in cryoSPARC, please do NOT use "Queue Job on default" listed under "ACTIONS" to restart the job. We found that "Queue Job on default" always places the job on the rb6-3 worker node by default, and we cannot change this default behavior of cryoSPARC.
Instead, go back to your current Workspace in the cryoSPARC GUI and click "Building" in the job card, then click "Queue" in the bottom-right corner of the cryoSPARC GUI to re-queue the job, as shown in Figure 1 (1 marks "Building"; 2 marks "Queue"). Then follow the instructions in Run job using "Lane Sapelo2 (cluster)" or Run job using "Lane default (node)" to rerun your job.
Documentation
About cryoSPARC: https://guide.cryosparc.com/
User Interface and Usage Guide: https://guide.cryosparc.com/processing-data/user-interface-and-usage-guide
Accessing the cryoSPARC User Interface: https://guide.cryosparc.com/setup-configuration-and-management/how-to-download-install-and-configure/accessing-cryosparc
All Job Types in cryoSPARC: https://guide.cryosparc.com/processing-data/all-job-types-in-cryosparc
Management and Monitoring: https://guide.cryosparc.com/setup-configuration-and-management/management-and-monitoring
Cluster (Slurm) integration: https://guide.cryosparc.com/setup-configuration-and-management/how-to-download-install-and-configure/downloading-and-installing-cryosparc#connect-a-cluster-to-cryosparc
Introductory Tutorial: https://guide.cryosparc.com/processing-data/cryo-em-data-processing-in-cryosparc-introductory-tutorial
Tutorials and Usage Guides: https://guide.cryosparc.com/processing-data/tutorials-and-case-studies
Installation
- The version 3.3.1 master is installed on the master node (ss-cryo.gacrc.uga.edu). The source code is downloaded to /work/cryosparc/cryosparc_master on the master node.
- The version 3.3.1 workers are installed on the two worker nodes (NVIDIA Tesla K40m GPU nodes rb6-[3-4]). The source code is downloaded to /work/cryosparc/cryosparc_worker on the master node.
System
64-bit Linux