CryoSPARC-Sapelo2

From Research Computing Center Wiki
Revision as of 15:57, 15 February 2022

Category: Engineering

Program On: Sapelo2

Version: 3.3.1

Author / Distributor: See https://guide.cryosparc.com/

Description

"CryoSPARC (Cryo-EM Single Particle Ab-Initio Reconstruction and Classification) is a state of the art HPC software solution for complete processing of single-particle cryo-electron microscopy (cryo-EM) data. CryoSPARC is useful for solving cryo-EM structures of membrane proteins, viruses, complexes, flexible molecules, small particles, phase plate data and negative stain data." For more information, please see https://guide.cryosparc.com/.

NOTE: Users must be added to the GACRC cryosparc group before they can run this software. To request access, please fill out the GACRC General Support form. We will reach out to you once we receive your request.
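Once GACRC confirms your request, you can check from a Sapelo2 login node whether your account is already in the cryosparc group. The sketch below is a hypothetical helper (in_group is not a GACRC-provided command); it only relies on the standard id utility, which reads group membership from the system's group database.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a GACRC-provided command): succeed if USER
# (default: the current account) is a member of GROUP, according to `id -nG`.
in_group() {
  local group="$1" user="${2:-$(id -un)}"
  # `id -nG` prints the user's group names separated by spaces;
  # split them one per line and look for an exact match.
  id -nG "$user" | tr ' ' '\n' | grep -qx -- "$group"
}

# Example: check membership in the cryoSPARC group named on this page.
# in_group cryosparc && echo "member of cryosparc" || echo "not yet in cryosparc"
```

The function returns 0 when the account is in the group and 1 otherwise, so it can be used directly in an if statement.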

Configurations

Master node VM:

  • Host name: ss-cryo.gacrc.uga.edu
  • Intel Xeon processors (8 cores) and 24GB of RAM
  • MongoDB is installed on, and runs from, the master node

Worker nodes:

  • Two NVIDIA Tesla K40m GPU nodes, each with Intel Xeon processors (16 cores), 128GB of RAM, and 8 NVIDIA K40m GPU cards.
  • cryoSPARC recommends using an SSD for caching particle data; /lscratch/gacrc-cryo is set up on the worker nodes for this purpose.
  • The amount of space that cryoSPARC can use in /lscratch/gacrc-cryo is capped at 100GB.

cryoSPARC group: cryosparc

cryoSPARC service account: gacrc-cryo

  • gacrc-cryo is the service account that will run cryoSPARC workflow jobs for all cryoSPARC users.
  • Some tasks can only be performed by gacrc-cryo, such as starting or stopping cryosparcm on the master node, managing users, and connecting or updating worker nodes on the master.
  • Regular cryoSPARC users can check the status of cryoSPARC by running cryosparcm status and cryosparcm checkdb on the master node.
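The two read-only checks above can be wrapped in a small function for use after logging in to ss-cryo.gacrc.uga.edu. This is a sketch only: cryosparc_health is a hypothetical name, and the fallback path to cryosparcm is an assumption based on the standard layout of a cryoSPARC master installation under /work/cryosparc/cryosparc_master (see Installation below); it may not match this site's setup.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not provided by GACRC): run the two read-only status
# checks that regular users may use on the master node ss-cryo.gacrc.uga.edu.
cryosparc_health() {
  local cm
  cm=$(command -v cryosparcm 2>/dev/null)
  # ASSUMPTION: if cryosparcm is not on PATH, try the bin/ directory of the
  # master installation listed under "Installation" on this page.
  if [ -z "$cm" ] && [ -x /work/cryosparc/cryosparc_master/bin/cryosparcm ]; then
    cm=/work/cryosparc/cryosparc_master/bin/cryosparcm
  fi
  if [ -z "$cm" ]; then
    echo "cryosparcm not found; log in to ss-cryo.gacrc.uga.edu first" >&2
    return 127
  fi
  # Only run checkdb if status itself succeeded.
  "$cm" status && "$cm" checkdb
}
```

Starting, stopping, and other management commands remain reserved for the gacrc-cryo service account; the wrapper deliberately calls only status and checkdb.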

cryoSPARC project space: /scratch/gacrc-cryo

How to run cryoSPARC from Sapelo2

User login

Users need to establish an SSH tunnel that forwards port 39000 on the master node to their local computer.

If you are using a Linux or macOS desktop or laptop, run the following command in a terminal to establish the SSH tunnel (replace username with your MyID):

ssh -N -L 39000:128.192.75.59:39000 username@ss-cryo.gacrc.uga.edu

If you are using a Windows desktop or laptop, download the plink program (part of PuTTY) and use it in place of the ssh client:

plink -ssh -N -L 39000:128.192.75.59:39000 username@ss-cryo.gacrc.uga.edu

Note: Place plink.exe in the directory from which you run the command in your command window.

Unless you have SSH public-key authentication configured, you will be prompted for your MyID password and for Archpass Duo authentication. Once you are authenticated, the command will appear to hang with no further output; this is expected, and the tunnel stays open as long as the command keeps running. You are then ready to access the cryoSPARC User Interface.

After establishing the SSH tunnel with the command above, open a browser (Chrome) on your local machine and navigate to http://localhost:39000. The cryoSPARC User Interface should open at the cryoSPARC login page.

For more information, please refer to https://guide.cryosparc.com/setup-configuration-and-management/how-to-download-install-and-configure/accessing-cryosparc
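If you open the tunnel often, a small shell function can print the exact command from this page for your MyID. The function name cryosparc_tunnel_cmd is hypothetical; the host name, IP address, and port are the ones used in the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the SSH tunnel command shown on this page for a
# given MyID. Host, IP address and port are copied from the commands above.
CRYO_HOST="ss-cryo.gacrc.uga.edu"
CRYO_IP="128.192.75.59"
CRYO_PORT=39000

cryosparc_tunnel_cmd() {
  local user="$1" client="${2:-ssh}"
  if [ -z "$user" ]; then
    echo "usage: cryosparc_tunnel_cmd <MyID> [ssh|plink]" >&2
    return 1
  fi
  # Forward local port 39000 to port 39000 on the master node.
  local fwd="${CRYO_PORT}:${CRYO_IP}:${CRYO_PORT}"
  case "$client" in
    ssh)   echo "ssh -N -L ${fwd} ${user}@${CRYO_HOST}" ;;
    plink) echo "plink -ssh -N -L ${fwd} ${user}@${CRYO_HOST}" ;;  # Windows client
    *)     echo "unknown client: $client" >&2; return 1 ;;
  esac
}
```

On Linux or macOS, running $(cryosparc_tunnel_cmd username) executes the printed ssh command; leave it running and open http://localhost:39000. The plink variant only prints the command, for pasting into a Windows command window.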

Run cryoSPARC workflow jobs
Project space selection

A project in cryoSPARC is a high-level container that corresponds to a single directory on the file system. All jobs created within a project, together with their intermediate and output data, are stored in that project directory.

  • When you create a new project in the cryoSPARC GUI, select /scratch/gacrc-cryo as the cryoSPARC project space.
  • This folder is owned by gacrc-cryo. Regular cryoSPARC users have read access, which lets them browse the files in this folder and copy them to their own storage spaces on Sapelo2.
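Because regular users can only read /scratch/gacrc-cryo, results are copied out rather than moved. A minimal sketch, assuming you already know the name of your project's directory under /scratch/gacrc-cryo (list it with ls /scratch/gacrc-cryo; P1 below is only an illustration). copy_cryosparc_project is a hypothetical name, and its optional third argument exists only so the function can be tried on a test directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy one project directory out of the cryoSPARC project
# space into storage you own. Regular users only have read access to the source.
copy_cryosparc_project() {
  local project="$1" dest="$2" root="${3:-/scratch/gacrc-cryo}"
  if [ -z "$project" ] || [ -z "$dest" ]; then
    echo "usage: copy_cryosparc_project <project_dir> <destination> [root]" >&2
    return 1
  fi
  if [ ! -d "$root/$project" ]; then
    echo "no such project directory: $root/$project" >&2
    return 1
  fi
  mkdir -p "$dest" || return 1
  # Recursive copy; the new files are owned by you, not by gacrc-cryo.
  cp -R "$root/$project" "$dest/"
}
```

For example, copy_cryosparc_project P1 /scratch/username/cryo_results would create /scratch/username/cryo_results/P1.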
Run job on the master node

cryoSPARC runs some types of workflow jobs on the master node, such as "Import Movies", "Inspect Picks", and the interactive job "Select 2D Classes". When such a job is created, cryoSPARC notifies you that it will run on the master node.

Run job using "Lane Sapelo2 (cluster)" (highly recommended)
  • In cryoSPARC, queue a job to "Lane Sapelo2 (cluster)". The job is dispatched to a worker node via Slurm. Note that the job runs under the gacrc-cryo service account, not under your own Sapelo2 user name. We highly recommend this method for running cryoSPARC workflow jobs on Sapelo2.
  • cryoSPARC decides how many CPU cores and how much memory to use for a workflow job, depending on the job type and the size of your data. Currently, cryoSPARC can use a fixed amount of 64GB of memory on each worker node.
  • If the job needs GPUs, cryoSPARC queues it with a default number of GPU devices, for example 1 or 4. You can change this number yourself, but note that each worker node has at most 8 GPU devices.
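Because cluster-lane jobs are submitted by the service account, they appear in Slurm under gacrc-cryo rather than under your own user name, together with the jobs of all other cryoSPARC users. A sketch of a hypothetical helper that lists them with squeue; the SQUEUE_CMD variable is only a hook for testing and is not a Slurm setting.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list Slurm jobs owned by the cryoSPARC service account.
# Columns: job ID, job name, state, elapsed time, and node list
# (or, for pending jobs, the reason they are waiting).
list_cryosparc_cluster_jobs() {
  "${SQUEUE_CMD:-squeue}" -u gacrc-cryo -o "%.10i %.30j %.10T %.10M %R"
}
```

Run list_cryosparc_cluster_jobs on a Sapelo2 login node to see whether a queued cryoSPARC job is still pending or already running on a worker node.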
Run job using "Lane default (node)"

You can also select a specific worker node to run cryoSPARC workflow jobs. This method is convenient, but you must first reserve the worker node before you can select and use it.

Please follow these steps carefully:

  1. From the login node, open an interactive session that reserves a whole worker node via Slurm, for example: interact -p gpu_p --gres gpu:K40:8 -c 16. Please always request the whole node with the options "--gres gpu:K40:8" and "-c 16".
  2. Once the interactive session opens on a worker node, run this command to get its short hostname: hostname -s
  3. In cryoSPARC, first queue a job to "Lane default (node)", then click "Run on Specific GPU" and select the worker node that you reserved in step 1 (rb6-3 or rb6-4, whichever hostname you got in step 2). Please do this step carefully and do NOT select the other worker node, which you did not reserve in step 1. Selecting a node that you did not reserve could crash that node, and all jobs running on it would be lost.
  4. cryoSPARC decides how many CPU cores and how much memory to use for a workflow job, depending on the job type and the size of your data.
  5. If the job needs GPUs, cryoSPARC queues it with a default number of GPU devices, for example 1 or 4. You can change this number yourself, up to 8.
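Before step 3, you can also confirm from the login node which worker node your interactive session holds, as a cross-check against picking the wrong node. A sketch of a hypothetical helper built on squeue; it assumes the session runs in the gpu_p partition on rb6-3 or rb6-4 as described above, and SQUEUE_CMD is only a testing hook, not a Slurm setting.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the cryoSPARC worker node(s) held by your own
# running jobs in the gpu_p partition, so you select the right one in cryoSPARC.
reserved_cryo_node() {
  local nodes
  # -h: no header; -t RUNNING: running jobs only; %N: allocated node list.
  nodes=$("${SQUEUE_CMD:-squeue}" -u "${USER:-$(id -un)}" -h -p gpu_p -t RUNNING -o "%N" \
    | grep -E '^rb6-[34]$' | sort -u)
  if [ -z "$nodes" ]; then
    echo "no running gpu_p job on rb6-3 or rb6-4 found" >&2
    return 1
  fi
  echo "$nodes"
}
```

The printed hostname should match the output of hostname -s from step 2; if the helper prints nothing, your reservation has not started yet or has ended.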

Documentation

About cryoSPARC: https://guide.cryosparc.com/

User Interface and Usage Guide: https://guide.cryosparc.com/processing-data/user-interface-and-usage-guide

Accessing the cryoSPARC User Interface: https://guide.cryosparc.com/setup-configuration-and-management/how-to-download-install-and-configure/accessing-cryosparc

All Job Types in cryoSPARC: https://guide.cryosparc.com/processing-data/all-job-types-in-cryosparc

Management and Monitoring: https://guide.cryosparc.com/setup-configuration-and-management/management-and-monitoring

Cluster (Slurm) integration: https://guide.cryosparc.com/setup-configuration-and-management/how-to-download-install-and-configure/downloading-and-installing-cryosparc#connect-a-cluster-to-cryosparc

Introductory Tutorial: https://guide.cryosparc.com/processing-data/cryo-em-data-processing-in-cryosparc-introductory-tutorial

Tutorials and Usage Guides: https://guide.cryosparc.com/processing-data/tutorials-and-case-studies

Installation

  • The version 3.3.1 master is installed on the master node (ss-cryo.gacrc.uga.edu). Its source code is downloaded to /work/cryosparc/cryosparc_master on the master node.
  • Version 3.3.1 workers are installed on the two worker nodes (NVIDIA Tesla K40m GPU nodes rb6-[3-4]). The worker source code is downloaded to /work/cryosparc/cryosparc_worker on the master node.

System

64-bit Linux