CryoSPARC-Sapelo2


Category: Engineering

Program On: Sapelo2

Version: 3.3.1

Author / Distributor: See https://guide.cryosparc.com/

Description: "CryoSPARC (Cryo-EM Single Particle Ab-Initio Reconstruction and Classification) is a state of the art HPC software solution for complete processing of single-particle cryo-electron microscopy (cryo-EM) data. CryoSPARC is useful for solving cryo-EM structures of membrane proteins, viruses, complexes, flexible molecules, small particles, phase plate data and negative stain data." For more information, please see https://guide.cryosparc.com/.

NOTE: Users are required to be added to the GACRC cryosparc group before being allowed to run this software. Please fill out the GACRC General Support form to request access. We will reach out to you once we receive your request.

Configurations

Master node VM:

  1. Host name: ss-cryo.gacrc.uga.edu
  2. Intel Xeon processors (8 cores and 24GB of RAM)
  3. MongoDB is installed

Worker nodes:

  1. Two NVIDIA Tesla K40m nodes, each with Intel Xeon processors (16 cores and 128GB of RAM) and 8 NVIDIA K40m GPU cards.
  2. cryoSPARC recommends using an SSD for caching particle data; /lscratch/gacrc-cryo was set up on the worker nodes for this purpose.
  3. The amount of space that cryoSPARC can use in /lscratch/gacrc-cryo is capped at 100GB.

cryoSPARC group: cryosparc

cryoSPARC service account: gacrc-cryo

  1. gacrc-cryo is the service user account that runs the cryoSPARC workflow jobs for all regular cryoSPARC users, on the master node and on each worker node used for computation.
  2. Some tasks can only be performed by gacrc-cryo, such as starting or stopping cryosparcm on the master node, managing users, and connecting or updating worker nodes to the master.
  3. Regular cryoSPARC users can still run cryosparcm on the master node to check the status of the master and its database, using cryosparcm status and cryosparcm checkdb, as shown below.
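
For example, both checks can be run from a shell on the master node (the exact output varies by cryoSPARC version):

  # Report the state of the cryoSPARC master processes and configuration
  cryosparcm status
  # Verify that the cryoSPARC database is healthy and reachable
  cryosparcm checkdb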

cryoSPARC group space: /work/cryosparc/, with a 500GB quota limit.

There are 6 sub-directories in /work/cryosparc/:

  1. cryosparc_master/ , cryosparc_worker/ : Master and worker installation folders
  2. database/ : cryoSPARC database folder
  3. users/ : cryoSPARC user project folders
  4. cryosparc_cluster/ : The folder storing cluster integration scripts
  5. testdataset/ : The folder storing cryoSPARC test data
  6. src_v3.3.1/ : The folder storing the cryoSPARC v3.3.1 source code

Running cryoSPARC from Sapelo2

User login

Users need to establish an SSH tunnel to forward port 39000 from the master node to their local computer.

If you are using a Linux or Apple desktop or laptop, you can use the following command in a Terminal to establish the SSH tunnel:

ssh -N -L 39000:128.192.75.59:39000 <username>@ss-cryo.gacrc.uga.edu

If you are using a Windows desktop or laptop, please download the plink program and use it in place of the ssh client:

plink -ssh -N -L 39000:128.192.75.59:39000 <username>@ss-cryo.gacrc.uga.edu

Note: Please put plink.exe in the current directory where you have a command window open.

Unless you have an SSH public key configured, you will be prompted for your MyID password and for Archpass Duo authentication. Once authentication succeeds, the session will appear to hang; this is expected, and you are ready to access the cryoSPARC User Interface.

Once you have established the SSH tunnel by running the above command, you can open a browser (e.g., Chrome) on the local machine and navigate to http://localhost:39000. The cryoSPARC User Interface should load and present the cryoSPARC login page.
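
If you log in frequently, the tunnel settings can be stored in your OpenSSH client configuration instead of retyping the full command. A minimal sketch, assuming the standard OpenSSH client on Linux or macOS (the Host alias cryosparc-tunnel is an arbitrary name used here for illustration):

  # ~/.ssh/config
  Host cryosparc-tunnel
      HostName ss-cryo.gacrc.uga.edu
      User <username>
      # Forward local port 39000 to port 39000 on the master node
      LocalForward 39000 128.192.75.59:39000

With this entry in place, running ssh -N cryosparc-tunnel establishes the same tunnel as the longer command above.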

Please refer to https://guide.cryosparc.com/setup-configuration-and-management/how-to-download-install-and-configure/accessing-cryosparc

How to run cryoSPARC workflow jobs
Project space selection

A Project in cryoSPARC is a high-level container corresponding to a project directory on the file system. Each project is entirely contained within its directory: all jobs created within the project, along with their intermediate and output data, are stored in that project directory.

  1. In the group space /work/cryosparc/, a "default" project folder has been created for each cryoSPARC user at /work/cryosparc/users/<username>. This location is not suitable for large datasets, since /work/cryosparc/ has a 500GB quota limit; in that case, create a project directory on a file system with more space, as shown below.
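
For example, a project directory can be created in your Sapelo2 scratch space. The path and the setfacl step below are illustrative assumptions; check with GACRC for the recommended location and permission setup:

  # Illustrative: create a project directory in scratch space
  mkdir -p /scratch/<username>/cryosparc_projects
  # The gacrc-cryo service account runs the workflow jobs, so it needs
  # write access to the project directory (assumes ACLs are supported here)
  setfacl -R -m u:gacrc-cryo:rwx /scratch/<username>/cryosparc_projects

You can then select this directory as the project container when creating a new project in the cryoSPARC User Interface.
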
Launch job on the master node

cryoSPARC decides on its own to run some types of workflow jobs on the master node, such as "Import Movies", "Inspect Picks", and the interactive job "Select 2D Classes".

Launch job via Slurm (recommended)
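
When a job is queued to the Slurm lane, cryoSPARC renders a site-provided submission template and submits it to the cluster with sbatch; the templates used on Sapelo2 are stored in /work/cryosparc/cryosparc_cluster/. Below is a minimal sketch of such a cluster_script.sh template, using template variables documented in the cryoSPARC cluster integration guide; the partition name is an assumption, and regular users do not need to edit this file:

  #!/usr/bin/env bash
  #SBATCH --job-name=cryosparc_{{ project_uid }}_{{ job_uid }}
  #SBATCH --partition=gpu_p                 # assumption: adjust to the site's GPU partition
  #SBATCH --ntasks={{ num_cpu }}
  #SBATCH --gres=gpu:{{ num_gpu }}
  #SBATCH --mem={{ ram_gb }}G
  #SBATCH --output={{ job_log_path_abs }}

  # cryoSPARC substitutes the full worker command for this variable
  {{ run_cmd }}

To use this path, choose the Slurm cluster lane in the "Queue Job" dialog of the cryoSPARC User Interface; cryoSPARC fills in the template and handles the submission automatically.
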
Launch and run job on a worker node

Documentation

About cryoSPARC: https://guide.cryosparc.com/

User Interface and Usage Guide: https://guide.cryosparc.com/processing-data/user-interface-and-usage-guide

Accessing the cryoSPARC User Interface: https://guide.cryosparc.com/setup-configuration-and-management/how-to-download-install-and-configure/accessing-cryosparc

All Job Types in cryoSPARC: https://guide.cryosparc.com/processing-data/all-job-types-in-cryosparc

Management and Monitoring: https://guide.cryosparc.com/setup-configuration-and-management/management-and-monitoring

Cluster (Slurm) integration: https://guide.cryosparc.com/setup-configuration-and-management/how-to-download-install-and-configure/downloading-and-installing-cryosparc#connect-a-cluster-to-cryosparc

Introductory Tutorial: https://guide.cryosparc.com/processing-data/cryo-em-data-processing-in-cryosparc-introductory-tutorial

Tutorials and Usage Guides: https://guide.cryosparc.com/processing-data/tutorials-and-case-studies

Installation

  • The version 3.3.1 master is installed on the master node (ss-cryo.gacrc.uga.edu). Its source code is downloaded to /work/cryosparc/cryosparc_master on the master node.
  • The version 3.3.1 workers are installed on two worker nodes (NVIDIA Tesla K40m GPU nodes rb6-[3-4]). Their source code is downloaded to /work/cryosparc/cryosparc_worker on the master node.

System

64-bit Linux