CryoSPARC-Sapelo2: Difference between revisions

From Research Computing Center Wiki
(Created page with "Category:Sapelo2Category:SoftwareCategory:Chemistry === Category === Chemistry === Program On === Sapelo2 === Version === 5, 6.1 === Author / Distributor...")
 
 
(91 intermediate revisions by 2 users not shown)
Line 1: Line 1:
[[Category:Sapelo2]][[Category:Software]][[Category:Chemistry]]
[[Category:Sapelo2]][[Category:Software]][[Category:Engineering]]
 
=== Category ===
=== Category ===


Chemistry
Engineering


=== Program On ===
=== Program On ===
Line 10: Line 11:
=== Version ===
=== Version ===
   
   
5, 6.1
4.3.1


=== Author / Distributor ===
=== Author / Distributor ===
   
   
See http://www.gaussian.com
See https://guide.cryosparc.com/
   
   
=== Description ===
=== Description ===
   
   
From http://www.gaussian.com/g_tech/gv5ref/intro.htm: "GaussView is a graphical user interface designed to help you prepare input for submission to Gaussian and to examine graphically the output that Gaussian produces. GaussView is not integrated with the computational module of Gaussian, but rather is a front-end/back-end processor to aid in the use of Gaussian." For more information, please see http://www.gaussian.com/g_prod/gv5.htm.
"CryoSPARC (Cryo-EM Single Particle Ab-Initio Reconstruction and Classification) is a state of the art HPC software solution for complete processing of single-particle cryo-electron microscopy (cryo-EM) data. CryoSPARC is useful for solving cryo-EM structures of membrane proteins, viruses, complexes, flexible molecules, small particles, phase plate data and negative stain data." For more information, please see https://guide.cryosparc.com/.


'''NOTE''': Users are required to sign a license agreement form before being allowed to run this software. Please fill out the [http://help.gacrc.uga.edu/ GACRC Support Form] to check if you have permission to use this software.
'''NOTE''': Users are required to be added into GACRC '''cryosparc''' group before they can run this software from Sapelo2. Please fill out the [https://uga.teamdynamix.com/TDClient/2060/Portal/Requests/ServiceDet?ID=25844 GACRC General Support form] to request. We will reach out to you after we received your request.


=== Running Program ===
=== Configurations ===
'''Master node VM:'''
Also refer to [[Running Jobs on Sapelo2]].


'''Version 5'''
* Host name: '''cryosparc.gacrc.uga.edu'''
GaussView 5 is installed with Gaussian 09, in /apps/eb/gaussian/09-Intel-SSE4_2/gv and /apps/eb/gaussian/09-AMD-SSE4a/gv.
* Intel Xeon processors (16 cores) and 64GB of RAM
* mongodb is installed and run on the master node
'''Worker nodes:'''
* One NVIDIA Tesla A100 node: Intel Xeon processors (64 cores), 1TB host RAM memory, 4 NVIDIA Tesla A100 GPU cards (80GiB device memory per card), and NVMe SSD 3570GB local drive.
* cryoSPARC recommends using SSD for caching particle data. /lscratch/gacrc-cryo is set up on the worker node for this purpose.
* The amount of space that cryoSPARC can use in /lscratch/gacrc-cryo is capped at 100GB.
'''cryoSPARC group:''' '''cryosparc'''


'''Version 6.1'''
'''cryoSPARC service account:''' '''gacrc-cryo'''
GaussView 6.1 is installed with Gaussian 16, in /apps/eb/gaussian/16-AVX2/gv,  /apps/eb/gaussian/16-AVX/gv, and /apps/eb/gaussian/16-SSE4/gv.


* gacrc-cryo is the '''service account''' that will run cryoSPARC workflow jobs for all cryoSPARC users.
* Some tasks can only be handled by gacrc-cryo, like make a new lane of worker node(s), user management, and connect or update worker node(s) to master, etc..
* Regular CryoSPARC users can run cryosparcm on the master node to check cryoSPARC status, using cryosparcm status or cryosparcm checkdb.


Please do not run GaussView directly on the login node. To run GaussView, please first start an interactive session using the '''xqlogin''' command, once the prompt on an interactive node is returned, you can check the processor type with the command
'''cryoSPARC project space:''' '''/scratch/gacrc-cryo'''
<pre class="gcommand">
#
head -n 6 /proc/cpuinfo
</pre>
and then load the appropriate gaussian module, source g09.profile (for gaussian 09) or g16.profile (for gaussian 16), and then start GaussView, as described below.


Note that you will need to have your SSH configured to export X to your local machine. For more information on how to run remote X-windows applications, please see [[Connecting]] and our [[Frequently Asked Questions]]. '''Note for Windows users:''' If you are using Xming, you will also need to have Xming-mesa installed.
=== How to run cryoSPARC from Sapelo2 ===


'''GAUSSVIEW 5 for GAUSSIAN 09'''
===== User login =====


'''For AMD processors:'''
User needs to establish a SSH tunnel to expose the port 39000 from the master node to a local computer. 


If your xqlogin session landed on an AMD processor, please note that Gaussian binaries optimized for AMD processors are installed in /apps/eb/gaussian/09-AMD-SSE4a/g09. To use GaussView with this version of Gaussian, please first load the gaussian/09-AMD-SSE4a module and source g09.profile with
If you are using a Linux or Apple desktop or laptop, you can use the following command in Terminal to establish the ssh tunnel: <blockquote>'''ssh -N -L 39000:10.2.0.60:39000 username@cryosparc.gacrc.uga.edu'''</blockquote>If you are using a Windows desktop or laptop, please download the '''[https://the.earth.li/~sgtatham/putty/latest/x86/plink.exe plink program]''' to use in place of the ssh client:<blockquote>'''plink -ssh -N -L 39000:10.2.0.60:39000 username@cryosparc.gacrc.uga.edu'''</blockquote>
'''Note:''' Please put the plink.exe in the current directory where you have a command window open.


<pre class="gcommand">
Unless you have SSH public key configured, you will be prompted for your MyID password and for Archpass Duo authentication. Once authentication is established, this session prompt will hang and you are ready to go to access the cryoSPARC User Interface.


module load gaussian/09-AMD-SSE4a
Once you established the ssh tunnel by running the above command, you can open a browser (Chrome) on the local machine and navigate to<code><nowiki>http://localhost:39000</nowiki></code>. The cryoSPARC User Interface should be presented with the cryoSPARC login page.


. $g09root/g09/bsd/g09.profile
===== Run cryoSPARC workflow jobs =====


gview.csh
====== Project space selection ======
</pre>
A project in cryoSPARC is a high level container corresponding with a project directory on the file system, which stores all associated Jobs of a project. Each project in cryoSPARC is entirely contained within a file system directory. All the jobs and their respective intermediate and output data created within a project will be stored within the project directory.


'''For Intel processors:'''
* When you start a new project in cryoSPARC GUI, please select and use '''/scratch/gacrc-cryo''' as the cryoSPARC project space.
* This folder is owned by gacrc-cryo. Regular cryoSPARC users have access and read permissions which allow them to browse files in this folder and copy files from this folder to their own storage spaces on Sapelo2.


If your xqlogin session landed on an Intel processor, please note that Gaussian binaries optimized for Intel processors are installed in /apps/eb/gaussian/09-Intel-SSE4_2/g09. To use GaussView with this version of Gaussian, please first load the gaussian/09-Intel-SSE4_2 module and source g09.profile with
====== Run job on the master node ======
cryoSPARC will use the master node to run some types of workflow jobs, for example, "Import Movies", "Inspect Picks", and the interactive job "Select 2D Classes". When a job is created, if cryoSPARC will use the master node to run the job, you will be notified about this in cryoSPARC GUI.  


<pre class="gcommand">
====== Run job using "Lane Sapelo2 Default (cluster)" ======


module load gaussian/09-Intel-SSE4_2
* In cryoSPARC, queue a job to "'''Lane Sapelo2 Default (cluster)'''"; The job will be dispatched to the worker node via Slurm. Please note, gacrc-cryo, instead of your own Sapelo2 user account, is the user account owning and running the job. We highly recommend you to use this method to run cryoSPARC workflow jobs on Sapelo2.
* cryoSPARC will decide on how many CPU cores and how much memory it will use to run a workflow job, depend on the type of the job and your data size. Currently, we configured that from each worker node cryoSPARC can use up to '''20GB''' memory and '''4''' CPU cores.
* If the job needs to run on GPU devices, cryoSPARC will queue the job with a default number of GPU devices , for example 1. You can change this number by yourself in cryoSPARC GUI when you create the job. Please note that the maximum number of GPU devices installed on the worker node is '''4'''.


. $g09root/g09/bsd/g09.profile
===Documentation (v4.0+)===
 
gview.csh
About cryoSPARC: https://guide.cryosparc.com/
</pre>
For more information on how to run interactive jobs, please see [https://wiki.gacrc.uga.edu/wiki/Running_Jobs_on_Sapelo2#How_to_run_an_interactive_job_with_Graphical_User_Interface_capabilities How to run an interactive job with Graphical User Interface].
 
 
'''GAUSSVIEW 6.1 for GAUSSIAN 16'''


Get Started with CryoSPARC: Introductory Tutorial: https://guide.cryosparc.com/processing-data/get-started-with-cryosparc-introductory-tutorial


We suggest using the version with AVX optimization that are installed in /apps/eb/gaussian/16-AVX/. To use GaussView with this version of Gaussian, please first load the gaussian/16-AVX module and source g16.profile with
A Tour of the CryoSPARC Interface: https://guide.cryosparc.com/application-guide-v4.0+/a-tour-of-the-cryosparc-interface


<pre class="gcommand">
Using the CryoSPARC Interface: https://guide.cryosparc.com/application-guide-v4.0+/using-the-cryosparc-interface


module load gaussian/16-AVX
Creating and Running Jobs: https://guide.cryosparc.com/application-guide-v4.0+/creating-and-running-jobs


. $g16root/g16/bsd/g16.profile
Tutorial videos: https://guide.cryosparc.com/processing-data/tutorial-videos


gview.sh
===Installation===
</pre>
To use AVX2 or SSE4 optimized code, please load gaussian/16-AVX and gaussian/16-SSE4, respectively, instead of gaussian/16-AVX2. But note that some xqlogin nodes do not support AVX2 optimization.
 
For more information on how to run interactive jobs, please see [https://wiki.gacrc.uga.edu/wiki/Running_Jobs_on_Sapelo2#How_to_run_an_interactive_job_with_Graphical_User_Interface_capabilities How to run an interactive job with Graphical User Interface].
 
 
 
If you are having trouble with GaussView starting, with errors about X11 or "OpenGL is not available", or if you are on a Mac and the GaussView window does not display properly, please try to type the command below before invoking GaussView on the interactive node:
<pre class="gcommand">
export USE_MESAGL=1
</pre>
 
=== Documentation ===
   
   
http://www.gaussian.com
*Version 4.3.1 master is installed on the master node (cryosparc.gacrc.uga.edu).  
 
*Version 4.3.1 workers are installed on one worker GPU node (NVIDIA Tesla A100 GPU node).
=== Installation ===
*Version 5 installed for Gaussian09 in /apps/eb/gaussian/09-Intel-SSE4_2/gv and /apps/eb/gaussian/09-AMD-SSE4a/gv.
 
*Version 6.1 is installed with Gaussian 16, in /apps/eb/gaussian/16-AVX2/gv, /apps/eb/gaussian/16-AVX/gv, and /apps/eb/gaussian/16-SSE4/gv.


=== System ===
===System===
64-bit Linux
64-bit Linux

Latest revision as of 14:00, 12 October 2023


Category

Engineering

Program On

Sapelo2

Version

4.3.1

Author / Distributor

See https://guide.cryosparc.com/

Description

"CryoSPARC (Cryo-EM Single Particle Ab-Initio Reconstruction and Classification) is a state of the art HPC software solution for complete processing of single-particle cryo-electron microscopy (cryo-EM) data. CryoSPARC is useful for solving cryo-EM structures of membrane proteins, viruses, complexes, flexible molecules, small particles, phase plate data and negative stain data." For more information, please see https://guide.cryosparc.com/.

NOTE: Users must be added to the GACRC cryosparc group before they can run this software on Sapelo2. To request access, please fill out the GACRC General Support form. We will reach out to you after we receive your request.

Configurations

Master node VM:

  • Host name: cryosparc.gacrc.uga.edu
  • Intel Xeon processors (16 cores) and 64GB of RAM
  • MongoDB is installed and runs on the master node

Worker nodes:

  • One NVIDIA Tesla A100 node: Intel Xeon processors (64 cores), 1TB of host memory, 4 NVIDIA Tesla A100 GPU cards (80GiB device memory per card), and a 3570GB local NVMe SSD.
  • cryoSPARC recommends using an SSD for caching particle data. /lscratch/gacrc-cryo is set up on the worker node for this purpose.
  • The amount of space that cryoSPARC can use in /lscratch/gacrc-cryo is capped at 100GB.

cryoSPARC group: cryosparc

cryoSPARC service account: gacrc-cryo

  • gacrc-cryo is the service account that will run cryoSPARC workflow jobs for all cryoSPARC users.
  • Some tasks can only be performed by gacrc-cryo, such as creating a new lane of worker node(s), managing users, and connecting or updating worker node(s) to the master.
  • Regular CryoSPARC users can run cryosparcm on the master node to check cryoSPARC status, using cryosparcm status or cryosparcm checkdb.
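
The status check mentioned in the last bullet could be wrapped as in the sketch below, to be run from a shell on the master node (after logging in to cryosparc.gacrc.uga.edu over SSH). The function name check_cryosparc is hypothetical; only the cryosparcm status and cryosparcm checkdb commands come from this page.

```shell
# Hypothetical wrapper (not part of cryoSPARC): run the two checks that
# regular users may run on the master node, stopping if the first one fails.
check_cryosparc() {
    cryosparcm status || return 1    # report the state of the cryoSPARC services
    cryosparcm checkdb               # check the cryoSPARC database
}
```

After defining the function in your shell session, run check_cryosparc to execute both checks in order.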

cryoSPARC project space: /scratch/gacrc-cryo

How to run cryoSPARC from Sapelo2

User login

Users need to establish an SSH tunnel that forwards port 39000 on the master node to their local computer.

If you are using a Linux or Apple desktop or laptop, you can use the following command in Terminal to establish the ssh tunnel:

ssh -N -L 39000:10.2.0.60:39000 username@cryosparc.gacrc.uga.edu

If you are using a Windows desktop or laptop, please download the plink program to use in place of the ssh client:

plink -ssh -N -L 39000:10.2.0.60:39000 username@cryosparc.gacrc.uga.edu

Note: Please place plink.exe in the directory where your command window is open.

Unless you have an SSH public key configured, you will be prompted for your MyID password and for Archpass Duo authentication. Once you are authenticated, the session will appear to hang because the -N option opens the tunnel without starting a remote shell. This is expected, and you are now ready to access the cryoSPARC User Interface.

Once you have established the SSH tunnel by running the above command, open a browser (Chrome) on the local machine and navigate to http://localhost:39000. The browser should display the cryoSPARC login page.
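
For Linux and macOS users who open the tunnel often, the ssh command above could be wrapped in a small shell function. This is only a sketch: the function name open_cryosparc_tunnel and the optional local-port argument are hypothetical additions, while the remote address, remote port, and host name are the ones given on this page.

```shell
# Hypothetical helper: open the SSH tunnel for a given MyID.
# $1 = MyID (required), $2 = local port (optional, defaults to 39000).
open_cryosparc_tunnel() {
    myid=$1
    local_port=${2:-39000}
    # -N: do not run a remote command; -L: forward the local port to the master node
    ssh -N -L "${local_port}:10.2.0.60:39000" "${myid}@cryosparc.gacrc.uga.edu"
}
```

For example, open_cryosparc_tunnel username opens the same tunnel as the command above and keeps the terminal busy until you press Ctrl-C. If local port 39000 is already in use on your computer, open_cryosparc_tunnel username 39001 would forward local port 39001 instead, and the browser address becomes http://localhost:39001.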

Run cryoSPARC workflow jobs

Project space selection

A project in cryoSPARC is a high-level container that corresponds to a project directory on the file system. Each project is entirely contained within that directory: all of its jobs, along with their intermediate and output data, are stored there.

  • When you start a new project in the cryoSPARC GUI, please select /scratch/gacrc-cryo as the cryoSPARC project space.
  • This folder is owned by gacrc-cryo. Regular cryoSPARC users have access and read permissions which allow them to browse files in this folder and copy files from this folder to their own storage spaces on Sapelo2.
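
Copying results out of the project space, as described in the last bullet, could look like the sketch below. The function name copy_from_project_space is hypothetical, and both paths are supplied by the caller; this page does not prescribe a particular destination.

```shell
# Hypothetical helper: copy one directory out of the shared project space
# into a destination directory that you own.
copy_from_project_space() {
    src=$1     # e.g. a job directory under /scratch/gacrc-cryo
    dest=$2    # a directory in your own storage space
    # Create the destination if needed, then copy the directory tree into it.
    mkdir -p "$dest" && cp -R "$src" "$dest"/
}
```

For example, copy_from_project_space /scratch/gacrc-cryo/PROJECT_DIR/JOB_DIR /scratch/$USER/cryosparc_results, where PROJECT_DIR and JOB_DIR are placeholders and the destination under /scratch/$USER is an assumed location for your own storage.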

Run job on the master node

cryoSPARC runs some types of workflow jobs on the master node, for example "Import Movies", "Inspect Picks", and the interactive job "Select 2D Classes". If a job you create will run on the master node, the cryoSPARC GUI will notify you.

Run job using "Lane Sapelo2 Default (cluster)"

  • In cryoSPARC, queue a job to "Lane Sapelo2 Default (cluster)". The job will be dispatched to the worker node via Slurm. Please note that the job is owned and run by gacrc-cryo, not by your own Sapelo2 user account. We highly recommend using this method to run cryoSPARC workflow jobs on Sapelo2.
  • cryoSPARC decides how many CPU cores and how much memory to use for a workflow job, depending on the type of job and the size of your data. Currently, cryoSPARC is configured to use up to 20GB of memory and 4 CPU cores on each worker node.
  • If the job needs to run on GPU devices, cryoSPARC will queue the job with a default number of GPU devices, for example 1. You can change this number in the cryoSPARC GUI when you create the job. Please note that the worker node has a maximum of 4 GPU devices installed.
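
Because jobs queued to this lane run under gacrc-cryo, they will not appear when you list Slurm jobs for your own account. The sketch below lists jobs owned by the service account instead. It assumes that Slurm's squeue command is available where you run it and that the cluster lets regular users view jobs owned by other accounts, neither of which is confirmed on this page. The function name list_cryosparc_jobs is hypothetical.

```shell
# Hypothetical helper: list Slurm jobs owned by the cryoSPARC service account.
list_cryosparc_jobs() {
    squeue --user=gacrc-cryo
}
```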

Documentation (v4.0+)

About cryoSPARC: https://guide.cryosparc.com/

Get Started with CryoSPARC: Introductory Tutorial: https://guide.cryosparc.com/processing-data/get-started-with-cryosparc-introductory-tutorial

A Tour of the CryoSPARC Interface: https://guide.cryosparc.com/application-guide-v4.0+/a-tour-of-the-cryosparc-interface

Using the CryoSPARC Interface: https://guide.cryosparc.com/application-guide-v4.0+/using-the-cryosparc-interface

Creating and Running Jobs: https://guide.cryosparc.com/application-guide-v4.0+/creating-and-running-jobs

Tutorial videos: https://guide.cryosparc.com/processing-data/tutorial-videos

Installation

  • Version 4.3.1 master is installed on the master node (cryosparc.gacrc.uga.edu).
  • Version 4.3.1 workers are installed on one worker GPU node (NVIDIA Tesla A100 GPU node).

System

64-bit Linux