Software on Sapelo2: Difference between revisions
(31 intermediate revisions by 4 users not shown) | |||
Line 1: | Line 1: | ||
=Introduction= | =Introduction= | ||
On | On Sapelo2 users have the option to [https://wiki.gacrc.uga.edu/wiki/Installing_Applications_on_Sapelo2 install their own software] or use software installed centrally on Sapelo2 by the GACRC staff. Centrally-installed software on Sapelo2 will typically be in one of two formats: a software module or a Singularity container. Outlined here is an explanation of each of both categories of software and how to use them. | ||
=Getting Help Using Software= | =Getting Help Using Software= | ||
Line 10: | Line 10: | ||
<nowiki> | <nowiki> | ||
bc06026@ra4-2 ~$ | bc06026@ra4-2 ~$ ml Bowtie2/2.4.5-GCC-11.3.0 | ||
bc06026@ra4-2 ~$ | bc06026@ra4-2 ~$ | ||
bc06026@ra4-2 ~$ bowtie2 --help | bc06026@ra4-2 ~$ bowtie2 --help | ||
Bowtie 2 version 2.4. | Bowtie 2 version 2.4.5 by Ben Langmead (langmea@cs.jhu.edu, www.cs.jhu.edu/~langmea) | ||
Usage: | Usage: | ||
bowtie2 [options]* -x <bt2-idx> {-1 <m1> -2 <m2> | -U <r> | --interleaved <i> | -b <bam>} [-S <sam>] | bowtie2 [options]* -x <bt2-idx> {-1 <m1> -2 <m2> | -U <r> | --interleaved <i> | -b <bam>} [-S <sam>] | ||
Line 31: | Line 31: | ||
-h/--help print this usage message | -h/--help print this usage message | ||
</nowiki> | </nowiki> | ||
In the above help output for the Bowtie2 software, we see example usage, and some of the optional parameters that you can specify when calling bowtie2. If you're not sure what the name of the compiled binary to call is, it very often will be the name of the software you're using in lowercase (but not always). You can verify this by checking online documentation, or by loading the software and then starting to type what may be the appropriate executable, and then hitting the tab key twice on your keyboard, which would auto-complete the rest of the executable name if it is valid. Hitting tab three times in succession would show you if there are any similarly-named executables of the software with the same name. For example, this output is shown after loading the Bowtie2/2.4. | In the above help output for the Bowtie2 software, we see example usage, and some of the optional parameters that you can specify when calling bowtie2. If you're not sure what the name of the compiled binary to call is, it very often will be the name of the software you're using in lowercase (but not always). You can verify this by checking online documentation, or by loading the software and then starting to type what may be the appropriate executable, and then hitting the tab key twice on your keyboard, which would auto-complete the rest of the executable name if it is valid. Hitting tab three times in succession would show you if there are any similarly-named executables of the software with the same name. For example, this output is shown after loading the Bowtie2/2.4.5-GCC-11.3.0 module and typing "bow" and then hitting tab three times: | ||
<nowiki> | <nowiki> | ||
Line 42: | Line 42: | ||
=Software Modules= | =Software Modules= | ||
The majority of software centrally installed on | The majority of software centrally installed on Sapelo2 is installed in the form of a '''software module'''. A software module is a grouping of some software and dependencies. By leveraging software modules, you gain access to only the software you need when you need it. This is achieved by modifying your PATH environmental variable as well as creating other environment variables. | ||
Software modules are in the format '''Name/Version-Toolchain'''. A toolchain is a collection of ancillary software discussed further [https://wiki.gacrc.uga.edu/wiki/ | Software modules are in the format '''Name/Version-Toolchain'''. A toolchain is a collection of ancillary software discussed further [https://wiki.gacrc.uga.edu/wiki/Available_Toolchains_and_Toolchain_Compatibility here]. | ||
Here are some examples of Python modules on | Here are some examples of Python modules on Sapelo2: | ||
<nowiki>Python/2.7. | <nowiki>Python/2.7.18-GCCcore-11.3.0 | ||
Python/3. | Python/3.9.6-GCCcore-11.2.0 | ||
Python/3. | Python/3.10.4-GCCcore-11.3.0</nowiki> | ||
Here is an example of how the path and version of the command "python" changes after a module is loaded: | Here is an example of how the path and version of the command "python" changes after a module is loaded: | ||
<nowiki>bc06026@ra4-2 ~$ which python | <nowiki>bc06026@ra4-2 ~$ which python | ||
/usr/bin/python | /usr/bin/which: no python in (/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/usr/local/gacrc/bin:/opt/apps/slurm/prod/bin:) | ||
bc06026@ra4-2 ~$ | bc06026@ra4-2 ~$ which python3 | ||
Python | /usr/bin/python3 | ||
bc06026@ra4-2 ~$ | bc06026@ra4-2 ~$python3 -V | ||
Python 3.6.8 | |||
bc06026@ra4-2 ~$ ml Python/3.10.4-GCCcore-11.3.0 | |||
bc06026@ra4-2 ~$ which python | bc06026@ra4-2 ~$ which python | ||
/apps/eb/Python/3. | /apps/eb/Python/3.10.4-GCCcore-11.3.0/bin/python | ||
bc06026@ra4-2 ~$ python -V | bc06026@ra4-2 ~$ python -V | ||
Python 3. | Python 3.10.4 | ||
bc06026@ra4-2 ~$ </nowiki> | bc06026@ra4-2 ~$ </nowiki> | ||
Each software module defines an environment variable called EBROOT''NAME'', where ''NAME'' is the name of the application, in capital letters (e.g. EBROOTPYTHON for Python). This environment variable stores the full path to the directory where the application is installed, and it can be used to invoke binaries that need to be called with their full path. | |||
===Software Module Commands=== | ===Software Module Commands=== | ||
Software modules are managed with <code> | Software modules are managed with <code>ml</code> commands. Here are some useful <code>ml</code> commands: | ||
* <code> | * <code>ml spider ''pattern''</code> - search for available software, e.g., <code>ml spider Python</code> | ||
* <code> | * <code>ml ''moduleName''</code> – load a software module for use, e.g., <code>ml Python/3.10.4-GCCcore-11.3.0</code> | ||
* <code> | * <code>ml -''moduleName''</code> – unload a software module, e.g., <code>ml -Python/3.10.4-GCCcore-11.3.0</code> | ||
* <code> | * <code>ml</code> – list currently loaded software modules | ||
* <code> | * <code>ml show ''moduleName''</code> - show detailed information about a software module, including its description and homepage URL | ||
* <code> | * <code>ml av</code> – list all available software modules | ||
===Using Software Modules=== | ===Using Software Modules=== | ||
When you first log into | When you first log into Sapelo2, you will not have any modules loaded. There are two scenarios in which you would <code>ml</code> a software module, in a submission script and in an interactive job session. You may search for modules on the login node with the <code>ml spider</code> command, but please '''never''' load software modules from the login node. | ||
Here is an example of loading a software module in a submission script: | Here is an example of loading a software module in a submission script: | ||
Line 86: | Line 92: | ||
#SBATCH --job-name=testserial # Job name | #SBATCH --job-name=testserial # Job name | ||
#SBATCH --partition=batch # Partition (queue) name | #SBATCH --partition=batch # Partition (queue) name | ||
#SBATCH --ntasks=1 # Run on a single CPU | #SBATCH --ntasks=1 # Run on a single CPU core | ||
#SBATCH --mem=1gb # Job memory request | #SBATCH --mem=1gb # Job memory request | ||
#SBATCH --time=02:00:00 # Time limit hrs:min:sec | #SBATCH --time=02:00:00 # Time limit hrs:min:sec | ||
Line 97: | Line 103: | ||
cd $SLURM_SUBMIT_DIR | cd $SLURM_SUBMIT_DIR | ||
ml R/4.3.1-foss-2022a | |||
R CMD BATCH add.R</pre> | R CMD BATCH add.R</pre> | ||
Line 103: | Line 109: | ||
Note that the module is loaded ''after'' the Slurm headers (#SBATCH lines), but ''before'' calling the software, which in this case is R. This is important as the Slurm headers need to come first, and the proper version of the software to be used must be loaded prior to being called. | Note that the module is loaded ''after'' the Slurm headers (#SBATCH lines), but ''before'' calling the software, which in this case is R. This is important as the Slurm headers need to come first, and the proper version of the software to be used must be loaded prior to being called. | ||
Here is an example of a module being loaded in an interactive session on | Here is an example of a module being loaded in an interactive session on Sapelo2: | ||
<nowiki> | <nowiki> | ||
bc06026@ra4-2 ~$ | bc06026@ra4-2 ~$ ml | ||
No modules loaded | No modules loaded | ||
bc06026@ra4-2 ~$ | bc06026@ra4-2 ~$ ml Python/3.10.4-GCCcore-11.3.0 | ||
bc06026@ra4-2 ~$ | bc06026@ra4-2 ~$ ml | ||
Currently Loaded Modules: | Currently Loaded Modules: | ||
1) GCCcore/ | 1) GCCcore/11.3.0 5) ncurses/6.3-GCCcore-11.3.0 9) XZ/5.2.5-GCCcore-11.3.0 13) Python/3.10.4-GCCcore-11.3.0 | ||
2) zlib/1.2. | 2) zlib/1.2.12-GCCcore-11.3.0 6) libreadline/8.1.2-GCCcore-11.3.0 10) GMP/6.2.1-GCCcore-11.3.0 | ||
3) binutils/2. | 3) binutils/2.38-GCCcore-11.3.0 7) Tcl/8.6.12-GCCcore-11.3.0 11) libffi/3.4.2-GCCcore-11.3.0 | ||
4) bzip2/1.0.8-GCCcore- | 4) bzip2/1.0.8-GCCcore-11.3.0 8) SQLite/3.38.3-GCCcore-11.3.0 12) OpenSSL/1.1 | ||
bc06026@ra4-2 ~$ | bc06026@ra4-2 ~$ ml -Python/3.10.4-GCCcore-11.3.0 | ||
bc06026@ra4-2 ~$ | bc06026@ra4-2 ~$ ml | ||
No modules loaded | No modules loaded | ||
bc06026@ra4-2 ~$ | bc06026@ra4-2 ~$ | ||
</nowiki> | </nowiki> | ||
In the above example we see that we start with no modules loaded. Then upon loading the Python/3. | In the above example we see that we start with no modules loaded. Then upon loading the Python/3.10.4-GCCcore-11.3.0 module, we load Python and software included in the GCCcore-11.3.0 toolchain (the version of the compiler suite from which this instance of Python was compiled). If we are finished using a software module, we can simply unload it with the <code>ml -</code> command, and it will unload everything that came with the software module. | ||
===Searching for Software Modules=== | ===Searching for Software Modules=== | ||
Searching for software modules on Sapelo2 can be done with the <code> | Searching for software modules on Sapelo2 can be done with the <code>ml spider</code> command. It is important to note that while the <code>ml spider</code> command is case-insensitive, there are some cases in which the case of your search pattern can affect how the search results are displayed. For example, if you enter the command <code>ml spider python</code>, it would return every software module on Sapelo2 that has the string "python" in it (in upper or lowercase). On the other hand, if you were to enter the command <code>ml spider Python</code>, with an uppercase "P", it would list the software modules on Sapelo2 specifically for Python. For example: | ||
<nowiki> | <nowiki> | ||
bc06026@ss-sub1 ~$ | bc06026@ss-sub1 ~$ ml spider Python | ||
------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------- | ||
Line 141: | Line 147: | ||
Versions: | Versions: | ||
Python/2.7. | Python/2.7.18-GCCcore-11.2.0 | ||
Python/2.7.18-GCCcore-11.3.0 | |||
Python/3.7.4-GCCcore-8.3.0 | Python/3.7.4-GCCcore-8.3.0 | ||
Python/3.8.2-GCCcore- | Python/3.8.6-GCCcore-10.2.0 | ||
Python/3.9.6-GCCcore-11.2.0 | |||
Python/3.10.4-GCCcore-11.3.0 | |||
Other possible modules matches: | Other possible modules matches: | ||
Biopython Boost.Python IPython bx-python netcdf4-python openslide-python | Biopython Boost.Python IPython bx-python netcdf4-python openslide-python | ||
Line 157: | Line 167: | ||
For example: | For example: | ||
$ module spider Python/3. | $ module spider Python/3.10.4-GCCcore-11.3.0 | ||
------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------- | ||
Line 164: | Line 174: | ||
bc06026@ss-sub1 ~$ </nowiki> | bc06026@ss-sub1 ~$ </nowiki> | ||
If you do not find some software that you would like to use already installed on | If you do not find some software that you would like to use already installed on Sapelo2, you may request that we install it [https://uga.teamdynamix.com/TDClient/2060/Portal/Requests/ServiceDet?ID=25850 here]. | ||
===Important Notes Regarding Software Modules=== | ===Important Notes Regarding Software Modules=== | ||
Line 171: | Line 181: | ||
* If loading more than one software module, make sure that there are no toolchain conflicts, as discussed on our [https://wiki.gacrc.uga.edu/wiki/Avaliable_Toolchains_and_Toolchain_Compatibility#Toolchain_compatibility toolchain wiki page]. Loading multiple software modules with conflicting toolchains will cause your job to fail. | * If loading more than one software module, make sure that there are no toolchain conflicts, as discussed on our [https://wiki.gacrc.uga.edu/wiki/Avaliable_Toolchains_and_Toolchain_Compatibility#Toolchain_compatibility toolchain wiki page]. Loading multiple software modules with conflicting toolchains will cause your job to fail. | ||
* If you need to use software modules that have conflicting toolchains at the same time, you could reach out to us to see if we could install a version of your software with a particular toolchain, or you could try loading one software module, then unloading it and loading another, if your workflow allows this. | * If you need to use software modules that have conflicting toolchains at the same time, you could reach out to us to see if we could install a version of your software with a particular toolchain, or you could try loading one software module, then unloading it and loading another, if your workflow allows this. | ||
* Note that software modules often have dependencies that are packaged together into one module. For example, the toolchain " | * Note that software modules often have dependencies that are packaged together into one module. For example, the module "3D-DNA/201008-foss-2021b-Python-2.7.18" will load Python 2.7.18, so you would not need to load Python separately when using 3D-DNA. For a full list of what a software module includes, use the <code>ml</code> command after loading a module. | ||
* Make sure the software module you're using is the software that you think it is. Sometimes different software will have a similar or the same name. This can be verified by reading the description of the software module from <code> | * Make sure the software module you're using is the software that you think it is. Sometimes different software will have a similar or the same name. This can be verified by reading the description of the software module from <code>ml spider ''pattern''</code> or <code>ml show ''moduleName''</code>, as well as by checking the homepage for the software, which is displayed in the <code>module show ''moduleName''</code> output. | ||
* Loaded software modules do not persist across separate login sessions, and will be unloaded upon exiting an interactive session or the completion of a job. | * Loaded software modules do not persist across separate login sessions, and will be unloaded upon exiting an interactive session or the completion of a job. | ||
* Software libraries not a part of a programming language's standard library will often exist in | * Software libraries not a part of a programming language's standard library will often exist in Sapelo2 as their own module. For example: | ||
<nowiki> | <nowiki> | ||
bc06026@ra4-2 ~$ | bc06026@ra4-2 ~$ ml Python/3.10.4-GCCcore-11.3.0 | ||
bc06026@ra4-2 ~$ | bc06026@ra4-2 ~$ | ||
bc06026@ra4-2 ~$ python -c "import scipy;print(scipy.__version__)" | bc06026@ra4-2 ~$ python -c "import scipy;print(scipy.__version__)" | ||
Line 183: | Line 193: | ||
ModuleNotFoundError: No module named 'scipy' | ModuleNotFoundError: No module named 'scipy' | ||
bc06026@ra4-2 ~$ | bc06026@ra4-2 ~$ | ||
bc06026@ra4-2 ~$ | bc06026@ra4-2 ~$ ml SciPy-bundle/2022.05-foss-2022a | ||
bc06026@ra4-2 ~$ | bc06026@ra4-2 ~$ | ||
bc06026@ra4-2 ~$ python -c "import scipy;print(scipy.__version__)" | bc06026@ra4-2 ~$ python -c "import scipy;print(scipy.__version__)" | ||
1. | 1.8.1 | ||
bc06026@ra4-2 ~$ | bc06026@ra4-2 ~$ | ||
</nowiki> | </nowiki> | ||
In the above example we see that Python/3. | In the above example we see that Python/3.10.4-GCCcore-11.3.0 was loaded, and then there was an attempt to import and print the version of SciPy from the command line. This returned an error, because SciPy has its own software module on Sapelo2, which includes Python. We can see that after loading the SciPy-bundle/2022.05-foss-2022a module, the same command printed out 1.8.1 for the version of SciPy. | ||
=Singularity Containers= | =Singularity Containers= | ||
Some software on | Some software on Sapelo2 is installed in the form of a Singularity container, in /apps/singularity-images. Singularity is an open-source container technology, similar to Docker, but designed for HPC cluster environments. For more information on Singularity, please see their documentation [https://sylabs.io/docs/ here]. Please note that Docker cannot be run on Sapelo2 due to security considerations of Docker containers giving a user effectively root privileges. Docker also requires a daemon, whereas Singularity does not. The good news is that you can convert Docker images into Singularity images. | ||
Singularity images are run to create Singularity containers. An image will be either a . | Singularity images are run to create Singularity containers. An image will be either a .sif or .simg (old extension name) file on Sapelo2. From Singularity's [https://sylabs.io/docs/ documentation], an image is "a single executable file based container image, cryptographically signed, auditable, secure, and easy to move using existing data mobility paradigms." Like a Docker image, this is essentially an executable program to start a container. From Singularity's [https://singularity.lbl.gov/#:~:text=Singularity%20enables%20users%20to%20have,a%20Singularity%20container%20and%20run. documentation], "Singularity containers can be used to package entire scientific workflows, software and libraries, and even data." Like a Docker container, a Singularity container is a user space separate from the underlying operating system from which you started the container. A container typically contains any dependencies the application needs and, in some cases, its own operating system. An example where this can be very convenient is if an application was designed to run in an Ubuntu environment, but you need to run it on Sapelo2, which runs Rocky Linux. By using a Singularity container, you are able to create the necessary environment for your software in your job, whether interactive or through a submission script. | ||
===Singularity Commands=== | ===Singularity Commands=== | ||
Line 202: | Line 212: | ||
Thankfully there are very few commands needed to run software from a Singularity container. | Thankfully there are very few commands needed to run software from a Singularity container. | ||
* <code>singularity exec</code> / | * <code>singularity exec</code> ''path/to/image'' - Launches a Singularity container and executes a command inside the container. For more information, please see the Singularity [https://sylabs.io/docs/ documentation]. | ||
* <code>singularity run</code> / | * <code>singularity run</code> ''path/to/image'' - Launches a Singularity container and executes a runscript (something defined at creation time for the container to do if treated as an executable or run with <code>singularity run</code>) if one is defined for that container. This is less common than <code>singularity exec</code>, which will usually be used. | ||
* <code>singularity shell</code> - Launches a Singularity container and starts an interactive shell inside the container. | |||
<!--<code>singularity build --remote</code> - Builds a Singularity container remotely, and then pulls the container on to the machine from which this command was run. This is a great feature, because without the --remote option, building Singularity containers requires administrative privileges.--> | |||
'''Please note:''' You can run <code>singularity</code> on the compute nodes, either in an interactive session or in a batch job, without loading any modules. Note that singularity cannot be run on the Sapelo2 login nodes. | |||
===Building Singularity Containers=== | |||
<!-- | |||
Upon [https://uga.teamdynamix.com/TDClient/2060/Portal/Requests/ServiceCatalog?CategoryID=11593 request], we will be happy to build Singularity images for you and put them in our central /apps/singularity-images directory. However, if you would like to build your own Singularity images on the cluster, you can certainly do that, using the <code>--remote</code> option with <code>singularity build</code>. As previously mentioned, this option will build the image remotely, and then pull the image to machine from which the build command was ran. Setting up your environment to build Singularity images only takes a few steps, as follows: | |||
# Create a cloud.sylabs.io account by clicking the "Sign In" link at https://cloud.sylabs.io/home (there is no "Register" link, just click "Sign In" and it will create an account for you). | |||
# Execute the command <code>singularity remote login</code>. This will output a link for you to paste into your browser, followed by an input prompt. Upon pasting the link your browser, you will be able to generate an access token, that you can then paste into the input prompt in your terminal. | |||
Once those steps are done, you do not have to repeat them. Then, to build a Singularity image, follow these steps: | |||
# Start an interactive job session, for example, using the <code>qlogin</code> command. | |||
# Build your Singularity image with the syntax <code>singularity build --remote</code> ''path/to/image.sif'' ''imageDefinitionSource'', where ''imageDefinitionSource'' could be the path to a local Singularity definition file, or the remote address of a container image (be that Singularity or Docker). | |||
--> | |||
Upon [https://uga.teamdynamix.com/TDClient/2060/Portal/Requests/ServiceCatalog?CategoryID=11593 request], we will be happy to build Singularity images for you and put them in our central /apps/singularity-images directory. However, if you would like to build your own Singularity images on the cluster, you can certainly do that, using <code>apptainer build</code>. | |||
* Building a Singularity image with a local definition file: | |||
<!-- <code>singularity build --remote mycontainer.sif mycontainer.def</code> --> | |||
<code>apptainer build mycontainer.sif mycontainer.def</code> | |||
This will build a container called mycontainer.sif in the current directory. For more information on writing Singularity definition files, please see the official [https://sylabs.io/docs/ documentation]. | |||
* Building a Singularity image with a remote Docker image: | |||
<!-- <code>singularity build --remote mycontainer.sif docker://''SomeDockerAccount''/''SomeDockerRepo''</code> --> | |||
<code>apptainer build mycontainer.sif docker://user/image:tag</code> | |||
This will convert a Docker image to a Singularity image and save it on the machine from which the build command was run. | |||
* Building a Singularity image by pulling a Docker image from Docker Hub: | |||
<code>apptainer pull docker://user/image:tag</code> | |||
This will download a Docker image, convert it to a Singularity image, and save it on the machine from which the command was run. | |||
<!-- | |||
Regardless of whether you build a Singularity image this way with a definition file or a Docker image, the Singularity image that you create will also exist at the URL provided at the end of the image build output, albeit with a tokenized name. Clicking on one of your images there will take you to a webpage that provides a <code>singularity pull</code> command for that Singularity image. If you would prefer to build a Singularity image in your web browser, you also have the option of writing or uploading a definition file to https://cloud.sylabs.io/builder. This also requires having and signing into a cloud.sylabs.io account, but provides an in-browser editor to write your own Singularity definition files, or to upload one you've already written. | |||
--> | |||
===Using Singularity Containers=== | ===Using Singularity Containers=== | ||
To use software from a Singularity container create your submission script or start your interactive session as you | To use software from a Singularity container, create your submission script or start your interactive session just as you would to use a software module, specifying the appropriate resource values in the Slurm #SBATCH headers or command line options, and then call your software by executing a command within the container. No software module needs to be loaded. | ||
Here is an example of a submission script using a Singularity container: | Here is an example of a submission script using a Singularity container: | ||
Line 233: | Line 286: | ||
* If the documentation of the software you want to use is lacking and you're unable to determine what executable binary or script is meant to be run from the Singularity image you're using, try using the <code>ls</code> command to search directories inside the container. It is very common for a container's software to be in /usr/bin or /opt. | * If the documentation of the software you want to use is lacking and you're unable to determine what executable binary or script is meant to be run from the Singularity image you're using, try using the <code>ls</code> command to search directories inside the container. It is very common for a container's software to be in /usr/bin or /opt. | ||
* Singularity containers are configured with access to the user's home directory ($HOME), scratch directory (/scratch), and the local scratch directory on the node (/lscratch). The /tmp directory is defined inside the container. | * Singularity containers are configured with access to the user's home directory ($HOME), scratch directory (/scratch), and the local scratch directory on the node (/lscratch). The /tmp directory is defined inside the container. | ||
* To run a GPU-enabled singularity container on the GPU, please submit the job to the gpu_p partition, request a GPU device and add the --nv option to the singularity command. |
Latest revision as of 10:13, 19 August 2024
Introduction
On Sapelo2, users have the option to install their own software or use software installed centrally by the GACRC staff. Centrally-installed software on Sapelo2 will typically be in one of two formats: a software module or a Singularity container. This page explains each of these categories of software and how to use them.
Getting Help Using Software
Very often the best place to learn how to use scientific software is the software's official documentation. Some software has its own website, whereas other software's documentation may be in the README file of a GitHub repository. This is where you will find information such as how to call the software, the necessary input files, default parameter values, and the options available when calling the software. Reading this documentation is important to make sure that you're using the software as you intend to and to its fullest extent.
When the online documentation for software is lacking or non-existent, you may also be able to find helpful information by calling the software from the command line. To do this, first start an interactive session with <code>qlogin</code>. In your interactive session, load the software you want to use (be it a software module, Conda environment, or Singularity container) and then execute the relevant compiled binary or script with no options, or with -h or --help. For example:
<nowiki>
bc06026@ra4-2 ~$ ml Bowtie2/2.4.5-GCC-11.3.0
bc06026@ra4-2 ~$
bc06026@ra4-2 ~$ bowtie2 --help
Bowtie 2 version 2.4.5 by Ben Langmead (langmea@cs.jhu.edu, www.cs.jhu.edu/~langmea)
Usage:
  bowtie2 [options]* -x <bt2-idx> {-1 <m1> -2 <m2> | -U <r> | --interleaved <i> | -b <bam>} [-S <sam>]

... (shortened for readability)

 Performance:
  -p/--threads <int>  number of alignment threads to launch (1)
  --reorder           force SAM output order to match order of input reads
  --mm                use memory-mapped I/O for index; many 'bowtie's can share

 Other:
  --qc-filter         filter out reads that are bad according to QSEQ filter
  --seed <int>        seed for random number generator (0)
  --non-deterministic seed rand. gen. arbitrarily instead of using read attributes
  --version           print version information and quit
  -h/--help           print this usage message
</nowiki>
In the above help output for the Bowtie2 software, we see example usage and some of the optional parameters that you can specify when calling bowtie2. If you're not sure of the name of the compiled binary to call, it will very often (but not always) be the name of the software in lowercase. You can verify this by checking the online documentation, or by loading the software, starting to type what may be the executable name, and then hitting the tab key twice, which auto-completes the rest of the name if it is valid. Hitting tab three times in succession shows any other executables whose names begin with the same characters. For example, this output is shown after loading the Bowtie2/2.4.5-GCC-11.3.0 module, typing "bow", and hitting tab three times:
<nowiki>
bc06026@ra4-2 ~$ bowtie2
bowtie2            bowtie2-build      bowtie2-inspect
bowtie2-align-l    bowtie2-build-l    bowtie2-inspect-l
bowtie2-align-s    bowtie2-build-s    bowtie2-inspect-s
bc06026@ra4-2 ~$ bowtie2
</nowiki>
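Tab completion only works at an interactive prompt. As a rough, non-authoritative sketch, the same kind of listing can be produced in a script by scanning the directories on PATH. The function name <code>list_cmds</code> is a hypothetical helper invented for this illustration, not a Sapelo2 command.

```shell
#!/bin/sh
# List executables on PATH whose names start with a given prefix,
# similar to what repeated Tab presses display at the prompt.
# list_cmds is a hypothetical helper name used only for this sketch.
list_cmds() {
    prefix=$1
    old_ifs=$IFS
    IFS=:                      # split PATH on colons
    for dir in $PATH; do
        IFS=$old_ifs
        [ -d "$dir" ] || continue
        for f in "$dir/$prefix"*; do
            # Keep regular files that are executable; print the base name only.
            [ -f "$f" ] && [ -x "$f" ] && printf '%s\n' "${f##*/}"
        done
    done
    IFS=$old_ifs
    return 0
}

# After loading a module, this would list its matching executables.
list_cmds bowtie2 | sort -u
```

In an interactive bash session, the builtin <code>compgen -c bowtie2</code> gives a similar list without any helper function.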
Software Modules
The majority of software centrally installed on Sapelo2 is installed in the form of a software module. A software module is a grouping of a piece of software and its dependencies. By leveraging software modules, you gain access to only the software you need when you need it. This is achieved by modifying your PATH environment variable and setting other environment variables.
Software modules are in the format Name/Version-Toolchain. A toolchain is a collection of ancillary software discussed further here.
Here are some examples of Python modules on Sapelo2:
<nowiki>Python/2.7.18-GCCcore-11.3.0
Python/3.9.6-GCCcore-11.2.0
Python/3.10.4-GCCcore-11.3.0</nowiki>
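To make the Name/Version-Toolchain format concrete, here is a minimal sketch that splits a module name into its three parts using shell parameter expansion. The function <code>parse_module</code> is a hypothetical helper, and the sketch assumes the version itself contains no hyphen; any extra suffix after the toolchain (as in some module names) ends up in the toolchain field.

```shell
#!/bin/sh
# Split a module name of the form Name/Version-Toolchain into its parts.
# parse_module is a hypothetical helper, not a Sapelo2 or Lmod command.
# Assumption: the version string contains no "-" character.
parse_module() {
    module=$1
    name=${module%%/*}                 # text before the first "/"
    rest=${module#*/}                  # text after the first "/"
    version=${rest%%-*}                # text before the first "-"
    case $rest in
        *-*) toolchain=${rest#*-} ;;   # text after the first "-"
        *)   toolchain="" ;;           # no toolchain suffix present
    esac
    printf '%s %s %s\n' "$name" "$version" "$toolchain"
}

parse_module Python/3.10.4-GCCcore-11.3.0   # prints: Python 3.10.4 GCCcore-11.3.0
parse_module R/4.3.1-foss-2022a             # prints: R 4.3.1 foss-2022a
```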
Here is an example of how the path and version of the command "python" changes after a module is loaded:
<nowiki>bc06026@ra4-2 ~$ which python
/usr/bin/which: no python in (/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/usr/local/gacrc/bin:/opt/apps/slurm/prod/bin:)
bc06026@ra4-2 ~$ which python3
/usr/bin/python3
bc06026@ra4-2 ~$ python3 -V
Python 3.6.8
bc06026@ra4-2 ~$ ml Python/3.10.4-GCCcore-11.3.0
bc06026@ra4-2 ~$ which python
/apps/eb/Python/3.10.4-GCCcore-11.3.0/bin/python
bc06026@ra4-2 ~$ python -V
Python 3.10.4
bc06026@ra4-2 ~$ </nowiki>
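The mechanism behind this change can be imitated without a module system. The sketch below is only an illustration: it creates a temporary directory containing a fake <code>python</code> script (a stand-in, not the real Sapelo2 installation), prepends that directory to PATH the way a module load would, and then restores PATH the way an unload would.

```shell
#!/bin/sh
# Simulate what loading a module does to command lookup.
# The temporary directory and the fake "python" are stand-ins for
# illustration only; they are not the real Sapelo2 installation.
fake_root=$(mktemp -d)
mkdir -p "$fake_root/bin"
cat > "$fake_root/bin/python" <<'EOF'
#!/bin/sh
echo "Python 3.10.4 (fake, for illustration)"
EOF
chmod +x "$fake_root/bin/python"

old_path=$PATH

# Loading a module prepends the application's bin directory to PATH,
# so the shell finds that copy of the command first.
PATH="$fake_root/bin:$PATH"
resolved=$(command -v python)
echo "after load:   $resolved"
python -V

# Unloading the module restores the earlier PATH.
PATH=$old_path
echo "after unload: $(command -v python || echo 'no python on PATH')"

rm -rf "$fake_root"
```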
Each software module defines an environment variable called EBROOTNAME, where NAME is the name of the application, in capital letters (e.g. EBROOTPYTHON for Python). This environment variable stores the full path to the directory where the application is installed, and it can be used to invoke binaries that need to be called with their full path.
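As a hedged sketch of how such a variable might be used, the snippet below builds the variable name from an application name and reads its value. The helper <code>ebroot_of</code> is hypothetical and assumes the simple rule stated above (EBROOT followed by the name in capital letters); application names containing characters other than letters, digits, and underscores may follow a different rule and are rejected here. The value assigned to EBROOTPYTHON is a stand-in so the example runs anywhere; on the cluster, the module sets it.

```shell
#!/bin/sh
# Look up the install directory recorded in EBROOT<NAME> for an application.
# ebroot_of is a hypothetical helper; it assumes the variable name is simply
# "EBROOT" followed by the application name in capital letters.
ebroot_of() {
    case $1 in
        ''|*[!A-Za-z0-9_]*) return 1 ;;   # only handle simple names
    esac
    var="EBROOT$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')"
    eval "printf '%s\n' \"\${$var:-}\""
}

# Stand-in value so the sketch runs without a module system; on the cluster
# the loaded module defines this variable.
EBROOTPYTHON=/apps/eb/Python/3.10.4-GCCcore-11.3.0
export EBROOTPYTHON

root=$(ebroot_of Python)
echo "$root/bin/python"   # full path that could be used to call the binary
```

In everyday use the variable can simply be referenced directly, for example <code>"$EBROOTPYTHON/bin/python"</code>, once the module is loaded.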
Software Module Commands
Software modules are managed with <code>ml</code> commands. Here are some useful <code>ml</code> commands:
* <code>ml spider ''pattern''</code> - search for available software, e.g., <code>ml spider Python</code>
* <code>ml ''moduleName''</code> - load a software module for use, e.g., <code>ml Python/3.10.4-GCCcore-11.3.0</code>
* <code>ml -''moduleName''</code> - unload a software module, e.g., <code>ml -Python/3.10.4-GCCcore-11.3.0</code>
* <code>ml</code> - list currently loaded software modules
* <code>ml show ''moduleName''</code> - show detailed information about a software module, including its description and homepage URL
* <code>ml av</code> - list all available software modules
Using Software Modules
When you first log in to Sapelo2, you will not have any modules loaded. There are two scenarios in which you would load a software module with ml: in a submission script and in an interactive job session. You may search for modules on the login node with the ml spider command, but please never load software modules on the login node.
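One way to guard against loading modules on a login node by accident is to check whether the shell is running inside a Slurm job. The sketch below relies on the SLURM_JOB_ID environment variable, which Slurm sets inside batch jobs and interactive job sessions; the function name in_slurm_job is hypothetical, and the check assumes interactive sessions on the cluster are started through Slurm.

```shell
#!/bin/bash
# in_slurm_job (hypothetical helper): succeed only when running inside a
# Slurm job, where Slurm sets the SLURM_JOB_ID environment variable.
in_slurm_job() {
    [ -n "${SLURM_JOB_ID:-}" ]
}

if in_slurm_job; then
    echo "Inside Slurm job ${SLURM_JOB_ID}: OK to load modules."
else
    echo "Not inside a Slurm job: start an interactive session or submit a job before loading modules."
fi
```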
Here is an example of loading a software module in a submission script:
#!/bin/bash
#SBATCH --job-name=testserial         # Job name
#SBATCH --partition=batch             # Partition (queue) name
#SBATCH --ntasks=1                    # Run on a single CPU core
#SBATCH --mem=1gb                     # Job memory request
#SBATCH --time=02:00:00               # Time limit hrs:min:sec
#SBATCH --output=testserial.%j.out    # Standard output log
#SBATCH --error=testserial.%j.err     # Standard error log
#SBATCH --mail-type=END,FAIL          # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=username@uga.edu  # Where to send mail

cd $SLURM_SUBMIT_DIR

ml R/4.3.1-foss-2022a

R CMD BATCH add.R
Note that the module is loaded after the Slurm headers (#SBATCH lines), but before calling the software, which in this case is R. This is important as the Slurm headers need to come first, and the proper version of the software to be used must be loaded prior to being called.
Here is an example of a module being loaded in an interactive session on Sapelo2:
bc06026@ra4-2 ~$ ml
No modules loaded
bc06026@ra4-2 ~$ ml Python/3.10.4-GCCcore-11.3.0
bc06026@ra4-2 ~$ ml

Currently Loaded Modules:
  1) GCCcore/11.3.0                 5) ncurses/6.3-GCCcore-11.3.0          9) XZ/5.2.5-GCCcore-11.3.0       13) Python/3.10.4-GCCcore-11.3.0
  2) zlib/1.2.12-GCCcore-11.3.0     6) libreadline/8.1.2-GCCcore-11.3.0   10) GMP/6.2.1-GCCcore-11.3.0
  3) binutils/2.38-GCCcore-11.3.0   7) Tcl/8.6.12-GCCcore-11.3.0          11) libffi/3.4.2-GCCcore-11.3.0
  4) bzip2/1.0.8-GCCcore-11.3.0     8) SQLite/3.38.3-GCCcore-11.3.0       12) OpenSSL/1.1

bc06026@ra4-2 ~$ ml -Python/3.10.4-GCCcore-11.3.0
bc06026@ra4-2 ~$ ml
No modules loaded
bc06026@ra4-2 ~$
In the above example we see that we start with no modules loaded. Then, upon loading the Python/3.10.4-GCCcore-11.3.0 module, we load Python and the software included in the GCCcore-11.3.0 toolchain (the version of the compiler suite with which this instance of Python was compiled). When we are finished using a software module, we can simply unload it with the ml -moduleName command, and it will unload everything that came with the software module.
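A script can also check for itself whether a particular module is currently loaded. The sketch below assumes the module system on the cluster is Lmod (which the ml and ml spider commands suggest) and relies on the LOADEDMODULES environment variable, in which Lmod keeps a colon-separated list of loaded modules. The function name module_loaded is hypothetical.

```shell
#!/bin/bash
# module_loaded (hypothetical helper): succeed if the given full module
# name appears in LOADEDMODULES.
# Assumption: the module system maintains LOADEDMODULES as a
# colon-separated list of loaded modules, as Lmod does.
module_loaded() {
    case ":${LOADEDMODULES:-}:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

if module_loaded "Python/3.10.4-GCCcore-11.3.0"; then
    echo "Python/3.10.4-GCCcore-11.3.0 is loaded"
else
    echo "Python/3.10.4-GCCcore-11.3.0 is not loaded"
fi
```

Wrapping the list in colons before matching means only whole module names match, so a substring of a longer module name does not count as loaded.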
Searching for Software Modules
Searching for software modules on Sapelo2 can be done with the ml spider command. It is important to note that while the ml spider command is case-insensitive, there are some cases in which the case of your search pattern can affect how the search results are displayed. For example, if you enter the command ml spider python, it would return every software module on Sapelo2 that has the string "python" in it (in upper or lowercase). On the other hand, if you were to enter the command ml spider Python, with an uppercase "P", it would list the software modules on Sapelo2 specifically for Python. For example:
bc06026@ss-sub1 ~$ ml spider Python

-------------------------------------------------------------------------------------------------------------------
  Python:
-------------------------------------------------------------------------------------------------------------------
    Description:
      Python is a programming language that lets you work more quickly and integrate your systems more effectively.

     Versions:
        Python/2.7.18-GCCcore-11.2.0
        Python/2.7.18-GCCcore-11.3.0
        Python/3.7.4-GCCcore-8.3.0
        Python/3.8.6-GCCcore-10.2.0
        Python/3.9.6-GCCcore-11.2.0
        Python/3.10.4-GCCcore-11.3.0
     Other possible modules matches:
        Biopython  Boost.Python  IPython  bx-python  netcdf4-python  openslide-python

-------------------------------------------------------------------------------------------------------------------
  To find other possible module matches execute:

      $ module -r spider '.*Python.*'

-------------------------------------------------------------------------------------------------------------------
  For detailed information about a specific "Python" package (including how to load the modules) use the module's full name.
  Note that names that have a trailing (E) are extensions provided by other modules.
  For example:

     $ module spider Python/3.10.4-GCCcore-11.3.0
-------------------------------------------------------------------------------------------------------------------

bc06026@ss-sub1 ~$
If you do not find some software that you would like to use already installed on Sapelo2, you may request that we install it here.
Important Notes Regarding Software Modules
- If loading more than one software module, make sure that there are no toolchain conflicts, as discussed on our toolchain wiki page. Loading multiple software modules with conflicting toolchains will cause your job to fail.
- If you need to use software modules that have conflicting toolchains at the same time, you could reach out to us to see if we could install a version of your software with a particular toolchain, or you could try loading one software module, then unloading it and loading another, if your workflow allows this.
- Note that software modules often have dependencies that are packaged together into one module. For example, the module "3D-DNA/201008-foss-2021b-Python-2.7.18" will load Python 2.7.18, so you would not need to load Python separately when using 3D-DNA. For a full list of what a software module includes, use the ml command after loading a module.
- Make sure the software module you're using is the software that you think it is. Sometimes different software will have a similar or the same name. This can be verified by reading the description of the software module from ml spider pattern or ml show moduleName, as well as by checking the homepage for the software, which is displayed in the module show moduleName output.
- Loaded software modules do not persist across separate login sessions, and will be unloaded upon exiting an interactive session or the completion of a job.
- Software libraries that are not part of a programming language's standard library will often exist on Sapelo2 as their own module. For example:
bc06026@ra4-2 ~$ ml Python/3.10.4-GCCcore-11.3.0
bc06026@ra4-2 ~$
bc06026@ra4-2 ~$ python -c "import scipy;print(scipy.__version__)"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'scipy'
bc06026@ra4-2 ~$
bc06026@ra4-2 ~$ ml SciPy-bundle/2022.05-foss-2022a
bc06026@ra4-2 ~$
bc06026@ra4-2 ~$ python -c "import scipy;print(scipy.__version__)"
1.8.1
bc06026@ra4-2 ~$
In the above example we see that Python/3.10.4-GCCcore-11.3.0 was loaded, and then there was an attempt to import and print the version of SciPy from the command line. This returned an error, because SciPy has its own software module on Sapelo2, which includes Python. We can see that after loading the SciPy-bundle/2022.05-foss-2022a module, the same command printed out 1.8.1 for the version of SciPy.
Singularity Containers
Some software on Sapelo2 is installed in the form of a Singularity container, in /apps/singularity-images. Singularity is an open-source container technology, similar to Docker, but designed for HPC cluster environments. For more information on Singularity, please see their documentation here. Please note that Docker cannot be run on Sapelo2 due to security considerations of Docker containers giving a user effectively root privileges. Docker also requires a daemon, whereas Singularity does not. The good news is that you can convert Docker images into Singularity images.
Singularity images are run to create Singularity containers. An image will be either a .sif or .simg (old extension name) file on Sapelo2. From Singularity's documentation, an image is "a single executable file based container image, cryptographically signed, auditable, secure, and easy to move using existing data mobility paradigms." Like a Docker image, this is essentially an executable program to start a container. From Singularity's documentation, "Singularity containers can be used to package entire scientific workflows, software and libraries, and even data." Like a Docker container, a Singularity container is a user space apart from the underlying operating system from which you started the container. A container typically contains any necessary dependencies for the application, and in some cases its own operating system. An example where this can be very convenient is if an application was designed to run in an Ubuntu environment, but you need to run it on Sapelo2, which runs Rocky Linux. By using a Singularity container, you are able to create the necessary environment for your software in your job, whether interactive or through a submission script.
Singularity Commands
Thankfully there are very few commands needed to run software from a Singularity container.
- singularity exec path/to/image command - Launches a Singularity container and executes a command inside the container. For more information, please see the Singularity documentation.
- singularity run path/to/image - Launches a Singularity container and executes a runscript (something defined at creation time for the container to do if treated as an executable or run with singularity run) if one is defined for that container. This is less common than singularity exec, which will usually be used.
- singularity shell path/to/image - Launches a Singularity container and starts an interactive shell inside the container.
Please note: You can run singularity on the compute nodes, either in an interactive session or in a batch job, without loading any modules. Note that singularity cannot be run on the Sapelo2 login nodes.
Building Singularity Containers
Upon request, we will be happy to build Singularity images for you and put them in our central /apps/singularity-images directory. However, if you would like to build your own Singularity images on the cluster, you can certainly do that, using apptainer build.
- Building a Singularity image with a local definition file:
apptainer build mycontainer.sif mycontainer.def
This will build an image called mycontainer.sif in the current directory. For more information on writing Singularity definition files, please see the official documentation.
- Building a Singularity image with a remote Docker image:
apptainer build mycontainer.sif docker://user/image:tag
This will convert the Docker image to a Singularity image and write it as mycontainer.sif to the current directory on the machine where the build command was run.
- Building a Singularity image by pulling a Docker image from Docker Hub:
apptainer pull docker://user/image:tag
This will download a Docker image, convert it to a Singularity image, and write the result to the current directory on the machine where the pull command was run.
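Because apptainer pull is not given an output file name here, it chooses one itself. To our understanding the default is the image name and tag joined by an underscore, with a .sif extension (and the tag latest when none is given). The sketch below only computes that expected file name and does not call apptainer. The function name pull_default_name is hypothetical, and the naming rule should be treated as an assumption to verify against the Apptainer documentation for the installed version.

```shell
#!/bin/bash
# pull_default_name (hypothetical helper): compute the file name that
# "apptainer pull docker://..." is expected to create by default.
# Assumption: the default name is <image>_<tag>.sif, with tag "latest"
# when the reference has no tag. Digest references are not handled.
pull_default_name() {
    ref="${1#docker://}"     # strip the URI scheme
    base="${ref##*/}"        # drop any registry/user prefix
    case "$base" in
        *:*) image="${base%%:*}"; tag="${base#*:}" ;;
        *)   image="$base";       tag="latest" ;;
    esac
    echo "${image}_${tag}.sif"
}

pull_default_name "docker://user/image:tag"
```

To avoid depending on the default name, an explicit output file can be given instead, as in apptainer pull mycontainer.sif docker://user/image:tag.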
Using Singularity Containers
To use software from a Singularity container, create your submission script or start your interactive session just as if you were going to use a software module, specifying the appropriate resource values in the Slurm #SBATCH headers or command line options, and then call your software by executing a command within the container. No software module needs to be loaded.
Here is an example of a submission script using a Singularity container:
#!/bin/bash
#SBATCH --job-name=trinity
#SBATCH --partition=highmem_p
#SBATCH --cpus-per-task=16
#SBATCH --mem=100gb
#SBATCH --time=10:00:00

cd $SLURM_SUBMIT_DIR

singularity exec /apps/singularity-images/trinity-2.8.4.simg Trinity --seqType <string> --max_memory <int> --CPU <int> --no_version_check --full_cleanup --normalize_reads
Note that all that is required to use the Trinity software in the above submission script is launching the container by running singularity exec followed by the path to the Singularity image, and then the relevant options for the software. To run a Singularity container in an interactive session, enter the same singularity exec command that you would put in a script.
Getting help output from software in a Singularity container is very much like getting help output from software in a software module. Start an interactive session and type singularity exec followed by the path to the Singularity image, and then the name of the relevant compiled binary or script with no options, or with -h or --help.
Important Notes Regarding Singularity
- If the documentation of the software you want to use is lacking and you're unable to determine what executable binary or script is meant to be run from the Singularity image you're using, try using the ls command to search directories inside the container. It is very common for a container's software to be in /usr/bin or /opt.
- Singularity containers have been configured to have access to the user's home directory ($HOME), scratch directory (/scratch), and the local scratch directory on the node (/lscratch). The /tmp directory is defined inside the container.
- To run a GPU-enabled singularity container on the GPU, please submit the job to the gpu_p partition, request a GPU device and add the --nv option to the singularity command.