Software on sapslurm tmp

From Research Computing Center Wiki
Revision as of 14:38, 7 July 2020 by Gcormier

Introduction

On SapSlurm, users can install their own software or use software installed centrally by the GACRC staff. Centrally installed software will typically be in one of three formats: a software module, a Singularity container, or a Conda environment. This page explains each of these three categories of software and how to use them.

Getting Help Using Software

Very often the best place to learn how to use scientific software is the software's official documentation. Some software has its own website, whereas other software is documented in the README file of a GitHub repository. This is where you will find information such as how to call the software, the necessary input files, default parameter values, the options available when calling the software, etc. This documentation is important to read to make sure that you're using the software as you intend and to its fullest extent.

When the online documentation for software is lacking or non-existent, you may also be able to find some helpful information by calling the software from the command line. To do this, first start an interactive session with srun --pty bash. In your interactive session, load the software you want to use (be it a software module, Conda environment, or Singularity container) and then execute the relevant compiled binary or script with no options, or with -h or --help. For example:

bc06026@ra4-2 ~$ module load Bowtie2/2.4.1-GCC-8.3.0
bc06026@ra4-2 ~$ 
bc06026@ra4-2 ~$ bowtie2 --help
Bowtie 2 version 2.4.1 by Ben Langmead (langmea@cs.jhu.edu, www.cs.jhu.edu/~langmea)
Usage: 
  bowtie2 [options]* -x <bt2-idx> {-1 <m1> -2 <m2> | -U <r> | --interleaved <i> | -b <bam>} [-S <sam>]

... (shortened for readability)

 Performance:
  -p/--threads <int> number of alignment threads to launch (1)
  --reorder          force SAM output order to match order of input reads
  --mm               use memory-mapped I/O for index; many 'bowtie's can share

 Other:
  --qc-filter        filter out reads that are bad according to QSEQ filter
  --seed <int>       seed for random number generator (0)
  --non-deterministic seed rand. gen. arbitrarily instead of using read attributes
  --version          print version information and quit
  -h/--help          print this usage message

The above help output for Bowtie2 shows example usage and some of the optional parameters that you can specify when calling bowtie2. If you're not sure of the name of the compiled binary to call, it will very often (but not always) be the name of the software in lowercase. You can verify this by checking online documentation, or by loading the software, starting to type what may be the appropriate executable, and then hitting the tab key twice on your keyboard, which would auto-complete the rest of the executable name if it is valid. Hitting tab three times in succession would show you whether there are other executables whose names start the same way. For example, this output is shown after loading the Bowtie2/2.4.1-GCC-8.3.0 module, typing "bow", and then hitting tab three times:

bc06026@ra4-2 ~$ bowtie2
bowtie2            bowtie2-build      bowtie2-inspect    
bowtie2-align-l    bowtie2-build-l    bowtie2-inspect-l  
bowtie2-align-s    bowtie2-build-s    bowtie2-inspect-s  
bc06026@ra4-2 ~$ bowtie2
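
The help lookup described above can be wrapped in a small shell function. This is only a minimal sketch, not a tool provided on SapSlurm: the name show_help is hypothetical, and it assumes the software has already been loaded in your interactive session so that its executable is on your PATH.

```shell
# show_help: hypothetical helper, not part of SapSlurm or of any module.
# Tries --help, then -h, and prints the output of the first attempt that
# succeeds; otherwise it falls back to running the executable with no options.
show_help() {
    exe="$1"
    # Stop early if the executable is not on PATH (e.g. the software is not loaded).
    if ! command -v "$exe" >/dev/null 2>&1; then
        echo "show_help: '$exe' not found in PATH; is the software loaded?" >&2
        return 1
    fi
    for flag in --help -h; do
        # Read from /dev/null so a tool that waits for input cannot hang the session.
        if out=$("$exe" "$flag" </dev/null 2>&1) && [ -n "$out" ]; then
            printf '%s\n' "$out"
            return 0
        fi
    done
    # Last resort: many tools print a usage message when run with no options.
    "$exe" </dev/null 2>&1
}
```

For example, after loading the Bowtie2/2.4.1-GCC-8.3.0 module, running show_help bowtie2 | less would let you page through the help text. Tools that use other help flags are not covered by this sketch.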

Software Modules

The majority of software centrally installed on SapSlurm is installed in the form of a software module. A software module is a grouping of a piece of software and its dependencies. By using software modules, you gain access to only the software you need, when you need it. Loading a module achieves this by modifying your PATH environment variable and setting other environment variables.

Software modules are in the format Name/Version-Toolchain. A toolchain is a collection of ancillary software discussed further here.

Here are some examples of Python modules on SapSlurm:

Python/2.7.16-GCCcore-8.3.0
Python/3.7.4-GCCcore-8.3.0
Python/3.8.2-GCCcore-8.3.0
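
To make the Name/Version-Toolchain format concrete, here is a small sketch that splits a module name into its three parts with shell parameter expansion. The function name split_module_name is hypothetical, and the sketch assumes that the version itself contains no hyphen.

```shell
# split_module_name: hypothetical helper that splits Name/Version-Toolchain.
# Assumption: the version part contains no hyphen.
split_module_name() {
    full="$1"
    name=${full%%/*}        # text before the first "/"
    rest=${full#*/}         # text after the first "/"
    version=${rest%%-*}     # text before the first "-"
    case "$rest" in
        *-*) toolchain=${rest#*-} ;;   # everything after the first "-"
        *)   toolchain="" ;;           # e.g. GCCcore/8.3.0 has no toolchain part
    esac
    printf 'name=%s version=%s toolchain=%s\n' "$name" "$version" "$toolchain"
}

split_module_name "Python/3.8.2-GCCcore-8.3.0"
# prints: name=Python version=3.8.2 toolchain=GCCcore-8.3.0
```

Module names that carry an extra suffix after the toolchain, such as BUSCO/4.0.5-foss-2019b-Python-3.7.4, keep that suffix in the toolchain field with this simple split.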

Here is an example of how the command "python" changes its path and thus version after a module is loaded:

bc06026@ra4-2 ~$ which python
/usr/bin/python
bc06026@ra4-2 ~$ python -V
Python 2.7.5
bc06026@ra4-2 ~$ module load Python/3.8.2-GCCcore-8.3.0
bc06026@ra4-2 ~$ which python
/apps/eb/Python/3.8.2-GCCcore-8.3.0/bin/python
bc06026@ra4-2 ~$ python -V
Python 3.8.2
bc06026@ra4-2 ~$ 
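
The effect seen above comes down to the order of directories in PATH. The following sketch imitates it without any module system, so it can be run anywhere. It is an illustration only and not how the module command is implemented; the tool name mytool and the directory layout are made up for the example.

```shell
# Illustration only: imitate the PATH change that loading a module causes.
# "mytool" and the directories below are made up for this example.
demo_root=$(mktemp -d)
mkdir -p "$demo_root/system/bin" "$demo_root/apps/mytool-2.0/bin"
printf '%s\n' '#!/bin/sh' 'echo "mytool 1.0"' > "$demo_root/system/bin/mytool"
printf '%s\n' '#!/bin/sh' 'echo "mytool 2.0"' > "$demo_root/apps/mytool-2.0/bin/mytool"
chmod +x "$demo_root/system/bin/mytool" "$demo_root/apps/mytool-2.0/bin/mytool"

original_path=$PATH
PATH="$demo_root/system/bin:$PATH"            # the "system" version comes first
before=$(mytool)

PATH="$demo_root/apps/mytool-2.0/bin:$PATH"   # "load": prepend the new directory
hash -r                                       # forget any remembered command locations
after=$(mytool)

PATH=$original_path                           # "unload": restore the old PATH
rm -rf "$demo_root"

echo "before load: $before"   # prints: before load: mytool 1.0
echo "after load:  $after"    # prints: after load:  mytool 2.0
```

Because the shell uses the first match it finds in PATH, prepending a directory is enough to change which version of a command runs.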

Software Module Commands

Software modules are managed with module commands. Here are some useful module commands:

  • module spider pattern - search for available software, e.g., module spider Python
  • module load moduleName - load a software module for use, e.g., module load Python/3.8.2-GCCcore-8.3.0
  • module unload moduleName - unload a software module, e.g., module unload Python/3.8.2-GCCcore-8.3.0
  • module list - list currently loaded software modules
  • module show moduleName - show detailed information about a software module, including its description and homepage URL
  • module avail - list all available software modules

Using Software Modules

When you first log into ss-sub1, you will not have any modules loaded. There are two scenarios in which you would load a software module: in a submission script and in an interactive job session. You may search for modules on the login node with the module spider command, but please never load software modules on the login node.

Here is an example of loading a software module in a submission script:

#!/bin/bash
#SBATCH --job-name=testserial         # Job name
#SBATCH --partition=batch             # Partition (queue) name
#SBATCH --ntasks=1                    # Run on a single CPU
#SBATCH --mem=1gb                     # Job memory request
#SBATCH --time=02:00:00               # Time limit hrs:min:sec
#SBATCH --output=testserial.%j.out    # Standard output log
#SBATCH --error=testserial.%j.err     # Standard error log

#SBATCH --mail-type=END,FAIL          # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=username@uga.edu  # Where to send mail	

cd $SLURM_SUBMIT_DIR

module load R/3.6.2-foss-2019b

R CMD BATCH add.R

Note that the module is loaded after the Slurm headers (#SBATCH lines), but before calling the software, which in this case is R. This is important as the Slurm headers need to come first, and the proper version of the software to be used must be loaded prior to being called.
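
Since the right version of the software must be loaded before it is called, a submission script can check for the executable and stop early if it is missing. The sketch below is one possible way to do this; the function name require_cmd is hypothetical and is not provided on SapSlurm.

```shell
# require_cmd: hypothetical guard for submission scripts.
# Returns non-zero if the executable is not on PATH; otherwise reports its location.
require_cmd() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "ERROR: '$1' not found in PATH; check the module load line" >&2
        return 1
    fi
    echo "Using $1 at $(command -v "$1")"
}

# In a submission script, placed after the module load line
# (module name taken from the example above):
#   module load R/3.6.2-foss-2019b
#   require_cmd R || exit 1
#   R CMD BATCH add.R
```

The reported path also ends up in the job's standard output log, which makes it easy to confirm afterwards which installation the job used.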

Here is an example of a module being loaded in an interactive session on SapSlurm:

bc06026@ra4-2 ~$ module list
No modules loaded
bc06026@ra4-2 ~$ module load Python/3.8.2-GCCcore-8.3.0
bc06026@ra4-2 ~$ module list

Currently Loaded Modules:
  1) GCCcore/8.3.0                 5) ncurses/6.1-GCCcore-8.3.0       9) XZ/5.2.4-GCCcore-8.3.0
  2) zlib/1.2.11-GCCcore-8.3.0     6) libreadline/8.0-GCCcore-8.3.0  10) GMP/6.1.2-GCCcore-8.3.0
  3) binutils/2.32-GCCcore-8.3.0   7) Tcl/8.6.9-GCCcore-8.3.0        11) libffi/3.2.1-GCCcore-8.3.0
  4) bzip2/1.0.8-GCCcore-8.3.0     8) SQLite/3.29.0-GCCcore-8.3.0    12) Python/3.8.2-GCCcore-8.3.0

 

bc06026@ra4-2 ~$ module unload Python/3.8.2-GCCcore-8.3.0
bc06026@ra4-2 ~$ module list
No modules loaded
bc06026@ra4-2 ~$ 

In the above example, we start with no modules loaded. Upon loading the Python/3.8.2-GCCcore-8.3.0 module, we load Python along with the software included in the GCCcore-8.3.0 toolchain (the version of the compiler suite with which this instance of Python was compiled). When we are finished using a software module, we can unload it with the module unload command, which also unloads everything that came with the software module.

Searching for Software Modules

Searching for software modules on SapSlurm can be done with the module spider command. Note that while the module spider command is case-insensitive, the case of your search pattern can sometimes affect how the search results are displayed. For example, if you enter the command module spider python, it would return every software module on SapSlurm that has the string "python" in it (in upper or lowercase). On the other hand, if you were to enter the command module spider Python, with an uppercase "P", it would return the software modules on SapSlurm specifically for Python. For example:

bc06026@ss-sub1 ~$ module spider Python

-------------------------------------------------------------------------------------------------------------------------------------------------------------
  Python:
-------------------------------------------------------------------------------------------------------------------------------------------------------------
    Description:
      Python is a programming language that lets you work more quickly and integrate your systems more effectively.

     Versions:
        Python/2.7.16-GCCcore-8.3.0
        Python/3.7.4-GCCcore-8.3.0
        Python/3.8.2-GCCcore-8.3.0
     Other possible modules matches:
        Biopython  Boost.Python  IPython  bx-python  netcdf4-python  openslide-python

-------------------------------------------------------------------------------------------------------------------------------------------------------------
  To find other possible module matches execute:

      $ module -r spider '.*Python.*'

-------------------------------------------------------------------------------------------------------------------------------------------------------------
  For detailed information about a specific "Python" package (including how to load the modules) use the module's full name.
  Note that names that have a trailing (E) are extensions provided by other modules.
  For example:

     $ module spider Python/3.8.2-GCCcore-8.3.0
-------------------------------------------------------------------------------------------------------------------------------------------------------------

 

bc06026@ss-sub1 ~$ 
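
The difference between the two searches can be pictured with grep over a plain list of names. This is an analogy only, not how module spider works internally; the names are the ones shown in the output above.

```shell
# Analogy only: grep over a list of names, not the module spider implementation.
# The names are taken from the module spider output above.
modules="Python
Biopython
Boost.Python
IPython
bx-python
netcdf4-python
openslide-python"

# Like "module spider python": every name containing the string, in any case.
any_case=$(printf '%s\n' "$modules" | grep -i 'python')

# Like the main result of "module spider Python": only the exact name.
exact=$(printf '%s\n' "$modules" | grep -x 'Python')

echo "substring matches: $(printf '%s\n' "$any_case" | grep -c .)"   # prints: substring matches: 7
echo "exact match: $exact"                                           # prints: exact match: Python
```

For a pattern search in the module system itself, the output above suggests module -r spider '.*Python.*'.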

If you do not find some software that you would like to use already installed on SapSlurm, you may request that we install it here.

Important Notes Regarding Software Modules

  • If loading more than one software module, make sure that there are no toolchain conflicts, as discussed on our toolchain wiki page. Loading multiple software modules with conflicting toolchains will cause your job to fail.
  • If you need to use software modules that have conflicting toolchains at the same time, you could reach out to us to see if we could install a version of your software with a particular toolchain, or you could try loading one software module, then unloading it and loading another, if your workflow allows this.
  • Note that software modules often have dependencies that are packaged together into one module. For example, the module "BUSCO/4.0.5-foss-2019b-Python-3.7.4" will load Python 3.7.4, so you do not need to load Python separately when using BUSCO. For a full list of what a software module includes, use the module list command after loading the module.
  • Make sure the software module you're using is the software that you think it is. Different software sometimes has a similar or identical name. You can verify this by reading the description of the software module from module spider pattern or module show moduleName, and by checking the software's homepage, which is displayed in the module show moduleName output.
  • Loaded software modules do not persist across separate login sessions, and will be unloaded upon exiting an interactive session or the completion of a job.
  • Software libraries not a part of a programming language's standard library will often exist in SapSlurm as their own module. For example:
bc06026@ra4-2 ~$ module load Python/3.8.2-GCCcore-8.3.0
bc06026@ra4-2 ~$ 
bc06026@ra4-2 ~$ python -c "import scipy;print(scipy.__version__)"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'scipy'
bc06026@ra4-2 ~$ 
bc06026@ra4-2 ~$ module load SciPy-bundle/2020.03-foss-2019b-Python-3.8.2
bc06026@ra4-2 ~$ 
bc06026@ra4-2 ~$ python -c "import scipy;print(scipy.__version__)"
1.4.1
bc06026@ra4-2 ~$ 

In the above example, Python/3.8.2-GCCcore-8.3.0 was loaded, followed by an attempt to import SciPy and print its version from the command line. This returned an error because SciPy is not part of the Python module; it has its own software module on SapSlurm, which includes Python. After loading the SciPy-bundle/2020.03-foss-2019b-Python-3.8.2 module, the same command printed 1.4.1 as the version of SciPy.
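
A check like the one above can be done at the top of a submission script so that a missing library is caught before the real work starts. This is a minimal sketch; the function name py_has_package is hypothetical, and it assumes the interpreter you want to test is on your PATH.

```shell
# py_has_package: hypothetical helper; succeeds if the package can be imported.
# $1 = package name, $2 = interpreter to test (defaults to "python").
py_has_package() {
    "${2:-python}" -c "import $1" >/dev/null 2>&1
}

# Example use in a submission script, after the module load lines:
#   py_has_package scipy || { echo "scipy not available; load SciPy-bundle" >&2; exit 1; }
```

The function only reports whether the import works; it does not load any module for you.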

Singularity Containers

Some software on SapSlurm is installed in the form of a Singularity container, in /apps/singularity-images. Singularity is an open-source container technology, similar to Docker, but designed for HPC cluster environments. For more information on Singularity, please see their documentation here. Please note that Docker cannot be run on SapSlurm for security reasons: Docker containers effectively give a user root privileges. Docker also requires a daemon, whereas Singularity does not. The good news is that you can convert Docker images into Singularity images.
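
As a rough sketch of such a conversion, Singularity's build command accepts a docker:// source. The wrapper name docker_to_sif and the image name below are only examples. Whether and where building images is permitted on SapSlurm is not covered on this page, so treat that as an assumption to check first.

```shell
# docker_to_sif: hypothetical wrapper around "singularity build".
# $1 = Docker image reference (e.g. ubuntu:20.04), $2 = output image file.
docker_to_sif() {
    if [ "$#" -ne 2 ]; then
        echo "usage: docker_to_sif <docker-image> <output-file>" >&2
        return 2
    fi
    if ! command -v singularity >/dev/null 2>&1; then
        echo "docker_to_sif: singularity not found in PATH" >&2
        return 1
    fi
    # The docker:// prefix tells Singularity to fetch the image from a Docker registry.
    singularity build "$2" "docker://$1"
}

# Example (image name chosen only for illustration):
#   docker_to_sif ubuntu:20.04 ubuntu_20.04.sif
```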

Singularity images are run to create Singularity containers. An image will be either a .simg or .sif file on SapSlurm. From Singularity's documentation, an image is "a single executable file based container image, cryptographically signed, auditable, secure, and easy to move using existing data mobility paradigms." Like a Docker image, this essentially means that it is an executable program that starts a container. From Singularity's documentation, "Singularity containers can be used to package entire scientific workflows, software and libraries, and even data." Like a Docker container, a Singularity container is a user space separate from the underlying operating system from which you started the container. It typically contains any dependencies the application needs, and in some cases its own operating system. This can be very convenient if, for example, an application was designed to run in an Ubuntu environment but you need to run it on SapSlurm, which runs CentOS. By using a Singularity container, you can create the necessary environment for your software in your job, whether interactive or through a submission script.

Singularity Commands

Thankfully there are very few commands needed to run software from a Singularity container.

  • singularity exec /apps/singularity-images/imageName command - This launches a container from the given image and executes a command inside it. For more information, please see the Singularity documentation.
  • singularity run /apps/singularity-images/imageName - This launches a Singularity container and executes its runscript, if one is defined for that container. This is less common than singularity exec, which will usually be used.

Using Singularity Containers

To use software from a Singularity container, create your submission script or start your interactive session as you normally would for a software module, specifying the appropriate resource values in the Slurm #SBATCH headers or command line options, and then call your software by executing a command within the container. No software module needs to be loaded.

Here is an example of a submission script using a Singularity container:

#!/bin/bash

#SBATCH --job-name=trinity
#SBATCH --partition=highmem_q
#SBATCH --cpus-per-task=16
#SBATCH --mem=100gb
#SBATCH --time=10:00:00
 
cd $SLURM_SUBMIT_DIR

singularity exec /apps/singularity-images/trinity-2.8.4.simg Trinity --seqType <string> --max_memory <int> --CPU <int> --no_version_check --full_cleanup --normalize_reads    

Note that all that is required to use the Trinity software in the above submission script is launching the container by running singularity exec followed by the path to the Singularity image, then the name of the executable (Trinity) and the relevant options for the software. To run a Singularity container in an interactive session, enter the same singularity exec command that you would put in a script.

Getting help output from software in a Singularity container is very much like getting help output from software in a software module. Start an interactive session and type singularity exec followed by the path to the Singularity image, and then the name of the relevant compiled binary or script with no options, or with -h or --help.

Important Notes Regarding Singularity

  • If the documentation of the software you want to use is lacking and you're unable to determine which executable binary or script is meant to be run from the Singularity image you're using, try using the ls command to look through directories inside the container. It is very common for a container's software to be in /usr/bin or /opt.
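
Looking inside a container in this way can be sketched with a small wrapper around singularity exec. The function name run_in_image is hypothetical; the image directory and the Trinity image name are the ones used earlier on this page.

```shell
IMAGE_DIR=/apps/singularity-images   # location given earlier on this page

# run_in_image: hypothetical wrapper around "singularity exec".
# $1 = image file name inside IMAGE_DIR, remaining arguments = command to run.
run_in_image() {
    if [ "$#" -lt 2 ]; then
        echo "usage: run_in_image <image-file> <command> [args...]" >&2
        return 2
    fi
    image="$IMAGE_DIR/$1"
    shift
    # Fail with a clear message if the image file does not exist.
    if [ ! -f "$image" ]; then
        echo "run_in_image: image not found: $image" >&2
        return 1
    fi
    singularity exec "$image" "$@"
}

# Look for executables in the directories mentioned above:
#   run_in_image trinity-2.8.4.simg ls /usr/bin /opt
```

The same wrapper can be used to ask a program inside the container for its help output by passing the program name and -h or --help as the command.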

Conda Environments

[insert conda info here]