Installing Applications on Sapelo2
Introduction
This page provides guidance on how to install applications and libraries on Sapelo2.
In general, users can build applications under their home directory or other space owned by the users.
If an application will be used by several members of a group, the application can be installed in the user's group work space; GACRC can also help set up a shared directory for the group to host the application.
The GACRC team takes requests to install applications in a central location (/apps) if the application satisfies the following conditions:
- The application is Linux compatible.
- The application has general interest among users.
- The application needs to be built with special settings, such as root privilege, shared common data set, complex structures, or other requirements.
- The application needs to be configured to integrate into the GACRC environment, such as queue settings or database connections.
Please use the GACRC Software Installation/Update online form to submit a ticket to the GACRC team.
DO NOT install applications on the login node
The login node has limited memory, and builds there usually fail due to insufficient memory. All of the following installations are done in a qlogin (interactive session) environment.
How to check if an application is installed
To find if an application, e.g. Trinity, is already installed on Sapelo2, use the following command (the application name is NOT case sensitive):
module avail trinity
or
module spider trinity
To see a description of the module (the application name is case sensitive):
module whatis Trinity
To see configuration details of the module, e.g. environment variables (the application name is case sensitive):
module show Trinity
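In a script, the same check can be automated. The sketch below is a minimal example, assuming that the Lmod "module" command is available in the shell and that its terse listing (-t) prints one name/version entry per line. The function name has_module is hypothetical, not a GACRC tool.

```shell
# has_module NAME: succeed if "module avail" lists at least one NAME/version entry.
# Hypothetical helper; assumes Lmod's terse (-t) output, one module per line.
has_module() {
    # Lmod normally prints the listing on stderr, so merge it into stdout.
    module -t avail "$1" 2>&1 | grep -i -q "^$1/"
}

# Example use:
#   if has_module trinity; then echo "trinity is installed"; fi
```

The match is case insensitive, in line with the behavior of "module avail" described above.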
How to check if a library is installed
Perl
To check if a Perl library is installed, e.g. DBI, first load the Perl package of interest, for example Perl/5.30.0-GCCcore-8.3.0:
module load Perl/5.30.0-GCCcore-8.3.0
perl -MDBI -e 'print "OK\n"'
If the library is not installed, Perl will report that it cannot locate DBI.pm in @INC (its library search path). Otherwise, it will print OK.
Python
To check if a Python library is installed, e.g. numpy, first load the Python package of interest, then run the "pip show" or "pip list" command. The Linux command "which" confirms that pip is available. In the following example, we use Python/2.7.16-GCCcore-8.3.0:
module load Python/2.7.16-GCCcore-8.3.0
which pip
pip show numpy
To list all installed libraries with their versions:
module load Python/2.7.16-GCCcore-8.3.0
pip list
To generate an alphabetical list of the libraries installed under a Python version:
module load Python/2.7.16-GCCcore-8.3.0
pip freeze | sort
For Python3, some versions have both pip3 and pip. pip3 is the Python3 version of pip. If you load Python3, run pip3; otherwise run pip. You can use the "which" command to confirm that pip3 is available. For example:
module load Python/3.8.2-GCCcore-8.3.0
which pip3
Then check if a Python3 library is installed, e.g. numpy, using pip3:
module load Python/3.8.2-GCCcore-8.3.0
pip3 show numpy
To list all installed libraries with their versions:
module load Python/3.8.2-GCCcore-8.3.0
pip3 list
To generate an alphabetical list of the libraries installed under a Python version:
module load Python/3.8.2-GCCcore-8.3.0
pip3 freeze | sort
If the library is not installed, it will not be shown by pip or pip3 commands. Otherwise, pip or pip3 will give you information about libraries' version and installation location.
Note: Some Python libraries are installed outside of the Python default location and are provided as a separate module file that needs to be loaded separately. To check if a Python library is installed as a separate module, please use the "module spider" command (or its short form, "ml spider"). For example, to check if matplotlib is installed as a separate module:
module spider matplotlib
This command will return all the matplotlib versions currently installed on the cluster, along with the version of Python that it uses. For example, the module named matplotlib/3.1.1-foss-2019b-Python-3.8.2 provides matplotlib version 3.1.1 for Python 3.8.2 and it uses the foss-2019b toolchain. To use this version of matplotlib, please load the module matplotlib/3.1.1-foss-2019b-Python-3.8.2. Similarly, you can load other module files that provide other Python libraries.
R
To check if an R library is installed, e.g. xtable, first load the R version of interest, for example R/4.0.0-foss-2019b:
qlogin
module load R/4.0.0-foss-2019b
R
require("xtable")
exit
If the library is not installed, R will print a warning that there is no package called 'xtable'. Otherwise, it will print only "Loading required package: xtable".
Which node to build from
We suggest that you build applications from an interactive session, which you can start with the "qlogin" command from the Sapelo2 login node:
qlogin
How to structure directories
Common practice is to set up the following three directories in the user's home directory /home/MyID using the "mkdir" command:
- apps: directory where the applications will be installed
- src: directory under which you can store the source files and build the applications
- modulefiles: directory under which you can put your own module files
These directories can be created with
mkdir ~/apps
mkdir ~/src
mkdir ~/modulefiles
How to use local Lmod modules
To use your own modules, e.g. trinity/1.0:
module use ~/modulefiles
module load trinity/1.0
If a local module file has the same name as one in our central location, your local one will take precedence over the central one.
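For "module load trinity/1.0" to work, a module file for that name and version must exist under ~/modulefiles. The following is a minimal sketch of creating one, assuming Lmod's Lua module file format and a hypothetical installation under ~/apps/trinity/1.0. The helper name write_modulefile and its arguments are illustrative only.

```shell
# write_modulefile MODROOT NAME VERSION PREFIX
# Creates MODROOT/NAME/VERSION.lua, which prepends PREFIX/bin to PATH.
# Hypothetical helper; the layout assumes Lmod's Lua module file format.
write_modulefile() {
    mkdir -p "$1/$2" || return 1
    cat > "$1/$2/$3.lua" <<EOF
whatis("Name: $2, Version: $3 (local build)")
prepend_path("PATH", "$4/bin")
EOF
}

# Example use (paths are assumptions):
#   write_modulefile ~/modulefiles trinity 1.0 ~/apps/trinity/1.0
#   module use ~/modulefiles
#   module load trinity/1.0
```

Other variables that an application needs, such as PERL5LIB or PYTHONPATH, could be set in the same file with further prepend_path lines.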
How to install software (general)
If you are interested in using an application, check the website of this application to see (a) whether it is compatible with Linux, (b) whether it is distributed as binaries or source code, and (c) what kind of package it is (Python library and scripts, Perl scripts, Java, C, C++, Fortran code, etc.).
Perl scripts and Java jar files can be downloaded into the users' home directory and run from there.
If the application is distributed as pre-compiled binaries, check if binaries are available for the Linux OS that our cluster runs (Sapelo2 runs CentOS 7). Binaries compiled for Windows and for Mac OS X cannot be run on Sapelo2. Also, binaries generated on other Linux distributions (e.g. Ubuntu, Debian, etc.) will in general not work on Sapelo2.
If pre-compiled binaries are not available for the OS on Sapelo2 (CentOS 7), then you can compile the code yourself, if the source code is available.
Another option for running binaries compatible with other Linux distributions is to create a Singularity container and run it on Sapelo2 (more information below).
How to build C, C++ applications
Here is an example of how to install an application called GERUDsim3, which consists of a single C++ program.
1. Log in to the Sapelo2 login node. If you do not have a directory called apps, create one with
mkdir ~/apps
2. Download the application into the ~/apps directory. For the example used here, you can download it with
cd ~/apps
git clone https://github.com/JonesLabIdaho/GERUDsim3.git
3. Start an interactive session with
qlogin
4. Compile the code on the interactive node with:
cd ~/apps/GERUDsim3/GERUDsim3/Source_files
g++ GerudSim3.cpp -o GerudSim3
The g++ command that is on the user's default path is version 4.8.5. If you wish or need to use a different GNU compiler version, or a different compiler (e.g. Intel), first load the corresponding module and then compile the code. For example, to use g++ 8.3.0:
cd ~/apps/GERUDsim3/GERUDsim3/Source_files
module load foss/2019b
g++ GerudSim3.cpp -o GerudSim3
5. The name of your executable is GerudSim3. You can run it with the full path
~/apps/GERUDsim3/GERUDsim3/Source_files/GerudSim3
or copy this binary to ~/bin (create this directory first, if it does not exist yet) and run it with
~/bin/GerudSim3
You can also add ~/bin to your default PATH. To do that, add the following line to your .bashrc file:
export PATH=/home/MyID/bin/:$PATH
where MyID needs to be replaced by your UGA MyID. With this, any executable you put in ~/bin will be in your PATH and can be invoked without its full path.
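If .bashrc is sourced more than once, a plain export line adds the same directory to PATH repeatedly. A minimal sketch of a guard against that follows; the function name path_prepend is hypothetical.

```shell
# path_prepend DIR: put DIR at the front of PATH unless it is already listed.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;                # already present, do nothing
        *) PATH="$1:$PATH" ;;
    esac
    export PATH
}

# Example use in ~/.bashrc:
#   path_prepend "$HOME/bin"
```

Note that the comparison is literal, so a directory written with a trailing slash is treated as different from the same directory without one.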
For more information on compilers available on Sapelo2, please see Code Compilation on Sapelo2.
Cross build for mixed processor architecture
Sapelo2 has two types of processors: Intel and AMD.
Some C or C++ applications introduce performance-optimized flags, such as -march=native, in their documentation or build configuration. To allow the built application to run on both processor types, remove these native flags from the compile options, or set the flags to -mtune=generic -march=x86-64, which are usually the default values when these flags are not set.
The following commands start interactive sessions on each processor type, so that you can test the build on both.
qlogin_intel
qlogin_amd
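Before building, it can help to search the source tree for processor-specific flags. The following is a minimal sketch, assuming the flags appear literally in build files such as Makefiles or CMakeLists.txt; find_native_flags is a hypothetical helper name.

```shell
# find_native_flags DIR: list lines under DIR that mention -march=native or
# -mtune=native; returns non-zero when none are found.
find_native_flags() {
    grep -rn -e '-march=native' -e '-mtune=native' "$1"
}

# Example use:
#   find_native_flags ~/src/myapp
```

Each match is printed as file:line:text, which shows where the flag would need to be removed or replaced.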
How to install a Perl module
A convenient way to build and install Perl modules is to use CPAN, which can install Perl modules in the user's home directory. We suggest that you do this installation in an interactive session.
1. Start an interactive session with
qlogin
2. Decide where you want the Perl modules to be installed, e.g. ~/perlmods, and create this directory if it does not exist yet
mkdir ~/perlmods
3. Load the Perl environment module that you want to use, e.g. Perl/5.30.0-GCCcore-8.3.0
module load Perl/5.30.0-GCCcore-8.3.0
4. Start the CPAN shell with
cpan
When you start cpan for the first time, some configurations will be set (unless you already have configurations set in ~/.cpan/CPAN/MyConfig.pm):
cpan
CPAN.pm requires configuration, but most of it can be done automatically.
If you answer 'no' below, you will enter an interactive dialog for each
configuration option instead.
Would you like to configure as much as possible automatically? [yes]
Autoconfiguration complete.
commit: wrote '/home/MyID/.cpan/CPAN/MyConfig.pm'
You can re-run configuration any time with 'o conf init' in the CPAN shell
cpan shell -- CPAN exploration and modules installation (v2.14)
Enter 'h' for help.
cpan[1]>
Note that in the above commit output: 'wrote /home/MyID/.cpan/CPAN/MyConfig.pm', MyID is replaced by your own MyID.
5. From within the CPAN shell, enter the following two (2) commands to specify the installation directory:
cpan[1]> o conf mbuildpl_arg "--install_base /home/MyID/perlmods"
    mbuildpl_arg [--install_base /home/MyID/perlmods]
Please use 'o conf commit' to make the config permanent!
cpan[2]> o conf makepl_arg "PREFIX=/home/MyID/perlmods"
    makepl_arg [PREFIX=/home/MyID/perlmods]
Please use 'o conf commit' to make the config permanent!
cpan[3]>
Note that in the two commands above you will need to replace MyID by your own MyID and change perlmods if you are using a different directory name. If you want to set the above installation path as the default one, you can make the settings above permanent by entering "o conf commit". If this is not done, you will need to reset this value every time you restart CPAN. If you do make the settings permanent, you can always change them later and re-commit as shown above.
6. To install a module (for example, if you want to install "IO::CaptureOutput") enter:
cpan[3]> install IO::CaptureOutput
Respond to any prompts for information that might be requested. When you are finished, enter:
cpan[4]> quit
7. After you have successfully installed a local Perl module, set the PERL5LIB environment variable to tell Perl where to find the module. For example:
export PERL5LIB=/home/MyID/perlmods:$PERL5LIB
where /home/MyID/perlmods has to be replaced by the path to your installation directory.
You can add this export command to your .bashrc file to ensure that the PERL5LIB environment variable is always set when you log in and for your non-interactive scripts. If this export line is not added to your .bashrc file, you can add it to your job submission scripts. You can also create a module file in which this variable is defined.
How to install Python packages
To check what Python versions are installed:
module spider Python
Make sure the right Python version you need is loaded, for example, Python/3.8.2-GCCcore-8.3.0:
module load Python/3.8.2-GCCcore-8.3.0
Most Python versions provide the commands "python" and "pip"; some versions also provide commands with the version appended, such as "python2.7", "python3", or "pip3". After the Python module is loaded, you can check that you are using the right commands with:
which python
which pip
which pip3    # Python3 only
or list the contents of the binary directory to confirm the Python commands, for example:
ls /apps/eb/Python/3.8.2-GCCcore-8.3.0/bin
The above Python binary path can be found by using the "ml show" command:
ml show Python/3.8.2-GCCcore-8.3.0
In the output, you will find that the PATH environment variable is prepended with the Python binary path, i.e., /apps/eb/Python/3.8.2-GCCcore-8.3.0/bin.
How to install Python package tarball in Sapelo2 home directory
There are several options you can take to install a Python package in your Sapelo2 home directory /home/MyID; MyID should be replaced by your UGA MyID.
The common steps are listed below:
1. Start an interactive session with the "qlogin" command from the Sapelo2 login node:
qlogin
2. Suppose you have downloaded a Python package tarball file, e.g., myPackage-1.0.tar.gz, in the source folder (src) in your home directory with a full path as:
/home/MyID/src/myPackage-1.0.tar.gz
Change directory to /home/MyID/src/:
cd ~/src
Uncompress and untar the tarball file using the "tar xzvf" command:
tar xzvf myPackage-1.0.tar.gz
3. Once the tarball is untarred, you will have a package source folder in your current directory, e.g., myPackage-1.0/. You need to change directory into it (full path is /home/MyID/src/myPackage-1.0):
cd ./myPackage-1.0
4. Decide which version of Python you want myPackage-1.0 to be installed with. In this example we use Python/3.8.2-GCCcore-8.3.0:
module load Python/3.8.2-GCCcore-8.3.0
5. Install the package using the "python setup.py install" command
- Option 1: using "--user" option. The package will be installed into a default installation location, i.e., /home/MyID/.local/lib/python3.8/site-packages/:
python setup.py install --user
- Option 2 (recommended): using "--user" option with PYTHONUSERBASE environment variable defined and exported. The package will be installed into a location you specified, e.g., /home/MyID/python/3.8.2/lib/python3.8/site-packages/:
mkdir -p /home/MyID/python/3.8.2
export PYTHONUSERBASE=/home/MyID/python/3.8.2
python setup.py install --user
- Option 3: using "--prefix" option with PYTHONPATH environment variable defined and exported. The package will be installed into a location you specified, e.g., /home/MyID/python/3.8.2/lib/python3.8/site-packages:
mkdir -p /home/MyID/python/3.8.2/lib/python3.8/site-packages
export PYTHONPATH=/home/MyID/python/3.8.2/lib/python3.8/site-packages
python setup.py install --prefix="/home/MyID/python/3.8.2"
6. Make sure to export the above library path when the package is needed. This can be included in your module file as well. For example:
export PYTHONPATH=/home/MyID/.local/lib/python3.8/site-packages/:$PYTHONPATH
or
export PYTHONPATH=/home/MyID/python/3.8.2/lib/python3.8/site-packages:$PYTHONPATH
How to install Python package using pip/pip3 in Sapelo2 home directory
pip/pip3 is a package management system used to install and manage Python2/Python3 packages, such as those found in the Python Package Index (PyPI). PyPI is a software repository for the Python language that helps users find and install software developed and shared by the Python community. This example uses scikit-learn-0.23.2, which is the most recent version of the scikit-learn package at PyPI as of 10-26-2020.
The common steps are listed below:
1. Start an interactive session with the "qlogin" command from the Sapelo2 login node:
qlogin
2. Decide which version of Python you want scikit-learn to be installed with. In this example we load Python/3.8.2-GCCcore-8.3.0, then use the "which" command to confirm that pip3 is available:
module load Python/3.8.2-GCCcore-8.3.0
which pip3
The "which" command shows that pip3 comes from the Python/3.8.2-GCCcore-8.3.0 module:
/apps/eb/Python/3.8.2-GCCcore-8.3.0/bin/pip3
3. Install the package using the "pip3 install" command
- Option 1: using "--user" option. The package will be installed into a default installation location, i.e., /home/MyID/.local/lib/python3.8/site-packages/:
pip3 install --user scikit-learn
The output of the "pip3 install" command will show that scikit-learn-0.23.2 is successfully downloaded and installed. You can then use the "ls" command to confirm that the package is installed in .local/lib/python3.8/site-packages/ in your home directory:
ls ~/.local/lib/python3.8/site-packages/
scikit_learn-0.23.2.dist-info scikit_learn.libs
- Option 2 (recommended): using "--user" option with PYTHONUSERBASE environment variable defined and exported. The package will be installed into a location you specified, e.g., /home/MyID/python/3.8.2/lib/python3.8/site-packages/:
mkdir -p /home/MyID/python/3.8.2
export PYTHONUSERBASE=/home/MyID/python/3.8.2
pip3 install --user scikit-learn
The output of the "pip3 install" command will show that scikit-learn-0.23.2 is successfully downloaded and installed. You can then use the "ls" command to confirm that the package is installed in python/3.8.2/lib/python3.8/site-packages/ in your home directory:
ls ~/python/3.8.2/lib/python3.8/site-packages/
scikit_learn-0.23.2.dist-info scikit_learn.libs
4. Make sure to export the above library path when the package is needed. This can be included in your module file as well. For example:
export PYTHONPATH=/home/MyID/.local/lib/python3.8/site-packages/:$PYTHONPATH
or
export PYTHONPATH=/home/MyID/python/3.8.2/lib/python3.8/site-packages:$PYTHONPATH
How to install Conda packages
To install a Conda package or application as a Conda virtual environment in your Sapelo2 home directory, please use the option "-p" to define the Conda environment installation path, e.g., /home/MyID/busco_conda.
For example, to install a busco v3.0.2 Conda virtual environment at /home/MyID/busco_conda:
module load Anaconda3/5.0.1
conda install -p "/home/MyID/busco_conda" busco=3.0.2 -c bioconda
where bioconda is the channel from which busco v3.0.2 is downloaded. If the busco version 3.0.2 is not specified, i.e.,
module load Anaconda3/5.0.1
conda install -p "/home/MyID/busco_conda" busco -c bioconda
the most recent version of busco will be downloaded and installed.
How to install R packages
If you wish to install an R package in your home directory, first decide the directory where you will install it. For example, to install it in ~/Rlibs, first create this dir if it does not exist yet:
mkdir ~/Rlibs
Then, create a file called ~/.Renviron containing the following line:
R_LIBS_USER=/path/to/Rlibs
replacing /path/to/Rlibs with the path that you want to use. For example, to /home/MyID/Rlibs, where MyID needs to be replaced by your UGA MyID.
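The ~/.Renviron entry can also be added from the shell. The following is a minimal sketch that creates the library directory and appends the line only if the file does not already set R_LIBS_USER; the function name set_r_libs_user is hypothetical.

```shell
# set_r_libs_user LIBDIR [RENVIRON_FILE]
# Creates LIBDIR and records it as R_LIBS_USER in the Renviron file
# (default: ~/.Renviron) unless the file already sets R_LIBS_USER.
set_r_libs_user() {
    renviron=${2:-$HOME/.Renviron}
    mkdir -p "$1" || return 1
    if ! grep -q '^R_LIBS_USER=' "$renviron" 2>/dev/null; then
        printf 'R_LIBS_USER=%s\n' "$1" >> "$renviron"
    fi
}

# Example use:
#   set_r_libs_user "$HOME/Rlibs"
```

Because an existing R_LIBS_USER line is left untouched, changing the library location later still requires editing the file by hand.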
Start an interactive session and load the module for the version of R you want to use (e.g. R/4.0.0-foss-2019b):
qlogin
module load R/4.0.0-foss-2019b
If you download the R package (e.g. brocolors_0.1.tar.gz) to your home dir, you can install it with R CMD INSTALL at the command line, but use the flag --library=/path/to/Rlibs, as follows:
R CMD INSTALL --library=/path/to/Rlibs brocolors_0.1.tar.gz
If you install the package within R using install.packages() or devtools::install(), you just need the ~/.Renviron file; you don’t need to do anything different with the install() command. devtools will use the path defined by the R_LIBS_USER variable.
How to install Java applications
Most third-party Java applications are distributed as pre-compiled binaries (jar files), which users can download into their own home directories. For example, to install picard 2.4.1 in your home directory:
1. Create a directory where you want to install the Java application (e.g. picard). For example
mkdir -p ~/apps/picard
2. Download the package (e.g. picard-tools-2.4.1.zip) from its website, put it into your chosen directory (e.g. ~/apps/picard) and extract the file. For example
cd ~/apps/picard
unzip picard-tools-2.4.1.zip
This will create a directory called picard-tools-2.4.1 that contains picard.jar. You can rename this directory, e.g.
cd ~/apps/picard
mv picard-tools-2.4.1 2.4.1
3. To run this application, load a java module and invoke this application in your job submission script. For example
module load Java/1.8.0_144
java -Xmx20g -classpath "/home/MyID/apps/picard/2.4.1" -jar /home/MyID/apps/picard/2.4.1/picard.jar [options]
where MyID needs to be replaced by your own MyID.
How to build complex applications
Many applications use a configure step to check for system libraries and paths to dependencies in order to create Makefiles. The Makefiles are then used to build and install the application. To illustrate how this process is typically set up, we will describe how to install the GNU Scientific Library (GSL) v. 2.6 using the GNU 8.3.0 compilers.
1. Create a directory to use for building the application, e.g. ~/src/gsl
mkdir -p ~/src/gsl
2. Download the source tarball into your chosen directory, e.g.
cd ~/src/gsl
wget http://mirror.sbb.rs/gnu/gsl/gsl-2.6.tar.gz
3. Start an interactive session
qlogin
4. Load the module for the compiler suite that you want to use, e.g. GCC 8.3.0:
module load foss/2019b
5. Change into your working directory and check the tarball with
cd ~/src/gsl
tar ztvf gsl-2.6.tar.gz
and extract the files with
tar zxvf gsl-2.6.tar.gz
6. Change into the extracted directory
cd gsl-2.6
7. Create a subdirectory to build the application
mkdir build_gcc830
cd build_gcc830
8. Check all the configure options with
../configure --help
9. Configure the application with e.g.
../configure --prefix=/home/MyID/apps/gsl/2.6/gcc830
where /home/MyID/apps/gsl/2.6/gcc830 should be replaced by the directory where you want to install GSL.
You can capture the standard error and standard output of the configure step into a file, to help troubleshoot the step if it encounters any issues. This can be done with
../configure --prefix=/home/MyID/apps/gsl/2.6/gcc830 2>&1 | tee my.config.log
10. If the configure step worked, you can build the application with
make 2>&1 | tee my.make.log
11. Some applications, including GSL, provide some tests that can be run after the build step to ensure the application was built correctly. For example, for GSL you can run
make check 2>&1 | tee my.check.log
12. Install the application with
make install 2>&1 | tee my.install.log
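One caveat with the "| tee" pattern used in the steps above: the shell reports the exit status of tee, not of the command before it, so a failed build can go unnoticed in a script. The following is a minimal sketch of a wrapper that logs the output and still returns the command's own exit status; run_logged is a hypothetical name, and the side file LOGFILE.status is an implementation detail of this sketch.

```shell
# run_logged LOGFILE COMMAND [ARGS...]
# Runs COMMAND, copies its stdout and stderr to LOGFILE via tee, and returns
# COMMAND's exit status instead of tee's.
run_logged() {
    log=$1
    shift
    # Save the command's status in a side file, because the pipeline itself
    # only reports the status of tee.
    {
        "$@" 2>&1 && rc=0 || rc=$?
        echo "$rc" > "$log.status"
    } | tee "$log"
    rl_status=$(cat "$log.status")
    rm -f "$log.status"
    return "$rl_status"
}

# Example use:
#   run_logged my.make.log make && run_logged my.install.log make install
```

With this wrapper, "make install" in the example is attempted only if "make" itself succeeded.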
How to link dependencies
Some applications need third-party programs, libraries, or header files to build. There are various ways to make these available to the build command.
The manual or README of the application usually describes the configuration variables that are needed. You can also list the available options from the command line by passing the help option to the main build command.
For example, for a CMake build:
cmake --help
Or, for a configure-based build:
./configure -h
The simplest way to make the needed components available in the build environment is to load the corresponding modules before building:
module load CMake/3.15.3-GCCcore-8.3.0 zlib/1.2.11-GCCcore-8.3.0
cmake -DCMAKE_INSTALL_PREFIX:PATH=/home/MyID/app/diamond/1.0 ...[skipped]
make install
Line 1: load the CMake module and the zlib module.
Line 2: set the installation prefix; other variables can also be defined here.
Alternatively, set the variables explicitly with the export command:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/apps/eb/GSL/2.6-GCC-8.3.0/lib
export CFLAGS="-I/apps/eb/GSL/2.6-GCC-8.3.0/include"
export LDFLAGS="-L/apps/eb/GSL/2.6-GCC-8.3.0/lib"
export LIBS="-lgsl"
python setup.py build_ext --inplace
Line 1: specify the location of the GSL shared library; this is passed to the shared library loader at run time
Line 2: specify the location of the GSL C header files; this is passed to the GNU C compiler at compilation time
Lines 3 and 4: specify the location and the name of the GSL shared library; these are passed to the GNU linker at link time
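The export lines above can be generalized to any library installed under a single prefix. The following is a minimal sketch, assuming the conventional include/ and lib/ subdirectories under that prefix; use_prefix is a hypothetical helper, and it prepends to any values that are already set rather than overwriting them.

```shell
# use_prefix PREFIX LIBNAME
# Exports the compile-time, link-time, and run-time variables for a library
# installed under PREFIX (assumes PREFIX/include and PREFIX/lib exist).
use_prefix() {
    export CFLAGS="-I$1/include${CFLAGS:+ $CFLAGS}"
    export LDFLAGS="-L$1/lib${LDFLAGS:+ $LDFLAGS}"
    export LIBS="-l$2${LIBS:+ $LIBS}"
    export LD_LIBRARY_PATH="$1/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
}

# Example use:
#   use_prefix /apps/eb/GSL/2.6-GCC-8.3.0 gsl
```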
Another way is to define variables at configuration line.
./configure --prefix=/home/MyID/app/diamond/1.0 --with-jemalloc=/apps/eb/jemalloc/5.2.1-GCCcore-8.3.0/lib
How to download Singularity images
Singularity containers can be searched for and downloaded from the Singularity Container Library. For example, to pull the Trinity v2.9.1 Singularity container:
singularity pull --arch amd64 library://colinsauze/default/trinity:v2.9.1
You can also use the pull command to pull and build a Singularity container from Docker Hub. For example, to pull the latest version of the Trinity Singularity container from Docker Hub:
singularity pull docker://trinityrnaseq/trinityrnaseq:latest
Detailed instructions on how to build a Singularity container can be found at Singularity Build a Container.