Installing Applications on Sapelo2: Difference between revisions

From Research Computing Center Wiki
 
(49 intermediate revisions by 3 users not shown)
Line 6: Line 6:
We introduce here some guidance on how to install applications and libraries on Sapelo2.  
We introduce here some guidance on how to install applications and libraries on Sapelo2.  


In general, users can build applications under their [https://wiki.gacrc.uga.edu/wiki/Disk_Storage#Home_file_system home] directory or other space owned by the users.
In general, users can build applications under their [https://wiki.gacrc.uga.edu/wiki/Disk_Storage#Home_file_system home] directory or other space owned by the users (/home/MyID/).


If an application will be used by several members of a group, the application can be installed in the user's group [https://wiki.gacrc.uga.edu/wiki/Disk_Storage#Work_file_system work] space.
If an application will be used by several members of a group, the application can be installed in the user's group [https://wiki.gacrc.uga.edu/wiki/Disk_Storage#Work_file_system work] space (/work/abclab/).


GACRC team takes requests to install applications at a central place (/apps) if the application satisfies the following conditions:   
GACRC team takes requests to install applications at a central place (/apps) if the application satisfies the following conditions:   
Line 24: Line 24:
==General Guidelines==
==General Guidelines==


===DO NOT install applications at login node===
====<span style="color:darkred"><big>IMPORTANT: Please DO NOT install applications on the login node. Instead, please install applications from an interactive session started with the interact command.</big></span>====
Sapelo2 login node (sapelo2.gacrc.uga.edu) has limited memory. Most of the time software installation fails due to insufficient memory. More importantly, the process of installing applications on the login node can degrade the performance of the cluster for everyone. So '''please do not install any applications on the cluster while you are on the login node'''. We strongly advise that you build or install applications from an interactive session that you can start with '''qlogin''' command from the login node:
 
 
Sapelo2 login node (sapelo2.gacrc.uga.edu) has limited memory. Most of the time software installation fails due to insufficient memory. More importantly, the process of installing applications on the login node can degrade the performance of the cluster for everyone. So '''please do not install any applications on the cluster while you are on the login node'''. We strongly advise that you build or install applications from an interactive session that you can start with '''[[Running Jobs on Sapelo2#How to open an interactive session|interact]]''' command from the login node:


<pre class="gcommand">
<pre class="gcommand">
qlogin
interact
</pre>
</pre>


All of the following installation examples were executed in a '''qlogin''' environment.
All of the following installation examples were executed in an '''interact''' environment.


===How to check if an application is installed ===
===How to check if an application is installed ===
To find if an application, e.g. Trinity, is already installed on Sapelo2, use the following command (the application name is NOT case sensitive):
To find if an application, e.g. Trinity, is already installed on Sapelo2, use the following command (the application name is NOT case sensitive):
<pre class="gcommand">
<pre class="gcommand">
module spider trinity
</pre>or<pre class="gcommand">
module avail trinity
module avail trinity
</pre>  
</pre>To see a description of the module (the application name is case sensitive)
or
<pre class="gcommand">
module spider trinity
</pre>
 
To see a description of the module (the application name is case sensitive)
<pre class="gcommand">
<pre class="gcommand">
module whatis Trinity
module whatis Trinity
Line 60: Line 58:
'''Perl'''
'''Perl'''


To check if a perl library is installed, e.g. DBI, first load the Perl package of interest, for example Perl/5.30.0-GCCcore-8.3.0:
To check if a perl library is installed, e.g. DBI, first load the Perl package of interest, for example Perl/5.34.1-GCCcore-11.3.0:


<pre class="gcommand">
<pre class="gcommand">
module load Perl/5.30.0-GCCcore-8.3.0
module load Perl/5.34.1-GCCcore-11.3.0
perl -MDBI -e 'print "OK\n"'
perl -MDBI -e 'print "OK\n"'
</pre>
</pre>
Line 71: Line 69:
'''Python'''
'''Python'''


To check if a Python library is installed, e.g. numpy, first load the Python package of interest, then run "pip show" or "pip list" commands. The Linux command "which" is to identify if pip is in place. In the following example, we use Python/2.7.16-GCCcore-8.3.0:
To check if a Python library is installed, e.g. numpy, first load the Python package of interest, then run "pip show" or "pip list" commands. The Linux command "which" is to identify if pip is in place. In the following example, we use Python/2.7.18-GCCcore-11.3.0:


<pre class="gcommand">
<pre class="gcommand">
module load Python/2.7.16-GCCcore-8.3.0
module load Python/2.7.18-GCCcore-11.3.0
which pip
which pip
pip show numpy
pip show numpy
Line 82: Line 80:


<pre class="gcommand">
<pre class="gcommand">
module load Python/2.7.16-GCCcore-8.3.0
module load Python/2.7.18-GCCcore-11.3.0
pip list
pip list
</pre>
</pre>
Line 88: Line 86:
To generate an alphabetical list of the libraries installed under a Python version:
To generate an alphabetical list of the libraries installed under a Python version:


<pre class="gcommand">
<pre class="gcommand">
module load Python/2.7.16-GCCcore-8.3.0
module load Python/2.7.18-GCCcore-11.3.0
pip freeze | sort
pip freeze| sort | awk -F'@' '{print $1}'
</pre>
</pre>


Line 96: Line 94:


<pre class="gcommand">
<pre class="gcommand">
module load Python/3.8.2-GCCcore-8.3.0
module load Python/3.10.4-GCCcore-11.3.0
which pip3
which pip3
</pre>
</pre>
Line 103: Line 101:


<pre class="gcommand">
<pre class="gcommand">
module load Python/3.8.2-GCCcore-8.3.0
module load Python/3.10.4-GCCcore-11.3.0
pip3 show numpy
pip3 show numpy
</pre>
</pre>
Line 110: Line 108:


<pre class="gcommand">
<pre class="gcommand">
module load Python/3.8.2-GCCcore-8.3.0
module load Python/3.10.4-GCCcore-11.3.0
pip3 list
pip3 list
</pre>
</pre>
Line 116: Line 114:
To generate an alphabetical list of the libraries installed under a Python version:
To generate an alphabetical list of the libraries installed under a Python version:


<pre class="gcommand">
<pre class="gcommand">
module load Python/3.8.2-GCCcore-8.3.0
module load Python/3.8.2-GCCcore-8.3.0
pip3 freeze | sort
pip3 freeze| sort | awk -F'@' '{print $1}'
</pre>
</pre>


If the library is not installed, it will not be shown by pip or pip3 commands. Otherwise, pip or pip3 will give you information about libraries' version and installation location.  
If the library is not installed, it will not be shown by pip or pip3 commands. Otherwise, pip or pip3 will give you information about libraries' version and installation location.  
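As a complement to the pip checks above, a quick scripted test is to try importing the library and look at the exit status, much like the Perl one-liner shown earlier. The sketch below is only an illustration: the helper name have_pylib is hypothetical (it is not a Sapelo2 command), and it assumes the Python module you loaded provides a python3 command.

```shell
# Hypothetical helper: succeeds (exit status 0) if the named library can be imported
have_pylib() {
    python3 -c "import $1" >/dev/null 2>&1
}

if have_pylib numpy; then
    echo "numpy is available"
else
    echo "numpy is not available in this Python"
fi
```

Because the helper only reports success or failure, it can be used in a job script to stop early when a required library is missing.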


'''Note:''' Some Python libraries are installed outside of the Python default location and are provided as a separate module file that needs to be loaded separately. To check if a python library is installed as a separate module, please use the '''module spider''' command. For example, to check if matplotlib is installed as a separate module:
'''Note:''' Some Python libraries are installed outside of the Python default location and are provided as a separate module file that needs to be loaded separately. To check if a python library is installed as a separate module, please use the '''module spider''' command.
 
For example, to check if matplotlib is installed as a separate module:


<pre  class="gcommand">
<pre  class="gcommand">
Line 129: Line 129:
</pre>
</pre>


This command will return all the matplotlib versions currently installed on the cluster, along with the version of Python that it uses. For example, the module named matplotlib/3.1.1-foss-2019b-Python-3.8.2 provides matplotlib version 3.1.1 for Python 3.8.2 and it uses the foss-2019b toolchain. To use this version of matplotlib, please load the module matplotlib/3.1.1-foss-2019b-Python-3.8.2. Similarly, you can load other module files that provide other Python libraries.  
This command will return all the matplotlib versions currently installed on the cluster. For example, the module named matplotlib/3.5.2-foss-2022a provides matplotlib version 3.5.2 for Python 3.10.4 and it uses the foss-2022a toolchain.  
 
To use this version of matplotlib, please load the module matplotlib/3.5.2-foss-2022a. Similarly, you can load other module files that provide other Python libraries; for example, the SciPy-bundle module provides numpy, scipy, pandas, and others.


'''R'''
'''R'''


To check if an R library is installed, e.g. xtable, first load the R version of interest, for example R/4.0.0-foss-2019b:
To check if an R library is installed, e.g. xtable, first load the R version of interest, for example R/4.3.1-foss-2022a:


<pre class="gcommand">
<pre class="gcommand">
qlogin
interact
module load R/4.0.0-foss-2019b
module load R/4.3.1-foss-2022a
R
R
require("xtable")
require("xtable")
exit
 
Ctrl+d
</pre>
</pre>


Line 150: Line 153:
===How to structure directories===
===How to structure directories===


Common practice is to set up the following three directories using "mkdir" command in the user's home directory /home/MyID:
Common practice is to set up the following three directories using "mkdir" command in the user's home directory /home/MyID/:


*'''apps''': directory where the applications will be installed  
*'''apps''': directory where the applications will be installed  
Line 186: Line 189:
Perl scripts and Java jar files can be downloaded into the users' home directory and run from there.  
Perl scripts and Java jar files can be downloaded into the users' home directory and run from there.  


If the application is distributed as pre-compiled binaries, check if binaries are available for the Linux OS that our cluster runs (Sapelo2 runs CentOS 7.8). Binaries compiled for Windows and for Mac OSX cannot be run on Sapelo2. Also, binaries generated in other Linux distributions (e.g. Ubuntu, Debian, etc.) will in general not work on Sapelo2.
If the application is distributed as pre-compiled binaries, check if binaries are available for the Linux OS that our cluster runs (Sapelo2 runs Rocky Linux 8). Binaries compiled for Windows and for Mac OSX cannot be run on Sapelo2. Also, binaries generated in other Linux distributions (e.g. Ubuntu, Debian, etc.) will in general not work on Sapelo2.


If pre-compiled binaries are not available for the OS on Sapelo2 (CentOS 7.8), then you can compile the code yourself, if the source code is available.  
If pre-compiled binaries are not available for the OS on Sapelo2 (Rocky Linux 8), then you can compile the code yourself, if the source code is available.  


Another option for running binaries compatible with other Linux distributions is to create a Singularity container and run it on Sapelo2 (more information below).
Another option for running binaries compatible with other Linux distributions is to create a Singularity container and run it on Sapelo2 (more information below).
Line 216: Line 219:
3. Start an interactive session with
3. Start an interactive session with
<pre class="gcommand">
<pre class="gcommand">
qlogin
interact -c 4 --mem 8gb
</pre>
</pre>


Line 226: Line 229:
</pre>
</pre>


The g++ command that is on user's default path is version 4.8.5. If you wish/need to use a different GNU compiler version, or a different compiler (e.g. Intel), then first load the corresponding module and then compile the code. For example, to use g++ 8.3.0:
The g++ command that is on user's default path is version '''8.5.0:'''
<pre class="gcommand">
which g++
/usr/bin/g++
 
which gcc
/usr/bin/gcc
 
g++ --version
g++ (GCC) 8.5.0 20210514 (Red Hat 8.5.0-18)
 
gcc --version
gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-18)
</pre>
 
If you wish/need to use a different GNU compiler version, or a different compiler (e.g. Intel), then first load the corresponding module and then compile the code. For example, to use g++ '''11.3.0''':


<pre class="gcommand">
<pre class="gcommand">
cd ~/apps/GERUDsim3/GERUDsim3/Source_files
cd ~/apps/GERUDsim3/GERUDsim3/Source_files


module load foss/2019b
module load foss/2022a


g++ GerudSim3.cpp -o GerudSim3
g++ GerudSim3.cpp -o GerudSim3
Line 251: Line 269:


<pre class="gcommand">
<pre class="gcommand">
export PATH=/home/MYID/bin/:$PATH
export PATH=/home/MyID/bin/:$PATH
</pre>
</pre>


where MYID needs to be replaced by your UGA MyID. With this, any executable you put in ~/bin will be in your PATH and can be invoked without its full path.
where MyID needs to be replaced by your UGA MyID. With this, any executable you put in ~/bin will be in your PATH and can be invoked without its full path.
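If the export line is placed in a startup file and sourced more than once, the same directory can end up in PATH several times. The following is a minimal sketch of one way to avoid that; the function name prepend_path is hypothetical, and /home/MyID/bin is the same placeholder path used above.

```shell
# Hypothetical helper: put a directory at the front of PATH
# only if it is not already listed there
prepend_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                # already present: leave PATH unchanged
        *) PATH="$1:$PATH" ;;       # otherwise prepend it
    esac
}

prepend_path /home/MyID/bin         # replace MyID with your UGA MyID
export PATH
```

Calling the function a second time with the same directory leaves PATH as it is.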


For more information on compilers available on Sapelo2, please see [[Code Compilation on Sapelo2]].
For more information on compilers available on Sapelo2, please see [[Code Compilation on Sapelo2]].
Line 264: Line 282:
At Sapelo2, we have two types of processors: '''Intel''' and '''AMD'''.
At Sapelo2, we have two types of processors: '''Intel''' and '''AMD'''.


In some C or C++ applications, compiler performance-optimization flags, such as -march=native, may be introduced in the configuration step or in the Makefile. Executables compiled with such options on one type of processor might not run on a different processor type. To enable the compiled application to run on both types of processors, these processor specific compiler optimization options should be removed. One option is to set the flags as  
In some C or C++ applications, compiler performance-optimization flags, such as -march=native, may be introduced in the configuration step or in the Makefile. Executables compiled with such options on one type of processor might not run on a different processor type. To enable the compiled application to run on both types of processors, these processor specific compiler optimization options should be removed. One option is to set the flags as '''-mtune=generic -march=x86-64'''.  
'''-mtune=generic -march=x86-64'''.  
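As an illustration of the flag replacement described above, the sketch below rewrites a flags string in the shell. The starting CFLAGS value and the variable name PORTABLE_CFLAGS are made up for this example; in a real build the flags would come from the application's configure step or Makefile.

```shell
# Example flags as they might appear in a Makefile (hypothetical values)
CFLAGS="-O2 -march=native -mtune=native"

# Drop any processor-specific -march/-mtune settings, append the generic
# ones, and squeeze the leftover repeated spaces
PORTABLE_CFLAGS=$(printf '%s\n' "$CFLAGS -mtune=generic -march=x86-64" \
    | sed -e 's/-march=native//g' -e 's/-mtune=native//g' \
    | tr -s ' ')

echo "$PORTABLE_CFLAGS"
```

The resulting string can then be passed to the build, for example through the CFLAGS variable if the application's build system honors it.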


If you want to compile or test your program on different types of processors to ensure that they work there, you could start an interactive session on each of the processor types. For example, to start an interactive session on an AMD EPYC node, use
If you want to compile or test your program on different types of processors to ensure that they work there, you could start an interactive session on each of the processor types. For example, to start an interactive session on an AMD EPYC node, use


<pre class="gcommand">
<pre class="gcommand">
srun --pty  -p batch --constraint=EPYC --mem=2G --nodes=1 --ntasks-per-node=1 --time=12:00:00 --job-name=qlogin --export=TERM /bin/bash
interact --constraint EPYC
</pre>
</pre>


Line 284: Line 301:
1. Start an interactive session with
1. Start an interactive session with
<pre class="gcommand">
<pre class="gcommand">
qlogin
interact
</pre>
</pre>


Line 292: Line 309:
</pre>
</pre>
   
   
3. Load the perl module that you want to use, e.g. Perl/5.30.0-GCCcore-8.3.0
3. Load the perl module that you want to use, e.g. Perl/5.34.1-GCCcore-11.3.0
<pre class="gcommand">
<pre class="gcommand">
module load Perl/5.30.0-GCCcore-8.3.0
module load Perl/5.34.1-GCCcore-11.3.0
</pre>
</pre>


Line 307: Line 324:
cpan
cpan


CPAN.pm requires configuration, but most of it can be done automatically.
cpan shell -- CPAN exploration and modules installation (v2.28)
If you answer 'no' below, you will enter an interactive dialog for each
configuration option instead.
 
Would you like to configure as much as possible automatically? [yes]
 
Autoconfiguration complete.
 
commit: wrote '/home/MyID/.cpan/CPAN/MyConfig.pm'
 
You can re-run configuration any time with 'o conf init' in the CPAN shell
 
cpan shell -- CPAN exploration and modules installation (v2.14)
Enter 'h' for help.
Enter 'h' for help.


cpan[1]>  
cpan[1]>
 
</pre>
</pre>


Note that in the above commit output: 'wrote /home/MyID/.cpan/CPAN/MyConfig.pm', MyID is replaced by your own MyID.
5. From within the CPAN shell, enter the following two (2) commands to specify the installation directory:  
 
5. From within the CPAN shell, enter the following two (2) commands to specify the installation directory:


<pre class="gcommand">
<pre class="gcommand">
cpan[1]> o conf mbuildpl_arg "--install_base /home/MyID/perlmods"
cpan[1]> o conf mbuildpl_arg "--install_base /home/MyID/perlmods"
    mbuildpl_arg      [--install_base /home/MyID/perlmods]
    mbuildpl_arg      [--install_base /home/MyID/perlmods]
Please use 'o conf commit' to make the config permanent!
  Please use 'o conf commit' to make the config permanent!
 
 
cpan[2]> o conf makepl_arg "PREFIX=/home/MyID/perlmods"
cpan[2]> o conf makepl_arg "PREFIX=/home/MyID/perlmods"
     makepl_arg        [PREFIX=/home/MyID/perlmods]
     makepl_arg        [PREFIX=/home/MyID/perlmods]
  Please use 'o conf commit' to make the config permanent!
Please use 'o conf commit' to make the config permanent!
   
   
cpan[3]>
cpan[3]>
Line 359: Line 360:


<pre class="gcommand">
<pre class="gcommand">
export PERL5LIB=/home/MyID/perlmods:$PERL5LIB
export PERL5LIB=/home/MyID/perlmods/lib/perl5/site_perl/5.34.1/:$PERL5LIB
 
perl -MIO::CaptureOutput -e 'print "OK\n"'
OK
</pre>
</pre>


Line 370: Line 374:


===How to install Python packages===
===How to install Python packages===
To check what Python versions are installed:
When installing Python packages it's important to keep them organized based on the project you're working on (as opposed to installing all Python packages you'll ever use into one location).  This is typically done with Python virtual environments (or Conda environments if there's a scientific package you need that is only available through Conda).  For example, if you were working on one project that involved machine learning and another project involving data visualization, you may need different versions of some of the same Python packages and should keep them separate.  A Python virtual environment is just a directory in which Python packages are installed, separate from other unrelated packages.  To create a virtual environment in which to install Python packages in your /home directory, follow the steps below:


<pre  class="gcommand">
==== Create a virtual environment in your /home directory ====
module spider Python
</pre>


Make sure the right Python version you need is loaded, for example, Python/3.8.2-GCCcore-8.3.0:


<pre class="gcommand">
1. Start an interactive job:<syntaxhighlight lang="bash">
module load Python/3.8.2-GCCcore-8.3.0
interact
</syntaxhighlight>2. Search for the Python version you would like to use:<pre class="gcommand">
gacrc-test@d2-13 ~$ module spider Python
</pre>
</pre>


Most Python versions have commands such as "python" and "pip", some versions have version concatenate after command, such as "python2.7", "python3", "pip3" etc.. So after Python module is loaded, usually we can check if it is the right command by:
3. Load the software module for the version of Python you would like to use:


<pre class="gcommand">
<pre class="gcommand">
which python
gacrc-test@d2-13 ~$ module load Python/3.10.4-GCCcore-11.3.0
which pip
which pip3        (Python3 only)
</pre>
</pre>


or list the path to confirm the correct python commands, such as:
4. Create the Python virtual environment in your /home directory.  You may find it beneficial to have a directory in your /home directory for all of your Python virtual environments.


<pre class="gcommand">
<pre class="gcommand">
ls /apps/eb/Python/3.8.2-GCCcore-8.3.0/bin
gacrc-test@d2-13 ~$ python -m venv ~/env/mypyenv
</pre>
</pre>


The above Python binary path can be found by using "ml show" command:
This creates a base virtual environment to work from, in this case called "mypyenv".  You should use more meaningful names for your actual Python environments.  The ~/env directory, which holds all of the Python virtual environments in this example, doesn't have to exist before you run the above command; if it doesn't, the command creates it.  This gives us a base Python virtual environment to work with:<syntaxhighlight lang="bash">
gacrc-test@d2-13 ~$ ls ~/env/mypyenv/
bin  include  lib  lib64  pyvenv.cfg


gacrc-test@d2-13 ~$ ls ~/env/mypyenv/bin/
Activate.ps1  activate activate.csh  activate.fish  pip  pip3 pip3.10  python  python3  python3.10
</syntaxhighlight>5. Now that the environment is created, we can activate it and install Python packages.  To do so, we need to source the bin/activate file.  This file sets the appropriate environment variables so that any Python packages you install will be installed in your virtual environment.  To source the file and thus activate the virtual environment:<syntaxhighlight lang="bash">
gacrc-test@d2-13 ~$ . ~/env/mypyenv/bin/activate
</syntaxhighlight>Note the space in between the period (source command) and the path to the activate file.  Upon running this command, you will see that your command prompt has changed, showing you the name of your active environment:<syntaxhighlight lang="bash">
gacrc-test@d2-13 ~$ . ~/env/mypyenv/bin/activate
(mypyenv) gacrc-test@d2-13 ~$
</syntaxhighlight>Now any Python packages you install will be installed in the "lib" directory of your Python virtual environment (whether you use pip or download the package and install it with its setup.py script) and are usable when your Python virtual environment is active.
6. To deactivate a Python virtual environment:
<pre class="gcommand">
<pre class="gcommand">
ml show Python/3.8.2-GCCcore-8.3.0
deactivate
</pre>
</pre>
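Besides the changed prompt, the activate script also sets the VIRTUAL_ENV environment variable, which a script can test before installing anything. The sketch below assumes only that behavior; the function name in_venv is hypothetical.

```shell
# Hypothetical helper: succeeds if a Python virtual environment is active
in_venv() {
    [ -n "${VIRTUAL_ENV:-}" ]
}

if in_venv; then
    echo "active virtual environment: $VIRTUAL_ENV"
else
    echo "no virtual environment is active"
fi
```

A check like this at the top of an install script helps avoid installing packages outside the intended environment by mistake.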


From outputs, you will find the PATH environment variable pre-appended with the Python binary path, i.e., /apps/eb/Python/3.8.2-GCCcore-8.3.0/bin.
==== Installing Python packages in your virtual environment using pip ====
 
The first time you ever use pip in your Python virtual environment you should probably upgrade pip.<syntaxhighlight lang="bash">
 
pip install --upgrade pip
====How to install Python package tarball in Sapelo2 home directory====
</syntaxhighlight>Then all you have to do is pip install the package, and it will be installed in your virtual environment:<syntaxhighlight lang="bash">
pip install <package name>
</syntaxhighlight>or using a requirements.txt file:


There are several options you can take to install a Python package in your Sapelo2 home directory /home/MyID; MyID should be replaced by your UGA MyID.
<code>pip install -r requirements.txt</code>
 
Common steps are listed as below:
 
1. Start an interactive session with "qlogin" command from Sapelo2 login node:
 
<pre class="gcommand">
qlogin
</pre>


2. Suppose you have downloaded a Python package tarball file, e.g., myPackage-1.0.tar.gz, in the source folder (src) in your home directory with a full path as:
==== Installing Python packages in your virtual environment from the source code ====
If the Python package that you want to install in your virtual environment is not available through pip for some reason, you can download the source code in your home directory and run the package's setup.py script. As long as your Python virtual environment is active when you do this it will be installed in that virtual environment.  Suppose you have downloaded a Python package tarball file, e.g., myPackage-1.0.tar.gz, in the source folder (src) in your /home directory with the full path being:


<pre class="gcommand">
<pre class="gcommand">
Line 435: Line 443:
</pre>
</pre>
   
   
3. Once the tarball is untarred, you will have a package source folder in your current directory, e.g., myPackage-1.0/. You need to change directory into it (full path is /home/MyID/src/myPackage-1.0):
Change into the new directory:<syntaxhighlight lang="bash">
cd myPackage-1.0
</syntaxhighlight>Run the setup.py script:<syntaxhighlight lang="bash">
python setup.py install
</syntaxhighlight>


<pre class="gcommand">
==== Using your Python virtual environment in a job ====
cd ./myPackage-1.0
</pre>
 
4. Decide which version of Python you want myPackage-1.0 to be installed with. In this example we use Python/3.8.2-GCCcore-8.3.0:
 
<pre class="gcommand">
module load Python/3.8.2-GCCcore-8.3.0
</pre>
 
5. Install package using "python setup.py install" command
 
*'''Option 1:''' using "--user" option. The package will be installed into a default installation location, i.e., /home/MyID/'''.local'''/lib/python3.8/site-packages/:
<pre class="gcommand">
python setup.py install --user
</pre>
 
*'''Option 2 (recommended):''' using "--user" option with PYTHONUSERBASE environment variable defined and exported. The package will be installed into a location you specified, e.g., /home/MyID/'''python/3.8.2'''/lib/python3.8/site-packages/:
 
<pre class="gcommand">
mkdir -p /home/MyID/python/3.8.2
export PYTHONUSERBASE=/home/MyID/python/3.8.2
python setup.py install --user
</pre>
 
*'''option 3:''' using '--prefix' option with PYTHONPATH environment variable defined and exported. The package will be installed into a location you specified, e.g., /home/MyID/'''python/3.8.2'''/lib/python3.8/site-packages:
 
<pre  class="gcommand">
mkdir -p /home/MyID/python/3.8.2/lib/python3.8/site-packages
export PYTHONPATH=/home/MyID/python/3.8.2/lib/python3.8/site-packages
python setup.py install --prefix="/home/MyID/python/3.8.2"
</pre>
 
6. Make sure to export the above library path when the package is needed. This can be included in your module file as well. For example:
 
<pre  class="gcommand">
export PYTHONPATH=/home/MyID/.local/lib/python3.8/site-packages/:$PYTHONPATH
</pre>
 
or
 
<pre  class="gcommand">
export PYTHONPATH=/home/MyID/python/3.8.2/lib/python3.8/site-packages:$PYTHONPATH
</pre>
 
====How to install Python package using pip/pip3 in Sapelo2 home directory====
 
The pip/pip3 is a package management system used to install and manage Python2/Python3 packages, such as those found in the Python Package Index (PyPI). PyPI is a software repository for the Python language. It helps to find and install software developed and shared by the Python community. In this example, let us try scikit-learn-0.23.2, which is the most recent version of scikit-learn package as of 10-26-2020, at PyPI.
 
Common steps are listed as below:
 
1. Start an interactive session with '''qlogin''' command from Sapelo2 login node:
 
<pre class="gcommand">
qlogin
</pre>
 
2. Decide which version of Python you want myPackage-1.0 to be installed with. In this example we load Python/3.8.2-GCCcore-8.3.0, then use "which" command to identify if pip3 is in place:
 
<pre class="gcommand">
module load Python/3.8.2-GCCcore-8.3.0
which pip3
</pre>
 
"which" command shows pip3 is from 3.8.2-GCCcore-8.3.0 module:
 
<pre class="gcommand">
/apps/eb/Python/3.8.2-GCCcore-8.3.0/bin/pip3
</pre>
 
3. Install package using "pip3 install" command
 
*'''Option 1:''' using "--user" option. The package will be installed into a default installation location, i.e., /home/MyID/'''.local'''/lib/python3.8/site-packages/:
<pre class="gcommand">
pip3 install --user scikit-learn
</pre>
 
"pip3 install" command will output and show you that scikit-learn-0.23.2 is successfully downloaded and installed. Then you can use "ls" command to show the package is indeed installed in '''.local'''/lib/python3.8/site-packages/, in your home directory:
 
<pre class="gcommand">
ls ~/.local/lib/python3.8/site-packages/
scikit_learn-0.23.2.dist-info  scikit_learn.libs
</pre>
 
*'''Option 2 (recommended):''' using "--user" option with PYTHONUSERBASE environment variable defined and exported. The package will be installed into a location you specified, e.g., /home/MyID/'''python/3.8.2'''/lib/python3.8/site-packages/:
 
<pre class="gcommand">
mkdir -p /home/MyID/python/3.8.2
export PYTHONUSERBASE=/home/MyID/python/3.8.2
pip3 install --user scikit-learn
</pre>
 
"pip3 install" command will output and show you that scikit-learn-0.23.2 is successfully downloaded and installed. Then you can use "ls" command to show the package is indeed installed in '''/python/3.8.2'''/lib/python3.8/site-packages/, in your home directory:
 
<pre class="gcommand">
ls ~/python/3.8.2/lib/python3.8/site-packages/
scikit_learn-0.23.2.dist-info  scikit_learn.libs
</pre>
 
4. Make sure to export the above library path when the package is needed. This can be included in your module file as well. For example:
 
<pre  class="gcommand">
export PYTHONPATH=/home/MyID/.local/lib/python3.8/site-packages/:$PYTHONPATH
</pre>
 
or
 
<pre  class="gcommand">
export PYTHONPATH=/home/MyID/python/3.8.2/lib/python3.8/site-packages:$PYTHONPATH
</pre>


Once you have created your Python virtual environment and installed the packages you need, you can use it in a job on the cluster by loading the Python module you used to create the environment and then activating the environment:<syntaxhighlight lang="bash">
module load Python/3.10.4-GCCcore-11.3.0
. ~/env/mypyenv/bin/activate
</syntaxhighlight>
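The two lines above would normally sit inside a batch job submission script. The following sketch writes such a script to a file; the file names sub.sh and my_script.py, the job name, and the resource values are all hypothetical and should be adapted to your job, while the batch partition name is the one used elsewhere on this page.

```shell
# Write a hypothetical job submission script named sub.sh
cat > sub.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=mypyenv_demo
#SBATCH --partition=batch
#SBATCH --ntasks=1
#SBATCH --mem=4gb
#SBATCH --time=01:00:00

# Load the Python module that was used to create the environment
module load Python/3.10.4-GCCcore-11.3.0

# Activate the virtual environment, run the program, then deactivate
. ~/env/mypyenv/bin/activate
python my_script.py
deactivate
EOF
```

The generated file could then be submitted with sbatch sub.sh.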
----
----
[[#top|Back to Top]]
[[#top|Back to Top]]
Line 556: Line 461:
To install a software package as a conda environment in user's Sapelo2 home directory, please use the option "-p" to define your environment installation path, e.g., /home/MyID/busco_conda.
To install a software package as a conda environment in user's Sapelo2 home directory, please use the option "-p" to define your environment installation path, e.g., /home/MyID/busco_conda.


For example, to install busco v3.0.2 conda environment at /home/MyID/busco_conda:  
For example, to install busco v5.7.1 conda environment at /home/MyID/busco_conda:  
   
   
<pre class="gcommand">
<pre class="gcommand">
module load Miniconda3/4.7.10
module load Miniforge3/24.7.1-0
conda create -p /home/MyID/busco_conda -c bioconda  busco=3.0.2
conda create -p /home/MyID/busco_conda -c bioconda  busco=5.7.1
</pre>
</pre>


where '''bioconda''' is the channel from which busco v3.0.2 is downloaded. Other common conda channels include '''conda-forge''', '''r''', '''defaults''', and '''qiime2'''. For details, please refer to [https://docs.conda.io/projects/conda/en/latest/user-guide/concepts/channels.html# Conda channels].
where '''bioconda''' is the channel from which busco v5.7.1 is downloaded. Other common conda channels include '''conda-forge''', '''r''', '''defaults''', and '''qiime2'''. For details, please refer to [https://docs.conda.io/projects/conda/en/latest/user-guide/concepts/channels.html# Conda channels].


If the busco version 3.0.2 is not specified, the most recent version of busco will be downloaded and installed, i.e.,
If the busco version 5.7.1 is not specified, the most recent version of busco will be downloaded and installed, i.e.,


<pre class="gcommand">
<pre class="gcommand">
module load Miniconda3/4.7.10
module load Miniforge3/24.7.1-0
conda create -p /home/zhuofei/busco_conda -c bioconda busco
conda create -p /home/MyID/busco_conda -c bioconda busco
</pre>
</pre>


To list conda environments you installed:
To list conda environments you installed:


<pre class="gcommand">
<pre class="gcommand">
module load Miniconda3/4.7.10
module load Miniforge3/24.7.1-0
conda env list
conda env list
</pre>
</pre>
Line 581: Line 486:
To activate your conda environment, you need to give your environment path to '''source activate''' command:
To activate your conda environment, you need to give your environment path to '''source activate''' command:


<pre class="gcommand">
<pre class="gcommand">
module load Miniconda3/4.7.10
module load Miniforge3/24.7.1-0
source activate /home/MyID/busco_conda/
source activate /home/MyID/busco_conda
</pre>
</pre>


Once a conda environment is activated, you can list packages installed inside, with their names, versions, and downloading channels. For example, to list packages installed in busco v3.0.2 conda environment:
Once a conda environment is activated, you can list packages installed inside, with their names, versions, and downloading channels. For example, to list packages installed in busco v5.7.1 conda environment:


<pre class="gcommand">
<pre class="gcommand">
module load Miniconda3/4.7.10
module load Miniforge3/24.7.1-0
source activate /home/MyID/busco_conda/
source activate /home/MyID/busco_conda
conda list
conda list
</pre>
</pre>


You can add or install new packages to an existing environment. For example, to install scipy v1.5.0 package to busco v3.0.2 conda environment:
You can add or install new packages to an existing environment. For example, to install scipy v1.14.1 package to busco v5.7.1 conda environment:


<pre class="gcommand">
<pre class="gcommand">
module load Miniconda3/4.7.10
module load Miniforge3/24.7.1-0
source activate /home/MyID/busco_conda/
source activate /home/MyID/busco_conda
conda install scipy=1.5.0
conda install scipy=1.14.1
</pre>
</pre>
   
   
Likewise, if the scipy version 1.5.0 is not specified, the most recent version of scipy will be downloaded and installed for you.
Likewise, if the scipy version 1.14.1 is not specified, the most recent version of scipy will be downloaded and installed for you.


To remove an installed package, e.g., the scipy package, from your conda environment:
To remove an installed package, e.g., the scipy package, from your conda environment:


<pre class="gcommand">
<pre class="gcommand">
module load Miniconda3/4.7.10
module load Miniforge3/24.7.1-0
source activate /home/MyID/busco_conda/
source activate /home/MyID/busco_conda
conda remove scipy
conda remove scipy
</pre>
</pre>


To deactivate a conda environment:
To deactivate a conda environment that is currently activated:


<pre class="gcommand">
<pre class="gcommand">
module load Miniconda3/4.7.10
conda deactivate
conda deactivate /home/MyID/busco_conda/
</pre>
</pre>


Once a conda environment is deactivated, its conda layer will be removed from your Linux shell environment.
Once a conda environment is deactivated, its conda layer will be removed from your Linux shell environment.
If you would like to use your conda environment in a Jupyter Notebook, please see [[Using a Conda environment in Jupyter]].


----
----
Line 633: Line 539:


Then, create a file called '''~/.Renviron''' containing the following line:
Then, create a file called '''~/.Renviron''' containing the following line:
<pre class="gscript">
 
<pre class="gcommand">
R_LIBS_USER=/path/to/Rlibs
R_LIBS_USER=/path/to/Rlibs
</pre>
</pre>
replacing /path/to/Rlibs with the path that you want to use. For example, to /home/MyID/Rlibs, where MyID needs to be replaced by your UGA MyID.


Start an interactive session and load the module for the version of R you want to use (e.g. R/4.0.0-foss-2019b)
Please replace /path/to/Rlibs with the path that you want to use. For example, to /home/MyID/Rlibs, where MyID needs to be replaced by your UGA MyID.
 
Start an interactive session and load the module for the version of R you want to use (e.g. R/4.3.1-foss-2022a)
<pre class="gcommand">
<pre class="gcommand">
qlogin
interact
module load R/4.0.0-foss-2019b
module load R/4.3.1-foss-2022a
</pre>
</pre>


====Install R package using R command line====
====Install R package using R command line====
If you download the R package tarball (e.g. brocolors_0.1.tar.gz) to your home dir, you can install it with R CMD INSTALL at the command line, but use the flag --library=/path/to/Rlibs, as follows:
If you download the R package tarball (e.g. brocolors_0.1.tar.gz) to your home dir, you can install it with R CMD INSTALL at the command line, but use the flag '''--library=/path/to/Rlibs''', as follows:


<pre class="gcommand">
<pre class="gcommand">
R CMD INSTALL --library=/path/to/Rlibs brocolors_0.1.tar.gz
R CMD INSTALL --library=/path/to/Rlibs brocolors_0.1.tar.gz
</pre>
</pre>
Note that the path for --library flag should be the same path contained in '''~/.Renviron''' as described above.


====Install R package in an interactive R session====
====Install R package in an interactive R session====


You can install a package in an interactive R session using install.packages(). To use this method, you need to create '''~/.Renviron''' file as described above. For example, to open an interactive R session to install ggplot2 package into the path given in ~/.Renviron:
You can install a package in an interactive R session using install.packages(). To use this method, you need to create the '''~/.Renviron''' file as described above. For example, to open an interactive R session to install ggplot2 package into the path given in ~/.Renviron:


<pre class="gcommand">
<pre class="gcommand">
> install.packages('ggplots2')
$ R
</pre>
 
R version 4.3.1 (2023-06-16) -- "Beagle Scouts"
Copyright (C) 2023 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
 
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.


You will be prompted to enter your selection of a CRAN mirror site for downloading package source. Usually we select #75: USA (TX 1) [https]. If one site does not work well, you could try others.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.


Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.


Another package installation method is to use devtools::install(). R_LIBS variable.
> install.packages('ggplot2')
</pre>


You will be prompted to enter your selection of a CRAN mirror site for downloading package source. Usually we select site '''72: USA (OH) [https]'''. If one site does not work well for you, you could try others.
----
----
[[#top|Back to Top]]
[[#top|Back to Top]]
Line 673: Line 598:
1. Create a directory where you want to install the java application (e.g. picard). For example
1. Create a directory where you want to install the java application (e.g. picard). For example
<pre class="gcommand">
<pre class="gcommand">
mkdir -p ~apps/picard
mkdir -p ~/apps/picard
</pre>
</pre>


2. Download the package (e.g. picard-tools-2.4.1.zip) from their website, put into your chosen directory (e.g. ~apps/picard) and extract the file. For example
2. Download the package (e.g. picard-tools-2.4.1.zip) from their website, put into your chosen directory (e.g. ~/apps/picard) and extract the file. For example
<pre class="gcommand">
<pre class="gcommand">
cd ~apps/picard
cd ~/apps/picard


unzip picard-tools-2.4.1.zip
unzip picard-tools-2.4.1.zip
Line 685: Line 610:
This will create a directory called picard-tools-2.4.1 that contains picard.jar. You can rename this dir e.g.   
This will create a directory called picard-tools-2.4.1 that contains picard.jar. You can rename this dir e.g.   
<pre class="gcommand">
<pre class="gcommand">
cd ~apps/picard
cd ~/apps/picard


mv picard-tools-2.4.1  2.4.1
mv picard-tools-2.4.1  2.4.1
Line 692: Line 617:
3. To run this application, load a java module and invoke this application in your job submission script. For example
3. To run this application, load a java module and invoke this application in your job submission script. For example
<pre class="gcommand">
<pre class="gcommand">
module load Java/1.8.0_144
module load Java/15.0.1


java -Xmx20g -classpath "/home/MyID/apps/picard/2.4.1" -jar /home/MyID/apps/picard/2.4.1/picard.jar [options]
java -Xmx20g -classpath "/home/MyID/apps/picard/2.4.1" -jar /home/MyID/apps/picard/2.4.1/picard.jar [options]
</pre>
</pre>


Line 704: Line 629:
===How to build complex applications===
===How to build complex applications===


Many applications use a ''configure'' step to check on system libraries and path to dependencies in order to create Makefiles. The Makefiles are then used to build and install the application. To illustrate how this process is typically set up, we will describe how to install the GNU Scientific Libraries (GSL) v2.6 using the GNU 8.3.0 compilers.
Many applications use a ''configure'' step to check on system libraries and path to dependencies in order to create Makefiles. The Makefiles are then used to build and install the application. To illustrate how this process is typically set up, we will describe how to install the GNU Scientific Libraries (GSL) v2.6 using the GNU 11.3.0 compilers.


1. Create a directory to use for building the application, e.g. ~/src/gsl
1. Create a directory to use for building the application, e.g. ~/src/gsl
Line 720: Line 645:
3. Start an interactive session
3. Start an interactive session
<pre class="gcommand">
<pre class="gcommand">
qlogin
interact
</pre>
</pre>


4. Load the module for the compiler suite that you want to use, e.g. GCC 8.3.0:
4. Load the module for the compiler suite that you want to use, e.g. GCC 11.3.0:
<pre class="gcommand">
<pre class="gcommand">
module load foss/2019b
module load foss/2022a
</pre>
</pre>


Line 746: Line 671:
7. Create a subdirectory to build the application
7. Create a subdirectory to build the application
<pre class="gcommand">
<pre class="gcommand">
mkdir build_gcc830
mkdir build_gcc1130


cd build_gcc830
cd build_gcc1130
</pre>
</pre>


Line 758: Line 683:
9. Configure the application with e.g.
9. Configure the application with e.g.
<pre class="gcommand">
<pre class="gcommand">
../configure --prefix=/home/MyID/apps/gsl/2.6/gcc830
../configure --prefix=/home/MyID/apps/gsl/2.6/gcc1130
</pre>
</pre>


where /home/MyID/apps/gsl/2.6/gcc830 should be replaced by the directory where you want to install GSL.
where /home/MyID/apps/gsl/2.6/gcc1130 should be replaced by the directory where you want to install GSL.


You can capture the standard error and standard output of the configure step into a file, to help troubleshoot the step if it encounters any issues. This can be done with
You can capture the standard error and standard output of the configure step into a file, to help troubleshoot the step if it encounters any issues. This can be done with
<pre class="gcommand">
<pre class="gcommand">
../configure --prefix=/home/MyID/apps/gsl/2.6/gcc830 2>&1 | tee my.config.log
../configure --prefix=/home/MyID/apps/gsl/2.6/gcc1130 2>&1 | tee my.config.log
</pre>
</pre>


Line 808: Line 733:
-->
-->
<pre class="gcommand">
<pre class="gcommand">
module load CMake/3.15.3-GCCcore-8.3.0 zlib/1.2.11-GCCcore-8.3.0
module load CMake/3.24.3-GCCcore-11.3.0
module load zlib/1.2.12-GCCcore-11.3.0
 
cmake -DCMAKE_INSTALL_PREFIX:PATH=/home/MyID/app/diamond/1.0 ...[skipped]
cmake -DCMAKE_INSTALL_PREFIX:PATH=/home/MyID/app/diamond/1.0 ...[skipped]
make install  
make install  
</pre>
</pre>
Line 815: Line 743:
</syntaxhighlight>
</syntaxhighlight>
-->
-->
Line 1: load CMake module and zlib module.  
Line 1: load CMake module and zlib module.
 
Line 2: set prefix, there are other variable could be defined here.  
Line 2: set prefix, there are other variable could be defined here.  


Line 821: Line 750:


<pre class="gcommand">
<pre class="gcommand">
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/apps/eb/GSL/2.6-GCC-8.3.0/lib
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/apps/eb/GSL/2.7-GCC-11.3.0/lib
export CFLAGS="-I/apps/eb/GSL/2.6-GCC-8.3.0/include"
 
export LDFLAGS="-L/apps/eb/GSL/2.6-GCC-8.3.0/lib"
export CFLAGS="-I/apps/eb/GSL/2.7-GCC-11.3.0/include"
 
export LDFLAGS="-L/apps/eb/GSL/2.7-GCC-11.3.0/lib"
 
export LIBS="-lgsl"
export LIBS="-lgsl"
python setup.py build_ext --inplace
python setup.py build_ext --inplace
</pre>
</pre>
Line 831: Line 764:
Line 2: specify the location of gsl c header file, pass this to GNU C compiler at compilation time
Line 2: specify the location of gsl c header file, pass this to GNU C compiler at compilation time


Line 3 and 4: specify the location of gsl shared dynamic lib and the lib, pass this to GNU linker at linkage time
Line 3 and 4: specify the location of gsl shared dynamic lib and the lib, pass this to GNU linker at linkage time


Another way is to define variables at configuration line.   
Another way is to define variables at configuration line.   


<pre class="gcommand">
<pre class="gcommand">
./configure --prefix=/home/MyID/app/diamond/1.0 --with-jemalloc=/apps/eb/jemalloc/5.2.1-GCCcore-8.3.0/lib
./configure --prefix=/home/MyID/app/diamond/1.0 --with-jemalloc=/apps/eb/jemalloc/5.3.0-GCCcore-11.3.0/lib
</pre>  
</pre>  
   
   
Line 849: Line 782:
</pre>  
</pre>  
   
   
[https://www.docker.com/ Docker container], as you might know, is the most well-known container system. Docker also has a bigger ecosystem than Singularity. However, Docker was initially designed for ephemeral servers; by default Docker tries to isolate the running container as much as possible, which makes it not suitable for running in a HPC environment. Like Docker, Singularity is a container runtime too. But it starts from a very different place. It favors integration rather than isolation. Singularity is also the best friend of Docker and can import images from Docker registries. You can search a docker image at [https://hub.docker.com/ Docker Hub]. Once you found the image you need, you can pull and build it into a Singularity image. For example, to pull and build Trinity latest version Singularity image from Docker Hub:
[https://www.docker.com/ Docker container] is the most well-known container system. Docker also has a bigger ecosystem than Singularity. However, Docker was initially designed for ephemeral servers; by default Docker tries to isolate the running container as much as possible, which makes it not suitable for running in a HPC environment. Like Docker, Singularity is a container runtime too. But it starts from a very different place. It favors integration rather than isolation. Singularity is also the best friend of Docker and can import images from Docker registries. You can search a docker image at [https://hub.docker.com/ Docker Hub]. If the image you need is on Docker Hub, you can pull and build it into a Singularity image.
 
For example, to pull and build Trinity latest version Singularity image from Docker Hub:


<pre class="gcommand">
<pre class="gcommand">
Line 855: Line 790:
</pre>  
</pre>  


Detailed instructions on how to build Singularity container can be found at [https://sylabs.io/guides/3.6/user-guide/build_a_container.html# Singularity Build a Container]
Detailed instructions on how to build Singularity container can be found at [[Software on Sapelo2#Singularity Containers|GACRC Singularity wiki page]] or at [https://sylabs.io/guides/3.6/user-guide/build_a_container.html# Singularity Build a Container]


----
----
[[#top|Back to Top]]
[[#top|Back to Top]]

Latest revision as of 11:04, 18 October 2024


Introduction

We introduce here some guidance on how to install applications and libraries on Sapelo2.

In general, users can build applications under their home directory or other space owned by the users (/home/MyID/).

If an application will be used by several members of a group, the application can be installed in the user's group work space (/work/abclab/).

GACRC team takes requests to install applications at a central place (/apps) if the application satisfies the following conditions:

  • The application is Linux compatible.
  • The application has general interest among users.
  • The application needs to be built with special settings, such as root privilege, shared common data set, complex structures, or other requirements.
  • The application needs to be configured to integrate into the GACRC environment, such as queue settings or database connections.

Please use the GACRC Software Installation/Update online form to submit a support ticket to GACRC team if you need any help from us.



General Guidelines

IMPORTANT: Please DO NOT install applications on the login node. Instead, please install applications from an interactive session started with the interact command.

The Sapelo2 login node (sapelo2.gacrc.uga.edu) has limited memory, so software installations started there often fail for lack of memory. More importantly, installing applications on the login node can degrade the performance of the cluster for everyone. So please do not install any applications on the cluster while you are on the login node. We strongly advise that you build or install applications from an interactive session, which you can start with the interact command from the login node:

interact

All of the following installation examples were executed in an interactive session started with interact.

How to check if an application is installed

To find if an application, e.g. Trinity, is already installed on Sapelo2, use the following command (the application name is NOT case sensitive):

module spider trinity

or

module avail trinity

To see a description of the module (the application name is case sensitive)

module whatis Trinity

To see configuration details, e.g., environment variables, of the module (the application name is case sensitive)

module show Trinity
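
If you want to run this check from a script, the following is a minimal sketch. It assumes the cluster's module system is Lmod, which prints the "avail" listing to stderr by default; find_module is a hypothetical helper name, not a command provided on Sapelo2.

```shell
# Sketch: search the module tree from a script.
# find_module is a hypothetical helper, not part of Lmod or Sapelo2.
find_module() {
    # Lmod writes the "avail" listing to stderr by default, so merge it
    # into stdout before filtering. -t requests terse output, one module
    # per line; grep -i makes the match case-insensitive.
    module -t avail 2>&1 | grep -i -- "$1"
}

# Example call (prints the matching module names, if any):
# find_module trinity
```

The function returns a non-zero exit status when nothing matches, so it can be used in an if statement.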


How to check if a library is installed

Perl

To check if a perl library is installed, e.g. DBI, first load the Perl package of interest, for example Perl/5.34.1-GCCcore-11.3.0:

module load Perl/5.34.1-GCCcore-11.3.0
perl -MDBI -e 'print "OK\n"'

If the library is not installed, it will warn that the library is not in the path. Otherwise, it will print OK.
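
To check several Perl libraries at once, the one-liner above can be wrapped in a small function. This is only a sketch: check_perl_mod is a hypothetical helper name, and it assumes the Perl module of interest has already been loaded.

```shell
# Sketch: report whether each Perl library can be loaded.
# check_perl_mod is a hypothetical helper name.
check_perl_mod() {
    # perl -M<Module> -e 1 exits non-zero if the library cannot be loaded.
    if perl -M"$1" -e 1 >/dev/null 2>&1; then
        echo "$1: installed"
    else
        echo "$1: not found"
    fi
}

for mod in DBI IO::CaptureOutput; do
    check_perl_mod "$mod"
done
```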

Python

To check if a Python library is installed, e.g. numpy, first load the Python package of interest, then run the "pip show" or "pip list" commands. The Linux command "which" confirms that pip is available on your path. In the following example, we use Python/2.7.18-GCCcore-11.3.0:

module load Python/2.7.18-GCCcore-11.3.0
which pip
pip show numpy

To list all installed libraries with their versions:

module load Python/2.7.18-GCCcore-11.3.0
pip list

To generate an alphabetical list of the libraries installed under a Python version:

module load Python/2.7.18-GCCcore-11.3.0
pip freeze| sort | awk -F'@' '{print $1}'

For Python3, some versions have both pip3 and pip. pip3 is the Python3 version of pip. If you load Python3, run pip3, otherwise run pip. You can use "which" command to identify if pip3 is in place. For example:

module load Python/3.10.4-GCCcore-11.3.0
which pip3

Then check if a Python3 library is installed, e.g. using pip3:

module load Python/3.10.4-GCCcore-11.3.0
pip3 show numpy

To list all installed libraries with their versions:

module load Python/3.10.4-GCCcore-11.3.0
pip3 list

To generate an alphabetical list of the libraries installed under a Python version:

module load Python/3.10.4-GCCcore-11.3.0
pip3 freeze| sort | awk -F'@' '{print $1}'

If the library is not installed, it will not be shown by pip or pip3 commands. Otherwise, pip or pip3 will give you information about libraries' version and installation location.
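
As a sketch of how the exit status of "pip3 show" can be used in a script (check_py_lib is a hypothetical helper name, and the library names are only examples):

```shell
# Sketch: report whether each Python library is visible to pip3.
# check_py_lib is a hypothetical helper name.
check_py_lib() {
    # "pip3 show" exits non-zero when the package is not installed.
    if pip3 show "$1" >/dev/null 2>&1; then
        echo "$1: installed"
    else
        echo "$1: not found"
    fi
}

for lib in numpy scipy pandas; do
    check_py_lib "$lib"
done
```

A "not found" result only means that pip3 cannot see the library in the currently loaded Python; it may still be available as a separate module.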

Note: Some Python libraries are installed outside of the Python default location and are provided as a separate module file that needs to be loaded separately. To check if a python library is installed as a separate module, please use the module spider command.

For example, to check if matplotlib is installed as a separate module:

module spider matplotlib

This command will return all the matplotlib versions currently installed on the cluster. For example, the module named matplotlib/3.5.2-foss-2022a provides matplotlib version 3.5.2 for Python 3.10.4 and it uses the foss-2022a toolchain.

To use this version of matplotlib, please load the module matplotlib/3.5.2-foss-2022a. Similarly, you can load other module files that provide other Python libraries; for example, the SciPy-bundle module provides numpy, scipy, pandas, and others.

R

To check if an R library is installed, e.g. xtable, first load the R version of interest, for example R/4.3.1-foss-2022a:

interact
module load R/4.3.1-foss-2022a
R
require("xtable")

Ctrl+d

If the library is not installed, it will warn that the library is not in the path. Otherwise, it will print "Loading required package: xtable".
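
The same check can also be run without opening an interactive R session. The sketch below assumes that Rscript is on your path after loading the R module; check_r_lib is a hypothetical helper name.

```shell
# Sketch: test for an R library from the shell.
# check_r_lib is a hypothetical helper name.
check_r_lib() {
    # requireNamespace() returns TRUE or FALSE without attaching the
    # package; the result is turned into the exit status of Rscript.
    if Rscript -e "quit(save = 'no', status = if (requireNamespace('$1', quietly = TRUE)) 0 else 1)" >/dev/null 2>&1; then
        echo "$1: installed"
    else
        echo "$1: not found"
    fi
}

check_r_lib xtable
```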



How to structure directories

Common practice is to set up the following three directories using "mkdir" command in the user's home directory /home/MyID/:

  • apps: directory where the applications will be installed
  • src: directory under which you can store the source files and build the applications
  • modulefiles: directory under which you can put your own module files

These directories can be created with

mkdir ~/apps
mkdir ~/src
mkdir ~/modulefiles
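
If you prefer a form that can be re-run safely, a minimal sketch is shown below. The -p option makes mkdir succeed even when a directory already exists; make_layout is a hypothetical helper name.

```shell
# Sketch: create the three directories under a chosen base directory.
# make_layout is a hypothetical helper name.
make_layout() {
    # -p: do not fail if a directory already exists.
    mkdir -p "$1/apps" "$1/src" "$1/modulefiles"
}

# Example call for your home directory:
# make_layout "$HOME"
```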


How to use local Lmod modules

To use your own modules, e.g. trinity/1.0:

module use ~/modulefiles
module load trinity/1.0

If there are module files with the same name as in our central place, your local ones will take precedence over the central ones.
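
For illustration, a personal module file such as trinity/1.0 could be generated as sketched below. This assumes Lmod's Lua module file format, and that the application was installed under a prefix such as ~/apps/trinity/1.0 with its executables in a bin subdirectory; both the prefix and the helper name write_modulefile are hypothetical.

```shell
# Sketch: write a minimal Lua module file for Lmod.
# write_modulefile is a hypothetical helper name.
#   $1 = modulefiles directory   $2 = application name
#   $3 = version                 $4 = installation prefix (assumed to contain bin/)
write_modulefile() {
    mkdir -p "$1/$2"
    cat > "$1/$2/$3.lua" <<EOF
-- Personal module file for $2 $3 (illustrative)
whatis("Name: $2")
whatis("Version: $3")
prepend_path("PATH", "$4/bin")
EOF
}

# Example call:
# write_modulefile "$HOME/modulefiles" trinity 1.0 "$HOME/apps/trinity/1.0"
```

After "module use ~/modulefiles", the file ~/modulefiles/trinity/1.0.lua is seen by Lmod as the module trinity/1.0.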



How to Install Software Packages (general)

If you are interested in using an application, check its website to see (a) whether it is compatible with Linux, (b) whether it is distributed as binaries or source code, and (c) what kind of package it is (Python library and scripts, Perl scripts, Java, C, C++, Fortran code, etc).

Perl scripts and Java jar files can be downloaded into the users' home directory and run from there.

If the application is distributed as pre-compiled binaries, check if binaries are available for the Linux OS that our cluster runs (Sapelo2 runs Rocky Linux 8). Binaries compiled for Windows or macOS cannot be run on Sapelo2. Also, binaries built for other Linux distributions (e.g. Ubuntu, Debian, etc) will in general not work on Sapelo2.

If pre-compiled binaries are not available for the OS on Sapelo2 (Rocky Linux 8), then you can compile the code yourself, if the source code is available.

Another option for running binaries compatible with other Linux distributions is to create a Singularity container and run it on Sapelo2 (more information below).



Installing Software Packages

How to build C, C++ applications

Here is an example of how to install an application called GERUDsim3, which comprises a single C++ program.

1. Log in to the Sapelo2 login node. If you do not have a directory called apps, create one with

mkdir ~/apps

2. Download the application into the ~/apps directory. For the example used here, you can download it with

cd ~/apps

git clone https://github.com/JonesLabIdaho/GERUDsim3.git

3. Start an interactive session with

interact -c 4 --mem 8gb

4. Compile the code on the interactive node with:

cd ~/apps/GERUDsim3/GERUDsim3/Source_files

g++ GerudSim3.cpp -o GerudSim3

The g++ command on the user's default path is version 8.5.0:

which g++
/usr/bin/g++

which gcc
/usr/bin/gcc

g++ --version
g++ (GCC) 8.5.0 20210514 (Red Hat 8.5.0-18)

gcc --version
gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-18)

If you wish/need to use a different GNU compiler version, or a different compiler (e.g. Intel), then first load the corresponding module and then compile the code. For example, to use g++ 11.3.0:

cd ~/apps/GERUDsim3/GERUDsim3/Source_files

module load foss/2022a

g++ GerudSim3.cpp -o GerudSim3

5. The name of your executable is GerudSim3. You can run it with the full path

~/apps/GERUDsim3/GERUDsim3/Source_files/GerudSim3

or copy this binary to ~/bin (create this directory first, if it does not exist yet) and run it with

~/bin/GerudSim3

You can also add ~/bin to your default PATH. To do that, add the following in your .bashrc file:

export PATH=/home/MyID/bin/:$PATH

where MyID needs to be replaced by your UGA MyID. With this, any executable you put in ~/bin will be in your PATH and can be invoked without its full path.
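
Because .bashrc can be read more than once in nested shells, a plain export line may add the same directory to PATH several times. A minimal sketch of a guarded version follows; add_to_path is a hypothetical helper name.

```shell
# Sketch: prepend a directory to PATH only if it is not already there.
# add_to_path is a hypothetical helper name.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                       # already present, do nothing
        *) PATH="$1:$PATH"; export PATH ;; # otherwise prepend it
    esac
}

add_to_path "$HOME/bin"
```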

For more information on compilers available on Sapelo2, please see Code Compilation on Sapelo2.



Cross build for mixed processor architecture

On Sapelo2, we have two types of processors: Intel and AMD.

In some C or C++ applications, compiler performance-optimization flags, such as -march=native, may be introduced in the configuration step or in the Makefile. Executables compiled with such options on one type of processor might not run on a different processor type. To enable the compiled application to run on both types of processors, these processor specific compiler optimization options should be removed. One option is to set the flags as -mtune=generic -march=x86-64.
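
As an illustration, the flag can be swapped in a Makefile with sed. This is a sketch only: it assumes the flag appears literally as -march=native in the file, and make_portable is a hypothetical helper name.

```shell
# Sketch: replace -march=native with generic x86-64 flags in a build file.
# make_portable is a hypothetical helper name.
make_portable() {
    # -i.bak edits the file in place and keeps the original as <file>.bak
    sed -i.bak 's/-march=native/-march=x86-64 -mtune=generic/g' "$1"
}

# Example: list where the flag is used, then replace it:
# grep -n -e '-march=native' Makefile
# make_portable Makefile
```

Some packages set the flag in the configure script or in CMake files instead, in which case the same substitution would have to be applied there.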

If you want to compile or test your program on different types of processors to ensure that they work there, you could start an interactive session on each of the processor types. For example, to start an interactive session on an AMD EPYC node, use

interact --constraint EPYC

You could choose a different processor type with the --constraint option. For more information on how to start interactive sessions on Sapelo2, please see Running interactive jobs on Sapelo2



How to install a Perl module

A convenient way to build and install perl modules is to use CPAN, which can be used to install Perl modules in the user's home directory. We suggest that you do this installation in an interactive session.

1. Start an interactive session with

interact

2. Decide where you want the perl modules to be installed, e.g. ~/perlmods, and create this directory if it does not exist yet

mkdir ~/perlmods

3. Load the perl module that you want to use, e.g. Perl/5.34.1-GCCcore-11.3.0

module load Perl/5.34.1-GCCcore-11.3.0

4. Start the CPAN shell with

cpan

When you start cpan for the first time, some configurations will be set (unless you already have configurations set in ~/.cpan/CPAN/MyConfig.pm):

cpan

cpan shell -- CPAN exploration and modules installation (v2.28)
Enter 'h' for help.

cpan[1]>

5. From within the CPAN shell, enter the following two (2) commands to specify the installation directory:

cpan[1]> o conf mbuildpl_arg "--install_base /home/MyID/perlmods"
    mbuildpl_arg       [--install_base /home/MyID/perlmods]
Please use 'o conf commit' to make the config permanent!


cpan[2]> o conf makepl_arg "PREFIX=/home/MyID/perlmods"
    makepl_arg         [PREFIX=/home/MyID/perlmods]
Please use 'o conf commit' to make the config permanent!
 
cpan[3]>

Note that in the two commands above you will need to replace MyID by your own MyID and change perlmods if you are using a different directory name. If you want to set the above installation path as the default one, you can make the settings above permanent by entering "o conf commit". If this is not done, you will need to reset this value every time you restart CPAN. If you do make the settings permanent, you can always change them later and re-commit as shown above.

6. To install a module (for example, if you want to install "IO::CaptureOutput") enter:

cpan[3]> install IO::CaptureOutput

Respond to any prompts for information that might be requested. When you are finished, enter:

cpan[4]> quit

7. After you have successfully installed a local Perl module, set the PERL5LIB environmental variable to tell Perl where to find the module. For example:

export PERL5LIB=/home/MyID/perlmods/lib/perl5/site_perl/5.34.1/:$PERL5LIB

perl -MIO::CaptureOutput -e 'print "OK\n"'
OK

where /home/MyID/perlmods has to be replaced by the path to your installation directory.

You can add this export command to your .bashrc file if you'd like to ensure that this PERL5LIB environment variable is always set when you login and for your non-interactive scripts. If this export line is not added in your .bashrc file, you can add it to your job submission scripts. You can also define a module file where this variable is defined.
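
If PERL5LIB is not set beforehand, the export line shown in step 7 leaves a trailing colon in the variable. A minimal sketch that avoids this is given below; it assumes the ~/perlmods installation directory and the Perl 5.34.1 path used in the example above.

```shell
# Sketch: set PERL5LIB for a local Perl library directory.
# The path assumes the ~/perlmods example and Perl 5.34.1 used above.
perl_lib="$HOME/perlmods/lib/perl5/site_perl/5.34.1"

# ${PERL5LIB:+:$PERL5LIB} expands to ":<old value>" only when PERL5LIB is
# already set and non-empty, so no stray colon is added.
export PERL5LIB="$perl_lib${PERL5LIB:+:$PERL5LIB}"
```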



How to install Python packages

When installing Python packages it's important to keep them organized based on the project you're working on (as opposed to installing all Python packages you'll ever use into one location). This is typically done with Python virtual environments (or Conda environments if there's a scientific package you need only available through Conda). For example, if you were working on one project that involved machine learning and another project involving data visualization, you may need different versions of some of the same Python packages and should keep them separate. A Python virtual environment is just a directory in which Python packages are installed, separate from other unrelated packages. To create a virtual environment in which to install Python packages in your /home directory, follow the steps below:

Create a virtual environment in your /home directory

1. Start an interactive job:

interact

2. Search for the Python version you would like to use:

gacrc-test@d2-13 ~$ module spider Python

3. Load the software module for the version of Python you would like to use:

gacrc-test@d2-13 ~$ module load Python/3.10.4-GCCcore-11.3.0

4. Create the Python virtual environment in your /home directory. You may find it beneficial to have a directory in your /home directory for all of your Python virtual environments.

gacrc-test@d2-13 ~$ python -m venv ~/env/mypyenv

This creates a base virtual environment to work from, in this case called "mypyenv". You should use more meaningful names for your actual Python environments. The ~/env directory, which holds all of the Python virtual environments in this example, does not have to exist when you run the above command; if it does not, the command creates it. This gives us a base Python virtual environment to work with:

gacrc-test@d2-13 ~$ ls ~/env/mypyenv/
bin  include  lib  lib64  pyvenv.cfg

gacrc-test@d2-13 ~$ ls ~/env/mypyenv/bin/
Activate.ps1  activate	activate.csh  activate.fish  pip  pip3	pip3.10  python  python3  python3.10

5. Now that the environment is created, we can activate it and install Python packages. To do so, we need to source the bin/activate file. This file sets the appropriate environment variables so that any Python packages you install will be installed in your virtual environment. To source the file and thus activate the virtual environment:

gacrc-test@d2-13 ~$ . ~/env/mypyenv/bin/activate

Note the space in between the period (source command) and the path to the activate file. Upon running this command, you will see that your command prompt has changed, showing you the name of your active environment:

gacrc-test@d2-13 ~$ . ~/env/mypyenv/bin/activate
(mypyenv) gacrc-test@d2-13 ~$

Now any Python packages you install will be installed in the "lib" directory of your Python virtual environment (whether you use pip or download the package and install it with its setup.py script) and are usable when your Python virtual environment is active.

6. To deactivate a Python virtual environment:

deactivate
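
The activate script sets the VIRTUAL_ENV environment variable, and deactivate unsets it. A script can use this to confirm that an environment is active before installing anything, as in the following sketch (require_venv is a hypothetical helper name):

```shell
# Sketch: refuse to continue unless a Python virtual environment is active.
# require_venv is a hypothetical helper name.
require_venv() {
    if [ -z "${VIRTUAL_ENV:-}" ]; then
        echo "No virtual environment is active; run: . ~/env/mypyenv/bin/activate" >&2
        return 1
    fi
    echo "Using virtual environment: $VIRTUAL_ENV"
}

# Example: only run pip when an environment is active.
# require_venv && pip install -r requirements.txt
```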

Installing Python packages in your virtual environment using pip

The first time you ever use pip in your Python virtual environment you should probably upgrade pip.

pip install --upgrade pip

Then all you have to do is pip install the package, and it will be installed in your virtual environment:

pip install <package name>

or using a requirements.txt file:

pip install -r requirements.txt

Installing Python packages in your virtual environment from the source code

If the Python package that you want to install in your virtual environment is not available through pip for some reason, you can download the source code in your home directory and run the package's setup.py script. As long as your Python virtual environment is active when you do this it will be installed in that virtual environment. Suppose you have downloaded a Python package tarball file, e.g., myPackage-1.0.tar.gz, in the source folder (src) in your /home directory with the full path being:

/home/MyID/src/myPackage-1.0.tar.gz

Change directory to /home/MyID/src/:

cd ~/src

Uncompress and untar the tarball file using "tar xzvf" command:

tar xzvf myPackage-1.0.tar.gz

cd into the new directory:

cd myPackage-1.0

Run the setup.py script:

python setup.py install

Using your Python virtual environment in a job

Once you have created your Python virtual environment and installed the packages you need, you can use it in a job on the cluster by loading the Python module you used to create the environment and then activating the environment:

module load Python/3.10.4-GCCcore-11.3.0
. ~/env/mypyenv/bin/activate
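As a rough illustration, the two lines above could sit inside a batch job script like the one written by the sketch below. It assumes the cluster scheduler is Slurm; the partition name, resource values, file name venv_job.sh, and the script myscript.py are all placeholders rather than recommended settings.

```shell
# Write a minimal batch script; every #SBATCH value is a placeholder.
cat > venv_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=venv_test
#SBATCH --partition=batch
#SBATCH --ntasks=1
#SBATCH --mem=4G
#SBATCH --time=01:00:00

# Load the same Python module that was used to create the environment.
module load Python/3.10.4-GCCcore-11.3.0
# Activate the environment, then run a (hypothetical) script with it.
. ~/env/mypyenv/bin/activate
python myscript.py
EOF

# Submit from the login node with: sbatch venv_job.sh
```

Loading the module before activating matters because the environment's python depends on the interpreter provided by that module.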


How to install Conda packages

To install a software package as a conda environment in your Sapelo2 home directory, use the "-p" option to define the installation path of the environment, e.g., /home/MyID/busco_conda.

For example, to install a busco v5.7.1 conda environment at /home/MyID/busco_conda:

module load Miniforge3/24.7.1-0
conda create -p /home/MyID/busco_conda -c bioconda  busco=5.7.1

where bioconda is the channel from which the busco v5.7.1 package is downloaded. Other common conda channels include conda-forge, r, defaults, and qiime2. Please refer to Conda channels.

If no busco version is specified, the most recent version of busco will be downloaded and installed, i.e.,

module load Miniforge3/24.7.1-0
conda create -p /home/MyID/busco_conda -c bioconda busco

To list conda environments you installed:

module load Miniforge3/24.7.1-0
conda env list

To activate your conda environment, pass the environment path to the source activate command:

module load Miniforge3/24.7.1-0
source activate /home/MyID/busco_conda

Once a conda environment is activated, you can list the packages installed in it, with their names, versions, and source channels. For example, to list the packages installed in the busco v5.7.1 conda environment:

module load Miniforge3/24.7.1-0
source activate /home/MyID/busco_conda
conda list

You can install new packages into an existing environment. For example, to install the scipy v1.14.1 package into the busco v5.7.1 conda environment:

module load Miniforge3/24.7.1-0
source activate /home/MyID/busco_conda
conda install scipy=1.14.1

Likewise, if the scipy version 1.14.1 is not specified, the most recent version of scipy will be downloaded and installed for you.

To remove an installed package, e.g., the scipy package, from your conda environment:

module load Miniforge3/24.7.1-0
source activate /home/MyID/busco_conda
conda remove scipy

To deactivate a conda environment that is currently activated:

conda deactivate

Once a conda environment is deactivated, its conda layer will be removed from your Linux shell environment.
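Since activation and deactivation change the shell environment, a script may want to confirm which environment is active before it runs anything. The sketch below relies on the CONDA_PREFIX variable that conda sets when an environment is activated; the helper name conda_env_is is hypothetical and not part of conda.

```shell
# Succeed only if the active conda environment is the expected one.
# conda_env_is is a hypothetical helper name, not a conda command.
conda_env_is() {
    [ "${CONDA_PREFIX:-}" = "$1" ]
}

if conda_env_is /home/MyID/busco_conda; then
    echo "busco_conda is active"
else
    echo "busco_conda is not active; run: source activate /home/MyID/busco_conda"
fi
```

The comparison is a plain string match, so the path given to the helper should be written the same way as the path used with "-p" when the environment was created.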

If you would like to use your conda environment in a Jupyter Notebook, please see Using a Conda environment in Jupyter.



How to install R packages

If you wish to install an R package in your home directory, first decide the directory where you will install it. For example, to install it in ~/Rlibs, first create this dir if it does not exist yet:

mkdir ~/Rlibs

Then, create a file called ~/.Renviron containing the following line:

R_LIBS_USER=/path/to/Rlibs

Please replace /path/to/Rlibs with the path that you want to use, for example /home/MyID/Rlibs, where MyID needs to be replaced by your UGA MyID.
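The directory and the ~/.Renviron entry can also be set up from the command line. The sketch below assumes the example location ~/Rlibs and appends the setting only when ~/.Renviron does not already define R_LIBS_USER, so an existing value is left untouched.

```shell
# Directory for personal R libraries (the example path used above).
rlibs="$HOME/Rlibs"
mkdir -p "$rlibs"

# Append the setting only if ~/.Renviron does not define R_LIBS_USER yet.
renviron="$HOME/.Renviron"
if ! grep -q '^R_LIBS_USER=' "$renviron" 2>/dev/null; then
    echo "R_LIBS_USER=$rlibs" >> "$renviron"
fi

# Show the value that R will pick up the next time it starts.
grep '^R_LIBS_USER=' "$renviron"
```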

Start an interactive session and load the module for the version of R you want to use (e.g. R/4.3.1-foss-2022a):

interact
module load R/4.3.1-foss-2022a

Install R package using R command line

If you have downloaded the R package tarball (e.g. brocolors_0.1.tar.gz) to your home directory, you can install it with R CMD INSTALL at the command line, using the flag --library=/path/to/Rlibs, as follows:

R CMD INSTALL --library=/path/to/Rlibs brocolors_0.1.tar.gz

Note that the path given to the --library flag should be the same path set in ~/.Renviron as described above.

Install R package in an interactive R session

You can install a package in an interactive R session using install.packages(). To use this method, you need to create the ~/.Renviron file as described above. For example, to open an interactive R session and install the ggplot2 package into the path given in ~/.Renviron:

$ R

R version 4.3.1 (2023-06-16) -- "Beagle Scouts"
Copyright (C) 2023 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> install.packages('ggplot2')

You will be prompted to select a CRAN mirror site from which to download the package source. Usually we select site 72: USA (OH) [https]. If one site does not work well for you, try another.



How to install Java applications

Most third-party Java applications are distributed as pre-compiled binaries (jar files), which users can download into their own home directories. For example, to install picard 2.4.1 in your home directory:

1. Create a directory where you want to install the Java application (e.g. picard). For example:

mkdir -p ~/apps/picard

2. Download the package (e.g. picard-tools-2.4.1.zip) from the application's website, put it into your chosen directory (e.g. ~/apps/picard), and extract the file. For example:

cd ~/apps/picard

unzip picard-tools-2.4.1.zip

This will create a directory called picard-tools-2.4.1 that contains picard.jar. You can rename this directory, e.g.:

cd ~/apps/picard

mv picard-tools-2.4.1  2.4.1

3. To run this application, load a Java module and invoke the application in your job submission script. For example:

module load Java/15.0.1

java -Xmx20g -classpath "/home/MyID/apps/picard/2.4.1" -jar /home/MyID/apps/picard/2.4.1/picard.jar [options]

where MyID needs to be replaced by your own MyID.



How to build complex applications

Many applications use a configure step that checks system libraries and the paths to dependencies in order to create Makefiles. The Makefiles are then used to build and install the application. To illustrate how this process is typically set up, we describe how to install the GNU Scientific Library (GSL) v2.6 using the GNU 11.3.0 compilers.

1. Create a directory to use for building the application, e.g. ~/src/gsl

mkdir -p ~/src/gsl

2. Download the source tarball into your chosen directory, e.g.

cd ~/src/gsl

wget http://mirror.sbb.rs/gnu/gsl/gsl-2.6.tar.gz

3. Start an interactive session

interact

4. Load the module for the compiler suite that you want to use, e.g. GCC 11.3.0:

module load foss/2022a

5. Change into your working directory and check the tarball with

cd ~/src/gsl

tar ztvf gsl-2.6.tar.gz

and extract the files with

tar zxvf gsl-2.6.tar.gz 

6. Change into the extracted directory

cd gsl-2.6

7. Create a subdirectory to build the application

mkdir build_gcc1130

cd build_gcc1130

8. Check all the configure options with

../configure --help

9. Configure the application with e.g.

../configure --prefix=/home/MyID/apps/gsl/2.6/gcc1130

where /home/MyID/apps/gsl/2.6/gcc1130 should be replaced by the directory where you want to install GSL.

You can capture the standard error and standard output of the configure step into a file, to help troubleshoot the step if it encounters any issues. This can be done with

../configure --prefix=/home/MyID/apps/gsl/2.6/gcc1130 2>&1 | tee my.config.log

10. If the configure step worked, you can build the application with

make 2>&1 | tee my.make.log

11. Some applications, including GSL, provide some tests that can be run after the build step to ensure the application was built correctly. For example, for GSL you can run

make check 2>&1 | tee my.check.log

12. Install the application with

make install 2>&1 | tee my.install.log
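After the install step it can be useful to check that the new build is the one being picked up. The sketch below assumes the prefix from the configure example, written here as $HOME/apps/gsl/2.6/gcc1130, and uses the gsl-config script that GSL installs in its bin directory; adjust the prefix if you installed elsewhere.

```shell
# Install prefix from the configure step; adjust to your own path.
gsl_prefix="$HOME/apps/gsl/2.6/gcc1130"

# Put the installed tools and shared libraries on the search paths.
export PATH="$gsl_prefix/bin:$PATH"
export LD_LIBRARY_PATH="$gsl_prefix/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# gsl-config reports the installed version and the compile and link flags.
if [ -x "$gsl_prefix/bin/gsl-config" ]; then
    "$gsl_prefix/bin/gsl-config" --version
    "$gsl_prefix/bin/gsl-config" --cflags --libs
else
    echo "gsl-config not found under $gsl_prefix/bin; check my.install.log"
fi
```

The flags printed by gsl-config can be passed to the compiler when building other software against this GSL installation.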


How to link dependencies

Some applications need third-party programs, libraries, or header files in order to build. There are various ways to make these available to the build command.

Usually the manual or README of the application describes the configuration variables it needs. You can also list the available options from the command line by passing a help option to the main build command.

For a CMake build:

cmake --help

Or, for a configure-based build:

./configure -h 


The simplest way to make the needed components available in the build environment is to load their modules before building:

module load CMake/3.24.3-GCCcore-11.3.0
module load zlib/1.2.12-GCCcore-11.3.0

cmake -DCMAKE_INSTALL_PREFIX:PATH=/home/MyID/app/diamond/1.0 ...[skipped]

make install 

The two module load lines make CMake and zlib available in the build environment.

The cmake line sets the installation prefix; other variables can be defined there as well.

Alternatively, set the variables explicitly with the export command:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/apps/eb/GSL/2.7-GCC-11.3.0/lib

export CFLAGS="-I/apps/eb/GSL/2.7-GCC-11.3.0/include"

export LDFLAGS="-L/apps/eb/GSL/2.7-GCC-11.3.0/lib"

export LIBS="-lgsl"

python setup.py build_ext --inplace

Line 1: gives the location of the GSL shared library to the dynamic loader at run time.

Line 2: gives the location of the GSL C header files to the C compiler at compile time.

Lines 3 and 4: give the location and the name of the GSL library to the linker at link time.
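One detail worth noting about the LD_LIBRARY_PATH line: if the variable is empty beforehand, the result starts with a colon, which the dynamic loader treats as the current directory. The sketch below is one way to avoid that and to skip duplicate entries; append_path is a hypothetical helper name, and the GSL path is the one used in the example above.

```shell
# Append a directory to a colon-separated variable, skipping duplicates
# and avoiding an empty leading entry when the variable is unset.
# append_path is a hypothetical helper, not a standard command.
append_path() {
    var_name=$1
    new_dir=$2
    eval "current=\${$var_name:-}"
    case ":$current:" in
        *":$new_dir:"*) ;;                              # already present
        *) current="${current:+$current:}$new_dir" ;;   # append
    esac
    eval "$var_name=\$current"
    export "$var_name"
}

append_path LD_LIBRARY_PATH /apps/eb/GSL/2.7-GCC-11.3.0/lib
echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH"
```

The same helper works for other colon-separated variables such as PATH.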

Another way is to define the variables on the configure command line:

./configure --prefix=/home/MyID/app/diamond/1.0 --with-jemalloc=/apps/eb/jemalloc/5.3.0-GCCcore-11.3.0/lib


How to download Singularity images

Singularity images can be searched for and downloaded from the Singularity Container Library. For example, to pull the Trinity v2.9.1 Singularity image built for the amd64 (default) architecture:

singularity pull --arch amd64 library://colinsauze/default/trinity:v2.9.1

Docker is the most widely known container system and has a larger ecosystem than Singularity. However, Docker was initially designed for ephemeral servers; by default it isolates the running container as much as possible, which makes it poorly suited to an HPC environment. Singularity is also a container runtime, but it favors integration over isolation. Singularity can also import images from Docker registries. You can search for a Docker image on Docker Hub; if the image you need is there, you can pull it and build it into a Singularity image.

For example, to pull the latest Trinity image from Docker Hub and build it into a Singularity image:

singularity pull docker://trinityrnaseq/trinityrnaseq:latest
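Once an image has been pulled, a program inside it can be run with singularity exec. The sketch below wraps that call in a hypothetical helper, run_in_image, that fails early if the image file is missing. It assumes the pull above wrote trinityrnaseq_latest.sif into the current directory and that the Trinity executable is on the PATH inside the image.

```shell
# Run a command inside a pulled image, failing early if the file is missing.
# run_in_image is a hypothetical helper name, not a Singularity command.
run_in_image() {
    image=$1
    shift
    if [ ! -f "$image" ]; then
        echo "image not found: $image" >&2
        return 1
    fi
    singularity exec "$image" "$@"
}

# Print the Trinity version from inside the container, if the image exists.
run_in_image trinityrnaseq_latest.sif Trinity --version \
    || echo "pull the image first, or check that Singularity is available"
```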

Detailed instructions on how to build a Singularity container can be found at the GACRC Singularity wiki page or at Singularity Build a Container.

