Module system#
At NHPCC, a module system is used to load the many installed software packages and libraries. This gives users access to multiple versions of an installed package or library. We use the Lmod utility on both the login and compute nodes to manage modules. Using modules, users can easily and automatically load/unload the `PATH`s and other environment variables required by a software package/library and its dependencies.
Why do you need a module environment system?
Whenever users log in to a GNU/Linux OS, they get a login shell, and the shell uses environment variables to execute commands and run software. The most common ones are:

- `PATH`: the list of directories in which the OS looks for executable files;
- `MANPATH`: the list of directories in which `man` searches for the man pages;
- `LD_LIBRARY_PATH`: the list of directories in which the OS looks for the `*.so` shared libraries that software needs at runtime.

In addition, there are application-specific environment variables such as `CPATH`, `LIBRARY_PATH`, `SYSTEM`, etc.
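To see what these variables currently contain in your own shell, you can simply print them; a minimal check using standard shell commands:

```bash
# Print the directories searched for executables, one per line
echo "$PATH" | tr ':' '\n'

# Print the runtime library search path (may be empty by default)
echo "$LD_LIBRARY_PATH" | tr ':' '\n'
```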
A typical way to set up environment variables is by customizing the shell initialization files, such as `/etc/profile`, `.bash_profile`, and `.bashrc`.
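For example, making a single manually installed package usable might require lines like the following in `.bashrc` (the install path here is purely illustrative):

```bash
# Hypothetical manual setup for one locally installed package
export PATH="$HOME/apps/fftw/3.3.10/bin:$PATH"
export LD_LIBRARY_PATH="$HOME/apps/fftw/3.3.10/lib:$LD_LIBRARY_PATH"
export MANPATH="$HOME/apps/fftw/3.3.10/share/man:$MANPATH"
```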
But this kind of manual setup is very hard, and practically impossible to maintain, on an HPC facility, which is a multi-user system with many software packages installed in several versions. Since setting and changing environment variables by hand does not scale, many HPC centers use Lmod as their environment modules management system. Lmod is a tool that simplifies shell initialization and lets users easily modify their environment during a session with modulefiles.
Each modulefile contains the information needed to configure the shell for running a software package or using a library. Typically, modulefiles tell the `module` command how to change or set shell environment variables such as `PATH`, `MANPATH`, etc.
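To see exactly which variables a given modulefile manipulates, Lmod provides the `show` sub-command; for example (the module name here is only an illustration and may differ on NHPCC):

```bash
# Display the environment changes a modulefile would make, without loading it
module show GCC
# or, with the short alias
ml show GCC
```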
Modules can be loaded and unloaded dynamically and cleanly. All popular shells are supported, including `bash`, `ksh`, `zsh`, `sh`, `csh`, and `tcsh`, as well as some scripting languages such as `python`, `ruby`, `tcl`, `cmake`, and `R`. Modules can also be bundled into metamodules that load an entire suite of different modules. This is how we manage the NHPCC Software Set.
How to use the module system#
The module system supports several commands for working with modulefiles. For convenience, the shorter `ml` alias can be used, with a slightly different syntax.
Module names auto-completion
The `module` command supports auto-completion, so you can just start typing the name of a module and press Tab to let the shell automatically complete the module name and/or version.
| Module command | Short version | Description |
|---|---|---|
| `module avail` | `ml av` | List available software |
| `module spider fftw` | `ml spider fftw` | Search for a particular library/software (here FFTW); the search is case-insensitive |
| `module keyword lapack` | `ml key lapack` | Search for `lapack` in module names and descriptions |
| `module whatis ScaLAPACK` | `ml whatis ScaLAPACK` | Display information about the ScaLAPACK module |
| `module help ScaLAPACK` | `ml help ScaLAPACK` | Display module-specific help |
| `module load ScaLAPACK` | `ml ScaLAPACK` | Load a module to use the associated software |
| `module load CUDA/11.4` | `ml CUDA/11.4` | Load a specific version of a module |
| `module unload CUDA` | `ml -CUDA` | Unload a module |
| `module swap gcc icc` | `ml -gcc icc` | Swap a module (unload `gcc` and replace it with `icc`) |
| `module purge` | `ml purge` | Remove all loaded modules |
| `module save foo` | `ml save foo` | Save the state of all loaded modules in a collection named `foo` |
| `module restore foo` | `ml restore foo` | Restore the state of saved modules from the `foo` collection |
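As a sketch of a typical session, the commands below search for a package, load it, and save the resulting set of modules as a collection; the package and collection names are only examples and may differ on NHPCC:

```bash
# Find out how a package is provided and which versions exist
ml spider fftw

# Load a compiler and the package, then list what is currently loaded
ml GCC
ml FFTW
ml

# Save the currently loaded modules so they can be restored in a later session
ml save mysim
ml restore mysim
```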
Additional module sub-commands are documented in the `module help` command. For a complete reference, please refer to the official Lmod documentation.
Architecture-dependent software/libraries
Currently, at NHPCC we have two CPU architectures: the newer AMD EPYC 7542 and the older AMD Opteron 6174. We have tried to compile most, if not all, of the software/libraries separately for each node architecture, in two separate `MODULEPATH`s:

- `/share/apps/modules/all` for the AMD EPYC nodes
- `/opt/share/modules/all` for the AMD Opteron nodes
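You can check which module tree is active on the node you are currently using; a quick check with standard Lmod and shell commands:

```bash
# Show which module directories are currently searched
echo "$MODULEPATH" | tr ':' '\n'

# List the modules available from those directories
ml av
```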
The correct path is applied automatically when users submit their jobs. But in the earlier steps (e.g. writing, compiling, and/or testing code on the login node) the architecture matters, so we strongly recommend that users request an appropriate node via an interactive job for compiling and testing their software.
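For instance, an interactive shell on a compute node can be requested through Slurm; a minimal sketch, where the partition name and resource values are placeholders to adjust to your needs:

```bash
# Request one task on a compute node and open an interactive shell there
srun --partition=epyc --ntasks=1 --time=01:00:00 --pty bash

# Once on the node, the architecture-specific module tree is in effect
ml av
```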
Loading modules in Slurm batch scripts#
If you need to load modules to run your code, you also need to load them in your Slurm batch scripts. You can always run any GNU/Linux command after the last `#SBATCH` directive, and this certainly includes the `module` command. As an example, let's write a simple Fortran program, compile it, and then write a batch script to run it.
- Write a simple program, for example:

  ```fortran
  program HelloWorld
    ! Declare variables
    character(len=20) :: message

    ! Initialize variables
    message = "Hello, World!"

    ! Print the message
    print *, message
  end program HelloWorld
  ```
- On the login node, load the `GCC` module by running `ml GCC`. You can check which modules are currently loaded by running `ml` with no arguments.
- Compile your code:

  ```bash
  gfortran hello_world.f90 -o hello_world.exe
  ```
- Write a batch script for submitting your job. This could be:

  ```bash
  #!/bin/bash
  #SBATCH ...
  #SBATCH ...
  #SBATCH ...

  ml purge
  ml load GCC

  srun ./hello_world.exe ...
  ```
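For reference, a filled-in version of such a script might look like the sketch below; the job name, partition, and resource values are assumptions and should be replaced with values appropriate for your account and NHPCC's partitions:

```bash
#!/bin/bash
#SBATCH --job-name=hello        # illustrative job name
#SBATCH --partition=epyc        # hypothetical partition name; check sinfo for the real ones
#SBATCH --ntasks=1              # a single task is enough for this serial program
#SBATCH --time=00:05:00         # short wall-time limit

# Start from a clean environment, then load the same compiler used to build the code
ml purge
ml load GCC

srun ./hello_world.exe
```

Submit the script with `sbatch` (e.g. `sbatch job.sh`, where the filename is whatever you saved the script as) and check its status with `squeue -u $USER`.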