Module system#
At NHPCC, a module system is used to load the many installed software packages and libraries. This gives users access to multiple versions of an installed package or library. We use the Lmod utility on both the login and compute nodes to manage modules. Using modules, users can easily and automatically load/unload the `PATH`s and other environment variables required by a software package/library and its dependencies.
Why do you need a module environment system?
Whenever users log in to a GNU/Linux OS, they get a login shell, and the shell uses environment variables to execute commands and run software. The most common ones are:

- `PATH`: the list of directories in which the OS looks for executable files;
- `MANPATH`: the list of directories in which `man` searches for the man pages;
- `LD_LIBRARY_PATH`: the list of directories in which the OS looks for the `*.so` shared libraries that software needs at runtime.

In addition, there are application-specific environment variables such as `CPATH`, `LIBRARY_PATH`, `SYSTEM`, etc.
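To see what these variables currently contain in your own shell, you can simply print them; a minimal check using standard shell commands:

```bash
# Print the directories searched for executables, one per line
echo "$PATH" | tr ':' '\n'

# Print the runtime library search path (may be empty by default)
echo "$LD_LIBRARY_PATH" | tr ':' '\n'
```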
A typical way to set up environment variables is by customizing the shell initialization files, such as `/etc/profile`, `.bash_profile`, and `.bashrc`.
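For example, making a single manually installed package usable might require lines like the following in `.bashrc` (the install path here is purely illustrative):

```bash
# Hypothetical manual setup for one locally installed package
export PATH="$HOME/apps/fftw/3.3.10/bin:$PATH"
export LD_LIBRARY_PATH="$HOME/apps/fftw/3.3.10/lib:$LD_LIBRARY_PATH"
export MANPATH="$HOME/apps/fftw/3.3.10/share/man:$MANPATH"
```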
But this kind of manual setup is very hard, and practically impossible to maintain, on an HPC facility, which is a multi-user system with many software packages installed in several versions. Since setting and changing environment variables by hand does not scale, many HPC centers use Lmod as their environment modules management system. Lmod is a tool that simplifies shell initialization and lets users easily modify their environment during a session with modulefiles.
Each modulefile contains the information needed to configure the shell for running a software package or using a library. Typically, modulefiles tell the `module` command how to change or set shell environment variables such as `PATH`, `MANPATH`, etc.
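To see exactly which variables a given modulefile manipulates, Lmod provides the `show` sub-command; for example (the module name here is only an illustration and may differ on NHPCC):

```bash
# Display the environment changes a modulefile would make, without loading it
module show GCC
# or, with the short alias
ml show GCC
```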
Modules can be loaded and unloaded dynamically and cleanly. All popular shells are supported, including `bash`, `ksh`, `zsh`, `sh`, `csh`, and `tcsh`, as well as some scripting languages such as `python`, `ruby`, `tcl`, `cmake`, and `R`. Modules can also be bundled into metamodules that load an entire suite of different modules. This is how we manage the NHPCC Software Set.
How to use the module system#
The module system supports several commands for working with modulefiles. For convenience, the shorter `ml` alias can be used, with a slightly different syntax.
Module names auto-completion
The `module` command supports auto-completion, so you can just start typing the name of a module and press Tab to let the shell automatically complete the module name and/or version.
| Module command | Short version | Description |
|---|---|---|
| `module avail` | `ml av` | List available software |
| `module spider fftw` | `ml spider fftw` | Search for a particular library/software (here FFTW); the search is case-insensitive |
| `module keyword lapack` | `ml key lapack` | Search for `lapack` in module names and descriptions |
| `module whatis ScaLAPACK` | `ml whatis ScaLAPACK` | Display information about the ScaLAPACK module |
| `module help ScaLAPACK` | `ml help ScaLAPACK` | Display module-specific help |
| `module load ScaLAPACK` | `ml ScaLAPACK` | Load a module to use the associated software |
| `module load CUDA/11.4` | `ml CUDA/11.4` | Load a specific version of a module |
| `module unload CUDA` | `ml -CUDA` | Unload a module |
| `module swap gcc icc` | `ml -gcc icc` | Swap a module (unload `gcc` and replace it with `icc`) |
| `module purge` | `ml purge` | Remove all loaded modules |
| `module save foo` | `ml save foo` | Save the state of all loaded modules in a collection named `foo` |
| `module restore foo` | `ml restore foo` | Restore the state of saved modules from the `foo` collection |
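As a sketch of a typical session, the commands below search for a package, load it, and save the resulting set of modules as a collection; the package and collection names are only examples and may differ on NHPCC:

```bash
# Find out how a package is provided and which versions exist
ml spider fftw

# Load a compiler and the package, then list what is currently loaded
ml GCC
ml FFTW
ml

# Save the currently loaded modules so they can be restored in a later session
ml save mysim
ml restore mysim
```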
Additional module sub-commands are documented in the `module help` command. For a complete reference, please refer to the official Lmod documentation.
Architecture-dependent software/libraries
Currently, at NHPCC we have two CPU architectures: the newer AMD EPYC 7542 and the older AMD Opteron 6174. We have tried to compile most, if not all, of the software/libraries separately for each node architecture, in two separate `MODULEPATH`s:

- `/share/apps/modules/all` for the AMD EPYC nodes
- `/opt/share/modules/all` for the AMD Opteron nodes
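You can check which module tree is active on the node you are currently using; a quick check with standard Lmod and shell commands:

```bash
# Show which module directories are currently searched
echo "$MODULEPATH" | tr ':' '\n'

# List the modules available from those directories
ml av
```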
The correct path is applied automatically when users submit their jobs. But in the earlier steps (e.g. writing, compiling, and/or testing code on the login node) the architecture matters, so we strongly recommend that users request an appropriate node via an interactive job for compiling and testing their software.
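For instance, an interactive shell on a compute node can be requested through Slurm; a minimal sketch, where the partition name and resource values are placeholders to adjust to your needs:

```bash
# Request one task on a compute node and open an interactive shell there
srun --partition=epyc --ntasks=1 --time=01:00:00 --pty bash

# Once on the node, the architecture-specific module tree is in effect
ml av
```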
Loading modules in Slurm batch scripts#
If you need to load modules to run your code, you also need to load them in your Slurm batch scripts. You can always run any GNU/Linux command after the last `#SBATCH` directive, and this certainly includes the `module` command. As an example, let's write a simple Fortran program, compile it, and then write a batch script to run it.
- Write a simple program, for example:

  ```fortran
  program HelloWorld
    ! Declare variables
    character(len=20) :: message

    ! Initialize variables
    message = "Hello, World!"

    ! Print the message
    print *, message
  end program HelloWorld
  ```
- On the login node, load the `GCC` module by running `ml GCC`. You can check which modules are currently loaded by running `ml` with no arguments.
- Compile your code:

  ```bash
  gfortran hello_world.f90 -o hello_world.exe
  ```
- Write a batch script for submitting your job. This could be:

  ```bash
  #!/bin/bash
  #SBATCH ...
  #SBATCH ...
  #SBATCH ...

  ml purge
  ml load GCC

  srun ./hello_world.exe ...
  ```
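For reference, a filled-in version of such a script might look like the sketch below; the job name, partition, and resource values are assumptions and should be replaced with values appropriate for your account and NHPCC's partitions:

```bash
#!/bin/bash
#SBATCH --job-name=hello        # illustrative job name
#SBATCH --partition=epyc        # hypothetical partition name; check sinfo for the real ones
#SBATCH --ntasks=1              # a single task is enough for this serial program
#SBATCH --time=00:05:00         # short wall-time limit

# Start from a clean environment, then load the same compiler used to build the code
ml purge
ml load GCC

srun ./hello_world.exe
```

Submit the script with `sbatch` (e.g. `sbatch job.sh`, where the filename is whatever you saved the script as) and check its status with `squeue -u $USER`.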