Python#
Python is a dynamic, very popular, open-source programming language. It is widely used in HPC software and related workflows, and it is well known for its broad collection of libraries and its large worldwide community.
Python at NHPCC#
Interactive mode#
To use Python in interactive mode, request an interactive job, load the Python module, and start the interpreter:
u111111@login1:~> srun -n 1 --mem=4G -p amd128 -t 10 --pty /bin/bash
srun: job 60242 queued and waiting for resources
srun: job 60242 has been allocated resources
u111111@en-1-3:~>
u111111@en-1-3:~> ml Python
u111111@en-1-3:~> python
Python 3.11.5 (main, Feb 17 2024, 15:35:29) [GCC 13.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
To see which Python versions are available, type ml Python and press Tab twice. For example,
u111111@en-1-3:~> ml Python
Python Python/3.10.4-GCC-11 Python/3.10.4-GCCcore-12.2.0 Python/3.7.4-GCCcore-8.3.0
Python/2.7.15 Python/3.10.4-GCC-11-bare Python/3.10.4-GCCcore-12.2.0-bare Python/3.8.6-GCCcore-10.2.0
Python/2.7.18-GCC-11-bare Python/3.10.4-GCCcore-11 Python/3.10.8-GCCcore-12.2.0 Python/3.9.6-GCCcore-11.2.0
Python/2.7.18-GCCcore-10.2.0 Python/3.10.4-GCCcore-11.3.0 Python/3.10.8-GCCcore-12.2.0-bare Python/3.9.6-GCCcore-11.2.0-bare
Python/2.7.18-GCCcore-11.2.0-bare Python/3.10.4-GCCcore-11.3.0-bare Python/3.11.5-GCCcore-13.2.0 Python-bundle-PyPI
Python/2.7.18-GCCcore-12.2.0-bare Python/3.10.4-GCCcore-11-bare Python/3.7.4 Python-bundle-PyPI/2023.06-GCCcore-12.2.0
See the Module section for more information on the module command and its abbreviation ml. You can always check the software list section for an up-to-date list of the available Python versions.
Using Python in batch scripts#
You can also load a Python module in a job script. The script could be written as:
#!/bin/bash
#SBATCH ...
#SBATCH ...
#SBATCH ...
ml purge
ml load Python
./hello_world.py
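The job script above runs a file named hello_world.py whose contents are not shown. A minimal sketch of what such a file could look like follows; the greeting text and the greeting() helper are illustrative assumptions, not part of the original page. SLURM_JOB_ID is the environment variable Slurm sets inside a job.

```python
#!/usr/bin/env python
"""Hypothetical hello_world.py for the batch script above."""
import os
import platform
import socket


def greeting() -> str:
    # SLURM_JOB_ID exists only inside a Slurm job; fall back otherwise.
    job_id = os.environ.get("SLURM_JOB_ID", "not in a Slurm job")
    return (
        f"Hello from {socket.gethostname()} "
        f"(Python {platform.python_version()}, job: {job_id})"
    )


if __name__ == "__main__":
    print(greeting())
```

Because the job script calls ./hello_world.py directly, the file needs the shebang line shown above and the executable bit (chmod +x hello_world.py). Otherwise, run it as python hello_world.py.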
Using Anaconda distribution#
You can use the Anaconda Python distribution instead of the Python modules. Anaconda may be more suitable for scientific and data science workflows. To load the default version, run:
ml purge
ml Anaconda3
which python
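The which python check can also be done from inside Python itself, which helps when several modules or environments are loaded. The sketch below uses only the standard sys module; the interpreter_info() helper name is an illustrative choice.

```python
import sys


def interpreter_info() -> dict:
    """Return the path, version and install prefix of the running interpreter."""
    return {
        "executable": sys.executable,
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
        "prefix": sys.prefix,
    }


for key, value in interpreter_info().items():
    print(f"{key:>10}: {value}")
```

With the Anaconda3 module loaded, the executable and prefix would be expected to point inside the Anaconda installation directory rather than a Python module directory.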
To see all the pre-installed Anaconda packages and their versions, use the conda list command:
u111111@en-7-5:~> conda list
# packages in environment at /share/apps/eb/Anaconda3/2024.02-1:
#
# Name Version Build Channel
_anaconda_depends 2024.02 py311_mkl_1
_libgcc_mutex 0.1 main
_openmp_mutex 5.1 1_gnu
abseil-cpp 20211102.0 hd4dd3e8_0
aiobotocore 2.7.0 py311h06a4308_0
aiohttp 3.9.3 py311h5eee18b_0
aioitertools 0.7.1 pyhd3eb1b0_0
aiosignal 1.2.0 pyhd3eb1b0_0
alabaster 0.7.12 pyhd3eb1b0_0
...
The Anaconda Python distribution is system software. This means that you can use any of its packages, but you cannot modify them (for example, by upgrading them) and you cannot install new ones in their location. You can, however, install whatever packages you want in custom environments in your home directory. The two most popular package managers for installing Python packages are conda and pip. These tools automate the installation process, including resolving dependencies, compiling (pip only, and only for packages that are not pure Python), and copying the files to the correct path.
Conda enables you to easily install complex packages and software. Creating multiple environments lets you keep different versions of the same software, or otherwise incompatible software collections, installed at the same time. You can easily share a list of the installed packages with collaborators or colleagues, so they can set up the same environment in a matter of minutes.
Unlike pip, conda serves as both a package and environment manager. It is not limited to a single programming language, supporting packages for Python, R, Fortran, and more. Conda primarily uses the main channel of Anaconda Cloud for installations, but it can also access other channels like bioconda, intel, r, and conda-forge. It always installs pre-compiled binary files, which often offer better performance, for example when numerical packages are built against Intel MKL. Below is an example of creating an environment and installing packages in it:
ml purge
ml Anaconda3/2024.02-1
conda create --name myenv <package-1> <package-2> ... <package-N>
conda activate myenv
Installing packages#
Internet access
To install packages, you need to be on a node with internet access, so you must submit your job to the short partition. Please note that jobs in the short partition have a 30-minute time limit.
pip#
To install Python packages using pip, you should use the --user option. This ensures that the packages are installed in a user-writable location, which is typically under your home directory. As your home directory is shared across the nodes of the cluster, you only need to install a Python package once, and it will be ready to use on every node. Let's install a very small test package called "pip-install-test":
ml Python/3.10.8-GCCcore-12.2.0
pip install --user pip-install-test
This package will be installed under $HOME/.local/lib/python<version>/site-packages/ and can be imported as:
import pip_install_test
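To confirm where --user installs end up for the Python module you have loaded, you can ask the standard site module. This is a minimal sketch; the user_install_locations() helper name is illustrative.

```python
import site
import sys


def user_install_locations() -> dict:
    """Report the per-user install directories used by 'pip install --user'."""
    user_site = site.getusersitepackages()
    return {
        "user_base": site.getuserbase(),      # typically $HOME/.local on Linux
        "user_site": user_site,               # .../site-packages for this Python version
        "on_sys_path": user_site in sys.path, # False if the directory does not exist yet
    }


for key, value in user_install_locations().items():
    print(f"{key:>12}: {value}")
```

The user_site path contains the Python major.minor version, so packages installed with --user under one Python module are not visible to a module with a different major.minor version.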
To list the installed packages:
pip list -v
You can use a requirements.txt file to install a list of packages:
pip install --user -r requirements.txt
It's possible to upgrade a package:
pip install --user --upgrade <package_name>
or upgrade all the packages listed in the requirements file:
pip install --user --upgrade -r requirements.txt
To uninstall a package:
pip uninstall <package_name>
pip uninstall -r requirements.txt
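After installing from a requirements file, it can be useful to check that the installed versions match the pinned ones. The sketch below is one possible approach using importlib.metadata from the standard library (Python 3.8 or newer). It assumes plain name==version lines only; other requirement syntaxes (ranges, extras, environment markers, options) are skipped or not handled. The check_pins() helper is hypothetical.

```python
from importlib import metadata


def check_pins(lines):
    """Compare 'name==version' pins with the installed versions.

    Returns {name: (pinned_version, installed_version_or_None)}.
    """
    report = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line:
            continue  # blank line, option, or unpinned requirement
        name, pinned = (part.strip() for part in line.split("==", 1))
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        report[name] = (pinned, installed)
    return report


# Example with in-memory lines; for a real file use:
#   with open("requirements.txt") as fh: report = check_pins(fh)
example = ["# test pins", "pip-install-test==0.5"]
for name, (pinned, installed) in check_pins(example).items():
    status = "ok" if pinned == installed else "mismatch or missing"
    print(f"{name}: pinned {pinned}, installed {installed} -> {status}")
```

The version 0.5 in the example is only a placeholder pin, not a statement about the package's current release.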
Using virtual environments#
If you want to install many packages, build a sophisticated project, or, more importantly (believe me!), reproduce your work on any other computer at any time (even after all the oceans have evaporated because the sun has turned into a red giant!), consider using virtual environments. After loading the environment module for your desired Python version, run:
mkdir projectA
cd projectA
python -m venv env
To activate it:
source env/bin/activate
To check that your environment is working:
(env) u111111@en-7-5:~/projectA> pip list
Package Version
---------- -------
pip 23.2.1
setuptools 65.5.0
[notice] A new release of pip is available: 23.2.1 -> 24.2
[notice] To update, run: pip install --upgrade pip
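Now you can install packages into this environment with pip as before, but without the --user option (pip refuses --user installs inside a virtual environment created this way). Finally, to deactivate it, run: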
deactivate
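Inside a job script it is easy to forget to activate the environment. A script can check this itself: in an environment created with python -m venv, sys.prefix points at the environment while sys.base_prefix points at the Python installation it was created from. The in_virtualenv() helper name below is illustrative.

```python
import sys


def in_virtualenv() -> bool:
    """True when the interpreter runs inside a venv-style virtual environment."""
    # The two prefixes are equal when no virtual environment is active.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
print("sys.prefix      =", sys.prefix)
print("sys.base_prefix =", sys.base_prefix)
```

This check applies to venv environments only; conda environments are separate Python installations and are not detected this way.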
Conda#
If you have loaded any of the Anaconda3 modules, it is recommended to use the conda package manager to install the other packages you need.
As an example, let us install the yt package. After loading Anaconda3/2024.02-1, run:
conda create --name myproject yt
After a few seconds, conda reports that it wants to download and install many packages (~ 77 MB), including python-3.11.9. Since the default Python version of the loaded Anaconda module is 3.11.7 and the Python patch version does not matter for installing yt, we pin the Python version explicitly to reduce the download size:
conda create --name myproject yt python=3.11.7
Now the download size is decreased to ~ 44 MB. To activate and test:
conda activate myproject
yt --help
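A Python script can also check which conda environment is active. conda activate sets the CONDA_DEFAULT_ENV and CONDA_PREFIX environment variables, and the sketch below simply reads them. The active_conda_env() helper and the example path are illustrative assumptions.

```python
import os


def active_conda_env(environ=None):
    """Return (name, prefix) of the active conda environment, or None."""
    environ = os.environ if environ is None else environ
    prefix = environ.get("CONDA_PREFIX")
    if not prefix:
        return None  # no conda environment is active
    return environ.get("CONDA_DEFAULT_ENV"), prefix


# Real environment of the current process:
print("active conda environment:", active_conda_env())

# Simulated environment, as it might look after 'conda activate myproject'
# (the path below is a made-up example):
fake = {"CONDA_DEFAULT_ENV": "myproject", "CONDA_PREFIX": "/home/u111111/.conda/envs/myproject"}
print(active_conda_env(fake))
```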
Install the environment in a specific path
If you want to install the conda environment in a directory other than your home directory, add --prefix PATH. This also lets multiple users of a project share the conda environment by installing it in their project folder instead of one user's home directory. An environment created this way is activated by its path rather than by a name. For example:
mkdir -p your-project-dir/envs/myproject
conda create --prefix your-project-dir/envs/myproject yt python=3.11.7
Mamba#
Mamba is a Python-based CLI conceived as a drop-in replacement for conda, offering higher speed and more reliable environment solutions.
As an example, we are going to install the astropy package. We first load the Mamba module and then create an environment with an arbitrary name:
ml Mamba
mamba create -n ENV_NAME
Now, we should activate it and then install the desired package(s):
mamba activate ENV_NAME
mamba install -c conda-forge astropy
Once the installation has finished, you can list all the environments:
mamba env list
To remove an environment with all its packages, do:
mamba env remove -n ENV_NAME
You can export an environment, i.e. make a file that contains all the installed packages with their versions:
mamba env export -n ENV_NAME > ENV_NAME.yaml
This file can be used to recreate the environment later, which is very helpful for the reproducibility of your work:
mamba env create --file ENV_NAME.yaml
Finally, you can deactivate your environment:
mamba deactivate
Working with jupyter lab#
Using the OnDemand interface#
At NHPCC, the most straightforward way to use JupyterLab is through our web interface. Please see the OnDemand Interactive Sessions page and the short movie below for more information.
Another harder way#
Submit an interactive job as already mentioned (e.g. srun -n 1 --mem=4G -p short -t 30 --pty /bin/bash) and set up your environment. Then, on the node assigned to you (e.g. en-7-5), run
jupyter lab --no-browser --port=8888
This starts Jupyter and prints a few lines, including the address where it is running. If you see an error saying that this port is already in use, try a nearby port number (e.g. 8889) and use that number in the commands below as well.
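To look for an unused port before starting Jupyter, one option is to try binding to candidate ports on the compute node. The sketch below uses only the standard socket module. The helper names and the search range are illustrative, and a port that is free at check time can still be taken by another user a moment later.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing is currently bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False  # address already in use (or not permitted)
        return True


def first_free_port(start: int = 8888, tries: int = 20):
    """Return the first free port in [start, start + tries), or None."""
    for port in range(start, start + tries):
        if port_is_free(port):
            return port
    return None


print("suggested port:", first_free_port())
```

The suggested number would then replace 8888 in the jupyter lab command.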
Then on the login node run
ssh -NL 8888:localhost:8888 en-7-5
and on your local computer (the one you used to connect to the login node) run
ssh -NL 8888:localhost:8888 your_username@login.hpc.iut.ac.ir
Finally, open your web browser and go to the address printed by Jupyter, e.g.
http://localhost:8888/?token=276ba92d6b9834c3d748b03e31542f988ee3d10b147b7rdqs
This should open the JupyterLab interface.
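The two ssh commands above form a chain (local computer to login node, login node to compute node) and must use the same port number as Jupyter. As a small convenience sketch, the hypothetical helper below assembles both commands from the node name, port and username, so that a changed port or node is applied consistently. It only builds strings and does not run ssh.

```python
def tunnel_commands(node, port, username, login_host="login.hpc.iut.ac.ir"):
    """Build the two port-forwarding commands for a Jupyter session on `node`."""
    forward = f"{port}:localhost:{port}"  # same port on every hop
    return {
        "on_login_node": f"ssh -NL {forward} {node}",
        "on_local_computer": f"ssh -NL {forward} {username}@{login_host}",
    }


for where, command in tunnel_commands("en-7-5", 8889, "your_username").items():
    print(f"{where}: {command}")
```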