PyTorch#
PyTorch is a machine learning library based on the Torch library, used for applications such as computer vision and natural language processing, originally developed by Meta AI and now part of the Linux Foundation umbrella.
PyTorch at NHPCC#
There are three ways to run PyTorch on the Panthera cluster:
- Interactive mode
- Using subtorch command
- Using OnDemand web portal
Interactive mode#
First, create an interactive job to connect to one of the compute nodes:
u111112@login1:~/wrkdir> srun -n 4 --mem=1G -p short -t 10 --pty /bin/bash
srun: job 57293 queued and waiting for resources
srun: job 57293 has been allocated resources
u111112@cn-12-1:~/wrkdir>
Then, you have to load the PyTorch module:
u111112@cn-12-1:~/wrkdir> ml PyTorch
Tip
To see all PyTorch versions installed on Panthera, type ml PyTorch and press the Tab key twice. For more information on how to use modules, please visit this page.
Finally, you can use your software:
u111112@cn-12-1:~/wrkdir> your_software_command ...
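Before starting real work, it can help to confirm that the loaded module is the one Python actually picks up. The following is a minimal sketch, not part of the cluster tooling: the file name check_torch.py and the function name describe_environment are hypothetical. It reports the PyTorch version and whether a GPU is visible, and it still runs when no PyTorch module is loaded. Note that the srun example above does not request a GPU, so on such a job cuda_available would be expected to be False.

```python
# check_torch.py (hypothetical file name): report the Python/PyTorch setup on this node.
import importlib.util
import platform


def describe_environment():
    """Collect basic facts about the Python and PyTorch setup on this node."""
    info = {
        "python_version": platform.python_version(),
        "torch_version": None,
        "cuda_available": False,
        "gpu_count": 0,
    }
    # find_spec returns None when the package cannot be found,
    # e.g. when no PyTorch module has been loaded yet.
    if importlib.util.find_spec("torch") is None:
        return info

    import torch

    info["torch_version"] = torch.__version__
    info["cuda_available"] = torch.cuda.is_available()
    info["gpu_count"] = torch.cuda.device_count()
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Run it on the compute node with python check_torch.py after loading the module. The same function can also be pasted into a notebook cell.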
Using subtorch command#
This is the fastest and easiest method, and the one most users prefer, for loading and running PyTorch on the cluster. On a login node, run the subtorch command without any options to see its help:
u111112@login1:~/wrkdir> subtorch
Create and submit job for PyTorch
Usage: subtorch <INPUT> [OPTION]
-n <nt:nc> Number of Tasks:Number of cpus per task.
-m <mem> Memory required for job (GB). Default: 4
-p <part> Partition name to submit the job. (use 'sinfo')
-v <ver> Software version.
1: PyTorch/1.12.1-foss-2022a-CUDA-11.7.0
2: PyTorch/1.13.1-Ma23.1-Py3.7.3
3: PyTorch/1.13.1-Py3.7.4
4: PyTorch/2.1-An23.03-CUDA-11.7.0
5: PyTorch/2.1.2-foss-2022b
6: PyTorch/2.2.0-Py3.10.8
7: PyTorch/2.3.0-Py3.10.8
-g <gpu> Number of GPU Device. Default: 1
-o <opt> Options for input python file.
-T Use the torchrun command instead of python.
-j <jobname> a name for the job allocation Default: name of input file.
-l <disk> Disk space required for scratch (GB). Run on local hard disk.
-t <time> run time of the job. Valid format: M, H:M:S, D-H, D-H:M
-po <popt> Options for python or torchrun command.
-so <sopt> Additional slurm options if needed.
-no Only write job file.
-h | --help Print this message and exit.
Example: subtorch run.py -n 2 -m 4 -v 4 -t 2-0 -o '-c config/gran_DD.yaml'
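To make the example command more concrete, here is a minimal sketch of what an input file such as run.py might look like. It assumes that subtorch hands the string given with -o to the script as ordinary command-line arguments, which the help text suggests but does not spell out. The long option --config and both function names are illustrative, not part of subtorch. The environment variables RANK, LOCAL_RANK and WORLD_SIZE are the ones torchrun exports to each worker, so they matter only when the job is submitted with -T.

```python
# run.py: hypothetical entry point matching the example command above.
import argparse
import os


def parse_args(argv=None):
    """Parse the options that arrive through subtorch's -o string."""
    parser = argparse.ArgumentParser(description="Hypothetical training entry point")
    parser.add_argument("-c", "--config", required=True,
                        help="Path to a YAML configuration file")
    return parser.parse_args(argv)


def distributed_context(env=None):
    """Read the rank variables that torchrun exports; fall back to one process."""
    env = os.environ if env is None else env
    return {
        "rank": int(env.get("RANK", 0)),
        "local_rank": int(env.get("LOCAL_RANK", 0)),
        "world_size": int(env.get("WORLD_SIZE", 1)),
    }


def main(argv=None):
    args = parse_args(argv)
    ctx = distributed_context()
    # Real training code would load the config and build the model here.
    print(f"config={args.config} rank={ctx['rank']} world_size={ctx['world_size']}")
    return args, ctx


if __name__ == "__main__":
    main()
```

When started with plain python, the script reports rank 0 and a world size of 1. Under torchrun, each worker reports its own rank, which is a simple way to check that the -n and -T settings produced the process layout you intended.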
Using OnDemand web portal#
Sometimes you need to take a quick look at your project, make a small change, or view it graphically when you are short on time or do not have access to your personal computer. For these cases, you can launch your software in graphical mode through the OnDemand web portal and see your changes quickly.
Warning
We strongly recommend that you do not use this method for all of your work on Panthera. Graphical access to the cluster should be reserved for emergencies or special use cases.
Follow these instructions:
- Go to Interactive Apps > PyTorch
- Fill out the form according to your needs.
- Wait until your session is ready, then click Connect to Jupyter.
- After a few seconds, a fresh notebook will open.
- Write and run your code.
Tip
For more information about using the OnDemand portal, please visit this page.