Running your Jupyter server via SLURM places it on a compute node with more resources, sometimes including a GPU. It is better practice to run heavy code via SLURM, because the login nodes are meant for logging in and submitting jobs only. This documentation covers that process.
This guide is adapted from nero-docs.stanford.edu
With Katahdin ACG, you’ll need to submit this as a SLURM job so that your JupyterLab server gets placed on a SLURM node with enough resources to host it. To summarize: we create a SLURM job that runs JupyterLab on a SLURM node for up to 2 days (the maximum is 7). Once it is running, we connect to the JupyterLab instance with SSH port forwarding from our local laptop. A tunnel must be created because you cannot SSH directly to SLURM nodes on Katahdin.
First create a SLURM sbatch file
Replace $USER with your own ACG username wherever it appears below. In a terminal on your laptop:
SSH to Katahdin ACG
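For example, assuming the same login host used later in this guide (replace vdhiman with your own username):
ssh [email protected]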
Create your sbatch file. You can use your text editor of choice.
vi jupyterLab.sh
Paste the following text into your sbatch script, and save the file.
#!/bin/bash
#SBATCH --job-name=jupyter
#SBATCH --partition=grtx # You can pick from https://acg.maine.edu/hpc#h.b5slztm4yz12
#SBATCH --gres=gpu:1 # not clear if this is obeyed https://slurm.schedmd.com/gres.html
#SBATCH --time=2-00:00:00
#SBATCH --mem=5GB
#SBATCH --output=/home/%u/jupyter.log
module load nv/pytorch # Load pytorch singularity image
singularity exec --nv $PYTORCH_CONT jupyter notebook --ip=0.0.0.0 # start jupyter notebook
SLURM expands the %u in the #SBATCH --output= line to your username automatically, so the log will be written to /home/$USER/jupyter.log (the ~/jupyter.log used below). If you prefer, provide an alternate path for your log output and adjust the later commands to match.
This tells SLURM to launch a Jupyter Notebook server on a node with your requested resources.
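Optionally, before submitting you can sanity-check the script without actually queueing a job. sbatch's --test-only flag validates the script and estimates when and where the job would run:
sbatch --test-only jupyterLab.sh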
Now submit this sbatch file as a job to SLURM:
vdhiman@katahdin:~$ sbatch jupyterLab.sh
Submitted batch job 1005424
Now, you can check that your job is running:
vdhiman@katahdin:~$ squeue -u $USER
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
1005424 grtx jupyter vdhiman PD 0:00 1 (Priority)
Note the ST (status) column says PD (pending). Once a suitable node is found, it should change to R (running). This might take a while if the cluster is busy.
vdhiman@katahdin:~$ squeue -u vdhiman
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
1005424 grtx jupyter vdhiman R 0:01 1 grtx-1
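If you would rather not re-run squeue by hand while you wait, one option (assuming the standard watch utility is available on the login node) is to let it refresh every 30 seconds; press Ctrl-C to stop:
watch -n 30 squeue -u $USER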
Once it is running you can continue…
Check the log output to find out the HOSTNAME we need to use to create an SSH tunnel:
Check log file for Jupyter URL
You can use the tail -f command to follow the log file; it prints the last 10 lines and keeps updating as new lines are written.
tail -f ~/jupyter.log
The log file will output something like this:
vdhiman@katahdin:~$ tail jupyter.log
[I 19:09:29.687 NotebookApp] or http://127.0.0.1:8888/?token=8a5d8e1laskdjfl1askjdfl1ksjadfl1kjsadlfsjadc3386
[I 19:09:30.097 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[W 19:09:30.302 NotebookApp] No web browser found: could not locate runnable browser.
[C 19:09:30.566 NotebookApp]
To access the notebook, open this file in a browser:
file:///home/vdhiman/.local/share/jupyter/runtime/nbserver-12238-open.html
Or copy and paste one of these URLs:
http://grtx-1.cluster:8888/?token=8a5d8e1laskdjfl1askjdfl1ksjadfl1kjsadlfsjadc3386
or http://127.0.0.1:8888/?token=8a5d8e1laskdjfl1askjdfl1ksjadfl1kjsadlfsjadc3386
Note the file path after “open this file in a browser”. Remove the file:// prefix, then search that file for the link. For example,
vdhiman@katahdin:~$ grep window.location.href /home/vdhiman/.local/share/jupyter/runtime/nbserver-12238-open.html
window.location.href = "http://0.0.0.0:8888/tree?token=9946b79c2cae1c4744aa90ce58e4133add9d58d0ded77161";
vdhiman@katahdin:~$
This will give you a link. Note the port number. For me that is 8888. Let’s call this REMOTE_PORT=8888.
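If you prefer not to read the log by eye, a minimal sketch for pulling the token out of the log with grep (assuming the log path used above) is:
grep -o 'token=[^"& ]*' ~/jupyter.log | head -n 1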
First check that the status of the job under ST is R for running. Then look at the string under NODELIST(REASON); this gives you the HOSTNAME. For me the HOSTNAME is grtx-1.
Note the http://HOSTNAME:REMOTE_PORT/?token=TOKEN in the log output; you’ll need that info to set up a port-forwarding connection. For me, HOSTNAME=grtx-1.cluster, REMOTE_PORT=8888, and TOKEN=8a5d8e1laskdjfl1askjdfl1ksjadfl1kjsadlfsjadc3386
If the output simply says http://hostname:8888/ then you have to use the output of squeue -u $USER under the NODELIST column.
We need to find the host where the job got scheduled. We can use the squeue command to do that.
vdhiman@katahdin:~$ squeue -u $USER
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
1086781 grtx jupyter vdhiman R 3:14:58 1 grtx-1
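If you just want the node name on its own, squeue's output-format options can print only the node list; for example, on a reasonably standard SLURM installation:
squeue -u $USER -h -t RUNNING -o %N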
Create an SSH tunnel
Then on your laptop, open a new terminal window and create an SSH tunnel using ssh -L 8888:HOSTNAME:REMOTE_PORT [email protected].
For my output the command is:
ssh -L 8888:grtx-1.cluster:8888 [email protected]
Important: you must replace the HOSTNAME:REMOTE_PORT (the grtx-1.cluster:8888 part) in the command above with the node name and port from the previous step. Do not just copy and paste the example; use whatever hostname and port your own log output (or squeue) specifies.
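If you do not need a shell on the login node and only want the port forwarded, you can optionally add the -N flag (again substituting your own hostname, port, and username):
ssh -N -L 8888:grtx-1.cluster:8888 [email protected]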
On your laptop open a browser window and you can then browse to:
http://127.0.0.1:8888/?token=8a5d8e1laskdjfl1askjdfl1ksjadfl1kjsadlfsjadc3386
Important: replace the "?token=TOKEN" part of the URL with your own token from the log file. The address and token are defined in the output log file; you MUST copy the token from the log output and cannot just use the example above. It may take up to 10 minutes for the "jupyter.log" output to show the line with your token.
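If the browser cannot connect, you can optionally confirm from another terminal on your laptop that the tunnel is up before suspecting the token; any HTTP response here means the tunnel is working:
curl -sI http://127.0.0.1:8888/ | head -n 1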
For the remainder of your job run, the hostname and port will stay the same. If you close your laptop, you will need to recreate the SSH tunnel; you can reuse the ssh -L 8888 command above. If your job ends on Katahdin ACG, you need to resubmit your SLURM job and then update your SSH tunnel command with the new hostname. Jobs last for a maximum of 7 days.
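If you finish before the time limit, it is good practice to free the node by cancelling the job yourself, using the job ID that sbatch or squeue reported; for example:
scancel 1005424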