We will learn how to run Jupyter notebook on ACG Katahdin and access it on your laptop by port forwarding. Please use this method only if your project has outscaled your laptop and colab resources.
Table of Contents
- Setup VPN
- SSH to Katahdin
- SLURM
- First create a SLURM sbatch file
Setup VPN
Instructions here: vpn.maine.edu
Click on “Remote Access VPN” and then follow instructions for your operating system.
SSH to Katahdin
Next, ssh
to katahdin.acg.maine.edu
. Replace vdhiman
with your own
username that you got in the email from Steve. For different operating
systems, we have different instructions. For Windows, we will use PowerShell, not Putty.
For Mac and Linux
vdhiman@office-desktop:~$ ssh [email protected]
Last login: Sat Jan 21 13:37:12 2023 from jx3cth3.um.maine.edu
vdhiman@katahdin:~$
For Windows PowerShell
Instructions from here: Tutorials ssh
By default, the OpenSSH client will be located in the directory: C:\Windows\System32\OpenSSH
. You can also check that it is installed in Windows Settings > Apps > Optional features, then search for “OpenSSH” in your installed features.
SLURM
Running Jupyter server via SLURM enables you to run your Jupyter server on a SLURM node with more resources; sometimes allowing you to use GPU. It is a better practice to run heavy code via SLURM, because login nodes are meant for login only. This documentation specifically covers this process.
This guide is adapted from nero-docs.stanford.edu
With Katahdin ACG, you’ll need to submit this as a SLURM job. That way, your jupyter lab server gets placed on a SLURM node with enough resources to host it. To summarize: We are creating a slurm job that runs jupyterlab on a SLURM node, for up to 2 days (max is 7). Once running, we are going to connect to the jupyterlab instance with SSH port forwarding from our local laptop. A tunnel must be created as you cannot directly SSH to SLURM nodes on Katahdin.
First create a SLURM sbatch file
Replace $USER with your own ACG username. Use Terminal On Your Laptop:
SSH to Katahdin ACG
Create your sbatch file. You can use your text editor of choice.
vi jupyterLab.sh
Paste the following text into your sbatch script, and save the file.
#!/bin/bash
#SBATCH --job-name=jupyter
#SBATCH --partition=grtx # You can pick from https://acg.maine.edu/hpc#h.b5slztm4yz12
#SBATCH --gres=gpu:1 # not clear if this is obeyed https://slurm.schedmd.com/gres.html
#SBATCH --time=2-00:00:00
#SBATCH --mem=5GB
#SBATCH --output=/home/%u/logs-jupyter-%x-%j.log
module load nv/pytorch # Load pytorch singularity image
singularity exec --nv $PYTORCH_CONT jupyter notebook --ip=0.0.0.0 # start jupyter notebook
Replace the $USER
part
of your
#SBATCH --output=home/$USER/jupyter.log
with your ACG Username. Or, provide an alternate path for
your log output.
This tells SLURM to launch a Jupyter Lab server on the node with your requested resources.
Now we need to send this as a job to SLURM.
Submit this sbatch to SLURM:
vdhiman@katahdin:~$ sbatch jupyterLab.sh
Submitted batch job 1005424
Now, you can check that your job is running:
vdhiman@katahdin:~$ squeue -u $USER
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
1005424 grtx jupyter vdhiman PD 0:00 1 (Priority)
Note the ST (Status) column says PD (Pending). After a suitable node is found, it should change to ST=R (Running). This might take a while if the server is busy.
vdhiman@katahdin:~$ squeue -u vdhiman
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
1005424 grtx jupyter vdhiman R 0:01 1 grtx-1
Once it is running you can continue…
Check the log output to find out the HOSTNAME we need to use to create an SSH tunnel:
Check log file for Jupyter URL
You can use the tail -f
command to follow the last 10 lines of the log file.
tail -f ~/jupyter.log
The log file will output something like this:
vdhiman@katahdin:~$ tail jupyter.log
[I 19:09:29.687 NotebookApp] or http://127.0.0.1:8888/?token=8a5d8e1laskdjfl1askjdfl1ksjadfl1kjsadlfsjadc3386
[I 19:09:30.097 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[W 19:09:30.302 NotebookApp] No web browser found: could not locate runnable browser.
[C 19:09:30.566 NotebookApp]
To access the notebook, open this file in a browser:
file:///home/vdhiman/.local/share/jupyter/runtime/nbserver-12238-open.html
Or copy and paste one of these URLs:
http://grtx-1.cluster:8888/?token=8a5d8e1laskdjfl1askjdfl1ksjadfl1kjsadlfsjadc3386
or http://127.0.0.1:8888/?token=8a5d8e1laskdjfl1askjdfl1ksjadfl1kjsadlfsjadc3386
Note the file after “open this file in a browser”. Remove the file://
search
the file for link. For example,
vdhiman@katahdin:~$ grep window.location.href /home/vdhiman/.local/share/jupyter/runtime/nbserver-12238-open.html
window.location.href = "http://0.0.0.0:8888/tree?token=9946b79c2cae1c4744aa90ce58e4133add9d58d0ded77161";
vdhiman@katahdin:~$
This will give you a link. Note the port number. For me that is 8888. Let’s call this REMOTE_PORT=8888.
First check if the status of the job under ST is R for running. Then look for string under NODELIST(REASON). This gives me the HOSTNAME. For me the HOSTNAME is grtx-1.
Note the http://HOSTNAME:REMOTE_PORT/token=TOKEN in the first part, you’ll need that info to setup a port-forwarding connection. For me, the HOSTNAME=grtx-1.cluster, REMOTE_PORT=8888, and TOKEN=8a5d8e1laskdjfl1askjdfl1ksjadfl1kjsadlfsjadc3386
If the output simply says http://hostname:8888/ then you have to use the ouptut of squeue -u $USER under the nodelist column
We need to find the host where the job got scheduled. We can use squeue command to to that.
vdhiman@katahdin:~$ squeue -u $USER
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
1086781 grtx jupyter vdhiman R 3:14:58 1 grtx-1
Create an SSH tunnel
We cannot access this yet. If grtx-1 was not firewalled, anyone (without your
password) could have
acessed this by replacing localhost
with grtx-1.acg.maine.edu
. However,
grtx-1.acg.maine.edu
is firewalled and we cannot access arbitrary ports on
katahdin. Luckily, ssh can help us access with port-forwarding.
TCP/UDP ports
Ports in computer and electrical engineering stands for an input-output interface. When we talk about TCP/UDP ports, the port numbers are software ports that are managed by the operating sytem. However, you can still “imagine” them as connections to the outside world. For example, the default ports for HTTP protocol is 80, HTTPS is 443, for SSH is 22. What does this mean? This means that you can add the correct port number to any URL and it will still work. For example, you can access umaine.edu using https://maine.edu or using https://maine.edu:443. You can access katahdin.acg.maine.edu through ssh by specifying 22 as the port number. This also means that the server side processes “listen” to the assigned port numbers on the server. The “ssh daemon” listens to the port 22, “https server daemon” listens to the port 443 and so on.
You can learn more about this in the COS 440:Computer Networking class or explore on your own.
Port forwarding
We want the communication between our browser and Jupyter notebook server to tunnel through the ssh connection. The information flow for the port forwarding is visualized here:
Then on your laptop, open a new Terminal Window and create an SSH
tunnel using ssh -L 8888:HOSTNAME:REMOTE_PORT [email protected]
.
For my output the command is:
ssh -L 8888:gtrx-1.cluster:8888 [email protected]
Important: You must replace the HOSTNAME:REMOTE_PORT (gtrx-1.cluster:8888) in the command below with your node name address from previous step
Note that whatever your log output says for hostname you will need to use in the command above. DO NOT just copy and paste the example, you have to replace the HOSTNAME:REMOTE_PORT (the gtrx-1.cluster:8888 part) to be the one your log output specifies.
On your laptop open a browser window and you can then browse to:
http://127.0.0.1:8888/?token=8a5d8e1laskdjfl1askjdfl1ksjadfl1kjsadlfsjadc3386
Important: Replace the “?token=TOKEN” part of the URL with your token from log file
Note that this is copied from the output log file where it defines your Jupyter address and TOKEN. You MUST copy the token from the log out put, and cannot just use the example above. It may take up to 10 minutes for the “jupyter.log” output to show you the text with your token.
For the remainder of your job run, the hostname and port will stay the same. If you close your laptop you will need to recreate an SSH Tunnel - you can reuse the “ssh -L8888” command above. If your job ends on Katahdin ACG, you need to resubmit your slurm job, and then modify your SSH Tunnel command with the new hostname. Jobs last for a maximum of 7 days.