In this post we will explore how to deploy a Heron streaming topology on an HPC cluster using the Slurm scheduler.
First, let's install Heron. Some of Heron's core parts are written in C++, so it is important to get the correct binaries for the OS on your HPC cluster. If no suitable Heron binaries exist for your HPC environment, you can always build Heron from source.
The Heron documentation discusses how to compile from source in different environments in great detail.
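Heron builds with Bazel. As a rough sketch (the repository URL, `--config` value, and build targets below are assumptions that depend on your platform and Heron version, so check the compile docs for specifics):

```shell
# Hedged sketch of a from-source Heron build with Bazel.
# The --config value and targets are assumptions; consult the
# Heron compile documentation for your version and platform.
git clone https://github.com/twitter/heron.git && cd heron
bazel build --config=ubuntu scripts/packages:binpkgs
# The self-extracting install scripts land under bazel-bin/scripts/packages/
```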
Let's say we have a Heron build that works in the environment. Now let's install it. HPC environments usually have a shared file system such as NFS, so you can use a shared location to install Heron.
Install Heron
We need to install the Heron client and tools packages. The Heron client provides everything required to run a topology, while the Heron tools provide extras such as the UI for viewing topologies.
In this setup, the deploy folder in the home directory is used to install Heron; the home directory is shared across the cluster. Note that we are using the binaries built from source.
cd /N/u/skamburu/deploy
mkdir heron
sh ./heron-client-install.sh --prefix=/N/u/skamburu/deploy/heron
sh ./heron-tools-install.sh --prefix=/N/u/skamburu/deploy/heron
You can add the heron bin directory to the PATH environment variable.
export PATH=$PATH:/N/u/skamburu/deploy/heron/bin
Run Topology
Now let's run one of the example topologies shipped with Heron using the Slurm scheduler.
cd /N/u/skamburu/deploy/heron/bin
./heron submit slurm /N/u/skamburu/deploy/heron/heron/examples/heron-examples.jar com.twitter.heron.examples.MultiSpoutExclamationTopology example
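Once submitted, the topology runs as a regular Slurm job, so the standard Slurm tools can be used to check on it (the user name below is this post's example account):

```shell
squeue -u skamburu          # list your jobs; the Heron job should appear
scontrol show job <jobid>   # inspect the allocation in detail
```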
Heron UI
After running the example topology, let's start the Heron Tracker and UI. Before starting the tracker, make sure to change the tracker configuration to point to the Slurm cluster's state.
vi /N/u/skamburu/deploy/heron/herontools/conf/heron_tracker.yaml
statemgrs:
  -
    type: "file"
    name: "local"
    rootpath: "~/.herondata/repository/state/slurm"
    tunnelhost: "localhost"
Now let's start the tracker and the UI.
cd /N/u/skamburu/deploy/heron/bin
./heron-tracker &
./heron-ui &
This will start the Heron UI on port 8889. Since this is an HPC cluster, the ports are usually blocked by a firewall, so we forward the port to the local machine in order to view the UI from the desktop.
ssh -i ~/.ssh/id_rsa -L 8889:localhost:8889 user@cluster
Now we can view the UI by pointing the browser to the URL
http://localhost:8889/
Error handling
The Heron job is submitted to the Slurm scheduler using a bash script. This script only provides a minimal Slurm configuration, so you can modify it to suit your environment.
For example, on one cluster we had to specify the Slurm partition in the script, so we added that. Here is an example of the Slurm script.
vi /N/u/skamburu/deploy/heron/heron/conf/slurm/slurm.sh
#!/usr/bin/env bash
# arg1: the heron executable
# arg2 onwards: arguments to the executable
#SBATCH --ntasks-per-node=1
#SBATCH --time=00:30:00
#SBATCH --partition=delta

module load python

# launch one executor per allocated node, each with its own index
for i in $(seq 1 $SLURM_NNODES); do
  index=$((i - 1))
  echo "Exec" $1 $index ${@:2}
  srun -lN1 -n1 --nodes=1 --relative=$index $1 $index ${@:2} &
done

# record the job id so the allocation can be cleaned up later
echo $SLURM_JOB_ID > slurm-job.pid

# wait for all executors to finish
wait
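The script relies on bash's `${@:2}` slicing to forward everything after the executable path to each `srun` invocation. A quick illustration (the argument values are made up):

```shell
#!/usr/bin/env bash
# ${@:N} expands to the positional parameters starting at position N.
set -- /path/to/heron-executor --topology example --port 6001
echo "${@:2}"   # prints: --topology example --port 6001
```

Note that this expansion is a bash feature; plain POSIX sh does not support it, which is one reason the script uses a bash shebang.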
If your job gets canceled before you kill it with the heron kill command, you may have to manually delete some files before submitting it again. Usually it is enough to delete the topology's working directory:
rm -rf ~/.herondata/topologies/slurm/skamburu/example
Killing the topology
To kill the topology, use the heron kill command.
cd /N/u/skamburu/deploy/heron/bin
./heron kill slurm example