Tuesday, November 1, 2016

Deploying Heron topologies using SLURM

In this post we will explore how to deploy a Heron streaming topology on an HPC cluster using the SLURM scheduler.

First let's install Heron. Some of the core parts of Heron are written in C++, so it is important to get the correct binaries for the operating system of your HPC cluster. If there are no suitable Heron binaries for your HPC environment, you can always build Heron from source. The Heron documentation discusses how to compile from source in different environments in great detail.
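
If you do need to build from source, the build is driven by Bazel. A minimal sketch is shown below; the --config value and the package targets are assumptions based on a typical Linux build and may differ for your platform and Heron version.

git clone https://github.com/twitter/heron.git
cd heron
./bazel_configure.py
bazel build --config=ubuntu scripts/packages:binpkgs

This should produce self-extracting install scripts similar to the ones used in the next step.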

Let's say we have Heron binaries that work in the environment. Now let's install them. HPC environments usually have a shared file system such as NFS, so you can use a shared location to install Heron.
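
You can quickly check whether your home directory is on a shared file system; for example, with GNU df:

df -T ~

If the filesystem type reported is nfs, lustre, gpfs, or similar, the compute nodes should all see the same Heron installation.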

Install Heron

We need to install the Heron client and Heron tools packages. The Heron client provides all the functionality required to run a topology. The Heron tools provide utilities such as the UI for viewing topologies.

In this setup, a deploy folder in the home directory is used to install Heron, and the home directory is shared across the cluster. Note that we are using binaries built from source.

cd /N/u/skamburu/deploy
mkdir heron
sh ./heron-client-install.sh  --prefix=/N/u/skamburu/deploy/heron
sh ./heron-tools-install.sh  --prefix=/N/u/skamburu/deploy/heron

You can add the Heron bin directory to the PATH environment variable.

export PATH=$PATH:/N/u/skamburu/deploy/heron/bin
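
To make this persistent across logins (assuming a bash login shell), you can also append it to your ~/.bashrc:

echo 'export PATH=$PATH:/N/u/skamburu/deploy/heron/bin' >> ~/.bashrc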

Run Topology

Now let's run one of the example topologies shipped with Heron using the Slurm scheduler.

cd /N/u/skamburu/deploy/heron/bin
./heron submit slurm /N/u/skamburu/deploy/heron/heron/examples/heron-examples.jar com.twitter.heron.examples.MultiSpoutExclamationTopology example
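
The topology is started as a normal Slurm job, so you can verify that it was accepted and is running with the standard Slurm tools:

squeue -u skamburu

You should see the job that was just submitted.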

Heron UI

After running the example topology, let's start the Heron Tracker and UI. Before starting the tracker, make sure to change the tracker configuration so that it points to the state directory used by the Slurm scheduler.

vi /N/u/skamburu/deploy/heron/herontools/conf/heron_tracker.yaml

statemgrs:
  -
    type: "file"
    name: "local"
    rootpath: "~/.herondata/repository/state/slurm"
    tunnelhost: "localhost"
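
The rootpath above is where the topology state for the Slurm scheduler is kept on the shared file system. After a topology has been submitted you can check that state files are appearing there (the path is the default shown above; adjust it if you changed rootpath):

ls ~/.herondata/repository/state/slurm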

Now let's start the tracker and the UI.

cd /N/u/skamburu/deploy/heron/bin
./heron-tracker &
./heron-ui &
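
If everything came up correctly, the tracker and the UI listen on their default ports, 8888 for the tracker and 8889 for the UI. Assuming curl is available on the cluster, a quick sanity check is:

curl http://localhost:8888/topologies
curl -I http://localhost:8889/

The first request should return JSON describing the known topologies, and the second should return an HTTP 200 response.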

This will start the Heron UI on port 8889. Since this is an HPC cluster, the ports will usually be blocked by a firewall, so we forward the port to the local machine in order to view the UI from the desktop.

ssh -i ~/.ssh/id_rsa -L 8889:localhost:8889 user@cluster

Now we can view the UI in a browser by pointing it to the following URL:

http://localhost:8889/

Error handling

The Heron job is submitted to the Slurm scheduler using a bash script. This script only provides a minimal configuration for Slurm, and you can modify it to suit your environment before submitting jobs. For example, on one cluster we had to specify the Slurm partition, so we added that directive to the script. Here is an example of the Slurm script.

vi /N/u/skamburu/deploy/heron/heron/conf/slurm/slurm.sh

#!/usr/bin/env bash

# arg1: the heron executable
# arg2: arguments to executable

#SBATCH --ntasks-per-node=1
#SBATCH --time=00:30:00
#SBATCH --partition=delta

module load python

# launch one heron-executor per allocated node; each executor is given its
# zero-based node index within the allocation as its id
args_to_start_executor=$2
ONE=1
for i in $(seq 1 $SLURM_NNODES); do
    index=`expr $i - $ONE`
    echo "Exec" $1 $index ${@:3}
    srun -lN1 -n1 --nodes=1 --relative=$index $1 $index ${@:2} &
done

# record the Slurm job id so the allocation can be found and cancelled later
echo $SLURM_JOB_ID > slurm-job.pid

# wait for all of the executors to finish
wait
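
The script writes the Slurm job id to slurm-job.pid in the job's working directory (in this setup that appears to be the topology directory under ~/.herondata, but verify where the file lands on your cluster). That id can be used with the standard Slurm tools to inspect the allocation, for example:

scontrol show job $(cat slurm-job.pid)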

If your job gets canceled before you kill it with the heron kill command, you may have to manually delete some files before the topology can be submitted again. Usually it is enough to delete the topology's directory under ~/.herondata:

rm -rf ~/.herondata/topologies/slurm/skamburu/example
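
If the Slurm allocation itself is still running after a failed or canceled submission, cancel it as well before resubmitting. You can find the job id with squeue and then cancel it:

squeue -u skamburu
scancel JOBID

Replace JOBID with the job id reported by squeue.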

Killing the topology

To kill the topology, use the heron kill command.

cd /N/u/skamburu/deploy/heron/bin
./heron kill slurm example
