Let's say we have a Heron build that works on this environment. Now let's install it. HPC environments usually have a shared file system such as NFS, so you can use a shared location to install Heron.
Install Heron
We need to install the Heron client and tools packages. The Heron client provides all the functionality required to run a topology, while the Heron tools provide things like the UI for viewing topologies. In this setup the deploy folder in the home directory is used to install Heron; the home directory is shared across the cluster. Note that we are using the binaries built from source.
cd /N/u/skamburu/deploy
mkdir heron
sh ./heron-client-install.sh --prefix=/N/u/skamburu/deploy/heron
sh ./heron-tools-install.sh --prefix=/N/u/skamburu/deploy/heron
You can add the heron bin directory to the PATH environment variable.
export PATH=$PATH:/N/u/skamburu/deploy/heron/bin
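Before going further, it is worth checking that the client is actually being picked up from the shared location. The version subcommand of the Heron CLI should print the build details of the installed client; if your build does not have it, any other heron subcommand works as a quick check.

heron version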
Run Topology
Now let's run an example topology shipped with Heron using the Slurm scheduler.

cd /N/u/skamburu/deploy/heron/bin
./heron submit slurm /N/u/skamburu/deploy/heron/heron/examples/heron-examples.jar com.twitter.heron.examples.MultiSpoutExclamationTopology example
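Heron hands the topology off to Slurm as a batch job, so you can confirm that nodes were actually allocated with a plain Slurm query; this is not a Heron command, just the usual way to inspect your queue:

squeue -u $USER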
Heron UI
After running the example topology, let's start the Heron tracker and UI. Before starting the tracker, make sure to change the tracker configuration to point to the Slurm cluster.

vi /N/u/skamburu/deploy/heron/herontools/conf/heron_tracker.yaml

statemgrs:
  - type: "file"
    name: "local"
    rootpath: "~/.herondata/repository/state/slurm"
    tunnelhost: "localhost"
Now let's start the tracker and UI.
cd /N/u/skamburu/deploy/heron/bin
./heron-tracker &
./heron-ui &
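Once both processes are up, you can sanity-check the tracker from the login node before setting up any tunnels. By default the tracker listens on port 8888 and exposes a small REST API; querying its topologies endpoint (assumed here as /topologies) should return the submitted topology:

curl http://localhost:8888/topologies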
This will start the Heron UI on port 8889. Since this is an HPC cluster, the ports are usually blocked by the firewall, so we forward them to the local machine and view the UI from the desktop.
ssh -i ~/.ssh/id_rsa -L 8889:localhost:8889 user@cluster
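If you would rather not keep an interactive shell open for the tunnel, the same forwarding can run in the background with standard ssh flags (-N skips the remote command, -f backgrounds the connection); user@cluster is a placeholder as before:

ssh -N -f -i ~/.ssh/id_rsa -L 8889:localhost:8889 user@cluster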
Now we can view the UI in a browser by pointing it to the following URL:
http://localhost:8889/
Error handling
The Heron job is submitted to the Slurm scheduler using a bash script. This script only provides minimal configuration for Slurm, and you can modify it to suit your environment. For example, on one cluster we had to specify the Slurm partition in the script, so we added it there. Here is an example of the Slurm script.

vi /N/u/skamburu/deploy/heron/heron/conf/slurm/slurm.sh

#!/usr/bin/env bash
# arg1: the heron executable
# arg2: arguments to executable

#SBATCH --ntasks-per-node=1
#SBATCH --time=00:30:00
#SBATCH --partition=delta

module load python

args_to_start_executor=$2
ONE=1

# launch one Heron executor per allocated node, each with its own shard index
for i in $(seq 1 $SLURM_NNODES); do
  index=`expr $i - $ONE`
  echo "Exec" $1 $index ${@:3}
  srun -lN1 -n1 --nodes=1 --relative=$index $1 $index ${@:2} &
done

# record the Slurm job id so the allocation can be found and cancelled later
echo $SLURM_JOB_ID > slurm-job.pid

wait
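The partition name used above (delta) is specific to our cluster. If you are not sure which partitions exist on yours, the standard Slurm command below lists them along with their node counts and time limits; this is a generic Slurm query, not anything Heron-specific.

sinfo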
If your job gets canceled before you kill it with the heron kill command, you may have to manually delete some files before submitting it again. Usually you can delete the files under the topology's working directory:
rm -rf ~/.herondata/topologies/slurm/skamburu/example
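If the Slurm allocation backing the topology is still active, cancel it as well before resubmitting. The job id can be found with squeue, or read from the slurm-job.pid file written by slurm.sh if that file is still around; replace <job-id> with the actual id:

squeue -u $USER
scancel <job-id>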
Killing the topology
To kill the topology you can use the heron kill command.

cd /N/u/skamburu/deploy/heron/bin
./heron kill slurm example