How to set up Apache Storm to run topologies locally

Apache Storm is similar to Hadoop and its MapReduce jobs concept. The key difference is that a MapReduce job eventually finishes, whereas a Storm topology executes indefinitely (until killed). This makes Storm a great tool for real-time processing, analytics processing and more.

You can write topologies in any language and run them on a Storm cluster. Topologies are small groups of code which mainly consist of Spouts and Bolts. A Spout continuously emits data (read from any data source) as tuples. Each new tuple is passed to a Bolt for processing, and the Bolt may pass it on to other Bolts or ack (acknowledge) it to signal that the tuple has been processed successfully.

Like Hadoop, Apache Storm automatically handles the management of running all Spouts and Bolts across multiple servers for best performance.

It’s no wonder companies such as Yahoo, Twitter, Spotify, Yell, Groupon, Flipboard and many others are utilizing this incredible tool.

Let’s have a look at how to set up and run Apache Storm topologies locally.


Download and Configuration setup

We need:

  • Zookeeper
  • Apache Storm

Save and unzip them into a directory, e.g. ~/tools


Configure Zookeeper

Open ~/tools/zookeeper/conf/zoo.cfg

Uncomment/change or add the following lines:

dataDir=/Users/anillakhman/tools/data/zookeeper_storage
clientPort=2181
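
Before ZooKeeper is started, it helps to have the directory named in dataDir in place, and zkServer.sh looks for its settings in conf/zoo.cfg. The following is a minimal sketch of that preparation, not a required step. ZK_HOME and ZK_DATA are illustrative variable names (they are not ZooKeeper settings), and $HOME/tools stands in for the /Users/anillakhman/tools path used above.

```shell
# Illustrative variables; adjust them to match your own layout.
ZK_HOME="${ZK_HOME:-$HOME/tools/zookeeper}"
ZK_DATA="${ZK_DATA:-$HOME/tools/data/zookeeper_storage}"

# Create the directory that dataDir points at.
mkdir -p "$ZK_DATA"

# If zoo.cfg does not exist yet, start from the sample file shipped with ZooKeeper.
if [ ! -f "$ZK_HOME/conf/zoo.cfg" ] && [ -f "$ZK_HOME/conf/zoo_sample.cfg" ]; then
  cp "$ZK_HOME/conf/zoo_sample.cfg" "$ZK_HOME/conf/zoo.cfg"
fi
```

The copy is skipped when zoo.cfg already exists, so existing edits are not overwritten.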

Configure Apache Storm

Add the following lines to the top of ~/tools/storm-src/conf/storm.yaml.example:

# Basic Settings
storm.local.dir: "/Users/anillakhman/tools/data/storm-local-data"
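
Storm normally reads its settings from conf/storm.yaml rather than from the .example file, so the edited example may need to be copied into place. The sketch below assumes that is the case for your checkout and also creates the directory named in storm.local.dir. STORM_HOME and STORM_LOCAL_DIR are illustrative variable names, not Storm settings.

```shell
# Illustrative variables; adjust them to match your own layout.
STORM_HOME="${STORM_HOME:-$HOME/tools/storm-src}"
STORM_LOCAL_DIR="${STORM_LOCAL_DIR:-$HOME/tools/data/storm-local-data}"

# Create the directory that storm.local.dir points at.
mkdir -p "$STORM_LOCAL_DIR"

# Assumption: Storm picks up conf/storm.yaml, so copy the edited example if no storm.yaml exists.
if [ ! -f "$STORM_HOME/conf/storm.yaml" ] && [ -f "$STORM_HOME/conf/storm.yaml.example" ]; then
  cp "$STORM_HOME/conf/storm.yaml.example" "$STORM_HOME/conf/storm.yaml"
fi
```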

Build Apache Storm.

Add the bin paths to your environment path so you can access them anywhere.
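
As a rough sketch of that step, the lines below append both bin directories to PATH for the current shell and report whether the two commands used later can be found. TOOLS_DIR is an illustrative variable, and the directory names zookeeper and storm-src are taken from the paths used earlier in this post; yours may carry version numbers.

```shell
# Illustrative variable; point it at wherever you unzipped the downloads.
TOOLS_DIR="${TOOLS_DIR:-$HOME/tools}"

# Append the ZooKeeper and Storm bin directories to PATH for this shell session.
export PATH="$PATH:$TOOLS_DIR/zookeeper/bin:$TOOLS_DIR/storm-src/bin"

# Report whether the shell can now resolve both commands.
for cmd in zkServer.sh storm; do
  if command -v "$cmd" >/dev/null 2>&1; then
    echo "$cmd found at $(command -v "$cmd")"
  else
    echo "$cmd not found; check the directory names under $TOOLS_DIR"
  fi
done
```

To make the change permanent, the export line can go into your shell profile (for example ~/.bash_profile or ~/.zshrc).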


Start it all up

Open a new terminal window for each command below:

# Start ZooKeeper
zkServer.sh start

# Start the Nimbus node
storm nimbus

# Start the Supervisor Node
storm supervisor

# Start the Storm Web UI
storm ui

You should now be able to visit http://localhost:8080 and see your storm cluster.
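
If the page does not load, a quick port check can show which piece is missing. This is only a sketch: port_open is a hypothetical helper name, it assumes a netcat (nc) that supports the -z flag, and the ports are the ones used above (2181 from zoo.cfg, 8080 from the UI address).

```shell
# Hypothetical helper: succeeds if something accepts TCP connections on host:port.
port_open() {
  nc -z "$1" "$2" >/dev/null 2>&1
}

# Check the ZooKeeper client port and the Storm UI port.
for entry in "ZooKeeper:2181" "Storm UI:8080"; do
  name="${entry%%:*}"
  port="${entry##*:}"
  if port_open localhost "$port"; then
    echo "$name is listening on port $port"
  else
    echo "$name is NOT listening on port $port"
  fi
done
```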


Run your topology

You’ll have to build your topology to get a .jar file in your target directory.

Once you have that, you can submit it to your Storm cluster.

# the `core` directory in storm-crawler
cd ~/storm-crawler/core

# clean and compile
mvn clean compile

# Package
mvn package -DskipTests=true

# Run it locally (This will run it in your terminal and output debugging info)
# You will not see this in your Web UI - It's for development only
storm jar target/storm-crawler-core-0.6-SNAPSHOT-jar-with-dependencies.jar com.digitalpebble.storm.crawler.CustomTopology TestTopology -local

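# Alternatively, run it locally through Maven
# (the `cd` assumes you are in the storm-crawler root, not already in `core`)
cd core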
mvn clean compile exec:java -Dstorm.topology=com.digitalpebble.storm.crawler.CrawlTopology -Dexec.args="-conf crawler-conf.yaml -local"

# Submit it to your Storm cluster
# This will show up in your Web UI and run continuously; you can kill it through the UI
storm jar target/storm-crawler-core-0.6-SNAPSHOT-jar-with-dependencies.jar com.digitalpebble.storm.crawler.CustomTopology
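
Besides the UI, a submitted topology can also be inspected and stopped from the terminal with the storm client's list and kill commands. The sketch below wraps them in a hypothetical helper, kill_topology. The topology name is a placeholder: use whatever name `storm list` reports for your topology.

```shell
# Hypothetical helper: show running topologies, then kill one by name.
# Usage: kill_topology <topology-name> [wait-seconds]
kill_topology() {
  name="$1"
  wait_secs="${2:-10}"
  if ! command -v storm >/dev/null 2>&1; then
    echo "storm not found on PATH" >&2
    return 1
  fi
  # List the topologies currently running on the cluster.
  storm list
  # Deactivate the topology, wait the given number of seconds, then shut it down.
  storm kill "$name" -w "$wait_secs"
}
```

For example, `kill_topology TestTopology 5` would ask Storm to stop a topology named TestTopology after a five-second wait, assuming one with that name is running.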