How to set up Apache Storm to run topologies locally
Apache Storm is similar to Hadoop and its concept of MapReduce jobs. The difference is that Storm runs topologies, which execute indefinitely (until killed). This makes it an awesome tool for real-time processing, analytics processing and more.
You can write topologies in any language and run them on a Storm cluster. Topologies are small units of code that mainly consist of Spouts and Bolts. A Spout continuously emits data, reading from any data source. Each new piece of data (a tuple) is passed to a Bolt for processing. The Bolt may pass it on to other Bolts, or ack (acknowledge) it to signal that the tuple has been processed successfully.
Like Hadoop, Apache Storm automatically manages running all Spouts and Bolts across multiple servers for best performance.
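To make the Spout and Bolt roles a little more concrete, here is a toy analogy in plain shell. It does not use Storm at all: the `spout` and `bolt` functions are hypothetical names, the pipe stands in for the stream of tuples, and the "ack" is just a printed message rather than a real acknowledgement sent back to the Spout.

```shell
# Toy "spout": emits one tuple (one line) at a time from a data source.
# A real Spout would keep emitting until the topology is killed.
spout() {
  printf '%s\n' "storm" "hadoop" "zookeeper"
}

# Toy "bolt": processes each incoming tuple, then reports an ack for it.
bolt() {
  while IFS= read -r tuple; do
    processed=$(printf '%s' "$tuple" | tr '[:lower:]' '[:upper:]')
    echo "processed $processed, ack $tuple"
  done
}

# Wire the spout to the bolt, much like a topology wires its components.
spout | bolt
```

In a real topology Storm does this wiring for you and spreads the work across machines; the sketch only shows the direction of the data flow.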
It’s no wonder companies such as Yahoo, Twitter, Spotify, Yell, Groupon, Flipboard and many others are utilizing this incredible tool.
Let’s have a look at how to set up and run Apache Storm topologies locally.
Download and Configuration setup
- Apache Storm
Save and unzip them into a directory, e.g. ~/tools
Uncomment/change or add the following lines:
Configure Apache Storm
Open ~/tools/storm-src/conf/storm.yaml.example and add the following lines at the top:
# Basic Settings
storm.local.dir: "/Users/anillakhman/tools/data/storm-local-data"
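The path in that setting is specific to one machine, so you will want your own. As a rough sketch, assuming you keep the same `tools/data/storm-local-data` layout under your home directory (the `STORM_LOCAL_DIR` variable is just an illustrative name), you can create the directory and print the matching config line like this:

```shell
# Assumed location, mirroring the layout used above; change it if you prefer another.
STORM_LOCAL_DIR="$HOME/tools/data/storm-local-data"

# Create the directory (and any missing parents) if it does not exist yet.
mkdir -p "$STORM_LOCAL_DIR"

# Print the line to paste into the Storm config file.
printf 'storm.local.dir: "%s"\n' "$STORM_LOCAL_DIR"
```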
Build Apache Storm.
Add the bin paths to your environment path so you can access them anywhere.
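As a minimal sketch of that step, the lines below append both bin directories to `PATH`. `~/tools/storm-src` is the directory used earlier in this guide; the ZooKeeper directory name is a guess, so point it at wherever you actually unzipped it. You would typically put these lines in your shell profile (for example `~/.bashrc` or `~/.zshrc`) so they apply to every new terminal.

```shell
# Install locations: adjust these to wherever you unzipped the archives.
STORM_HOME="$HOME/tools/storm-src"      # directory used earlier in this guide
ZOOKEEPER_HOME="$HOME/tools/zookeeper"  # hypothetical directory name

# Make the storm and zkServer.sh commands available from any directory.
export PATH="$PATH:$STORM_HOME/bin:$ZOOKEEPER_HOME/bin"
```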
Start it all up
Open a new terminal window for each command below:
# Start ZooKeeper
zkServer.sh start
# Start the Nimbus node
storm nimbus
# Start the Supervisor Node
storm supervisor
# Start the Storm Web UI
storm ui
You should now be able to visit http://localhost:8080 and see your Storm cluster.
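If you would rather check from the terminal than the browser, something like the following should work, assuming `curl` is installed. The `check_storm_ui` function name is made up for this example, and it only tests that the URL answers, not that the cluster is healthy.

```shell
# Hypothetical helper: reports whether the Storm UI answers on the given URL.
check_storm_ui() {
  url="${1:-http://localhost:8080}"
  # --fail makes curl return non-zero on HTTP errors; --max-time avoids hanging.
  if curl --silent --fail --output /dev/null --max-time 5 "$url"; then
    echo "Storm UI is reachable at $url"
  else
    echo "Storm UI is not reachable at $url"
    return 1
  fi
}

check_storm_ui || echo "Is 'storm ui' still running in its terminal window?"
```

The UI can take a little while to come up after `storm ui` starts, so a failed first attempt is not necessarily a problem.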
Run your topology
You’ll have to build your topology to get a .jar file in your target directory.
Once you have that, you can submit it to your Storm cluster, where it will show up in the Storm UI.
# the `core` directory in storm-crawler
cd ~/storm-crawler/core
# clean and compile
mvn clean compile
# Package
mvn package -DskipTests=true
# Run it locally (This will run it in your terminal and output debugging info)
# You will not see this in your Web UI - It's for development only
storm jar target/storm-crawler-core-0.6-SNAPSHOT-jar-with-dependencies.jar com.digitalpebble.storm.crawler.CustomTopology TestTopology -local
# Alternatively, run it locally through Maven (still inside the `core` directory)
mvn clean compile exec:java -Dstorm.topology=com.digitalpebble.storm.crawler.CrawlTopology -Dexec.args="-conf crawler-conf.yaml -local"
# Submit it to your Storm cluster
# This will show up in your Web UI and run continuously; you can kill it through the UI
storm jar target/storm-crawler-core-0.6-SNAPSHOT-jar-with-dependencies.jar com.digitalpebble.storm.crawler.CustomTopology
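Since submitting only works once the .jar has been built, a small guard can save a confusing error. This is only a sketch: the jar name and class are copied from the commands above, while the `submit_topology` function and its message are illustrative additions.

```shell
# Jar path and topology class as used in the commands above.
JAR="target/storm-crawler-core-0.6-SNAPSHOT-jar-with-dependencies.jar"
TOPOLOGY_CLASS="com.digitalpebble.storm.crawler.CustomTopology"

# Hypothetical helper: submit the topology only if the jar has been built.
submit_topology() {
  if [ ! -f "$JAR" ]; then
    echo "Missing $JAR: run 'mvn package -DskipTests=true' first"
    return 1
  fi
  storm jar "$JAR" "$TOPOLOGY_CLASS"
}
```

Run `submit_topology` from the `core` directory, since the jar path is relative to it.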