10 Key Features in Apache Storm 1.0.0
The Apache Storm community recently announced the stable release of Apache Storm 1.0.0. This noteworthy release delivers several features aimed at enterprise readiness, operational simplicity, and ease of use, with improvements in performance, scalability, debuggability, maintainability, and manageability.
Apache Storm is a free and open source distributed real-time computation system. Storm makes it easy to reliably process unbounded streams of data, doing for real-time processing what Hadoop did for batch processing. Storm is simple, can be used with any programming language, and is a lot of fun to use! Storm has many use cases: real-time analytics, online machine learning, continuous computation, distributed RPC, ETL, and more. Storm is fast: a benchmark clocked it at over a million tuples processed per second per node. It is scalable, fault-tolerant, guarantees your data will be processed, and is easy to set up and operate.
Storm integrates with the queueing and database technologies you already use. A Storm topology consumes streams of data and processes those streams in arbitrarily complex ways, repartitioning the streams between each stage of the computation however needed.
Here are 10 key features and highlights in Apache Storm 1.0.0:
- Improved Nimbus HA – Multiple instances of the Nimbus service can now run in a cluster and perform leader election when a Nimbus node fails, so Nimbus hosts can join or leave the cluster at any time. This prevents Nimbus from being a single point of failure and allows failures in running topologies to be detected and recovered automatically.
- Automatic Back Pressure Support – Storm 1.0 provides backpressure so that when a receiving component cannot keep up with incoming tuples, the sending component is throttled based on configurable high/low watermarks. This throttling works without enabling ACKing and is implemented independently of the Spout APIs.
- Windowing and State Management – Windowing computations are among the most common use cases in stream processing, and support for them is essential for deriving actionable insights from real-time data streams. Storm 1.0 supports sliding and tumbling windows based on time duration and/or event count. With state management added to core Storm in 1.0, the framework automatically and periodically snapshots the state of the bolts across the topology in a consistent manner. There is a default in-memory state implementation and a Redis-backed implementation that provides state persistence.
- New Storm Connectors – Storm 1.0 introduces support for the Cassandra and MongoDB NoSQL stores and for the Elasticsearch and Solr indexing and search servers.
- Distributed Cache API – Storm 1.0 introduces a distributed cache API that allows for the sharing of files (BLOBs) among topologies. Files in the distributed cache can be updated at any time from the command line, alleviating the need to repackage and redeploy the entire topology when updates are made to bundled resource data.
- Pacemaker Storm Daemon – ZooKeeper was long considered a bottleneck for Storm scalability because of the high volume of heartbeat writes from workers and supervisors. Pacemaker is an optional Storm daemon designed to process heartbeats from workers; it functions as a simple in-memory key/value store for holding heartbeats.
- Storm Kafka Spout using new Client APIs – In our experience solving real-world enterprise use cases for real-time analysis and rendering of streaming data, Storm and Kafka go together like peanut butter and jelly! This combination of messaging and processing technologies enables stream processing at linear scale. Storm 1.0 introduces a Storm Kafka Spout that uses the Kafka 0.9 consumer APIs.
- Resource Aware Scheduling – Resource Aware Scheduling in Storm aims to increase overall throughput by maximizing resource utilization while minimizing network latency. In Storm 1.0, it schedules topology tasks among workers to best meet the CPU and memory requirements specified for individual topology components. Future Storm releases will extend this resource awareness to minimize network latency as well.
- Storm Topology Event Inspector – Storm 1.0 introduces the ability to view tuples flowing through the topology along with the ability to turn on/off debug events without having to stop/restart the entire topology. The user can select a specific Spout or Bolt, specify a configurable number of events to view and see incoming events and outgoing events from that component.
- Storm Performance Improvements – Storm 1.0 includes several performance-related enhancements, in areas such as ACKing as well as in core Storm, that deliver significant performance improvements over previous versions.
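To make the high/low watermark idea from the backpressure item more concrete, here is a minimal sketch in plain Python. It is not Storm's implementation or API: the `WatermarkQueue` class and its `offer`/`poll` methods are hypothetical names invented for this illustration. The queue asks the sender to pause once it fills to the high watermark and lets it resume only after the receiver has drained it down to the low watermark.

```python
from collections import deque


class WatermarkQueue:
    """Receive queue with a throttle flag driven by two watermarks.

    Hypothetical illustration only; not Storm's implementation or API.
    """

    def __init__(self, high_watermark, low_watermark):
        if not 0 <= low_watermark < high_watermark:
            raise ValueError("need 0 <= low_watermark < high_watermark")
        self.high = high_watermark
        self.low = low_watermark
        self.items = deque()
        self.throttled = False  # True means the sender should pause

    def offer(self, item):
        """Enqueue an item unless throttled; return True if it was accepted."""
        if self.throttled:
            return False
        self.items.append(item)
        if len(self.items) >= self.high:
            self.throttled = True  # reached the high watermark
        return True

    def poll(self):
        """Dequeue one item; lift the throttle at or below the low watermark."""
        item = self.items.popleft()
        if self.throttled and len(self.items) <= self.low:
            self.throttled = False
        return item


if __name__ == "__main__":
    q = WatermarkQueue(high_watermark=4, low_watermark=1)
    print([q.offer(i) for i in range(6)])  # last two offers are rejected
    while q.throttled:
        q.poll()                           # receiver catches up
    print(q.offer(99))                     # sender may resume
```

Using two thresholds instead of one keeps the sender from flapping between paused and resumed when the queue hovers around a single limit.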
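The difference between tumbling and sliding windows from the windowing item can also be sketched in a few lines of plain Python. This is a conceptual illustration based on event count, not Storm's windowing API: the function name `count_windows` and its parameters are assumptions made for this example. A window of the last `window_length` events is emitted every `sliding_interval` events; when the two values are equal the windows tumble (no overlap), and when the interval is smaller they slide (overlap).

```python
def count_windows(events, window_length, sliding_interval):
    """Return count-based windows over a list of events.

    Hypothetical helper for illustration; not part of Storm.
    Early windows may hold fewer than window_length events, which is a
    simplification chosen for this sketch.
    """
    if window_length <= 0 or sliding_interval <= 0:
        raise ValueError("window_length and sliding_interval must be positive")
    windows = []
    # A window closes after every sliding_interval events.
    for end in range(sliding_interval, len(events) + 1, sliding_interval):
        start = max(0, end - window_length)
        windows.append(events[start:end])
    return windows


if __name__ == "__main__":
    events = [1, 2, 3, 4, 5, 6]
    # Tumbling: length == interval, so each event lands in exactly one window.
    print(count_windows(events, 3, 3))  # [[1, 2, 3], [4, 5, 6]]
    # Sliding: interval < length, so consecutive windows overlap.
    print(count_windows(events, 4, 2))  # [[1, 2], [1, 2, 3, 4], [3, 4, 5, 6]]
```

A time-based variant would follow the same shape, with event timestamps taking the place of list positions.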
Reference – Apache Storm community and http://hortonworks.com/blog/ (where this was originally published; all credit goes to them)
Interesting? Please subscribe to our blog at www.dataottam.com to stay current on Big Data, Analytics, and IoT.
And as always, please feel free to send suggestions or comments to firstname.lastname@example.org.