Understand Lambda Architecture in 2 Minutes
What is Lambda Architecture?
Lambda Architecture is a data-processing design that combines real-time (streaming) data with batch data into a single, unified solution.
What is the need for Lambda Architecture?
Lambda Architecture arose mainly because of the latency of the MapReduce paradigm, where batch views were computed on data that was already outdated by at least 2 to 3 hours. That is not acceptable in use cases where real-time updates make a significant difference to the overall computation.
What are the three important layers of Lambda Architecture?
- Speed/Event Layer: This is the stream-processing part of the architecture, where we process streaming data as it arrives. Technologies like Storm or Spark Streaming serve this purpose.
- Batch Layer: This appends incoming data to the master dataset in HDFS periodically over a time window and recomputes batch views over that historical data. Technologies like Hadoop and MapReduce serve this purpose.
- Serving Layer: This simply queries the batch and real-time views and merges them. Technologies like HBase or Cassandra can serve this purpose.
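As a minimal sketch of what the serving layer does, the snippet below merges a precomputed batch view with a real-time view of per-AD click counts. The function name and the data are illustrative assumptions, not a real API:

```python
# Hypothetical serving-layer merge: the batch view covers data up to the
# last batch run; the real-time view covers everything that arrived since.
def merge_views(batch_view: dict, realtime_view: dict) -> dict:
    """Combine per-key counts from the batch and speed layers."""
    merged = dict(batch_view)
    for key, count in realtime_view.items():
        merged[key] = merged.get(key, 0) + count
    return merged

batch_view = {"ad_1": 120, "ad_2": 45}   # precomputed over historical data
realtime_view = {"ad_1": 3, "ad_3": 7}   # events since the last batch run
print(merge_views(batch_view, realtime_view))
# {'ad_1': 123, 'ad_2': 45, 'ad_3': 7}
```

In a real deployment the merged view would be served from a store like HBase or Cassandra rather than computed in memory.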
Explain a use case of Lambda Architecture.
Let us consider click-stream data, where we need to count the number of unique visitors clicking on an AD. We receive files every hour from the source system (via SFTP), and our batch job processes this data in the batch layer. The same data is also streamed every second to our event layer, which drops all data older than one hour because it is already available in the batch layer. There is therefore a one-hour gap between what the event layer and the batch layer cover. Each layer produces its own view; these views are merged and stored in the serving layer, which thus provides up-to-date data to the business.
(Table and diagram for this scenario: unique visitors clicking the AD in a 1-hour window, broken down by layer.)
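The use case above can be sketched in code. Note that unique-visitor counts cannot simply be added across layers (a visitor may appear in both), so each layer keeps a set of visitor IDs and the serving layer takes the union. All names and data here are illustrative assumptions:

```python
# Batch layer: visitor-ID sets built from the hourly SFTP files.
batch_visitors = {"ad_42": {"u1", "u2", "u3"}}
# Speed layer: visitor-ID sets from the last hour's stream (not yet in batch).
speed_visitors = {"ad_42": {"u3", "u4"}}

def unique_visitors(ad_id: str) -> int:
    """Serving layer: union both views and count distinct visitors."""
    combined = batch_visitors.get(ad_id, set()) | speed_visitors.get(ad_id, set())
    return len(combined)

print(unique_visitors("ad_42"))  # 4 -- "u3" is deduplicated across layers
```

This is exactly why the views must be merged in the serving layer: naively adding per-layer counts (3 + 2) would overcount visitor "u3".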
What is the problem with Lambda Architecture?
The batch and event layers implement similar functionality. If we use a system like Storm for the event layer and Hadoop MapReduce for the batch layer, we end up maintaining the same logic in two different programming models: one for Storm and another for MapReduce.
How do you overcome the problem?
Apache Spark and Apache Flink let us use the same programming model in both the batch layer and the speed layer, improving productivity.
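The "one code path" idea can be illustrated with a plain-Python stand-in (not the actual Spark or Flink APIs): the aggregation logic is written once and applied both to a historical batch and to a live micro-batch. All names and data are assumptions for illustration:

```python
from collections import Counter

def count_clicks(events):
    """Shared business logic: number of clicks per AD."""
    return Counter(e["ad_id"] for e in events)

historical = [{"ad_id": "a"}, {"ad_id": "b"}, {"ad_id": "a"}]  # batch file
live = [{"ad_id": "a"}]                                        # streaming micro-batch

batch_view = count_clicks(historical)  # batch layer
speed_view = count_clicks(live)        # speed layer
print(batch_view + speed_view)         # Counter({'a': 3, 'b': 1})
```

With Spark, the same transformation can run on a DataFrame in batch mode and on a streaming DataFrame; that is the productivity gain the answer above refers to.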
Happy Learning !!