Data Processing Architectures - Lambda and Kappa

Lambda and Kappa architectures are popular design solutions for real-time data processing. A good real-time data processing architecture must be fault-tolerant and scalable, support both batch and incremental updates, and be extensible.

Lambda

Lambda architecture is a general-purpose design that fits many use cases. It is composed of three layers: batch, speed, and serving.

(Figure: Lambda architecture, showing the batch, speed, and serving layers)

The batch layer has two major tasks: managing the master dataset (an immutable, append-only store of raw data) and precomputing the batch views over that dataset.

The speed layer provides results with low latency. It performs real-time processing by applying incremental updates on top of the batch layer results, at a significantly lower computation cost.

The serving layer answers queries against the stored results (the data warehouse), combining the batch and speed views.
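
The sketch below is a minimal, in-memory illustration of how the three layers fit together; the names (`master_dataset`, `batch_view`, `speed_view`) are made up for this example and do not come from any particular framework.

```python
# Illustrative Lambda sketch: batch view recomputed from the full dataset,
# speed view updated incrementally, and both merged at query time.
from collections import Counter

master_dataset = []          # batch layer: immutable, append-only raw events
batch_view = Counter()       # precomputed result over the full dataset
speed_view = Counter()       # incremental result over events not yet batched


def batch_recompute():
    """Batch layer: recompute the view from the entire master dataset."""
    global batch_view, speed_view
    batch_view = Counter(e["user"] for e in master_dataset)
    speed_view = Counter()   # speed view is discarded once the batch catches up


def ingest(event):
    """New events land in the master dataset and update the speed view."""
    master_dataset.append(event)
    speed_view[event["user"]] += 1


def query(user):
    """Serving layer: merge batch and speed views at query time."""
    return batch_view[user] + speed_view[user]


ingest({"user": "alice"})
ingest({"user": "alice"})
print(query("alice"))        # 2, served entirely from the speed view
batch_recompute()
print(query("alice"))        # still 2, now served from the batch view
```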

Kappa

Kappa addresses some pitfalls of Lambda, but it is not a replacement for it. Kappa avoids the need to maintain two separate code bases for the batch and speed layers. The key idea is to handle both real-time processing and continuous reprocessing with a single stream processing engine: one processing job runs over the stream to enable real-time results, and reprocessing is done by replaying the log through that same job.

(Figure: Kappa architecture)
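
The following is a minimal sketch of the Kappa idea using a plain Python list as a stand-in for a durable log (such as a Kafka topic); all names here are illustrative. The point is that live processing and reprocessing share a single code path.

```python
# Kappa sketch: one processing function, one append-only log.
# Reprocessing is just replaying the log from the start through the same code.
log = []                     # stands in for a durable log such as a Kafka topic
counts = {}                  # materialized view produced by the stream job


def process(event):
    """Single code path used for both live processing and reprocessing."""
    counts[event["user"]] = counts.get(event["user"], 0) + 1


def append(event):
    """Producers append to the log; the running job processes events as they arrive."""
    log.append(event)
    process(event)


def reprocess():
    """Rebuild the view by replaying the whole log (e.g. after a logic change)."""
    counts.clear()
    for event in log:
        process(event)


append({"user": "alice"})
append({"user": "bob"})
reprocess()
print(counts)                # {'alice': 1, 'bob': 1}
```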

Implementations

Both architectures are built by combining technologies such as Apache Kafka, HBase, Hadoop (HDFS, MapReduce), Spark, Drill, Spark Streaming, Storm, and Samza.
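
As one hedged example of wiring two of these together, the sketch below reads a Kafka topic with Spark Structured Streaming and counts messages per key. It assumes a broker at localhost:9092, a topic named "events", and the spark-sql-kafka connector on the classpath; adjust these to your environment.

```python
# Sketch: Spark Structured Streaming consuming a Kafka topic (assumed setup).
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("stream-sketch").getOrCreate()

events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")  # assumed broker
    .option("subscribe", "events")                        # assumed topic name
    .load()
)

# Kafka rows expose key/value as binary; cast to strings and run a toy aggregation.
counts = (
    events.select(col("key").cast("string"), col("value").cast("string"))
    .groupBy("key")
    .count()
)

query = (
    counts.writeStream
    .outputMode("complete")   # aggregations require complete or update mode
    .format("console")
    .start()
)
query.awaitTermination()
```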
