Wednesday, 15 January 2025

Streaming with Kafka API

The Kafka Streams API is a Java library for building real-time applications and microservices that efficiently process and analyze large-scale data streams using Kafka's distributed architecture.
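As a minimal sketch of what such an application looks like (the topic names and broker address below are placeholders, not from this post), a Kafka Streams program wires a topology to configuration and starts it:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.Topology;

public class UppercaseApp {

    // Build the processing logic: read "input-topic", uppercase each value,
    // write to "output-topic". Topic names are hypothetical.
    static Topology buildTopology() {
        StreamsBuilder builder = new StreamsBuilder();
        builder.<String, String>stream("input-topic")
               .mapValues(v -> v.toUpperCase())
               .to("output-topic");
        return builder.build();
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "uppercase-app");     // also used as the consumer group id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        KafkaStreams streams = new KafkaStreams(buildTopology(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```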

Key Features

  • Scalability and Fault Tolerance: Highly scalable and fault-tolerant, ensuring reliable data processing.
  • Stateful and Stateless Processing: Supports both stateful operations (e.g., aggregations and joins) and stateless operations (e.g., filtering and mapping).
  • Event-Time Processing: Handles event-time processing, crucial for applications needing to process events based on their occurrence time.
  • Integration with Kafka: Seamlessly integrates with Kafka, allowing consumption from and production to Kafka topics.

Use Cases

  • Real-Time Analytics: Analyze data in real-time for insights and decision-making.
  • Monitoring and Alerting: Monitor systems and trigger alerts based on real-time data.
  • Data Transformation: Transform and enrich data streams before storing or further processing.




Topology

A topology in Kafka Streams defines the computational logic of your application as a directed graph of processors and streams, specifying how input data is transformed into output data.
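As a sketch of this graph structure (the topic names are made up for illustration), a small DSL topology can be built and its node graph inspected with Topology#describe():

```java
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;

public class TopologyExample {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        // Source node -> filter processor -> mapValues processor -> sink node:
        // a small directed graph of processing steps.
        builder.<String, String>stream("orders")           // source (hypothetical topic)
               .filter((key, value) -> value != null)      // stateless processor
               .mapValues(String::trim)                    // stateless processor
               .to("clean-orders");                        // sink (hypothetical topic)

        Topology topology = builder.build();
        // Prints the graph of sources, processors, and sinks.
        System.out.println(topology.describe());
    }
}
```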


Processor


A processor in Kafka Streams is a node within the topology that represents a single processing step. Processors can be either stateless (e.g., filtering, mapping) or stateful (e.g., counting, aggregating).

Processors implement the Processor API, providing custom logic through the init(), process(), and close() methods for initialization, per-record handling, and cleanup. State is managed using state stores.
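A hedged sketch of a stateful processor, assuming the newer org.apache.kafka.streams.processor.api interface (Kafka 2.7+); the topic and store names are invented for the example:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.processor.api.Processor;
import org.apache.kafka.streams.processor.api.ProcessorContext;
import org.apache.kafka.streams.processor.api.Record;
import org.apache.kafka.streams.state.KeyValueStore;
import org.apache.kafka.streams.state.Stores;

public class CountProcessorExample {

    // A stateful processor that counts records per key in a state store.
    static class CountProcessor implements Processor<String, String, String, Long> {
        private KeyValueStore<String, Long> store;
        private ProcessorContext<String, Long> context;

        @Override
        public void init(ProcessorContext<String, Long> context) {
            this.context = context;
            this.store = context.getStateStore("counts"); // attached below
        }

        @Override
        public void process(Record<String, String> record) {
            Long count = store.get(record.key());
            long next = (count == null ? 0L : count) + 1;
            store.put(record.key(), next);
            // Forward the updated count downstream.
            context.forward(new Record<>(record.key(), next, record.timestamp()));
        }

        @Override
        public void close() { } // release any resources here
    }

    static Topology buildTopology() {
        Topology topology = new Topology();
        topology.addSource("Source",
                Serdes.String().deserializer(), Serdes.String().deserializer(), "words");
        topology.addProcessor("Count", CountProcessor::new, "Source");
        topology.addStateStore(
                Stores.keyValueStoreBuilder(
                        Stores.inMemoryKeyValueStore("counts"),
                        Serdes.String(), Serdes.Long()),
                "Count"); // make the store available to the Count processor
        topology.addSink("Sink", "word-counts",
                Serdes.String().serializer(), Serdes.Long().serializer(), "Count");
        return topology;
    }
}
```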

State stores


In Kafka Streams, state stores are local, queryable storage engines that maintain state information for stateful operations, ensuring durability, fault tolerance, and data consistency. They are backed by changelog topics in Kafka so that state can be restored after a failure.
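In the DSL, stateful operations such as count() are backed by state stores transparently. A sketch, with invented topic and store names, that materializes a count into a named, queryable store:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.state.KeyValueStore;

public class StateStoreExample {
    static Topology buildTopology() {
        StreamsBuilder builder = new StreamsBuilder();
        builder.<String, String>stream("clicks")
               .groupByKey(Grouped.with(Serdes.String(), Serdes.String()))
               // count() is backed by a local key-value state store named
               // "click-counts"; it is also logged to a changelog topic
               // in Kafka for fault tolerance.
               .count(Materialized.<String, Long, KeyValueStore<Bytes, byte[]>>as("click-counts"))
               .toStream()
               .to("click-totals", Produced.with(Serdes.String(), Serdes.Long()));
        return builder.build();
    }
}
```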

KStream

A KStream is an abstraction in Kafka Streams representing a continuous stream of records, where each record is a key-value pair. It is used for processing and transforming data in real-time.

  • KStream provides access to all records in a Kafka topic, treating each event independently.
  • Each incoming event flows through the entire topology and is immediately available to downstream processors.
  • KStream, also known as a record stream or log, represents an infinite stream of records, similar to inserts in a database table.

Key Features of KStream

  • Record Stream: Represents an unbounded, continuous flow of records.
  • Transformations & Joins: Supports filtering, mapping, flat-mapping, and joining with KStreams, KTables, and GlobalKTables.
  • Aggregations & Fault Tolerance: Allows for aggregations, windowed operations, and ensures fault tolerance through Kafka's architecture.
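A sketch of the stateless transformations listed above, chained on a KStream (topic names are assumptions):

```java
import java.util.Arrays;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.KStream;

public class KStreamTransformations {
    static Topology buildTopology() {
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> lines = builder.stream("sentences"); // hypothetical topic

        lines.filter((key, value) -> value != null && !value.isEmpty())  // drop empty records
             .flatMapValues(value -> Arrays.asList(value.split("\\s+"))) // one record per word
             .mapValues(word -> word.toLowerCase())                      // normalize casing
             .to("words");                                               // hypothetical topic
        return builder.build();
    }
}
```

Each record is processed independently as it arrives; filter, mapValues, and flatMapValues keep no state between records.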


How Kafka Streams Executes the Topology

Kafka Streams splits the topology into stream tasks, one per partition of the input topics. Each task processes its partitions independently and is executed by a stream thread (configurable via num.stream.threads); tasks can be spread across threads and application instances, so parallelism scales with the number of input partitions.

Serialization / Deserialization


Serdes is the factory class in Kafka Streams that provides serializers and deserializers (Serde instances) for record keys and values, e.g. Serdes.String() and Serdes.Long().
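A short sketch: a Serde bundles a serializer and a deserializer for one type, and in the DSL serdes can be supplied explicitly per source and sink (the topic names here are invented):

```java
import org.apache.kafka.common.serialization.Serde;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;

public class SerdesExample {
    public static void main(String[] args) {
        // Round-trip a value through a built-in Serde.
        Serde<Long> longSerde = Serdes.Long();
        byte[] bytes = longSerde.serializer().serialize("any-topic", 42L);
        Long back = longSerde.deserializer().deserialize("any-topic", bytes);
        System.out.println(back); // 42

        // Passing serdes explicitly instead of relying on the defaults
        // configured in StreamsConfig (hypothetical topics):
        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("amounts", Consumed.with(Serdes.String(), Serdes.Long()))
               .mapValues(v -> v * 2)
               .to("doubled", Produced.with(Serdes.String(), Serdes.Long()));
    }
}
```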


Error Handlers in Kafka Streams

Kafka Streams exposes pluggable handlers at three points where failures can occur: a deserialization exception handler for records that cannot be deserialized, a production exception handler for failures while writing to output topics, and an uncaught exception handler for exceptions thrown from processing code.
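As a configuration sketch using the built-in handlers (which properties you set depends on your failure-tolerance policy):

```java
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.errors.DefaultProductionExceptionHandler;
import org.apache.kafka.streams.errors.LogAndContinueExceptionHandler;

public class ErrorHandlingConfig {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Skip records that fail to deserialize instead of crashing
        // (the default is to log and fail).
        props.put(StreamsConfig.DEFAULT_DESERIALIZATION_EXCEPTION_HANDLER_CLASS_CONFIG,
                  LogAndContinueExceptionHandler.class);
        // Handler for errors while producing to output topics
        // (this is the built-in default, shown for illustration).
        props.put(StreamsConfig.DEFAULT_PRODUCTION_EXCEPTION_HANDLER_CLASS_CONFIG,
                  DefaultProductionExceptionHandler.class);

        // For exceptions thrown from processing code, register a handler
        // on the KafkaStreams instance before starting it, e.g.:
        // streams.setUncaughtExceptionHandler(
        //         ex -> StreamsUncaughtExceptionHandler.StreamThreadExceptionResponse.REPLACE_THREAD);
    }
}
```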


KTable

KTable is an abstraction in Kafka Streams that holds the latest value for each key.

  • Update-Stream/Changelog: KTable treats the record stream as a changelog, retaining only the latest value for each key.
  • Key-Based Updates: A new record with an existing key overwrites the previous value; records with a null key are ignored.
  • Relational DB Analogy: Similar to an update operation on a table row for a given primary key.
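A sketch of building a KTable from a topic and materializing it into a queryable store (topic and store names are made up):

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;

public class KTableExample {
    static Topology buildTopology() {
        StreamsBuilder builder = new StreamsBuilder();
        // Each record in "user-emails" (hypothetical topic) upserts the
        // value for its key; the table always holds the latest email per user.
        KTable<String, String> emails = builder.table(
                "user-emails",
                Consumed.with(Serdes.String(), Serdes.String()),
                Materialized.as("emails-store")); // queryable local state store
        return builder.build();
    }
}
```

Contrast with KStream: two records with the same key are two independent events in a KStream, but the second one replaces the first in a KTable.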


