In this lesson we will:
- Introduce the concept of stream processing;
- Introduce stream processing frameworks such as Apache Flink and Kafka Streams.
Stream processing is the method by which we analyze, manipulate, and respond to streaming data as it is received. Common examples of things businesses need to achieve with stream processing include:
- Calculating real-time analytics, e.g. the total value of orders in the last hour;
- Identifying situations, e.g. a potentially fraudulent transaction;
- Manipulating data, e.g. cleaning it to meet organizational standards;
- Providing a real-time customer experience, e.g. informing a user of their wait time in a queue;
- Filtering data, e.g. to focus only on the subset of data of interest.
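Two of the examples above, a trailing-window analytic and a filter, can be sketched in plain Python. The order events, field names, and threshold below are invented for illustration; in production the events would arrive continuously from a stream such as a Kafka topic rather than a hard-coded list.

```python
from datetime import datetime, timedelta

# Hypothetical order events; in a real system these would be consumed
# from a stream rather than defined up front.
orders = [
    {"id": 1, "value": 25.00, "ts": datetime(2024, 1, 1, 10, 5)},
    {"id": 2, "value": 120.00, "ts": datetime(2024, 1, 1, 10, 40)},
    {"id": 3, "value": 60.00, "ts": datetime(2024, 1, 1, 11, 15)},
]

def total_value_last_hour(events, now):
    """Real-time analytic: total order value in the trailing one-hour window."""
    cutoff = now - timedelta(hours=1)
    return sum(e["value"] for e in events if e["ts"] > cutoff)

def large_orders(events, threshold=100.0):
    """Filtering: keep only the subset of events of interest."""
    return [e for e in events if e["value"] >= threshold]

now = datetime(2024, 1, 1, 11, 30)
print(total_value_last_hour(orders, now))       # orders 2 and 3 fall in the window: 180.0
print([e["id"] for e in large_orders(orders)])  # only order 2 exceeds the threshold: [2]
```

Note that this sketch recomputes over a stored list on demand; real stream processors instead maintain windowed aggregates incrementally as each event arrives, which is part of what makes them harder to build by hand.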
By processing data in real time and building experiences such as these, organizations can react quickly to changing conditions and respond to events as they occur.
Processing real-time streams of data is complex. Not only do we face data volume and latency challenges, we also need to ensure correctness and reliability even when hardware fails or processes crash.
Rather than coding this logic from scratch, a number of stream processing frameworks have been developed which allow developers to work at a higher level of abstraction and leave lower-level concerns to the framework. Popular choices include Apache Flink and Kafka Streams, and more are discussed in a subsequent lesson.
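To give a feel for that higher level of abstraction, here is a toy fluent pipeline in Python. It mimics the declarative shape of a Kafka Streams or Flink topology (chained `filter` and `map` steps declared once, executed by the engine), but the `Pipeline` class is invented for this sketch and is not the API of either framework.

```python
class Pipeline:
    """A toy declarative pipeline, loosely modeled on the fluent DSLs of
    stream processing frameworks. Purely illustrative, not a real API."""

    def __init__(self, ops=None):
        self.ops = ops or []

    def filter(self, predicate):
        # Each step returns a new pipeline, so topologies compose fluently.
        return Pipeline(self.ops + [("filter", predicate)])

    def map(self, fn):
        return Pipeline(self.ops + [("map", fn)])

    def run(self, events):
        # The "framework" owns execution; user code only declared the steps.
        for kind, fn in self.ops:
            events = filter(fn, events) if kind == "filter" else map(fn, events)
        return list(events)

# Declare the topology once: drop small orders, then extract their values.
topology = (
    Pipeline()
    .filter(lambda order: order["value"] >= 50)
    .map(lambda order: order["value"])
)

print(topology.run([{"value": 30}, {"value": 100}, {"value": 50}]))  # [100, 50]
```

In a real framework the same declared topology would additionally be distributed across machines, checkpointed, and recovered on failure, which is precisely the lower-level work being delegated.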