Stream Processing
Stream Processing is based on the fundamental concept of unbounded streams of events (in contrast to static sets of bounded data as we typically find in relational databases).
Taking that unbounded stream of events, we often want to do something with it. An unbounded stream of events could be temperature readings from a sensor, network data from a router, orders from an e-commerce system, and so on.
Let's imagine we have such an unbounded stream of events, perhaps manufacturing events from a factory about 'widgets' being manufactured.
We want to filter that stream based on a characteristic of the 'widget' and, if it's red, route it to another stream. Maybe we'll use that stream for reporting, or for driving another application that needs to respond only to red widget events.
This, in a rather crude nutshell, is stream processing. Stream processing is used to do things like:
- filter streams
- aggregate streams (for example, the sum of a field over a period of time, or a count of events in a given window)
- enrich streams (deriving values within a stream of events, or joining out to another stream)
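The three operations listed above can be sketched in plain Java, without any streaming library. This is only an illustration under stated assumptions: the `Widget` class, its `id` and `colour` fields, and the method names are all hypothetical, and a small in-memory list stands in for what would really be an unbounded stream consumed one event at a time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

class WidgetStreamSketch {

    // Hypothetical event shape; the field names are assumptions for illustration.
    static final class Widget {
        final String id;
        final String colour;

        Widget(String id, String colour) {
            this.id = id;
            this.colour = colour;
        }
    }

    // Filter: pass on only the red widgets, looking at one event at a time.
    static List<Widget> redOnly(Iterator<Widget> events) {
        List<Widget> red = new ArrayList<>();
        while (events.hasNext()) {
            Widget w = events.next();
            if ("red".equalsIgnoreCase(w.colour)) {
                red.add(w);
            }
        }
        return red;
    }

    // Aggregate: keep a running count of events per colour.
    static Map<String, Integer> countByColour(Iterator<Widget> events) {
        Map<String, Integer> counts = new HashMap<>();
        while (events.hasNext()) {
            counts.merge(events.next().colour, 1, Integer::sum);
        }
        return counts;
    }

    // Enrich: derive a new value from the fields of a single event.
    static String label(Widget w) {
        return w.colour + "-" + w.id;
    }

    public static void main(String[] args) {
        List<Widget> events = Arrays.asList(
                new Widget("w1", "red"),
                new Widget("w2", "blue"),
                new Widget("w3", "red"));

        System.out.println(redOnly(events.iterator()).size());
        System.out.println(countByColour(events.iterator()));
        System.out.println(label(events.get(1)));
    }
}
```

A real stream processor differs mainly in that the source never ends, so results are emitted continuously (or per time window) rather than returned once at the end.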
As you mentioned, there are a large number of articles about this; without wanting to give you yet another link to follow, I would recommend this one.
Kafka Streams
Kafka Streams is a stream processing library, provided as part of Apache Kafka. You use it in your Java applications to do stream processing.
In the context of the above example it looks like this:
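As a minimal sketch of the red-widget filter in Kafka Streams, the code below assumes String keys and String values where each value is simply the widget's colour. The topic names `widgets` and `widgets-red`, the application id, and the broker address are placeholders chosen for illustration, and the `kafka-streams` library must be on the classpath. A real application would more likely use a structured value type with its own serde.

```java
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class RedWidgetFilter {

    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholder application id and broker address; adjust for your cluster.
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "red-widget-filter");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // Assumption: keys and values are plain strings.
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();

        // Read every event from the (hypothetical) source topic.
        KStream<String, String> widgets = builder.stream("widgets");

        // Keep only the red widgets and write them to a second topic.
        widgets.filter((key, colour) -> "red".equalsIgnoreCase(colour))
               .to("widgets-red");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        // Close the application cleanly when the JVM shuts down.
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        streams.start();
    }
}
```

Any other application can then consume `widgets-red` like an ordinary Kafka topic.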
Kafka Streams is built on top of the Kafka producer/consumer API, and abstracts away some of the low-level complexities. You can learn more about it in the documentation.