# Tutorial: Reflections on Almost Two Decades of Research into Stream Processing
This tutorial reflects on this research history by highlighting a number of trends and best practices that can be identified in hindsight. It also enumerates a list of directions for future research in stream processing.
## NOTABLE SYSTEMS
- TinyDB
- STREAM
- TelegraphCQ [13]
- Gigascope [15]
- Aurora
- Borealis
- IBM InfoSphere Streams
- System S
- S4
- Storm
- Spark Streaming
- Flink
- Kafka/Samza
- MillWheel
- Google Cloud Dataflow
- StreamScope
## TRENDS
- From DSMSs to Big “Streaming” Data Frameworks
- Combine the low-latency benefits of streaming systems with the scalability properties of Big Data frameworks.
- Increased Importance of Exact Results
- Providing exactly-once delivery semantics; some systems even offer transactional properties.
- One-pass Computation vs Replayability
- Early systems kept data in memory for only a short period of time, under the assumption of strict low-latency requirements. Modern deployments instead use high-throughput persisted message buses such as Kafka, which enable replayability, ease debugging, and simplify fault-tolerance mechanisms.
- Unification of Batch and Streaming Models
- Apache Flink and Apache Beam.
- Domain-specific to General-purpose
- Early systems targeted specific domains such as wireless sensor networks or network traffic monitoring; later systems are general-purpose.
- Richer Window Specifications
- Time- or count-based windows
- Session windows (Google Cloud Dataflow and Kafka Streams) and episodes/frames
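To make the richer window specifications concrete, the sketch below shows the core idea behind session windows in plain Python. This is an illustrative toy, not the API of any of the systems above: a new session starts whenever the gap since the previous event exceeds an inactivity threshold (events are assumed to arrive in timestamp order).

```python
from collections.abc import Iterable


def session_windows(events: Iterable[tuple[str, float]],
                    gap: float) -> list[list[tuple[str, float]]]:
    """Group (value, timestamp) events into session windows.

    A new session starts whenever the time since the previous event
    exceeds `gap`. Events are assumed to arrive in timestamp order.
    """
    sessions: list[list[tuple[str, float]]] = []
    current: list[tuple[str, float]] = []
    last_ts = None
    for value, ts in events:
        if last_ts is not None and ts - last_ts > gap:
            sessions.append(current)   # inactivity gap: close the session
            current = []
        current.append((value, ts))
        last_ts = ts
    if current:
        sessions.append(current)
    return sessions


clicks = [("a", 0.0), ("b", 1.0), ("c", 10.0), ("d", 11.5)]
print(session_windows(clicks, gap=5.0))
# → [[('a', 0.0), ('b', 1.0)], [('c', 10.0), ('d', 11.5)]]
```

Unlike time- or count-based windows, the window boundaries here are data-driven: they depend on the inter-arrival gaps rather than on a fixed size.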
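Relating to the exact-results trend: one common way to obtain exactly-once *effects* on top of at-least-once delivery is an idempotent sink that de-duplicates by record id. The class below is a minimal sketch of that idea, not the mechanism of any specific system:

```python
class DedupSink:
    """Idempotent sink: at-least-once delivery plus de-duplication
    by record id yields exactly-once effects downstream."""

    def __init__(self) -> None:
        self.seen: set[str] = set()
        self.output: list[str] = []

    def write(self, record_id: str, payload: str) -> bool:
        if record_id in self.seen:      # duplicate from a retry or replay
            return False
        self.seen.add(record_id)
        self.output.append(payload)
        return True


sink = DedupSink()
for rid, data in [("r1", "x"), ("r2", "y"), ("r1", "x")]:  # "r1" redelivered
    sink.write(rid, data)
print(sink.output)  # → ['x', 'y']
```

Production systems must additionally persist the seen-id set (or bound it, e.g. per epoch) so the guarantee survives failures, which is where the transactional properties mentioned above come in.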
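The replayability trend can likewise be sketched in a few lines. The toy log below (loosely modeled on Kafka's append-only, offset-addressed log, but not its API) shows how a consumer that tracks an offset can rewind and replay history after a failure:

```python
class Log:
    """Append-only persisted log: consumers track an offset and can
    rewind it to replay history, e.g. after a failure or for debugging."""

    def __init__(self) -> None:
        self.records: list[str] = []

    def append(self, record: str) -> int:
        self.records.append(record)
        return len(self.records) - 1    # offset of the new record

    def read_from(self, offset: int) -> list[str]:
        return self.records[offset:]


log = Log()
for r in ["e1", "e2", "e3"]:
    log.append(r)

committed = 1                    # consumer committed offset 1, then crashed
print(log.read_from(committed))  # replay from last commit → ['e2', 'e3']
```

Because the log persists the stream, fault tolerance reduces to replaying from the last committed offset instead of reconstructing lost in-memory state.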
## BEST PRACTICES
- System-wide State Management
- Simplified Reasoning via Punctuation