# Tutorial: Reflections on Almost Two Decades of Research into Stream Processing

This tutorial reflects on nearly two decades of stream-processing research, highlighting
a number of trends and best practices that can be identified in hindsight. It also enumerates a list of directions for future research in stream processing.

NOTABLE SYSTEMS:

  • TinyDB
  • STREAM
  • TelegraphCQ
  • Gigascope
  • Aurora
  • Borealis
  • IBM InfoSphere Streams
  • System S
  • S4
  • Storm
  • Spark Streaming
  • Flink
  • Kafka/Samza
  • MillWheel
  • Google Cloud Dataflow
  • StreamScope

TRENDS:

  • From DSMSs to Big “Streaming” Data Frameworks
    • Combine the low-latency benefits of streaming systems with the scalability properties of Big Data frameworks.
  • Increased Importance of Exact Results
    • Modern systems provide exactly-once processing semantics; some even offer transactional properties.
  • One-pass Computation vs. Replayability
    • Previously, data was kept in memory only briefly, under the assumption of strict low-latency requirements. Systems now use high-throughput persistent message buses such as Kafka, which enable replayability, ease debugging, and simplify fault-tolerance mechanisms.
  • Unification of Batch and Streaming Models
    • Apache Flink and Apache Beam offer unified APIs over batch and streaming data.
  • Domain-specific to General-purpose
    • Early systems targeted specific domains such as wireless sensor networks or network traffic monitoring; modern systems are general-purpose.
  • Richer Window Specifications
    • Time- or count-based windows
    • Session windows (Google Cloud Dataflow and Kafka Streams) and episodes/frames
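Session windows differ from fixed time- or count-based windows in that their extent is data-driven: a session closes after a period of inactivity rather than at a fixed boundary. A minimal batch sketch of this semantics in Python (the `session_windows` helper and its `gap` parameter are illustrative assumptions, not the API of any of the systems named above):

```python
def session_windows(timestamps, gap):
    """Group event timestamps into session windows: a new session
    starts whenever the silence since the previous event exceeds
    `gap` (the inactivity timeout)."""
    sessions, current = [], []
    for t in sorted(timestamps):
        if current and t - current[-1] > gap:
            sessions.append(current)  # gap exceeded: close the session
            current = [t]
        else:
            current.append(t)
    if current:
        sessions.append(current)
    return sessions

# Events at seconds 1, 2, 3 and then 10, 11 with a 5-second
# inactivity gap form two separate sessions.
print(session_windows([1, 2, 3, 10, 11], gap=5))
# → [[1, 2, 3], [10, 11]]
```

In the systems named above this grouping is expressed declaratively per key; the sketch only illustrates the gap-based grouping rule itself.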

## BEST PRACTICES

  • System-wide State Management
  • Simplified Reasoning via Punctuation
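A punctuation is an in-band marker asserting that no tuples with a timestamp at or below a given value will arrive later, which lets an operator finalize buffered state instead of waiting indefinitely. A minimal sketch of this idea (the tuple encoding and the `process_with_punctuations` helper are illustrative assumptions, not any system's API):

```python
def process_with_punctuations(stream):
    """Consume an iterable of ('data', ts, value) and ('punct', ts)
    items. A punctuation at time ts guarantees that no further data
    items with timestamp <= ts will arrive, so everything buffered up
    to ts can be finalized and emitted as one batch."""
    buffer, emitted = [], []
    for item in stream:
        if item[0] == 'data':
            _, ts, value = item
            buffer.append((ts, value))
        else:
            _, ts = item
            # Finalize all tuples the punctuation covers, in time order.
            emitted.append([v for (t, v) in sorted(buffer) if t <= ts])
            buffer = [(t, v) for (t, v) in buffer if t > ts]
    return emitted

stream = [('data', 1, 'a'), ('data', 2, 'b'), ('punct', 2),
          ('data', 3, 'c'), ('punct', 3)]
print(process_with_punctuations(stream))
# → [['a', 'b'], ['c']]
```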