SOLE: Scalable On-Line Execution of Continuous Queries on Spatio-temporal Data Streams
Abstract
This paper presents the Scalable On-Line Execution algorithm (SOLE, for short) for continuous and on-line evaluation of concurrent continuous spatio- temporal queries over data streams. Incoming spatio- temporal data streams are processed in-memory against a set of outstanding continuous queries. The SOLE algo- rithm utilizes the scarce memory resource eciently by keeping track of only the signicant objects. In-memory stored objects are expired (i.e., dropped) from memory once they become insignicant. SOLE is a scalable algo- rithm where all the continuous outstanding queries share the same buer pool. In addition, SOLE is presented as a spatio-temporal join between two input streams, a stream of spatio-temporal objects and a stream of spatio-temporal queries. To cope with intervals of high arrival rates of objects and/or queries, SOLE utilizes a load-shedding approach where some of the stored objects are dropped from memory. SOLE is implemented as a pipelined query operator that can be combined with tra- ditional query operators in a query execution plan to sup- port a wide variety of continuous queries. Performance experiments based on a real implementation of SOLE in- side a prototype of a data stream management system show the scalability and eciency of SOLE in highly dynamic environments.
Introduction
- The high arrival rates of spatio-temporal data streams along with its massive data sizes
- Exiting techniques for spatio- temporal databases rely mainly on the basic assumption that all incoming spatio-temporal data can be stored on disk. Thus, continuous query processing techniques aim to utilize the disk storage to produce in- cremental results of continuous queries.
- combine traditional spatio-temporal query processors and data stream query processors.
- process uncertainty areas using aconservative caching technique.
- load shedding schemes to support larger numbers of continuous queries
Related work
- Spatio-temporal Databases
- Existing algorithm materializing incoming spatio-temporal data in disk-based index structures
Memory-based data structures have been proposed in to deal with reasonable size of data that can fit in memory, but it is not scalable to large data sizes or streaming environments. (to add)
Data Stream Management Systems
- the spatial and temporal properties of data streams and/or continuous queries are overlooked by existing stream management prototypes
- Scalable execution of continuous queries in tradi- tional data streams aims to either detect common subex- pressions or share resources at the operator level. The main idea is to either add a special operator to the query plan to regu- late the load by discarding unimportant incoming tuples or dynamically adjust the window size and time granu- larity at runtime.
Basic Concepts in SOLE
Input Output
- Input Model. The inputs to SOLE are two streams: 1. A spaial temporal stream 2. A stream of continuous queries
- Output Model. avoid continuous reevaluation of continuous spatio-
temporal queries, updates the query re- sult by computing and sending only updates of the pre- viously reported answer. Positive updates, negative update
Supporting Various Query Types
- Moving Queries.
- kNN Queries, (Use variable circle region size)
Pipelined Operator
- physical pipelined operator that can interact with traditional query operators in a large pipelined query plan.