Exploring big volume sensor data with Vroom
Comments:
ABSTRACT
State-of-the-art sensors within a single autonomous vehicle (AV) can produce video and LIDAR data at rates greater than 30 GB/hour. Unsurprisingly, even small AV research teams can accumulate tens of terabytes of sensor data from multiple trips and multiple vehicles. AV practitioners would like to extract information about specific locations or specific situations for further study, but are often unable to. Queries over AV sensor data are different from generic analytics or spatial queries because they demand reasoning about fields of view as well as heavy computation to extract features from scenes. In this article and demo we present Vroom, a system for ad-hoc queries over AV sensor databases. Vroom combines domain-specific properties of AV datasets with selective indexing and multi-query optimization to address challenges posed by AV sensor data.
## Introduction
AVs generate data from high-resolution cameras, lidar, and GPS at about 10 MBps.
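As a quick sanity check on these figures, the conversion below relates the two rates quoted so far. It is a minimal sketch assuming decimal units (1 GB = 1000 MB); the function names are illustrative and do not come from the paper. Under that assumption, 30 GB/hour is roughly 8.3 MB/s, and 10 MB/s corresponds to 36 GB/hour, so the two numbers are consistent with each other.

```python
def gb_per_hour_to_mb_per_s(gb_per_hour: float) -> float:
    """Convert GB/hour to MB/s, assuming decimal units (1 GB = 1000 MB)."""
    return gb_per_hour * 1000.0 / 3600.0


def mb_per_s_to_gb_per_hour(mb_per_s: float) -> float:
    """Convert MB/s to GB/hour, assuming decimal units (1 GB = 1000 MB)."""
    return mb_per_s * 3600.0 / 1000.0


def hours_to_accumulate(total_tb: float, gb_per_hour: float) -> float:
    """Hours of driving needed to collect total_tb terabytes (1 TB = 1000 GB)."""
    return total_tb * 1000.0 / gb_per_hour


if __name__ == "__main__":
    print(f"30 GB/hour = {gb_per_hour_to_mb_per_s(30.0):.2f} MB/s")
    print(f"10 MB/s    = {mb_per_s_to_gb_per_hour(10.0):.1f} GB/hour")
    print(f"10 TB at 30 GB/hour takes {hours_to_accumulate(10.0, 30.0):.0f} hours")
```

At the abstract's rate, a few hundred hours of driving is enough to reach the tens-of-terabytes range.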
Queries:
- Q1 Compute basic statistics on recent trips, such as data rates by sensor and location coverage.
- Q2 [building 3D maps] Retrieve all forward-facing video frames of the corner of Vassar and Main St. in Cambridge, MA, ordered clockwise.
- Q3 [preparing labeled training data] Retrieve lidar and video readings for all cameras in the vehicle, for intervals when any vehicle camera frame shows a bicycle. Group the data by trip, and order it by timestamp within each trip.
- Q4 [preparing labeled training data] Retrieve all sensor readings in the minute leading up to an interesting event, such as a possible near miss (e.g., where a vehicle’s CAN bus records a sudden brake or sharp steer). Group the readings by trip and order them by timestamp within each trip.
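To make the shape of a query like Q4 concrete, here is a small in-memory sketch in Python. It is not Vroom's query interface: the `Reading` record, the `readings_before_events` function, and the representation of events as `(trip_id, timestamp)` pairs are all assumptions made for illustration. It selects readings that fall in the 60 seconds leading up to any event on the same trip, groups them by trip, and orders them by timestamp within each trip.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """Hypothetical sensor reading record (not Vroom's actual schema)."""
    trip_id: str
    timestamp: float  # seconds since the start of the trip
    sensor: str


def readings_before_events(readings, events, window_s=60.0):
    """Return {trip_id: [Reading, ...]} for readings within window_s seconds
    before (and up to) any event on the same trip.

    events is an iterable of (trip_id, event_timestamp) pairs.
    """
    events_by_trip = defaultdict(list)
    for trip_id, event_ts in events:
        events_by_trip[trip_id].append(event_ts)

    grouped = defaultdict(list)
    for r in readings:
        trip_events = events_by_trip.get(r.trip_id, ())
        if any(ts - window_s <= r.timestamp <= ts for ts in trip_events):
            grouped[r.trip_id].append(r)

    # Group by trip (sorted by trip id), order by timestamp within each trip.
    return {
        trip_id: sorted(rs, key=lambda r: r.timestamp)
        for trip_id, rs in sorted(grouped.items())
    }


if __name__ == "__main__":
    readings = [
        Reading("trip1", 100.0, "can"),
        Reading("trip1", 50.0, "camera_front"),
        Reading("trip1", 0.0, "lidar"),
        Reading("trip2", 95.0, "camera_front"),
    ]
    # A hypothetical sudden-brake event on trip1 at t = 100 s.
    print(readings_before_events(readings, [("trip1", 100.0)]))
```

This linear scan touches every reading, which is exactly what becomes impractical at the data volumes described above.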
Challenges
- Computational intensity of UDFs, such as deep-learning-based classification
- Big volumes: ad-hoc queries over large historical data
- Many features of interest:
- Interface and storage issues:
Architecture
- Sophisticated feature precomputation and indexing:
- Synthesizing cheap predicates:
- Memoizing:
- Storage clustering, based on the workload
- Multi-query optimization:
- [to read] polystore data model
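The memoizing item in the list above can be illustrated with a small sketch. This is one plausible reading of the idea, not the paper's implementation: an expensive UDF (for example a classifier) is wrapped so that its result is cached per frame identifier and reused by later queries. The `MemoizedUDF` class, the `contains_bicycle` stand-in, and the dictionary-based frame format are all hypothetical.

```python
class MemoizedUDF:
    """Cache the results of an expensive per-frame UDF, keyed by frame id."""

    def __init__(self, udf):
        self._udf = udf
        self._cache = {}
        self.calls = 0  # number of times the underlying UDF actually ran

    def __call__(self, frame_id, frame):
        if frame_id not in self._cache:
            self.calls += 1
            self._cache[frame_id] = self._udf(frame)
        return self._cache[frame_id]


def contains_bicycle(frame):
    """Stand-in for an expensive classifier; here it only inspects labels."""
    return "bicycle" in frame["labels"]


if __name__ == "__main__":
    frames = {
        "f1": {"labels": ["car", "bicycle"]},
        "f2": {"labels": ["pedestrian"]},
    }
    udf = MemoizedUDF(contains_bicycle)
    first = [udf(fid, frame) for fid, frame in frames.items()]
    second = [udf(fid, frame) for fid, frame in frames.items()]
    print(first, second, udf.calls)
```

A second query over the same frames is answered from the cache, so the cost of the UDF is paid once per frame rather than once per query.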
System
Query Interface
Q1:
Q2:
Q3:
Storage engines
Our demonstration prototype is implemented by combining a relational engine for metadata storage with file-system-based blob management.
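The split between relational metadata and file-system blobs might look roughly like the following sketch. It uses SQLite from the Python standard library purely as a stand-in relational engine; the notes do not say which engine the prototype uses, and the table layout, file naming, and function names here are assumptions for illustration only.

```python
import sqlite3
from pathlib import Path


def open_store(root):
    """Create the blob directory and metadata table under root (if missing)."""
    root = Path(root)
    (root / "blobs").mkdir(parents=True, exist_ok=True)
    db = sqlite3.connect(str(root / "meta.db"))
    db.execute(
        "CREATE TABLE IF NOT EXISTS readings ("
        " trip_id TEXT, ts REAL, sensor TEXT, blob_path TEXT,"
        " PRIMARY KEY (trip_id, sensor, ts))"
    )
    return db


def put_reading(db, root, trip_id, ts, sensor, payload):
    """Write the raw payload to a file and record its metadata in the table."""
    rel = Path("blobs") / f"{trip_id}_{sensor}_{ts:.6f}.bin"
    (Path(root) / rel).write_bytes(payload)
    db.execute(
        "INSERT OR REPLACE INTO readings VALUES (?, ?, ?, ?)",
        (trip_id, ts, sensor, str(rel)),
    )
    db.commit()


def get_readings(db, root, trip_id, sensor):
    """Return [(ts, payload_bytes), ...] for one trip and sensor, ordered by ts."""
    rows = db.execute(
        "SELECT ts, blob_path FROM readings"
        " WHERE trip_id = ? AND sensor = ? ORDER BY ts",
        (trip_id, sensor),
    ).fetchall()
    return [(ts, (Path(root) / path).read_bytes()) for ts, path in rows]
```

With this layout, metadata-only queries never open a blob file, and the large payloads are read only for rows that survive the relational filter.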
Query processor
- For Query Q1, a per-trip aggregate triggers checks for existing per-trip memoized computations.
- Query Q2 explicitly uses geometric builtins that refer to sensor points of view. The geometric predicates involved bound which parts of trip trajectories to look at and which to skip completely. We leverage existing trajectory indexing work for storage.
For Query Q3: