Exploring big volume sensor data with Vroom

ABSTRACT

State of the art sensors within a single autonomous vehicle (AV) can produce video and LIDAR data at rates greater than 30 GB/hour. Unsurprisingly, even small AV research teams can accumulate tens of terabytes of sensor data from multiple trips and multiple vehicles. AV practitioners would like to extract information about specific locations or specific situations for further study, but are often unable to. Queries over AV sensor data are different from generic analytics or spatial queries because they demand reasoning about fields of view as well as heavy computation to extract features from scenes. In this article and demo we present Vroom, a system for ad-hoc queries over AV sensor databases. Vroom combines domain specific properties of AV datasets with selective indexing and multi-query optimization to address challenges posed by AV sensor data.

## Introduction

  • AVs generate data from high-resolution cameras, lidar, and GPS at roughly 10 MB/s per vehicle.
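
    At that rate a single vehicle produces about 36 GB per hour (10 MB/s × 3600 s), consistent with the >30 GB/hour figure in the abstract.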

    Queries:

  • Q1 Compute basic statistics on recent trips, such as data rates by sensor and location coverage.
  • Q2 [building 3D maps] Retrieve all forward-facing video frames of the corner of Vassar and Main St. in Cambridge, MA, ordered clockwise.
  • Q3 [preparing labeled training data] Retrieve lidar and video readings from all cameras in the vehicle for intervals when any vehicle camera frame shows a bicycle. Group the data by trip, and order it by timestamp within each trip.
  • Q4 [preparing labeled training data] Retrieve all sensor readings in the minute leading up to an interesting event, such as a possible near miss (e.g., where the vehicle’s CAN bus records sudden braking or sharp steering). Group the readings by trip and order them by timestamp within each trip.

Challenges

  • Computational intensity of UDFs: feature extractors such as deep-learning-based classifiers are expensive to run over full scans.
  • Big volumes: ad-hoc queries must run over tens of terabytes of historical data.
  • Many features of interest: too many candidate features (objects, locations, events) to precompute them all exhaustively.
  • Interface and storage issues: queries mix relational metadata, geometric reasoning, and large binary blobs, which no single engine handles well.

Architecture

  • Sophisticated feature precomputation and indexing: selectively precompute and index the features most likely to be queried.
  • Synthesizing cheap predicates: derive inexpensive filters that bound where expensive UDFs must run (see the sketch after this list).
  • Memoizing: cache UDF outputs and aggregates so repeated work is reused across queries.
  • Storage clustering, based on the workload: lay data out on disk to match observed query access patterns.
  • Multi-query optimization: share scans and UDF invocations across concurrent queries.
  • [to read] polystore data model
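
To make the cheap-predicate idea concrete, here is a minimal sketch in the article's pseudo-SQL. The `cheap_bike_filter` UDF is our hypothetical stand-in, not from the paper: a fast, over-inclusive filter that lets the engine skip the expensive classifier on most frames.

```sql
-- Hypothetical sketch: a cheap, high-recall filter guards the expensive UDF,
-- so bike_detection_udf only runs on frames the filter cannot rule out.
SELECT trip_id, TIMESTAMP
FROM raw_data
WHERE sensor_reading.type IN (VideoFrame)
AND cheap_bike_filter(sensor_reading) > 0.5   -- cheap, over-inclusive
AND bike_detection_udf(sensor_reading) > 0.9  -- expensive, precise
```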

System

Query Interface

Q1:

```sql
select trip_id, sensor_id, sensor_type,
  sum(byte_size(sensor_reading)) as data_volume,
  data_volume / trip_duration as data_rate,
  count(*) / trip_duration as frequency
from raw_data
where time.now - trip_start < 6 days
group by trip_id, sensor_id, sensor_type
order by trip_id, data_rate desc
```
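
Note the reuse of the data_volume alias inside the same select list: this is pseudo-SQL shorthand that a standard engine would express with a subquery or by repeating the sum.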

Q2:

```sql
let vassar_and_main = lat_lon_height(42.3628, -71.0915, 7) IN
SELECT sensor_reading
FROM raw_data
WHERE sensor_reading.type IN (VideoFrame)
AND let sensor_pose = pose_estimate(sensor_id, TIMESTAMP) IN
    distance(sensor_pose, vassar_and_main) < 20
AND angle(sensor_pose.x_axis, line(sensor_pose, vassar_and_main)) < 30
ORDER BY angle(line(sensor_pose, std.east), line(sensor_pose, vassar_and_main))
```
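
The distance predicate keeps frames captured within 20 m of the intersection; the first angle predicate keeps frames whose camera axis points within 30° of the intersection, i.e., the corner is in the field of view; and the ORDER BY sorts the surviving frames by their bearing around the corner, giving the clockwise ordering Q2 asks for.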

Q3:

```sql
let bike_segments =
  SELECT trip_id, TIMESTAMP - 5 AS t_start, TIMESTAMP + 5 AS t_end
  FROM raw_data
  WHERE sensor_reading.type IN (VideoFrame)
  AND bike_detection_udf(sensor_reading) > 0.9
IN
SELECT DISTINCT TIMESTAMP, trip_id, sensor_reading
FROM raw_data, bike_segments
WHERE sensor_reading.type IN (PointCloud, VideoFrame)
AND raw_data.trip_id = bike_segments.trip_id
AND raw_data.TIMESTAMP BETWEEN t_start AND t_end
ORDER BY trip_id, TIMESTAMP
```
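
The inner query computes 10-second windows (±5 s) around high-confidence bicycle detections; the outer query joins those windows back to raw_data to pull point-cloud and video readings from every sensor inside each window, ordered by trip and time.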

Storage engines

Our demonstration prototype combines a relational engine for metadata storage with file-system-based blob management for the raw sensor payloads.
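
A minimal sketch of what that split might look like; the table and column names below are our assumptions for illustration, not Vroom's actual schema:

```sql
-- Hypothetical metadata table: small, queryable columns live in the
-- relational engine, while bulky sensor payloads stay on the file system
-- and are reached through blob_path.
CREATE TABLE raw_data (
  trip_id     BIGINT,
  sensor_id   INT,
  sensor_type TEXT,    -- e.g. 'VideoFrame', 'PointCloud'
  TIMESTAMP   DOUBLE,  -- seconds within the trip
  byte_size   BIGINT,
  blob_path   TEXT     -- file-system location of the raw reading
);
```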

Query processor

  • For query Q1, the per-trip aggregation triggers a check for existing memoized per-trip computations before any raw data is scanned (see the sketch after this list).
  • Query Q2 explicitly uses geometric builtins that refer to sensor points of view; the geometric predicates involved bound which parts of each trip trajectory can be skipped entirely and which must be examined. We leverage existing trajectory-indexing work for storage.
  • For query Q3, the expensive bike_detection_udf dominates, so the synthesized cheap predicates and memoized UDF outputs described under Architecture apply: the classifier only needs to run on frames not already scored.
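
As a concrete illustration of the Q1 path, memoized per-trip aggregates could live in a small side table that later queries consult before touching raw_data; the table name and columns here are our assumptions, not Vroom's:

```sql
-- Hypothetical memo table: filled the first time a per-trip aggregate is
-- computed, then consulted by later queries like Q1 instead of re-scanning.
CREATE TABLE memo_trip_sensor_stats (
  trip_id     BIGINT,
  sensor_id   INT,
  data_volume BIGINT,  -- sum of byte_size over the trip
  data_rate   DOUBLE,  -- data_volume / trip_duration
  frequency   DOUBLE   -- readings per second
);
```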
