ModelHub: Lifecycle Management for Deep Learning

Abstract

Deep learning has improved state-of-the-art results in many important fields and has been the subject of much research in recent years, leading to the development of several systems for facilitating deep learning. Current systems, however, mainly focus on the model building and training phases, while the issues of data management, model sharing, and lifecycle management are largely ignored. The deep learning modeling lifecycle produces a rich set of artifacts, such as learned parameters and training logs, and involves frequently conducted tasks, e.g., understanding model behavior and trying out new models. Dealing with such artifacts and tasks is cumbersome and left to the users. To address these issues in a comprehensive manner, we propose ModelHub, which includes a novel model versioning system (dlv); a domain-specific language for searching through model space (DQL); and a hosted service (ModelHub) to store developed models, explore existing models, enumerate new models, and share models with others.

Introduction

[Figure: the deep learning modeling lifecycle]

The modeling lifecycle shown above illustrates the following challenges:

  • It is difficult to keep track of the many models developed and/or understand the differences amongst them.
  • The development lifecycle itself has time-consuming repetitive sub-steps, such as adding a layer at different places to adjust a model, searching through a set of hyper-parameters for the different variations, reusing learned weights to train models, etc.
  • The storage footprint of deep learning models tends to be very large.
  • Sharing and reusing models is not easy.

To address these challenges, ModelHub provides:

  • a model versioning system (DLV) to store and query the models and their versions.
  • a model enumeration and hyper-parameter tuning domain specific language (DQL)
  • a hosted deep learning model sharing system (ModelHub) to publish, discover and reuse models from others.

System Architecture

[Figure: ModelHub system architecture]

Data Model

ModelHub operates on two levels of data models: the conceptual DNN model, and the data model for the model versions stored in the DLV repository.

  • DNN Model: a directed acyclic graph (DAG) of layers together with their learned weights and biases.
  • VCS Data Model: a model version consists of a network definition, weights, extracted metadata, and the files used together with the model instance.
    • In the implementation, model versions can be viewed as a relation M(name, id, N, W, M, F), where N is the network definition, W the weights, M the extracted metadata, and F the associated files.
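
A minimal sketch of this versioned data model in Python; the class and field names are hypothetical, and only the relation schema M(name, id, N, W, M, F) and its components come from the paper:

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Layer:
    """A node in the DNN DAG; `inputs` names the parent layers."""
    name: str                       # e.g. "conv1", "fc7"
    kind: str                       # e.g. "CONV", "POOL", "RELU"
    inputs: List[str] = field(default_factory=list)

@dataclass
class ModelVersion:
    """One tuple of the relation M(name, id, N, W, M, F)."""
    name: str                                                 # model name, e.g. "alexnet-origin"
    id: str                                                   # version identifier in the DLV repository
    network: List[Layer] = field(default_factory=list)        # N: network definition (the DAG)
    weights: Dict[str, bytes] = field(default_factory=dict)   # W: learned parameters, keyed by layer name
    metadata: Dict[str, str] = field(default_factory=dict)    # M: extracted metadata (accuracy, training logs, ...)
    files: List[str] = field(default_factory=list)            # F: files used together with the model instance
```

Representing versions as tuples of a relation is what lets DQL refer to attributes such as m1.name or per-layer selectors like m1["conv1"] in the queries below.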

Query Facilities

Model Exploration Queries

  • Users issue these queries to understand a particular model, trace the lineage of models, and compare several models.

Model Enumeration Queries

  • These queries explore variations of the models currently available in a repository (illustrated by Queries 1–4 below):
    • Select models to improve
    • Slice particular models
    • Construct new models
    • Try the new models with different hyper-parameters

Query 1: DQL select query to pick models.

    select m1
    where
      m1.name like "alexnet_%" and
      m1.creation_time > "2015-11-22" and
      m1["conv[1,3,5]"].next has POOL("MAX")

Query 2: DQL slice query to get a sub-network.

    slice m2 from m1
    where m1.name like "alexnet-origin%"
    mutate m2.input = m1["conv1"] and
      m2.output = m1["fc7"]

Query 3: DQL construct query to derive more models from existing ones.

    construct m2 from m1
    where
      m1.name like "alexnet-avgv1%" and
      m1["conv*($1)"].next has POOL("AVG")
    mutate m1["conv*($1)"].insert = RELU("relu$1")

Query 4: DQL evaluate query to enumerate models with different network definitions, search hyper-parameters, and eliminate models.

    evaluate m
    from "query3"
    with config = "path to config"
    vary config.base_lr in [0.1, 0.01, 0.001] and
      config.net["conv*"].lr auto and
      config.input_data in ["path1", "path2"]
    keep top(5, m["loss"], 100)
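
Conceptually, the vary/keep clauses in Query 4 describe a grid search over the listed hyper-parameter values followed by retaining the best few models according to a metric. The Python sketch below illustrates only that semantics; the function names and config representation are hypothetical and are not ModelHub's API (the "auto" per-layer learning-rate search and the third argument of top() are not modeled):

```python
import heapq
import itertools

def grid_search(base_configs, grid, train_and_score, k=5):
    """Try every combination in `grid` on top of each base config and
    keep the k models with the lowest loss, mirroring vary ... keep top(k, ...)."""
    scored = []
    for base in base_configs:                      # e.g. the models produced by Query 3
        keys, value_lists = zip(*grid.items())
        for combo in itertools.product(*value_lists):
            config = {**base, **dict(zip(keys, combo))}
            loss = train_and_score(config)         # hypothetical: trains the model, returns its loss
            scored.append((loss, config))
    return heapq.nsmallest(k, scored, key=lambda pair: pair[0])

# A grid mirroring the explicit choices in Query 4 (the paths are placeholders):
grid = {
    "base_lr": [0.1, 0.01, 0.001],
    "input_data": ["path1", "path2"],
}
```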

Database Meets Deep Learning: Challenges and Opportunities

Abstract

Deep learning has recently become very popular on account of its incredible success in many complex data-driven applications, including image classification and speech recognition. The database community has worked on data-driven applications for many years, and therefore should be playing a lead role in supporting this new wave. However, databases and deep learning are different in terms of both techniques and applications. In this paper, we discuss research problems at the intersection of the two fields. In particular, we discuss possible improvements for deep learning systems from a database perspective, and analyze database applications that may benefit from deep learning techniques.

Revisiting the Design of Data Stream Processing Systems on Multi-Core Processors

Building Connected Car Applications on Top of the World-Wide Streams Platform

Abstract

The connected car is likely to play a fundamental role in the foreseeable Internet of Things. The connectivity aspect, in combination with the available data (e.g. from GPS, on-board diagnostics, road sensors) and video (e.g. from dashcams and traffic cameras) streams, enables a range of new applications, e.g., accident avoidance, online route planning, energy optimization, etc.

These applications, however, come with an additional set of requirements which are not accommodated by the state-of-the-art stream processing platforms. We have built World-Wide Streams (WWS), a novel stream processing platform that has been explicitly designed with those requirements in mind. In this demo presentation, we will show a number of connected car scenarios that we have built on top of WWS.

Tutorial: Reflections on Almost Two Decades of Research into Stream Processing

This tutorial reflects on this research history by highlighting a number of trends and best practices that can be identified in hindsight. It also enumerates a list of directions for future research in stream processing.

Exploring big volume sensor data with Vroom

Abstract

State of the art sensors within a single autonomous vehicle (AV) can produce video and LIDAR data at rates greater than 30 GB/hour. Unsurprisingly, even small AV research teams can accumulate tens of terabytes of sensor data from multiple trips and multiple vehicles. AV practitioners would like to extract information about specific locations or specific situations for further study, but are often unable to. Queries over AV sensor data are different from generic analytics or spatial queries because they demand reasoning about fields of view as well as heavy computation to extract features from scenes. In this article and demo we present Vroom, a system for ad-hoc queries over AV sensor databases. Vroom combines domain-specific properties of AV datasets with selective indexing and multi-query optimization to address challenges posed by AV sensor data.
