ModelHub: Lifecycle Management for Deep Learning

Abstract

Deep learning has improved state-of-the-art results in many important fields and has been the subject of much research in recent years, leading to the development of several systems for facilitating deep learning. Current systems, however, mainly focus on the model building and training phases, while the issues of data management, model sharing, and lifecycle management are largely ignored. The deep learning modeling lifecycle produces a rich set of artifacts, such as learned parameters and training logs, and involves frequently conducted tasks, e.g., understanding model behavior and trying out new models. Dealing with such artifacts and tasks is cumbersome and left to the users. To address these issues in a comprehensive manner, we propose ModelHub, which includes a novel model versioning system (dlv); a domain-specific language for searching through model space (DQL); and a hosted service (ModelHub) to store developed models, explore existing models, enumerate new models, and share models with others.

Introduction

[Figure: the deep learning modeling lifecycle]

The modeling lifecycle shown above illustrates the following challenges:

  • It is difficult to keep track of the many models developed and/or understand the differences amongst them.
  • The development lifecycle itself has time-consuming repetitive sub-steps, such as adding a layer at different places to adjust a model, searching through a set of hyper-parameters for the different variations, reusing learned weights to train models, etc.
  • The storage footprint of deep learning models tends to be very large.
  • Sharing and reusing models is not easy.

To address these challenges, ModelHub provides:

  • a model versioning system (DLV) to store and query the models and their versions.
  • a model enumeration and hyper-parameter tuning domain specific language (DQL)
  • a hosted deep learning model sharing system (ModelHub) to publish, discover and reuse models from others.

System Architecture

[Figure: ModelHub system architecture]

Data Model

ModelHub operates on two levels of data models: the conceptual DNN model, and the data model for the model versions stored in the DLV repository.

  • DNN Model: a directed acyclic graph (DAG) of layers together with their learned weights and biases.
  • VCS Data Model: a model version consists of a network definition, weights, extracted metadata, and the files used together with the model instance.
    • In the implementation, model versions can be viewed as a relation M(name, id, N, W, M, F), where N is the network definition, W the weights, M the extracted metadata, and F the associated files.
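
A minimal sketch of this versioned data model in Python; the class and field names are hypothetical, and only the relation schema M(name, id, N, W, M, F) and its components come from the paper:

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Layer:
    """A node in the DNN DAG; `inputs` names the parent layers."""
    name: str                       # e.g. "conv1", "fc7"
    kind: str                       # e.g. "CONV", "POOL", "RELU"
    inputs: List[str] = field(default_factory=list)

@dataclass
class ModelVersion:
    """One tuple of the relation M(name, id, N, W, M, F)."""
    name: str                                                 # model name, e.g. "alexnet-origin"
    id: str                                                   # version identifier in the DLV repository
    network: List[Layer] = field(default_factory=list)        # N: network definition (the DAG)
    weights: Dict[str, bytes] = field(default_factory=dict)   # W: learned parameters, keyed by layer name
    metadata: Dict[str, str] = field(default_factory=dict)    # M: extracted metadata (accuracy, training logs, ...)
    files: List[str] = field(default_factory=list)            # F: files used together with the model instance
```

Representing versions as tuples of a relation is what lets DQL refer to attributes such as m1.name or per-layer selectors like m1["conv1"] in the queries below.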

Query Facilities

Model Exploration Queries

  • Users issue these queries to understand a particular model, trace the lineage of models, and compare several models.

Model Enumeration Queries

  • These queries explore variations of the models currently available in a repository (illustrated by Queries 1–4 below):
    • Select models to improve
    • Slice particular models
    • Construct new models
    • Try the new models with different hyper-parameters

Query 1: DQL select query to pick models.

    select m1
    where
      m1.name like "alexnet_%" and
      m1.creation_time > "2015-11-22" and
      m1["conv[1,3,5]"].next has POOL("MAX")

Query 2: DQL slice query to get a sub-network.

    slice m2 from m1
    where m1.name like "alexnet-origin%"
    mutate m2.input = m1["conv1"] and
      m2.output = m1["fc7"]

Query 3: DQL construct query to derive more models from existing ones.

    construct m2 from m1
    where
      m1.name like "alexnet-avgv1%" and
      m1["conv*($1)"].next has POOL("AVG")
    mutate m1["conv*($1)"].insert = RELU("relu$1")

Query 4: DQL evaluate query to enumerate models with different network definitions, search hyper-parameters, and eliminate models.

    evaluate m
    from "query3"
    with config = "path to config"
    vary config.base_lr in [0.1, 0.01, 0.001] and
      config.net["conv*"].lr auto and
      config.input_data in ["path1", "path2"]
    keep top(5, m["loss"], 100)
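
Conceptually, the vary/keep clauses in Query 4 describe a grid search over the listed hyper-parameter values followed by retaining the best few models according to a metric. The Python sketch below illustrates only that semantics; the function names and config representation are hypothetical and are not ModelHub's API (the "auto" per-layer learning-rate search and the third argument of top() are not modeled):

```python
import heapq
import itertools

def grid_search(base_configs, grid, train_and_score, k=5):
    """Try every combination in `grid` on top of each base config and
    keep the k models with the lowest loss, mirroring vary ... keep top(k, ...)."""
    scored = []
    for base in base_configs:                      # e.g. the models produced by Query 3
        keys, value_lists = zip(*grid.items())
        for combo in itertools.product(*value_lists):
            config = {**base, **dict(zip(keys, combo))}
            loss = train_and_score(config)         # hypothetical: trains the model, returns its loss
            scored.append((loss, config))
    return heapq.nsmallest(k, scored, key=lambda pair: pair[0])

# A grid mirroring the explicit choices in Query 4 (the paths are placeholders):
grid = {
    "base_lr": [0.1, 0.01, 0.001],
    "input_data": ["path1", "path2"],
}
```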

Database Meets Deep Learning: Challenges and Opportunities

Abstract

Deep learning has recently become very popular on account of its incredible success in many complex data-driven applications, including image classification and speech recognition. The database community has worked on data-driven applications for many years, and therefore should be playing a lead role in supporting this new wave. However, databases and deep learning are different in terms of both techniques and applications. In this paper, we discuss research problems at the intersection of the two fields. In particular, we discuss possible improvements for deep learning systems from a database perspective, and analyze database applications that may benefit from deep learning techniques.

Revisiting the Design of Data Stream Processing Systems on Multi-Core Processors

Building Connected Car Applications on Top of the World-Wide Streams Platform

Abstract

The connected car is likely to play a fundamental role in the foreseeable Internet of Things. The connectivity aspect, in combination with the available data (e.g. from GPS, on-board diagnostics, road sensors) and video (e.g. from dashcams and traffic cameras) streams, enables a range of new applications, e.g., accident avoidance, online route planning, energy optimization, etc.

These applications, however, come with an additional set of requirements which are not accommodated by the state-of-the-art stream processing platforms. We have built World-Wide Streams (WWS), a novel stream processing platform that has been explicitly designed with those requirements in mind. In this demo presentation, we will show a number of connected car scenarios that we have built on top of WWS.

Tutorial: Reflections on Almost Two Decades of Research into Stream Processing

This tutorial reflects on this research history by highlighting a number of trends and best practices that can be identified in hindsight. It also enumerates a list of directions for future research in stream processing.

Exploring big volume sensor data with Vroom

Abstract

State of the art sensors within a single autonomous vehicle (AV) can produce video and LIDAR data at rates greater than 30 GB/hour. Unsurprisingly, even small AV research teams can accumulate tens of terabytes of sensor data from multiple trips and multiple vehicles. AV practitioners would like to extract information about specific locations or specific situations for further study, but are often unable to. Queries over AV sensor data are different from generic analytics or spatial queries because they demand reasoning about fields of view as well as heavy computation to extract features from scenes. In this article and demo we present Vroom, a system for ad-hoc queries over AV sensor databases. Vroom combines domain-specific properties of AV datasets with selective indexing and multi-query optimization to address challenges posed by AV sensor data.
