Database Meets Deep Learning: Challenges and Opportunities

Abstract

Deep learning has recently become very popular on account of its incredible success in many complex data-driven applications, including image classification and speech recognition. The database community has worked on data-driven applications for many years, and therefore should be playing a lead role in supporting this new wave. However, databases and deep learning are different in terms of both techniques and applications. In this paper, we discuss research problems at the intersection of the two fields. In particular, we discuss possible improvements for deep learning systems from a database perspective, and analyze database applications that may benefit from deep learning techniques.

Introduction

  • In recent years, we have witnessed the success of numerous data-driven, machine-learning-based applications.
  • What insights can the database community offer to deep learning, and vice versa?
    • Apply database techniques to optimize deep learning systems.
    • Apply deep learning techniques to database problems. Problems such as knowledge fusion and crowdsourcing are probabilistic, so deep learning techniques may be applicable in these areas.

DATABASES TO DEEP LEARNING

  • Stand-alone Training
    • Use NVIDIA GPUs with the cuDNN library.
    • Operation scheduling: build a cost model for the operation placement strategy, similar to how a database estimates the cost of query plans.
    • Memory management: apply database paging and caching techniques to GPU memory management.
  • Distributed Training

The parameter server architecture is typically used, in which the workers compute parameter gradients and the servers update the parameter values after receiving gradients from workers. There are two basic parallelism schemes for distributed training, namely, data parallelism and model parallelism. In data parallelism, each worker is assigned a data partition and a model replica, while for model parallelism, each worker is assigned a partition of the model and the whole dataset.
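The parameter server scheme with data parallelism can be illustrated with a minimal single-process sketch. Everything here is an assumption made for illustration and is not taken from the paper: the class names `ParameterServer` and `Worker`, the `train` helper, the tiny linear model, and the choice of a synchronous update that averages the workers' gradients. A real system would run workers on separate devices or machines and exchange gradients over a network.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y) pair


class ParameterServer:
    """Hypothetical server: stores parameters and applies worker gradients."""

    def __init__(self, weights: List[float], lr: float) -> None:
        self.weights = list(weights)
        self.lr = lr

    def pull(self) -> List[float]:
        # Workers fetch a copy of the current parameter values.
        return list(self.weights)

    def push(self, grads: List[List[float]]) -> None:
        # Assumed synchronous scheme: average all worker gradients, then step.
        n = len(grads)
        for i in range(len(self.weights)):
            avg = sum(g[i] for g in grads) / n
            self.weights[i] -= self.lr * avg


class Worker:
    """Hypothetical worker: owns one data partition and a model replica."""

    def __init__(self, partition: List[Sample]) -> None:
        self.partition = partition

    def compute_gradient(self, weights: List[float]) -> List[float]:
        # Model replica: y_hat = w * x + b, with mean squared error loss.
        w, b = weights
        grad_w = grad_b = 0.0
        for x, y in self.partition:
            err = w * x + b - y
            grad_w += 2.0 * err * x
            grad_b += 2.0 * err
        m = len(self.partition)
        return [grad_w / m, grad_b / m]


def train(data: List[Sample], num_workers: int = 2,
          steps: int = 500, lr: float = 0.05) -> List[float]:
    # Data parallelism: split the dataset, one partition per worker.
    partitions = [data[i::num_workers] for i in range(num_workers)]
    workers = [Worker(p) for p in partitions]
    server = ParameterServer([0.0, 0.0], lr)
    for _ in range(steps):
        weights = server.pull()
        grads = [wk.compute_gradient(weights) for wk in workers]
        server.push(grads)
    return server.pull()


if __name__ == "__main__":
    # Toy data generated from y = 2x + 1.
    samples = [(x, 2.0 * x + 1.0) for x in (0.0, 1.0, 2.0, 3.0)]
    print(train(samples))
```

Under model parallelism the roles would be reversed: each worker would hold only part of the parameters (for example, a subset of layers) while seeing the whole dataset, which this sketch does not attempt to show.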

  • Communication and Synchronization
  • Concurrency and Consistency
  • Fault Tolerance

DEEP LEARNING TO DATABASES

[todo]

Conclusion

Databases have many techniques for optimizing system performance, while deep learning is good at learning effective representation for data-driven applications. We note that these two “different” areas share some common techniques for improving the system performance, such as memory optimization and parallelism. We have discussed some possible improvements for deep learning systems using database techniques, and research problems applying deep learning techniques in database applications.
