Scientific Journal

Applied Aspects of Information Technology

DP: A LIGHTWEIGHT LIBRARY FOR TEACHING DIFFERENTIABLE PROGRAMMING
Abstract:

Deep Learning has recently gained a lot of interest, as many practical applications nowadays rely on it. Typically, these applications are implemented with the help of special deep learning libraries, whose inner implementations are hard to understand. We developed such a library in a lightweight way with a focus on teaching. Our library DP (differentiable programming) has the following properties, which fit the particular requirements of education: a small code base, simple concepts, and a stable Application Programming Interface (API). Its core use case is to teach how deep learning libraries work in principle. The library is divided into two layers. The low-level part allows a computational graph to be built programmatically from elementary operations. In machine learning, the computational graph typically represents the cost function, including a machine learning model such as a neural network. Built-in reverse-mode automatic differentiation on the computational graph allows machine learning models to be trained. This is done by optimization algorithms such as stochastic gradient descent, which use the derivatives to minimize the cost by adapting the parameters of the model. In the case of neural networks, the parameters are the neuron weights. The higher-level part of the library eases the implementation of neural networks by providing larger building blocks, such as neuron layers, and helper functions, e.g., implementations of the optimization algorithms (optimizers) for training neural networks. Accompanying the library, we provide exercises for learning the underlying principles of deep learning libraries and the fundamentals of neural networks. An additional benefit is that, thanks to the stable API, the exercises and the corresponding programming assignments based on the library do not need to be refactored continually.
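
To make the description above concrete, the following short Python sketch illustrates the two ideas in miniature: a scalar computational-graph node with reverse-mode automatic differentiation, and a gradient-descent loop that uses the resulting derivative to fit a single parameter. It is only a minimal sketch of the principle, not the DP library's actual API; the names Node, backward and grad, as well as the toy cost function, are illustrative assumptions.

    # Minimal sketch (NOT the DP API): each Node records its inputs and the
    # local derivatives of the elementary operation that produced it, so the
    # chain rule can be applied backwards through the graph.
    class Node:
        def __init__(self, value, parents=(), local_grads=()):
            self.value = value                # result of the forward pass
            self.parents = parents            # nodes this node was computed from
            self.local_grads = local_grads    # d(self)/d(parent) for each parent
            self.grad = 0.0                   # accumulated d(cost)/d(self)

        def __add__(self, other):
            other = other if isinstance(other, Node) else Node(other)
            return Node(self.value + other.value, (self, other), (1.0, 1.0))

        def __mul__(self, other):
            other = other if isinstance(other, Node) else Node(other)
            return Node(self.value * other.value, (self, other),
                        (other.value, self.value))

        def backward(self):
            # Reverse-mode automatic differentiation: order the graph
            # topologically, then propagate gradients from the cost to the leaves.
            order, visited = [], set()

            def visit(node):
                if id(node) not in visited:
                    visited.add(id(node))
                    for parent in node.parents:
                        visit(parent)
                    order.append(node)

            visit(self)
            self.grad = 1.0
            for node in reversed(order):
                for parent, local in zip(node.parents, node.local_grads):
                    parent.grad += node.grad * local

    # Toy training loop: minimize cost(w) = (w*x - y)^2 for one data point,
    # where the parameter w plays the role of a single neuron weight.
    x, y = 2.0, 8.0
    w_value = 0.0
    learning_rate = 0.05
    for step in range(100):
        w = Node(w_value)               # leaf node holding the parameter
        residual = w * x + (-y)         # w*x - y (only + and * are defined here)
        cost = residual * residual      # squared-error cost
        cost.backward()                 # fills w.grad with d(cost)/d(w)
        w_value -= learning_rate * w.grad   # gradient-descent update
    print(round(w_value, 3))            # approaches 4.0, the minimizer of the cost

In this sketch a single backward pass yields the derivative of the one cost value with respect to every leaf of the graph, which is why reverse mode (rather than forward mode) is the method of choice for training models with many parameters; the layers and optimizers of the higher-level part mentioned in the abstract package these steps for whole groups of weights.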

DOI
10.15276/aait.04.2019.3
References

1.    Bergstra, J., Breuleux, O., Bastien, F., Lamblin, P., Pascanu, R., Desjardins, G. & Bengio, Y. (2010). “Theano: a CPU and GPU Math Expression Compiler”. In Proceedings of the Python for Scientific Computing Conference (SciPy), Vol. 4, No. 3, pp. 3-10.

2.    Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C. & Ghemawat, S. (2016). “Tensorflow: Large-scale Machine Learning on Heterogeneous Distributed Systems”. arXiv preprint arXiv:1603.04467.

3.    Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L. & Lerer, A. (2017). “Automatic Differentiation in PyTorch”. NIPS 2017 Workshop on Autodiff.

4.    Maclaurin, D., Duvenaud, D. & Adams, R. P. (2015). “Autograd: Effortless Gradients in NumPy”. In ICML 2015 AutoML Workshop, Vol. 238.

5.    Baydin, A. G., Pearlmutter, B. A., Radul, A. A. & Siskind, J. M. (2018). “Automatic Differentiation in Machine Learning: a Survey”. In Journal of Machine Learning Research, 18(153), pp. 1-43. arXiv preprint arXiv:1502.05767.

6.    “Official Caffe Website”. [Electronic resource]. – Access mode: https://caffe.berkeleyvision.org/ – Active link: August 2019.

7.    Nickolls, J., Buck, I. & Garland, M. (2008, August). “Scalable Parallel Programming”. In 2008 IEEE Hot Chips 20 Symposium (HCS), pp. 40-53. IEEE.

8.    Goodfellow, I., Bengio, Y. & Courville, A. (2016). “Deep Learning”. MIT Press, pp. 271-273.

9.    Blundell, C. et al. (2015). “Weight Uncertainty in Neural Networks”. Proceedings of the 32nd International Conference on Machine Learning (ICML), Vol. 37, pp. 1613-1622. arXiv preprint arXiv:1505.05424.

10.  Kingma, D. P. & Welling, M. (2013). “Auto-encoding Variational Bayes”. In 2nd International Conference on Learning Representations, ICLR 2014. arXiv preprint arXiv:1312.6114.

11.  Bottou, L., Curtis, F. E. & Nocedal, J. (2018). “Optimization Methods for large-scale Machine Learning”. SIAM Review, Vol. 60(2), pp. 223-311. DOI: 10.1137/16M1080173.

12.  Graves, A., Wayne, G., Reynolds, M., Harley, T., Danihelka, I., Grabska-Barwińska, A. & Badia, A. P. (2016). “Hybrid Computing using a Neural Network with Dynamic External Memory”. Nature, 538(7626), pp. 471–476. DOI: 10.1038/nature20101.

13.  Grefenstette, E., Hermann, K. M., Suleyman, M. & Blunsom, P. (2015). “Learning to Transduce with Unbounded Memory”. In Advances in Neural Information Processing Systems (NIPS), pp. 1828-1836. arXiv preprint arXiv:1506.02516.

14.  Gers, F. (2001). “Long Short-Term Memory in Recurrent Neural Networks”. PhD Thesis, EPF Lausanne, Switzerland, pp. 17-19.

15.  Collins, M. (2018). “Computational Graphs, and Backpropagation”. Lecture Notes, Columbia University, pp. 11-23. [Electronic resource]. – Access mode: http://www.cs.columbia.edu/~mcollins/ff2.pdf – Active link: August 2019.

16.  Oliphant, T. E. (2006). “A Guide to NumPy”. Trelgol Publishing, USA, pp. 13-17.

17.  Kluyver, T. et al. (2016). “Jupyter Notebooks – a Publishing Format for Reproducible Computational Workflows”. In Positioning and Power in Academic Publishing: Players, Agents and Agendas. IOS Press, pp. 87-90. DOI: 10.3233/978-1-61499-649-1-87.

18.  Herta, C. et al. “deep.TEACHING.org – Website for Educational Material on Machine Learning”. [Electronic resource]. – Access mode: https://www.deep-teaching.org/courses/differential-programming – Active link: August 2019.

19.  Herta, C. et al. “Repository of deep.TEACHING.org”. [Electronic resource]. – Access mode: https://gitlab.com/deep.TEACHING/educational-materials/blob/master/notebooks/differentiable-programming/dp.py – Active link: August 2019.

20.  “Array Broadcasting in NumPy”. [Electronic resource]. – Access mode: https://www.numpy.org/devdocs/user/theory.broadcasting.html – Active link: August 2019.

21.  Ioffe, S. & Szegedy, C. (2015). “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift”. ICML'15: Proceedings of the 32nd International Conference on Machine Learning, Vol. 37, pp. 448-456. arXiv preprint arXiv:1502.03167.

22.  Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I. & Salakhutdinov, R. (2014). “Dropout: a Simple Way to Prevent Neural Networks from Overfitting”. The Journal of Machine Learning Research, 15(1), pp. 1929-1958.

23.  Kingma, D. P. & Ba, J. (2014). “Adam: A Method for Stochastic Optimization”. 3rd International Conference on Learning Representations, ICLR 2015. arXiv preprint arXiv:1412.6980.

24.  Glorot, X. & Bengio, Y. (2010). “Understanding the Difficulty of Training Deep Feedforward Neural Networks”. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (AISTATS), PMLR 9, pp. 249-256.

25.  Hochreiter, S. (1991). “Untersuchungen zu Dynamischen Neuronalen Netzen” [Investigations on Dynamic Neural Networks]. Diploma thesis, TU Munich (in German).

26.  “Jupyter Homepage”. [Electronic resource]. – Access mode: https://jupyter.org/ – Active link: August 2019.

27.  Perkel, J. M. (2018). “Why Jupyter is Data Scientists' Computational Notebook of Choice”. Nature, 563(7729), pp. 145-146. DOI: 10.1038/d41586-018-07196-1.

28.  “OpenHub – Projects – TensorFlow”. [Electronic resource]. – Access mode: https://www.openhub.net/p/tensorflow/analyses/latest/languages_summary – Active link: August 2019.

29.  Herta, C., Voigt, B., Baumann, P., Strohmenger, K., Jansen, C., Fischer, O. & Hufnagel, P. (2019). “Deep Teaching: Materials for Teaching Machine and Deep Learning”. In HEAd'19: 5th International Conference on Higher Education Advances, pp. 1153-1131. DOI: 10.4995/HEAd19.2019.9177.


