MICAS 913
Course title: Deep Learning Theory, MICAS 913
Program: MICAS: Master in Machine Learning, Communications and Security
Volume: 33h (lectures 30h, final exam 3h), 1 ECTS credit
Instructor: Mansoor Yousefi
TA: Jamal Darweesh
Office hours: 3D55, bi-weekly, Fridays 17h–18h30, as well as on Zoom
Description:
This is a graduate course on deep learning theory, an important topic in machine learning. The course consists
of three parts: approximation theory, optimization theory, and statistical theory.
The first part is dedicated to the approximation error rates of feed-forward neural networks
(NNs). The second part analyzes gradient-based optimization algorithms, in particular the
optimization error rates of stochastic gradient descent (SGD) and its variants. The third part
studies generalization performance, proving bounds on the generalization error of NNs trained on i.i.d.
data.
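The three parts track the terms of a standard excess-risk decomposition; the notation below is illustrative and not taken from the course material. Writing R for the population risk, f^* for the Bayes-optimal predictor, F for the class of networks considered, f_F for the best network in F, \hat f_n for the empirical risk minimizer over n samples, and \tilde f for the network returned by the optimizer, the excess risk telescopes exactly as

    R(\tilde f) - R(f^*)
      = \underbrace{R(\tilde f) - R(\hat f_n)}_{\text{(roughly) optimization error}}
      + \underbrace{R(\hat f_n) - R(f_{\mathcal F})}_{\text{estimation (generalization) error}}
      + \underbrace{R(f_{\mathcal F}) - R(f^*)}_{\text{approximation error}} .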
The course includes a semester-long simulation project in which students apply NNs to receiver design for data
transmission over a nonlinear communication channel. The course is compiled from research papers on neural
networks and deep learning theory. Students should read the reference papers before class.
Syllabus
There are 10 lectures, each 3h (2 x 1.5h), and a final exam.
The program can be adjusted based on the pace of progress and students’ feedback.
Learning outcomes
The learning objectives of the course are as follows.
Understand the expressive power of NNs in approximating important function classes
Demonstrate universal approximation when the number of neurons tends to infinity
Compute the approximation error rates of NNs with piecewise-linear activations and finitely many neurons
Analyze the convergence rates of gradient-based optimization algorithms (a minimal illustration follows this list)
Explain momentum, adaptive step sizes, and accelerated gradient descent (GD)
Understand the loss landscape, no-spurious-local-minima results, the role of positive homogeneity, and the success of local search methods in non-convex NN optimization
Understand why generalization in deep learning defies classical statistical learning theory, and obtain bounds on the generalization error of NNs
Linearize an NN and analyze the excess risk of the resulting neural tangent kernel model
Deduce the implicit bias of gradient descent in two models. Derive the double-descent curve, and
understand the role of over-parameterization in generalization (and in facilitating optimization).
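The following is the minimal illustrative sketch referenced in the optimization outcome above; it is not course code, and the quadratic objective, step sizes, and momentum parameter are assumptions chosen to make acceleration visible. It compares plain gradient descent with Polyak's heavy-ball method on a strongly convex quadratic whose minimizer is the origin, so the norm of the final iterate measures suboptimality.

    import numpy as np

    rng = np.random.default_rng(0)
    mu, L = 1.0, 100.0                         # smallest / largest curvature (assumed)
    A = np.diag(np.linspace(mu, L, 20))        # diagonal quadratic f(x) = 0.5 x^T A x
    x0 = rng.standard_normal(20)

    def gd(x, steps, eta):
        # plain gradient descent: x_{k+1} = x_k - eta * grad f(x_k), with grad f(x) = A x
        for _ in range(steps):
            x = x - eta * (A @ x)
        return x

    def heavy_ball(x, steps, eta, beta):
        # Polyak's heavy-ball method: adds a momentum term beta * (x_k - x_{k-1})
        x_prev = x.copy()
        for _ in range(steps):
            x, x_prev = x - eta * (A @ x) + beta * (x - x_prev), x
        return x

    eta_gd = 1.0 / L                                    # standard GD step size
    eta_hb = 4.0 / (np.sqrt(L) + np.sqrt(mu)) ** 2      # Polyak step size
    beta = ((np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu))) ** 2

    print("GD          ||x_200|| =", np.linalg.norm(gd(x0, 200, eta_gd)))
    print("Heavy ball  ||x_200|| =", np.linalg.norm(heavy_ball(x0, 200, eta_hb, beta)))

With condition number L/mu = 100, GD contracts the error by roughly a factor 1 - mu/L per step, whereas heavy ball contracts it by roughly 1 - sqrt(mu/L); quantifying this gap is the kind of convergence-rate result studied in the second part of the course.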
Grading
The evaluation is based on a semester-long simulation project and a final exam.
Simulation project (50%). The project is on deep learning of nonlinear partial differential equations (PDEs).
The class is divided into groups of two students.
Each group designs an NN for equalization in data transmission over optical fiber, modeled by the stochastic nonlinear
Schrödinger (NLS) equation. The project is introduced in class and completed gradually over the semester.
A detailed guide outlining the project steps is provided.
Students submit a final report implementing the steps and answering the questions in the guide. A quantitative grading scheme is provided.
The instructor holds bi-weekly office hours to review students' progress and provide feedback.
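For orientation only, here is a minimal sketch of the kind of channel simulation the project involves; it is not the official project code. It integrates one fiber span of the stochastic NLS equation with the split-step Fourier method under one common sign convention; the parameter values, the additive-noise model, and the one-sample-per-symbol simplification (no pulse shaping) are assumptions.

    import numpy as np

    def ssfm(q, dt, beta2, gamma, length, n_steps, noise_std, rng):
        """Propagate the complex baseband signal q over a fiber of the given length."""
        dz = length / n_steps
        w = 2 * np.pi * np.fft.fftfreq(q.size, d=dt)      # angular frequency grid
        half_disp = np.exp(0.5j * beta2 * w**2 * dz / 2)  # half-step dispersion operator
        for _ in range(n_steps):
            q = np.fft.ifft(half_disp * np.fft.fft(q))       # linear half step
            q = q * np.exp(1j * gamma * np.abs(q)**2 * dz)   # nonlinear (Kerr) step
            q = np.fft.ifft(half_disp * np.fft.fft(q))       # linear half step
            q = q + noise_std * np.sqrt(dz) * (              # distributed noise term
                rng.standard_normal(q.size) + 1j * rng.standard_normal(q.size))
        return q

    rng = np.random.default_rng(1)
    symbols = rng.choice([1+1j, 1-1j, -1+1j, -1-1j], size=256)   # QPSK symbols
    rx = ssfm(symbols.astype(complex), dt=1.0, beta2=-21.7e-3,
              gamma=1.3e-3, length=80.0, n_steps=200,
              noise_std=1e-3, rng=rng)

In the project, an NN equalizer is trained to map received waveforms of this kind back to the transmitted symbols; the project guide specifies the actual system parameters and the steps to follow.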
Prerequisites
Bibliography
M. Anthony and P. L. Bartlett, Neural Network Learning: Theoretical Foundations, Cambridge University Press, 1999.