**The Analytical Foundations of Deep Learning: Interpretability and Performance Guarantees**

**October 19-21 & 23, 2020**

**9 am to 2 pm PT (Noon to 5 pm ET) Daily**

We are on the verge of a deep learning revolution that is leading to many disruptive technologies: from automatic speech recognition systems such as Apple Siri, to automated supermarkets such as Amazon Go, to autonomous vehicles such as Google Car. As we increasingly employ deep learning in our daily lives to support important decisions, it becomes critical to understand the predictions made by deep neural networks (DNNs). The purpose of this workshop is to share progress and foster collaboration on the analytical foundations of deep learning. Our goal is to help explain phenomena observed in practice from rigorous mathematical and statistical perspectives, and to develop new principles that help practitioners improve the design of algorithms and architectures, ultimately leading to new deep learning systems that are “correct by construction” and offer performance guarantees in terms of properties such as robustness or fairness.

The workshop has three closely connected components: two tutorials on the first day, followed by two days of invited presentations, and a brainstorming session on the last day.

*Tutorials:* We plan to start the workshop with two tutorials on the first day. The first tutorial aims to provide a mathematical justification for properties of conventional deep networks, such as global optimality, invariance, and stability of the learned representations. The second tutorial will cover more recent developments on graph neural networks that are applicable to a broader family of data structures.

*Invited Presentations:* We will have two days of invited presentations and discussions by experts in the field. Among the diverse set of topics related to deep learning, this workshop will focus on “Principled design and interpretability” for the first day and “Guaranteed robustness and fairness” for the second day.

*Discussion and Brainstorming:* The last day of the workshop will be devoted to discussion and brainstorming on exciting open problems related to the analytical foundations of deep learning. We encourage each participant to prepare and bring a list of problems of their own and discuss them at the workshop. One outcome of the workshop will be a report listing fundamental and challenging open problems for future research.

**ORGANIZERS**

Yi Ma (University of California, Berkeley) and René Vidal (Johns Hopkins University)

**SPEAKERS**

Peter Bartlett (University of California, Berkeley), Tom Goldstein (University of Maryland), Gitta Kutyniok (Ludwig-Maximilians Universität München), Yi Ma (University of California, Berkeley), Alejandro Ribeiro (University of Pennsylvania), Guillermo Sapiro (Duke University), René Vidal (Johns Hopkins University), Soledad Villar (Johns Hopkins University), Max Welling (University of Amsterdam), Bin Yu (University of California, Berkeley)

**PROGRAM**

*(All times are Pacific Time)*

**Day 1 (Monday, Oct. 19): Tutorials**

**Chair: **René Vidal

##

**Abstract: **The past few years have seen a dramatic increase in the performance of recognition systems thanks to the introduction of deep networks for representation learning. However, the mathematical reasons for this success remain elusive. For example, a key issue is that the neural network training problem is non-convex, hence optimization algorithms may not return a global minimum. In addition, the regularization properties of algorithms such as dropout remain poorly understood. The first part of this tutorial will overview recent work on the theory of deep learning that aims to understand how to design the network architecture, how to regularize the network weights, and how to guarantee global optimality. The second part of this tutorial will present sufficient conditions to guarantee that local minima are globally optimal and that a local descent strategy can reach a global minimum from any initialization. Such conditions apply to problems in matrix factorization, tensor factorization and deep learning. The third part of this tutorial will present an analysis of the optimization and regularization properties of dropout in the case of matrix factorization.

**Speaker: **René Vidal is the Herschel Seder Professor of Biomedical Engineering and Director of the Mathematical Institute for Data Science at Johns Hopkins University. He is also an Amazon Scholar, Chief Scientist at NORCE, and Associate Editor in Chief of TPAMI. His current research focuses on the foundations of deep learning and its applications in computer vision and biomedical data science. He is an AIMBE Fellow, IEEE Fellow, IAPR Fellow and Sloan Fellow, and has received numerous awards for his work, including the D’Alembert Faculty Award, J.K. Aggarwal Prize, ONR Young Investigator Award, NSF CAREER Award as well as best paper awards in machine learning, computer vision, controls, and medical robotics.

##

**Abstract: **We will develop the concept of Graph Neural Networks (GNNs), which intend to extend the success of CNNs to the processing of high-dimensional signals in non-Euclidean domains. They do so by leveraging possibly irregular signal structures described by graphs. The following topics will be covered: (1) Graph Convolutions and GNN Architectures. The key concept enabling the definition of GNNs is the graph convolutional filter. GNN architectures compose graph filters with pointwise nonlinearities. (2) Fundamental Properties of GNNs. Graph filters and GNNs are suitable architectures to process signals on graphs because of their permutation equivariance. GNNs tend to work better than graph filters because they are Lipschitz stable to deformations of the graph that describes their structure. This is a property that regular graph filters cannot have. (3) Distributed Control of Multiagent Systems. An exciting application domain for GNNs is the distributed control of large-scale multiagent systems. Applications to the control of robot swarms and wireless communication networks will be covered.
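
As a rough illustration of topics (1) and (2), the sketch below builds a polynomial graph filter, composes it with a pointwise ReLU to form one GNN layer, and checks permutation equivariance numerically. It is not taken from the tutorial: the function names, the path graph `S`, the signal `x`, and the filter taps `h` are all assumptions chosen for illustration, and the filter is assumed to take the common form y = h[0] x + h[1] S x + h[2] S² x.

```python
# Minimal sketch (assumed form): polynomial graph filter plus pointwise ReLU.

def matvec(S, x):
    """Multiply the graph shift operator S (list of rows) by the signal x."""
    return [sum(s_ij * x_j for s_ij, x_j in zip(row, x)) for row in S]

def graph_filter(S, x, h):
    """Apply y = h[0] x + h[1] S x + h[2] S^2 x + ... to the graph signal x."""
    y = [0.0] * len(x)
    z = list(x)  # z holds S^k x, starting with k = 0
    for h_k in h:
        y = [y_i + h_k * z_i for y_i, z_i in zip(y, z)]
        z = matvec(S, z)
    return y

def gnn_layer(S, x, h):
    """One GNN layer: a graph filter followed by a pointwise ReLU."""
    return [max(0.0, v) for v in graph_filter(S, x, h)]

def permute(S, x, perm):
    """Relabel the nodes so that new node i is old node perm[i]."""
    S_p = [[S[pi][pj] for pj in perm] for pi in perm]
    x_p = [x[pi] for pi in perm]
    return S_p, x_p

# Hypothetical example: adjacency matrix of the path graph 0-1-2-3.
S = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
x = [1.0, -2.0, 0.5, 3.0]   # graph signal, one value per node
h = [0.5, -1.0, 0.25]       # filter taps (illustrative values)
perm = [2, 0, 3, 1]

y = gnn_layer(S, x, h)
S_p, x_p = permute(S, x, perm)
y_p = gnn_layer(S_p, x_p, h)

# Equivariance: processing the relabeled graph gives the relabeled output.
equivariant = all(abs(y_p[i] - y[perm[i]]) < 1e-12 for i in range(len(x)))
print(equivariant)
```

The check relies only on the filter being a polynomial in `S` and the nonlinearity being pointwise; the stability result mentioned in the abstract is a separate property that this sketch does not test.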

**Speaker: **Alejandro Ribeiro is Professor of Electrical and Systems Engineering at the University of Pennsylvania. He holds a B.Sc. degree from Universidad de la República Oriental del Uruguay and M.Sc. and Ph.D. degrees from the University of Minnesota. His research is on collaborative intelligent systems, wireless autonomous networks, machine learning on network data and distributed collaborative learning. He received the 2017 Lindback and the 2012 S. Reid Warren, Jr teaching awards and is co-recipient of paper awards at ICASSP 2020, EUSIPCO 2019, CDC 2017, SSP Workshop 2016, SAM Workshop 2016, Asilomar SSC Conference 2015, ACC 2013, ICASSP 2006, and ICASSP 2005. He is a Fulbright scholar and a Penn Fellow.

**Day 2 (Tuesday, Oct. 20): Principled Design & Interpretability**

**Chair: **Yi Ma

##

**Abstract: **A number of powerful principles underlie much of modern physics, such as the behavior of variables and fields under symmetry transformations and the strange statistical laws of quantum mechanics. Can these principles also be used in deep learning? While this may look strange at first sight, we only need to realize that both physics and deep learning can be understood as information processing systems. In this talk, I will explain how we can apply representation theory for both global and local (gauge) transformations to deep learning. In the second half, I will explain how even the language of quantum mechanics can be applied to deep learning and might, with the advent of quantum computers, become a powerful new paradigm for deep learning.

**Speaker: **Max Welling is a research chair in Machine Learning at the University of Amsterdam and a VP Technologies at Qualcomm. He has a secondary appointment as a fellow at the Canadian Institute for Advanced Research (CIFAR). He served as associate editor in chief of IEEE TPAMI from 2011 to 2015. He has served on the board of the NeurIPS Foundation since 2015 and was Program Chair and General Chair of NeurIPS in 2013 and 2014, respectively. He was also Program Chair of AISTATS in 2009 and ECCV in 2016 and General Chair of MIDL 2018. He is a founding board member of ELLIS. He received the ECCV Koenderink Prize in 2010. He directs the Amsterdam Machine Learning Lab (AMLAB), and co-directs the Qualcomm-UvA deep learning lab (QUVA) and the Bosch-UvA Deep Learning lab (DELTA).

##

**Abstract: **In this talk, we provide a theoretical framework for interpreting neural network decisions by formalizing the problem in a rate-distortion framework. The solver of the associated optimization, which we coin Rate-Distortion Explanation (RDE), is then accessible to a mathematical analysis. We will discuss theoretical results as well as present numerical experiments showing that our algorithmic approach outperforms established methods, in particular, for sparse explanations of neural network decisions.

**Speaker: **Gitta Kutyniok holds a Chair for Mathematical Foundations of Artificial Intelligence at the Ludwig-Maximilians Universität München. Her research interests are in applied harmonic analysis, compressed sensing, high-dimensional data analysis, imaging science, inverse problems, machine learning, numerical mathematics, partial differential equations, and applications to life sciences and telecommunication. She is the recipient of numerous awards, including the Research Prize of Gießen, a Heisenberg-Fellowship, the von Kaven Prize by the DFG, and an Einstein Chair. She is a member of the Berlin-Brandenburg Academy of Sciences and Humanities, a SIAM Fellow, and an IEEE Senior Member. She was Chair of the SIAM Activity Group on Imaging Sciences and Scientific Director of the graduate school BIMoS at TU Berlin. She is currently Co-Chair of the first SIAM conference on Mathematics of Data Science and Chair of the GAMM Activity Groups on Mathematical Signal- and Image Processing and Computational and Mathematical Methods in Data Science.

##

**Abstract: **Predictability, computability, and stability (PCS) are three core principles for veridical data science that aims at responsible, reliable, reproducible, and transparent data analysis and decision-making. They embed the scientific principles of prediction and replication in data-driven decision-making while recognizing the central role of computation. Based on these principles, the PCS framework consists of a workflow and documentation (in R Markdown or Jupyter Notebook) for the entire data science life cycle (DSLC) from problem formulation, data collection, data cleaning to modeling and data result interpretation and conclusions. Veridical interpretability is defined as trustworthy interpretability of data results that captures reality with predictability as a minimum and is reliable through stability analysis relative to appropriate perturbations to DSLC including human judgment calls. The PCS framework provides a protocol towards veridical interpretability. Two interpretation methods, DeepTune and ACD for DNN models, will be demonstrated as case studies of PCS towards veridical interpretability. In particular, DeepTune elicits meaningful and testable (image) interpretations of DNN-based models of single neurons in the difficult primate visual cortex area V4. ACD (agglomerative contextual decomposition) provides hierarchical interpretations of DNN predictions, and is effective at diagnosing incorrect predictions and identifying dataset bias, while being largely stable to adversarial perturbations.

**Speaker: **Bin Yu is the Chancellor’s Distinguished Professor and Class of 1936 Second Chair in the Departments of Statistics and Electrical Engineering and Computer Sciences at the University of California, Berkeley. Her research interests extend beyond statistics to leverage new computational developments that solve scientific problems by combining novel statistical machine learning approaches with domain expertise across fields such as neuroscience, genomics, remote sensing, and precision medicine. Yu is a member of the National Academy of Sciences and of the American Academy of Arts and Sciences, a Guggenheim Fellow, a Tukey Memorial Lecturer of the Bernoulli Society, a Rietz Lecturer of IMS, and a COPSS E. L. Scott prize winner. She is a former President of the Institute of Mathematical Statistics (IMS) and serves on the editorial board of the Proceedings of the National Academy of Sciences (PNAS) and the Scientific Advisory Committee of the UK Turing Institute for Data Science and AI.

##

**Abstract: **In this talk, we offer an entirely “white box” interpretation of deep (convolutional) networks. In particular, we show how modern deep architectures, linear (convolution) operators and nonlinear activations, and parameters of each layer can be derived from the principle of rate reduction (and invariance). All layers, operators, and parameters of the network are explicitly constructed via forward propagation, instead of learned via back propagation. All components of such a network have precise optimization, geometric, and statistical meaning. There are also several nice surprises from this principled approach that shed new light on fundamental relationships between forward (optimization) and backward (variation) propagation, between invariance and sparsity, and between deep networks and Fourier analysis.

**Speaker: **Yi Ma is a Professor in residence at the Department of Electrical Engineering and Computer Sciences at the University of California, Berkeley. He received his Bachelor’s degree from Tsinghua University and MS and PhD degrees from UC Berkeley. His research interests are in computer vision, high-dimensional data analysis, and intelligent systems. He was on the faculty of UIUC ECE from 2000 to 2011, the manager of the Visual Computing group of Microsoft Research Asia from 2009 to 2014, and the Dean of the School of Information Science and Technology at ShanghaiTech University from 2014 to 2017. He has published over 160 papers and three textbooks in computer vision, statistical learning, and low-dimensional models for high-dimensional data analysis. He received the NSF CAREER Award in 2004 and the ONR Young Investigator Award in 2005. He received the David Marr Prize in computer vision in 1999 and served as Program Chair of ICCV 2013 and General Chair of ICCV 2015. He is a Fellow of IEEE, SIAM, and ACM.

**Day 3 (Wednesday, Oct. 21): Guaranteed Robustness & Fairness**

**Chair: **René Vidal

##

**Abstract: **Classical theory that guides the design of nonparametric prediction methods like deep neural networks involves a tradeoff between the fit to the training data and the complexity of the prediction rule. Deep learning seems to operate outside the regime where these results are informative, since deep networks can perform well even with a perfect fit to noisy training data. We investigate this phenomenon of ‘benign overfitting’ in the simplest setting, that of linear prediction. We give a characterization of linear regression problems for which the minimum norm interpolating prediction rule has near-optimal prediction accuracy. The characterization is in terms of two notions of effective rank of the data covariance. It shows that overparameterization is essential: the number of directions in parameter space that are unimportant for prediction must significantly exceed the sample size. It also shows an important role for finite-dimensional data: benign overfitting occurs for a much narrower range of properties of the data distribution when the data lies in an infinite-dimensional space versus when it lies in a finite-dimensional space whose dimension grows faster than the sample size. We discuss implications for deep networks, for robustness to adversarial examples, and for the rich variety of possible behaviors of excess risk as a function of dimension. This is joint work with Phil Long, Gábor Lugosi, and Alex Tsigler.
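
For readers unfamiliar with the object being analyzed, the following is a minimal sketch of a minimum norm interpolating linear predictor in an overparameterized toy problem (2 samples, 3 features). It only illustrates the definition w = Xᵀ(XXᵀ)⁻¹y; it does not reproduce the effective-rank characterization from the talk. The function names and the data are hypothetical, and the helper is restricted to exactly two samples so that the Gram matrix can be inverted by hand.

```python
# Minimal sketch: minimum norm interpolator for a 2 x d design matrix.

def min_norm_interpolator(X, y):
    """Return w = X^T (X X^T)^{-1} y for a design matrix X with two rows."""
    r0, r1 = X
    # Gram matrix G = X X^T, a symmetric 2 x 2 matrix.
    g00 = sum(a * a for a in r0)
    g01 = sum(a * b for a, b in zip(r0, r1))
    g11 = sum(b * b for b in r1)
    det = g00 * g11 - g01 * g01
    # alpha = G^{-1} y, using the closed-form inverse of a 2 x 2 matrix.
    a0 = (g11 * y[0] - g01 * y[1]) / det
    a1 = (-g01 * y[0] + g00 * y[1]) / det
    # w lies in the row space of X, which is what makes its norm minimal.
    return [a0 * u + a1 * v for u, v in zip(r0, r1)]

def predict(X, w):
    """Linear predictions X w."""
    return [sum(x_j * w_j for x_j, w_j in zip(row, w)) for row in X]

def sq_norm(w):
    """Squared Euclidean norm of w."""
    return sum(v * v for v in w)

# Hypothetical data: more parameters (3) than samples (2).
X = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
y = [2.0, 3.0]

w = min_norm_interpolator(X, y)

# Any other interpolator differs from w by a null-space direction of X.
null_dir = [1.0, 1.0, -1.0]  # satisfies X @ null_dir = 0 for this X
w_other = [w_j + 0.5 * n_j for w_j, n_j in zip(w, null_dir)]

print(predict(X, w), predict(X, w_other))
print(sq_norm(w) < sq_norm(w_other))
```

Both `w` and `w_other` fit the training labels exactly; the minimum norm rule simply picks, among all such fits, the one with the smallest Euclidean norm.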

**Speaker: **Peter Bartlett is professor of Computer Science and Statistics at the University of California, Berkeley, Associate Director of the Simons Institute for the Theory of Computing, Director of the Foundations of Data Science Institute, and Director of the Collaboration on the Theoretical Foundations of Deep Learning. His research interests include machine learning and statistical learning theory and he is the co-author of the book Neural Network Learning: Theoretical Foundations. He has been an Institute of Mathematical Statistics Medallion Lecturer, a winner of the Malcolm McIntosh Prize for Physical Scientist of the Year, and an Australian Laureate Fellow, and he is a Fellow of the IMS, ACM, and the Australian Academy of Science.

##

**Abstract: **We first formulate and formally characterize group fairness as a multi-objective optimization problem, where each sensitive group risk is a separate objective. We propose a fairness criterion where a classifier achieves minimax risk and is Pareto-efficient w.r.t. all groups, avoiding unnecessary harm, and can lead to the best zero-gap model if policy dictates so. We provide a simple optimization algorithm compatible with deep neural networks to satisfy these constraints. Since our method does not require test-time access to sensitive attributes, it can be applied to reduce worst-case classification errors between outcomes in unbalanced classification problems. We test the proposed methodology on real case studies of predicting income, ICU patient mortality, skin lesion classification, and assessing credit risk, demonstrating how our framework compares favorably to other approaches. We then extend this work to the case where the sensitive classes are not known even at training time, achieving this via a game-theoretic optimization approach. We show the implications of this for the concept of subgroup robustness. This is joint work with Natalia Martinez, Martin Bertran, Afroditi Papadaki, and Miguel Rodrigues.

**Speaker: **Guillermo Sapiro is the James B. Duke Professor at Duke University. He received all his degrees from the Technion in Israel and works on theory and applications in computer vision, computer graphics, medical imaging, image analysis, and machine learning. He has authored and co-authored over 450 papers in these areas and has written a book published by Cambridge University Press, January 2001. He has developed an autism app highlighted as one of the top health apps in the Apple Store, and is heavily involved in behavioral mobile health. He was awarded the Rothschild Fellowship for postdoctoral studies in 1993, the Office of Naval Research Young Investigator Award in 1998, the Presidential Early Career Award for Scientists and Engineers (PECASE) in 1998, the National Science Foundation Career Award in 1999, the National Security Science and Engineering Faculty Fellowship in 2010, and the Test of Time Award at ICCV 2011 and at ICML 2019. He is a Fellow of SIAM, IEEE, and the American Academy of Arts and Sciences, and is the founding Editor in Chief of the SIAM Journal on Imaging Sciences.

##

**Abstract: **Large, high-capacity deep learning models trained on large amounts of data have been shown to achieve impressive performance and generalize well. However, there is an argument for simpler models, which claims that algorithms used for decision-making cannot be fair if they cannot explain their decisions. In this talk, we approach the problem of interpretable feature selection and the topic of robustness. We first derive a linear approach to select relevant features using lasso, and then we extend it to the context of deep learning using variational autoencoders and the well-known Gumbel-softmax trick. Finally, we evaluate this method in the context of single-cell RNA sequencing data. This is joint work with Nabeel Sarwar and Bianca Dumitrascu.
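
The Gumbel-softmax trick mentioned above can be sketched in a few lines. This is a generic illustration of the relaxation, not the speakers' feature-selection method: the function name, the logits, and the temperature are assumptions. Gumbel noise is added to the logits and a temperature-scaled softmax is applied, so the output is a smooth approximation of a one-hot sample whose argmax follows the categorical distribution softmax(logits).

```python
import math
import random

def gumbel_softmax(logits, temperature, rng):
    """Return a relaxed one-hot sample: softmax((logits + Gumbel noise) / temperature)."""
    noisy = []
    for logit in logits:
        u = max(rng.random(), 1e-12)      # guard against log(0)
        g = -math.log(-math.log(u))       # standard Gumbel(0, 1) noise
        noisy.append((logit + g) / temperature)
    m = max(noisy)                        # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical example: three candidate features with selection
# probabilities 0.7, 0.2 and 0.1.
probs = [0.7, 0.2, 0.1]
logits = [math.log(p) for p in probs]
rng = random.Random(0)

sample = gumbel_softmax(logits, temperature=0.5, rng=rng)

# The argmax of each relaxed sample is a draw from the categorical
# distribution, so empirical frequencies should approach `probs`.
counts = [0, 0, 0]
n_draws = 5000
for _ in range(n_draws):
    s = gumbel_softmax(logits, temperature=0.5, rng=rng)
    counts[s.index(max(s))] += 1
freqs = [c / n_draws for c in counts]
print(sample, freqs)
```

Lower temperatures push each sample closer to a one-hot vector, while higher temperatures give smoother vectors; in a deep model the same computation would be written in an automatic differentiation framework so that gradients flow through the logits.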

**Speaker: **Soledad Villar is an Assistant Professor of Applied Mathematics and Statistics at Johns Hopkins University. She received a PhD in Mathematics from the University of Texas at Austin in 2017 and held research positions at the University of California, Berkeley and New York University, where she was also a collaboration scientist for the Algorithms and Geometry Simons collaboration. Her research is related to optimization, statistics, machine learning, and applied harmonic analysis. She is also interested in data-related problems from a geometric, topological, and algorithmic point of view. Recently she has been working on data problems arising from computational biology.

##

**Abstract: **Evasion and poisoning attacks have been demonstrated on a range of systems, but usually in a simplified laboratory setting. In this talk, I will describe recent work on evasion attacks and present our work on dataset poisoning. I'll explain how attacks on toy systems can be scaled up and weaponized to break industrial systems, including copyright detection systems, algorithmic trading bots, and the Google and Amazon machine learning APIs.

**Speaker: **Tom Goldstein is an Associate Professor of Computer Science at the University of Maryland. His research lies at the intersection of machine learning and optimization, and targets applications in computer vision and signal processing. He works at the boundary between theory and practice, leveraging mathematical foundations, complex models, and efficient hardware to build practical, high-performance systems. Before joining the faculty at Maryland, Tom completed his PhD in Mathematics at UCLA, and was a research scientist at Rice University and Stanford University. Tom has been the recipient of several awards, including SIAM’s DiPrima Prize, a DARPA Young Faculty Award, a JP Morgan Fellowship Award, and a Sloan Fellowship.

**Day 4 (Friday, Oct. 23): Brainstorm and Discussion**

**Chair: **Yi Ma

##

**Lead: **Edgar Dobriban (University of Pennsylvania)

**Participants: **Sebastien Bubeck (Microsoft Research), Jinghui Chen (University of California, Los Angeles), Soheil Feizi (University of Maryland), Micah Goldblum (University of Maryland), Zico Kolter (Carnegie Mellon University), Omar Montasser (Toyota Technological Institute at Chicago), Cyrus Rashtchian (University of California, San Diego), Aditi Raghunathan (Stanford University), Alex Robey (University of Pennsylvania), Chong You (University of California, Berkeley), Hongyang Zhang (Toyota Technological Institute at Chicago)

*10:30 am – 10:45 am: Break*

##

**Leads: **Gitta Kutyniok (Ludwig-Maximilians Universität München) and Guillermo Sapiro (Duke University)

**Participants: **Solon Barocas (Cornell University, Microsoft Research) and Ana-Andreea Stoica (Columbia University)

*12:15 pm – 12:30 pm: Break*

##

**Leads: **Benjamin Haeffele (Johns Hopkins University) and Chong You (University of California, Berkeley)

**Participants: **Anima Anandkumar (California Institute of Technology, Nvidia), Song Han (Massachusetts Institute of Technology), Qiang Liu (University of Texas at Austin), Tess Smidt (Lawrence Berkeley National Laboratory)

**Format of Brainstorm Sessions:**

• Collect significant open problems.

• Discuss potential technical approaches.

• Present grand intellectual/industrial challenges that we can embark on.

• Draft an outline of a report by the group.