Symmetries are of fundamental importance in all of science and therefore critical for the success of deep learning systems used in this domain. In this talk, I will give an overview of the different forms in which symmetries appear in physics and chemistry and explain the theoretical background behind equivariant neural networks. Then, I will discuss common ways of constructing equivariant networks in different settings and contrast manifestly equivariant networks with other techniques for obtaining equivariant models. Finally, I will report on recent results about the symmetry properties of deep ensembles trained with data augmentation.
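For reference, the central notion in this and several of the talks below can be stated compactly: a network \(f\) is equivariant with respect to a group \(G\) acting on its input and output spaces through representations \(\rho_{\mathrm{in}}\) and \(\rho_{\mathrm{out}}\) if
\[
f\big(\rho_{\mathrm{in}}(g)\,x\big) \;=\; \rho_{\mathrm{out}}(g)\,f(x) \qquad \text{for all } g \in G \text{ and all inputs } x,
\]
with invariance as the special case \(\rho_{\mathrm{out}}(g) = \mathrm{id}\).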
Deep Learning is nowadays a well-established method for different applications in science and technology. However, it has long been unclear how the “learning process” actually occurs in different architectures, and how this knowledge could be used to optimize performance and efficiency. Recently, ideas from high-energy physics have been applied to the modelling of Deep Learning, translating the learning problem into an RG flow analysis in Quantum Field Theory (QFT). In this talk, I will explain how the rather complicated formulae describing such RG flows for different observables in neural networks at initialization can be obtained from a few rules resembling Feynman rules in QFT. I will also comment on work in progress which implements such rules to compute higher-order corrections to the frozen (infinite-width) NTK for particular activation functions, and on how these corrections evolve after a few steps of SGD.
We demonstrate that a generic deep ensemble is emergently equivariant under data augmentation in the large width limit. Specifically, the ensemble is equivariant at any training step, provided that data augmentation is used. Crucially, this equivariance also holds off-manifold and therefore goes beyond the intuition that data augmentation leads to approximately equivariant predictions. Furthermore, equivariance is emergent in the sense that predictions of individual ensemble members are not equivariant but their collective prediction is. Therefore, the deep ensemble is indistinguishable from a manifestly equivariant predictor. We prove this theoretically using neural tangent kernel theory and verify our theoretical insights using detailed numerical experiments. Based on joint work with Pan Kessel.
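The claim can be probed numerically with a toy sketch (an illustration under simplifying assumptions, not the experimental setup of the paper; the data, widths and hyperparameters below are placeholders): train an ensemble of wide MLPs with data augmentation over 90° rotations and compare how strongly a single member and the ensemble mean violate invariance on test inputs.

```python
# Toy sketch: emergent equivariance of a deep ensemble under data augmentation.
# Placeholder data and hyperparameters; the paper's setting is the large-width limit.
import torch

torch.manual_seed(0)
# the group C4 of 90-degree rotations acting on 8x8 "images"
G = [lambda x: x,
     lambda x: torch.rot90(x, 1, (-2, -1)),
     lambda x: torch.rot90(x, 2, (-2, -1)),
     lambda x: torch.rot90(x, 3, (-2, -1))]

def make_model(width=1024):
    return torch.nn.Sequential(torch.nn.Flatten(),
                               torch.nn.Linear(64, width), torch.nn.ReLU(),
                               torch.nn.Linear(width, 1))

x, y = torch.randn(256, 1, 8, 8), torch.randint(0, 2, (256, 1)).float()  # toy data

ensemble = [make_model() for _ in range(20)]
for model in ensemble:
    opt = torch.optim.SGD(model.parameters(), lr=1e-2)
    for _ in range(200):
        g = G[torch.randint(len(G), (1,)).item()]            # data augmentation
        loss = torch.nn.functional.mse_loss(model(g(x)), y)
        opt.zero_grad(); loss.backward(); opt.step()

with torch.no_grad():
    x_test = torch.randn(16, 1, 8, 8)
    preds = torch.stack([torch.stack([m(g(x_test)) for m in ensemble]) for g in G])
    member_gap = (preds[1, 0] - preds[0, 0]).abs().mean()              # one member
    ensemble_gap = (preds[1].mean(0) - preds[0].mean(0)).abs().mean()  # ensemble mean
    print(f"invariance violation, single member: {member_gap:.4f}, ensemble: {ensemble_gap:.4f}")
```

In the large-width and large-ensemble limit, the second number is expected to vanish while the first generically does not.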
High-resolution wide-angle fisheye images are becoming more and more important for robotics applications such as autonomous driving. However, using ordinary convolutional neural networks or vision transformers on this data is problematic due to projection and distortion losses introduced when projecting to a rectangular grid on the plane. We introduce the HEAL-SWIN transformer, which combines the highly uniform Hierarchical Equal Area iso-Latitude Pixelation (HEALPix) grid used in astrophysics and cosmology with the Hierarchical Shifted-Window (SWIN) transformer to yield an efficient and flexible model capable of training on high-resolution, distortion-free spherical data. In HEAL-SWIN, the nested structure of the HEALPix grid is used to perform the patching and windowing operations of the SWIN transformer, enabling the network to process spherical representations with minimal computational overhead. We demonstrate the superior performance of our model on both synthetic and real automotive datasets, as well as a selection of other image datasets, for semantic segmentation, depth regression and classification tasks.
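The key structural trick can be sketched in a few lines (a simplified illustration, not the actual HEAL-SWIN code; it assumes the healpy package and uses a random signal as placeholder data): in the HEALPix NESTED indexing scheme, every block of \(4^k\) consecutive pixel indices corresponds to the descendants of a single coarser pixel, so patching and windowing a spherical signal reduce to a reshape.

```python
# Sketch of HEALPix-based patching: nested indexing makes windows contiguous in memory.
import numpy as np
import healpy as hp

nside = 64                            # resolution of the spherical grid
npix = hp.nside2npix(nside)           # 12 * nside**2 pixels in total
signal = np.random.randn(npix, 3)     # placeholder: e.g. an RGB signal on the sphere

k = 2                                 # window depth: 4**k pixels per window
window = 4 ** k
windows = signal.reshape(npix // window, window, 3)   # nested "patches"/"windows"
print(windows.shape)                  # (number of windows, pixels per window, channels)
```

Attention can then be computed within each such window, in analogy to the shifted windows of the planar SWIN transformer.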
The neural tangent kernel (NTK) is a quantity closely related to the training dynamics of neural networks (NNs). It becomes particularly interesting in the infinite width limit of NNs, where this kernel becomes deterministic and time-independent, allowing for an analytical solution of the gradient descent dynamics under the mean squared error loss, resulting in a Gaussian process behaviour. In this talk, we will first introduce the NTK and its properties, and then discuss how it can be extended to NNs equivariant with respect to the regular representation. In analogy to the forward equation of the NTK of conventional NNs, we will present a recursive relation connecting the NTK to the corresponding kernel of the previous layer in an equivariant NN. As a concrete example, we provide explicit expressions for the symmetry group of 90° rotations and translations in the plane, as well as Fourier-space expressions for \(SO(3)\) acting on spherical signals. We support our theoretical findings with numerical experiments.
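For orientation, the forward equations being generalized are, schematically (conventions for weight and bias variances vary between references), for a fully connected network with activation \(\sigma\), NNGP kernel \(\Sigma^{(\ell)}\) and NTK \(\Theta^{(\ell)}\):
\[
\Sigma^{(\ell+1)}(x,x') = \mathbb{E}_{u \sim \mathcal{N}(0,\,\Sigma^{(\ell)})}\big[\sigma(u(x))\,\sigma(u(x'))\big], \qquad
\dot\Sigma^{(\ell+1)}(x,x') = \mathbb{E}_{u \sim \mathcal{N}(0,\,\Sigma^{(\ell)})}\big[\sigma'(u(x))\,\sigma'(u(x'))\big],
\]
\[
\Theta^{(1)} = \Sigma^{(1)}, \qquad
\Theta^{(\ell+1)}(x,x') = \Sigma^{(\ell+1)}(x,x') + \dot\Sigma^{(\ell+1)}(x,x')\,\Theta^{(\ell)}(x,x').
\]
The recursive relation presented in the talk plays the same role for layers that are equivariant with respect to the regular representation.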
High-resolution wide-angle images, such as fisheye images, are increasingly important in applications like robotics and autonomous driving. Traditional neural networks struggle with these images due to projection and distortion losses when operating on their flat projections. In this presentation, I will introduce the HEAL-SWIN model, which addresses this issue by combining the SWIN transformer with the Hierarchical Equal Area iso-Latitude Pixelation (HEALPix) grid from astrophysics. This integration enables the HEAL-SWIN model to process inherently spherical data without projections, effectively eliminating distortion losses.
We demonstrate that a generic deep ensemble is emergently equivariant under data augmentation in the large width limit. Specifically, the ensemble is equivariant at any training step, provided that data augmentation is used. Crucially, this equivariance also holds off-manifold and therefore goes beyond the intuition that data augmentation leads to approximately equivariant predictions. Furthermore, equivariance is emergent in the sense that predictions of individual ensemble members are not equivariant but their collective prediction is. Therefore, the deep ensemble is indistinguishable from a manifestly equivariant predictor. We prove this theoretically using neural tangent kernel theory and verify our theoretical insights using detailed numerical experiments. Based on joint work with Pan Kessel.
In this talk I will present novel ideas which apply various high-energy physics tools to the description of deep neural networks at initialization and after gradient descent (GD) training.
Counterfactuals can explain classification decisions of neural networks in a human interpretable way. We propose a simple but effective method to generate such counterfactuals. More specifically, we perform a suitable diffeomorphic coordinate transformation and then perform gradient ascent in these coordinates to find counterfactuals which are classified with great confidence as a specified target class. We propose two methods to leverage generative models to construct such suitable coordinate systems that are either exactly or approximately diffeomorphic. We analyze the generation process theoretically using Riemannian differential geometry and validate the quality of the generated counterfactuals using various qualitative and quantitative measures. Related paper: https://arxiv.org/abs/2206.05075
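The core generation loop can be sketched as follows (a minimal illustration of the idea rather than the paper's implementation; `clf` and `flow` are hypothetical handles for a pretrained classifier returning class logits and a generative model exposing its forward map \(z \mapsto x\) and its inverse):

```python
# Sketch: gradient ascent in the base coordinates induced by an (approximately)
# diffeomorphic generative model to produce a counterfactual for a target class.
import torch

def diffeomorphic_counterfactual(x, target, clf, flow, steps=500, lr=1e-2):
    """Maximize the target-class logit of clf(flow(z)), starting from z = flow^{-1}(x)."""
    z = flow.inverse(x).detach().requires_grad_(True)   # coordinates in the base space
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        x_cf = flow.forward(z)                 # map back to data space
        score = clf(x_cf)[:, target].sum()     # confidence for the target class
        opt.zero_grad()
        (-score).backward()                    # gradient *ascent* on the score
        opt.step()
    return flow.forward(z).detach()
```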
A popular science introduction to geometric deep learning for high school teachers. (Given in Swedish.)
Neural network classifiers are black box models which lack inherent interpretability. In many practical applications like medical imaging or autonomous driving, interpretations of the network decisions are needed which are provided by the field of explainable AI. Counterfactuals provide intuitive explanations for neural network classifiers which help to identify reliance on spurious features and biases in the underlying dataset. In this talk, I will introduce Diffeomorphic Counterfactuals, a simple but effective method to generate counterfactuals. Diffeomorphic Counterfactuals are generated by performing a suitable coordinate transformation of the data space using a generative model and then gradient ascent in the new coordinates. I will present a theoretical analysis of the generation process using differential geometry and show experimental results which validate the quality of the generated counterfactuals using various qualitative and quantitative measures.
Despite its remarkable success, deep learning lacks a strong theoretical foundation. One way to help alleviate this problem is to use ideas from differential geometry and group theory at various points in the learning process, arriving at a more principled way of setting up deep learning models. This approach goes by the name of geometric deep learning and has received a lot of attention in recent years. In this talk, I will summarize our work on some aspects of geometric deep learning, namely using group theory to guide the construction of neural network architectures and using the manifold structure of the input data to generate counterfactual explanations for neural networks, motivated by differential geometry.
Geometric Deep Learning (GDL) is a vast and rapidly advancing field. In this talk, I provide a brief introduction to GDL, some approaches and applications, along with a few examples. An essential aspect of GDL is how the model handles symmetries in data or spaces. For data defined on a manifold, one such symmetry is the choice of local coordinates, and I will present our formulation of a convolutional layer which is equivariant to the choice of local coordinates (gauge equivariant), along with a brief overview of the required structures and concepts. Finally, I will discuss our findings on the benefits and drawbacks of enforcing equivariance in the model compared to augmenting the training data. Based on these findings, I will present some questions to consider when choosing the best approach for your models.
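Schematically (conventions vary, and the talk gives the precise setup), a gauge equivariant convolution on a manifold takes the form
\[
(K \star f)(p) \;=\; \int_{B_p} K(v)\, \rho_{\mathrm{in}}\big(g_{p \leftarrow \exp_p v}\big)\, f(\exp_p v)\, \mathrm{d}v,
\]
where the integral runs over a ball \(B_p\) in the tangent space at \(p\), \(\exp_p\) is the exponential map and \(g_{p \leftarrow \exp_p v}\) is the structure group element implementing parallel transport back to \(p\); equivariance under changes of local coordinates then constrains the kernel to satisfy \(K(g v) = \rho_{\mathrm{out}}(g)\, K(v)\, \rho_{\mathrm{in}}(g)^{-1}\) for all \(g\) in the structure group.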
We analyze the role of rotational equivariance in convolutional neural networks (CNNs) applied to spherical images. We compare the performance of the group equivariant networks known as S2CNNs and standard non-equivariant CNNs trained with an increasing amount of data augmentation. The chosen architectures can be considered baseline references for the respective design paradigms. Our models are trained and evaluated on single or multiple items from the MNIST or FashionMNIST datasets projected onto the sphere. For the task of image classification, which is inherently rotationally invariant, we find that by considerably increasing the amount of data augmentation and the size of the networks, it is possible for the standard CNNs to reach at least the same performance as the equivariant network. In contrast, for the inherently equivariant task of semantic segmentation, the non-equivariant networks are consistently outperformed by the equivariant networks with significantly fewer parameters. We also analyze and compare the inference latency and training times of the different networks, enabling detailed tradeoff considerations between equivariant architectures and data augmentation for practical problems.
Neural network classifiers are black box models which lack inherent interpretability. In many practical applications like medical imaging or autonomous driving, interpretations of the network decisions are needed which are provided by the field of explainable AI. Counterfactuals provide intuitive explanations for neural network classifiers which help to identify reliance on spurious features and biases in the underlying dataset. In this talk, I will introduce Diffeomorphic Counterfactuals, a simple but effective method to generate counterfactuals. Diffeomorphic Counterfactuals are generated by performing a suitable coordinate transformation of the data space using a generative model and then gradient ascent in the new coordinates. I will present a theoretical analysis of the generation process using differential geometry and show experimental results which validate the quality of the generated counterfactuals using various qualitative and quantitative measures.
The field of geometric deep learning has gained a lot of momentum in recent years and attracted people with different backgrounds such as deep learning, theoretical physics and mathematics. This is also reflected by the considerable research activity in this direction at our department. In this talk, I will give an introduction to neural networks and deep learning and mention the different branches of mathematics relevant to their study. Then, I will focus more specifically on the subject of geometric deep learning, where symmetries in the underlying data are used to guide the construction of network architectures. This opens the door for mathematical tools such as representation theory and differential geometry to be used in deep learning, leading to interesting new results. I will also comment on how the cross-fertilization between machine learning and mathematics has recently benefited (pure) mathematics.
Normalizing flows are diffeomorphisms which are parameterized by neural networks. As a result, they can induce coordinate transformations in the tangent space of the data manifold. In this work, we demonstrate that such transformations can be used to generate interpretable explanations for decisions of neural networks. More specifically, we perform gradient ascent in the base space of the flow to generate counterfactuals which are classified with great confidence as a specified target class. We analyze this generation process theoretically using Riemannian differential geometry and establish a rigorous theoretical connection between gradient ascent on the data manifold and in the base space of the flow.