Symmetries are of fundamental importance in all of science and therefore critical for the success of deep learning systems used in this domain. In this talk, I will give an overview of the different forms in which symmetries appear in physics and chemistry and explain the theoretical background behind equivariant neural networks. Then, I will discuss common ways of constructing equivariant networks in different settings and contrast manifestly equivariant networks with other techniques for obtaining equivariant models. Finally, I will report on recent results about the symmetry properties of deep ensembles trained with data augmentation.
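For reference, the central notion in this and several of the talks below can be stated compactly: a network \(f\) is equivariant with respect to a group \(G\) acting on its input and output spaces through representations \(\rho_{\mathrm{in}}\) and \(\rho_{\mathrm{out}}\) if
\[
f\big(\rho_{\mathrm{in}}(g)\,x\big) \;=\; \rho_{\mathrm{out}}(g)\,f(x) \qquad \text{for all } g \in G \text{ and all inputs } x,
\]
with invariance as the special case \(\rho_{\mathrm{out}}(g) = \mathrm{id}\).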
Deep Learning is nowadays a well-established method for different applications in science and technology. However, it has long been unclear how the “learning process” actually occurs in different architectures, and how this knowledge could be used to optimize performance and efficiency. Recently, ideas from high-energy physics have been applied to the modelling of Deep Learning, translating the learning problem into an RG flow analysis in Quantum Field Theory (QFT). In this talk, I will explain how the rather complicated formulae describing such RG flows for different observables in neural networks at initialization can be obtained from a few rules resembling Feynman rules in QFT. I will also comment on work in progress which implements such rules to compute higher-order corrections to the frozen (infinite-width) NTK for particular activation functions, and on how these corrections evolve after a few steps of SGD.
We demonstrate that a generic deep ensemble is emergently equivariant under data augmentation in the large width limit. Specifically, the ensemble is equivariant at any training step, provided that data augmentation is used. Crucially, this equivariance also holds off-manifold and therefore goes beyond the intuition that data augmentation leads to approximately equivariant predictions. Furthermore, equivariance is emergent in the sense that predictions of individual ensemble members are not equivariant but their collective prediction is. Therefore, the deep ensemble is indistinguishable from a manifestly equivariant predictor. We prove this theoretically using neural tangent kernel theory and verify our theoretical insights using detailed numerical experiments. Based on joint work with Pan Kessel.
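The claim can be probed numerically with a toy sketch (an illustration under simplifying assumptions, not the experimental setup of the paper; the data, widths and hyperparameters below are placeholders): train an ensemble of wide MLPs with data augmentation over 90° rotations and compare how strongly a single member and the ensemble mean violate invariance on test inputs.

```python
# Toy sketch: emergent equivariance of a deep ensemble under data augmentation.
# Placeholder data and hyperparameters; the paper's setting is the large-width limit.
import torch

torch.manual_seed(0)
# the group C4 of 90-degree rotations acting on 8x8 "images"
G = [lambda x: x,
     lambda x: torch.rot90(x, 1, (-2, -1)),
     lambda x: torch.rot90(x, 2, (-2, -1)),
     lambda x: torch.rot90(x, 3, (-2, -1))]

def make_model(width=1024):
    return torch.nn.Sequential(torch.nn.Flatten(),
                               torch.nn.Linear(64, width), torch.nn.ReLU(),
                               torch.nn.Linear(width, 1))

x, y = torch.randn(256, 1, 8, 8), torch.randint(0, 2, (256, 1)).float()  # toy data

ensemble = [make_model() for _ in range(20)]
for model in ensemble:
    opt = torch.optim.SGD(model.parameters(), lr=1e-2)
    for _ in range(200):
        g = G[torch.randint(len(G), (1,)).item()]            # data augmentation
        loss = torch.nn.functional.mse_loss(model(g(x)), y)
        opt.zero_grad(); loss.backward(); opt.step()

with torch.no_grad():
    x_test = torch.randn(16, 1, 8, 8)
    preds = torch.stack([torch.stack([m(g(x_test)) for m in ensemble]) for g in G])
    member_gap = (preds[1, 0] - preds[0, 0]).abs().mean()              # one member
    ensemble_gap = (preds[1].mean(0) - preds[0].mean(0)).abs().mean()  # ensemble mean
    print(f"invariance violation, single member: {member_gap:.4f}, ensemble: {ensemble_gap:.4f}")
```

In the large-width and large-ensemble limit, the second number is expected to vanish while the first generically does not.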
High-resolution wide-angle fisheye images are becoming more and more important for robotics applications such as autonomous driving. However, using ordinary convolutional neural networks or vision transformers on this data is problematic due to projection and distortion losses introduced when projecting to a rectangular grid on the plane. We introduce the HEAL-SWIN transformer, which combines the highly uniform Hierarchical Equal Area iso-Latitude Pixelation (HEALPix) grid used in astrophysics and cosmology with the Hierarchical Shifted-Window (SWIN) transformer to yield an efficient and flexible model capable of training on high-resolution, distortion-free spherical data. In HEAL-SWIN, the nested structure of the HEALPix grid is used to perform the patching and windowing operations of the SWIN transformer, enabling the network to process spherical representations with minimal computational overhead. We demonstrate the superior performance of our model on both synthetic and real automotive datasets, as well as a selection of other image datasets, for semantic segmentation, depth regression and classification tasks.
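The key structural trick can be sketched in a few lines (a simplified illustration, not the actual HEAL-SWIN code; it assumes the healpy package and uses a random signal as placeholder data): in the HEALPix NESTED indexing scheme, every block of \(4^k\) consecutive pixel indices corresponds to the descendants of a single coarser pixel, so patching and windowing a spherical signal reduce to a reshape.

```python
# Sketch of HEALPix-based patching: nested indexing makes windows contiguous in memory.
import numpy as np
import healpy as hp

nside = 64                            # resolution of the spherical grid
npix = hp.nside2npix(nside)           # 12 * nside**2 pixels in total
signal = np.random.randn(npix, 3)     # placeholder: e.g. an RGB signal on the sphere

k = 2                                 # window depth: 4**k pixels per window
window = 4 ** k
windows = signal.reshape(npix // window, window, 3)   # nested "patches"/"windows"
print(windows.shape)                  # (number of windows, pixels per window, channels)
```

Attention can then be computed within each such window, in analogy to the shifted windows of the planar SWIN transformer.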
The neural tangent kernel (NTK) is a quantity closely related to the training dynamics of neural networks (NNs). It becomes particularly interesting in the infinite width limit of NNs, where this kernel becomes deterministic and time-independent, allowing for an analytical solution of the gradient descent dynamics under the mean squared error loss, resulting in a Gaussian process behaviour. In this talk, we will first introduce the NTK and its properties, and then discuss how it can be extended to NNs equivariant with respect to the regular representation. In analogy to the forward equation of the NTK of conventional NNs, we will present a recursive relation connecting the NTK to the corresponding kernel of the previous layer in an equivariant NN. As a concrete example, we provide explicit expressions for the symmetry group of 90° rotations and translations in the plane, as well as Fourier-space expressions for \(SO(3)\) acting on spherical signals. We support our theoretical findings with numerical experiments.
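For orientation, the forward equations being generalized are, schematically (conventions for weight and bias variances vary between references), for a fully connected network with activation \(\sigma\), NNGP kernel \(\Sigma^{(\ell)}\) and NTK \(\Theta^{(\ell)}\):
\[
\Sigma^{(\ell+1)}(x,x') = \mathbb{E}_{u \sim \mathcal{N}(0,\,\Sigma^{(\ell)})}\big[\sigma(u(x))\,\sigma(u(x'))\big], \qquad
\dot\Sigma^{(\ell+1)}(x,x') = \mathbb{E}_{u \sim \mathcal{N}(0,\,\Sigma^{(\ell)})}\big[\sigma'(u(x))\,\sigma'(u(x'))\big],
\]
\[
\Theta^{(1)} = \Sigma^{(1)}, \qquad
\Theta^{(\ell+1)}(x,x') = \Sigma^{(\ell+1)}(x,x') + \dot\Sigma^{(\ell+1)}(x,x')\,\Theta^{(\ell)}(x,x').
\]
The recursive relation presented in the talk plays the same role for layers that are equivariant with respect to the regular representation.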
High-resolution wide-angle images, such as fisheye images, are increasingly important in applications like robotics and autonomous driving. Traditional neural networks struggle with these images due to projection and distortion losses when operating on their flat projections. In this presentation, I will introduce the HEAL-SWIN model, which addresses this issue by combining the SWIN transformer with the Hierarchical Equal Area iso-Latitude Pixelation (HEALPix) grid from astrophysics. This integration enables the HEAL-SWIN model to process inherently spherical data without projections, effectively eliminating distortion losses.
We demonstrate that a generic deep ensemble is emergently equivariant under data augmentation in the large width limit. Specifically, the ensemble is equivariant at any training step, provided that data augmentation is used. Crucially, this equivariance also holds off-manifold and therefore goes beyond the intuition that data augmentation leads to approximately equivariant predictions. Furthermore, equivariance is emergent in the sense that predictions of individual ensemble members are not equivariant but their collective prediction is. Therefore, the deep ensemble is indistinguishable from a manifestly equivariant predictor. We prove this theoretically using neural tangent kernel theory and verify our theoretical insights using detailed numerical experiments. Based on joint work with Pan Kessel.
In this talk I will present novel ideas which apply various high-energy physics tools to the description of deep neural networks at initialization and after gradient descent (GD) training.
Counterfactuals can explain classification decisions of neural networks in a human interpretable way. We propose a simple but effective method to generate such counterfactuals. More specifically, we perform a suitable diffeomorphic coordinate transformation and then perform gradient ascent in these coordinates to find counterfactuals which are classified with great confidence as a specified target class. We propose two methods to leverage generative models to construct such suitable coordinate systems that are either exactly or approximately diffeomorphic. We analyze the generation process theoretically using Riemannian differential geometry and validate the quality of the generated counterfactuals using various qualitative and quantitative measures. Related paper: https://arxiv.org/abs/2206.05075
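The core generation loop can be sketched as follows (a minimal illustration of the idea rather than the paper's implementation; `clf` and `flow` are hypothetical handles for a pretrained classifier returning class logits and a generative model exposing its forward map \(z \mapsto x\) and its inverse):

```python
# Sketch: gradient ascent in the base coordinates induced by an (approximately)
# diffeomorphic generative model to produce a counterfactual for a target class.
import torch

def diffeomorphic_counterfactual(x, target, clf, flow, steps=500, lr=1e-2):
    """Maximize the target-class logit of clf(flow(z)), starting from z = flow^{-1}(x)."""
    z = flow.inverse(x).detach().requires_grad_(True)   # coordinates in the base space
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        x_cf = flow.forward(z)                 # map back to data space
        score = clf(x_cf)[:, target].sum()     # confidence for the target class
        opt.zero_grad()
        (-score).backward()                    # gradient *ascent* on the score
        opt.step()
    return flow.forward(z).detach()
```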
A popular science introduction to geometric deep learning for high school teachers. (Given in Swedish.)
Neural network classifiers are black box models which lack inherent interpretability. In many practical applications like medical imaging or autonomous driving, interpretations of the network decisions are needed which are provided by the field of explainable AI. Counterfactuals provide intuitive explanations for neural network classifiers which help to identify reliance on spurious features and biases in the underlying dataset. In this talk, I will introduce Diffeomorphic Counterfactuals, a simple but effective method to generate counterfactuals. Diffeomorphic Counterfactuals are generated by performing a suitable coordinate transformation of the data space using a generative model and then gradient ascent in the new coordinates. I will present a theoretical analysis of the generation process using differential geometry and show experimental results which validate the quality of the generated counterfactuals using various qualitative and quantitative measures.
Despite its remarkable success, deep learning lacks a strong theoretical foundation. One way to help alleviate this problem is to use ideas from differential geometry and group theory at various points in the learning process, arriving at a more principled way of setting up deep learning models. This approach goes by the name of geometric deep learning and has received a lot of attention in recent years. In this talk, I will summarize our work on some aspects of geometric deep learning, namely using group theory to guide the construction of neural network architectures and using the manifold structure of the input data to generate counterfactual explanations for neural networks, motivated by differential geometry.
Geometric Deep Learning (GDL) is a vast and rapidly advancing field. In this talk, I provide a brief introduction to GDL, some approaches and applications, along with a few examples. An essential aspect of GDL is how the model handles symmetries in data or spaces. For data defined on a manifold, one such symmetry is the choice of local coordinates, and I will present our formulation of a convolutional layer which is equivariant to the choice of local coordinates (gauge equivariant), along with a brief overview of the required structures and concepts. Finally, I will discuss our findings on the benefits and drawbacks of enforcing equivariance in the model compared to augmenting the training data. Based on these findings, I will present some questions to consider when choosing the best approach for your models.
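Schematically (conventions vary, and the talk gives the precise setup), a gauge equivariant convolution on a manifold takes the form
\[
(K \star f)(p) \;=\; \int_{B_p} K(v)\, \rho_{\mathrm{in}}\big(g_{p \leftarrow \exp_p v}\big)\, f(\exp_p v)\, \mathrm{d}v,
\]
where the integral runs over a ball \(B_p\) in the tangent space at \(p\), \(\exp_p\) is the exponential map and \(g_{p \leftarrow \exp_p v}\) is the structure group element implementing parallel transport back to \(p\); equivariance under changes of local coordinates then constrains the kernel to satisfy \(K(g v) = \rho_{\mathrm{out}}(g)\, K(v)\, \rho_{\mathrm{in}}(g)^{-1}\) for all \(g\) in the structure group.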
We analyze the role of rotational equivariance in convolutional neural networks (CNNs) applied to spherical images. We compare the performance of the group equivariant networks known as S2CNNs and standard non-equivariant CNNs trained with an increasing amount of data augmentation. The chosen architectures can be considered baseline references for the respective design paradigms. Our models are trained and evaluated on single or multiple items from the MNIST or FashionMNIST datasets projected onto the sphere. For the task of image classification, which is inherently rotationally invariant, we find that by considerably increasing the amount of data augmentation and the size of the networks, it is possible for the standard CNNs to reach at least the same performance as the equivariant network. In contrast, for the inherently equivariant task of semantic segmentation, the non-equivariant networks are consistently outperformed by the equivariant networks with significantly fewer parameters. We also analyze and compare the inference latency and training times of the different networks, enabling detailed tradeoff considerations between equivariant architectures and data augmentation for practical problems.
Neural network classifiers are black box models which lack inherent interpretability. In many practical applications like medical imaging or autonomous driving, interpretations of the network decisions are needed which are provided by the field of explainable AI. Counterfactuals provide intuitive explanations for neural network classifiers which help to identify reliance on spurious features and biases in the underlying dataset. In this talk, I will introduce Diffeomorphic Counterfactuals, a simple but effective method to generate counterfactuals. Diffeomorphic Counterfactuals are generated by performing a suitable coordinate transformation of the data space using a generative model and then gradient ascent in the new coordinates. I will present a theoretical analysis of the generation process using differential geometry and show experimental results which validate the quality of the generated counterfactuals using various qualitative and quantitative measures.
The field of geometric deep learning has gained a lot of momentum in recent years and attracted people with different backgrounds such as deep learning, theoretical physics and mathematics. This is also reflected by the considerable research activity in this direction at our department. In this talk, I will give an introduction to neural networks and deep learning and mention the different branches of mathematics relevant to their study. Then, I will focus more specifically on the subject of geometric deep learning, where symmetries in the underlying data are used to guide the construction of network architectures. This opens the door for mathematical tools such as representation theory and differential geometry to be used in deep learning, leading to interesting new results. I will also comment on how the cross-fertilization between machine learning and mathematics has recently benefited (pure) mathematics.
Normalizing flows are diffeomorphisms which are parameterized by neural networks. As a result, they can induce coordinate transformations in the tangent space of the data manifold. In this work, we demonstrate that such transformations can be used to generate interpretable explanations for decisions of neural networks. More specifically, we perform gradient ascent in the base space of the flow to generate counterfactuals which are classified with great confidence as a specified target class. We analyze this generation process theoretically using Riemannian differential geometry and establish a rigorous theoretical connection between gradient ascent on the data manifold and in the base space of the flow.