Jan Gerken

Assistant Professor

About me

I lead a research group about mathematical foundations of AI, funded by the Wallenberg AI, Autonomous Systems and Software Program.

I have a background in string theory, in particular I computed string scattering amplitudes at genus one during my PhD. My current main research interests are:

wide neural networks, neural tangent kernels and connections to quantum field theory
mathematical aspects of geometric deep learning, ranging from the geometry of the data manifold to equivariant neural networks
computer vision for spherical data

I am also interested in quantum chemistry and quantum computing.

Publications

PEAR: Equal Area Weather Forecasting on the Sphere #

2025

Hampus Linander, Christoffer Petersson, Daniel Persson, Jan E. Gerken

Machine learning methods for global medium-range weather forecasting have recently received immense attention. Following the publication of the Pangu Weather model, the first deep learning model to outperform traditional numerical simulations of the atmosphere, numerous models have been published in this domain, building on Pangu’s success. However, all of these models operate on input data and produce predictions on the Driscoll–Healy discretization of the sphere which suffers from a much finer grid at the poles than around the equator. In contrast, in the Hierarchical Equal Area iso-Latitude Pixelization (HEALPix) of the sphere, each pixel covers the same surface area, removing unphysical biases. Motivated by a growing support for this grid in meteorology and climate sciences, we propose to perform weather forecasting with deep learning models which natively operate on the HEALPix grid. To this end, we introduce Pangu Equal ARea (PEAR), a transformer-based weather forecasting model which operates directly on HEALPix-features and outperforms the corresponding model on Driscoll–Healy without any computational overhead.

Preprint: arXiv

DL GDL ML4SCI

Equivariant Neural Tangent Kernels #

2025

Philipp Misof, Pan Kessel, Jan E. Gerken

Equivariant neural networks have in recent years become an important technique for guiding architecture selection for neural networks with many applications in domains ranging from medical image analysis to quantum chemistry. In particular, as the most general linear equivariant layers with respect to the regular representation, group convolutions have been highly impactful in numerous applications. Although equivariant architectures have been studied extensively, much less is known about the training dynamics of equivariant neural networks. Concurrently, neural tangent kernels (NTKs) have emerged as a powerful tool to analytically understand the training dynamics of wide neural networks. In this work, we combine these two fields for the first time by giving explicit expressions for NTKs of group convolutional neural networks. In numerical experiments, we demonstrate superior performance for equivariant NTKs over non-equivariant NTKs on a classification task for medical images.

Published: ICML 2025

Preprint: arXiv

NTK ENN

Learning Chern Numbers of Topological Insulators with Gauge Equivariant Neural Networks #

2025

Longde Huang, Oleksandr Balabanov, Hampus Linander, Mats Granath, Daniel Persson, Jan E. Gerken

Equivariant network architectures are a well-established tool for predicting invariant or equivariant quantities. However, almost all learning problems considered in this context feature a global symmetry, i.e. each point of the underlying space is transformed with the same group element, as opposed to a local gauge symmetry, where each point is transformed with a different group element, exponentially enlarging the size of the symmetry group. Gauge equivariant networks have so far mainly been applied to problems in quantum chromodynamics. Here, we introduce a novel application domain for gauge-equivariant networks in the theory of topological condensed matter physics. We use gauge equivariant networks to predict topological invariants (Chern numbers) of multiband topological insulators. The gauge symmetry of the network guarantees that the predicted quantity is a topological invariant. We introduce a novel gauge equivariant normalization layer to stabilize the training and prove a universal approximation theorem for our setup. We train on samples with trivial Chern number only but show that our models generalize to samples with non-trivial Chern number. We provide various ablations of our setup. Our code is available at this https URL.

Preprint: arXiv

GDL ENN

Emergent Equivariance in Deep Ensembles #

2024

Jan E. Gerken, Pan Kessel

We demonstrate that deep ensembles are secretly equivariant models. More precisely, we show that deep ensembles become equivariant for all inputs and at all training times by simply using data augmentation. Crucially, equivariance holds off-manifold and for any architecture in the infinite width limit. The equivariance is emergent in the sense that predictions of individual ensemble members are not equivariant but their collective prediction is. Neural tangent kernel theory is used to derive this result and we verify our theoretical insights using detailed numerical experiments.

Published: ICML 2024 (Oral)

Preprint: arXiv

NTK ENN

Geometric deep learning and equivariant neural networks #

2023

Jan E. Gerken, Jimmy Aronsson, Oscar Carlsson, Hampus Linander, Fredrik Ohlsson, Christoffer Petersson, Daniel Persson

We survey the mathematical foundations of geometric deep learning, focusing on group equivariant and gauge equivariant neural networks. We develop gauge equivariant convolutional neural networks on arbitrary manifolds \(\mathcal{M}\) using principal bundles with structure group K and equivariant maps between sections of associated vector bundles. We also discuss group equivariant neural networks for homogeneous spaces \(\mathcal {M}=G/K\), which are instead equivariant with respect to the global symmetry \(G\) on \(\mathcal {M}\). Group equivariant layers can be interpreted as intertwiners between induced representations of \(G\), and we show their relation to gauge equivariant convolutional layers. We analyze several applications of this formalism, including semantic segmentation and object detection networks. We also discuss the case of spherical networks in great detail, corresponding to the case \(\mathcal {M}=S^2=\textrm{SO}(3)/\textrm{SO}(2)\). Here we emphasize the use of Fourier analysis involving Wigner matrices, spherical harmonics and Clebsch-Gordan coefficients for \(G=\textrm{SO}(3)\), illustrating the power of representation theory for deep learning.

Published: Artificial Intelligence Review

Preprint: arXiv

ENN GDL

HEAL-SWIN: A Vision Transformer On The Sphere #

2023

Oscar Carlsson, Jan E. Gerken, Hampus Linander, Heiner Spieß, Fredrik Ohlsson, Christoffer Petersson, Daniel Persson

High-resolution wide-angle fisheye images are becoming more and more important for robotics applications such as autonomous driving. However, using ordinary convolutional neural networks or vision transformers on this data is problematic due to projection and distortion losses introduced when projecting to a rectangular grid on the plane. We introduce the HEAL-SWIN transformer, which combines the highly uniform Hierarchical Equal Area iso-Latitude Pixelation (HEALPix) grid used in astrophysics and cosmology with the Hierarchical Shifted-Window (SWIN) transformer to yield an efficient and flexible model capable of training on high-resolution, distortion-free spherical data. In HEAL-SWIN, the nested structure of the HEALPix grid is used to perform the patching and windowing operations of the SWIN transformer, resulting in a one-dimensional representation of the spherical data with minimal computational overhead. We demonstrate the superior performance of our model for semantic segmentation and depth regression tasks on both synthetic and real automotive datasets. Our code is available at https://github.com/JanEGerken/HEAL-SWIN.

Published: CVPR 2024

Preprint: arXiv

SCV

Towards closed strings as single-valued open strings at genus one #

2022

Jan E. Gerken, Axel Kleinschmidt, Carlos R. Mafra, Oliver Schlotterer, Bram Verbeek

We relate the low-energy expansions of world-sheet integrals in genus-one amplitudes of open- and closed-string states. The respective expansion coefficients are elliptic multiple zeta values in the open-string case and non-holomorphic modular forms dubbed “modular graph forms” for closed strings. By inspecting the differential equations and degeneration limits of suitable generating series of genus-one integrals, we identify formal substitution rules mapping the elliptic multiple zeta values of open strings to the modular graph forms of closed strings. Based on the properties of these rules, we refer to them as an elliptic single-valued map which generalizes the genus-zero notion of a single-valued map acting on multiple zeta values seen in tree-level relations between the open and closed string.

Published: Journal of Physics A: Mathematical and Theoretical

Preprint: arXiv

Equivariance versus Augmentation for Spherical Images #

2022

Jan E. Gerken, Oscar Carlsson, Hampus Linander, Fredrik Ohlsson, Christoffer Petersson, Daniel Persson

We analyze the role of rotational equivariance in convolutional neural networks (CNNs) applied to spherical images. We compare the performance of the group equivariant networks known as S2CNNs and standard non-equivariant CNNs trained with an increasing amount of data augmentation. The chosen architectures can be considered baseline references for the respective design paradigms. Our models are trained and evaluated on single or multiple items from the MNIST- or FashionMNIST dataset projected onto the sphere. For the task of image classification, which is inherently rotationally invariant, we find that by considerably increasing the amount of data augmentation and the size of the networks, it is possible for the standard CNNs to reach at least the same performance as the equivariant network. In contrast, for the inherently equivariant task of semantic segmentation, the non-equivariant networks are consistently outperformed by the equivariant networks with significantly fewer parameters. We also analyze and compare the inference latency and training times of the different networks, enabling detailed tradeoff considerations between equivariant architectures and data augmentation for practical problems.

Published: ICML 2022

Preprint: arXiv

SCV ENN

Diffeomorphic Counterfactuals With Generative Models #

2022

Ann-Kathrin Dombrowski, Jan E. Gerken, Klaus-Robert Müller, Pan Kessel

Counterfactuals can explain classification decisions of neural networks in a human interpretable way. We propose a simple but effective method to generate such counterfactuals. More specifically, we perform a suitable diffeomorphic coordinate transformation and then perform gradient ascent in these coordinates to find counterfactuals which are classified with great confidence as a specified target class. We propose two methods to leverage generative models to construct such suitable coordinate systems that are either exactly or approximately diffeomorphic. We analyze the generation process theoretically using Riemannian differential geometry and validate the quality of the generated counterfactuals using various qualitative and quantitative measures.

Published: IEEE TPAMI

Preprint: arXiv

GDL XAI

Modular Graph Forms and Scattering Amplitudes in String Theory #

2020

Jan Gerken

About me

Publications

PEAR: Equal Area Weather Forecasting on the Sphere #

Hampus Linander, Christoffer Petersson, Daniel Persson, Jan E. Gerken

Equivariant Neural Tangent Kernels #

Philipp Misof, Pan Kessel, Jan E. Gerken

Learning Chern Numbers of Topological Insulators with Gauge Equivariant Neural Networks #

Longde Huang, Oleksandr Balabanov, Hampus Linander, Mats Granath, Daniel Persson, Jan E. Gerken

Emergent Equivariance in Deep Ensembles #

Jan E. Gerken, Pan Kessel

Geometric deep learning and equivariant neural networks #

Jan E. Gerken, Jimmy Aronsson, Oscar Carlsson, Hampus Linander, Fredrik Ohlsson, Christoffer Petersson, Daniel Persson

HEAL-SWIN: A Vision Transformer On The Sphere #

Oscar Carlsson, Jan E. Gerken, Hampus Linander, Heiner Spieß, Fredrik Ohlsson, Christoffer Petersson, Daniel Persson

Towards closed strings as single-valued open strings at genus one #

Jan E. Gerken, Axel Kleinschmidt, Carlos R. Mafra, Oliver Schlotterer, Bram Verbeek

Equivariance versus Augmentation for Spherical Images #

Jan E. Gerken, Oscar Carlsson, Hampus Linander, Fredrik Ohlsson, Christoffer Petersson, Daniel Persson

Diffeomorphic Counterfactuals With Generative Models #

Ann-Kathrin Dombrowski, Jan E. Gerken, Klaus-Robert Müller, Pan Kessel

Modular Graph Forms and Scattering Amplitudes in String Theory #

Generating series of all modular graph forms from iterated Eisenstein integrals #

Jan E. Gerken, Axel Kleinschmidt, Oliver Schlotterer

Basis Decompositions and a Mathematica Package for Modular Graph Forms #

All-order differential equations for one-loop closed-string integrals and modular graph forms #

Jan E. Gerken, Axel Kleinschmidt, Oliver Schlotterer

Heterotic-string amplitudes at one loop: modular graph forms and relations to open strings #

Jan E. Gerken, Axel Kleinschmidt, Oliver Schlotterer

Holomorphic subgraph reduction of higher-point modular graph forms #

Jan E. Gerken, Justin Kaidi

Talks

Emergent Equivariance in Deep Ensembles #

Emergent Equivariance in Deep Ensembles #

Emergent Equivariance in Deep Ensembles #

Emergent Equivariance in Deep Ensembles #

Symmetries in AI4Science #

Geometric Deep Learning and Neural Tangent Kernels #

Symmetries and Neural Tangent Kernels #

Emergent Equivariance in Deep Ensembles #

Emergent Equivariance in Deep Ensembles #

Emergent Equivariance in Deep Ensembles #

Diffeomorphic Counterfactuals and Generative Models #

Diffeomorphic Counterfactuals and Generative Models #

Diffeomorphic Counterfactuals #

Geometric Deep Learning: From Pure Math to Applications #

Geometric Deep Learning: From Pure Math to Applications #

Equivariance versus Augmentation for Spherical Images #

Diffeomorphic Counterfactuals and Generative Models #

Geometric Deep Learning #

Diffeomorphic Explanations with Normalizing Flows #

Diffeomorphic Explanations with Normalizing Flows #