Equivariant neural networks and geometric deep learning

The term “geometric deep learning” is often used as an umbrella for approaches that apply geometric theory to deep learning. It has been compared to the famous “Erlangen program” in mathematics, proposed by Klein in 1872 as a “unified theory of geometry”, connecting group theory and geometry in profound ways.

In a similar way, our research can be viewed as building a “unified mathematical theory of deep learning”, bringing geometry, group theory, representation theory and theoretical physics into the realm of machine learning.

What is the mathematical framework underlying deep learning? One promising direction is to consider “symmetries” as an underlying design principle for network architectures. This can be implemented by constructing deep neural networks on a group G that acts transitively on the input data. This is directly relevant, for instance, in the case of spherical signals, where G is a rotation group. Even more generally, it is natural to ask how to train neural networks on “non-Euclidean data”.

Relevant applications include omnidirectional computer vision, biomedicine, and climate observations, just to mention a few situations where data is naturally “non-flat”. Mathematically, this calls for developing a theory of deep learning on manifolds, or even more exotic structures, such as graphs or algebraic varieties. A special class consists of homogeneous spaces G/H, where H is a subgroup of G; a prototypical example is the sphere, realized as SO(3)/SO(2). The project aims to develop the mathematical framework for equivariant neural networks. Below we describe this in some more detail.

[For a detailed mathematical account of this field, see our extensive review article “Geometric deep learning and equivariant neural networks”.]

Group equivariant convolutional networks

The basic idea of deep learning is that learning takes place in multi-layer networks of “artificial neurons”, known as deep neural networks (DNNs), where each layer receives data from the preceding layer, processes it, and passes it on to the subsequent layer.

Suppose one wishes to categorize a data sample X according to the class Y it belongs to. As a simple example, the input sample X could be an image and the output Y could be a binary classification of whether a dog or a cat is present in the image. The first layers of a DNN learn basic low-level features, such as edges and contours, which are then passed as input to the subsequent layers. These layers learn more sophisticated high-level features, such as combinations of edges forming legs or ears. The learning process takes place in this sequence of hidden layers, until the network finally produces an output Y’, to be compared with the correct image class Y. The better the learning algorithm, the closer the DNN predictions Y’ will be to Y on new data samples it has not been trained on. In short, one wishes to minimize the “loss function”, which measures the difference between the output Y’ and the true class Y.
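As a toy illustration of this setup (our own minimal sketch in Python, not the architecture of any of the papers below), the quantity being minimized can be written down explicitly as a cross-entropy loss comparing the prediction Y’ with the true class Y:

    # Minimal sketch: one linear layer producing class scores, and the
    # cross-entropy loss that training would minimize (illustrative only).
    import numpy as np

    def forward(x, W, b):
        # "Network" output: raw scores (logits) for the two classes dog/cat.
        return x @ W + b

    def cross_entropy(logits, y):
        # Compare the prediction Y' (softmax of the logits) with the true class Y.
        p = np.exp(logits - logits.max())
        p /= p.sum()
        return -np.log(p[y])

    rng = np.random.default_rng(0)
    x = rng.normal(size=8)                       # a flattened toy "image"
    W, b = rng.normal(size=(8, 2)), np.zeros(2)  # weights a DNN would learn
    y = 1                                        # true class, say "cat"
    print(cross_entropy(forward(x, W, b), y))    # the loss to be minimized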

Convolutional neural networks (CNNs) have been enormously successful in various applications concerning image recognition and feature extraction. Roughly speaking, convolutional neural networks are ordinary deep networks that implement convolution in place of matrix multiplication. One of the main reasons for their power is the built-in “translation equivariance”: a translation of the pixels in an image produces a corresponding translation of the convolution output. Since each layer is translation equivariant, all intermediate representations are translated when the input data is translated, resulting in efficient weight sharing layer by layer throughout the network.
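To make the equivariance property concrete, here is a small numerical check (our own illustration, using a periodic one-dimensional signal in place of an image) that convolution commutes with translations:

    # Check that translating the input and then convolving gives the same
    # result as convolving first and then translating the output.
    import numpy as np

    def circular_conv(x, kernel):
        # Circular (periodic) convolution of a 1D signal with a filter.
        n = len(x)
        return np.array([
            sum(kernel[k] * x[(i - k) % n] for k in range(len(kernel)))
            for i in range(n)
        ])

    rng = np.random.default_rng(0)
    x = rng.normal(size=16)       # toy 1D "image"
    kernel = rng.normal(size=5)   # a filter (random here, learned in a CNN)
    t = 3                         # translation by three pixels

    lhs = circular_conv(np.roll(x, t), kernel)  # translate, then convolve
    rhs = np.roll(circular_conv(x, kernel), t)  # convolve, then translate
    assert np.allclose(lhs, rhs)                # translation equivariance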

There are, however, many potential applications where invariance with respect to other transformations, such as rotations, is desired, and this is not captured by ordinary convolutional neural networks. A striking example is omnidirectional computer vision, which is relevant for autonomous cars and drones. Other examples range from protein structure analysis to cosmology. This calls for the development of a general theory of convolutional neural networks which are invariant, or, more generally, equivariant with respect to arbitrary groups of transformations. This makes it possible to import powerful techniques from group theory and representation theory into deep learning.
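In formulas (using a standard definition, with conventions that vary slightly between papers): a layer $\Phi$ is equivariant with respect to a group $G$ if $\Phi(\rho_{\mathrm{in}}(g)f) = \rho_{\mathrm{out}}(g)\,\Phi(f)$ for all $g \in G$ and all input feature maps $f$, where $\rho_{\mathrm{in}}$ and $\rho_{\mathrm{out}}$ are representations of $G$ on the input and output feature spaces; invariance is the special case where $\rho_{\mathrm{out}}$ is trivial. Ordinary CNNs realize this for the group of translations, while group equivariant CNNs generalize the convolution itself, schematically $(\kappa \star f)(g) = \int_G \kappa(g^{-1}h)\,f(h)\,\mathrm{d}h$.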

Deep learning on manifolds and connection with gauge theory

Taking symmetries of neural networks as a fundamental design principle, it is very natural to ask how to train neural networks on “non-flat” data. Relevant applications include fisheye cameras, biomedicine, and cosmological data, just to mention a few situations where the data is naturally curved. Mathematically, this calls for developing a theory of deep learning on manifolds, or even more exotic structures, such as graphs or algebraic varieties. This rapidly growing research field is referred to as “geometric deep learning”.

In mathematics we study manifolds using differential geometry, which generalizes calculus to curved spaces. On an arbitrary manifold one cannot define global coordinates, i.e. a single coordinate system valid throughout. Instead one views the manifold as a collection of local regions, or “patches”, on each of which local coordinates can be defined. The full manifold is then obtained by gluing these local patches together in a consistent way. Similarly, manifolds usually do not have global symmetries; instead one needs to consider local, or “gauge”, symmetries that act separately within each patch.

For convolutional networks, a theory of gauge equivariant CNNs has been proposed, with inspiration from the physics of gauge theories and general relativity. The basic feature is to consider convolutions which are equivariant with respect to local transformations, i.e. transformations that are allowed to vary from point to point on the manifold. Such transformations are called gauge transformations in physics, hence the name.
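Schematically (in one common formulation; conventions differ between papers), a feature map written in a local gauge transforms under a gauge transformation $g \colon U \to K$, with $K$ the structure group, as $f(x) \mapsto \rho\big(g(x)\big)^{-1} f(x)$ for some representation $\rho$ of $K$, and a gauge equivariant layer $\Phi$ is required to satisfy $\Phi\big(\rho(g)^{-1}f\big) = \rho'(g)^{-1}\,\Phi(f)$ pointwise, so that its output transforms consistently under the same change of local frame.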

Relevant publications

Equivariant Neural Tangent Kernels
2024
Philipp Misof, Pan Kessel, Jan E. Gerken

Equivariant neural networks have in recent years become an important technique for guiding architecture selection for neural networks with many applications in domains ranging from medical image analysis to quantum chemistry. In particular, as the most general linear equivariant layers with respect to the regular representation, group convolutions have been highly impactful in numerous applications. Although equivariant architectures have been studied extensively, much less is known about the training dynamics of equivariant neural networks. Concurrently, neural tangent kernels (NTKs) have emerged as a powerful tool to analytically understand the training dynamics of wide neural networks. In this work, we combine these two fields for the first time by giving explicit expressions for NTKs of group convolutional neural networks. In numerical experiments, we demonstrate superior performance for equivariant NTKs over non-equivariant NTKs on a classification task for medical images.

Preprint: arXiv
NTK ENN
Emergent Equivariance in Deep Ensembles
2024
Jan E. Gerken, Pan Kessel

We demonstrate that deep ensembles are secretly equivariant models. More precisely, we show that deep ensembles become equivariant for all inputs and at all training times by simply using data augmentation. Crucially, equivariance holds off-manifold and for any architecture in the infinite width limit. The equivariance is emergent in the sense that predictions of individual ensemble members are not equivariant but their collective prediction is. Neural tangent kernel theory is used to derive this result and we verify our theoretical insights using detailed numerical experiments.

Published: ICML 2024 (Oral)
Preprint: arXiv
NTK ENN
Geometric deep learning and equivariant neural networks
2023
Jan E. Gerken, Jimmy Aronsson, Oscar Carlsson, Hampus Linander, Fredrik Ohlsson, Christoffer Petersson, Daniel Persson

We survey the mathematical foundations of geometric deep learning, focusing on group equivariant and gauge equivariant neural networks. We develop gauge equivariant convolutional neural networks on arbitrary manifolds $\mathcal{M}$ using principal bundles with structure group $K$ and equivariant maps between sections of associated vector bundles. We also discuss group equivariant neural networks for homogeneous spaces $\mathcal{M}=G/K$, which are instead equivariant with respect to the global symmetry $G$ on $\mathcal{M}$. Group equivariant layers can be interpreted as intertwiners between induced representations of $G$, and we show their relation to gauge equivariant convolutional layers. We analyze several applications of this formalism, including semantic segmentation and object detection networks. We also discuss the case of spherical networks in great detail, corresponding to the case $\mathcal{M}=S^2=\mathrm{SO}(3)/\mathrm{SO}(2)$. Here we emphasize the use of Fourier analysis involving Wigner matrices, spherical harmonics and Clebsch–Gordan coefficients for $G=\mathrm{SO}(3)$, illustrating the power of representation theory for deep learning.

Preprint: arXiv
ENN GDL
Geometrical aspects of lattice gauge equivariant convolutional neural networks
2023
Jimmy Aronsson, David I. Müller, Daniel Schuh

Lattice gauge equivariant convolutional neural networks (L-CNNs) are a framework for convolutional neural networks that can be applied to non-Abelian lattice gauge theories without violating gauge symmetry. We demonstrate how L-CNNs can be equipped with global group equivariance. This allows us to extend the formulation to be equivariant not just under translations but under global lattice symmetries such as rotations and reflections. Additionally, we provide a geometric formulation of L-CNNs and show how convolutions in L-CNNs arise as a special case of gauge equivariant neural networks on SU(N) principal bundles.

Preprint: arXiv
ENN
Equivariance versus Augmentation for Spherical Images
2022
Jan E. Gerken, Oscar Carlsson, Hampus Linander, Fredrik Ohlsson, Christoffer Petersson, Daniel Persson

We analyze the role of rotational equivariance in convolutional neural networks (CNNs) applied to spherical images. We compare the performance of the group equivariant networks known as S2CNNs and standard non-equivariant CNNs trained with an increasing amount of data augmentation. The chosen architectures can be considered baseline references for the respective design paradigms. Our models are trained and evaluated on single or multiple items from the MNIST- or FashionMNIST dataset projected onto the sphere. For the task of image classification, which is inherently rotationally invariant, we find that by considerably increasing the amount of data augmentation and the size of the networks, it is possible for the standard CNNs to reach at least the same performance as the equivariant network. In contrast, for the inherently equivariant task of semantic segmentation, the non-equivariant networks are consistently outperformed by the equivariant networks with significantly fewer parameters. We also analyze and compare the inference latency and training times of the different networks, enabling detailed tradeoff considerations between equivariant architectures and data augmentation for practical problems.

Published: ICML 2022
Preprint: arXiv
SCV ENN
Homogeneous vector bundles and G-equivariant convolutional neural networks
2021
Jimmy Aronsson

$G$-equivariant convolutional neural networks (GCNNs) are a geometric deep learning model for data defined on a homogeneous $G$-space $\mathcal{M}$. GCNNs are designed to respect the global symmetry in $\mathcal{M}$, thereby facilitating learning. In this paper, we analyze GCNNs on homogeneous spaces $\mathcal{M} = G/K$ in the case of unimodular Lie groups $G$ and compact subgroups $K \leq G$. We demonstrate that homogeneous vector bundles are the natural setting for GCNNs. We also use reproducing kernel Hilbert spaces to obtain a precise criterion for expressing $G$-equivariant layers as convolutional layers. This criterion is then rephrased as a bandwidth criterion, leading to even stronger results for some groups.