PhD student
My interests lie broadly in symmetries (local and global, invariances and equivariances) in machine learning models as well as in theoretical physics. Many of my current projects revolve around group- or gauge-equivariant neural networks, which are well studied, for example, in the form of G-CNNs. The goal is to develop a general framework that also covers other machine learning models, including equivariant transformers. The hope is that such a framework would shed light on the fundamental differences and similarities between existing equivariant transformer models, as well as relate these to the previous generation of equivariant convolutional neural networks.
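To make the notion of equivariance concrete, here is a minimal, purely illustrative sketch (not code from any of the projects above): it turns an arbitrary map on square images into a C4-rotation-equivariant one by group averaging, and checks the defining property numerically.

    import numpy as np

    def symmetrize_c4(phi):
        # Group averaging over C4 = {0, 90, 180, 270} degree rotations:
        # Phi(x) = 1/4 * sum_k rot^k( phi( rot^{-k}(x) ) ),
        # which is equivariant by construction for any map phi.
        def phi_equivariant(x):
            return sum(np.rot90(phi(np.rot90(x, -k)), k) for k in range(4)) / 4
        return phi_equivariant

    rng = np.random.default_rng(0)
    weights = rng.normal(size=(8, 8))  # fixed, non-symmetric weights

    def phi(x):
        # a deliberately non-equivariant map on 8x8 images
        return np.tanh(weights * x)

    phi_eq = symmetrize_c4(phi)
    x = rng.normal(size=(8, 8))

    # Equivariance: rotating the input commutes with applying the map.
    assert np.allclose(phi_eq(np.rot90(x)), np.rot90(phi_eq(x)))

G-CNNs achieve the same property not by averaging but by constraining the layer itself, for instance its convolution kernel, so that equivariance holds by design; this is where the steerability constraints mentioned in the abstract below come in.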
The long-term goal is to bring more sophisticated mathematical tools into the study of neural networks at large, and of equivariant neural networks in particular. I am especially interested in tools from differential geometry, algebraic topology and representation theory, and I often draw inspiration from my background in theoretical physics.
Before starting my PhD at Chalmers, I was a research intern at ETH Zürich, Switzerland, where I worked in the group of Prof. Marina Marinkovic on gauge-equivariant normalizing flows for computational lattice quantum chromodynamics. I have also worked as a research intern in the group of Peter Samuelsson at Lund University, Sweden. The work in Lund was a continuation of my BSc thesis project on mathematical physics for quantum thermodynamics, and it ultimately resulted in a published article.
My research is supported by the Wallenberg AI, Autonomous Systems and Software Program (WASP). I am the main organizer of the nationwide WASP PhD cluster on geometric deep learning, which brings WASP PhD students together around shared interests in equivariant networks, graph neural networks, topological data analysis and learning on manifolds, among other topics.
This paper presents a novel framework for non-linear equivariant neural network layers on homogeneous spaces. The seminal work of Cohen et al. on equivariant G-CNNs on homogeneous spaces characterized the representation theory of such layers in the linear setting, finding that they are given by convolutions with kernels satisfying so-called steerability constraints. Motivated by the empirical success of non-linear layers, such as self-attention or input-dependent kernels, we set out to generalize these insights to the non-linear setting. We derive generalized steerability constraints that any such layer needs to satisfy and prove the universality of our construction. The insights gained into the symmetry-constrained functional dependence of equivariant operators on feature maps and group elements inform the design of future equivariant neural network layers. We demonstrate how several common equivariant network architectures may be derived from our framework: G-CNNs, implicit steerable kernel networks, attention-based transformers with conventional and relative position embeddings, and LieTransformers.
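For context, and written schematically in standard notation from the steerable-CNN literature rather than the paper's own, the linear-setting statements referenced above consist of an equivariance condition on the layer \Phi and a steerability constraint on its kernel \kappa:

    \Phi\big[\pi_{\mathrm{in}}(g)\, f\big] = \pi_{\mathrm{out}}(g)\, \Phi[f] \quad \forall\, g \in G,
    \qquad
    \kappa(h x) = \rho_{\mathrm{out}}(h)\, \kappa(x)\, \rho_{\mathrm{in}}(h)^{-1} \quad \forall\, h \in H,

where \pi_{\mathrm{in}}, \pi_{\mathrm{out}} are the representations of G acting on the input and output feature fields and \rho_{\mathrm{in}}, \rho_{\mathrm{out}} are representations of the stabilizer subgroup H \subset G. The generalized steerability constraints derived in the paper go beyond this by constraining how an equivariant operator may additionally depend on the feature maps themselves and on group elements.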