PhD student
As part of Jan Gerken’s group, I work on the mathematical foundations of deep learning. I’m particularly interested in geometric descriptions of neural networks (NNs), such as group-equivariant NNs, which allow for better-tailored algorithms, and in the regime of wide neural networks governed by the neural tangent kernel. Owing to my physics background, my work is influenced by the symmetry-centered approaches typically found in mathematical physics and quantum field theory. I am also interested in possible applications in condensed matter physics and autonomous driving. My research is supported by the Wallenberg AI, Autonomous Systems and Software Program (WASP).
MSc in Physics at the University of Vienna, Austria (2023)
Thesis: Neural network potentials with long-range charge transfer and electrostatics
Supervised by: Prof. Christoph Dellago, Dr. Andreas Tröster
Equivariant neural networks have in recent years become an important technique for guiding architecture selection, with applications in domains ranging from medical image analysis to quantum chemistry. In particular, group convolutions, the most general linear equivariant layers with respect to the regular representation, have been highly impactful in numerous applications. Although equivariant architectures have been studied extensively, much less is known about the training dynamics of equivariant neural networks. Concurrently, neural tangent kernels (NTKs) have emerged as a powerful tool for analytically understanding the training dynamics of wide neural networks. In this work, we combine these two fields for the first time by giving explicit expressions for the NTKs of group convolutional neural networks. In numerical experiments, we demonstrate the superior performance of equivariant NTKs over non-equivariant NTKs on a classification task for medical images.
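To make the group-convolution idea concrete, here is a minimal NumPy sketch (my own illustration, not code from the paper; all names are made up) of a lifting group convolution for the group \(C_4\) of 90° rotations: the input is correlated with each rotated copy of the filter, producing one output channel per group element. The final lines check the equivariance property in the sense of the regular representation: rotating the input by 90° rotates the spatial output and cyclically permutes the group channels.

```python
import numpy as np

def rot90_filters(w):
    """Stack the four 90-degree rotations of a 2D filter (the C4 orbit)."""
    return np.stack([np.rot90(w, k) for k in range(4)])

def lifting_conv(x, w):
    """Lifting C4 group convolution: 'valid' cross-correlation of the
    single-channel input x with each rotated copy of the filter w,
    giving output shape (4, H-k+1, W-k+1) — one channel per group element."""
    ws = rot90_filters(w)
    k = w.shape[0]
    H, W = x.shape
    out = np.zeros((4, H - k + 1, W - k + 1))
    for g in range(4):
        for i in range(H - k + 1):
            for j in range(W - k + 1):
                out[g, i, j] = np.sum(x[i:i + k, j:j + k] * ws[g])
    return out

# Equivariance check: a 90-degree rotation of the input acts on the
# output by a spatial rotation combined with a cyclic shift of the
# group channels (the regular representation of C4).
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 5))
w = rng.normal(size=(3, 3))
out = lifting_conv(x, w)
out_rot = lifting_conv(np.rot90(x), w)
expected = np.rot90(np.roll(out, 1, axis=0), axes=(1, 2))
```

The loops are written for readability rather than speed; in practice one would vectorize them or use a library layer, but the exactness of the equivariance (a pixel permutation on a square grid) is the same either way.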
The neural tangent kernel (NTK) is a quantity closely related to the training dynamics of neural networks (NNs). It becomes particularly interesting in the infinite-width limit, where the kernel is deterministic and constant in time, allowing for an analytical solution of the gradient-descent dynamics under the mean squared error loss and resulting in Gaussian-process behaviour. In this talk, we will first introduce the NTK and its properties, and then discuss how it can be extended to NNs that are equivariant with respect to the regular representation. In analogy to the forward equation for the NTK of conventional NNs, we will present a recursive relation connecting the NTK of a layer in an equivariant NN to the corresponding kernel of the previous layer. As a concrete example, we provide explicit expressions for the symmetry group of 90° rotations and translations in the plane, as well as Fourier-space expressions for \(SO(3)\) acting on spherical signals. We support our theoretical findings with numerical experiments.
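As a toy illustration of why a constant kernel makes the dynamics solvable (my own sketch, not material from the talk): for a linear model the empirical NTK \(K = XX^\top\) is exactly constant even at finite width, so under the squared-error loss the gradient-descent residuals follow the closed-form trajectory \(r_T = (I - \eta K)^T r_0\). The code below runs actual gradient descent and compares it to this kernel prediction.

```python
import numpy as np

# Linear model f(x) = theta . x on a small synthetic regression task.
# All sizes and names here are illustrative.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))      # 8 training inputs in R^3
y = rng.normal(size=8)           # regression targets
theta0 = rng.normal(size=3)      # initial parameters
eta, T = 0.01, 50                # learning rate, number of steps

K = X @ X.T                      # empirical NTK of the linear model: constant in time
r0 = X @ theta0 - y              # initial residual f_0(X) - y

# Exact gradient descent on the loss L = 0.5 * ||X theta - y||^2
theta = theta0.copy()
for _ in range(T):
    theta -= eta * X.T @ (X @ theta - y)

# NTK prediction: the residuals evolve linearly, r_T = (I - eta K)^T r_0
r_pred = np.linalg.matrix_power(np.eye(len(y)) - eta * K, T) @ r0
```

For a genuinely nonlinear network the finite-width NTK moves during training, and the analogous statement only becomes exact in the infinite-width limit; the linear model is just the simplest case where the kernel picture is exact at finite width.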