We organize the GAPinDNNs seminar at the Department for Mathematical Sciences at Chalmers and the University of Gothenburg.
The topics of the seminar are broad and lie at the intersection of machine learning (in particular deep learning), pure mathematics and theoretical physics. We have both more theoretical and more applied speakers. If you would like to receive invitations to upcoming talks, please let the seminar organizers know and we will add you to our email list.
Seminars are usually held in person only, on Thursdays at 1.30pm. Before the seminar we go for lunch with the speaker, and there is a speaker dinner in the evening before or after the talk.
Current seminar organizers are Jan Gerken and Max Guillen.
The NN-QFT correspondence provides a description of a statistical ensemble of neural networks in terms of a quantum field theory. The infinite-width limit is mapped to a free field theory, while finite-width corrections are taken into account by interactions. In this talk, after reviewing the correspondence, I will describe how to use non-perturbative renormalization in this context. An important difference from the usual analysis is that the effective (IR) 2-point function is known, while the microscopic (UV) 2-point function is not, which requires setting up the problem with care. Finally, I will discuss preliminary numerical results for translation-invariant kernels. A major result is that changing the standard deviation of the neural network weight distribution can be interpreted as a renormalization flow in the space of networks.
Based on arXiv:2108.01403 and arXiv:2212.11811.
The short answer to the title is NO, but there are some theoretical connections between neural networks with self-supervised pre-training and kernel principal component analysis. At a high level, the equivalence is based on two ideas: (i) optimal solutions for many self-supervised losses correspond to spectral embeddings; and (ii) infinite-width neural networks converge to neural tangent kernel (NTK) models.
I will first give a short overview of this equivalence and discuss why it could be useful for both the theory and practice of foundation models. I will then discuss two recent works on NTK convergence under self-supervised losses (arXiv:2403.08673, arXiv:2411.11176). Specifically, I will show that one cannot directly use NTK results from supervised learning / regression, but that a careful analysis is needed to prove that the NTK indeed remains constant during self-supervised training.
Numerical linear algebra underpins all computational sciences, not least machine learning. But what if machine learning could return the favor by learning numerical algorithms tailored to a particular problem class? In this talk, I will highlight the connection between matrices and graphs, and argue that this makes graph neural networks a natural fit for learning task-specific numerical algorithms.
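One common way to read the matrix-graph connection mentioned in this abstract is to treat the sparsity pattern of a matrix as a weighted directed graph, so that a matrix-vector product becomes one round of message passing along the edges. The sketch below illustrates only that reading; the helper names `matrix_to_graph` and `matvec_by_message_passing` are hypothetical and are not taken from the talk.

```python
import numpy as np

def matrix_to_graph(A, tol=0.0):
    """Weighted edge list (i, j, A[i, j]) for every entry of A with |A[i, j]| > tol."""
    rows, cols = np.nonzero(np.abs(A) > tol)
    return [(int(i), int(j), float(A[i, j])) for i, j in zip(rows, cols)]

def matvec_by_message_passing(n, edges, x):
    """Compute A @ x by sending x[j], scaled by the edge weight, from node j to node i."""
    y = np.zeros(n)
    for i, j, w in edges:
        y[i] += w * x[j]
    return y

# Illustrative tridiagonal matrix (a 1D Laplacian stencil), chosen only for this example.
A = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])
x = np.array([1.0, 2.0, 3.0])

edges = matrix_to_graph(A)
y = matvec_by_message_passing(A.shape[0], edges, x)
```

Here the seven nonzero entries of `A` become seven edges (self-loops included), and `y` agrees with `A @ x`.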
Visualization and uncertainty quantification are often used to support our interpretation of deep learning models. In this talk, we show through examples how both visualization and uncertainty quantification can lead to misinterpretation if applied naïvely. Our examples will include equivariant neural networks for graphs and images, as well as uncertainty quantification with structured label variation.
Equivariance imposes symmetry constraints on the connectivity of neural networks. This talk investigates the case of equivariant networks for fields of feature vectors on Euclidean spaces or other Riemannian manifolds. Equivariance is shown to lead to requirements for 1) spatial (convolutional) weight sharing, and 2) symmetry constraints on the shared weights themselves. We investigate the symmetry constraints imposed on convolution kernels and discuss how they can be solved and implemented. A gauge theoretic formulation of equivariant CNNs shows that these models are not only equivariant under global transformations, but under more general local gauge transformations as well.
Neural networks parametrize spaces of functions, sometimes referred to as ‘neuromanifolds’. Their geometry is intimately related to fundamental machine learning aspects, such as expressivity, sample complexity, and training dynamics. For polynomial activation functions, neuromanifolds are (semi-)algebraic varieties, enabling the application of tools and ideas from algebraic geometry to deep learning. In this talk, we will first review the general theory of neuromanifolds, and then present our recent results for deep convolutional networks with monomial activations. In this case, we show that the parametrization is finite, birational, and regular, factoring through the Segre-Veronese embedding. Moreover, by appealing to the theory of the generic Euclidean distance degree, we compute the number of critical points of the (complexified) regression objective for a generic large dataset.
Neural ODEs are neural network models where the network is not specified by a discrete sequence of hidden layers. Instead, the network is defined by a vector field describing how the data evolves continuously over time, governed by an ordinary differential equation (ODE). These models can be generalized to data living on non-Euclidean manifolds, a concept known as manifold neural ODEs. In our paper, we develop a geometric framework for equivariant manifold neural ODEs. Our work includes a novel formulation of equivariant neural ODEs in terms of differential invariants, based on Lie theory for symmetries of differential equations. We also construct augmented manifold neural ODEs and show that they are universal approximators of equivariant diffeomorphisms on any path-connected manifold.
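As a rough illustration of the basic (Euclidean, non-equivariant) neural ODE idea described in this abstract, the sketch below replaces a stack of hidden layers with a small parametrized vector field and integrates it with a fixed-step explicit Euler scheme. The two-layer tanh vector field, the Euler solver, and all names are assumptions made for the example; they are not the manifold or equivariant construction of the paper.

```python
import numpy as np

def vector_field(h, t, params):
    """Hypothetical learned vector field f(h, t): a two-layer tanh network (t is unused here)."""
    W1, b1, W2, b2 = params
    return W2 @ np.tanh(W1 @ h + b1) + b2

def odeint_euler(h0, params, t0=0.0, t1=1.0, steps=100):
    """Approximate the flow of dh/dt = f(h, t) from t0 to t1 with explicit Euler steps."""
    h = np.array(h0, dtype=float)
    dt = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        h = h + dt * vector_field(h, t, params)
        t += dt
    return h

# Randomly initialised parameters, for illustration only (no training is performed).
rng = np.random.default_rng(0)
dim, hidden = 2, 8
params = (rng.normal(size=(hidden, dim)), np.zeros(hidden),
          0.1 * rng.normal(size=(dim, hidden)), np.zeros(dim))

# The "network output" is the state reached at time t1 when starting from the input at t0.
h1 = odeint_euler([1.0, 0.0], params)

# Sanity check: a vanishing vector field gives the identity map.
zero_params = (np.zeros((hidden, dim)), np.zeros(hidden),
               np.zeros((dim, hidden)), np.zeros(dim))
h_identity = odeint_euler([1.0, 0.0], zero_params)
```

In practice one would typically use an adaptive ODE solver and train the parameters by differentiating through the solve; the fixed-step loop above is only meant to make the continuous-depth picture concrete.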
The field of causality has recently emerged as a subject of interest in machine learning, largely due to major advances in data collection methods in the biological sciences and tech industries where large-scale observational and experimental data sets can now be efficiently and ethically obtained. The modern approach to causality decomposes the inference process into two fundamental problems: the inference of causal relations between variables in a complex system and the estimation of the causal effect of one variable on another given that such a relation exists. The subject of this talk will be the former of the two problems, commonly called causal discovery, where the aim is to learn a complex causal network from the available data. We will give a soft introduction to the basics of causal modeling and causal discovery, highlighting where combinatorics and geometry have already started to contribute. Going deeper, we will analyze how and when geometry and combinatorics help us identify causal structure without the use of experimental data.
We start from geometric first principles to construct a machine learning framework for 3D point set analysis. We argue that spherical decision surfaces are a natural choice for this type of problem, and we represent them using a non-linear embedding of 3D Euclidean space into a Minkowski space, represented by a 5D Euclidean space. Via classification experiments on a 3D Tetris dataset, we show that we can get a geometric handle on the network weights, allowing us to directly apply transformations to the network. The model is further extended into a steerable filter bank, facilitating classification in arbitrary poses. Additionally, we study equivariance and invariance properties with respect to \(O(3)\) transformations.
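To make the idea of a spherical decision surface concrete, the sketch below uses one standard conformal embedding of 3D points and spheres into a 5D space with a Minkowski-type inner product. The specific coordinates, the signature (+, +, +, +, -), and the function names are assumptions chosen for illustration and may differ from the conventions used in the talk. With this choice, the inner product of an embedded point x and an embedded sphere with center c and radius r equals -(|x - c|^2 - r^2)/2, so its sign tells whether the point lies inside or outside the sphere.

```python
import numpy as np

# Assumed Minkowski metric on R^5 with signature (+, +, +, +, -).
ETA = np.diag([1.0, 1.0, 1.0, 1.0, -1.0])

def embed_point(x):
    """Non-linear embedding of a 3D point x into R^5."""
    x = np.asarray(x, dtype=float)
    s = x @ x
    return np.concatenate([x, [(s - 1.0) / 2.0, (s + 1.0) / 2.0]])

def embed_sphere(center, radius):
    """Embedding of the sphere with the given center and radius into R^5."""
    c = np.asarray(center, dtype=float)
    s = c @ c - radius ** 2
    return np.concatenate([c, [(s - 1.0) / 2.0, (s + 1.0) / 2.0]])

def sphere_activation(x, center, radius):
    """Minkowski inner product of point and sphere: -(|x - c|^2 - r^2) / 2."""
    return embed_point(x) @ ETA @ embed_sphere(center, radius)

# Unit sphere at the origin: positive inside, zero on the surface, negative outside.
inside = sphere_activation([0.0, 0.0, 0.0], [0.0, 0.0, 0.0], 1.0)
on_surface = sphere_activation([1.0, 0.0, 0.0], [0.0, 0.0, 0.0], 1.0)
outside = sphere_activation([2.0, 0.0, 0.0], [0.0, 0.0, 0.0], 1.0)
```

The decision function is linear in the embedded point, which is what lets a sphere act like the weight vector of a single neuron in this picture.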
In this talk I present an overview of my recently defended PhD thesis conducted within the WASP program. While artificial intelligence and deep learning have revolutionized many fields in the last decade, one of the key drivers has been access to data. This is especially true in biomedical image analysis where expert-annotated data is hard to come by. The combination of Convolutional Neural Networks (CNNs) with data augmentation has proven successful in increasing the amount of training data at the cost of overfitting. In our research, equivariant neural networks have been used to extend the equivariant properties of CNNs to more transformations than translations. The networks have been trained and evaluated on biomedical image datasets, including bright-field microscopy images of cytological samples indicating oral cancer, and transmission electron microscopy images of virus samples. By designing the networks to be equivariant to e.g. rotations, it is shown that the need for data augmentation is reduced, that less overfitting occurs, and that convergence during training is faster. Furthermore, equivariant neural networks are more data efficient than CNNs, as demonstrated by scaling laws. These benefits are not present in all problem settings and which benefits will occur is somewhat unpredictable. We have identified that the results to some extent depend on architectures, hyperparameters and datasets. Further research may broaden the performed studies to explain how the results occur with new theory.
This talk will explain that Convolutional Neural Networks without activation functions parametrize polynomials that admit a certain sparse factorization. For a fixed network architecture, these polynomials form a semialgebraic set. We will investigate how the geometry of this semialgebraic set (e.g., its singularities and relative boundary) changes with the network architecture. Moreover, we will explore how these geometric properties affect the optimization of a loss function for given training data. We prove that, for architectures where all strides are larger than one and for generic data, the non-zero critical points of the squared-error loss are smooth interior points of the semialgebraic function space. This property is known to be false for dense linear networks or linear convolutional networks with stride one. (For linear networks that are equivariant under the action of some group, we prove that no fixed network architecture can parametrize the whole space of functions, but that finitely many architectures can exhaust the whole space of linear equivariant functions.) This talk is based on joint work with Joan Bruna, Guido Montúfar, Anna-Laura Sattelberger, Vahid Shahverdi, and Matthew Trager.
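The link between linear convolutional networks and polynomial factorization can be seen in the simplest setting, 1D convolutions with stride one. Composing the layers is the same as convolving their filters, and convolving filters is the same as multiplying the polynomials whose coefficients they are. The sketch below checks this numerically. It covers only the stride-one case, not the larger strides that the main result concerns, and the function names are hypothetical.

```python
import numpy as np

def linear_conv_network(x, filters):
    """Apply 1D convolution layers (stride one, no activation, no bias) in sequence."""
    out = np.asarray(x, dtype=float)
    for w in filters:
        out = np.convolve(out, w, mode="valid")
    return out

def end_to_end_filter(filters):
    """Single filter equivalent to the whole network: the product of the filter polynomials."""
    total = np.array([1.0])
    for w in filters:
        # Full convolution of coefficient vectors is polynomial multiplication.
        total = np.convolve(total, w)
    return total

# Two illustrative filters: 1 + 2z and 1 - z + 3z^2.
filters = [np.array([1.0, 2.0]), np.array([1.0, -1.0, 3.0])]
x = np.arange(8, dtype=float)

layered = linear_conv_network(x, filters)
collapsed = np.convolve(x, end_to_end_filter(filters), mode="valid")
```

Here the end-to-end filter is `[1, 1, 1, 6]`, the coefficients of (1 + 2z)(1 - z + 3z^2) = 1 + z + z^2 + 6z^3, and `layered` agrees with `collapsed`. Only polynomials that factor into pieces of the prescribed degrees can arise this way, which is why the function space depends on the architecture.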