Maurice Weiler@maurice_weiler
AI researcher with a focus on geometric DL and equivariant CNNs. PhD with Max Welling. Master's degree in physics. mauriceweiler.gitlab.io Amsterdam, The Netherlands Joined January 2018
Tweets: 530
Followers: 2,528
Following: 906
Johann Brehmer @johannbrehmer
2 weeks agoThe GATr has hatched: github.com/QualcommAIre… Have a look if you need an equivariant but scalable architecture for your geometric problems 🐊 twitter.com/johannbrehmer/…
Maurice Weiler @maurice_weiler
3 days ago@ylecun A reason why I emphasize linear maps is that they are the most basic operations in any equivariant NN. This insight furthermore allows us to derive more general group convolutions and more general G-steerable convolutions later on.
Maurice Weiler @maurice_weiler
3 days ago@ylecun Indeed, thanks for emphasizing this point! That G-equivariance implies some form of weight sharing over G-transformations of the neural connectivity holds in general though (equivariant layers are themselves G-invariant under a combined G-action on their input+output).
Antoine Collas @AntoineCollas
a week ago☀️New paper alert: “Parametric information geometry with the package Geomstats”
Work done with Alice Le Brigant, @jujuzor, and @ninamiolane.
ACM Transactions on Mathematical Software: dl.acm.org/doi/10.1145/36…
Github: github.com/geomstats/geom…
Maurice Weiler @maurice_weiler
a week agoA shortcoming of the representation theoretic approach is that it only explains global translations of whole feature maps, but not independent translations of local patches. This relates to local gauge symmetries, which will be covered in another post.
[11/N]
Maurice Weiler @maurice_weiler
a week agoA benefit of the representation theoretic viewpoint is that it generalizes to other symmetry groups, e.g. the Euclidean group E(d). The next blog post will explain such generalized equivariant CNNs and their generalized weight sharing patterns. [10/N]
Maurice Weiler @maurice_weiler
a week agoSimilarly, one could apply different nonlinearities at different locations, but equivariance requires them to be the same.
[9/N]
Maurice Weiler @maurice_weiler
a week agoIn principle one could add a "bias field" to feature maps, i.e. apply a different bias vector at each location. Equivariance forces this field to be translation-invariant, i.e. the same bias vector needs to be applied at each location.
[8/N]
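Not part of the thread, but this point is easy to check numerically. A minimal NumPy sketch (hypothetical 1d feature map with periodic shifts): a constant bias commutes with translations, a position-dependent bias field does not.

```python
import numpy as np

def shift(F, t):
    # translate a 1d feature map by t pixels (periodic boundary)
    return np.roll(F, t)

F = np.array([1.0, 2.0, 3.0, 4.0])
const_bias = 0.5 * np.ones(4)                 # same bias vector at every location
field_bias = np.array([0.0, 1.0, 2.0, 3.0])   # position-dependent "bias field"

t = 1
# adding a constant bias commutes with translations ...
equiv_const = np.allclose(shift(F, t) + const_bias, shift(F + const_bias, t))
# ... but adding a non-constant bias field does not
equiv_field = np.allclose(shift(F, t) + field_bias, shift(F + field_bias, t))
```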
Maurice Weiler @maurice_weiler
a week agoIn general, linear layers may have different weight matrices at different locations. However, equivariance forces the weights at all locations to be exactly the same, i.e. it implies convolutions.
[7/N]
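A quick numerical illustration of this claim (not from the thread; hypothetical kernel values): a layer with shared weights, i.e. a circular convolution, commutes with translations of its input.

```python
import numpy as np

def shift(F, t):
    # translate a 1d feature map by t pixels (periodic boundary)
    return np.roll(F, t)

def conv(F, kernel):
    # circular cross-correlation: the same kernel (shared weights) is applied at every location
    n = len(F)
    return np.array([sum(kernel[j] * F[(i + j) % n] for j in range(len(kernel)))
                     for i in range(n)])

F = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
k = np.array([1.0, -1.0])

t = 2
# shifting then convolving equals convolving then shifting
equivariant = np.allclose(conv(shift(F, t), k), shift(conv(F, k), t))
```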
Maurice Weiler @maurice_weiler
a week agoWe define CNN layers simply as any translation-equivariant maps between feature maps. Such layers are generally characterized by a translation-invariant neural connectivity, i.e. spatial weight sharing.
Some examples:
[6/N]
Maurice Weiler @maurice_weiler
a week agoFeature maps with c channels on R^d are functions F: R^d → R^c.
Translations t act on feature maps by shifting them: [t.F](x) = F(x−t)
The vector space (function space) of feature maps with this action forms a translation group representation (regular rep).
[5/N]
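A small NumPy sketch of this action (my addition, on a discretized periodic domain Z/8 instead of R^d): the translation action [t.F](x) = F(x−t) is linear in F and respects the group structure, which is what makes it a representation.

```python
import numpy as np

def act(t, F):
    # discretized translation action: [t.F](x) = F(x - t), periodic boundary
    return np.roll(F, t)

F = np.random.default_rng(0).normal(size=8)   # feature map with one channel on Z/8
t1, t2 = 3, 5

# the action respects the group structure: t1.(t2.F) = (t1+t2).F
homomorphism = np.allclose(act(t1, act(t2, F)), act(t1 + t2, F))
# ... and it is linear in F, so feature maps form a (regular) representation
linear = np.allclose(act(t1, 2.0 * F), 2.0 * act(t1, F))
```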
Maurice Weiler @maurice_weiler
a week agoThis "engineering viewpoint" establishes the implication
weight sharing ⇒ equivariance
Our representation theoretic viewpoint turns this around:
weight sharing ⇐ equivariance
It derives CNN layers purely from symmetry principles!
[4/N]
Maurice Weiler @maurice_weiler
a week agoImagine shifting the input of a CNN layer. As the neural connectivity is the same everywhere, shifted patterns will evoke exactly the same responses, however, at correspondingly shifted locations.
[3/N]
Maurice Weiler @maurice_weiler
a week agoCNNs are often understood as neural nets which share weights (kernel/bias/...) between different spatial locations. Depending on the application, it may be sensible to enforce a local connectivity, however, this is not strictly necessary for the network to be convolutional.
[2/N]
Maurice Weiler @maurice_weiler
a week agoThis is the 2nd post in our series on equivariant neural nets. It explains conventional CNNs from a representation theoretic viewpoint and clarifies the mutual relationship between equivariance and spatial weight sharing.
mauriceweiler.gitlab.io/blog_post/cnn…
👇TL;DR🧵
Dominique Beaini @dom_beaini
2 weeks agoDid you know that the legend of Icarus warns us of graph Transformers?👼
Last year, to organize the field of GNN research, I wrote the very popular maze analogy:
x.com/dom_beaini/sta…
It is now revamped into a blogpost, with nicer visuals and more SOTA
portal.valencelabs.com/blogs/post/maz…
Maurice Weiler @maurice_weiler
3 weeks ago@Joebingoo ...only the M's), the model couldn't distinguish them. The extent to which equiv holds in practice depends on the implementation/discretization. Pixel grids, pooling and boundary effects (finite image size) break equiv. For point clouds / continuous space it holds perfectly.
Maurice Weiler @maurice_weiler
3 weeks ago@Joebingoo Not exactly sure what the algorithm is doing, but it is likely distinguishing the two M's by their context: it detects "sex: M" and "status: M" as different objects and learned to put bounding boxes around the M's only. If the aggregated field of view were small (contain ...
Maurice Weiler @maurice_weiler
3 weeks ago@maxishtefan And yes, this could be combined with equivariance: just define G-actions on the manifold-valued features and come up with layers that commute with these actions
Maurice Weiler @maurice_weiler
3 weeks ago@maxishtefan If you are interested in hyperbolic NNs, this is a great place to start: arxiv.org/abs/2006.08210
Maurice Weiler @maurice_weiler
3 weeks ago@maxishtefan Not an expert on that topic but you can find some references in the second-to-last paragraph here (page 213) arxiv.org/pdf/2106.06020…
Maurice Weiler @maurice_weiler
3 weeks ago@maxishtefan Yes, vector space structure → linear action definitely makes sense. Not sure whether non-vector feature spaces necessarily imply non-vector parameter spaces? People are often still using vector-valued params and then apply nonlinear transformations to them
Maurice Weiler @maurice_weiler
3 weeks ago@maxishtefan That would also be possible, but it would complicate things. NNs work best with vector-valued features. There is some research on hyperbolic "gyrovector"-valued features, I could imagine that nonlinear group actions would play a role there?
Maurice Weiler @maurice_weiler
3 weeks ago@maxishtefan The role of the group G is to specify transformations of features, but you don't optimize group values or anything like that. Equivariance rather demands that your function commutes with any g∈G. The group's geometry is not immediately relevant
Maurice Weiler @maurice_weiler
3 weeks ago@maxishtefan ... with R(φ) for any φ∈SO(2). One can prove that any equivariant matrix is a linear combination in this basis, i.e. [[a,−b],[b,a]]. Equivariance therefore reduces the original 4-dim parameter space to two dimensions, but you still have a *Euclidean vector space* describing the parameters.
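This claim can be verified numerically. A minimal NumPy check (my addition, with arbitrary coefficients a, b): any matrix of the form [[a,−b],[b,a]] commutes with every 2d rotation.

```python
import numpy as np

def R(phi):
    # SO(2) rotation matrix acting on X = R^2 and Y = R^2
    return np.array([[np.cos(phi), -np.sin(phi)],
                     [np.sin(phi),  np.cos(phi)]])

a, b = 1.7, -0.3   # arbitrary coefficients of the 2-dim equivariant subspace
M = a * np.eye(2) + b * np.array([[0.0, -1.0], [1.0, 0.0]])   # [[a,-b],[b,a]]

# M commutes with every rotation, i.e. M is SO(2)-equivariant
equivariant = all(np.allclose(M @ R(phi), R(phi) @ M)
                  for phi in np.linspace(0, 2 * np.pi, 7))
```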
Maurice Weiler @maurice_weiler
3 weeks ago@maxishtefan No, they are living in a vector space. Example: consider the 4-dim space of 2x2 matrices mapping from X=R^2 to Y=R^2. Assume SO(2)-actions on X and Y, both given by rotation matrices R(φ)=[[cosφ,−sinφ],[sinφ,cosφ]]. The matrices M1=[[1,0],[0,1]] and M2=[[0,−1],[1,0]] commute ...
Maurice Weiler @maurice_weiler
3 weeks ago@maxishtefan In the visualized example (DeepSets) permutation equivariance restricts the N^2-dim parameter vector space to a 2-dim vector subspace (green/blue). You simply run standard gradient descent in this Euclidean parameter subspace.
Hope that helps to clarify what I am trying to say :)
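A numerical sketch of this DeepSets-style subspace (my addition; the two parameters a, b are hypothetical values): the weight matrix W = a·I + b·11^T commutes with every permutation matrix, so the N^2 parameters collapse to 2.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 5
a, b = 0.8, -0.2                          # the two free parameters of the subspace
W = a * np.eye(N) + b * np.ones((N, N))   # DeepSets-style equivariant weight matrix

x = rng.normal(size=N)
P = np.eye(N)[rng.permutation(N)]         # random permutation matrix

# permuting the input and then applying the layer equals applying the layer first
equivariant = np.allclose(W @ (P @ x), P @ (W @ x))
```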
Maurice Weiler @maurice_weiler
3 weeks ago@maxishtefan The geometry of the group or feature space doesn't matter, Riemannian gradient descent is about the geometry of the *parameter* space. We are also not necessarily requiring Lie groups, finite groups work as well (reflections / discrete rotations / permutations ...)
Maurice Weiler @maurice_weiler
3 weeks ago@maxishtefan Riemannian gradient descent is relevant when the parameter space is curved. This is independent of having Lie group equivariance or running convolutions on curved space (e.g. spherical CNNs).
Maurice Weiler @maurice_weiler
3 weeks ago@maxishtefan The feature spaces are usually still vector spaces, but acted on by G-reps. Equivariance constrains network weights to linear subspaces, in which you can run standard SGD. Example: translation equivariance turns MLPs into CNNs. Their subspace of params is optimized as usual.
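The "MLP → CNN" example can be sketched concretely (my addition; kernel values are hypothetical): constraining a dense weight matrix to the translation-equivariant subspace makes it circulant, i.e. a convolution, and that constrained layer commutes with shifts.

```python
import numpy as np

n = 6
k = np.array([2.0, -1.0, 0.5])   # shared kernel: the only free parameters left

# translation equivariance constrains a dense n x n "MLP" weight matrix to be circulant
W = np.zeros((n, n))
for i in range(n):
    for j, kj in enumerate(k):
        W[i, (i + j) % n] = kj

x = np.random.default_rng(2).normal(size=n)
shift = lambda v, t: np.roll(v, t)

# the constrained layer commutes with translations, i.e. it acts as a convolution
equivariant = np.allclose(W @ shift(x, 2), shift(W @ x, 2))
```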