Neural network layers as parametric spans
Mattia G. Bergomi (mattiagbergomi@gmail.com), Pietro Vertechi (pietro.vertechi@protonmail.com)
Preprint available at [arXiv:2208.00809](https://arxiv.org/abs/2208.00809).
---
layout: true

---
class: white

### Neural networks: stack “simple” layers to approximate complex functions
Image credits: [https://github.com/poloclub/cnn-explainer](https://github.com/poloclub/cnn-explainer)
---
Video credits: [https://github.com/poloclub/cnn-explainer](https://github.com/poloclub/cnn-explainer)
---

.column-left[
]

--

.column-right[
### Requirements for a linear layer

☐ Bilinearity.
]

---
count: false

.column-left[
]

.column-right[
### Requirements for a linear layer

☐ Bilinearity.
]

--

.column-right[
☐ Duality.
]

---

### Problems

- It is hard to *define* a novel linear layer:
  - efficient evaluation on CPU and GPU,
  - efficient dualization on CPU and GPU.

--

- It is hard to *choose* among a plethora of existing linear layers:
  - dense layer,
  - convolutional layer,
  - transposed convolutional layer,
  - depthwise convolutional layer,
  - graph convolutional layer,
  - diffusion convolutional layer,
  - geodesic convolutional layer,
  - anisotropic convolutional layer,
  - ...

--

- It is hard to *learn* a linear layer:
  - there is no space, only a taxonomy, of linear layers,
  - even for a given layer, hyperparameters are discrete.

---

### Key ingredients

- Frobenius integration theories
  - formalize categorically the interplay between functions and measures,
  - naturally lead to bilinearity and duality,
  - can be applied to the category of smooth manifolds and submersions.

--
- Parametric spans
  - induce bilinear operators that can be dualized trivially,
  - recover classical linear neural network layers.

---

### Frobenius integration theories

**Definition.** For a commutative $K$-algebra $A$, let $\mathbf{Mod}\_A / K$ denote the comma category of

.half-width-very-wide[
$$\mathbf{Mod}_A \rightarrow \mathbf{Vect}_K \xleftarrow{K} \bullet.$$
]

--

Objects are $A$-modules with a $K$-linear functional. Morphisms are $A$-module homomorphisms compatible with the $K$-linear functional.

--

.half-width-very-wide[
$\mathbf{Gr}(\mathbf{Mod} / K)$ is the *covariant Grothendieck construction* of
$$\mathbf{Mod} / K \colon \mathbf{CAlg}_K^\textnormal{op} \rightarrow \mathbf{Cat}.$$
It is the opposite of the category of $F$-lenses $\mathbf{Lens}\_{\mathbf{Mod} / K}$.
]

--

.half-width-very-wide[
**Definition.** A *Frobenius integration theory* over a category $\mathcal{C}$ is a functor
$$\mathcal{C} \rightarrow \mathbf{Gr}(\mathbf{Mod}/K).$$
]

---

### Frobenius integration theories

.column-left-wide[
**Proposition 1.** A Frobenius integration theory is equivalent to
]

.column-left-wide[
- functors $\mathcal{F}\colon \mathcal{C}^\textnormal{op}\rightarrow\mathbf{CAlg}\_K$ and $\mathcal{M}\colon \mathcal{C}\rightarrow\mathbf{Vect}\_K$,
]

--

.column-left-wide[
- an action of $\mathcal{F}(X)$ on $\mathcal{M}(X)$ such that, for all $\mu \in \mathcal{M}(X)$ and $y \in \mathcal{F}(Y)$,
$$f\_\ast (f^\ast y \cdot \mu) = y \cdot f\_\ast \mu \quad \text{(Frobenius reciprocity)}$$
where $f\colon X \rightarrow Y$, $f^\ast:=\mathcal{F}(f)$, and $f_\ast := \mathcal{M}(f)$,
]

--

.column-left-wide[
- a $K$-linear functional $\int\_X$ on $\mathcal{M}(X)$ such that $\int\_X \mu = \int\_Y f_\ast\mu$.
]

--

.column-left-wide[
**Proposition 2.** For all $f\colon X \rightarrow Y$, $\mu \in \mathcal{M}(X)$, and $y \in \mathcal{F}(Y)$,
$$\int\_X f^\ast y \cdot \mu = \int\_Y y \cdot f_\ast\mu.$$
Thus, $\int$ and $\cdot$ induce an extranatural transformation $\mathcal{F} \otimes \mathcal{M} \overset{\cdot\cdot}{\Rightarrow} K$.
]

---

### Integration theory on smooth spaces

Let $\mathbf{Subm}$ denote the category of smooth manifolds and submersions.

--

For a manifold $X$, we denote by
- $C^\infty(X)$ the algebra of smooth, real-valued functions on $X$,
- $C^\infty\_0(|\Lambda|\_X)$ the space of smooth densities of compact support on $X$.

--

.half-width-very-wide[
These mappings extend to functors
$$C^\infty(\text{\textendash})\colon\mathbf{Subm}^\textnormal{op}\rightarrow \mathbf{CAlg}\_\mathbb{R}\quad\text{ and }\quad C^\infty\_0(|\Lambda|\_\text{\textendash}) \colon \mathbf{Subm}\rightarrow\mathbf{Vect}\_\mathbb{R}.$$
]

--

.half-width-very-wide[
Pushforward of densities and pullback of functions obey Frobenius reciprocity
$$f\_\ast (f^\ast y \cdot \mu) = y \cdot f\_\ast \mu.$$
]

--

.half-width-very-wide[
Integrating a density on a manifold yields a natural $\mathbb{R}$-linear transformation
$$\int_X \colon C^\infty\_0(|\Lambda|\_X) \rightarrow \mathbb{R}.$$
]

--

.column-left-wide[
Thus, $C^\infty(\text{\textendash}),\, C^\infty\_0(|\Lambda|\_\text{\textendash}),\, \int$ form a Frobenius integration theory.
]
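--

The same axioms also admit a finite toy model that can be checked directly. The NumPy sketch below is ours, not part of the paper: finite sets stand in for manifolds, an array indexed by a set encodes either a function or a measure, pullback is precomposition, and pushforward sums a measure over the fibers of a map.

```python
# A finite-set analogue of the integration theory (our encoding):
# F(X) = functions as arrays indexed by X, M(X) = measures as arrays indexed
# by X, and a map f: X -> Y as an array of indices into Y.
import numpy as np

rng = np.random.default_rng(0)
nX, nY = 6, 3
f = rng.integers(0, nY, size=nX)          # a map f: X -> Y

def pullback(f, y):
    """f^*: F(Y) -> F(X), precomposition with f."""
    return y[f]

def pushforward(f, mu, nY):
    """f_*: M(X) -> M(Y), sum the measure over each fiber of f."""
    return np.bincount(f, weights=mu, minlength=nY)

y, mu = rng.normal(size=nY), rng.normal(size=nX)

# Frobenius reciprocity: f_*(f^* y . mu) = y . f_* mu
assert np.allclose(pushforward(f, pullback(f, y) * mu, nY),
                   y * pushforward(f, mu, nY))

# Proposition 2: the integral of f^* y . mu over X equals that of y . f_* mu over Y
assert np.isclose((pullback(f, y) * mu).sum(),
                  (y * pushforward(f, mu, nY)).sum())
```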
---

### Take-home message

.column-left-wide[
- The Grothendieck construction yields a global category $\mathbf{Gr}(\mathbf{Mod} / K)$ that axiomatizes the behavior of
  - functions,
  - measures,
  - and integrals.
- A *Frobenius integration theory* is a functor to $\mathbf{Gr}(\mathbf{Mod} / K)$.
- There is a natural Frobenius integration theory
$$\mathbf{Subm} \rightarrow \mathbf{Gr}(\mathbf{Mod} / \mathbb{R})$$
on the category of smooth manifolds and submersions.
]

---

.column-left[
### Parametric span

$$X \xleftarrow{s} E \xrightarrow{t} Y, \qquad \pi\colon E \rightarrow W$$
]
---
count: false

.column-left[
### Parametric span

$$X \xleftarrow{s} E \xrightarrow{t} Y, \qquad \pi\colon E \rightarrow W$$
]
---
count: false

.column-left[
### Parametric span

$$X \xleftarrow{s} E \xrightarrow{t} Y, \qquad \pi\colon E \rightarrow W$$
]
---
count: false

.column-left.long[
### Parametric span

$$X \xleftarrow{s} E \xrightarrow{t} Y, \qquad \pi\colon E \rightarrow W$$

**Requirements**

☐ Bilinearity.

☐ Duality.
]
---
count: false

.column-left.long[
### Parametric span

$$X \xleftarrow{s} E \xrightarrow{t} Y, \qquad \pi\colon E \rightarrow W$$

**Requirements**

☑ Bilinearity.

☐ Duality.
]

.column-right-wide[
### Results

**Proposition 3.** A parametric span induces a $K$-linear map
$$\mathcal{F}(X) \otimes \mathcal{F}(W) \otimes \mathcal{M}(E) \rightarrow \mathcal{M}(Y)$$
$$x\otimes w\otimes \mu \mapsto t_\ast (s^\ast x \cdot \pi^\ast w \cdot \mu).$$
]

--
Fixing $\mu$ yields a bilinear operator $\mathcal{F}(X) \otimes \mathcal{F}(W) \rightarrow \mathcal{M}(Y)$; see the sketch below.
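--

On the finite toy model introduced earlier, Proposition 3 is a one-liner. In this sketch (our encoding, not the paper's), the three legs $s$, $t$, $\pi$ are index arrays on a common apex $E$: pull $x$ and $w$ back to $E$, multiply by $\mu$, and push the result forward along $t$.

```python
import numpy as np

def span_operator(s, t, pi, nY, x, w, mu):
    """t_*(s^* x . pi^* w . mu): maps F(X) (x) F(W) (x) M(E) into M(Y)."""
    return np.bincount(t, weights=x[s] * w[pi] * mu, minlength=nY)

rng = np.random.default_rng(1)
nX, nY, nW, nE = 4, 3, 5, 10
s, t, pi = (rng.integers(0, n, size=nE) for n in (nX, nY, nW))
x, w = rng.normal(size=nX), rng.normal(size=nW)
mu = np.ones(nE)                              # fix mu: the counting measure on E
out = span_operator(s, t, pi, nY, x, w, mu)   # bilinear in (x, w)
```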
---
count: false

.column-left.long[
### Parametric span

$$X \xleftarrow{s} E \xrightarrow{t} Y, \qquad \pi\colon E \rightarrow W$$

**Requirements**

☑ Bilinearity.

☑ Duality.
]

.column-right-wide[
### Results

**Proposition 3.** A parametric span induces a $K$-linear map
$$\mathcal{F}(X) \otimes \mathcal{F}(W) \otimes \mathcal{M}(E) \rightarrow \mathcal{M}(Y)$$
$$x\otimes w\otimes \mu \mapsto t_\ast (s^\ast x \cdot \pi^\ast w \cdot \mu).$$
]
Fixing $\mu$ yields a bilinear operator $\mathcal{F}(X) \otimes \mathcal{F}(W) \rightarrow \mathcal{M}(Y)$.
.column-right-wide[
**Proposition 4.** The following diagram commutes.
]

--
The dual operator with respect to $X$ can be lifted as follows.
--
**Punch line.** To dualize, permute the legs of the parametric span, as in the sketch below.
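--

In the finite sketch (again our encoding), this is literally an exchange of two index arrays: the operator $\mathcal{F}(X) \rightarrow \mathcal{M}(Y)$ and its dual $\mathcal{F}(Y) \rightarrow \mathcal{M}(X)$ differ only in the roles of $s$ and $t$.

```python
import numpy as np

rng = np.random.default_rng(2)
nX, nY, nW, nE = 4, 3, 5, 10
s, t, pi = (rng.integers(0, n, size=nE) for n in (nX, nY, nW))
w, mu = rng.normal(size=nW), rng.normal(size=nE)

def forward(x):   # F(X) -> M(Y): pull back along s, push forward along t
    return np.bincount(t, weights=x[s] * w[pi] * mu, minlength=nY)

def dual(y):      # F(Y) -> M(X): the same formula with legs s and t permuted
    return np.bincount(s, weights=y[t] * w[pi] * mu, minlength=nX)

x, y = rng.normal(size=nX), rng.normal(size=nY)
# Both pairings equal the sum over E of s^* x . t^* y . pi^* w . mu.
assert np.isclose(np.dot(y, forward(x)), np.dot(x, dual(y)))
```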
---

### Classical architectures

#### Dense layer

.column-right[
]

--

.column-left[
##### Features

- Domain: Discrete
- Symmetry: No symmetry

##### Parametric Span
]
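--

A minimal sketch of the dense layer in the finite-set encoding used above. The span reconstruction is our reading, consistent with Proposition 3: take $E = W = X \times Y$, let $s$ and $t$ be the two projections, and let $\pi$ be the identity.

```python
import numpy as np

nX, nY = 4, 3
i, j = np.meshgrid(np.arange(nX), np.arange(nY), indexing="ij")
s, t = i.ravel(), j.ravel()               # s(i, j) = i, t(i, j) = j
pi = np.arange(nX * nY)                   # pi: E -> W is the identity

rng = np.random.default_rng(3)
x, w = rng.normal(size=nX), rng.normal(size=nX * nY)
mu = np.ones(nX * nY)                     # counting measure on E

out = np.bincount(t, weights=x[s] * w[pi] * mu, minlength=nY)
assert np.allclose(out, w.reshape(nX, nY).T @ x)   # the usual dense layer
```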
---

### Classical architectures

#### Convolutional layer

.column-right[
Image credits: Đặng Hà Thế Hiển
]

--

.column-left[
##### Features

- Domain: Discrete
- Symmetry: Translation

##### Parametric Span
]
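--

In the same finite-set encoding (again our reconstruction), a 1-D convolution arises from the span with apex $E = \{\text{output positions}\} \times \{\text{kernel offsets}\}$: $s$ reads the input at $p + o$, $t$ writes the output at $p$, and $\pi$ forgets the position, which is exactly the weight sharing behind translation symmetry.

```python
import numpy as np

n, k = 8, 3
nY = n - k + 1                            # "valid" output positions
p, o = np.meshgrid(np.arange(nY), np.arange(k), indexing="ij")
s, t, pi = (p + o).ravel(), p.ravel(), o.ravel()

rng = np.random.default_rng(4)
x, w = rng.normal(size=n), rng.normal(size=k)
mu = np.ones(nY * k)

out = np.bincount(t, weights=x[s] * w[pi] * mu, minlength=nY)
# Deep-learning "convolution" is cross-correlation, hence np.correlate.
assert np.allclose(out, np.correlate(x, w, mode="valid"))
```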
---

### Classical architectures

#### Geometric deep learning

.column-right[
Adapted from Monti et al., "Geometric deep learning on graphs and manifolds using mixture model CNNs" (2017).
]

--

.column-left[
##### Features

- Domain: Discrete & continuous
- Symmetry: Learned

##### Parametric Span
]
---

### Conclusions and future directions

- Parametric spans induce bilinear operators via a given Frobenius integration theory.

--

- Such operators can be dualized by permuting the legs of the parametric span.

--

- Parametric spans in the category of manifolds and submersions encompass
  - dense layers,
  - convolutional layers and variations thereof,
  - many geometric deep learning layers.

--

- Thus, we can define the *microstructure* of a single linear layer in categorical terms.

--

- Our overarching aim is to create a framework for *neural architectures* with the following properties:
  - modularity and composability [1],
  - existence and computability of duals for reverse-mode differentiation [2].
[1] P. Vertechi, P. Frosini, and M. G. Bergomi. "Parametric machines: a fresh approach to architecture search." arXiv preprint arXiv:2007.02777 (2020).
[2] P. Vertechi and M. G. Bergomi. "Machines of finite depth: towards a formalization of neural networks." arXiv preprint arXiv:2204.12786 (2022).
--

- In the future, we plan to
  - incorporate nonlinearities by means of cospans,
  - encode neural architectures (not just single layers) by means of parametric spans and cospans.