SEQ2TENS: AN EFFICIENT REPRESENTATION OF SEQUENCES BY LOW-RANK TENSOR PROJECTIONS

Abstract

Sequential data such as time series, video, or text can be challenging to analyse as the ordered structure gives rise to complex dependencies. At the heart of this is non-commutativity, in the sense that reordering the elements of a sequence can completely change its meaning. We use a classical mathematical object, the free algebra, to capture this non-commutativity. To address the innate computational complexity of this algebra, we use compositions of low-rank tensor projections. This yields modular and scalable building blocks that give state-of-the-art performance on standard benchmarks such as multivariate time series classification, mortality prediction and generative models for video. Code and benchmarks are publicly available at https://github.com/tgcsaba.

1. INTRODUCTION

A central task of learning is to find representations of the underlying data that efficiently and faithfully capture their structure. In the case of sequential data, one data point consists of a sequence of objects. This is a rich and non-homogeneous class of data and includes classical uni- or multivariate time series (sequences of scalars or vectors), video (sequences of images), and text (sequences of letters). Particular challenges of sequential data are that each sequence entry can itself be a highly structured object and that data sets typically include sequences of different lengths, which makes naive vectorization troublesome.

Contribution.

Our main result is a generic method that takes a static feature map for a class of objects (e.g. a feature map for vectors, images, or letters) as input and turns it into a feature map for sequences of arbitrary length of such objects (e.g. a feature map for time series, video, or text). We call this feature map for sequences Seq2Tens, for reasons that will become clear; among its attractive properties are that it (i) provides a structured, parsimonious description of sequences, generalizing classical methods for strings, (ii) comes with theoretical guarantees such as universality, and (iii) can be turned into modular and flexible neural network (NN) layers for sequence data. The key ingredient of our approach is to embed the feature space of the static feature map into a larger linear space that forms an algebra (a vector space equipped with a multiplication). The product in this algebra is then used to "stitch together" the static features of the individual sequence entries in a structured way. The construction that makes all of this possible is classical in mathematics and known as the free algebra (over the static feature space).

Outline.

Section 2 formalizes the main ideas of Seq2Tens and introduces the free algebra T(V) over a space V as well as the associated product, the so-called convolution tensor product. Section 3 shows how low-rank (LR) constructions combined with sequence-to-sequence transforms allow one to use this rich algebraic structure efficiently. Section 4 applies the results of Sections 2 and 3 to build modular and scalable NN layers for sequential data. Section 5 demonstrates the flexibility and modularity of this approach on both discriminative and generative benchmarks. Section 6 makes connections with previous work and summarizes this article. In the appendices we provide mathematical background, extensions, and detailed proofs for our theoretical results.

2. CAPTURING ORDER BY NON-COMMUTATIVE MULTIPLICATION

We denote the set of sequences of elements in a set X by

$$\mathrm{Seq}(X) = \{\mathbf{x} = (x_i)_{i=1,\dots,L} : x_i \in X,\ L \geq 1\} \tag{1}$$

where L ≥ 1 is an arbitrary length. Even if X itself is a linear space, e.g. X = ℝ, Seq(X) is never a linear space, since there is no natural addition of two sequences of different lengths.

Seq2Tens in a nutshell.

Given any vector space V, we may construct the so-called free algebra T(V) over V. We describe the space T(V) in detail below; for now, all that matters is that T(V) is itself a vector space that includes V, and that it carries a non-commutative product which is, in a precise sense, "the most general product" on V. The main idea of Seq2Tens is that any "static feature map" φ : X → V for elements of X can be used to construct a new feature map Φ : Seq(X) → T(V) for sequences in X by using the algebraic structure of T(V): the non-commutative product on T(V) makes it possible to "stitch together" the individual features φ(x_1), …, φ(x_L) ∈ V ⊂ T(V) of the sequence x in the larger space T(V) by multiplication in T(V). With this, we may define the feature map Φ(x) for a sequence x = (x_1, …, x_L) ∈ Seq(X) as follows:

(i) lift the map φ : X → V to a map ϕ : X → T(V),
(ii) map Seq(X) → Seq(T(V)) by (x_1, …, x_L) ↦ (ϕ(x_1), …, ϕ(x_L)),
(iii) map Seq(T(V)) → T(V) by multiplication, (ϕ(x_1), …, ϕ(x_L)) ↦ ϕ(x_1) ⋯ ϕ(x_L).

More concisely, we define Φ as

$$\Phi : \mathrm{Seq}(X) \to T(V), \qquad \Phi(\mathbf{x}) = \prod_{i=1}^{L} \varphi(x_i) \tag{2}$$

where the product denotes multiplication in T(V). We refer to the resulting map Φ as the Seq2Tens map, short for Sequences-2-Tensors.

Why is this construction a good idea? First note that step (i) is always possible since V ⊂ T(V); we discuss the simplest such lift before Theorem 2.1, as well as other choices, in Appendix B. Further, if φ, respectively ϕ, provides a faithful representation of objects in X, then there is no loss of information in step (ii).
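The three-step construction above can be sketched numerically by truncating T(V) at a maximum tensor degree M. The sketch below is an illustration of the idea rather than the paper's implementation: it assumes the simplest lift ϕ(x) = (1, φ(x), 0, 0, …) and the standard graded product (s · t)_m = Σ_{i+j=m} s_i ⊗ t_j; the function names are our own.

```python
import numpy as np

def tensor_alg_product(s, t, M):
    """Truncated product in T(V): (s·t)_m = sum_{i+j=m} s_i (x) t_j,
    keeping degrees 0..M. An element is a list [t_0, t_1, ..., t_M]
    with t_m an array of shape (d,)*m (t_0 is a scalar, V^{(x)0} = R)."""
    out = []
    for m in range(M + 1):
        acc = 0
        for i in range(m + 1):
            j = m - i
            if i < len(s) and j < len(t):
                # outer product of a degree-i and a degree-j tensor
                acc = acc + np.multiply.outer(s[i], t[j])
        out.append(np.asarray(acc, dtype=float))
    return out

def lift(v, M):
    """Simplest lift phi(x) = (1, v, 0, 0, ...) of a feature vector v."""
    d = len(v)
    return [np.array(1.0), np.asarray(v, dtype=float)] + \
           [np.zeros((d,) * m) for m in range(2, M + 1)]

def seq2tens(xs, M):
    """Phi(x) = phi(x_1) · ... · phi(x_L), truncated at degree M."""
    out = lift(xs[0], M)
    for x in xs[1:]:
        out = tensor_alg_product(out, lift(x, M), M)
    return out
```

Note how the non-commutativity shows up: for the two-step sequence (e_1, e_2) in ℝ², the degree-2 component of `seq2tens` is e_1 ⊗ e_2, while the reversed sequence gives e_2 ⊗ e_1; this order-sensitivity is exactly what the construction is designed to capture.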
Finally, since step (iii) uses "the most general product" to multiply ϕ(x_1) ⋯ ϕ(x_L), one expects that Φ(x) ∈ T(V) faithfully represents the sequence x as an element of T(V). Indeed, in Theorem 2.1 below we show an even stronger statement: if the static feature map φ : X → V contains enough non-linearities that non-linear functions from X to ℝ can be approximated as linear functions of φ, then the above construction extends this property to functions of sequences. Put differently, if φ is a universal feature map for X, then Φ is a universal feature map for Seq(X); that is, any non-linear function f(x) of a sequence x can be approximated as a linear functional of Φ(x), f(x) ≈ ⟨ℓ, Φ(x)⟩. We also emphasize that the domain of Φ is the space Seq(X) of sequences of arbitrary (finite) length. The remainder of this section gives more details on steps (i), (ii), (iii) of the construction of Φ.

The free algebra T(V) over a vector space V.

Let V be a vector space. We denote by T(V) the set of sequences of tensors indexed by their degree m,

$$T(V) := \{\mathbf{t} = (t_m)_{m \geq 0} \mid t_m \in V^{\otimes m}\} \tag{3}$$

where by convention V^{⊗0} = ℝ. For example, if V = ℝ^d and t = (t_m)_{m≥0} is some element of T(ℝ^d), then its degree m = 1 component is a d-dimensional vector t_1, its degree m = 2 component is a d × d matrix t_2, and its degree m = 3 component is a degree-3 tensor t_3. By defining addition and scalar multiplication as

$$\mathbf{s} + \mathbf{t} := (s_m + t_m)_{m \geq 0}, \qquad c \cdot \mathbf{t} := (c\, t_m)_{m \geq 0}$$
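Concretely, a degree-truncated element of T(ℝ^d) can be stored as a list of arrays of growing order, with the degreewise vector-space operations above acting component by component. A minimal sketch (the names, truncation level M, and example values are our own choices):

```python
import numpy as np

d, M = 3, 3
rng = np.random.default_rng(0)

# a truncated element t = (t_0, t_1, t_2, t_3) of T(R^d); t_0 is a
# scalar (V^{(x)0} = R), t_m an order-m array of shape (d,)*m
t = [np.array(1.0)] + [rng.normal(size=(d,) * m) for m in range(1, M + 1)]
s = [np.array(2.0)] + [np.ones((d,) * m) for m in range(1, M + 1)]

def t_add(s, t):
    """Degreewise addition: (s + t)_m = s_m + t_m."""
    return [sm + tm for sm, tm in zip(s, t)]

def t_scale(c, t):
    """Degreewise scalar multiplication: (c · t)_m = c t_m."""
    return [c * tm for tm in t]

u = t_add(s, t_scale(2.0, t))  # u_0 = 2 + 2*1 = 4, u_m = s_m + 2 t_m
```

Storing one array per degree mirrors the grading of T(V) directly; it also makes plain why truncation is necessary in practice, since the degree-m component alone has d^m entries.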

