CONSTRUCTIVE TT-REPRESENTATION OF THE TENSORS GIVEN AS INDEX INTERACTION FUNCTIONS WITH APPLICATIONS

Abstract

This paper presents a method to build explicit tensor-train (TT) representations. We show that a wide class of tensors can be explicitly represented with sparse TT-cores, obtaining, in many cases, optimal TT-ranks. Numerical experiments show that our method outperforms existing ones in several practical applications, including game theory problems. Theoretical estimates of the number of operations show that in some problems, such as the calculation of the permanent, our methods are close to the known optimal asymptotics, which are obtained by methods of a completely different type.

1. INTRODUCTION

The tensor train is a powerful tool for compressing multidimensional tensors (by a tensor we mean a multidimensional array of complex numbers). It allows us to circumvent the curse of dimensionality in a number of cases. In the case of a d-dimensional tensor with n values for each index, direct storage of the tensor requires O(n^d) memory cells, while the tensor train needs only O(ndr^2), where r is the average rank of the TT-decomposition (Oseledets, 2011). In many important applications, the average rank may be small enough that n^d ≫ ndr^2. Tensor approximation is a hot topic in the area of machine learning. For example, in the paper (Richter et al., 2021) the tensor train format is used to solve high-dimensional parabolic PDEs, with dimension up to d ∼ 10^2 in the numerical experiments. Problems of building tensor decompositions and tensor completion are considered in (Lacroix et al., 2020; Fan, 2022; Ma & Solomonik, 2021). The properties of tensor decompositions as applied to machine learning tasks are discussed in (Ghalamkari & Sugiyama, 2021; Kileel et al., 2021; Khavari & Rabusseau, 2021). Existing methods allow one to build TT-decompositions by treating the tensor values as a black box. The TT-cross approximation method (Oseledets & Tyrtyshnikov, 2010) adaptively queries the points where the tensor value is evaluated. Iterative schemes, such as the alternating least squares method (Oseledets & Dolgov, 2012) or the alternating linear scheme (Holtz et al., 2012), build a decomposition by consistently updating the decomposition cores. These methods do not take into account the analytic dependence, if any, of the tensor value on its indices. At the same time, even for relatively simple tensors, these methods may take a long time to build a TT-decomposition, and in the vast majority of cases they obtain an answer with a nonzero error, even if the original tensor has an exact TT-decomposition.
In this paper, we present a fast method to directly construct the cores of the TT-decomposition of a tensor for which the analytical dependence of the tensor value on the values of its indices is known. Technically, our method works with functions each of which depends on one tensor index and on the value of the previous function, applied sequentially; hereafter we call these functions derivative functions. This formulation nevertheless covers quite a large range of functional dependences of the tensor value on its indices if the set of functions is chosen skillfully. Examples are given in Section 3 and in the Appendix. Our method works best in cases where the derivative functions, together with the tensor itself, have a small number of possible values. In the Applications section and the Appendix there are several examples with indicator tensors taking only the values 0 and 1. The TT-cores obtained by our method are highly sparse, which gives an additional gain in performance. In many cases our method gives the lowest possible TT-rank, so that no further rounding of the TT-cores is required. In some other applications, the ranks of the TT-decomposition obtained by our method can be substantially higher than those obtained by approximate methods. However, in a large number of such cases, the sparse structure of the cores allows one to achieve performance comparable to known algorithms. The advantage of representing tensors in the TT-format lies not only in overcoming the curse of dimensionality, but also in the tensor algebra implemented for it: we can easily add, multiply and round TT-tensors, and calculate convolutions (Oseledets, 2011). In this way we can, for example, construct in advance a set of indicator tensors that represent some constraints of the given problem, and then combine these constraints arbitrarily by multiplying these tensors with a data tensor. As a practical use of such a scheme, we give an example of calculating the permanent of a matrix.
The cooperative game examples in the Applications section use the well-known algorithm for quickly finding the sum of all elements of a TT-tensor. Other examples with practical problems are given in the Appendix. They include: simple examples for the sum, where we explicitly show the sparse TT-cores; cooperative games, where we show how one can build an iterative algorithm based on our method; the Knapsack problem (in several formulations), where we use existing algorithms to find the (quasi-)maximal element of a TT-tensor; the Partition problem; the Eight queens puzzle in several extended formulations (see Fig. 1 for the result in the case of 10 queens); the sawtooth sequence; and the standard Boolean satisfiability problem. Python code with the examples is publicly available. In the vast majority of these examples we use products of tensors, convolution to find multidimensional sums, and maximum element search to solve optimization problems. These operations are sufficient to solve a large class of problems from different areas of mathematics. Our method has a direct extension to more complex cases of tensor networks, used for one of the cooperative games below and in several examples in the Appendix. Such a construction is called TT-Tucker (Dolgov & Khoromskij, 2013; Oseledets, 2011).

Our main contribution and the advantages of our approach:

• the exact and fast representation of the tensor in TT-format, which can then, if necessary, be rounded to smaller ranks with a given accuracy. In many of the given examples, this representation is optimal in the sense that the ranks of the TT-decomposition cannot be reduced without loss of accuracy;

• the highly sparse structure of the TT-decomposition cores, which leads to a noticeable reduction in calculations;

• a unified approach and a simple algorithmic interface to inherently different tasks and areas, including problems for which the representation of the function specifying the tensor value in the form of consecutive functions (1)-(3) is not immediately obvious;

• the ability to construct an approximate TT-decomposition with a controlled error and/or with specified maximum ranks of the TT-decomposition;

• the possibility in some cases to explicitly reduce the set of matrix operations on the TT-cores needed to calculate the required value to an iterative algorithm, since the cores of the TT-decomposition are sparse and their elements can be constructed explicitly in advance.

Related works

In addition to the works mentioned above, let us list the following. In the paper (Oseledets, 2012) explicit representations of several tensors with a known analytical dependence on the indices are presented, but for a fairly narrow class of tensors (see also (Khoromskij, 2018, Sec. 4.2)). The survey (Grasedyck et al., 2013) presents techniques for low-rank tensor approximation, including the TT-format. The works (Cichocki et al., 2016; 2017) give many examples of applying the Tucker and Tensor Train decompositions to various problems, including machine learning and data mining algorithms. The tensor completion method described in (Steinlechner, 2016) uses Riemannian optimization to reconstruct a TT-format of a black box. In the paper (Bian et al., 2022) an energy-based learning approach for cooperative games is considered.

Background

Consider a tensor K with dimensions {n_1, n_2, ..., n_d}, i.e., K ∈ C^{n_1 × n_2 × ... × n_d}, where C is the set of complex numbers. The TT-decomposition of the tensor K with TT-ranks {r_0, r_1, ..., r_d} is defined as the product

K(i_1, i_2, ..., i_d) = Σ_{α_0=1}^{1} Σ_{α_1=1}^{r_1} ... Σ_{α_{d-1}=1}^{r_{d-1}} Σ_{α_d=1}^{1} G_1(α_0, i_1, α_1) G_2(α_1, i_2, α_2) ... G_{d-1}(α_{d-2}, i_{d-1}, α_{d-1}) G_d(α_{d-1}, i_d, α_d),

where the tensors G_i ∈ C^{r_{i-1} × n_i × r_i} are called the cores of the TT-decomposition (we let r_0 = r_d = 1).

2. FUNCTIONS

Note that if one needs to calculate the value of a TT-tensor at a given multi-index {i_1, ..., i_d}, it is more efficient to start the calculation from one end, as described in (Oseledets, 2011). In this approach, we take the vector v_1 := G_1(1, i_1, :) and multiply it by the matrix G_2(:, i_2, :). Then we multiply the obtained vector v_2 by the matrix G_3(:, i_3, :), etc. At each step we get v_k = v_{k-1} G_k(:, i_k, :). The above process can be viewed as a sequential transformation (in this case, linear) of the vector obtained at the previous step. Our algorithm is based on the idea of the inverse step: knowing the sequence of arbitrary (in particular, nonlinear) transformations, can we obtain cores of a TT-decomposition, each of which corresponds to its respective transformation? In addition, there is the problem of reducing an arbitrary function of many variables to a sequence of functions, each of which depends on only one index and the value of the previous function.
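The sequential evaluation just described can be sketched in a few lines of NumPy; the function name and the test cores below are ours, not from the paper's code:

```python
import numpy as np

def tt_value(cores, idx):
    """Evaluate a TT-tensor at the multi-index idx by a left-to-right sweep.

    cores[k] has shape (r_{k-1}, n_k, r_k) with r_0 = r_d = 1, so each step
    is a single vector-by-matrix product v_k = v_{k-1} G_k(:, i_k, :).
    """
    v = cores[0][0, idx[0], :]          # v_1 = G_1(1, i_1, :)
    for G, i in zip(cores[1:], idx[1:]):
        v = v @ G[:, i, :]              # v_k = v_{k-1} G_k(:, i_k, :)
    return v[0]                         # r_d = 1, so v has length 1
```

Each step costs O(r^2) instead of the O(n^{d}) of assembling the full tensor.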

2.1. DERIVATIVE FUNCTIONS

From an algebraic point of view, we want to build the TT-decomposition of tensors K each element K(i_1, i_2, ..., i_d) of which can be calculated in two consecutive passes. One pass, from left to right, is as follows:

a_1(i_1) = f^(1)(i_1, 0),
a_2(i_1, i_2) = f^(2)(i_2, a_1),
a_3(i_1, i_2, i_3) = f^(3)(i_3, a_2),
...
a_{l-1}(i_1, i_2, ..., i_{l-1}) = f^(l-1)(i_{l-1}, a_{l-2});   (1)

then, from right to left,

a_d(i_d) = f^(d)(i_d, 0),
a_{d-1}(i_d, i_{d-1}) = f^(d-1)(i_{d-1}, a_d),
...
a_{l+1}(i_d, i_{d-1}, ..., i_{l+1}) = f^(l+1)(i_{l+1}, a_{l+2});   (2)

and, finally,

K(i_1, i_2, ..., i_d) = f^(l)(i_l, a_{l-1}, a_{l+1}).   (3)

Figure 2: Computation tree we can handle and the resulting TT-decomposition: (a) the computation tree; (b) the tensor in TT-format corresponding to the computation tree.

The computation tree of this procedure is shown in Fig. 2a. The underlying idea of reducing a nonlinear transformation f^(k)(i_k, a_{k-1}) → a_k to a linear one, v_{k-1} G_k(:, i_k, :) → v_k, is quite simple: we assign a distinct basis vector e to each possible input and output value of the derivative function; then the specified transformation can be represented as a matrix-vector multiplication (see Theorem 2.1 below). Thus, each function f^(k) corresponds to a core in the TT-decomposition of the resulting tensor, see Fig. 2b.
In this figure, the cores of the decomposition, which are 3-dimensional tensors, are represented by circles. The blue lines with indices correspond to the input indices of the tensor. The black lines connecting the cores correspond to the dimensions over which the summation takes place. The red arrows show the correspondence between the output vector obtained by successive multiplication of the cores, starting from the left or right end, and the values of the derivative functions. When describing examples and applications, we write the first argument of the derivative functions f^(k) as a lower index.
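The passes (1)-(2) and the middle function (3) can be prototyped directly; the sketch below (our naming, together with a toy indicator tensor as the worked example) evaluates one tensor element from a given set of derivative functions:

```python
def tensor_value(left, middle, right, idx):
    """Evaluate one element K(i_1, ..., i_d) via the passes (1)-(2) and the
    middle function (3). `left` holds f^(1)..f^(l-1) and `right` holds
    f^(l+1)..f^(d); each maps (index, previous value) -> value or None,
    and a None on either side makes the final result 0."""
    l = len(left)                        # 0-based position of the middle function
    a = 0
    for f, i in zip(left, idx[:l]):      # left-to-right pass, Eq. (1)
        if a is None:
            break
        a = f(i, a)
    b = 0                                # right-to-left pass, Eq. (2)
    for f, i in zip(reversed(right), reversed(idx[l + 1:])):
        if b is None:
            break
        b = f(i, b)
    if a is None or b is None:
        return 0
    return middle(idx[l], a, b)          # Eq. (3)
```

For instance, with left and right functions f_i(x) = x + i and the middle function returning 1 when the running sums total 2, this evaluates the indicator tensor of "exactly two indices equal 1".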

2.2. REPRESENTATION OF A MULTIVARIATE FUNCTION AS A SEQUENCE OF FUNCTIONS

Unfortunately, there is no single algorithmic approach for constructing a sequence of derivative functions from a given analytical dependence of the tensor value on its indices. Moreover, such a representation is not unique, and different representations can lead to different ranks of the resulting tensor. Several examples of different sets of derivative functions for the same tensor are given further in the section with experiments and in the Appendix. However, it turns out that there are fairly general patterns for derivative functions, and we hope that, based on the examples we give from various applications of the method, it is easy to construct by analogy the desired sequence of derivative functions for a particular application problem.

As an example, consider the step function in the so-called Quantized Tensor Train (QTT) decomposition (Oseledets, 2010; Khoromskij, 2011), where the tensor indices are binary, i_k ∈ {0, 1}, and all together represent the binary expansion of some integer from the set {0, 1, ..., 2^d − 1}. The value of the tensor I represents the value of some given function P defined on this set, with P equal to the step function P_step in this example:

I(i_1, i_2, ..., i_d) = P( Σ_{j=0}^{d-1} i_{d-j} 2^j ),   P_step(x) = 1_{x>t} := { 0, x ≤ t; 1, x > t }   (4)

for a given integer t, 0 ≤ t < 2^d. Let the binary expansion of t be t = Σ_{j=0}^{d-1} b_{d-j} 2^j. Then the derivative functions for this tensor depend only on the value of b_k and do not depend on the index k itself. These functions are the following. If b_k = 0, then

f^k_0(x) := f^(k)(0, x) = x,   f^k_1(x) := f^(k)(1, x) = 1;

if b_k = 1, then

f^k_0(x) := f^(k)(0, x) = { 1, x = 1; None, x = 0 },   f^k_1(x) := f^(k)(1, x) = x.

In our method, the behavior of the functions f outside their domains is predefined in the following way.
If, in the process of calculating a tensor value, the arguments of a function f^(k) are not in its domain, we assume that it returns an empty value (we denote this value by None, as in the Python language). The next function (f^(k−1) or f^(k+1)), having received None, also returns None, and so on, up to the "middle" function f^(l), which returns 0 if at least one of its arguments is None. A detailed explanation, with technical details, of how such a set of derivative functions leads to the step function is given in the Appendix. In this example, the "middle" function is the last one: l = d, so we consider it as a function of two arguments. The same is true for other examples in which the "middle" function is the first or the last. Note that in this example the original analytic representation of the tensor did not assume pairwise interaction of the indices. On the contrary, formula (4) is quite holistic: its value depends on all variables at once. Nevertheless, the expressions for the derivative functions turned out to be simple enough. The same holds for many other examples, see the Applications section and the Appendix; thus, our method can find wide application. It is worth noting that the arguments of the functions f have different physical meanings. The first argument is an original natural number corresponding to an index of the tensor under consideration. By contrast, the second argument and the function value itself determine the relation between the tensor indices, and this relation can be complex. In the general case, the middle function f^(l) is complex-valued, while the values of all other functions f^(k), k ≠ l, can be of any nature for which a comparison operation is defined.
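As a sanity check, these step-function derivative functions can be implemented literally; the carried state encodes the comparison with t so far (0 = tied, 1 = already greater, None = already smaller), and the helper name below is ours:

```python
def qtt_step_functions(t_bits):
    """Derivative functions for the QTT step tensor (4), bits of t given
    most significant first. Carried state: 0 = prefix equal to t's prefix,
    1 = already greater, None = already smaller (None then propagates and
    the middle function turns it into 0)."""
    def make(b):
        def f(i, x):
            if x is None:
                return None              # propagate the empty value
            if x == 1 or i > b:
                return 1                 # strictly greater at this bit
            return 0 if i == b else None # still tied, or now smaller
        return f
    fs = [make(b) for b in t_bits]
    def middle(i, x):                    # here l = d: the last bit decides
        return 1 if fs[-1](i, x) == 1 else 0
    return fs[:-1], middle
```

Running the left pass over the first d − 1 bits and then the middle function reproduces 1_{x>t} exactly.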

2.3. TT-DECOMPOSITION WITH THE GIVEN DERIVATIVE FUNCTIONS

Theorem 2.1. Let D_j be the image of the j-th function f^(j) from the derivative function set for 1 ≤ j < l ("left" functions):

D_j = { f^(j)(i, x) : 1 ≤ i ≤ n_j, x ∈ D_{j-1}, if f^(j) is defined at (i, x) },   j = 1, ..., l − 1,

where we let D_0 = {0}. Similarly for the "right" functions with l < j ≤ d:

D_j = { f^(j+1)(i, x) : 1 ≤ i ≤ n_{j+1}, x ∈ D_{j+1}, if f^(j+1) is defined at (i, x) },   j = l, ..., d − 1,

where we let D_d = {0}. Then there exists a TT-representation of the tensor K (3) with TT-ranks r not greater than

r = { |D_0| = 1, |D_1|, ..., |D_{d-1}|, |D_d| = 1 },

where |A| denotes the cardinality of a set A.

Proof. The proof is constructive. The construction of the required cores of the TT-decomposition takes place in two stages. We first enumerate the elements of all images {D_i}_{i=0}^{d} in an arbitrary order, so that we can address them by index. Denote by D_j[n] the n-th element of the j-th image, n = 1, ..., |D_j|. Now we can switch from the initial functions f with arbitrary values to functions f̃, the range of each being a consecutive set of natural numbers starting from 1:

f̃^(j)(i, x) := index_of( f^(j)(i, D_{j-1}[x]), D_j ),   j = 1, ..., l − 1,   x = 1, ..., |D_{j-1}|,
f̃^(j)(i, x) := index_of( f^(j)(i, D_{j+1}[x]), D_j ),   j = l + 1, ..., d,   x = 1, ..., |D_{j+1}|,   (5)

where the function index_of is defined as follows: z = index_of(y, A) ⟺ y = A[z] for some ordered set A and any y ∈ A. We let index_of(None, A) := None. The function f̃^(l), which corresponds to the "middle" function f^(l), is defined as

f̃^(l)(i, x, y) := f^(l)( i, D_{l-1}[x], D_l[y] ).

In the second stage, we assign to each integer input and output of the new functions {f̃^(i)} a corresponding basis vector e. The basic idea is to construct the j-th "left" core (j < l) of the desired TT-decomposition corresponding to the function f̃^(j) according to the following scheme: if y = f̃^(j)(i, x), then

(e^(x))^T G_j(:, i, :) = (e^(y))^T,   i = 1, ..., n_j,

where G_j(:, i, :) ∈ R^{|D_{j-1}| × |D_j|} denotes the matrix representing the i-th slice of the j-th core. The elements of this core are constructed explicitly:

G_j(x, i, y) = { 1, if y = f̃^(j)(i, x); 0, else },   x = 1, ..., |D_{j-1}|,   y = 1, ..., |D_j|.   (6)

We do the same for the "right" cores, for which j > l, except that the multiplication by the basis vector takes place on the right: if y = f̃^(j)(i, x), then G_j(:, i, :) e^(x) = e^(y), i = 1, ..., n_j. Finally, we construct the middle-core G_l which corresponds to the function f̃^(l):

G_l(x, i, y) = { f̃^(l)(i, x, y), if f̃^(l) is defined at (i, x, y); 0, else },   x = 1, ..., |D_{l-1}|,   y = 1, ..., |D_l|.

We summarize these two stages in Algorithms 1-2 in the Appendix. The theorem statement follows from this construction: after multiplying m "left" cores, 1 ≤ m < l, starting from the first one, we get the basis vector

G_1(1, i_1, :) G_2(:, i_2, :) ... G_m(:, i_m, :) = (e^(a_m))^T,

where a_m is defined in (1). A similar basis vector e^(a_p) is obtained by successive multiplication of all cores starting from the last one, with index d, down to some p > l, with a_p defined in (2). Finally, the statement of the theorem follows from the relation

(e^(a_{l-1}))^T G_l(:, i_l, :) e^(a_{l+1}) = K(i_1, i_2, ..., i_d),

which is a consequence of the definition of the elements of the middle-core G_l and corresponds to relation (3).
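This constructive procedure can be turned into a short program; the sketch below (our naming, NumPy assumed, no rank reduction) enumerates the images D_j and fills the sparse cores as in (6):

```python
import numpy as np

def build_tt_cores(left, middle, right, ns):
    """Sketch of the construction from Theorem 2.1: enumerate the images
    D_j, then fill sparse cores with one unit per row ("left" cores), one
    unit per column ("right" cores), plus the dense middle-core values.
    left/right are lists of f(i, x) -> value or None; middle is f(i, x, y)."""
    d, l = len(ns), len(left)                      # middle-core at 0-based position l
    D = [[0]] + [None] * (d - 1) + [[0]]           # D_0 = D_d = {0}
    for j, f in enumerate(left):                   # images of the left functions
        D[j + 1] = sorted({y for i in range(ns[j]) for x in D[j]
                           if (y := f(i, x)) is not None})
    for j in range(d - 1, l, -1):                  # images of the right functions
        f = right[j - l - 1]
        D[j] = sorted({y for i in range(ns[j]) for x in D[j + 1]
                       if (y := f(i, x)) is not None})
    cores = []
    for j, f in enumerate(left):                   # left cores, one unit per row
        G = np.zeros((len(D[j]), ns[j], len(D[j + 1])))
        for i in range(ns[j]):
            for x, vx in enumerate(D[j]):
                y = f(i, vx)
                if y is not None:
                    G[x, i, D[j + 1].index(y)] = 1.0
        cores.append(G)
    Gm = np.zeros((len(D[l]), ns[l], len(D[l + 1])))   # middle-core
    for i in range(ns[l]):
        for x, vx in enumerate(D[l]):
            for y, vy in enumerate(D[l + 1]):
                w = middle(i, vx, vy)
                if w is not None:
                    Gm[x, i, y] = w
    cores.append(Gm)
    for j in range(l + 1, d):                      # right cores, one unit per column
        f = right[j - l - 1]
        G = np.zeros((len(D[j]), ns[j], len(D[j + 1])))
        for i in range(ns[j]):
            for x, vx in enumerate(D[j + 1]):
                y = f(i, vx)
                if y is not None:
                    G[D[j].index(y), i, x] = 1.0
        cores.append(G)
    return cores
```

The resulting bond dimensions are exactly |D_1|, ..., |D_{d-1}|, matching the rank bound of the theorem.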

Rank reduction

One way to reduce the TT-rank, in the case when an image D_i of a function f^(i) contains too many elements, is to partition the image D_i into several sets and map the basis vector e, discussed in the second stage of the Theorem, to one of these sets. This is possible if the values of the function belong to a space with a given topology. In the simplest case, when the values of the function are real, we can merge into one set only those elements of the image for which |x − y| < ϵ, x, y ∈ D_i, for a given ϵ. In addition, we can specify a maximum number of such sets, thus fixing the maximum rank, by increasing the value of ϵ or merging some sets with each other. Other ways of reducing the rank are described in the Appendix.

Complexity

The cores obtained by our method, except perhaps the middle-core, are highly sparse. Each row of a slice G(:, i, :) of a core to the left of the middle-core consists of zeros, except for at most one unit; the same is true for the columns of the cores to the right of the middle-core. When multiplying a slice of a core by a vector, we count only those operations which do not involve addition with zero or multiplication by zero or one. When using the compressed format, we formally need no additions or multiplications at all to obtain a single tensor element, since its calculation reduces to choosing the desired slice indices.

Consider a TT-tensor G obtained by our method with dimensions {n_1, n_2, ..., n_d} and ranks {r_0, r_1, ..., r_d}, with the middle-core at position l, 1 ≤ l ≤ d. To calculate the convolution of the tensor G with an arbitrary rank-one tensor W of the form

⟨G, W⟩ = Σ_{n_1, ..., n_d} Σ_{α_1=1}^{r_1} ... Σ_{α_{d-1}=1}^{r_{d-1}} G_1(1, n_1, α_1) ... G_d(α_{d-1}, n_d, 1) w(n_1) ... w(n_d),

where w(n) := W(1, n, 1), we need no more than n_conv additions and no more than n_conv multiplications, with

n_conv = Σ_{i=1}^{l-1} n_i r_{i-1} + Σ_{i=l+1}^{d} n_i r_i + n_l ( r_{l-1} r_l + min(r_{l-1}, r_l) ).   (7)
Indeed, the first two sums in the last expression correspond to the successive multiplication of the vector by the current core slice, from each end of the tensor train up to the middle-core with index l. The last term corresponds to the multiplication of the two resulting vectors by the middle-core (which we assume dense) from the left and from the right.
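For reference, the convolution ⟨G, W⟩ itself can be computed by a dense left-to-right sweep; the sketch below (our naming) does not exploit the sparsity that the count (7) relies on, so it is an illustration of the contraction rather than the paper's optimized procedure:

```python
import numpy as np

def tt_rank1_inner(cores, ws):
    """Convolution <G, W> of a TT-tensor with a rank-one tensor whose mode
    vectors are ws[k]: contract each core with its weight vector over the
    mode index, then multiply the resulting r_{k-1} x r_k matrices."""
    v = np.ones(1)
    for G, w in zip(cores, ws):
        v = v @ np.einsum('rns,n->rs', G, w)   # sum over the mode index n
    return v[0]
```

With sparse slices stored in compressed form, each matrix in the sweep costs only the O(n_i r) operations counted in (7).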

3. APPLICATIONS

As a practical application of our method, in this section we give: a) examples from the field of cooperative games, where we compare with existing methods and show superiority both in execution time and in accuracy (our algorithm gives machine precision); b) the problem of calculating the matrix permanent, where our method gives an estimated number of operations only twice as large as that of the optimized ad hoc method of calculating the permanent using Hamiltonian walks.

As examples, consider several so-called cooperative games (von Neumann & Morgenstern, 2007). Omitting the details of the economic formulation of the problem, let us briefly consider its mathematical model. In general, in the theory of cooperative games it is necessary to calculate the following sum over all subsets of the given set T of players

3.1. COOPERATIVE GAMES EXAMPLES

π(k) := Σ_{S ⊆ T∖{k}} p(|S|) [ ν(S ∪ {k}) − ν(S) ],   for all k ∈ T.   (8)

Here p is some function of the number of players in a coalition S. The coalition function ν is the function of interest; it depends on the game under consideration and denotes the gain that a given coalition receives (the value of the coalition). Below we briefly review several cooperative games and compare the performance and accuracy of our algorithm for solving them with the algorithm presented in (Ballester-Ripoll, 2022), which is based on the TT-cross approximation method. Due to space constraints, we give only a brief statement of each problem, the constructed derivative functions, and the results of our method. A detailed description of the application of our approach to cooperative games can be found in the Appendix. The results of the algorithm comparison are shown in Fig. 3. In all the examples shown in this figure, our algorithm produces results with machine accuracy and faster than the one from the cited paper.

Shoe sale game. In this game, participants are divided into two categories: those who sell left boots (indices 1 through L) and those who sell right boots (indices L + 1 through 2L + 1). As shoes can be sold only in pairs, the value of a coalition is the minimum of the numbers of "left" and "right" players in the coalition. Let us build the tensors for this game. To find the required value π in (8), it is convenient to construct tensors that have a dimension equal to the number of players. Each index of such a tensor is binary: 1 means a player is a member of the coalition, 0 means he is not. To construct the TT-decomposition of the tensor p(|S|)ν(S) using our method, we take the following derivative functions:

f^(k)_i(x) = x + i,   1 ≤ k ≤ d, k ≠ L + 1;   f^(L+1)_i(x, y) = min(x, y + i) p(x + y + i),   i = 0, 1,

so the middle-core is placed at the position l = L + 1.
The derivative functions for constructing the tensor p(|S| − 1)ν(S) are selected in a similar way (we let p(−1) = p(2L + 1) = 0). Once the cores of both tensors are constructed, we can calculate the sum (8) for different values of k by taking the corresponding slices of the cores, by the method described in (Ballester-Ripoll, 2022), and performing the convolution.

Airport. This is not a cooperative game in its purest form, as instead of a gain we have a payoff, but its mathematical essence is in the spirit of cooperative games. Each player represents an aircraft that needs a landing strip of length c_k; thus, ν(S) = max{c_i : i ∈ S}. In order to construct a TT-representation of the tensor corresponding to this ν, let us first order the values c_k in descending order. Then the derivative functions are

f^k_0(x) = x,   f^k_1(x) = { x, x > 0; c_k, x = 0 },   1 ≤ k ≤ d.

However, with these derivative functions the TT-ranks can become very large, especially for non-integer c_k. To reduce the ranks, we use the following trick: we break the second function f^k_1 into two terms,

f^k_1(x) = f^{k(1)}(x) + c_k · f^{k(2)}(x),   f^{k(1)}(x) = { x, x > 0; None, else },   f^{k(2)}(x) = { 1, x = 0; None, else }.

After that, we use three derivative functions to build the TT-cores, taking the multipliers c_k out of the built cores as shown in Figure 4a. This gives us a TT-Tucker format with TT-ranks equal to 2 and matrices A_k with the following elements:

A_k = ( 1 0 0 ; 0 1 c_k ).

[Figure 4: (a) the TT-Tucker network for the Airport game; the dashed line shows that at this point the rank is 1, which means that there is no summation and the corresponding cores are independent; (b) the contraction of the indicator tensor with the columns of the matrix A.]

Other games. Fig. 3 also shows comparisons with two other games. In the Weighted majority game the tensor ν has the form ν(S) = 1 if Σ_{i∈S} w_i ≥ M and 0 otherwise, for given weights {w_i} and threshold M. In the Bankruptcy game, ν(S) = max(0, E − Σ_{i∉S} c_i) for given values {c_i} and E.
The derivative functions for these problems are chosen quite straightforwardly.
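To make the shoe sale construction concrete, one can evaluate p(|S|)ν(S) through the two passes and the middle function and compare it with a direct count; this is a plain-Python sketch with our naming:

```python
def shoe_tensor_value(bits, L, p):
    """p(|S|) * nu(S) for the shoe sale game, evaluated through the
    derivative functions above: the left pass counts "left"-boot sellers,
    the right pass counts "right"-boot sellers, and the middle function
    at position L + 1 combines the two counts."""
    x = 0
    for i in bits[:L]:          # f^(k)_i(x) = x + i for k <= L
        x += i
    y = 0
    for i in bits[L + 1:]:      # the same form on the right side
        y += i
    i = bits[L]                 # middle-core at l = L + 1 (0-based index L)
    return min(x, y + i) * p(x + y + i)
```

Since the left pass only needs the count of left sellers so far, the image sizes, and hence the TT-ranks, stay at most L + 1 on the left side.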

3.2. MATRIX CALCULUS: PERMANENT

Consider the task of calculating the permanent of a matrix {a_ij} = A ∈ C^{d×d}. To solve this problem using the presented technique, let us construct two tensors in TT-format. The first tensor A represents products of the elements of the matrix A in the form

A(i_1, i_2, ..., i_d) = A_{i_1 1} A_{i_2 2} ... A_{i_d d}.

This is a rank-1 tensor and its cores {H_k ∈ C^{1×d×1}}_{k=1}^{d} are H_k(1, i, 1) = A_{ik}, i = 1, ..., d. The second tensor I is the indicator tensor of the index sets in which all indices are different:

I(i_1, i_2, ..., i_d) = { 1, if all i_1, i_2, ..., i_d are different; 0, else }.

The cores G of this tensor are obtained using the following derivative functions:

f^k_i(x) = { x + 2^i, x & 2^i = 0; None, else },   k < d;   f^d_i(x) = { 1, x & 2^i = 0; 0, else },

where the ampersand sign stands for bitwise AND of integers. In this scheme the middle-core is the last (d-th) core. The i-th bit of the input x corresponds to the i-th value of the set {1, 2, ..., d}: if this bit is zero, then the i-th value has not occurred among the previous indices, and thus this value is valid for the current index. The function sets this bit in x and the value x is passed on. If the bit has already been set, the derivative function returns None, since two identical indices are forbidden. The permanent is equal to the convolution of the tensor I with the tensor A. Since A is a rank-1 tensor, we can view the computation of the permanent as a contraction of the tensor I with weights equal to the corresponding elements of the given matrix A, see Fig. 4b.

After calculating the function f^k_i we get a number x whose binary representation contains exactly k bits equal to one. From this one concludes that the corresponding rank r_k equals r_k = d! / ((d − k)! k!). Using relation (7), we can obtain an upper estimate on the total number of operations for the convolution of the given tensor as 2 n_conv ∼ 2^{N+1} N. However, one can notice that on average half of the rows in the core slices consist entirely of zeros, since index repetition is "forbidden": G_k(i, j, :) = 0 when index i corresponds to an x with the j-th bit set. Thus, ñ_conv = n_conv / 2 and, after the cores are built, the total number of operations n_tot (both additions and multiplications) required to obtain the result at these ranks has the asymptotics n_tot = 2 ñ_conv ∼ 2^N N. This asymptotics is better than the one that can be obtained from the well-known Ryser formula for calculating the permanent:

P(A) = (−1)^N Σ_{S ⊆ {1, 2, ..., N}} (−1)^{|S|} Π_{i=1}^{N} Σ_{j∈S} A_{ij}.

Applied head-on, this formula requires O(2^{N−1} N^2) operations. It is true that if one uses a Hamiltonian walk on the (N − 1)-cube (Gray code) for a more optimal sequence of subsets, this formula gives the asymptotics n_tot ∼ 2^{N−1} N, which is only twice as good as ours (Nijenhuis & Wilf, 1978, pp. 220-224). This is an example of a problem where we can first pre-calculate the tensor I with the constraints and then reuse it with different data tensors A containing the elements of a particular matrix.
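Operationally, the contraction of I with the rank-1 tensor A is a dynamic program over bitmask states; the following sketch (our naming, a dictionary of states instead of explicit cores) reproduces it:

```python
def permanent(A):
    """Permanent via the subset dynamic programming that the contraction
    of the indicator tensor I with the rank-1 tensor A performs: the
    state x is the bitmask of columns already used (cf. f^k_i above)."""
    d = len(A)
    states = {0: 1.0}                       # empty mask before the first row
    for row in A:                           # one TT-core per row of A
        nxt = {}
        for x, v in states.items():
            for i in range(d):
                bit = 1 << i
                if not x & bit:             # column i still free: f returns x + 2^i
                    nxt[x | bit] = nxt.get(x | bit, 0.0) + v * row[i]
        states = nxt
    return states[(1 << d) - 1]             # all columns used exactly once
```

After processing k rows, the live states are exactly the masks with k set bits, mirroring the rank r_k = d!/((d − k)! k!) of the TT-representation.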

4. CONCLUSIONS AND FUTURE WORK

We presented an algorithm for constructing a tensor in TT-format in the case when an explicit analytic dependence of the tensor value on its indices is given. The cores of the obtained TT-tensor are sparse, which speeds up manipulations with such a tensor. Examples are given in which our method can be used to construct TT-representations of tensors encountered in many applied and theoretical problems. In some problems, our representation yields an answer faster and more accurately than state-of-the-art algorithms; thus, in the vast majority of cases, we do not face a trade-off between speed and accuracy. As a limitation of our method, let us point out the fast growth of the ranks in the case when the derivative functions have large image sets. Although we have a rank restriction procedure for this case, as future work we plan to extend our algorithm so that it can exactly construct a TT-decomposition in such cases as well, when the decomposition is known to be low-rank.

APPENDICES A ALGORITHMS

Algorithms 1-2 summarize the construction of the TT-decomposition cores described in Theorem 2.1. Here, the function order orders a set in an arbitrary way, and dom denotes the domain of a function. If a function returns None on some set of its arguments, then we assume that it is not defined on that set.

B OTHER APPLICATIONS

The examples in this section fall into two (possibly intersecting) broad types. The first contains the calculation of the exact value of some quantity, for which the TT-decomposition of the tensor arising in the problem is needed. In this case, we do not perform TT-tensor rounding (except when we reduce the rank using the SVD decomposition with zero threshold, in which case we assume that the tensor values remain the same within machine accuracy). From a technical point of view, the convolution operations (and other similar operations) are usually performed without explicitly constructing the cores of the TT-decomposition in computer memory. Instead, we use the functions obtained after applying Algorithm 1 directly as arrays of their values, which is equivalent to working with matrices in a compressed format. The second type consists of problems in which rounding with a given accuracy is the advantage of the tensor approach: we get the answer with a certain error, but faster.

B.1 SIMPLE (MODEL) EXAMPLES

Sum Consider the following tensor I, which is some function P of the sum of the elements of the given vectors a_1, a_2, ..., a_d: I(i_1, i_2, ..., i_d) = P(a_1[i_1] + a_2[i_2] + ... + a_d[i_d]). We can easily build its TT-representation using the presented technique if we put the derivative functions equal to f_i^k(x) = x + a_k[i]. The form of these functions is the same for all cores except for the middle-core. The middle-core can stand in any place in this case, but it makes sense to put it in the middle of the tensor train (at the position ⌊(d + 1)/2⌋) to reduce the TT-ranks. For the middle-core at the m-th place the derivative functions are equal to f_i^m(x, y) = P(x + y + a_m[i]).

In the simple case, when the vector elements are consecutive numbers from zero to a given number, a_i = {0, 1, ..., b_i}, we have a natural correspondence between the value of the function y and the basis vector representing it: y ⟺ e^(y+1). Thus, for example, the third slice (of any of the first m - 1 cores), which corresponds to the addition of the number 2, has ones on the second superdiagonal, G(x, 3, x + 2) = 1, and zeros elsewhere. This matrix can be written in block form as

G(:, 3, :) = ( O_1  I  )
             ( O_2  O_3 ),

where O_1, O_2 and O_3 are zero matrices, and I is the identity matrix. If the result of the calculation of the previous function is 3, which corresponds to the vector (e^(4))^T as the result of the multiplication of all previous cores, then after multiplying by this slice we obtain (e^(6))^T = (e^(4))^T G(:, 3, :). The basis vector (e^(6))^T expectedly corresponds to the value of the function equal to 5.

Algorithm 1 Construction of the integer-valued functions based on the given complex-valued functions
Require: Middle-index l; set of functions {f^(i)}, i = 1, ..., d, of two variables (the function f^(l) has 3 arguments); n_i denotes the number of values of the i-th index
Ensure: Functions {f̃_j^(i)}
  # Part I. Initialization and collection of the ordered value sets
  R[0] and R[d] are initialized with the initial argument value
  for i = 1 to l - 1 do
    R[i] ← order({f^(i)(k, x) : k = 1, ..., n_i, x ∈ R[i - 1]})
  end for
  for i = d to l + 1 step -1 do
    R[i - 1] ← order({f^(i)(k, x) : k = 1, ..., n_i, x ∈ R[i]})
  end for
  # Part II. Defining new functions
  # From the left
  for i = 1 to l - 1 do
    for j = 1 to n_i do
      for k in R[i - 1] do
        if defined f^(i)(j, k) then
          x ← index of k in R[i - 1]
          y ← index of f^(i)(j, k) in R[i]
          f̃_j^(i)(x) := y
        end if
      end for
    end for
  end for
  # From the right
  for i = d to l + 1 step -1 do
    for j = 1 to n_i do
      for k in R[i] do
        if defined f^(i)(j, k) then
          x ← index of k in R[i]
          y ← index of f^(i)(j, k) in R[i - 1]
          f̃_j^(i)(x) := y
        end if
      end for
    end for
  end for
  # Middle-index function
  for j = 1 to n_l do
    for k_1 in R[l - 1], k_2 in R[l] do
      if defined f^(l)(j, k_1, k_2) then
        x_1 ← index of k_1 in R[l - 1]
        x_2 ← index of k_2 in R[l]
        f̃_j^(l)(x_1, x_2) := f^(l)(j, k_1, k_2)
      end if
    end for
  end for

Algorithm 2 Construction of the TT-cores based on the integer-valued functions
Require: Functions {f̃_j^(i)}, middle-index l
Ensure: Cores G_1, ..., G_d
  # From the head
  for i = 1 to l - 1 do
    G_i ← zeros(max_j sup(dom f̃_j^(i)), n_i, max_{j,x} f̃_j^(i)(x))
    for j = 1 to n_i do  # Build each slice of the core as in (6)
      for x in dom f̃_j^(i) do
        G_i(x, j, f̃_j^(i)(x)) ← 1
      end for
    end for
  end for
  # From the tail
  for i = d to l + 1 step -1 do
    G_i ← zeros(max_{j,x} f̃_j^(i)(x), n_i, max_j sup(dom f̃_j^(i)))
    for j = 1 to n_i do
      for x in dom f̃_j^(i) do
        G_i(f̃_j^(i)(x), j, x) ← 1
      end for
    end for
  end for
  # Build middle-index core
  G_l ← zeros(max_j max{x_1 : (x_1, x_2) ∈ dom f̃_j^(l)}, n_l, max_j max{x_2 : (x_1, x_2) ∈ dom f̃_j^(l)})
  for j = 1 to n_l do
    for (x_1, x_2) in dom f̃_j^(l) do
      G_l(x_1, j, x_2) ← f̃_j^(l)(x_1, x_2)
    end for
  end for
  Return cores G_1, ..., G_d.
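The function relabelling of Algorithm 1 and the subsequent assembly of sparse cores can be sketched in a few lines of Python for the running "sum" example; the names below are our own, not from an official implementation:

```python
import numpy as np
from itertools import product

def build_sum_tt(vectors, P):
    """Sketch for I(i1,...,id) = P(a1[i1] + ... + ad[id])."""
    d = len(vectors)
    m = (d + 1) // 2 - 1                      # middle-core position (0-based)
    R = [None] * (d + 1)                      # R[k]: ordered reachable values
    R[0], R[d] = [0], [0]
    for k in range(1, m + 1):                 # from the left
        R[k] = sorted({x + a for x in R[k - 1] for a in vectors[k - 1]})
    for k in range(d - 1, m, -1):             # from the right
        R[k] = sorted({x + a for x in R[k + 1] for a in vectors[k]})
    cores = []
    for k, a in enumerate(vectors):
        G = np.zeros((len(R[k]), len(a), len(R[k + 1])))
        for j, aj in enumerate(a):
            if k < m:                         # head cores: G(x, j, f(x)) = 1
                for x, v in enumerate(R[k]):
                    G[x, j, R[k + 1].index(v + aj)] = 1
            elif k > m:                       # tail cores: G(f(x), j, x) = 1
                for x, v in enumerate(R[k + 1]):
                    G[R[k].index(v + aj), j, x] = 1
            else:                             # middle-core stores P(x + y + a_m[j])
                for x, vx in enumerate(R[k]):
                    for y, vy in enumerate(R[k + 1]):
                        G[x, j, y] = P(vx + vy + aj)
        cores.append(G)
    return cores

def tt_value(cores, idx):
    v = np.ones(1)
    for G, i in zip(cores, idx):
        v = v @ G[:, i, :]
    return v.item()

vs = [[1, 2, 10], [-1, 0, 5, 8], [0, 1, 2]]
P = lambda s: s ** 2
cores = build_sum_tt(vs, P)
for i, j, k in product(range(3), range(4), range(3)):
    assert tt_value(cores, (i, j, k)) == P(vs[0][i] + vs[1][j] + vs[2][k])
```

The middle-core could be placed at any position; putting it near the centre keeps the reachable sets, and hence the ranks, small.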
But in the more complex case, the correspondence between the function value and its vector representation may not be obvious. Consider the following first two vectors: a_1 = {1, 2, 10}, a_2 = {-1, 0, 5, 8}. Then the second slice of the second core, G_2(:, 2, :), which is generated by the function f(x) = x + 0, is equal to

G_2(:, 2, :) = ( 0 1 0 0 0 0 0 0 0 )
               ( 0 0 1 0 0 0 0 0 0 )
               ( 0 0 0 0 0 0 1 0 0 ).

This is true because the first derivative function has three values, which correspond to the first three basis vectors: 1 ⟺ e^(1), 2 ⟺ e^(2), 10 ⟺ e^(3), whereas the value area of the second derivative function has 9 elements, which correspond to 0 ⟺ e^(1), 1 ⟺ e^(2), 2 ⟺ e^(3), ..., 10 ⟺ e^(7), 15 ⟺ e^(8), 18 ⟺ e^(9). Note that in the examples above we chose to assign basis vectors to the values of the derivative functions according to their ascending order. This is not a crucial point: the correspondence is conventional, it can change and be different for each core.

In the degenerate case, when the function P is the identity, P(x) = x, we can construct the desired TT-decomposition with ranks equal to 2. Namely, in this case the cores have the following explicit form:

G_1(:, i, :) = (1, a_1[i]);   G_k(:, i, :) = [[1, a_k[i]], [0, 1]], 2 ≤ k ≤ d - 1;   G_d(:, i, :) = (a_d[i], 1)^T.   (9)

These cores can be constructed using our techniques as follows. Consider the following tensor with binary indices i_k ∈ {0, 1}, which is equal to 1 iff exactly one of its indices is 1:

I(i_1, i_2, ..., i_d) = 1 if Σ_{k=1}^d i_k = 1, and 0 otherwise.

We can construct its cores G using the following derivative functions: f_0^k(x) = x; f_1^k(x) = 1 if x = 0, and None otherwise. Consider also the rank-one tensor H with the values to be summed up, with the following cores: K_k(:, 0, :) = (1), K_k(:, 1, :) = (v_k). For this tensor it is true that H(0, 0, ..., 0, 1, 0, ..., 0) = v_k, where the single unit index stands at the k-th position; thus its convolution with the tensor I gives the sum of the elements of v:

Σ_{i_1=0}^1 Σ_{i_2=0}^1 ... Σ_{i_d=0}^1 I(i_1, i_2, ..., i_d) H(i_1, i_2, ..., i_d) = Σ_{k=1}^d v_k.   (10)

This convolution operation is shown schematically in Fig. 5a.

Figure 5: Tensor network for the TT-tensor for the simple sum for d = 5. (a) Representation of the sum (10), c_k = {1, v_k}; (b) obtaining G̃ via the operation (11).

Now, to be able to select the terms by choosing the indices of the tensor G̃, we replace the vectors c_k in this network with matrices C_k of the following form:

C_k = ( 1       1       ...  1         )
      ( v_k[1]  v_k[2]  ...  v_k[n_k]  ),

where n_k is the length of the vector v_k. Finally, we obtain the cores G̃_k by the convolution of the cores G_k and the matrices C_k (see Fig. 5b):

G̃_k(:, i, :) = Σ_{j=0}^1 G_k(:, j, :) C_k(j, i).   (11)

Step function in QTT format The definition of the step function in QTT format and its derivative functions are given in Section 2. Let us consider in more detail how these functions give the desired value of the tensor built on them. Further, we assume that the number t appearing in the definition of the step function is constant, while the multi-index {i_1, i_2, ..., i_d} we choose to obtain a particular tensor value may be different. Recall that t has the following bit decomposition: t = Σ_{j=0}^{d-1} b_{d-j} 2^j. The multi-index {i_1, i_2, ..., i_d} has the meaning of the bit decomposition of the argument x of the step function: x = Σ_{j=0}^{d-1} i_{d-j} 2^j. Thus, if x ≤ t then the tensor must return 0, otherwise its value is equal to 1. Recall that in this example the derivative functions are the following. If b_k = 0, then

f_0^k(x) := f^(k)(0, x) = x,    f_1^k(x) := f^(k)(1, x) = 1;

if b_k = 1, then

f_0^k(x) := f^(k)(0, x) = { 1 if x = 1; None if x = 0 },    f_1^k(x) := f^(k)(1, x) = x.
Assume that for some k, 1 ≤ k < d, all bits of t and x before this index coincide, b_j = i_j for 1 ≤ j < k, while the k-th bits differ. Consider first the case when b_k = 0 but i_k = 1. It means that x > t and the value of the tensor must be 1: I = 1, regardless of the relation between the subsequent bits, as they correspond to the lower digits. Let us trace what values the derivative functions receive as input and return as output to obtain the value 1 at the end. For bits with indices less than k, the derivative functions are equal to the identity function. Indeed, if for some 1 ≤ j < k we have b_j = 0, then i_j = 0 and the derivative function corresponding to this sequence member is taken from the first line of the definition above: f^(j)(0, x) = x. If for some 1 ≤ j < k we have b_j = 1, then i_j = 1 and the derivative function corresponding to this member is taken from the second line of the definition above: f^(j)(1, x) = x. Thus, the k-th derivative function receives as input the initial value that is fed to the input of the very first function, and in this case this value is not essential. The k-th derivative function, in turn, returns 1 according to the first line of its definition: f^(k)(1, x) = 1. This value will be "carried" to the end without change, since for all derivative functions in this example the equality f(1) = 1 holds. It gives the resulting value 1 of the TT-tensor. So we have achieved the expected behaviour in this case. If, on the contrary, b_k = 1 and i_k = 0, then we surely know that I = 0, as x < t in this case. As we discussed in the previous case, the k-th derivative function receives the initial value as input. In this case it is important that this value is equal to 0. The k-th derivative function is undefined at this value of the argument, which we symbolically write as f_0^(k)(0) = None.
All derivative functions are pre-defined in such a way that they always return None on a None input if they are not a middle-function, and the middle-function returns 0 if it has at least one argument that is None. In our example, the middle-function is the last (d-th) function, and therefore has two arguments. Thus, the None value will be "carried on" until the last function, which will return 0. In the language of vectors, this means that after multiplication by the k-th core we get a zero vector instead of a basis vector, so that for any subsequent values of the indices the result will be zero. So we have achieved the expected behaviour in this case too. In the remaining case, when b_j = i_j for all 1 ≤ j ≤ d, the derivative functions leave the initial argument unchanged. This argument is 0. Thus, the constructed TT-tensor has the expected value in this case as well. The maximum TT-rank value in this example is no more than 2. For the case of rank equal to 2, the explicit form of the cores is the following. For b_k = 0:

G_k(:, 0, :) = ( 1 0 )      G_k(:, 1, :) = ( 0 1 )
               ( 0 1 ),                    ( 0 1 );

and for b_k = 1:

G_k(:, 0, :) = ( 0 0 )      G_k(:, 1, :) = ( 1 0 )
               ( 0 1 ),                    ( 0 1 ).

By direct calculation it can be seen that if we assign to the number 0 the basis vector e^(1) = {1, 0}, and to the number 1 the basis vector e^(2) = {0, 1}, then multiplication of these vectors by the prescribed matrices gives exactly the same results as the application of the derivative functions for this example.
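The rank-2 cores above are easy to check numerically. A hedged sketch (we assume, consistently with the text, that the initial state is e^(1) and that the final 2-vector is contracted with (0, 1)^T to read off the tensor value):

```python
import numpy as np

def step_cores(bits_t):                    # bits of t, most significant first
    slices0 = {0: np.eye(2), 1: np.array([[0., 0.], [0., 1.]])}
    slices1 = {0: np.array([[0., 1.], [0., 1.]]), 1: np.eye(2)}
    return [(slices0[b], slices1[b]) for b in bits_t]

def step_value(cores, bits_x):
    v = np.array([1., 0.])                 # initial state 0 corresponds to e(1)
    for (G0, G1), i in zip(cores, bits_x):
        v = v @ (G1 if i else G0)          # apply the slice chosen by the bit of x
    return float(v @ np.array([0., 1.]))   # read off the "x > t" state

d, t = 4, 5
bits_t = [(t >> (d - 1 - j)) & 1 for j in range(d)]
cores = step_cores(bits_t)
for x in range(2 ** d):
    bits_x = [(x >> (d - 1 - j)) & 1 for j in range(d)]
    assert step_value(cores, bits_x) == (1.0 if x > t else 0.0)
```

The zero vector produced in the "undefined" branch propagates exactly like the None value in the functional description.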

B.2 COOPERATIVE GAMES

In this subsection, we take a closer look at the examples that were briefly given in Section 3.1, and we also give some more examples of applying our methods to cooperative games, which are described in (Ballester-Ripoll, 2022). Shoe sale game Brief game conditions and the derivative functions for this game are described in general form above. Consider what happens when a certain number k is given, i.e. the order number of the player for whom we calculate the payoff π(k). For each k we rebuild the tensors p(|S|)ν(S ∪ {k}) and p(|S|)ν(S). At the k-th place we leave only one function of the two: f_1^(k) for the first specified tensor, and f_0^(k) for the second. Thus, although we formally have a tensor of dimension d, there is actually no summation at the k-th index, and we obtain a sum over all subsets of the set T \ {k}. In addition, in the case of the tensor p(|S|)ν(S ∪ {k}), we subtract one from the argument of p, since in this case p(|S|)ν(S ∪ {k}) = p(|S ∪ {k}| - 1)ν(S ∪ {k}). When we go through the computational tree of Figure 2a using the derivative functions of this example, we have the following. Let k > L + 1, and let the multi-index {i_1, i_2, ..., i_d} have the meaning given in the main article: if any index i_j equals 1, the player with number j is part of the coalition; if it is 0, he is not. When we go from the left end, each derivative function adds 1 to its argument if the corresponding index is 1. If the corresponding player is not part of the coalition, the corresponding index is zero and the derivative function returns the argument unchanged. The "left" functions thus count the number of players in the coalition. When passing from the right end the same happens: the number of players with numbers greater than L + 1 ("right players" in the terms of this cooperative game) is counted, except for the player with the number k.
This player is always counted when we build the cores for the tensor p(|S|)ν(S ∪ {k}) and is not counted in the case of the tensor p(|S|)ν(S), as explained above. Thus the middle-function with index L + 1 gets two arguments: x, the number of "left" players, and y, the number of "right" players, excluding the player with the number L + 1. The middle-function adds one to y if the index i_{L+1} is 1, and does not change it otherwise. Finally, the middle-function returns min(x, y) p(x + y), which coincides with the expected value of the tensor. At the next step we contract each tensor with ones, namely, we calculate the following expression from left to right in sequence:

( Σ_{i_1=1}^{n_1} G_1(1, i_1, :) ) ( Σ_{i_2=1}^{n_2} G_2(:, i_2, :) ) ... ( Σ_{i_d=1}^{n_d} G_d(:, i_d, 1) ).   (12)

In the case of this example, all n_j = 2, j = 1, ..., d. Thus we build and contract 2N = 4L + 2 tensors, but since we do not build the full cores in memory and instead work in a compressed format, we get the computation times shown in Fig. 3a.

Airport Brief game conditions and the derivative functions for this game are described in general form above. In this problem we once explicitly build the cores for the tensor ν, making a convolution with matrices A_k (see Fig. 4a). This convolution, in fact, reduces to the replacement of the corresponding unit in the core slice by the value of c_k. Separately, we construct the cores for the tensor p, which is the given function of the sum of all indices equal to 1. The construction of such a tensor is described at the beginning of Sec. B.1; in this case we put a_j[0] = 0, a_j[1] = 1, j = 1, ..., d. To calculate the value of π(k), we applied a trick similar to the one used in (Ballester-Ripoll, 2022).
Namely, for each k we have left in the core G_k^(p) of the TT-decomposition of the tensor p only the first slice, which corresponds to the index i_k = 0, so the following is true for the new core G̃_k^(p):

G̃_k^(p)(:, 0, :) = G_k^(p)(:, 0, :),   G̃_k^(p) ∈ R^{r_{k-1}^(p) × 1 × r_k^(p)}.

For the core G_k^(ν) of the TT-decomposition of the tensor ν we also removed the second slice and took the difference of the slices of the original core as the first slice:

G̃_k^(ν)(:, 0, :) = G_k^(ν)(:, 1, :) - G_k^(ν)(:, 0, :),   G̃_k^(ν) ∈ R^{r_{k-1}^(ν) × 1 × r_k^(ν)}.

For the final result, we have taken a convolution of these tensors, which in TT-format is written as

π(k) = ( Σ_{i_1=1}^2 G_1^(p)(1, i_1, :) ⊗ G_1^(ν)(1, i_1, :) ) ( Σ_{i_2=1}^2 G_2^(p)(:, i_2, :) ⊗ G_2^(ν)(:, i_2, :) ) ... ( G̃_k^(p)(:, 0, :) ⊗ G̃_k^(ν)(:, 0, :) ) ... ( Σ_{i_d=1}^2 G_d^(p)(:, i_d, 1) ⊗ G_d^(ν)(:, i_d, 1) ),

where ⊗ denotes the Kronecker product of matrices. For the numerical experiments we take the values c_k as i.i.d. random values uniformly distributed on the interval [0, 1].

Weighted majority game For this game, briefly described in Section 3.1, we took the following derivative functions for the tensor p(|S|)(ν(S ∪ {k}) - ν(S)) of dimension d = |T| - 1:

f_0^j(x) = x,   f_1^j(x) = { {x[1] + w_j, x[2] + 1} if x[1] + w_j ≤ M; None otherwise },   1 ≤ j < k,
f_0^j(x) = x,   f_1^j(x) = { {x[1] + w_{j+1}, x[2] + 1} if x[1] + w_{j+1} ≤ M; None otherwise },   k ≤ j < d,

and for the middle-function, which is the last one in this example, we have

f_i^d(x) = { p(x[2] + i) if x[1] + i w_{d+1} + w_k ≥ M > x[1] + i w_{d+1}; None otherwise },   i = 0, 1.

(In the case of k = |T| = d + 1, we take w_d instead of w_{d+1}.) Note that in this example the derivative functions are defined on a set of vectors of length 2. The first component of this vector accumulates the sum of the weights {w_j} to compare with the threshold M.
The second component of the input vector counts the number of players in the coalition S; this value is passed to the function p in the last derivative function. If at step j, j < d, it turns out that the accumulated sum of weights already exceeds the threshold M (x[1] + w_j > M), then the derivative function returns None, thus zeroing out the value of the tensor. Indeed, in this case the difference ν(S ∪ {k}) - ν(S) is obviously equal to zero regardless of which other coalition members are added, i.e. regardless of the index values i_l, l > j. This trick reduces the TT-ranks of the resulting tensors and thus reduces the execution time. In this example, we construct the cores of the specified tensor N = |T| times, and then perform its convolution by the formula (12) in the sparse format. For the numerical experiments we take the values {w_k} as i.i.d. random integers uniformly distributed on the interval [1, 10] and take the threshold equal to M = ⌊(1/2) Σ_k w_k⌋ + 1. Technically, we can avoid passing vectors to the derivative functions and limit ourselves to an integer argument X equal to X = x[1] + x[2] N_big, where N_big is a sufficiently large integer (N_big = 2^15). Then in each derivative function we perform an "unpacking": x[1] = X mod N_big and x[2] = ⌊X / N_big⌋.

Bankruptcy This game has a function of max on the sum of the values, so it is mathematically similar to the airport game, but we have chosen a different way of constructing the derivative functions. Namely, we take

f_0^j(x) = { {x[1] - c_add(j), x[2]} if x[1] - c_add(j) > 0; None otherwise },   f_1^j(x) = {x[1], x[2] + 1},   1 ≤ j < d,

where c_add(j) = c[j] if j < k, and c_add(j) = c[j + 1] otherwise. For the middle function, which is the last one (l = d = N - 1), we have:

f_0^d(x) = { (x[1] - c_add - max(0, x[1] - c_add - c[k])) p(x[2]) if x[1] - c_add > 0; 0 otherwise },

where c_add = c[d + 1] if k < d + 1, and c_add = c[d] if k = d + 1; and

f_1^d(x) = (x[1] - max(0, x[1] - c[k])) p(x[2] + 1).
Note that in the case of this example, the initial functions f^1 are given the value {E, 0} as input instead of 0. This is done to simplify the form of the derivative functions. As in the weighted majority game, the derivative functions take as input a vector of 2 elements, the first of which accumulates the values of {c_j}, while the second counts the number of players in the coalition. In this example, we construct the cores of the specified tensor N = |T| times, and then perform its convolution by the formula (12) in the sparse format. For the numerical experiments we take the values {c_k} as i.i.d. random integers uniformly distributed on the interval [1, 10] and take the value of E equal to E = (1/2) Σ_k c_k.

One seller market game In this game, the first player is selling some good, and players 2, ..., |T| offer prices {a_2, a_3, ..., a_|T|} for this good, a_k ≥ 0. If the first player is in a coalition S, the value of this coalition is equal to the maximum price offered by the members of this coalition: ν(S) = max_{k∈S, k≠1} a_k. If the first player is not in the coalition, its value is zero: ν(S) = 0. For the first player we take the following derivative functions: f_0^1(x) = None, f_1^1(x) = 0; and for the rest of the players: f_0^k(x) = x, f_1^k(x) = max(x, a_k). The problem under consideration is another example of a tensor construction whose ranks depend on the order of the indices. For an unsuccessful ordering, the TT-ranks of the resulting tensor can be large, and it is necessary to perform an SVD-step to reduce the TT-ranks. The final ranks depend on the specific values of {a_k}. To reduce the ranks, we can change the ordering of the players, as this will not affect the calculation of the sum (8). Namely, we place the first player first and sort the other players according to their predetermined prices in descending order. With this sorting we can take the same derivative functions for k > 1 as in the airport game.
Thus, in this problem we come to the tensor network shown in Fig. 6, with matrices A_k of the same kind as the matrices in the airport problem, with the following elements:

A_k = ( 1  0  0   )
      ( 0  1  a_k ),   k = 2, ..., |T|.

Figure 6: Tensor network for building the TT-tensor for the one seller market game with the optimal players ordering for |T| = 6. The numbers near the lines denote the ranks (numbers of terms in the sums).

Technically, the convolution procedure can be omitted explicitly by taking it into account at the stage of the tensor construction. Note that since the cores G of the resulting TT-tensor are constructed using the same pattern, as are the cores of the tensor representing the values of p, they can be built on the fly rather than stored when calculating the sum (8). Moreover, we can abandon the construction of the cores altogether, immediately performing all algebraic operations, given the special (sparse) kind of the cores. For the considered problem, the sequence of matrix operations to calculate the sum (8) is reduced to Algorithm 3.

Algorithm 3 Algorithm for calculating the sum (8) for the one seller market game, based on the obtained TT-cores
Require: Values {a_i} sorted in descending order, number k, weight function P
Ensure: Value of π_k
  # Initialization
  v ← {0, 0, ..., 0} (of length 2|T|)
  v[0 : 4] ← {0, 1} ⊗ {1, 0}
  for i = 1 to |T| - 2 do
    if i == k then
      for j = i to 0 step -1 do
        v[2j + 1] ← a_{i+1} · v[2j]
        v[2j] ← -v[2j]
      end for
    else
      for j = i to 0 step -1 do
        v[3 + 2j + 1] ← v[3 + 2j + 1] + a_{i+1} · v[2j] + v[2j + 1]
      end for
    end if
  end for
  s ← 0
  for i = 1 to |T| - 2 do
    s ← s + v[2i] P(i + 1) a_|T|
    s ← s + v[2i + 1] (P(i) + P(i + 1))
  end for
  s ← s + v[2|T| - 1] P(|T| - 1)
  Return s

This algorithm works only for 1 < k < |T|, and it is not the most efficient.
However, it is presented to illustrate the possibility of applying the technique described in the article to build efficient iterative algorithms of this kind. It is worth noting that the complexity of this algorithm is O(|T|^2).
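The contraction with ones used throughout this subsection, formula (12), admits a one-line realization per core: each core is summed over its middle index, and the resulting matrices are multiplied from left to right. A minimal dense sketch for comparison (in the experiments the same sweep is performed with the sparse cores):

```python
import numpy as np

def tt_full_sum(cores):
    """Sum a TT-tensor over all of its indices, sweeping left to right."""
    v = np.ones(1)
    for G in cores:
        v = v @ G.sum(axis=1)      # sum over the tensor index of this core
    return v.item()

rng = np.random.default_rng(0)
cores = [rng.random((1, 2, 3)), rng.random((3, 2, 3)), rng.random((3, 2, 1))]
# Reference: full contraction of the same network over all indices.
reference = np.einsum('aib,bjc,ckd->', *cores)
assert np.isclose(tt_full_sum(cores), reference)
```

The sweep costs O(d n r^2) operations, while the dense reference grows exponentially with d, which is the point of the TT approach.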

B.3 KNAPSACK PROBLEM

Consider a knapsack problem. The formulation is the following: we have n types of items, where each item has weight w_i ≥ 0 and value v_i ≥ 0, i = 1, 2, ..., n. The task is to solve the optimization problem

maximize over {x_1, x_2, ..., x_n}:  Σ_{i=1}^n v_i x_i,   s.t.  Σ_{i=1}^n w_i x_i ≤ W,

where {x_i}_{i=1}^n are the unknown numbers of each item, and W ∈ R is the given maximum total weight. By imposing different constraints on the unknowns x_i, we obtain different formulations of the problem.

0-1 knapsack problem First, consider the formulation in which it is assumed that x_i ∈ {0, 1} (the so-called 0-1 knapsack problem). To solve this problem, we construct two tensors. The first tensor V represents the total value of the knapsack: V(x_1, x_2, ..., x_n) = Σ_{i=1}^n v_i x_i. Since the expression on the right side of this definition is linear, this tensor can be represented in TT-format with TT-ranks of no more than 2 (see (9)). Namely, the cores {H_k} of the TT-decomposition of the tensor V are

H_1(:, 0, :) = (1, 0);   H_k(:, 0, :) = [[1, 0], [0, 1]], 2 ≤ k ≤ n - 1;   H_n(:, 0, :) = (1, 0)^T;
H_1(:, 1, :) = (1, v_1);   H_k(:, 1, :) = [[1, v_k], [0, 1]], 2 ≤ k ≤ n - 1;   H_n(:, 1, :) = (v_n, 1)^T.

Here we use the indices 0 and 1 for the middle indices of the TT-cores so that they correspond to the physical sense of the task: the presence or absence of the given item in the knapsack. The second tensor I is the indicator tensor of the condition in the knapsack problem:

I(x_1, x_2, ..., x_n) = 1 if Σ_{i=1}^n w_i x_i ≤ W, and 0 otherwise.

We can build a functional TT-decomposition of the tensor I with the following set of functions:

f_i^k(x) = { x + i w[k] if x + i w[k] ≤ W; None otherwise },   i = 0, 1;  1 ≤ k ≤ n.

Note that the condition of not exceeding the weight is checked in each function, i.e., the conditions for the partial sums are checked: w_1 x_1 ≤ W, w_1 x_1 + w_2 x_2 ≤ W, etc. This does not affect the final result, as w_i ≥ 0, but it allows us to reduce the ranks of the cores.
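A small self-contained sketch of this indicator construction (the names are ours; for simplicity the middle index is placed at the last position, so all other cores are "head" cores):

```python
import numpy as np
from itertools import product

def knapsack_indicator_cores(w, W):
    """TT-cores of I(x) = 1 iff sum(w_i * x_i) <= W, with x_i in {0, 1}."""
    d = len(w)
    R = [[0]]                                  # reachable feasible partial sums
    for k in range(d - 1):
        R.append(sorted({s + i * w[k] for s in R[k] for i in (0, 1)
                         if s + i * w[k] <= W}))
    cores = []
    for k in range(d - 1):                     # 0/1 permutation-like head cores
        G = np.zeros((len(R[k]), 2, len(R[k + 1])))
        for i, (x, s) in product((0, 1), enumerate(R[k])):
            if s + i * w[k] <= W:
                G[x, i, R[k + 1].index(s + i * w[k])] = 1
        cores.append(G)
    G = np.zeros((len(R[d - 1]), 2, 1))        # last core plays the middle role
    for i, (x, s) in product((0, 1), enumerate(R[d - 1])):
        G[x, i, 0] = 1.0 if s + i * w[d - 1] <= W else 0.0
    cores.append(G)
    return cores

def tt_value(cores, idx):
    v = np.ones(1)
    for G, i in zip(cores, idx):
        v = v @ G[:, i, :]
    return v.item()

w, W = [3, 5, 2, 4], 8
cores = knapsack_indicator_cores(w, W)
for x in product((0, 1), repeat=len(w)):
    expected = 1.0 if sum(wi * xi for wi, xi in zip(w, x)) <= W else 0.0
    assert tt_value(cores, x) == expected
```

Once a partial sum exceeds W, the running vector becomes zero and stays zero, which is the vector form of propagating None.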
The TT-ranks in this problem depend strongly on the specific weights of the knapsack items: whether they are integer, how large a range of values they have, etc. The final answer to the knapsack problem is given by finding the maximum of the tensor which is the elementwise product of the constructed tensors:

arg max over {x_1, x_2, ..., x_n} of V(x_1, x_2, ..., x_n) · I(x_1, x_2, ..., x_n).

The TT-cores of such a tensor are found as the Kronecker products of the slices of the multiplier cores (see (Oseledets, 2011)). The operation of approximately finding the maximum value of a tensor in TT-format is implemented, for example, in the package ttpy.

Multi-dimensional bounded knapsack problem Another extension, which can be implemented with slight modifications of the above scheme, is related to the presence of several constraints. Namely, let there be several weights {w_i^(j)} associated with each item, with separate conditions imposed on them:

maximize over {x_1, x_2, ..., x_n}:  Σ_{i=1}^n v_i x_i,   s.t.  Σ_{i=1}^n w_i^(j) x_i ≤ W^(j), j = 1, ..., M;   x_i = 0, 1, ..., N_i,

where M is the length of the condition vector. To solve this problem, we generate M indicator tensors {I^(j)}_{j=1}^M, one for each condition, according to the algorithm above. Then we find the element-wise product of all these tensors, and find the (quasi-)maximum element in the resulting tensor:

arg max over {x_1, x_2, ..., x_n} of V(x_1, x_2, ..., x_n) · Π_{j=1}^M I^(j)(x_1, x_2, ..., x_n).

The product of a large number of tensors I^(j) can lead to a rapid growth of the ranks; to get around this, we can round the tensor after each successive multiplication.
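The Kronecker-product rule for the cores of an elementwise product, used here for V · I, can be sketched and checked on small random TT-tensors:

```python
import numpy as np
from itertools import product

def tt_value(cores, idx):
    v = np.ones(1)
    for G, i in zip(cores, idx):
        v = v @ G[:, i, :]
    return v.item()

def random_tt(shape, rank, rng):
    ranks = [1] + [rank] * (len(shape) - 1) + [1]
    return [rng.random((ranks[k], n, ranks[k + 1])) for k, n in enumerate(shape)]

def hadamard_tt(A, B):
    """Cores of the elementwise product: Kronecker products of the slices."""
    return [np.stack([np.kron(Ga[:, i, :], Gb[:, i, :]) for i in range(Ga.shape[1])],
                     axis=1)
            for Ga, Gb in zip(A, B)]

rng = np.random.default_rng(0)
shape = (2, 3, 2)
A, B = random_tt(shape, 2, rng), random_tt(shape, 3, rng)
C = hadamard_tt(A, B)
for idx in product(*(range(n) for n in shape)):
    assert np.isclose(tt_value(C, idx), tt_value(A, idx) * tt_value(B, idx))
```

The ranks of the product are the products of the factors' ranks, which is why repeated multiplication by the tensors I^(j) calls for intermediate rounding.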

B.4 PARTITION PROBLEM

Consider the partition problem in the following formulation. We have a multiset S, |S| = d, of positive integers (a multiset is a set with possibly repeating elements) and an integer n. The task is to partition the set S into n sets {S_i}_{i=1}^n such that the sums of the elements in all sets are equal:

Σ_{a∈S_1} a = Σ_{a∈S_2} a = ... = Σ_{a∈S_n} a = (1/n) Σ_{a∈S} a.

In order to use our approach to solve this problem, we construct n indicator tensors, each of which corresponds to one of the equalities in the expression above. Namely, the j-th tensor is defined as

I_j(i_1, i_2, ..., i_d) = 1 if Σ_{k=1}^d s[k] · δ(i_k, j) = T, and 0 otherwise,

where s[k] is the k-th element of the (ordered in some way) set S, δ is the Kronecker delta, and T := (1/n) Σ_{a∈S} a. Each index of this tensor takes values from 1 to n. The index value i_k = l means that the k-th element of the set S belongs to the set S_l. Thus we have a one-to-one correspondence between the index sets {i_1, i_2, ..., i_d} and the partitions of the set S. The derivative functions for the j-th tensor are the following:

f_i^k(x) = { x + s[k] if i = j; x otherwise },   1 ≤ k < d, 1 ≤ i ≤ n,

and the middle function, which is the last one:

f_i^d(x) = { 1 if x + s[d] = T and i = j; 1 if x = T and i ≠ j; 0 otherwise },   1 ≤ i ≤ n.

Finally, we construct the indicator tensor I as the Hadamard product of the built tensors: I = I_1 ⊙ I_2 ⊙ ... ⊙ I_n. The value of this tensor is 1 only for the indices corresponding to the desired partitions. We can find them by finding the maximal element of the indicator TT-tensor, see Section D in the Appendix.

B.5 QUEENS PROBLEM

Consider the classical problem of eight queens and its extensions. In the classical variant it is necessary to place 8 queens on a usual 8 × 8 chessboard so that the queens do not attack each other. In other words, no two queens stand on the same vertical, horizontal, or diagonal. We will consider this problem on an N × N board with N queens.
To solve this problem by our techniques, we construct a tensor of dimension N, each k-th index i_k of which denotes the position of the queen on the k-th vertical, so 1 ≤ i_k ≤ N. The value of the tensor is the indicator function of the desired state: 1 if the location of the queens satisfies the condition and zero otherwise. The derivative functions f_i^k(x) in Python for such a tensor are shown in Fig. 7a. They are the same for all k; the middle-core is the last core, and the code also covers the function for the last core. Let us briefly describe their work. The input is not a single number but an array (this is allowed in our scheme) of zeros and ones. The first N bits show the horizontal positions (rows) of the previous (leftmost) queens. The bit corresponding to the current queen is added to them. The next N bits show those fields which are attacked from bottom to top, and they are shifted forward. And finally, the last series of N bits shows those fields that are attacked from top to bottom, and they are shifted back. If the position of the bit corresponding to the position of the current queen contains a one in at least one of the three sets, the derivative function returns None, since it means that the condition has already been violated. After constructing the tensor, we can find the desired position: this problem is reduced to finding a non-zero element of the tensor (in this case the value 1), and the corresponding algorithm is described in the Appendix, Section D. The result is shown in Fig. 7b. Note that although the state that is passed to each derivative function is of length 3N and thus can potentially take 2^{3N} combinations, the real TT-rank is much smaller, since only a small fraction of this set of combinations is admissible.
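Since the code of Fig. 7a is not reproduced in the text, the following is a hedged reconstruction of the derivative functions from the description above (the state layout and all names are our assumptions):

```python
N = 6  # board size

def f(x, i, last=False):
    """Derivative function: x is a 3N-bit state, i is the row of the next queen."""
    if x[i] or x[i + N] or x[i + 2 * N]:
        return None                              # row or a diagonal is already attacked
    if last:
        return 1                                 # middle-function: admissible placement
    rows = [b or j == i for j, b in enumerate(x[:N])]
    up = [False] + [b or j == i for j, b in enumerate(x[N:2 * N])][:-1]   # shift forward
    down = [b or j == i for j, b in enumerate(x[2 * N:])][1:] + [False]   # shift back
    return rows + up + down

def tensor_value(position):
    """position[k] = row of the queen on the k-th vertical."""
    x = [False] * (3 * N)
    for k, i in enumerate(position):
        x = f(x, i, last=(k == N - 1))
        if x is None:
            return 0
    return x                                     # 1 iff no two queens attack each other

assert tensor_value([1, 3, 5, 0, 2, 4]) == 1     # a valid 6-queens placement
assert tensor_value([0, 1, 2, 3, 4, 5]) == 0     # all queens on one diagonal
```

Composing the functions directly, as here, evaluates a single tensor entry; the TT-construction additionally enumerates the reachable states to obtain the cores.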
See Table 1 for the numerical values of the ranks together with the error in calculating the total number of combinations.

Table 1: TT-ranks and the number of positions calculated using the tensor, depending on the size of the board N and the truncation threshold ϵ.

Of course, solving the problem of finding admissible combinations in the described way is inefficient. But we can get the number of possible combinations even faster than finding one of them. This number is obtained by the convolution of the tensor with vectors consisting of ones. Besides, by constructing the cores of the decomposition, we can solve extended problems. For example, the complement problem, when some of the queens are already placed on the board and we need to find the locations of the others. Or we can consider that the queens are placed on the board randomly, with a given probability distribution law, and the problem is to find the probability that such an arrangement results in the position given by the puzzle rules.

B.6 COMBINATORIAL PROBLEMS

Sawtooth sequence A sawtooth sequence is a sequence of integer numbers {a_1, a_2, ..., a_d} such that if a_{i-1} < a_i then a_i > a_{i+1}, and vice versa. Suppose we have a set of arrays {c_j}, and for a given set of indices i_1, i_2, ..., i_d we want to construct an indicator tensor equal to 1 if the corresponding sequence {c_1[i_1], c_2[i_2], ..., c_d[i_d]} is sawtooth. In this example, the derivative functions receive as input a sequence of two elements: the first element contains the value of the previous member of the sequence, and the second denotes the direction, "up" or "down":

f_i^k(x) = { {c_k[i], "up"} if c_k[i] < x[1] and x[2] = "down"; {c_k[i], "down"} if c_k[i] > x[1] and x[2] = "up"; None otherwise }.

The middle function is the last one and is of the same form. We need to slightly alter this definition for the first function, for k = 1. Such sequences are often found in game problems. We can combine this indicator tensor with other conditions. For example, we can find the number of possible sawtooth sequences satisfying certain conditions through the convolution (12) of the resulting tensor with ones.

Number of subsets Consider Problem #10 from the Advanced problems chapter of the book (Andreescu & Feng, 2002): find the number of subsets of {1, ..., 2000} the sum of whose elements is divisible by 5. We can immediately construct an indicator tensor with binary indices which equals 1 if the given subset (of the indices with value 1) satisfies the condition of the problem. Namely, let us take the following derivative functions:

f_i^k(x) = (x + ik) mod 5,   1 ≤ k < d,  i = 0, 1,

and the middle function, which is the last one in this example:

f_i^d(x) = { 1 if (x + id) mod 5 = 0; 0 otherwise }.

However, for the specified number of elements of the sequence (2000), it can be a time-consuming task to convolve such a tensor even in a sparse format.
Therefore, let us try to solve this problem analytically, using the explicit representation of the cores of this tensor. Note that for the product of any five consecutive cores of this tensor, each summed over its index and starting from a number that gives the remainder 1 when divided by 5, it is true that

( Σ G_{5n+1}(:, i, :) ) ( Σ G_{5n+2}(:, i, :) ) ( Σ G_{5n+3}(:, i, :) ) ( Σ G_{5n+4}(:, i, :) ) ( Σ G_{5n+5}(:, i, :) ) = A diag(32, 2, 2, 2, 2) A^T,   (13)

where each Σ denotes the sum over the two values of the corresponding index. By direct calculation we find that for the matrix A and any q, s ∈ R we have

B := A diag(q, s, s, s, s) A^T = (1/5) · ( q+4s  q-s   q-s   q-s   q-s  )
                                         ( q-s   q+4s  q-s   q-s   q-s  )
                                         ( q-s   q-s   q+4s  q-s   q-s  )
                                         ( q-s   q-s   q-s   q+4s  q-s  )
                                         ( q-s   q-s   q-s   q-s   q+4s ).

Given that the first and last cores of the TT-decomposition are vectors, not matrices, we need only the first element of the first row: B[1, 1] = (1/5)(q + 4s). Using the diagonal form of the matrix (13), we can immediately raise it to the power 400 by raising its eigenvalues to this power, obtaining q = 32^400 and s = 2^400. Thus, the final answer to the problem is (1/5)(32^400 + 4 · 2^400) subsets.
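The closed form can be sanity-checked on smaller instances of the same problem: for the set {1, ..., 5m} the identical argument yields (2^(5m) + 4 · 2^m)/5 subsets, the empty subset included. A hedged sketch using a direct dynamic program over sums modulo 5:

```python
def count_divisible(n, p=5):
    """Number of subsets of {1, ..., n} whose element sum is divisible by p."""
    cnt = [1] + [0] * (p - 1)        # counts by residue; the empty subset has sum 0
    for k in range(1, n + 1):
        cnt = [cnt[r] + cnt[(r - k) % p] for r in range(p)]
    return cnt[0]

for m in (1, 2, 3):
    assert count_divisible(5 * m) == (2 ** (5 * m) + 4 * 2 ** m) // 5
```

For m = 400 the same formula gives the stated answer for {1, ..., 2000}.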

B.7 SAT PROBLEM

Consider the standard Boolean satisfiability problem (SAT) in conjunctive normal form (CNF-SAT). Given are d Boolean variables {x_i}_{i=1}^d, each taking the value "True" or "False". From these variables we form m logical expressions {A_i}_{i=1}^m, each containing some of the variables {x_i} or their negations (denoted by the symbol ¬) joined by logical OR (symbol ∨), for example: A_1 = x_1 ∨ ¬x_3 ∨ x_5, A_2 = ¬x_1 ∨ ¬x_2 ∨ x_4 ∨ x_5 ∨ x_10, etc. The problem is to determine whether there is an assignment of the variables x such that the expressions A_i, combined with logical AND (symbol ∧), give a logical "True": A_1 ∧ A_2 ∧ . . . ∧ A_m = True. Let us use our method to construct a tensor with d binary indices, each corresponding to the "True" or "False" value of a variable x_i, which is equal to 1 if the last equality is satisfied and zero otherwise. In this problem we construct m separate indicator tensors, one for each clause A_j, and then multiply them elementwise, which corresponds to the logical AND operator. The derivative functions for the j-th tensor carry a flag x ∈ {0, 1} indicating whether the clause is already satisfied:

f^k_True(x) = x, if the variable x_k is not part of the condition A_j; x, if the variable x_k is included in A_j with negation; 1, otherwise;
f^k_False(x) = x, if the variable x_k is not part of the condition A_j; x, if the variable x_k is included in A_j without negation; 1, otherwise.

The function that corresponds to the last variable appearing in the clause (x_10 for A_2 in the above example) returns None if its argument is x = 0, thus zeroing out the tensor: a zero value of x means that no literal of the condition A_j took the value "True".
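A minimal Python sketch (ours; the helper names are hypothetical) of this per-clause state propagation, useful for checking the construction on small formulas:

```python
def clause_indicator(clause, assignment):
    """Propagate the satisfied-flag x through the variables in index order.
    `clause` maps variable index -> True if the literal is negated;
    `assignment` maps variable index -> bool value.
    Returns 1 if the clause is satisfied, 0 otherwise (the role played
    by None in the last derivative function)."""
    x = 0  # clause not yet satisfied
    for k in sorted(assignment):
        if k not in clause:
            continue  # the derivative function passes x through unchanged
        negated = clause[k]
        literal = (not assignment[k]) if negated else assignment[k]
        if literal:
            x = 1  # some literal of the clause became True
    return x

def cnf_indicator(clauses, assignment):
    """Elementwise product over clauses corresponds to logical AND."""
    result = 1
    for clause in clauses:
        result *= clause_indicator(clause, assignment)
    return result
```

Multiplying the per-clause indicators reproduces the CNF value, matching the elementwise product of the clause tensors.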
In this problem, the final rank of the constructed tensor depends strongly on the order of the indices; note, however, that many SAT algorithms likewise rely on heuristics for the order in which the variables are traversed.

C RANK OPTIMIZATION

The algorithm presented in the main Theorem 2.1 does not always give optimal ranks. In this section we present additional steps that reduce the ranks. Let us look more closely at the rank-reduction method briefly described in Section 2.3, in which we merge different but close values of the derivative functions. Let the values of the derivative functions be real and let a "smallness" parameter ϵ be given. We start by sorting all values of each of the sets R[i], i = 1, . . . , d − 1, in Algorithm 1 in ascending order. We then partition all the values into non-overlapping sets such that the difference between the largest and smallest element of each set does not exceed ϵ. Namely, we put

S[i][1] = {x ∈ R[i] : R[i][1] ≤ x < R[i][1] + ϵ}

and then sequentially define

S[i][k] = {x ∈ R[i] : min(R[i] \ ∪_{j=1}^{k−1} S[i][j]) ≤ x < min(R[i] \ ∪_{j=1}^{k−1} S[i][j]) + ϵ}

until all elements of the initial set R[i] are exhausted. If, additionally, a maximum rank is given and the number of elements in S[i] exceeds it, we merge the sets further, trying to do so as uniformly as possible. For each i we update R[i] after defining all the sets S[i]:

R[i][k] ← average(S[i][k]), k = 1, . . . , |S[i]|,

where the average function can be chosen in different ways; reasonable choices are the arithmetic mean and the midpoint between the maximum and minimum elements. Finally, the change we make in the algorithm is to replace the function index_of with the following function set_index_of: z = set_index_of(y, A) ⟺ y ∈ A[z]. As the second argument we pass the sets S to this function instead of R. Technically, we use the searchsorted function from the numpy package and work with interval bounds rather than searching in sets as stated in the definition.
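The greedy ϵ-partitioning and the interval-based set_index_of can be sketched with numpy.searchsorted as follows (a sketch under our naming, not the paper's code):

```python
import numpy as np

def epsilon_partition(values, eps):
    """Greedily split sorted values into bins whose spread is < eps;
    return the left bin boundaries and the bin representatives."""
    values = np.sort(np.asarray(values, dtype=float))
    starts, reps = [], []
    pos = 0
    while pos < len(values):
        start = values[pos]
        end = pos
        while end < len(values) and values[end] < start + eps:
            end += 1
        starts.append(start)
        reps.append(values[pos:end].mean())  # arithmetic-mean representative
        pos = end
    return np.array(starts), np.array(reps)

def set_index_of(y, starts):
    """Index of the bin containing y, found via interval bounds."""
    return int(np.searchsorted(starts, y, side="right")) - 1
```

As in the text, membership is resolved by searching interval bounds rather than by set lookup, so a query costs O(log of the number of bins).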

C.1 DECREASE THE NUMBER OF OUTPUTS

Assume that the middle-index is the last index: l = d. This avoids duplicating operations. If the number of possible values of the tensor is small, i.e. the length of the array R[d] is small, we can perform the following trick. Consider the preimages

C.2 ROUNDING

The cores obtained by Algorithm 2 have one important property: they are "almost" orthogonal, in the sense that

Σ_{i,α} G_k(α, i, l) G_k(α, i, m) = λ_l δ_lm, k = 1, . . . , d − 1. (15)

This holds because the matrices Q_k(i) = G_k(:, i, :) consist of rows that are either identically equal to the zero vector or equal to a basis vector e(j)^T for some j. The coefficients λ_l have the physical meaning of the number of occurrences of the value l among the values of the function f^(k); this number equals the number of occurrences of the row vector e(l)^T in all matrices Q_k(i), thus λ_l > 0. A consequence of relation (15) is the following theorem, which is a modification of Lemma 3.1 from (Oseledets, 2011):

Theorem C.1. Let Q_k(i) = G_k(:, i, :), Q_k(i) ∈ R^{r_{k−1} × r_k}, r_0 = 1, be indexed matrices with cores G_k satisfying (15). Let the matrix Z be defined as

Z(i_1 i_2 . . . i_k, l) := [Q_1(i_1) Q_2(i_2) ⋯ Q_k(i_k)](l) = Σ_{α_1, . . . , α_{k−1}} G_1(1, i_1, α_1) G_2(α_1, i_2, α_2) ⋯ G_k(α_{k−1}, i_k, l).

Then Z satisfies the following orthogonality condition:

(Z^T Z)(l, m) = Σ_{i_1 i_2 . . . i_k} Z(i_1 i_2 . . . i_k, l) Z(i_1 i_2 . . . i_k, m) = Λ_l δ_lm

with natural Λ_l ∈ N.

Thus, when rounding the tensor with the algorithm described in (Oseledets, 2011), we can start from the second step of that algorithm, skipping the orthogonalization step. If an accuracy ϵ is prescribed, the threshold for discarding singular values must take the values Λ_l into account. In the case where we need an exact representation of the given tensor in TT-format, but with the smallest possible ranks, the following consequence helps us.

Consequence C.1. The cores obtained by Algorithm 2 have optimal ranks if the unfolding of the last core has full column rank.
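As an illustration (our toy example, not the paper's code), the relation (15) can be verified numerically for a core whose slices consist of zero rows and basis rows:

```python
import numpy as np

# A core G of shape (r_{k-1}, n_k, r_k) whose rows G[alpha, i, :] are
# either zero or a standard basis vector, as produced by Algorithm 2.
G = np.zeros((3, 2, 2))
G[0, 0, :] = [0, 1]  # basis row e(2)^T
G[1, 0, :] = [1, 0]  # basis row e(1)^T
G[2, 1, :] = [1, 0]  # basis row e(1)^T (all remaining rows stay zero)

# Gram matrix M[l, m] = sum_{alpha, i} G[alpha, i, l] * G[alpha, i, m]
M = np.einsum('ail,aim->lm', G, G)
```

The result is diagonal, with λ_l counting the occurrences of the basis row e(l)^T: here e(1)^T occurs twice and e(2)^T once.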

D FINDING NON-ZERO ELEMENT IN A TT-TENSOR

Consider the case when the TT-tensor is an indicator tensor of some subset of its index values, i.e. it equals one (or an arbitrary value greater than zero) on a small number of combinations of its index values and zero on all other combinations. The problem is to find at least one combination where the tensor value is greater than zero. The solution of this problem is based on the algorithmic simplicity of multidimensional summation of a tensor given in the TT-format. Let I be the tensor of interest. Consider the sum of the values of I over all indices except the first one:

v_1(i) = Σ_{i_2=1}^{n_2} ⋯ Σ_{i_d=1}^{n_d} I(i, i_2, . . . , i_d).

Knowing v_1, we can find the value î_1 of the first index of the desired combination: î_1 = arg max_i v_1(i), for which v_1(î_1) > 0. Indeed, if î_1 were not part of any desired combination, then for any values of the other indices the value of the tensor would be zero: I(î_1, i_2, . . . , i_d) = 0. But this contradicts the fact that the value of the sum over these indices is greater than zero. We then sequentially find the remaining indices of the desired combination. For the second index:



By analytic dependence we mean a known symbolic formula for the tensor value, not the term as used in complex analysis. Our code is available at https://github.com/G-Ryzhakov/Constructive-TT. By e(j) we denote the j-th column of the identity matrix of appropriate size. For the experiments we use the same setup as described in the cited paper and take the code from the open-source repository https://github.com/rballester/ttgames/ of the author of that paper.



Figure 1: The solution of the 10-queens problem, obtained by our algorithm.

Figure 3: Times in seconds and relative accuracy as functions of the number of players for four cooperative games. "Brute force": calculating the sum (8) directly; "Cross": results from the paper (Ballester-Ripoll, 2022).

Figure 4: Tensor networks for two problems, d = 6. The numbers denote the ranks (the numbers of terms in the sums). The dashed line shows that at this point the rank is 1, which means that there is no summation and the corresponding cores are independent.

Algorithm 1 (fragment). The input fixes the domain of the function f^(k) in its first variable for k = 1, . . . , d (the dimensions of the resulting tensor). Part I finds the function outputs R of each function, where R is a list of arrays, initialized with R[0] ← {0} and R[d] ← {0}; the outputs are collected from the left for i = 1, . . . , l − 1 and from the right for i = d, . . . , l + 1. The algorithm returns the functions { f̃^(i)_j } and, optionally, the outputs R.

Algorithm 2: Explicit construction of the cores of the functional tensor. Require: middle-index l, a set of integer-valued functions { f̃^(i)_j } of one variable (the functions { f̃^(l)_j } take 2 arguments). Ensure: cores G_1, . . . , G_d of the TT-decomposition of the functional tensor. Initialization: n_k is the maximum value of the index j of the function f^(k)_j for k = 1, . . . , d; the cores are then filled starting from the head, for i = 1, . . . , l − 1.

A fragment of the code shifting the "bottom-to-top" and "top-to-bottom" diagonal sets in the N-queens example:

x[N:2*N-1] = x[N+1:2*N]      # shift bottom-to-top set
x[2*N-1] = 0
x[2*N+1:3*N] = x[2*N:3*N-1]  # shift top-to-bottom set

Positions for N = 9 and 10.

Figure 7: Eight queens puzzle

(f^(d)_{i_d})^{-1}(a) := {x : f^(d)_{i_d}(x) = a}, i_d = 1, . . . , n_d. The idea of reducing the rank of the TT-decomposition is that the values from such a preimage are indistinguishable for the function f^(d)_{i_d}. Thus, if there exists a set indistinguishable for all the last derivative functions simultaneously, then its elements can be encoded by a single basis vector. Namely, consider the following array (ordered set) of sets

S[d − 1] := { s = ∩_i (f^(d)_i)^{-1}(a_i) : (a_1, a_2, . . . , a_{n_d}) ∈ (R[d])^{×n_d}, s ≠ ∅ }.

The array S[d − 1] contains all nonempty sets that are indistinguishable for the functions f^(d)_i for every i = 1, . . . , n_d. The order in which these sets are included in S[d − 1] is unimportant, but it is fixed. The number of all these sets is not greater than the number of outputs in R[d − 1]: len(S[d − 1]) ≤ len(R[d − 1]), since each element of R[d − 1] belongs to exactly one set in S[d − 1]. This is what yields the decrease in ranks. The procedure is repeated sequentially for each output. At the k-th step we find the array of sets

S[k − 1] := { s = ∩_i (f^(k)_i)^{-1}(A_i) : (A_1, A_2, . . . , A_{n_k}) ∈ (S[k])^{×n_k}, s ≠ ∅ },

where the {A_i} are sets and the preimage of a set is defined as (f^(k)_{i_k})^{-1}(A) := {x : f^(k)_{i_k}(x) ∈ A}. After all the sets S[k] (k = 1, . . . , d − 1) are found, we construct integer-valued functions by analogy with the functions (5):

f̃^(k)_i(x) = set_index_of( f^(k)_i(x), S[k] ), x ∈ S[k − 1][x], k = 1, . . . , d, (14)

where we let S[d] = {{x} : x ∈ R[d]}. The definition (14) is correct because the value of the function f^(k)_i does not depend on the choice of a particular element x in the set S[k − 1][x].
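The nonempty intersections of preimages are exactly the level sets of the tuple of all function values, which gives a simple way to compute them (a sketch; the function name is ours):

```python
def indistinguishable_sets(funcs, domain):
    """Partition `domain` into maximal sets whose elements no function
    in `funcs` can distinguish: each block is a nonempty intersection
    of preimages, keyed by the tuple of all function values."""
    blocks = {}
    for x in domain:
        key = tuple(f(x) for f in funcs)  # (a_1, ..., a_n) for this x
        blocks.setdefault(key, []).append(x)
    return [sorted(b) for b in blocks.values()]
```

Enumerating over the domain rather than over all value tuples avoids forming the (mostly empty) Cartesian product (R[d])^{×n_d} explicitly.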

v_2(i) = Σ_{i_3=1}^{n_3} ⋯ Σ_{i_d=1}^{n_d} I(î_1, i, i_3, . . . , i_d), î_2 = arg max_i v_2(i), and so on. This sequence of steps is summarized in Algorithm 4.

Algorithm 4: Calculating the index set of a non-zero value of a TT-tensor. Require: cores {G_i}_{i=1}^d of the TT-decomposition of a tensor I that takes non-negative values. Ensure: a set of indices {î_1, î_2, . . . , î_d} such that I(î_1, î_2, . . . , î_d) > 0.
1: for k = 1 to d do: Ĝ_k ← Σ_i G_k(:, i, :) end for
2: for k = 1 to d do
3:   v_k(i) ← G_1(1, î_1, :) G_2(:, î_2, :) ⋯ G_{k−1}(:, î_{k−1}, :) G_k(:, i, :) Ĝ_{k+1} ⋯ Ĝ_d
4:   î_k ← arg max_i v_k(i)
5: end for
6: Return {î_1, î_2, . . . , î_d}

Here Ĝ_j denotes the core G_j summed over its middle index.
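A numpy sketch of this procedure (assuming cores of shape (r_{k−1}, n_k, r_k) with r_0 = r_d = 1; the function name is ours):

```python
import numpy as np

def find_nonzero(cores):
    """Find one index combination where a non-negative TT-tensor is positive."""
    d = len(cores)
    # cores summed over the middle index: hat G_j = sum_i G_j(:, i, :)
    summed = [G.sum(axis=1) for G in cores]
    # suffix products hat G_{k+1} ... hat G_d (suffix[d] is the identity)
    suffix = [None] * (d + 1)
    suffix[d] = np.eye(cores[-1].shape[2])
    for j in range(d - 1, -1, -1):
        suffix[j] = summed[j] @ suffix[j + 1]
    prefix = np.eye(cores[0].shape[0])  # r_0 = 1
    indices = []
    for k in range(d):
        # v_k(i) = prefix * G_k(:, i, :) * suffix_{k+1}
        v = np.einsum('ab,bic,cd->i', prefix, cores[k], suffix[k + 1])
        i_hat = int(np.argmax(v))
        indices.append(i_hat)
        prefix = prefix @ cores[k][:, i_hat, :]  # fix the chosen index
    return indices
```

For a rank-1 TT indicator of a single index combination this recovers exactly that combination; the cost is dominated by the matrix products over the summed cores.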

ACKNOWLEDGEMENTS

The work was supported by the Analytical center under the RF Government (subsidy agreement 000000D730321P5Q0002, Grant No. 70-2021-00145 02.11.2021).

