t-NN framework - a NN framework with multidimensional tensor data based on the t-product (multiplying two tensors via circulant convolution)
- quicker learning because of reduced parameter space
- improved generalizability of stable t-NNs
- reduced computation cost, memory footprint, friendlier to distributed computation
- extract more meaningful info when relevant data is limited
- potential to extract multidimensional correlations, given that data is not vectorized
- Let A be a real-valued l x m x n tensor:
    - frontal slices A^(k) are l x m matrices for k = 1, ..., n
    - lateral slices A_j are l x n matrices for j = 1, ..., m
    - tubes a_ij are n x 1 vectors for i = 1, ..., l and j = 1, ..., m
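A quick numpy illustration of this anatomy (toy sizes; the slice and tube indices are fixed at their first values):

```python
import numpy as np

l, m, n = 2, 3, 4
A = np.arange(l * m * n, dtype=float).reshape(l, m, n)

frontal = A[:, :, 0]  # a frontal slice: an l x m matrix
lateral = A[:, 0, :]  # a lateral slice: an l x n matrix
tube    = A[0, 0, :]  # a tube: a length-n vector along the third mode

print(frontal.shape, lateral.shape, tube.shape)  # (2, 3) (2, 4) (4,)
```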
- fully connected layers use parameters inefficiently
Suppose we have m samples of two-dimensional data of size n x n. We can vectorize these samples and store them as columns of a matrix A of size n^2 x m, or orient them as lateral (side-on) slices of a tensor A of size n x m x n:

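A minimal numpy sketch of the two storage layouts (sizes are illustrative):

```python
import numpy as np

m, n = 64, 28                       # m samples of n x n data
samples = np.random.rand(m, n, n)

# Matrix orientation: vectorize each sample into a column -> n^2 x m
A_mat = samples.reshape(m, n * n).T

# Tensor orientation: each sample becomes a lateral slice -> n x m x n
A_ten = samples.transpose(1, 0, 2)  # A_ten[:, j, :] is the j-th sample, unvectorized

print(A_mat.shape, A_ten.shape)     # (784, 64) (28, 64, 28)
```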
This parameter efficiency becomes even more substantial for higher-dimensional data.
- for the same number of parameters, the tensor weights can capture the same features as the matrix weights and additional features from applying circulant shifts of the frontal slices.
- we are able to extract more features for the same number of learnable parameters
- tubes are the scalars of our tensor space (e.g., tubes commute under the t-product)
- Tensors act on lateral slices, so lateral slices are analogous to vectors; data is therefore stored as lateral slices in our framework
Given A of size l x m x n, bcirc(A) is an ln x mn block circulant matrix of the frontal slices:

```
bcirc(A) = [ A^(1)   A^(n)    A^(n-1)  ...  A^(2)
             A^(2)   A^(1)    A^(n)    ...  A^(3)
              ...     ...      ...     ...   ...
             A^(n)   A^(n-1)  A^(n-2)  ...  A^(1) ]
```
- unfold(A) is the first block-column of bcirc(A), i.e. the frontal slices of A stacked vertically into an ln x m matrix
- fold reverses this: it takes all the stacked frontal slices and merges them back together to make A
- fold(unfold(A)) = A
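A minimal numpy sketch of these operators (the function names follow the notation above):

```python
import numpy as np

def bcirc(A):
    # Block (i, j) of the ln x mn block circulant matrix is frontal slice (i - j) mod n
    l, m, n = A.shape
    return np.block([[A[:, :, (i - j) % n] for j in range(n)] for i in range(n)])

def unfold(A):
    # Stack the frontal slices vertically: the first block-column of bcirc(A)
    l, m, n = A.shape
    return A.transpose(2, 0, 1).reshape(n * l, m)

def fold(M, shape):
    # Inverse of unfold: restack an ln x m matrix into an l x m x n tensor
    l, m, n = shape
    return M.reshape(n, l, m).transpose(1, 2, 0)

A = np.random.rand(2, 3, 4)
print(np.allclose(unfold(A), bcirc(A)[:, :3]))   # unfold(A) is the first block-column
print(np.allclose(fold(unfold(A), A.shape), A))  # fold(unfold(A)) = A
```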
- the t-product is an algebraic formulation to multiply tensors via circulant convolution
- Suppose A is l x p x n and B is p x m x n. Then A * B = fold(bcirc(A) · unfold(B)), an l x m x n tensor.
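A sketch of the t-product under this definition, plus an equivalent FFT route (the DFT along the third mode block-diagonalizes bcirc(A), so the t-product reduces to facewise matrix products in the Fourier domain); helpers are inlined so the snippet runs standalone:

```python
import numpy as np

def bcirc(A):
    l, m, n = A.shape
    return np.block([[A[:, :, (i - j) % n] for j in range(n)] for i in range(n)])

def tprod(A, B):
    # A * B = fold(bcirc(A) @ unfold(B)) for A: l x p x n, B: p x m x n
    l, p, n = A.shape
    _, m, _ = B.shape
    unfold_B = B.transpose(2, 0, 1).reshape(n * p, m)
    C = bcirc(A) @ unfold_B
    return C.reshape(n, l, m).transpose(1, 2, 0)

def tprod_fft(A, B):
    # Same product computed facewise in the Fourier domain
    Ah, Bh = np.fft.fft(A, axis=2), np.fft.fft(B, axis=2)
    return np.real(np.fft.ifft(np.einsum('ipk,pjk->ijk', Ah, Bh), axis=2))

A, B = np.random.rand(2, 3, 4), np.random.rand(3, 5, 4)
print(tprod(A, B).shape)                          # (2, 5, 4)
print(np.allclose(tprod(A, B), tprod_fft(A, B)))  # True
```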
- the tensor transpose A^T transposes each frontal slice and reverses the order of slices 2 through n (so an l x m x n tensor becomes m x l x n)
- The transpose is defined this way so that bcirc(A^T) = bcirc(A)^T
You can think of the transpose as performing the following frontal slice mapping: (A^T)^(1) = (A^(1))^T, and (A^T)^(k) = (A^(n-k+2))^T for k = 2, ..., n.
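A sketch of this mapping in numpy, checking the defining property:

```python
import numpy as np

def bcirc(A):
    l, m, n = A.shape
    return np.block([[A[:, :, (i - j) % n] for j in range(n)] for i in range(n)])

def ttranspose(A):
    # Transpose each frontal slice; keep slice 1 first, reverse slices 2..n
    l, m, n = A.shape
    At = np.zeros((m, l, n))
    At[:, :, 0] = A[:, :, 0].T
    for k in range(1, n):
        At[:, :, k] = A[:, :, n - k].T
    return At

A = np.random.rand(2, 3, 4)
print(np.allclose(bcirc(ttranspose(A)), bcirc(A).T))  # bcirc(A^T) = bcirc(A)^T
```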
The identity tensor I (of size m x m x n) is the tensor whose first frontal slice is the m x m identity matrix and whose remaining frontal slices are zero.
- bcirc(I) is an mn x mn identity matrix
- the identity tube e_1 is the first standard basis vector oriented along the third dimension
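A small numpy check of both facts (sizes are illustrative):

```python
import numpy as np

m, n = 3, 4
I = np.zeros((m, m, n))
I[:, :, 0] = np.eye(m)  # first frontal slice is the identity; the rest stay zero

# bcirc(I) is the mn x mn identity matrix
bcirc_I = np.block([[I[:, :, (i - j) % n] for j in range(n)] for i in range(n)])
print(np.allclose(bcirc_I, np.eye(m * n)))  # True

# The identity tube e_1: first standard basis vector oriented along the third mode
e1 = np.zeros(n)
e1[0] = 1.0
```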
- A circulant matrix C is determined by a single vector, c, which forms the first column of C. The remaining columns of C are each cyclic permutations of c with offset equal to the column index. The last row of C is the vector c in reverse order, and the remaining rows are each cyclic permutations of the last row (see the sketch below).
- eigenvectors of an N x N circulant matrix are the Discrete Fourier Transform (DFT) sinusoids for a length-N DFT
- DFT: computes a discrete frequency spectrum from a discrete-time signal of finite length
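A sketch that builds a circulant matrix as described above and checks that the DFT sinusoids are its eigenvectors, with eigenvalues given by the DFT of the first column (scipy.linalg.circulant would build C directly):

```python
import numpy as np

c = np.array([1.0, 2.0, 3.0, 4.0])
N = len(c)

# Column k is c cyclically shifted down by k; the last row is c reversed
C = np.column_stack([np.roll(c, k) for k in range(N)])
print(C[-1])             # [4. 3. 2. 1.]

lam = np.fft.fft(c)      # eigenvalues: the length-N DFT of the first column
for k in range(N):
    v = np.exp(2j * np.pi * k * np.arange(N) / N)  # k-th DFT sinusoid
    print(np.allclose(C @ v, lam[k] * v))          # True for every k
```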
- the DFT is very efficient to compute using the FFT algorithm and the circular convolution theorem
- circular convolution is commutative
- circular convolution is taken between two periodic sequences of period N and computed over a single period, n = 0 to N-1
- linear convolution is computed over all relevant values of n from -infinity to infinity
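A minimal sketch of the circular convolution theorem (elementwise multiply the DFTs, then invert), compared against the direct O(N^2) single-period definition:

```python
import numpy as np

def circ_conv_fft(x, h):
    # Circular convolution via the FFT: elementwise product in the DFT domain
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(h)))

x = np.array([1.0, 2.0, 3.0, 4.0])
h = np.array([1.0, 0.0, 0.0, 1.0])
N = len(x)

# Direct definition over a single period: y[n] = sum_k x[k] h[(n - k) mod N]
y = np.array([sum(x[k] * h[(n - k) % N] for k in range(N)) for n in range(N)])

print(circ_conv_fft(x, h))                                   # [3. 5. 7. 5.]
print(np.allclose(circ_conv_fft(x, h), y))                   # True
print(np.allclose(circ_conv_fft(x, h), circ_conv_fft(h, x))) # commutative
```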
A tensor is often thought of as a generalized matrix. That is, it could be a 1-D matrix (a vector is actually such a tensor), a 3-D matrix (something like a cube of numbers), even a 0-D matrix (a single number), or a higher dimensional structure that is harder to visualize. The dimension of the tensor is called its rank. (source)
If one transforms the other entities in the structure (e.g., the coordinate basis) in a regular way, then the tensor must obey a related transformation rule.
| Matrix | Tensor |
|---|---|
| 2-dim table to organize information | n-dim table (a generalized matrix) |
| 5x5 matrix | tensor of rank 2 |
| 5x5x5 array | tensor of rank 3 |
Not every matrix can be represented as a rank 2 tensor, but any rank 2 tensor can be represented as a matrix.