t-NN framework - a NN framework with multidimensional tensor data based on the t-product (multiplying two tensors via circulant convolution)
- quicker learning because of reduced parameter space
- improved generalizability of stable t-NNs
- reduced computation cost, memory footprint, friendlier to distributed computation
- extract more meaningful info when relevant data is limited
- potential to extract multidimensional correlations, given that data is not vectorized
- Let A be a real-valued l x m x n tensor:
    - frontal slices A^(k) are l x m matrices for k = 1, ..., n
    - lateral slices A_j are l x n matrices for j = 1, ..., m
    - tubes a_ij are n x 1 vectors for i = 1, ..., l and j = 1, ..., m
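A quick numpy illustration of this anatomy (toy sizes; the slice and tube indices are fixed at their first values):

```python
import numpy as np

l, m, n = 2, 3, 4
A = np.arange(l * m * n, dtype=float).reshape(l, m, n)

frontal = A[:, :, 0]  # a frontal slice: an l x m matrix
lateral = A[:, 0, :]  # a lateral slice: an l x n matrix
tube    = A[0, 0, :]  # a tube: a length-n vector along the third mode

print(frontal.shape, lateral.shape, tube.shape)  # (2, 3) (2, 4) (4,)
```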
- fully connected layers use parameters inefficiently
Suppose we have m samples of two-dimensional data of size n x n. We can vectorize these samples and store them as columns of a matrix A of size n^2 x m, or orient them as lateral (side-on) slices of a tensor A of size n x m x n:

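A minimal numpy sketch of the two storage layouts (sizes are illustrative):

```python
import numpy as np

m, n = 64, 28                       # m samples of n x n data
samples = np.random.rand(m, n, n)

# Matrix orientation: vectorize each sample into a column -> n^2 x m
A_mat = samples.reshape(m, n * n).T

# Tensor orientation: each sample becomes a lateral slice -> n x m x n
A_ten = samples.transpose(1, 0, 2)  # A_ten[:, j, :] is the j-th sample, unvectorized

print(A_mat.shape, A_ten.shape)     # (784, 64) (28, 64, 28)
```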
This parameter efficiency becomes even more substantial for higher-dimensional data.
- for the same number of parameters, the tensor weights can capture the same features as the matrix weights and additional features from applying circulant shifts of the frontal slices.
- we are able to extract more features for the same number of learnable parameters
- tubes are the scalars of our tensor space (e.g., tubes commute under the t-product)
- Tensors act on lateral slices, so lateral slices are analogous to vectors; data is therefore stored as lateral slices in our framework
Given A of size l x m x n, bcirc(A) is an ln x mn block circulant matrix of the frontal slices:

```
bcirc(A) = [ A^(1)   A^(n)    A^(n-1)  ...  A^(2)
             A^(2)   A^(1)    A^(n)    ...  A^(3)
              ...     ...      ...     ...   ...
             A^(n)   A^(n-1)  A^(n-2)  ...  A^(1) ]
```
- unfold(A) is the first block-column of bcirc(A), i.e. the frontal slices of A stacked vertically into an ln x m matrix
- fold reverses this: it takes all the stacked frontal slices and merges them back together to make A
- fold(unfold(A)) = A
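A minimal numpy sketch of these operators (the function names follow the notation above):

```python
import numpy as np

def bcirc(A):
    # Block (i, j) of the ln x mn block circulant matrix is frontal slice (i - j) mod n
    l, m, n = A.shape
    return np.block([[A[:, :, (i - j) % n] for j in range(n)] for i in range(n)])

def unfold(A):
    # Stack the frontal slices vertically: the first block-column of bcirc(A)
    l, m, n = A.shape
    return A.transpose(2, 0, 1).reshape(n * l, m)

def fold(M, shape):
    # Inverse of unfold: restack an ln x m matrix into an l x m x n tensor
    l, m, n = shape
    return M.reshape(n, l, m).transpose(1, 2, 0)

A = np.random.rand(2, 3, 4)
print(np.allclose(unfold(A), bcirc(A)[:, :3]))   # unfold(A) is the first block-column
print(np.allclose(fold(unfold(A), A.shape), A))  # fold(unfold(A)) = A
```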
- the t-product is an algebraic formulation to multiply tensors via circulant convolution
- Suppose A is l x p x n and B is p x m x n. Then A * B = fold(bcirc(A) · unfold(B)), an l x m x n tensor.
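A sketch of the t-product under this definition, plus an equivalent FFT route (the DFT along the third mode block-diagonalizes bcirc(A), so the t-product reduces to facewise matrix products in the Fourier domain); helpers are inlined so the snippet runs standalone:

```python
import numpy as np

def bcirc(A):
    l, m, n = A.shape
    return np.block([[A[:, :, (i - j) % n] for j in range(n)] for i in range(n)])

def tprod(A, B):
    # A * B = fold(bcirc(A) @ unfold(B)) for A: l x p x n, B: p x m x n
    l, p, n = A.shape
    _, m, _ = B.shape
    unfold_B = B.transpose(2, 0, 1).reshape(n * p, m)
    C = bcirc(A) @ unfold_B
    return C.reshape(n, l, m).transpose(1, 2, 0)

def tprod_fft(A, B):
    # Same product computed facewise in the Fourier domain
    Ah, Bh = np.fft.fft(A, axis=2), np.fft.fft(B, axis=2)
    return np.real(np.fft.ifft(np.einsum('ipk,pjk->ijk', Ah, Bh), axis=2))

A, B = np.random.rand(2, 3, 4), np.random.rand(3, 5, 4)
print(tprod(A, B).shape)                          # (2, 5, 4)
print(np.allclose(tprod(A, B), tprod_fft(A, B)))  # True
```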
- the tensor transpose A^T transposes each frontal slice and reverses the order of slices 2 through n (so an l x m x n tensor becomes m x l x n)
- The transpose is defined this way so that bcirc(A^T) = bcirc(A)^T
You can think of the transpose as performing the following frontal slice mapping: (A^T)^(1) = (A^(1))^T, and (A^T)^(k) = (A^(n-k+2))^T for k = 2, ..., n.
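A sketch of this mapping in numpy, checking the defining property:

```python
import numpy as np

def bcirc(A):
    l, m, n = A.shape
    return np.block([[A[:, :, (i - j) % n] for j in range(n)] for i in range(n)])

def ttranspose(A):
    # Transpose each frontal slice; keep slice 1 first, reverse slices 2..n
    l, m, n = A.shape
    At = np.zeros((m, l, n))
    At[:, :, 0] = A[:, :, 0].T
    for k in range(1, n):
        At[:, :, k] = A[:, :, n - k].T
    return At

A = np.random.rand(2, 3, 4)
print(np.allclose(bcirc(ttranspose(A)), bcirc(A).T))  # bcirc(A^T) = bcirc(A)^T
```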
The identity tensor I (of size m x m x n) is the tensor whose first frontal slice is the m x m identity matrix and whose remaining frontal slices are zero.
- bcirc(I) is an mn x mn identity matrix
- the identity tube e_1 is the first standard basis vector oriented along the third dimension
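A small numpy check of both facts (sizes are illustrative):

```python
import numpy as np

m, n = 3, 4
I = np.zeros((m, m, n))
I[:, :, 0] = np.eye(m)  # first frontal slice is the identity; the rest stay zero

# bcirc(I) is the mn x mn identity matrix
bcirc_I = np.block([[I[:, :, (i - j) % n] for j in range(n)] for i in range(n)])
print(np.allclose(bcirc_I, np.eye(m * n)))  # True

# The identity tube e_1: first standard basis vector oriented along the third mode
e1 = np.zeros(n)
e1[0] = 1.0
```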
- A circulant matrix C is determined by a single vector, c, which forms the first column of C. The remaining columns of C are each cyclic permutations of c with offset equal to the column index. The last row of C is the vector c in reverse order, and the remaining rows are each cyclic permutations of the last row (see the sketch below).
- eigenvectors of an N x N circulant matrix are the Discrete Fourier Transform (DFT) sinusoids for a length-N DFT
- DFT: computes a discrete frequency spectrum from a discrete-time signal of finite length
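A sketch that builds a circulant matrix as described above and checks that the DFT sinusoids are its eigenvectors, with eigenvalues given by the DFT of the first column (scipy.linalg.circulant would build C directly):

```python
import numpy as np

c = np.array([1.0, 2.0, 3.0, 4.0])
N = len(c)

# Column k is c cyclically shifted down by k; the last row is c reversed
C = np.column_stack([np.roll(c, k) for k in range(N)])
print(C[-1])             # [4. 3. 2. 1.]

lam = np.fft.fft(c)      # eigenvalues: the length-N DFT of the first column
for k in range(N):
    v = np.exp(2j * np.pi * k * np.arange(N) / N)  # k-th DFT sinusoid
    print(np.allclose(C @ v, lam[k] * v))          # True for every k
```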
- the DFT is very efficient to compute using the FFT algorithm and the circular convolution theorem
- circular convolution is commutative
- circular convolution is taken between two periodic sequences of period N and computed over a single period, n = 0 to N-1
- linear convolution is computed over all relevant values of n from -infinity to infinity
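A minimal sketch of the circular convolution theorem (elementwise multiply the DFTs, then invert), compared against the direct O(N^2) single-period definition:

```python
import numpy as np

def circ_conv_fft(x, h):
    # Circular convolution via the FFT: elementwise product in the DFT domain
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(h)))

x = np.array([1.0, 2.0, 3.0, 4.0])
h = np.array([1.0, 0.0, 0.0, 1.0])
N = len(x)

# Direct definition over a single period: y[n] = sum_k x[k] h[(n - k) mod N]
y = np.array([sum(x[k] * h[(n - k) % N] for k in range(N)) for n in range(N)])

print(circ_conv_fft(x, h))                                   # [3. 5. 7. 5.]
print(np.allclose(circ_conv_fft(x, h), y))                   # True
print(np.allclose(circ_conv_fft(x, h), circ_conv_fft(h, x))) # commutative
```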
A tensor is often thought of as a generalized matrix. That is, it could be a 1-D matrix (a vector is actually such a tensor), a 3-D matrix (something like a cube of numbers), even a 0-D matrix (a single number), or a higher dimensional structure that is harder to visualize. The dimension of the tensor is called its rank. (source)
If one transforms the other entities in the structure (e.g., the coordinate basis) in a regular way, then the tensor must obey a related transformation rule.
| Matrix | Tensor |
|---|---|
| 2-dim table to organize information | n-dim table (a generalized matrix) |
| 5x5 matrix | tensor of rank 2 |
| 5x5x5 array | tensor of rank 3 |
Not every matrix can be represented as a rank 2 tensor, but any rank 2 tensor can be represented as a matrix.