Acronyms and Notation Tables

Table 1: Definitions of acronyms used in HTSR

Acronym	Description
DNN	Deep Neural Network
ML	Machine Learning
SGD	Stochastic Gradient Descent
RMT	Random Matrix Theory
MP	Marchenko-Pastur
ESD	Empirical Spectral Density
PL	Power Law
HT	Heavy-Tailed
TW	Tracy-Widom (Law)
SVD	Singular Value Decomposition
FC	Fully Connected (Layer)
VC	Vapnik-Chervonenkis (Theory)
SMTOG	Statistical Mechanics Theory of Generalization

Table 2: Definitions of notation used in HTSR

Notation	Description
$$\mathbf{W}$$	DNN layer weight matrix of size $$N \times M$$, with $$N \geq M$$
$$\mathbf{W}_l$$	DNN layer weight matrix for $$l^{th}$$ layer
$$\mathbf{W}_l^e$$	DNN layer weight matrix for $$l^{th}$$ layer at $$e^{th}$$ epoch
$$\mathbf{W}^{rand}$$	random rectangular matrix, elements from truncated Normal distribution
$$\mathbf{W}(\mu)$$	random rectangular matrix, elements from Pareto distribution
$$\mathbf{X} = (1/N)\mathbf{W}^T\mathbf{W}$$	normalized correlation matrix for layer weight matrix $$\mathbf{W}$$
$$Q = N/M > 0$$	aspect ratio of $$\mathbf{W}$$
$$\nu$$	singular value of $$\mathbf{W}$$
$$\lambda$$	eigenvalue of $$\mathbf{X}$$
$$\lambda_{max}$$	maximum eigenvalue in an ESD
$$\lambda^+$$	eigenvalue at edge of MP Bulk
$$\lambda_k$$	eigenvalue lying outside MP Bulk, $$\lambda^+ < \lambda_k \leq \lambda_{max}$$
$$\rho_{emp}(\lambda)$$	actual ESD, from some $$\mathbf{W}$$ matrix
$$\rho(\lambda)$$	theoretical ESD, infinite limit
$$\rho_N(\lambda)$$	theoretical ESD, finite $$N$$ size
$$\rho(\nu)$$	theoretical empirical density of singular values, infinite limit
$$\sigma^2_{mp}$$	elementwise variance of $$\mathbf{W}$$, used to define MP distribution
$$\sigma^2_{shuf}$$	elementwise variance of $$\mathbf{W}$$, as measured after random shuffling
$$\sigma^2_{bulk}$$	elementwise variance of $$\mathbf{W}$$, after removing/ignoring all spikes $$\lambda_k > \lambda^+$$
$$\sigma^2_{emp}$$	elementwise variance of $$\mathbf{W}$$, determined empirically
$$\mathcal{R}(\mathbf{W})$$	Hard Rank, number of non-zero singular values, Eqn. (5)
$$\mathcal{S}(\mathbf{W})$$	Matrix Entropy, as defined on $$\mathbf{W}$$, Eqn. (6)
$$\mathcal{R}_s(\mathbf{W})$$	Stable Rank, measures decay of singular values, Eqn. (7)
$$\mathcal{R}_{mp}(\mathbf{W})$$	MP Soft Rank, applied after and depends on MP fit, Eqn. (11)
$$\mathcal{S}(\mathbf{v})$$	Vector Entropy, as defined on vector $$\mathbf{v}$$
$$\mathcal{L}(\mathbf{v})$$	Localization Ratio, as defined on vector $$\mathbf{v}$$
$$\mathcal{P}(\mathbf{v})$$	Participation Ratio, as defined on vector $$\mathbf{v}$$
$$p(x) \sim x^{-1-\mu}$$	Pareto distribution, parameterized by $$\mu$$
$$p(x) \sim x^{-\alpha}$$	Pareto distribution, parameterized by $$\alpha$$
$$\rho(\lambda) \sim \lambda^{-(\mu/2+1)}$$	theoretical relation, for ESD of $$\mathbf{W}(\mu)$$, between $$\alpha$$ and $$\mu$$ (for $$0 < \mu < 4$$)
$$\rho_N(\lambda) \sim \lambda^{-(a\mu+b)}$$	empirical relation, for ESD of $$\mathbf{W}(\mu)$$, between $$\alpha$$ and $$\mu$$ (for $$2 < \mu < 4$$)
$$\Delta\lambda = \|\lambda - \lambda^+\|$$	empirical uncertainty, due to finite-size effects, in theoretical MP bulk edge
$$\Delta$$	model of perturbations and/or strong correlations in $$\mathbf{W}$$
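
To make the notation in Table 2 concrete, the following is a minimal NumPy sketch of how several of these quantities could be computed for a single layer weight matrix. It is an illustration under stated assumptions, not the WeightWatcher implementation: the function name `esd_summary` and the `tol` parameter are hypothetical, the elementwise variance `W.var()` (that is, $$\sigma^2_{emp}$$) is used in place of a fitted $$\sigma^2_{mp}$$, and the formulas assumed for the bulk edge, $$\lambda^+ = \sigma^2 (1 + 1/\sqrt{Q})^2$$, the Stable Rank, $$\|\mathbf{W}\|_F^2 / \|\mathbf{W}\|_2^2$$, and the MP Soft Rank, $$\lambda^+ / \lambda_{max}$$, are the standard ones rather than definitions given in the tables above.

```python
import numpy as np


def esd_summary(W, tol=1e-10):
    """Illustrative computation of a few Table 2 quantities for one matrix."""
    W = np.asarray(W, dtype=float)
    if W.shape[0] < W.shape[1]:
        W = W.T  # enforce the N >= M convention used for W
    N, M = W.shape
    Q = N / M  # aspect ratio

    # Singular values nu of W, largest first.
    nu = np.linalg.svd(W, compute_uv=False)
    # Eigenvalues lambda of X = (1/N) W^T W satisfy lambda = nu^2 / N.
    lam = nu**2 / N

    # Assumption: use the empirical elementwise variance as the MP variance.
    sigma2 = W.var()
    # Assumed MP bulk edge for X with aspect ratio Q.
    lam_plus = sigma2 * (1.0 + 1.0 / np.sqrt(Q)) ** 2

    # Hard Rank: singular values above a relative tolerance count as non-zero.
    hard_rank = int(np.sum(nu > tol * nu[0]))
    # Stable Rank: squared Frobenius norm over squared spectral norm.
    stable_rank = float(np.sum(nu**2) / nu[0] ** 2)
    # MP Soft Rank: bulk edge relative to the largest eigenvalue.
    mp_soft_rank = float(lam_plus / lam[0])

    return {
        "Q": Q,
        "nu": nu,
        "lam": lam,
        "lam_max": float(lam[0]),
        "lam_plus": float(lam_plus),
        "spikes": lam[lam > lam_plus],  # eigenvalues lambda_k outside the bulk
        "hard_rank": hard_rank,
        "stable_rank": stable_rank,
        "mp_soft_rank": mp_soft_rank,
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W_rand = rng.normal(size=(300, 100))  # stand-in for a random W
    summary = esd_summary(W_rand)
    print({k: v for k, v in summary.items() if k not in ("nu", "lam", "spikes")})
```

For a Gaussian random matrix, the largest eigenvalue should sit near the bulk edge, so the MP Soft Rank computed this way should be close to 1. A well-trained layer would be expected to show spikes $$\lambda_k > \lambda^+$$ and a smaller value.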

Martin, C.H., Peng, T. (S.) & Mahoney, M.W. Predicting trends in the quality of state-of-the-art neural networks without access to training or testing data. Nat Commun 12, 4122 (2021). https://doi.org/10.1038/s41467-021-24025-8

Paper: https://www.nature.com/articles/s41467-021-24025-8

Code: https://github.com/CalculatedContent/WeightWatcher
