| Symbol | Definition |
|---|---|
| $$\mathbf{W}$$ | DNN layer weight matrix of size $$N \times M$$, with $$N \geq M$$ |
| $$\mathbf{W}_l$$ | DNN layer weight matrix for the $$l^{th}$$ layer |
| $$\mathbf{W}_l^e$$ | DNN layer weight matrix for the $$l^{th}$$ layer at the $$e^{th}$$ epoch |
| $$\mathbf{W}^{rand}$$ | random rectangular matrix, elements drawn from a truncated Normal distribution |
| $$\mathbf{W}(\mu)$$ | random rectangular matrix, elements drawn from a Pareto distribution |
| $$\mathbf{X} = (1/N)\mathbf{W}^T\mathbf{W}$$ | normalized correlation matrix for layer weight matrix $$\mathbf{W}$$ |
| $$Q = N/M \geq 1$$ | aspect ratio of $$\mathbf{W}$$ |
| $$\nu$$ | singular value of $$\mathbf{W}$$ |
| $$\lambda$$ | eigenvalue of $$\mathbf{X}$$ |
| $$\lambda_{max}$$ | maximum eigenvalue in an ESD |
| $$\lambda^+$$ | eigenvalue at the upper edge of the MP Bulk |
| $$\lambda_k$$ | eigenvalue lying outside the MP Bulk, $$\lambda^+ < \lambda_k \leq \lambda_{max}$$ |
| $$\rho_{emp}(\lambda)$$ | actual (empirical) ESD, computed from a given $$\mathbf{W}$$ matrix |
| $$\rho(\lambda)$$ | theoretical ESD, infinite limit |
| $$\rho_N(\lambda)$$ | theoretical ESD, finite size $$N$$ |
| $$\rho(\nu)$$ | theoretical density of singular values, infinite limit |
| $$\sigma^2_{mp}$$ | elementwise variance of $$\mathbf{W}$$, used to define the MP distribution |
| $$\sigma^2_{shuf}$$ | elementwise variance of $$\mathbf{W}$$, as measured after random shuffling |
| $$\sigma^2_{bulk}$$ | elementwise variance of $$\mathbf{W}$$, after removing (or ignoring) all spikes $$\lambda_k > \lambda^+$$ |
| $$\sigma^2_{emp}$$ | elementwise variance of $$\mathbf{W}$$, determined empirically |
| $$\mathcal{R}(\mathbf{W})$$ | Hard Rank, number of non-zero singular values, Eqn. (5) |
| $$\mathcal{S}(\mathbf{W})$$ | Matrix Entropy, as defined on $$\mathbf{W}$$, Eqn. (6) |
| $$\mathcal{R}_s(\mathbf{W})$$ | Stable Rank, measures the decay of singular values, Eqn. (7) |
| $$\mathcal{R}_{mp}(\mathbf{W})$$ | MP Soft Rank, computed after (and dependent on) the MP fit, Eqn. (11) |
| $$\mathcal{S}(\mathbf{v})$$ | Vector Entropy, as defined on vector $$\mathbf{v}$$ |
| $$\mathcal{L}(\mathbf{v})$$ | Localization Ratio, as defined on vector $$\mathbf{v}$$ |
| $$\mathcal{P}(\mathbf{v})$$ | Participation Ratio, as defined on vector $$\mathbf{v}$$ |
| $$p(x) \sim x^{-1-\mu}$$ | Pareto distribution, parameterized by $$\mu$$ |
| $$p(x) \sim x^{-\alpha}$$ | Pareto distribution, parameterized by $$\alpha$$ |
| $$\rho(\lambda) \sim \lambda^{-(\mu/2+1)}$$ | theoretical relation, for the ESD of $$\mathbf{W}(\mu)$$, between $$\alpha$$ and $$\mu$$ (for $$0 < \mu < 2$$) |
| $$\rho_N(\lambda) \sim \lambda^{-(a\mu+b)}$$ | empirical relation, for the ESD of $$\mathbf{W}(\mu)$$, between $$\alpha$$ and $$\mu$$ (for $$2 < \mu < 4$$) |
| $$\Delta\lambda = \lvert\lambda - \lambda^+\rvert$$ | empirical uncertainty, due to finite-size effects, in the theoretical MP bulk edge |
| $$\Delta$$ | model of perturbations and/or strong correlations in $$\mathbf{W}$$ |
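The definitions of $$\mathbf{X}$$, $$Q$$, $$\lambda^+$$ and $$\lambda_k$$ can be made concrete with a short NumPy sketch. The sketch makes three assumptions:

- It uses the standard Marchenko-Pastur upper edge $$\lambda^+ = \sigma^2(1 + 1/\sqrt{Q})^2$$ for $$\mathbf{X} = (1/N)\mathbf{W}^T\mathbf{W}$$ with $$Q = N/M$$.
- It takes `np.var(W)` as a stand-in for $$\sigma^2_{emp}$$.
- It draws the test matrix from an untruncated Normal distribution.

The helper names `esd`, `mp_bulk_edge` and `count_spikes` are hypothetical and are not taken from any published code.

```python
import numpy as np


def esd(W: np.ndarray) -> np.ndarray:
    """Eigenvalues of X = (1/N) W^T W for an N x M matrix W with N >= M, in ascending order."""
    N, M = W.shape
    if N < M:
        raise ValueError("expected N >= M; pass W.T instead")
    X = (W.T @ W) / N  # M x M normalized correlation matrix
    return np.linalg.eigvalsh(X)  # X is symmetric, so eigvalsh applies


def mp_bulk_edge(sigma2: float, Q: float) -> float:
    """Assumed standard MP upper edge: lambda^+ = sigma^2 * (1 + 1/sqrt(Q))^2."""
    return sigma2 * (1.0 + 1.0 / np.sqrt(Q)) ** 2


def count_spikes(evals: np.ndarray, lam_plus: float) -> int:
    """Number of eigenvalues lambda_k > lambda^+, i.e. lying outside the MP bulk."""
    return int(np.sum(evals > lam_plus))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, M = 1000, 500
    W = rng.standard_normal((N, M))  # stand-in for a random W (untruncated Normal here)
    evals = esd(W)
    lam_plus = mp_bulk_edge(np.var(W), N / M)  # sigma^2_emp approximated by np.var(W)
    print(evals.max(), lam_plus, count_spikes(evals, lam_plus))
```

For a purely random $$\mathbf{W}$$, $$\lambda_{max}$$ should land close to $$\lambda^+$$. A few eigenvalues may still sit just above the edge because of finite-size effects, which is the uncertainty that $$\Delta\lambda$$ describes. A comparison with a small tolerance is therefore usually safer than a strict `>`.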
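The rank-like metrics $$\mathcal{R}(\mathbf{W})$$, $$\mathcal{S}(\mathbf{W})$$, $$\mathcal{R}_s(\mathbf{W})$$ and $$\mathcal{R}_{mp}(\mathbf{W})$$ are defined in Eqns. (5)-(7) and (11), which are not reproduced in this section. The sketch below assumes the commonly used forms, and these formulas should be checked against the numbered equations before being relied on:

- Hard Rank: the count of singular values above a numerical tolerance.
- Matrix Entropy: $$\mathcal{S} = -\frac{1}{\log \mathcal{R}}\sum_i p_i \log p_i$$ with $$p_i = \nu_i^2/\sum_j \nu_j^2$$.
- Stable Rank: $$\mathcal{R}_s = \sum_i \nu_i^2/\nu_{max}^2$$.
- MP Soft Rank: $$\mathcal{R}_{mp} = \lambda^+/\lambda_{max}$$.

All function names are hypothetical.

```python
import numpy as np


def singular_values(W: np.ndarray) -> np.ndarray:
    """Singular values nu_i of W, sorted in descending order."""
    return np.linalg.svd(W, compute_uv=False)


def hard_rank(W: np.ndarray, tol=None) -> int:
    """Hard Rank: number of singular values above a numerical tolerance."""
    # With tol=None, np.linalg.matrix_rank uses its default SVD-based tolerance
    return int(np.linalg.matrix_rank(W, tol=tol))


def matrix_entropy(W: np.ndarray) -> float:
    """Assumed form: -(1/log R(W)) * sum_i p_i log p_i, with p_i = nu_i^2 / sum_j nu_j^2."""
    nu = singular_values(W)
    p = nu ** 2 / np.sum(nu ** 2)
    p = p[p > 0]  # treat 0 * log 0 as 0
    R = hard_rank(W)
    if R <= 1:
        return 0.0  # log R would be 0; a rank-1 matrix is assigned zero entropy
    return float(-np.sum(p * np.log(p)) / np.log(R))


def stable_rank(W: np.ndarray) -> float:
    """Stable Rank: ||W||_F^2 / ||W||_2^2 = sum_i nu_i^2 / nu_max^2."""
    nu = singular_values(W)
    return float(np.sum(nu ** 2) / nu[0] ** 2)


def mp_soft_rank(lam_plus: float, lam_max: float) -> float:
    """Assumed form: R_mp(W) = lambda^+ / lambda_max, with lambda^+ taken from an MP fit."""
    return float(lam_plus / lam_max)
```

For the identity matrix $$I_n$$ the first three metrics are maximal: Hard Rank $$n$$, Stable Rank $$n$$, and Matrix Entropy $$1$$. For a rank-one matrix they collapse to $$1$$, $$1$$ and $$0$$. This is the sense in which they measure how spread out the singular values are.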
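The three vector metrics $$\mathcal{S}(\mathbf{v})$$, $$\mathcal{L}(\mathbf{v})$$ and $$\mathcal{P}(\mathbf{v})$$ quantify how localized a vector, such as an eigenvector of $$\mathbf{X}$$, is. Their exact definitions are not given in this section. The sketch assumes the following forms, and the source may use a different normalization:

- $$\mathcal{S}(\mathbf{v}) = -\sum_i p_i \log p_i$$ with $$p_i = v_i^2/\lVert\mathbf{v}\rVert_2^2$$.
- $$\mathcal{L}(\mathbf{v}) = \lVert\mathbf{v}\rVert_1/\lVert\mathbf{v}\rVert_\infty$$.
- $$\mathcal{P}(\mathbf{v}) = \lVert\mathbf{v}\rVert_2/\lVert\mathbf{v}\rVert_4$$.

The function names are hypothetical.

```python
import numpy as np


def vector_entropy(v: np.ndarray) -> float:
    """Assumed form: Shannon entropy of the weights p_i = v_i^2 / ||v||_2^2."""
    p = v ** 2 / np.sum(v ** 2)
    p = p[p > 0]  # treat 0 * log 0 as 0
    return float(-np.sum(p * np.log(p)))


def localization_ratio(v: np.ndarray) -> float:
    """Assumed form: ||v||_1 / ||v||_inf (small for localized vectors)."""
    return float(np.linalg.norm(v, 1) / np.linalg.norm(v, np.inf))


def participation_ratio(v: np.ndarray) -> float:
    """Assumed form: ||v||_2 / ||v||_4 (small for localized vectors)."""
    return float(np.linalg.norm(v, 2) / np.linalg.norm(v, 4))
```

The two extremes of a length-$$n$$ vector behave as follows under these forms:

| Vector | $$\mathcal{S}$$ | $$\mathcal{L}$$ | $$\mathcal{P}$$ |
|---|---|---|---|
| evenly spread unit vector | $$\log n$$ | $$n$$ | $$n^{1/4}$$ |
| standard basis vector | $$0$$ | $$1$$ | $$1$$ |

All three metrics therefore decrease as a vector becomes more localized.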
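A heavy-tailed test matrix in the spirit of $$\mathbf{W}(\mu)$$ can be sampled with NumPy's `Generator.pareto`. That method draws Lomax (Pareto II) samples, and adding 1 turns them into a classical Pareto variable with density $$p(x) \propto x^{-1-\mu}$$ for $$x \geq 1$$.

The sketch makes two further choices:

- The random signs attached to the entries are an assumption, not something the table specifies.
- The two exponent conversions follow directly from the rows above: $$\alpha = 1 + \mu$$ for the Pareto density, and $$\alpha = \mu/2 + 1$$ for the theoretical ESD.

The theoretical relation is coded only for $$0 < \mu < 2$$, because the range $$2 < \mu < 4$$ is covered by the separate empirical relation for $$\rho_N(\lambda)$$. Function names are hypothetical.

```python
import numpy as np


def pareto_matrix(N: int, M: int, mu: float, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical W(mu): N x M matrix whose |entries| follow p(x) ~ x^(-1-mu), x >= 1."""
    if mu <= 0:
        raise ValueError("mu must be positive")
    # Generator.pareto samples a Lomax (Pareto II) variable; adding 1 gives a
    # classical Pareto variable with x_min = 1 and survival P(X > t) = t^(-mu)
    magnitudes = rng.pareto(mu, size=(N, M)) + 1.0
    # Assumption: random signs so that entries are symmetric about zero
    signs = rng.choice([-1.0, 1.0], size=(N, M))
    return signs * magnitudes


def pdf_exponent(mu: float) -> float:
    """Rewrite p(x) ~ x^(-1-mu) as p(x) ~ x^(-alpha), so alpha = 1 + mu."""
    return 1.0 + mu


def esd_exponent_theory(mu: float) -> float:
    """Theoretical ESD tail exponent for W(mu): rho(lambda) ~ lambda^-(mu/2 + 1)."""
    if not 0.0 < mu < 2.0:
        raise ValueError("theoretical relation applied here only for 0 < mu < 2")
    return mu / 2.0 + 1.0
```

Passing `pareto_matrix(N, M, mu, rng)` to an ESD routine and fitting the tail of the resulting eigenvalues is one way to compare an empirical exponent with `esd_exponent_theory(mu)`. Finite $$N$$ shifts the fitted value, particularly as $$\mu$$ approaches 2.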