| # Suppose you have a neural network in which each layer computes | |
| # x_l = a_l * W_l x_{l-1}, with (W_l)_{i,j} ~ N(0, b_l^2) at initialization and learning rate c_l for W_l. | |
| # If you are using Adam, you can rescale by any constant A > 0: | |
| # a_l <- a_l * A, b_l <- b_l / A, c_l <- c_l / A | |
| # and the training dynamics will be exactly the same as before. | |
| # This is known as the ABC (ABCD) redundancy. For the more general case, see https://arxiv.org/abs/2308.01814 | |
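| # Why this holds (a sketch only; it assumes a_l is a fixed, non-trained multiplier, Adam's eps is | |
| # negligible, and no weight decay is used; these assumptions are not stated above): | |
| #   Write a'_l = a_l * A and W'_l = W_l / A, so a'_l * W'_l = a_l * W_l at initialization. | |
| #   By the chain rule, dLoss/dW'_l = A * dLoss/dW_l, so every gradient is scaled by A. | |
| #   Adam's update is c_l * m_hat / sqrt(v_hat). Scaling all gradients by A scales both m_hat and | |
| #   sqrt(v_hat) by A, so the ratio is unchanged and the update size is set by c_l alone. | |
| #   With c'_l = c_l / A, each update to W'_l is 1/A times the corresponding update to W_l, | |
| #   so W'_l = W_l / A, and hence a'_l * W'_l = a_l * W_l, continues to hold after every step. | |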
| # Let me show you what I mean: | |
| import torch |