Repeated theme: use a lighter "augmentation" of attention (e.g. clustering, dot-product scoring, LSH lookup) as a cheap filter that selects only the highly relevant tokens, then run the expensive attention over just those tokens.
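A minimal sketch of this theme, not taken from any specific paper: a cheap scorer picks the top-k keys, and exact softmax attention runs only on those. The function name `topk_prefiltered_attention`, the random-projection scorer, and the parameters `k`, `proj_dim`, and `seed` are all assumptions for illustration; clustering or an LSH lookup could stand in for the scorer.

```python
import numpy as np

def topk_prefiltered_attention(q, K, V, k=8, proj_dim=4, seed=0):
    """Attend from one query to only the k keys chosen by a cheap scorer.

    q: (d,) query, K: (n, d) keys, V: (n, d_v) values.
    Returns the attention output and the indices of the kept tokens.
    """
    n, d = K.shape
    k = min(k, n)
    rng = np.random.default_rng(seed)

    # Cheap filter (assumed here): dot products in a random low-dim projection.
    P = rng.standard_normal((d, proj_dim)) / np.sqrt(proj_dim)
    cheap_scores = (K @ P) @ (q @ P)  # shape (n,)

    # Keep the indices of the k highest cheap scores (unordered).
    keep = np.argpartition(cheap_scores, n - k)[n - k:]

    # Expensive step: exact scaled dot-product attention on kept tokens only.
    logits = K[keep] @ q / np.sqrt(d)
    weights = np.exp(logits - logits.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ V[keep], keep

rng = np.random.default_rng(1)
K = rng.standard_normal((32, 16))
V = rng.standard_normal((32, 5))
q = rng.standard_normal(16)
out, keep = topk_prefiltered_attention(q, K, V, k=8)
```

With `k` equal to the number of tokens the filter keeps everything and the result matches full attention; smaller `k` trades accuracy for cost, and the quality depends on how well the cheap scores track the true attention logits.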