Next Generation GPU Enablement (Blackwell, MI350)
High-performance Custom GPU Kernels
Advanced Algorithms for Large Language Models
Compiler & Ecosystem Advancements (Triton, GEMM Tuning)
Our work has been recognized with over $100 million in annual infrastructure savings, top-tier publications, and open-source contributions.
- Efficient Speculative Decoding for Llama at Scale: Challenges and Solutions, arXiv, Aug 11, 2025
- SpinQuant: LLM quantization with learned rotations, ICLR, 2025
- HadaCore: Tensor Core Accelerated Hadamard Transform Kernel, PyTorch Blog, Dec 12, 2024
- Context Parallelism for Scalable Million-Token Inference, MLSys 2025, Nov 10, 2024
- Enhancing Performance and Scalability of Large-Scale Recommendation Systems with Jagged Flash Attention, RecSys 2024, Sep 19, 2024
- The Llama 3 Herd of Models, July 31, 2024
- FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision, NeurIPS 2024 Spotlight Poster, July 12, 2024
- INT4 Decoding GQA CUDA Optimizations for LLM Inference, PyTorch Blog, June 6, 2024
- Flash-Decoding for long-context inference, Stanford Blog, Oct 13, 2023
- Accelerated Generative Diffusion Models with PyTorch 2, PyTorch Blog, April 14, 2023
- Faster, more flexible inference on GPUs using AITemplate, a revolutionary new inference engine, Meta Research Blog, Oct 3, 2022
We have contributed to open-source projects, including but not limited to: