Author: Reuben Stern
A User’s Guide to FlexAttention in FlashAttention CuTe DSL
Many variants of attention (Vaswani et al., 2017) have become popular in recent years, for reasons related to both performance and model quality. The PyTorch team at Meta recognized that most of these variants can be unified under one elegant framework, dubbed FlexAttention (Guessous et al., 2024).
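To make the unification idea concrete, here is a minimal sketch of FlexAttention's central abstraction: a user-supplied `score_mod` callback that edits each raw attention logit before the softmax. This is plain Python for illustration only, not the actual PyTorch `flex_attention` API; the function names and signatures here are my own assumptions.

```python
import math

def attention(q, k, v, score_mod):
    # q, k, v: lists of equal-length vectors, one per sequence position.
    # score_mod(score, q_idx, kv_idx) edits each raw logit before softmax --
    # the core FlexAttention idea, sketched in pure Python (hypothetical API).
    out = []
    for i, qi in enumerate(q):
        scores = [score_mod(sum(a * b for a, b in zip(qi, kj)), i, j)
                  for j, kj in enumerate(k)]
        m = max(scores)                       # subtract max for stability
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        out.append([sum(w * vj[d] for w, vj in zip(weights, v))
                    for d in range(len(v[0]))])
    return out

def causal(score, q_idx, kv_idx):
    # Causal masking expressed as a score modification: future positions
    # get -inf, so their softmax weight becomes zero.
    return score if kv_idx <= q_idx else float("-inf")
```

Many attention variants then reduce to swapping in a different `score_mod` (e.g. ALiBi adds a distance-based bias, sliding-window attention masks far positions) while the surrounding kernel stays unchanged.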

