
GPU Mode: CUTLASS and FlashAttention-3

In this GPU Mode lecture, Jay Shah presents his joint work on FlashAttention-3 and shows how to implement the algorithm's main compute loop using CUTLASS.

The code discussed in this lecture can be found at this commit in the FlashAttention-3 codebase.

cutlass-flashattn3-slides

Note: Slides adapted from a talk given by Tri Dao.


Posted November 18, 2024 in Deep Learning, Video