2805 Bowers Ave, Santa Clara, CA 95051 | 408-730-2275
research@colfax-intl.com

Author: Colfax Research

  • Tutorial: Python bindings for CUDA libraries in PyTorch

    Tutorial: Python bindings for CUDA libraries in PyTorch

    PyTorch today is one of the most popular AI frameworks. Developed by Meta (then Facebook) and open-sourced in 2017, it features approachable, “pythonic” interfaces. This ease-of-use makes it especially potent for research and development, where a researcher might need to go through multiple iterations of novel AI workloads that they are developing. However, developing in… Go to article…

  • Delivering 1 PFLOP/s of Performance with FP8 FlashAttention-2

    Delivering 1 PFLOP/s of Performance with FP8 FlashAttention-2

    We recently released an update to our FlashAttention-2 forward pass implementation on NVIDIA Hopper™ architecture that incorporates a number of new optimizations and improvements, including a software pipelining scheme and FP8 support. In this article, we will explain a challenge with achieving layout conformance of register fragments for WGMMA instructions that we encountered in the… Go to article…