2805 Bowers Ave, Santa Clara, CA 95051 | 408-730-2275
research@colfax-intl.com

Author: Colfax Research

  • Installing Ubuntu 22.04 LTS over the Network on Servers with the NVIDIA® Grace Hopper™ Superchip

    Installing Ubuntu 22.04 LTS over the Network on Servers with the NVIDIA® Grace Hopper™ Superchip

    Grace™, NVIDIA’s first datacenter CPU, is a new choice of platform available for datacenter, CPU and HPC applications. The common property of these new NVIDIA Superchips is the Arm® architecture. This post reports on our experience provisioning the Ubuntu 22.04 LTS operating system (OS) on servers based on the NVIDIA Grace Hopper Superchip over the… Go to article…

  • Tutorial: Python bindings for CUDA libraries in PyTorch

    Tutorial: Python bindings for CUDA libraries in PyTorch

    PyTorch today is one of the most popular AI frameworks. Developed by Meta (then Facebook) and open-sourced in 2017, it features approachable, “pythonic” interfaces. This ease-of-use makes it especially potent for research and development, where a researcher might need to go through multiple iterations of novel AI workloads that they are developing. However, developing in… Go to article…

  • Delivering 1 PFLOP/s of Performance with FP8 FlashAttention-2

    Delivering 1 PFLOP/s of Performance with FP8 FlashAttention-2

    We recently released an update to our FlashAttention-2 forward pass implementation on NVIDIA Hopper™ architecture that incorporates a number of new optimizations and improvements, including a software pipelining scheme and FP8 support. In this article, we will explain a challenge with achieving layout conformance of register fragments for WGMMA instructions that we encountered in the… Go to article…