Colfax, an NVIDIA Elite partner across several competencies, is a thought leader in compute performance optimization and developer enablement. Our mission is to empower developers to write highly performant CUDA kernels that take full advantage of current and next-generation NVIDIA GPUs.
To this end, our current activities include research, training and consulting on CUDA kernel optimization with a strong focus on bespoke CUTLASS kernels and AI-specific workloads. We offer the following paid services:
- A two- or three-day CUTLASS training course, delivered remotely or in person by a member of our research team.
- Custom CUDA/CUTLASS kernel development and optimization (“consulting”).
For example, we have worked with Character.AI and Augment Code to integrate FlashAttention-3 into their production environments and to build out the custom attention-kernel features their specific use cases required; see this blog post by Character.AI and this blog post by Augment Code.
Looking ahead, we’re currently writing a book on CUTLASS and advanced GPU programming that consolidates and expands on our existing tutorials.
If you would like to work with us, please contact us at services@colfax-intl.com.