Heterogeneous Computing

CUDA Programming Model

  • Grid, Block, Thread hierarchy
  • Shared Memory vs Global Memory optimization

Operator Development

  • Introduction to Triton
  • Custom C++ Operator binding

Hardware Acceleration

  • Tensor Core principles
  • Mixed Precision (FP16/BF16)

AI-HPC Organization