tspeterkim / spmm.cu
Created November 8, 2025 04:05
Faster than cuSPARSE CSR SpMM Kernel
#include <cuda_runtime.h>
#include <cusparse.h>
#include <iostream>
#include <vector>
#include <random>
#define NNZ_PER_ROW 40 // static workload assumption: M=N=K=4096 with 1% of entries nonzero (uniform, "sparsity=0.01") -> 4096 * 0.01 = 40.96, rounded down to 40 nonzeros per row
#define CEIL_DIV(a, b) (((a) + (b) - 1) / (b))
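// Arguments are parenthesized so that expressions such as IS_CLOSE(x, y + z) expand correctly;
// fabs is used so the integer abs overload is never selected by accident.
#define IS_CLOSE(a, b) (fabs((a) - (b)) < 1e-5 && fabs((a) - (b)) / (fabs(a) + fabs(b) + 1e-5) < 1e-5)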