Allan MacKinnon allanmac

  • Dispatch3 Inc.
  • South Florida, USA
  • X @pixelio
@allanmac
allanmac / vulkaninfo_S905X5M_Mali_G310.txt
Created October 19, 2025 16:36
vulkaninfo ODROID C5 S905X5M Mali G310
==========
VULKANINFO
==========
Vulkan Instance Version: 1.4.328
Instance Extensions: count = 14
===============================
VK_EXT_debug_report : extension revision 10
allanmac / vkpeak_S905X5M_Mali_G310.txt
Created October 19, 2025 16:34
vkpeak ODROID C5 S905X5M Mali G310
device = Mali-G310
fp32-scalar = 52.58 GFLOPS
fp32-vec4 = 44.02 GFLOPS
fp16-scalar = 52.00 GFLOPS
fp16-vec4 = 99.07 GFLOPS
fp16-matrix = 0.00 GFLOPS
fp64-scalar = 0.00 GFLOPS
allanmac / vulkaninfo_Mali_G610.txt
Last active September 9, 2025 01:43
Radxa Rock 5B (Rockchip 3588) Vulkan vulkaninfo
Captured on a Weston Wayland compositor.
---------------------------
vulkanCapsViewer data here: https://vulkan.gpuinfo.org/displayreport.php?id=41734
---------------------------
$ vulkaninfo
arm_release_ver: g24p0-00eac0, rk_so_ver: 10
allanmac / vkpeak_RK3588_Mali_G610.txt
Last active September 8, 2025 23:46
Radxa Rock 5B (Rockchip 3588) Vulkan vkpeak benchmark
$ ./vkpeak 0
arm_release_ver: g24p0-00eac0, rk_so_ver: 10
device = Mali-G610
arm_release_ver: g24p0-00eac0, rk_so_ver: 10
fp32-scalar = 467.75 GFLOPS
fp32-vec4 = 496.48 GFLOPS
fp16-scalar = 470.50 GFLOPS
fp16-vec4 = 977.37 GFLOPS
allanmac / cub_sort.cu
Last active June 17, 2022 17:29
Benchmark CUB Radix Sort with uniformly random data
//
// Build:
//
// nvcc -lcurand --generate-code arch=compute_50,code=compute_50 --generate-code arch=compute_75,code=compute_75 -D CUB_SORT_TYPE=uint32_t -o sort_cub_32 cub_sort.cu
// nvcc -lcurand --generate-code arch=compute_50,code=compute_50 --generate-code arch=compute_75,code=compute_75 -D CUB_SORT_TYPE=uint64_t -o sort_cub_64 cub_sort.cu
//
#define THRUST_IGNORE_CUB_VERSION_CHECK
#include <curand.h>
allanmac / sort.cu
Last active August 15, 2018 16:54
CUB Radix Sort benchmark
// -*- compile-command: "nvcc -I ../cub-1.8.0 -lcurand -arch sm_50 -o sort sort.cu"; -*-
#include <curand.h>
#include <cub/cub.cuh>
//
//
//
#include <stdbool.h>
allanmac / warp_scan.cu
Created August 6, 2016 18:48
Inclusive vs. exclusive warp scan
#include <stdio.h>
#include <stdint.h>
#define WARP_SIZE 32
//
//
//
allanmac / README.md
Last active June 10, 2023 11:11
Macros for neatly error checking OpenCL API functions.

Simply adding two parentheses cl(...) gives you error checking for OpenCL API functions that return a cl_int error code.

The second macro, cl_ok(err), is for error checking API functions that return their error code through an argument.

The header also includes a useful function for converting OpenCL errors to strings:

char const * clGetErrorString(cl_int const err);
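
As a rough illustration of the two macros described above, here is a minimal sketch of how such wrappers could be written. It is not the header's actual code. It assumes the `cl(...)` macro works by token-pasting the `cl` prefix onto the call, and it uses stand-in definitions (`cl_int`, `CL_SUCCESS`) so it compiles without the OpenCL headers. The functions `clFakeReturnsCode` and `clFakeCreates`, the counter `failures_seen`, and the helper `cl_assert` are hypothetical names introduced only for this example. The sketch logs errors instead of aborting so that the demo can continue past a failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-ins (assumptions) so the sketch builds without <CL/cl.h>. */
typedef int cl_int;
#define CL_SUCCESS 0

/* Hypothetical counter so the demo can observe how many errors were reported. */
static int failures_seen = 0;

/* Hypothetical checker: reports a non-success code with its source location. */
static cl_int cl_assert(cl_int code, char const *file, int line, bool abort_on_error)
{
  if (code != CL_SUCCESS) {
    fprintf(stderr, "\"%s\", line %d: OpenCL error %d\n", file, line, (int)code);
    failures_seen++;
    if (abort_on_error)
      exit(EXIT_FAILURE);
  }
  return code;
}

/* cl(Foo(args)) expands to a checked call of clFoo(args): the "cl" prefix is
   pasted onto the first token inside the parentheses. */
#define cl(...)    cl_assert((cl##__VA_ARGS__), __FILE__, __LINE__, false)

/* cl_ok(err) checks a code that an API function wrote through an argument. */
#define cl_ok(err) cl_assert((err), __FILE__, __LINE__, false)

/* Hypothetical API-style function that returns its error code. */
static cl_int clFakeReturnsCode(int fail)
{
  return fail ? -5 : CL_SUCCESS; /* -5 is an arbitrary non-success code */
}

/* Hypothetical API-style function that reports its error code via an argument. */
static int clFakeCreates(int fail, cl_int *err)
{
  *err = fail ? -6 : CL_SUCCESS; /* -6 is an arbitrary non-success code */
  return fail ? 0 : 42;
}

/* Exercises both macros: two successful calls, then one failing call. */
int cl_macro_demo(void)
{
  cl_int err;

  cl(FakeReturnsCode(0));              /* expands to clFakeReturnsCode(0), no report */

  int handle = clFakeCreates(0, &err); /* error code arrives through &err */
  cl_ok(err);

  cl(FakeReturnsCode(1));              /* reported once on stderr */

  assert(handle == 42);
  return handle;
}
```

With real OpenCL code the same pattern would read, for example, `cl(GetPlatformIDs(...))` for a function returning `cl_int`, and a creation call taking `&err` followed by `cl_ok(err)`.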
// -*- compile-command: "nvcc -arch sm_50 -Xptxas=-v -use_fast_math unit16v2.cu -o unit16v2"; -*-
#include <stdio.h>
#include <stdint.h>
//
//
//
#define WARP_SIZE 32
allanmac / ck_2.cu
Last active July 13, 2025 20:24
Concurrent kernel test that demonstrates _different_ kernels running concurrently. Hacked from NVIDIA's example. ck_2.cu has two kernels, each requiring half of an sm_50 multiprocessor's shared memory. Kernel "a" runs on 5 out of every 6 launches; kernel "b" runs on the remaining one. ck_6.cu has six kernels.
/*
* Copyright 1993-2015 NVIDIA Corporation. All rights reserved.
*
* Please refer to the NVIDIA end user license agreement (EULA) associated
* with this source code for terms and conditions that govern your use of
* this software. Any use, reproduction, disclosure, or distribution of
* this software and related documentation outside the terms of the EULA
* is strictly prohibited.
*
*/