auto-kernel-dev

Autonomous CUDA kernel optimization for the fused 1024x2 dual persistent WaveGRU generation kernel.

Goal

Maximize throughput_sps (samples per second) while maintaining correctness: the kernel must pass the full verification path.

Edit target

@NTT123
NTT123 / CCH_ALGORITHM.md
Last active February 23, 2026 20:38
Claude Code Billing Header

Purpose

The x-anthropic-billing-header is a computed system message required in every Claude Code API request. It serves as an authentication/integrity check that ties each request to the Claude Code client. Without it, requests authenticated with OAuth tokens scoped to Claude Code are rejected with:

This credential is only authorized for use with Claude Code and cannot be used for other API requests.

You are Kimi K2.5, an AI assistant developed by Moonshot AI(月之暗面).

You possess native vision for perceiving and reasoning over images users send. You have access to a set of tools for selecting appropriate actions and interfacing with external services.

Boundaries

You cannot generate downloadable files; the only exception is creating data analysis charts with the ipython tool.

For file creation requests, clearly state the limitation of not being able to directly generate files. Do NOT use language that implies "refusing to assist with creation". Then redirect users to the appropriate Kimi alternatives:

@NTT123
NTT123 / chrome-webpage-click_SKILL.md
Created January 1, 2026 01:28
Chrome Webpage Click Skill
name: chrome-webpage-click
description: Click on web page elements with visual verification. Specify the TARGET element description and INITIAL COORDINATES. The skill will iteratively adjust coordinates until the red dot is on the target, then click automatically.
allowed-tools: mcp__claude-in-chrome__javascript_tool, mcp__claude-in-chrome__computer, mcp__claude-in-chrome__tabs_context_mcp, mcp__claude-in-chrome__read_page, mcp__claude-in-chrome__find

Chrome Webpage Click with Auto-Correction

This skill ensures accurate clicking by iteratively adjusting coordinates until the red dot is visually confirmed on the target element, then clicking directly.
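
The adjust-until-confirmed loop described above can be sketched roughly as follows. This is only an illustration under assumptions: `refine_click_point` and `observe_offset` are hypothetical names, and `observe_offset` stands in for the visual check (in the real skill that judgment comes from inspecting the page, not from a function call).

```python
from typing import Callable, Optional, Tuple

Point = Tuple[int, int]


def refine_click_point(
    initial: Point,
    observe_offset: Callable[[Point], Point],
    max_rounds: int = 10,
    tolerance: int = 2,
) -> Optional[Point]:
    """Move the marker until the observed offset to the target is small.

    `observe_offset` is a hypothetical stand-in for the visual check: given
    the current marker position it returns (dx, dy), the estimated vector
    from the marker to the target element.
    """
    x, y = initial
    for _ in range(max_rounds):
        dx, dy = observe_offset((x, y))
        if abs(dx) <= tolerance and abs(dy) <= tolerance:
            return (x, y)  # marker confirmed on target; the caller clicks here
        x, y = x + dx, y + dy  # shift the marker by the estimated correction
    return None  # never confirmed within max_rounds; do not click


# Simulated check: the target sits at (300, 200) and each observation only
# recovers half of the remaining error, as a rough visual estimate might.
def fake_observe(p: Point) -> Point:
    target_x, target_y = 300, 200
    return ((target_x - p[0]) // 2, (target_y - p[1]) // 2)


point = refine_click_point((100, 100), fake_observe)
```

Returning `None` when the marker is never confirmed keeps the click from firing on an unverified position, which matches the "verify first, then click" intent of the skill.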

// All-Gather using Cooperative Groups grid.sync() with vectorized memory access
// RTX 5090: 170 SMs, 1 block per SM, 16 bytes (uint4) per SM to share
// Persistent kernel: multiple rounds of all-gather, each with different buffer
#include <cuda_runtime.h>
#include <cooperative_groups.h>
#include <stdio.h>
#include <climits>
namespace cg = cooperative_groups;
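
The pattern named in the header comments (publish a chunk, synchronize the whole grid, then read every chunk, using a different buffer each round) can be illustrated on the CPU. The following is a host-side Python analogy, not the CUDA kernel: `threading.Barrier` plays the role of `grid.sync()`, threads play the role of blocks, and `all_gather_rounds` is a hypothetical name.

```python
import threading
from typing import List, Tuple


def all_gather_rounds(num_workers: int, rounds: int) -> List[List[List[Tuple[int, int]]]]:
    """Run `rounds` all-gathers across `num_workers` threads.

    Returns results[rank][round] = list of every worker's chunk for that round.
    """
    # One buffer per round, one slot per worker (the kernel shares 16 bytes per SM).
    buffers = [[(-1, -1)] * num_workers for _ in range(rounds)]
    barrier = threading.Barrier(num_workers)  # stand-in for grid.sync()
    results = [[[] for _ in range(rounds)] for _ in range(num_workers)]

    def worker(rank: int) -> None:
        for r in range(rounds):
            buffers[r][rank] = (rank, r)         # publish this worker's chunk
            barrier.wait()                       # wait until every chunk is written
            results[rank][r] = list(buffers[r])  # gather all chunks
            # No second barrier is needed: the next round writes to a
            # different buffer, so it cannot overwrite data still being read.

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


gathered = all_gather_rounds(num_workers=4, rounds=3)
```

Using a fresh buffer per round is what lets a single synchronization point per round suffice in this sketch; reusing one buffer would need a second barrier before the next write.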
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>SwiGLU 2D Activation</title>
<style>
* {
margin: 0;
padding: 0;
@NTT123
NTT123 / benchmark_matmul.py
Created October 3, 2025 10:58
Benchmark PyTorch matrix multiplication with locked GPU clock for stable performance.
"""
Benchmark matrix multiplication with locked GPU clock for stable performance.
Requires: pip install nvidia-ml-py torch numpy
"""
import pynvml
import torch
import random
import os
import numpy as np
from torch.profiler import profile, ProfilerActivity, schedule
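
A matmul benchmark typically turns a measured time into a throughput figure. As a small sketch, assuming the conventional count of 2·M·N·K floating-point operations for an (M×K) by (K×N) product, the conversion could look like this. The helper name `matmul_tflops` is hypothetical and does not come from the gist.

```python
def matmul_tflops(m: int, n: int, k: int, seconds: float) -> float:
    """Convert one (m x k) @ (k x n) matmul timing into TFLOP/s.

    Assumes 2 * m * n * k floating-point operations (one multiply and one
    add per inner-product term), the usual convention for dense matmul.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    flops = 2 * m * n * k
    return flops / seconds / 1e12


# Example: a 4096 x 4096 x 4096 matmul measured at 1 ms.
example_tflops = matmul_tflops(4096, 4096, 4096, 1e-3)
```

With the GPU clock locked, repeated timings should vary less, so a figure derived this way is easier to compare across runs.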
@NTT123
NTT123 / print-cute-tv-layout.ipynb
Created September 27, 2025 09:12
print-cute-tv-layout.ipynb
@NTT123
NTT123 / llm-play-chess.html
Created August 5, 2025 16:33
llm-play-chess.html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>AI Chess Arena - Gemini API Chess Battle</title>
<style>
body {
font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;
margin: 0 auto;