Autonomous CUDA kernel optimization for the fused 1024x2 dual persistent WaveGRU generation kernel.
Goal: maximize throughput_sps (samples per second) while maintaining correctness, meaning the kernel must still pass the full verification path.
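As a rough illustration of the objective above, the following Python sketch computes a samples-per-second figure and reports it only when a correctness check passes. The names `measure_throughput_sps`, `run_kernel`, and `verify` are hypothetical stand-ins and are not part of the original harness; the real kernel and its verification path are not shown in this document.

```python
import time
from typing import Callable, Optional, Sequence


def measure_throughput_sps(
    run_kernel: Callable[[int], Sequence[float]],
    verify: Callable[[Sequence[float]], bool],
    num_samples: int,
    clock: Callable[[], float] = time.perf_counter,
) -> Optional[float]:
    """Return samples per second, or None if the output fails verification.

    run_kernel and verify are hypothetical callbacks: the first generates
    num_samples samples, the second stands in for the full verification path.
    """
    start = clock()
    output = run_kernel(num_samples)
    elapsed = clock() - start

    # A fast but incorrect kernel must not count as an improvement.
    if not verify(output):
        return None
    if elapsed <= 0:
        raise ValueError("elapsed time must be positive")
    return num_samples / elapsed
```

Returning `None` for a failed check keeps an incorrect run from ever being compared against a valid throughput number. The injectable `clock` argument is only there so the function can be exercised without real timing.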
The x-anthropic-billing-header is a computed system message required in every Claude Code API request. It serves as an authentication/integrity check that ties each request to the Claude Code client. Without it, requests made with OAuth tokens scoped to Claude Code are rejected with:
This credential is only authorized for use with Claude Code and cannot be used for other API requests.
You are Kimi K2.5, an AI assistant developed by Moonshot AI(月之暗面).
You possess native vision for perceiving and reasoning over images users send. You have access to a set of tools for selecting appropriate actions and interfacing with external services.
You cannot generate downloadable files; the only exception is creating data analysis charts with the ipython tool.
For file creation requests, clearly state the limitation of not being able to directly generate files. Do NOT use language that implies "refusing to assist with creation". Then redirect users to the appropriate Kimi alternatives:
| name | description | allowed-tools |
|---|---|---|
| chrome-webpage-click | Click on web page elements with visual verification. Specify the TARGET element description and INITIAL COORDINATES. The skill will iteratively adjust coordinates until the red dot is on the target, then click automatically. | mcp__claude-in-chrome__javascript_tool, mcp__claude-in-chrome__computer, mcp__claude-in-chrome__tabs_context_mcp, mcp__claude-in-chrome__read_page, mcp__claude-in-chrome__find |
This skill ensures accurate clicking by iteratively adjusting coordinates until the red dot is visually confirmed on the target element, then clicking directly.
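The adjust-then-click loop described above could be sketched as follows. This is a simplified model and not the skill's actual implementation: `locate_offset` is a hypothetical callback standing in for the visual check (it reports how far the red dot is from the target), and `click` is a hypothetical stand-in for the browser tool call.

```python
from typing import Callable, Tuple

Point = Tuple[int, int]


def click_with_verification(
    initial: Point,
    locate_offset: Callable[[Point], Point],
    click: Callable[[Point], None],
    max_iterations: int = 10,
    tolerance: int = 0,
) -> Point:
    """Nudge the coordinates until the dot is on the target, then click.

    locate_offset(point) returns the estimated (dx, dy) from the red dot
    drawn at `point` to the target element (hypothetical visual check).
    """
    x, y = initial
    for _ in range(max_iterations):
        dx, dy = locate_offset((x, y))
        if abs(dx) <= tolerance and abs(dy) <= tolerance:
            # The dot is confirmed on the target, so click exactly here.
            click((x, y))
            return (x, y)
        # Otherwise move the dot by the estimated offset and re-check.
        x, y = x + dx, y + dy
    raise RuntimeError("red dot did not reach the target")
```

Bounding the loop with `max_iterations` is a design choice made for this sketch, so that a target that can never be confirmed produces an error instead of a click at unverified coordinates.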
// All-Gather using Cooperative Groups grid.sync() with vectorized memory access
// RTX 5090: 170 SMs, 1 block per SM, 16 bytes (uint4) per SM to share
// Persistent kernel: multiple rounds of all-gather, each with different buffer
#include <cuda_runtime.h>
#include <cooperative_groups.h>
#include <stdio.h>
#include <climits>
namespace cg = cooperative_groups;
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>SwiGLU 2D Activation</title>
<style>
* {
    margin: 0;
    padding: 0;
"""
Benchmark matrix multiplication with locked GPU clock for stable performance.
Requires: pip install nvidia-ml-py torch numpy
"""
import pynvml
import torch
import random
import os
import numpy as np
from torch.profiler import profile, ProfilerActivity, schedule
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>AI Chess Arena - Gemini API Chess Battle</title>
<style>
body {
    font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;
    margin: 0 auto;