Bram Wasti bwasti

from vllm import LLM, SamplingParams
# Setup model (prefix caching disabled)
llm = LLM(model="Qwen/Qwen3-1.7B", enable_prefix_caching=False, dtype="bfloat16")
prompt = "Ok, this is an extremely long story. There once was a "
params = SamplingParams(temperature=0.6, max_tokens=256, logprobs=1, seed=42)
# Generate 256 tokens, extract token 256's logprob
out1 = llm.generate([prompt], params)
tokens = out1[0].outputs[0].token_ids
#!/usr/bin/env python3
"""
Real-time GPU Process Monitor with TensorCore Inference
Monitors all GPU processes and infers TensorCore usage based on workload patterns
"""
import subprocess
import json
import time
import psutil
bwasti / bleh.md
Created September 18, 2025 13:50
Screenshot 2025-09-18 at 9 49 48 AM
bwasti / test.md
Created September 17, 2025 23:21
Screenshot 2025-09-16 at 12 49 49 PM
bwasti / images.md
Last active September 16, 2025 16:10
Screenshot 2025-09-16 at 11 36 48 AM
#!/usr/bin/env python3
"""
OpenAI Prediction API Benchmark Tool
Benchmarks latency and throughput for the OpenAI Completions API with prediction functionality.
Supports custom endpoints (e.g., localhost:8000) for testing vLLM implementations.
"""
import asyncio
import time
# This is a test (not implementation) of the impact bucketMul has on matrix multiplications
# https://kolinko.github.io/effort/bucketmul.html
import torch
import torch.nn.functional as F
import math
torch.manual_seed(1337)
B = 2
N = 8
M = 16
bwasti / bun_sqlite.prompt.md
Last active September 17, 2023 18:50
Convert `sqlite3` to `bun:sqlite` ChatGPT prompt

Here's the API interface to bun:sqlite,

class Database {
  constructor(
    filename: string,
    options?:
      | number
      | {
          readonly?: boolean;
import time
import multiprocessing


def test_lock(lock, iterations, shared_value):
    for _ in range(iterations):
        with lock:
            shared_value.value += 1


def benchmark(lock_type, num_processes, iterations_per_process):
    shared_value = multiprocessing.Value('i', 0)
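
The preview above is cut off after the shared counter is created, so the rest of the gist is not known. As a rough sketch only, a lock benchmark of this shape could spawn the workers, time them, and return the final count. The names `increment_under_lock` and `run_lock_benchmark` are hypothetical, the `lock_type` parameter from the preview is left out because its meaning is not shown, and the code assumes a POSIX system where the `fork` start method is available.

```python
import multiprocessing
import time

# Assumption: POSIX system. "fork" lets workers inherit module state without re-importing it.
_ctx = multiprocessing.get_context("fork")


def increment_under_lock(lock, iterations, shared_value):
    """Increment the shared counter `iterations` times, holding the lock each time."""
    for _ in range(iterations):
        with lock:
            shared_value.value += 1


def run_lock_benchmark(num_processes, iterations_per_process):
    """Hypothetical driver: returns (final counter value, elapsed seconds)."""
    shared_value = _ctx.Value("i", 0)
    lock = _ctx.Lock()
    workers = [
        _ctx.Process(
            target=increment_under_lock,
            args=(lock, iterations_per_process, shared_value),
        )
        for _ in range(num_processes)
    ]

    start = time.perf_counter()
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    elapsed = time.perf_counter() - start

    return shared_value.value, elapsed
```

Calling `run_lock_benchmark(4, 10_000)` would return the final counter and the wall-clock time. Because every increment happens under the lock, the counter should equal `num_processes * iterations_per_process`, which doubles as a correctness check on the lock being measured.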