Roy Hvaara hvaara

This is Felix Kuehling, long time KFD driver architect. I started looking into the TinyGrad source code yesterday, focusing on ops_kfd.py, ops_hsa.py and driver/hsa.py, to understand how TinyGrad talks to our HW and help with the ongoing debugging effort from the top down. This analysis is based on this commit: https://github.com/tinygrad/tinygrad/tree/3de855ea50d72238deac14fc05cda2a611497778

I'm intrigued by the use of Python for low-level programming. I think I can learn something from your use of ctypes and clang2py for fast prototyping and test development. I want to share some observations based on my initial review.

ops_kfd looks pretty new, and I see many problems with it based on my long experience working on KFD. I think it's interesting, but probably not relevant for the most pressing problems at hand, so I'll cover that last.

ops_hsa uses ROCr APIs to manage GPU memory, create a user-mode AQL queue for GPU kernel dispatch, perform async SDMA copies, and synchronize via signals and barrier packets.
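For readers new to AQL, the barrier packets mentioned above are fixed 64-byte records in the queue's ring buffer. The sketch below mirrors the `hsa_barrier_and_packet_t` layout and header bit positions from the HSA runtime specification using ctypes. It is not taken from tinygrad's ops_hsa.py, the Python constant names simply copy the hsa.h spellings, and it only builds a packet in memory rather than submitting it to a real queue.

```python
import ctypes

# Values from the HSA runtime spec (hsa.h); Python names are assumed spellings.
HSA_PACKET_TYPE_BARRIER_AND = 3
HSA_PACKET_HEADER_TYPE = 0                    # bits 0-7: packet type
HSA_PACKET_HEADER_BARRIER = 8                 # bit 8: wait for preceding packets to complete
HSA_PACKET_HEADER_SCACQUIRE_FENCE_SCOPE = 9   # bits 9-10: acquire fence scope
HSA_PACKET_HEADER_SCRELEASE_FENCE_SCOPE = 11  # bits 11-12: release fence scope
HSA_FENCE_SCOPE_SYSTEM = 2


class BarrierAndPacket(ctypes.Structure):
    """64-byte AQL barrier-AND packet, following hsa_barrier_and_packet_t."""
    _fields_ = [
        ("header", ctypes.c_uint16),
        ("reserved0", ctypes.c_uint16),
        ("reserved1", ctypes.c_uint32),
        ("dep_signal", ctypes.c_uint64 * 5),     # signal handles to wait on (0 = unused slot)
        ("reserved2", ctypes.c_uint64),
        ("completion_signal", ctypes.c_uint64),  # signal decremented when the barrier completes
    ]


def barrier_and_header(barrier=True, scope=HSA_FENCE_SCOPE_SYSTEM):
    """Pack the 16-bit AQL header for a barrier-AND packet."""
    return ((HSA_PACKET_TYPE_BARRIER_AND << HSA_PACKET_HEADER_TYPE)
            | (int(barrier) << HSA_PACKET_HEADER_BARRIER)
            | (scope << HSA_PACKET_HEADER_SCACQUIRE_FENCE_SCOPE)
            | (scope << HSA_PACKET_HEADER_SCRELEASE_FENCE_SCOPE))


def make_barrier_and(dep_handles, completion_handle=0):
    """Fill a barrier-AND packet that waits until all dep_handles reach zero."""
    if len(dep_handles) > 5:
        raise ValueError("a barrier-AND packet holds at most 5 dependency signals")
    pkt = BarrierAndPacket()
    for i, handle in enumerate(dep_handles):
        pkt.dep_signal[i] = handle
    pkt.completion_signal = completion_handle
    # In a real queue the header is stored last, atomically and with release
    # semantics, so the packet processor never sees a half-written packet.
    # Here we only set it on an in-memory struct.
    pkt.header = barrier_and_header()
    return pkt


pkt = make_barrier_and([0x1000, 0x2000], completion_handle=0x3000)
print(ctypes.sizeof(pkt), hex(pkt.header))
```

Using ctypes keeps the layout byte-exact, which matches the ctypes-based approach the TinyGrad code takes for talking to the runtime.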

If anyone is interested in setting up their system to automatically (or manually) sign their git commits with their GPG key, here are the steps:

Generate a GPG key and add it to GitHub.
$ git config --global commit.gpgsign true ([OPTIONAL] every commit will now be signed)
$ git config --global user.signingkey ABCDEF01 (where ABCDEF01 is the ID or fingerprint of the key to use)
$ git config --global alias.logs "log --show-signature" (now available as $ git logs)
$ git config --global alias.cis "commit -S" (only needed if global signing is disabled)
$ echo "Some content" >> example.txt
$ git add example.txt
$ git cis -m "This commit is signed by a GPG key." (regular commit will work if global signing is enabled)
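The same configuration can be scripted, which helps when setting up several machines. This is a minimal Python sketch that only wraps the `git config` calls listed above. The helper names `gpg_signing_commands` and `run` are hypothetical, and with `dry_run=True` (the default) nothing is executed; the shell commands are just returned for review.

```python
import shlex
import subprocess


def gpg_signing_commands(key_id, sign_all=True):
    """The git config calls from the steps above, as argv lists (hypothetical helper)."""
    cmds = []
    if sign_all:  # optional: sign every commit by default
        cmds.append(["git", "config", "--global", "commit.gpgsign", "true"])
    cmds += [
        ["git", "config", "--global", "user.signingkey", key_id],
        ["git", "config", "--global", "alias.logs", "log --show-signature"],
        ["git", "config", "--global", "alias.cis", "commit -S"],
    ]
    return cmds


def run(cmds, dry_run=True):
    """Return the commands as shell strings; execute them only if dry_run is False."""
    if not dry_run:
        for cmd in cmds:
            subprocess.run(cmd, check=True)
    return [shlex.join(cmd) for cmd in cmds]


for line in run(gpg_signing_commands("ABCDEF01")):
    print(line)
```

Passing argv lists to `subprocess.run` avoids shell-quoting problems with multi-word alias values such as `log --show-signature`.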

Download and install VirtualBox.
Download the CoreOS ISO.
Create a new VM in VirtualBox.
- For the OS type, "Other Linux (64-bit)" should be fine.
- Give the VM 1 GB of memory, like your physical hardware has.
- Create a disk of whatever size you want. I made a VMDK file that can expand dynamically up to 8 GB.
Mount the ISO in the VM.
- Right-click the VM and click Settings.
- Go to the Storage tab.
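The GUI steps above roughly correspond to the VBoxManage calls in this sketch. It assumes VirtualBox 5 or later with `VBoxManage` on the PATH; the helper names, the controller name `SATA`, and the ISO path are hypothetical placeholders. By default it only returns the commands so they can be checked before running.

```python
import os
import shlex
import subprocess


def coreos_vm_commands(name, iso_path, mem_mb=1024, disk_mb=8192):
    """Build VBoxManage argv lists for a 64-bit 'Other Linux' VM (hypothetical helper)."""
    disk = os.path.abspath(f"{name}.vmdk")  # absolute path avoids ambiguity about where it lands
    vbox = "VBoxManage"
    return [
        # Register a new VM with the "Other Linux (64-bit)" OS type
        [vbox, "createvm", "--name", name, "--ostype", "Linux_64", "--register"],
        # 1 GB of RAM by default, matching the physical machine in the steps above
        [vbox, "modifyvm", name, "--memory", str(mem_mb)],
        # "Standard" variant = dynamically allocated, growing up to disk_mb megabytes
        [vbox, "createmedium", "disk", "--filename", disk, "--size", str(disk_mb),
         "--format", "VMDK", "--variant", "Standard"],
        # One SATA controller with two ports: disk on port 0, ISO on port 1
        [vbox, "storagectl", name, "--name", "SATA", "--add", "sata", "--portcount", "2"],
        [vbox, "storageattach", name, "--storagectl", "SATA", "--port", "0",
         "--device", "0", "--type", "hdd", "--medium", disk],
        [vbox, "storageattach", name, "--storagectl", "SATA", "--port", "1",
         "--device", "0", "--type", "dvddrive", "--medium", iso_path],
    ]


def run(cmds, dry_run=True):
    """Return the commands as shell strings; execute them only if dry_run is False."""
    if not dry_run:
        for cmd in cmds:
            subprocess.run(cmd, check=True)
    return [shlex.join(cmd) for cmd in cmds]


for line in run(coreos_vm_commands("coreos", "coreos.iso")):
    print(line)
```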

	import os
	import mlx.core as mx
	from mlx_lm import load, generate

	filename = os.path.join(os.path.dirname(mx.__file__), "core/__init__.pyi")
	with open(filename, 'r') as fid:
		prompt = fid.read()
	prompt += "\nHow do you write a self-attention layer using the above API in MLX?"

	model, tokenizer = load("mlx-community/meta-Llama-3.1-8B-Instruct-4bit")

	#!/usr/bin/env python
	"""
	Calculate the KL divergence between two models' output logits on a data set.
	First, run the program with the fp16 model, a text path, and a write path:
	./llama_kl.py -m <fp16 model> -t <wiki.test.raw> -w <logits.gz>
	This writes the logits to a file. Then run the program with the quantized model and the read path:
	./llama_kl.py -m <quantized model> -r <logits.gz>
	The KL divergence relative to the first run is calculated.
	See ./llama_kl.py --help for more options.
	"""

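The docstring does not spell out the metric itself. As a rough illustration, assume the script compares, token by token, the softmax distribution P of the fp16 model with the distribution Q of the quantized model and averages KL(P || Q) = Σ p_i (log p_i − log q_i) over all positions. That is an assumption about llama_kl.py, not confirmed from its source. A pure-Python version of that computation:

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax of one row of logits."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def kl_divergence(ref_logits, test_logits):
    """KL(P || Q) for one token, with P = softmax(ref_logits), Q = softmax(test_logits)."""
    logp = log_softmax(ref_logits)
    logq = log_softmax(test_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(logp, logq))


def mean_kl(ref_rows, test_rows):
    """Average per-token KL over a data set (one logits row per token position)."""
    if len(ref_rows) != len(test_rows) or not ref_rows:
        raise ValueError("need the same, non-zero number of token positions from both runs")
    return sum(kl_divergence(r, t) for r, t in zip(ref_rows, test_rows)) / len(ref_rows)


# Toy example: P = [0.5, 0.5], Q = [0.25, 0.75], so KL = 0.5 * ln(4/3) ≈ 0.1438
print(kl_divergence([0.0, 0.0], [0.0, math.log(3.0)]))
```

Working in log space from the raw logits avoids taking the log of tiny probabilities, which matters for large vocabularies where most entries are close to zero.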
	//
	// main.swift
	// CalculateDiffusion
	//
	// Created by Philip Turner on 6/2/23.
	//

	import Foundation
	import QuartzCore
	import MetalPerformanceShadersGraph

	ssh-keygen -t rsa -b 4096 -m PEM -f jwtRS256.key
	# Don't add a passphrase
	openssl rsa -in jwtRS256.key -pubout -outform PEM -out jwtRS256.key.pub
	cat jwtRS256.key
	cat jwtRS256.key.pub

	# delete local tag '12345'
	git tag -d 12345
	# delete remote tag '12345' (e.g. the copy on GitHub as well)
	git push origin :refs/tags/12345
	# alternative approach
	git push --delete origin tagName
	git tag -d tagName
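Both forms above delete the remote tag. One detail worth knowing: the fully qualified `refs/tags/...` refspec cannot be confused with a branch of the same name, whereas `git push --delete origin tagName` relies on git resolving the short name and may be rejected as ambiguous if such a branch exists. A small Python sketch using the fully qualified form; the helper names are hypothetical, and nothing runs unless `dry_run=False`:

```python
import shlex
import subprocess


def delete_tag_commands(tag, remote="origin"):
    """Commands to remove a tag locally and on the remote (hypothetical helper)."""
    return [
        ["git", "tag", "-d", tag],                     # delete the local tag
        ["git", "push", remote, f":refs/tags/{tag}"],  # push an empty source to the tag ref, deleting it
    ]


def run(cmds, dry_run=True):
    """Return the commands as shell strings; execute them only if dry_run is False."""
    if not dry_run:
        for cmd in cmds:
            subprocess.run(cmd, check=True)
    return [shlex.join(cmd) for cmd in cmds]


print(run(delete_tag_commands("12345")))
```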