winapiadmin

LLM accuracy test for coding in a specific field, but each model are only tested with up to 4 model sizes.

Prompt (all models):

implement a whole chess engine in python with python-chess

Outputs (reasoning discarded) for DeepSeek:

To implement a chess engine using Python and the Python chess library, follow these steps:

Install Dependencies

	for (size_t i = 0; i < count; ++i)
	{
	if (insn[i].bytes[0] == 0x90) {
	continue;
	}

	if (addresses.find(insn[i].address) != addresses.end()) {
	printf("%s:\n", addresses[insn[i].address]);
	}


	for (size_t i = 0; i < count; ++i)
	{
	if (insn[i].bytes[0]==0x90) continue; // padding
	if (insn[i].bytes[0]==0x00&&insn[i].bytes[1]==0x00) continue; // padding
	if (insn[i].bytes[0]==0xCC) continue; // never jumps to that part
	char* endptr;
	ULONGLONG addr;
	char* str=(char*)malloc(21);
	switch(insn[i].bytes[0])