@vanbasten23
Created October 31, 2025 18:07
# Load the Qwen2.5-3B-Instruct base model, apply a local LoRA adapter,
# and run a short generation to sanity-check the fine-tuned weights.
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

base = "Qwen/Qwen2.5-3B-Instruct"
adapter = "./lora-1plus1-666"

tok = AutoTokenizer.from_pretrained(base)

# Load the base weights in bfloat16; use the GPU if one is available.
device = "cuda" if torch.cuda.is_available() else "cpu"
m = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16).to(device)

# Wrap the base model with the LoRA adapter weights.
m = PeftModel.from_pretrained(m, adapter)

prompt = "What is 1+1?\n"
ids = tok(prompt, return_tensors="pt").to(m.device)
out = m.generate(**ids, max_new_tokens=4)
print(tok.decode(out[0], skip_special_tokens=True))
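
# For deployment, the adapter can optionally be folded into the base weights so
# inference no longer goes through the PEFT wrapper. A minimal sketch using
# PEFT's merge_and_unload, assuming the same base and adapter paths as above
# (the output directory name is illustrative):

from transformers import AutoModelForCausalLM
from peft import PeftModel
import torch

base = "Qwen/Qwen2.5-3B-Instruct"
adapter = "./lora-1plus1-666"

m = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16)
m = PeftModel.from_pretrained(m, adapter)

# Fold the LoRA deltas into the base Linear layers and drop the PEFT wrapper.
merged = m.merge_and_unload()

# The merged checkpoint can later be loaded with plain transformers,
# without peft installed.
merged.save_pretrained("./qwen2.5-3b-1plus1-merged")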