Skip to content

Instantly share code, notes, and snippets.

View ActuallyFro's full-sized avatar
💭
Paper

ActuallyFro

💭
Paper
View GitHub Profile
@ActuallyFro
ActuallyFro / settings.jsonc
Created May 15, 2023 21:47 — forked from hyperupcall/settings.jsonc
VSCode config to disable popular extensions' annoyances (telemetry, notifications, welcome pages, etc.)
// I'm tired of extensions that automatically:
// - show welcome pages / walkthroughs
// - show release notes
// - send telemetry
// - recommend things
//
// This disables all of that stuff.
// If you have more config, leave a comment so I can add it!!
{
@ActuallyFro
ActuallyFro / llama-home.md
Created May 15, 2023 04:23 — forked from rain-1/llama-home.md
How to run Llama 13B with a 6GB graphics card

This worked on 14/May/23. The instructions will probably require updating in the future.

llama is a text prediction model similar to GPT-2, and the version of GPT-3 that has not been fine tuned yet. It is also possible to run fine tuned versions (like alpaca or vicuna with this. I think. Those versions are more focused on answering questions)

Note: I have been told that this does not support multiple GPUs. It can only use a single GPU.

It is possible to run LLama 13B with a 6GB graphics card now! (e.g. a RTX 2060). Thanks to the amazing work involved in llama.cpp. The latest change is CUDA/cuBLAS which allows you pick an arbitrary number of the transformer layers to be run on the GPU. This is perfect for low VRAM.

  • Clone llama.cpp from git, I am on commit 08737ef720f0510c7ec2aa84d7f70c691073c35d.