Done in Proxmox 9.
- Append the following to /etc/kernel/cmdline:
iommu=pt
(pcie_acs_override=downstream,multifunction is not needed since we don't need separate IOMMU groups, and intel_iommu=on is not needed for kernels >= 6.8)
- If you have systemd-boot:
proxmox-boot-tool refresh
- nano /etc/modules-load.d/pci-pass-through.conf:
vfio
vfio_iommu_type1
vfio_pci
- update-initramfs -u -k all
- Disable the drivers that would otherwise claim the GPU, so they don't interfere with passthrough:
nano /etc/modprobe.d/nvidia-passthrough-blacklist.conf:
blacklist nouveau
blacklist nvidia*
- Reboot
- dmesg | grep -e DMAR
should return DMAR: IOMMU enabled or DMAR: Intel(R) Virtualization Technology for Directed I/O
- Create the PCI device:
Datacenter > PCI devices > Add
and select the GPU (you should see the warning "not in a separate IOMMU group, make sure this is intended.", but it doesn't matter: the only device in the IOMMU group is the GPU itself, so there is no security risk)
- Execute:
bash -c "$(curl -fsSL https://raw.githubusercontent.com/community-scripts/ProxmoxVE/main/vm/docker-vm.sh)"
- CPU: host
- Machine: q35
- BIOS: SeaBIOS (I don't know why OVMF doesn't work with q35)
- DO NOT START THE VM yet, or it will hang, or even freeze the VM (I don't know why)
- Set Display to none in options
- Start the VM
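The claim above that the GPU is alone in its IOMMU group can be checked on the host before starting the VM. This is only a minimal sketch, assuming the usual sysfs layout under /sys/kernel/iommu_groups; the function name list_iommu_groups and the 01:00.0 address in the usage comment are hypothetical.

```shell
# List every PCI device together with its IOMMU group number.
# $1: root of the IOMMU group tree (assumed default: the standard sysfs path).
list_iommu_groups() {
    root="${1:-/sys/kernel/iommu_groups}"
    for dev in "$root"/*/devices/*; do
        # Skip the unexpanded glob when no group exists (IOMMU disabled).
        [ -e "$dev" ] || continue
        group="${dev#"$root"/}"   # strip the root prefix
        group="${group%%/*}"      # keep only the group number
        printf 'group %s: %s\n' "$group" "${dev##*/}"
    done
}

# Usage on the Proxmox host (01:00.0 is a hypothetical GPU address):
#   list_iommu_groups | sort -V
#   lspci -nn -s 01:00.0
```

If the GPU shares its group only with its own functions (for example its audio function), the warning shown by Proxmox should be harmless, as noted above.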
Inside the VM:
- apt install linux-headers-$(uname -r)
- add-apt-repository contrib
- apt install -y wget
- wget https://developer.download.nvidia.com/compute/cuda/repos/debian12/x86_64/cuda-keyring_1.1-1_all.deb
- dpkg -i cuda-keyring_1.1-1_all.deb
- apt update
- apt -V install nvidia-open
(TODO: test compute-only drivers: apt -V install nvidia-driver-cuda nvidia-kernel-open-dkms)
- Reboot
- Then check if it worked:
nvidia-smi
The m720q limits its CPU power when a GPU is plugged in. Without any configuration, when the GPU is under load, the PC halts because it would have to deliver too much power (a safety measure against fire).
Warning
You need at least a 135 W or 170 W PSU to feed enough power to that tiny (beefy) PC.
Just disable BD PROCHOT (no BIOS changes needed, except that Secure Boot must be disabled; hopefully one day I'll find out how to re-enable it).
On the host (not in VM):
apt-get install msr-tools
curl -LO https://raw.githubusercontent.com/fralapo/Disable-BD-PROCHOT-on-LINUX/main/Disable_BD_PROCHOT
chmod u+x Disable_BD_PROCHOT
./Disable_BD_PROCHOT
Note
If you have soldering skills, you can instead replace the 12K OCP resistor with a 15-20K resistor, which makes the overcurrent protection less sensitive, so you no longer need to limit the GPU power.
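For reference, on Intel CPUs BD PROCHOT is commonly documented as bit 0 of MSR 0x1FC, so disabling it amounts to reading that register and writing it back with the lowest bit cleared. The sketch below shows only that bit arithmetic; it assumes the downloaded script does something equivalent (not verified here), and the function name clear_bd_prochot_bit is hypothetical.

```shell
# Clear bit 0 of a hexadecimal MSR value.
# $1: the value as printed by rdmsr (hex digits, no 0x prefix).
clear_bd_prochot_bit() {
    printf '%x\n' $(( 0x$1 & ~1 ))
}

# Usage on the host, as root, once msr-tools is installed:
#   modprobe msr
#   current=$(rdmsr 0x1FC)
#   wrmsr -a 0x1FC "0x$(clear_bd_prochot_bit "$current")"
```

For example, a register value of 2c005d becomes 2c005c. MSR writes are not kept across reboots, so the change has to be reapplied after each boot.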
The m720q only accepts 50 W max on the PCIe port, so we must make sure not to draw more, or the system will halt without any notice!
This service:
- Limits power draw to 50 watts (not enough on its own, there are still >12V spikes)
- Limits GPU clocks to 1702 MHz and memory to at most 6001 MHz (seems very stable)
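The clock cap has to be one of the pairs the driver reports as supported. As a rough sketch, the supported pairs can be shortlisted to those at or below a chosen graphics clock; this assumes nvidia-smi prints one "mem, gr" pair per line with --format=csv,noheader,nounits, and the function name filter_clock_pairs is hypothetical.

```shell
# Read "mem, gr" clock pairs (MHz, one per line) on stdin and print those whose
# graphics clock is <= $1, formatted as "mem,gr" for nvidia-smi -ac.
filter_clock_pairs() {
    awk -F', *' -v max="$1" '
        # Keep only well-formed numeric pairs at or below the cap
        # (this also drops a CSV header line if one is present).
        $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ && $2 + 0 <= max + 0 { print $1 "," $2 }
    '
}

# Usage, wherever the NVIDIA driver is installed:
#   nvidia-smi --query-supported-clocks=mem,gr --format=csv,noheader,nounits \
#       | filter_clock_pairs 1702
```

Each printed pair can be passed as-is to nvidia-smi -ac while testing stability.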
- List possible clock pairs:
nvidia-smi --query-supported-clocks=mem,gr --format=csv
- Select the best pair by starting with low clocks and increasing them little by little using
nvidia-smi -ac <mem clock>,<graphics clock>
6001,1702 is pretty stable with an RTX A2000 12GB
- Make the service:
nano /etc/systemd/system/nvidia-power-limit.service:
[Unit]
Description=NVIDIA power limitation
Wants=syslog.target
[Service]
Type=oneshot
ExecStartPre=/usr/bin/nvidia-smi -pl 50
ExecStart=/usr/bin/nvidia-smi -ac 6001,1702
[Install]
WantedBy=multi-user.target
- Enable the new service:
systemctl enable nvidia-power-limit.service
- Check CPU:
- Check GPU:
watch -n 1 'nvidia-smi --query-gpu=temperature.gpu,utilization.gpu,utilization.memory,power.draw.instant,clocks.video,clocks.gr,clocks.sm,clocks.mem --format=csv'
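To see whether the card stays under the 50 W budget during a load test, the readings can be reduced to a single peak value instead of being watched by eye. This is a sketch only: the function name peak_power is hypothetical, and one-second sampling will miss short spikes.

```shell
# Read power readings in watts (one per line) on stdin and print the peak,
# followed by OK or OVER relative to the limit given as $1.
peak_power() {
    awk -v limit="$1" '
        # Track the highest reading seen so far; non-numeric lines count as 0.
        $1 + 0 > peak { peak = $1 + 0 }
        END { printf "%.2f %s\n", peak, (peak > limit + 0 ? "OVER" : "OK") }
    '
}

# Usage: sample once per second for one minute while the GPU is under load:
#   timeout 60 nvidia-smi --query-gpu=power.draw.instant \
#       --format=csv,noheader,nounits -l 1 | peak_power 50
```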
Inside the VM:
- apt install nvidia-container-toolkit
- nvidia-ctk runtime configure --runtime=docker
- systemctl restart docker.service
- This should show your GPU:
docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
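If the container does not see the GPU, one thing worth checking is whether the runtime was actually registered with Docker. The sketch below assumes nvidia-ctk wrote an "nvidia" entry into /etc/docker/daemon.json (its default target for Docker); the function name has_nvidia_runtime is hypothetical.

```shell
# Succeed if the given Docker daemon config declares an "nvidia" runtime.
# $1: path to the config (assumed default: /etc/docker/daemon.json).
has_nvidia_runtime() {
    cfg="${1:-/etc/docker/daemon.json}"
    [ -r "$cfg" ] && grep -q '"nvidia"[[:space:]]*:' "$cfg"
}

# Usage inside the VM:
#   has_nvidia_runtime && echo "nvidia runtime configured" || echo "nvidia runtime missing"
```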