Full transparency: I think I messed something up when offloading, so I'll try again in the future.

Note: scroll sideways to see the other columns.

| Model | TTFT (s) | Duration (s) | Tokens/s | Input (Tokens/Characters) | Output Tokens (Total/Limit) | Offload Mode | VRAM/Memory Used | Warm Avg TTFT (s) | Warm Avg Tokens/s | Warm Followups | Status |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Qwen3.5-0.8B-Q4_K_M | 0.044 | 0.619 | 375.926 | 232/610 | 216/1,500 | GPU | 2,319 MiB | 0.041 | 437.758 | 3/3 | OK / KV q8_0 |
| Qwen3.5-0.8B-bf16 | 0.028 | 0.849 | 272.880 | 232/610 | 224/1,500 | GPU | 3,254 MiB | 0.040 | 323.571 | 3/3 | OK / KV q8_0 |
| Qwen3.5-2B-Q4_K_M | 0.033 | 1.069 | 280.921 | 232/610 | 291/1,500 | GPU | 3,027 MiB | 0.051 | 326.293 | 3/3 | OK / KV q8_0 |
| Qwen3.5-2B-bf16 | 0.031 | 1.063 | 158.917 | 232/610 | 164/1,500 | GPU | 5,432 MiB | 0.050 | 191.647 | 3/3 | OK / KV q8_0 |
| Qwen3.5-4B-Q4_K_M | 0.050 | 3.358 | 158.979 | 232/610 | 526/1,500 | GPU | 4,492 MiB | 0.105 | 186.265 | 3/3 | OK / KV q8_0 |
| Qwen3.5-4B-bf16 | 0.156 | 2.105 | 78.494 | 231/610 | 153/1,500 | GPU | 9,924 MiB | 0.107 | 127.068 | 3/3 | OK / KV q8_0 |
| Qwen3.5-9B-Q4_K_M | 0.054 | 4.215 | 115.116 | 232/610 | 479/1,500 | GPU | 6,757 MiB | 0.119 | 146.893 | 3/3 | OK / KV q8_0 |
| Qwen3.5-9B-bf16 | 0.181 | 14.084 | 48.551 | 231/610 | 675/1,500 | GPU | 17,234 MiB | 0.169 | 60.955 | 3/3 | OK / KV q8_0 |
| Qwen3.5-27B-Q4_K_M | 0.227 | 10.574 | 40.782 | 231/610 | 422/1,500 | GPU | 17,364 MiB | 0.449 | 75.381 | 2/3 | OK / KV q8_0 |
| Qwen3.5-35B-A3B-Q4_K_M | 0.120 | 4.465 | 150.747 | 231/610 | 655/1,500 | GPU | 22,029 MiB | 0.208 | 245.378 | 3/3 | OK / KV q8_0 |

| Model | TTFT (s) | Duration (s) | Tokens/s | Input (Tokens/Characters) | Output Tokens (Total/Limit) | Offload Mode | VRAM/Memory Used | Warm Avg TTFT (s) | Warm Avg Tokens/s | Warm Followups | Status |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Qwen3.5-0.8B-Q4_K_M | 0.125 | 1.275 | 385.367 | 2,134/6,789 | 443/1,500 | GPU | 2,553 MiB | 0.056 | 435.469 | 3/3 | OK / KV q8_0 |
| Qwen3.5-0.8B-bf16 | 0.122 | 1.332 | 282.593 | 2,134/6,789 | 342/1,500 | GPU | 3,476 MiB | 0.055 | 320.716 | 3/3 | OK / KV q8_0 |
| Qwen3.5-2B-Q4_K_M | 0.137 | 2.651 | 294.707 | 2,134/6,789 | 741/1,500 | GPU | 3,260 MiB | - | - | 0/3 | FAIL (exit=0) BAD_OUTPUT / KV q8_0 |
| Qwen3.5-2B-bf16 | 0.346 | 2.735 | 167.438 | 2,134/6,789 | 400/1,500 | GPU | 5,651 MiB | 0.074 | 199.315 | 3/3 | OK / KV q8_0 |
| Qwen3.5-4B-Q4_K_M | 0.249 | 4.682 | 166.677 | 2,134/6,789 | 739/1,500 | GPU | 4,741 MiB | 0.126 | 191.268 | 3/3 | OK / KV q8_0 |
| Qwen3.5-4B-bf16 | 0.368 | 4.462 | 83.772 | 2,133/6,789 | 343/1,500 | GPU | 10,159 MiB | 0.132 | 129.516 | 3/3 | OK / KV q8_0 |
| Qwen3.5-9B-Q4_K_M | 0.296 | 5.895 | 116.629 | 2,134/6,789 | 653/1,500 | GPU | 7,019 MiB | 0.136 | 144.258 | 3/3 | OK / KV q8_0 |
| Qwen3.5-9B-bf16 | 0.479 | 9.694 | 50.351 | 2,133/6,789 | 464/1,500 | GPU | 17,301 MiB | 0.178 | 82.479 | 3/3 | OK / KV q8_0 |
| Qwen3.5-27B-Q4_K_M | 0.930 | 18.381 | 42.002 | 2,133/6,789 | 733/1,500 | GPU | 17,437 MiB | 0.464 | 63.299 | 3/3 | OK / KV q8_0 |
| Qwen3.5-35B-A3B-Q4_K_M | 0.423 | 7.448 | 152.038 | 2,133/6,789 | 1,068/1,500 | GPU | 22,063 MiB | 0.235 | 251.778 | 3/3 | OK / KV q8_0 |

| Model | TTFT (s) | Duration (s) | Tokens/s | Input (Tokens/Characters) | Output Tokens (Total/Limit) | Offload Mode | VRAM/Memory Used | Warm Avg TTFT (s) | Warm Avg Tokens/s | Warm Followups | Status |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Qwen3.5-0.8B-Q4_K_M | 0.287 | 1.469 | 384.035 | 6,401/19,289 | 454/1,500 | GPU | 2,595 MiB | 0.063 | 431.217 | 3/3 | OK / KV q8_0 |
| Qwen3.5-0.8B-bf16 | 0.289 | 1.623 | 284.878 | 6,401/19,289 | 380/1,500 | GPU | 3,535 MiB | 0.063 | 328.010 | 3/3 | OK / KV q8_0 |
| Qwen3.5-2B-Q4_K_M | 0.345 | 1.534 | 292.663 | 6,401/19,289 | 348/1,500 | GPU | 3,353 MiB | 0.071 | 331.428 | 3/3 | OK / KV q8_0 |
| Qwen3.5-2B-bf16 | 0.362 | 3.800 | 164.005 | 6,401/19,289 | 564/1,500 | GPU | 5,736 MiB | - | - | 0/3 | OK / KV q8_0 |
| Qwen3.5-4B-Q4_K_M | 0.684 | 6.697 | 163.980 | 6,401/19,289 | 986/1,500 | GPU | 4,853 MiB | 0.138 | 193.457 | 3/3 | OK / KV q8_0 |
| Qwen3.5-4B-bf16 | 0.869 | 6.472 | 83.538 | 6,400/19,289 | 468/1,500 | GPU | 10,311 MiB | 0.146 | 126.097 | 3/3 | OK / KV q8_0 |
| Qwen3.5-9B-Q4_K_M | 0.831 | 7.765 | 115.809 | 6,401/19,289 | 803/1,500 | GPU | 7,172 MiB | 0.162 | 142.116 | 3/3 | OK / KV q8_0 |
| Qwen3.5-9B-bf16 | 1.134 | 12.418 | 49.010 | 6,400/19,289 | 553/1,500 | GPU | 17,185 MiB | 0.196 | 56.686 | 3/3 | OK / KV q8_0 |
| Qwen3.5-27B-Q4_K_M | 2.834 | 32.264 | 41.760 | 6,400/19,289 | 1,229/1,500 | GPU | 17,361 MiB | 0.558 | 55.781 | 3/3 | OK / KV q8_0 |
| Qwen3.5-35B-A3B-Q4_K_M | 1.020 | 10.347 | 152.244 | 6,400/19,289 | 1,420/1,500 | GPU | 21,889 MiB | 0.253 | 200.981 | 3/3 | OK / KV q8_0 |

| Model | TTFT (s) | Duration (s) | Tokens/s | Input (Tokens/Characters) | Output Tokens (Total/Limit) | Offload Mode | VRAM/Memory Used | Warm Avg TTFT (s) | Warm Avg Tokens/s | Warm Followups | Status |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Qwen3.5-0.8B-Q4_K_M | 1.404 | 3.313 | 354.508 | 30,145/87,387 | 677/1,500 | GPU | 2,573 MiB | 0.107 | 376.167 | 3/3 | OK / KV q8_0 |
| Qwen3.5-0.8B-bf16 | 1.419 | 3.328 | 267.675 | 30,145/87,387 | 511/1,500 | GPU | 3,515 MiB | 0.106 | 301.668 | 3/3 | OK / KV q8_0 |
| Qwen3.5-2B-Q4_K_M | 1.663 | 4.279 | 275.162 | 30,145/87,387 | 720/1,500 | GPU | 3,279 MiB | - | - | 0/3 | OK / KV q8_0 |
| Qwen3.5-2B-bf16 | 1.802 | 5.468 | 162.553 | 30,145/87,387 | 596/1,500 | GPU | 5,673 MiB | 0.114 | 193.866 | 3/3 | OK / KV q8_0 |
| Qwen3.5-4B-Q4_K_M | 3.267 | 8.212 | 150.871 | 30,145/87,387 | 746/1,500 | GPU | 5,125 MiB | 0.189 | 175.456 | 3/3 | OK / KV q8_0 |
| Qwen3.5-4B-bf16 | 3.785 | 9.778 | 81.090 | 30,144/87,387 | 486/1,500 | GPU | 10,543 MiB | 0.205 | 122.046 | 3/3 | OK / KV q8_0 |
| Qwen3.5-9B-Q4_K_M | 4.123 | 13.142 | 109.556 | 30,145/87,387 | 988/1,500 | GPU | 7,403 MiB | 0.220 | 131.940 | 3/3 | OK / KV q8_0 |
| Qwen3.5-9B-bf16 | 5.200 | 21.924 | 49.569 | 30,144/87,387 | 829/1,500 | GPU | 17,681 MiB | 0.274 | 77.373 | 3/3 | OK / KV q8_0 |
| Qwen3.5-27B-Q4_K_M | 12.706 | 46.262 | 39.665 | 30,144/87,387 | 1,331/1,500 | GPU | 18,265 MiB | 0.687 | 60.307 | 3/3 | OK / KV q8_0 |
| Qwen3.5-35B-A3B-Q4_K_M | 5.274 | 15.705 | 135.841 | 30,144/87,387 | 1,417/1,500 | GPU | 22,185 MiB | 0.329 | 212.903 | 3/3 | OK / KV q8_0 |

| Model | TTFT (s) | Duration (s) | Tokens/s | Input (Tokens/Characters) | Output Tokens (Total/Limit) | Offload Mode | VRAM/Memory Used | Warm Avg TTFT (s) | Warm Avg Tokens/s | Warm Followups | Status |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Qwen3.5-0.8B-Q4_K_M | 3.475 | 4.115 | 317.593 | 60,968/175,242 | 203/1,500 | GPU | 2,835 MiB | 0.151 | 336.910 | 3/3 | OK / KV q8_0 |
| Qwen3.5-0.8B-bf16 | 3.541 | 5.197 | 246.881 | 60,968/175,242 | 409/1,500 | GPU | 3,775 MiB | 0.152 | 271.347 | 3/3 | OK / KV q8_0 |
| Qwen3.5-2B-Q4_K_M | 4.034 | 5.557 | 253.523 | 60,968/175,242 | 386/1,500 | GPU | 3,542 MiB | 0.168 | 252.156 | 3/3 | OK / KV q8_0 |
| Qwen3.5-2B-bf16 | 4.344 | 10.720 | 154.475 | 60,968/175,242 | 985/1,500 | GPU | 5,934 MiB | 0.181 | 178.903 | 3/3 | OK / KV q8_0 |
| Qwen3.5-4B-Q4_K_M | 7.717 | 15.461 | 135.081 | 60,968/175,242 | 1,046/1,500 | GPU | 5,787 MiB | 0.280 | 154.809 | 3/3 | OK / KV q8_0 |
| Qwen3.5-4B-bf16 | 8.704 | 15.925 | 76.309 | 60,967/175,242 | 551/1,500 | GPU | 11,205 MiB | 0.281 | 112.744 | 3/3 | OK / KV q8_0 |
| Qwen3.5-9B-Q4_K_M | 9.496 | 18.296 | 101.139 | 60,968/175,242 | 890/1,500 | GPU | 8,050 MiB | 0.294 | 119.245 | 3/3 | OK / KV q8_0 |
| Qwen3.5-9B-bf16 | 11.543 | 32.649 | 47.759 | 60,967/175,242 | 1,008/1,500 | GPU | 18,311 MiB | 0.364 | 62.923 | 3/3 | OK / KV q8_0 |
| Qwen3.5-27B-Q4_K_M | 28.878 | 57.836 | 36.708 | 60,967/175,242 | 1,063/1,500 | GPU | 19,455 MiB | 0.793 | 55.760 | 3/3 | OK / KV q8_0 |
| Qwen3.5-35B-A3B-Q4_K_M | 11.014 | 20.892 | 124.218 | 60,967/175,242 | 1,227/1,500 | GPU | 22,563 MiB | 0.398 | 187.822 | 3/3 | OK / KV q8_0 |

| Model | TTFT (s) | Duration (s) | Tokens/s | Input (Tokens/Characters) | Output Tokens (Total/Limit) | Offload Mode | VRAM/Memory Used | Warm Avg TTFT (s) | Warm Avg Tokens/s | Warm Followups | Status |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Qwen3.5-0.8B-Q4_K_M | 6.463 | 8.627 | 285.162 | 93,687/285,023 | 617/1,500 | GPU | 3,083 MiB | 0.229 | 290.598 | 3/3 | OK / KV q8_0 |
| Qwen3.5-0.8B-bf16 | 6.523 | 11.335 | 226.945 | 93,687/285,023 | 1,092/1,500 | GPU | 4,025 MiB | 0.248 | 242.517 | 3/3 | OK / KV q8_0 |
| Qwen3.5-2B-Q4_K_M | 7.276 | 11.731 | 232.081 | 93,687/285,023 | 1,034/1,500 | GPU | 3,791 MiB | 0.254 | 243.791 | 3/3 | OK / KV q8_0 |
| Qwen3.5-2B-bf16 | 7.777 | 10.211 | 146.664 | 93,687/285,023 | 357/1,500 | GPU | 6,181 MiB | 0.240 | 147.893 | 3/3 | OK / KV q8_0 |
| Qwen3.5-4B-Q4_K_M | 13.694 | 21.893 | 121.471 | 93,687/285,023 | 996/1,500 | GPU | 6,438 MiB | 0.367 | 136.295 | 3/3 | OK / KV q8_0 |
| Qwen3.5-4B-bf16 | 15.102 | 21.445 | 71.741 | 93,686/285,023 | 455/1,500 | GPU | 11,855 MiB | 0.380 | 75.546 | 3/3 | OK / KV q8_0 |
| Qwen3.5-9B-Q4_K_M | 16.474 | 28.608 | 93.205 | 93,687/285,023 | 1,131/1,500 | GPU | 8,701 MiB | 0.403 | 108.561 | 3/3 | OK / KV q8_0 |
| Qwen3.5-9B-bf16 | 19.836 | 45.383 | 45.094 | 93,686/285,023 | 1,152/1,500 | GPU | 18,981 MiB | 0.466 | 70.104 | 3/3 | OK / KV q8_0 |
| Qwen3.5-27B-Q4_K_M | 50.307 | 87.361 | 33.384 | 93,686/285,023 | 1,237/1,500 | GPU | 20,667 MiB | 1.012 | 51.319 | 3/3 | OK / KV q8_0 |
| Qwen3.5-35B-A3B-Q4_K_M | 18.947 | 30.146 | 111.705 | 93,686/285,023 | 1,251/1,500 | GPU | 22,965 MiB | 0.512 | 164.361 | 3/3 | OK / KV q8_0 |

| Model | TTFT (s) | Duration (s) | Tokens/s | Input (Tokens/Characters) | Output Tokens (Total/Limit) | Offload Mode | VRAM/Memory Used | Warm Avg TTFT (s) | Warm Avg Tokens/s | Warm Followups | Status |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Qwen3.5-0.8B-Q4_K_M | 10.240 | 14.367 | 259.988 | 126,355/386,704 | 1,073/1,500 | GPU | 3,349 MiB | 0.312 | 260.293 | 3/3 | OK / KV q8_0 |
| Qwen3.5-0.8B-bf16 | 10.329 | 11.601 | 210.755 | 126,355/386,704 | 268/1,500 | GPU | 4,287 MiB | - | - | 0/3 | OK / KV q8_0 |
| Qwen3.5-2B-Q4_K_M | 11.368 | 13.575 | 215.199 | 126,355/386,704 | 475/1,500 | GPU | 4,055 MiB | 0.295 | 213.083 | 3/3 | OK / KV q8_0 |
| Qwen3.5-2B-bf16 | 12.101 | 14.526 | 139.817 | 126,355/386,704 | 339/1,500 | GPU | 6,447 MiB | 0.286 | 157.101 | 3/3 | OK / KV q8_0 |
| Qwen3.5-4B-Q4_K_M | 21.021 | 31.203 | 110.390 | 126,355/386,704 | 1,124/1,500 | GPU | 7,109 MiB | 0.458 | 122.940 | 3/3 | OK / KV q8_0 |
| Qwen3.5-4B-bf16 | 22.926 | 36.506 | 67.603 | 126,354/386,704 | 918/1,500 | GPU | 12,527 MiB | 0.441 | 98.674 | 3/3 | OK / KV q8_0 |
| Qwen3.5-9B-Q4_K_M | 24.776 | 34.712 | 86.548 | 126,355/386,704 | 860/1,500 | GPU | 9,369 MiB | 0.453 | 99.789 | 3/3 | OK / KV q8_0 |
| Qwen3.5-9B-bf16 | 28.795 | 52.389 | 44.164 | 126,354/386,704 | 1,042/1,500 | GPU | 19,654 MiB | 0.538 | 49.494 | 3/3 | OK / KV q8_0 |
| Qwen3.5-27B-Q4_K_M | 74.408 | 118.242 | 31.642 | 126,354/386,704 | 1,387/1,500 | GPU | 21,886 MiB | 1.184 | 47.411 | 3/3 | OK / KV q8_0 |
| Qwen3.5-35B-A3B-Q4_K_M | 28.394 | 42.485 | 100.920 | 126,354/386,704 | 1,422/1,500 | GPU | 23,371 MiB | 0.599 | 147.896 | 3/3 | OK / KV q8_0 |

| Model | TTFT (s) | Duration (s) | Tokens/s | Input (Tokens/Characters) | Output Tokens (Total/Limit) | Offload Mode | VRAM/Memory Used | Warm Avg TTFT (s) | Warm Avg Tokens/s | Warm Followups | Status |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Qwen3.5-0.8B-Q4_K_M | 20.429 | 23.736 | 220.699 | 192,017/618,857 | 730/1,500 | GPU | 4,097 MiB | 0.432 | 212.364 | 3/3 | OK / KV q8_0 |
| Qwen3.5-0.8B-bf16 | 20.583 | 22.234 | 182.956 | 192,017/618,857 | 302/1,500 | GPU | 5,037 MiB | 0.411 | 185.109 | 3/3 | OK / KV q8_0 |
| Qwen3.5-2B-Q4_K_M | 22.134 | 30.139 | 187.386 | 192,017/618,857 | 1,500/1,500 | GPU | 4,809 MiB | 0.494 | 184.858 | 3/3 | OK / KV q8_0 |
| Qwen3.5-2B-bf16 | 23.189 | 25.920 | 127.398 | 192,017/618,857 | 348/1,500 | GPU | 7,199 MiB | 0.412 | 127.149 | 3/3 | OK / KV q8_0 |
| Qwen3.5-4B-Q4_K_M | 40.483 | 54.738 | 91.125 | 192,017/618,857 | 1,299/1,500 | GPU | 8,696 MiB | 0.662 | 102.090 | 3/3 | OK / KV q8_0 |
| Qwen3.5-4B-bf16 | 42.742 | 57.630 | 60.787 | 192,016/618,857 | 905/1,500 | GPU | 14,116 MiB | 0.713 | 85.640 | 3/3 | OK / KV q8_0 |
| Qwen3.5-9B-Q4_K_M | 45.568 | 62.267 | 75.576 | 192,017/618,857 | 1,262/1,500 | GPU | 10,965 MiB | 0.695 | 86.143 | 3/3 | OK / KV q8_0 |
| Qwen3.5-9B-bf16 | 52.225 | 75.437 | 40.409 | 192,016/618,857 | 938/1,500 | GPU | 21,245 MiB | 0.776 | 46.366 | 3/3 | OK / KV q8_0 |
| Qwen3.5-27B-Q4_K_M | - | 8.168 | - | 192,005/618,857 | 0/1,500 | GPU | - | - | - | 0/3 | FAIL (exit=-6) SERVER_BUSY / KV q8_0 |
| Qwen3.5-35B-A3B-Q4_K_M | 175.214 | 365.299 | 7.823 | 192,016/618,857 | 1,487/1,500 | GPU + RAM | 8,612 MiB offloaded | 2.301 | 13.572 | 3/3 | OFFLOAD_RETRY_DONE (OK) / NGL 27 / KV q8_0 |

| Model | TTFT (s) | Duration (s) | Tokens/s | Input (Tokens/Characters) | Output Tokens (Total/Limit) | Offload Mode | VRAM/Memory Used | Warm Avg TTFT (s) | Warm Avg Tokens/s | Warm Followups | Status |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Qwen3.5-0.8B-Q4_K_M | 33.036 | 37.061 | 191.767 | 254,584/835,742 | 772/1,500 | GPU | 5,011 MiB | 0.589 | 184.410 | 3/3 | OK / KV q8_0 |
| Qwen3.5-0.8B-bf16 | 33.317 | 35.175 | 163.678 | 254,584/835,742 | 304/1,500 | GPU | 5,954 MiB | 0.525 | 164.390 | 3/3 | OK / KV q8_0 |
| Qwen3.5-2B-Q4_K_M | 35.391 | 44.412 | 166.275 | 254,584/835,742 | 1,500/1,500 | GPU | 5,725 MiB | 0.654 | 165.825 | 3/3 | OK / KV q8_0 |
| Qwen3.5-2B-bf16 | 36.821 | 41.295 | 117.123 | 254,584/835,742 | 524/1,500 | GPU | 8,115 MiB | 0.578 | 117.334 | 3/3 | OK / KV q8_0 |
| Qwen3.5-4B-Q4_K_M | 62.845 | 70.341 | 81.110 | 254,584/835,742 | 608/1,500 | GPU | 10,411 MiB | 0.791 | 88.832 | 3/3 | OK / KV q8_0 |
| Qwen3.5-4B-bf16 | 66.620 | 82.779 | 55.327 | 254,583/835,742 | 894/1,500 | GPU | 15,833 MiB | 0.830 | 77.136 | 3/3 | OK / KV q8_0 |
| Qwen3.5-9B-Q4_K_M | 71.074 | 91.795 | 66.021 | 254,584/835,742 | 1,368/1,500 | GPU | 12,683 MiB | 0.884 | 90.038 | 3/3 | OK / KV q8_0 |
| Qwen3.5-9B-bf16 | 78.672 | 90.815 | 38.704 | 254,583/835,742 | 470/1,500 | GPU | 22,961 MiB | 0.780 | 43.060 | 3/3 | OK / KV q8_0 |
| Qwen3.5-27B-Q4_K_M | 426.260 | 1691.617 | 0.735 | 254,583/835,742 | 930/1,500 | GPU + RAM | 13,240 MiB offloaded | 5.620 | 1.203 | 3/3 | OFFLOAD_RETRY_DONE (OK) / NGL 33 / KV q8_0 |
| Qwen3.5-35B-A3B-Q4_K_M | 266.218 | 508.060 | 5.615 | 254,583/835,742 | 1,358/1,500 | GPU + RAM | 10,253 MiB offloaded | 2.762 | 9.352 | 3/3 | OFFLOAD_RETRY_DONE (OK) / NGL 25 / KV q8_0 |

| Model | TTFT (s) | Duration (s) | Tokens/s | Input (Tokens/Characters) | Output Tokens (Total/Limit) | Offload Mode | VRAM/Memory Used | Warm Avg TTFT (s) | Warm Avg Tokens/s | Warm Followups | Status |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Qwen3.5-0.8B-Q4_K_M | 48.405 | 57.156 | 171.408 | 314,931/1,010,583 | 1,500/1,500 | GPU | 5,924 MiB | 0.736 | 164.170 | 3/3 | OK / KV q8_0 |
| Qwen3.5-0.8B-bf16 | 49.344 | 59.444 | 148.505 | 314,931/1,010,583 | 1,500/1,500 | GPU | 6,863 MiB | 0.602 | 142.982 | 3/3 | OK / KV q8_0 |
| Qwen3.5-2B-Q4_K_M | 50.920 | 60.899 | 150.329 | 314,931/1,010,583 | 1,500/1,500 | GPU | 6,635 MiB | 0.666 | 146.074 | 3/3 | OK / KV q8_0 |
| Qwen3.5-2B-bf16 | 52.907 | 66.671 | 108.981 | 314,931/1,010,583 | 1,500/1,500 | GPU | 9,027 MiB | 0.760 | 108.819 | 3/3 | OK / KV q8_0 |
| Qwen3.5-4B-Q4_K_M | 89.757 | 109.572 | 71.916 | 314,931/1,010,583 | 1,425/1,500 | GPU | 12,123 MiB | 0.999 | 78.016 | 3/3 | OK / KV q8_0 |
| Qwen3.5-4B-bf16 | 94.385 | 100.718 | 51.156 | 314,930/1,010,583 | 324/1,500 | GPU | 17,537 MiB | - | - | 0/3 | OK / KV q8_0 |
| Qwen3.5-9B-Q4_K_M | 99.190 | 117.762 | 61.006 | 314,931/1,010,583 | 1,133/1,500 | GPU | 14,391 MiB | 0.996 | 67.923 | 3/3 | OK / KV q8_0 |
| Qwen3.5-9B-bf16 | 186.548 | 245.591 | 4.691 | 314,930/1,010,583 | 277/1,500 | GPU + RAM | 16,404 MiB offloaded | 1.574 | 5.390 | 3/3 | OFFLOAD_RETRY_DONE (OK) / NGL 27 / KV q4_0 |
| Qwen3.5-27B-Q4_K_M | 557.792 | 2856.256 | 0.653 | 314,930/1,010,583 | 1,500/1,500 | GPU + RAM | 12,454 MiB offloaded | 5.949 | 0.981 | 3/3 | OFFLOAD_RETRY_DONE (OK) / NGL 31 / KV q4_0 |
| Qwen3.5-35B-A3B-Q4_K_M | 371.220 | 702.425 | 4.529 | 314,930/1,010,583 | 1,500/1,500 | GPU + RAM | 11,763 MiB offloaded | 3.148 | 7.560 | 3/3 | OFFLOAD_RETRY_DONE (OK) / NGL 23 / KV q8_0 |

| Model | TTFT (s) | Duration (s) | Tokens/s | Input (Tokens/Characters) | Output Tokens (Total/Limit) | Offload Mode | VRAM/Memory Used | Warm Avg TTFT (s) | Warm Avg Tokens/s | Warm Followups | Status |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Qwen3.5-0.8B-Q4_K_M | 59.770 | 61.871 | 158.940 | 352,956/1,098,327 | 334/1,500 | GPU | 6,393 MiB | 0.721 | 159.362 | 3/3 | OK / KV q8_0 |
| Qwen3.5-0.8B-bf16 | 59.358 | 70.084 | 139.852 | 352,956/1,098,327 | 1,500/1,500 | GPU | 7,335 MiB | 0.808 | 137.787 | 3/3 | OK / KV q8_0 |
| Qwen3.5-2B-Q4_K_M | 62.274 | 64.357 | 141.599 | 352,956/1,098,327 | 295/1,500 | GPU | 7,105 MiB | 0.705 | 138.169 | 3/3 | OK / KV q8_0 |
| Qwen3.5-2B-bf16 | 64.798 | 71.382 | 104.340 | 352,956/1,098,327 | 687/1,500 | GPU | 9,496 MiB | 0.736 | 107.240 | 3/3 | OK / KV q8_0 |
| Qwen3.5-4B-Q4_K_M | 109.352 | 122.623 | 67.216 | 352,956/1,098,327 | 892/1,500 | GPU | 13,001 MiB | - | - | 0/3 | OK / KV q8_0 |
| Qwen3.5-4B-bf16 | 114.468 | 125.626 | 48.576 | 352,955/1,098,327 | 542/1,500 | GPU | 18,423 MiB | 0.953 | 66.584 | 3/3 | OK / KV q8_0 |
| Qwen3.5-9B-Q4_K_M | 120.036 | 134.366 | 57.569 | 352,956/1,098,327 | 825/1,500 | GPU | 15,273 MiB | 1.052 | 63.500 | 3/3 | OK / KV q8_0 |
| Qwen3.5-9B-bf16 | 251.061 | 433.318 | 2.870 | 352,955/1,098,327 | 523/1,500 | GPU + RAM | 17,143 MiB offloaded | 2.130 | 3.317 | 3/3 | OFFLOAD_RETRY_DONE (OK) / NGL 25 / KV q4_0 |
| Qwen3.5-27B-Q4_K_M | 717.573 | 3707.420 | 0.502 | 352,955/1,098,327 | 1,500/1,500 | GPU + RAM | 16,871 MiB offloaded | 7.432 | 0.841 | 3/3 | OFFLOAD_RETRY_DONE (OK) / NGL 29 / KV q8_0 |
| Qwen3.5-35B-A3B-Q4_K_M | 462.873 | 766.514 | 3.287 | 352,955/1,098,327 | 998/1,500 | GPU + RAM | 13,376 MiB offloaded | 3.770 | 5.744 | 3/3 | OFFLOAD_RETRY_DONE (OK) / NGL 21 / KV q8_0 |

| Model | TTFT (s) | Duration (s) | Tokens/s | Input (Tokens/Characters) | Output Tokens (Total/Limit) | Offload Mode | VRAM/Memory Used | Warm Avg TTFT (s) | Warm Avg Tokens/s | Warm Followups | Status |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Qwen3.5-0.8B-Q4_K_M | 69.477 | 70.995 | 150.896 | 385,558/1,186,284 | 229/1,500 | GPU | 6,853 MiB | 0.737 | 149.442 | 3/3 | OK / KV q8_0 |
| Qwen3.5-0.8B-bf16 | 70.458 | 75.042 | 132.202 | 385,558/1,186,284 | 606/1,500 | GPU | 7,994 MiB | 0.770 | 131.874 | 3/3 | OK / KV q8_0 |
| Qwen3.5-2B-Q4_K_M | 73.396 | 84.485 | 135.264 | 385,558/1,186,284 | 1,500/1,500 | GPU | 7,764 MiB | 0.723 | 131.521 | 3/3 | OK / KV q8_0 |
| Qwen3.5-2B-bf16 | 75.604 | 79.248 | 100.700 | 385,558/1,186,284 | 367/1,500 | GPU | 10,154 MiB | 0.763 | 100.617 | 3/3 | OK / KV q8_0 |
| Qwen3.5-4B-Q4_K_M | 128.518 | 140.221 | 63.315 | 385,558/1,186,284 | 741/1,500 | GPU | 13,870 MiB | 1.162 | 67.743 | 3/3 | OK / KV q8_0 |
| Qwen3.5-4B-bf16 | 135.879 | 149.339 | 46.508 | 385,557/1,186,284 | 626/1,500 | GPU | 19,284 MiB | 1.097 | 63.971 | 3/3 | OK / KV q8_0 |
| Qwen3.5-9B-Q4_K_M | 139.675 | 158.286 | 54.752 | 385,558/1,186,284 | 1,019/1,500 | GPU | 16,136 MiB | 1.146 | 60.322 | 3/3 | OK / KV q8_0 |
| Qwen3.5-9B-bf16 | 317.738 | 579.910 | 2.475 | 385,557/1,186,284 | 649/1,500 | GPU + RAM | 17,604 MiB offloaded | 2.539 | 2.641 | 3/3 | OFFLOAD_RETRY_DONE (OK) / NGL 23 / KV q4_0 |
| Qwen3.5-27B-Q4_K_M | 881.275 | 3815.060 | 0.464 | 385,557/1,186,284 | 1,362/1,500 | GPU + RAM | 18,063 MiB offloaded | 8.179 | 0.626 | 3/3 | OFFLOAD_RETRY_DONE (OK) / NGL 27 / KV q8_0 |
| Qwen3.5-35B-A3B-Q4_K_M | 554.589 | 1019.831 | 3.142 | 385,557/1,186,284 | 1,462/1,500 | GPU + RAM | 14,590 MiB offloaded | 3.950 | 4.463 | 3/3 | OFFLOAD_RETRY_DONE (OK) / NGL 19 / KV q8_0 |

| Model | TTFT (s) | Duration (s) | Tokens/s | Input (Tokens/Characters) | Output Tokens (Total/Limit) | Offload Mode | VRAM/Memory Used | Warm Avg TTFT (s) | Warm Avg Tokens/s | Warm Followups | Status |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Qwen3.5-0.8B-Q4_K_M | 72.553 | 73.141 | 148.013 | 392,508/1,207,925 | 87/1,500 | GPU | 6,938 MiB | - | - | 0/3 | OK / KV q8_0 |
| Qwen3.5-0.8B-bf16 | 72.523 | 83.914 | 131.688 | 392,508/1,207,925 | 1,500/1,500 | GPU | 7,883 MiB | 0.908 | 130.917 | 3/3 | OK / KV q8_0 |
| Qwen3.5-2B-Q4_K_M | 76.160 | 87.390 | 133.566 | 392,508/1,207,925 | 1,500/1,500 | GPU | 7,653 MiB | 0.889 | 129.413 | 3/3 | FAIL (exit=0) BAD_OUTPUT / KV q8_0 |
| Qwen3.5-2B-bf16 | 78.707 | 85.010 | 97.731 | 392,508/1,207,925 | 616/1,500 | GPU | 10,038 MiB | 0.769 | 91.179 | 3/3 | OK / KV q8_0 |
| Qwen3.5-4B-Q4_K_M | 131.646 | 148.772 | 62.709 | 392,508/1,207,925 | 1,074/1,500 | GPU | 14,035 MiB | - | - | 0/3 | OK / KV q8_0 |
| Qwen3.5-4B-bf16 | 137.578 | 170.110 | 46.108 | 392,507/1,207,925 | 1,500/1,500 | GPU | 19,459 MiB | 1.236 | 62.996 | 3/3 | OK / KV q8_0 |
| Qwen3.5-9B-Q4_K_M | 143.699 | 163.881 | 54.158 | 392,508/1,207,925 | 1,093/1,500 | GPU | 16,306 MiB | 1.175 | 59.408 | 3/3 | OK / KV q8_0 |
| Qwen3.5-9B-bf16 | 357.536 | 724.675 | 1.803 | 392,507/1,207,925 | 662/1,500 | GPU + RAM | 18,088 MiB offloaded | 2.776 | 2.094 | 3/3 | OFFLOAD_RETRY_DONE (OK) / NGL 21 / KV q4_0 |
| Qwen3.5-27B-Q4_K_M | 883.476 | 3671.606 | 0.419 | 392,507/1,207,925 | 1,167/1,500 | GPU + RAM | 19,464 MiB offloaded | 8.437 | 0.702 | 3/3 | OFFLOAD_RETRY_DONE (OK) / NGL 25 / KV q8_0 |
| Qwen3.5-35B-A3B-Q4_K_M | 599.124 | 1189.472 | 2.541 | 392,507/1,207,925 | 1,500/1,500 | GPU + RAM | 16,043 MiB offloaded | 4.204 | 4.150 | 3/3 | OFFLOAD_RETRY_DONE (OK) / NGL 17 / KV q8_0 |
| Qwen3.5-35B-A3B-Q4_K_M (Proxy) | 554.589 | 1019.831 | 3.142 | 385,557/1,186,284 | 1,462/1,500 | GPU + RAM | 14,590 MiB offloaded | 3.950 | 4.463 | 3/3 | PROXY from 393216 / OFFLOAD_RETRY_DONE (OK) / NGL 19 / KV q8_0 |
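
For anyone sanity-checking these numbers: the Tokens/s column appears to be the decode rate, i.e. output tokens divided by the time left after the first token arrived (Duration minus TTFT). That is my reading of the data, not something the benchmark script states, and the function names below are made up for illustration. A rough Python sketch that recomputes it for a few rows copied from the tables, plus an approximate prefill rate (input tokens divided by TTFT):

```python
def decode_rate(output_tokens: int, duration_s: float, ttft_s: float) -> float:
    """Tokens generated per second once the first token has arrived.

    Assumption: Tokens/s in the tables is output / (Duration - TTFT).
    """
    return output_tokens / (duration_s - ttft_s)


def prefill_rate(input_tokens: int, ttft_s: float) -> float:
    """Rough prompt-processing speed, treating TTFT as pure prefill time."""
    return input_tokens / ttft_s


# (model, TTFT s, duration s, reported tokens/s, input tokens, output tokens)
# Values copied from rows in the tables above.
ROWS = [
    ("Qwen3.5-0.8B-Q4_K_M", 0.044, 0.619, 375.926, 232, 216),
    ("Qwen3.5-0.8B-Q4_K_M", 1.404, 3.313, 354.508, 30_145, 677),
    ("Qwen3.5-27B-Q4_K_M", 12.706, 46.262, 39.665, 30_144, 1_331),
    ("Qwen3.5-35B-A3B-Q4_K_M", 175.214, 365.299, 7.823, 192_016, 1_487),
]

for model, ttft, duration, reported, n_in, n_out in ROWS:
    recomputed = decode_rate(n_out, duration, ttft)
    print(
        f"{model}: reported {reported:.3f} tok/s, "
        f"recomputed {recomputed:.3f} tok/s, "
        f"prefill ~{prefill_rate(n_in, ttft):,.0f} tok/s"
    )
```

The recomputed values land within about 1% of the reported ones for these rows, including the offloaded 35B-A3B run, so the column seems to exclude prompt processing time. The prefill figure is only an estimate, since TTFT also includes request overhead.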