@smarvr
Created March 10, 2026 04:56
Qwen3.5 models (0.8B, 2B, 4B, 9B, 27B, 35B-A3B) at up to 400k context on an RTX 4090 (TTFT, tokens/s, warm follow-up replies)

To be transparent: I think I messed something up in the offloaded (GPU + RAM) runs, so I'll retest them in the future.

Note: scroll sideways to see the remaining columns.

## 2048-token context

| Model | TTFT (s) | Duration (s) | Tokens/s | Input (Tokens/Characters) | Output Tokens (Total/Limit) | Offload Mode | VRAM/Memory Used | Warm Avg TTFT (s) | Warm Avg Tokens/s | Warm Follow-ups | Status |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Qwen3.5-0.8B-Q4_K_M | 0.044 | 0.619 | 375.926 | 232/610 | 216/1,500 | GPU | 2,319 MiB | 0.041 | 437.758 | 3/3 | OK / KV q8_0 |
| Qwen3.5-0.8B-bf16 | 0.028 | 0.849 | 272.880 | 232/610 | 224/1,500 | GPU | 3,254 MiB | 0.040 | 323.571 | 3/3 | OK / KV q8_0 |
| Qwen3.5-2B-Q4_K_M | 0.033 | 1.069 | 280.921 | 232/610 | 291/1,500 | GPU | 3,027 MiB | 0.051 | 326.293 | 3/3 | OK / KV q8_0 |
| Qwen3.5-2B-bf16 | 0.031 | 1.063 | 158.917 | 232/610 | 164/1,500 | GPU | 5,432 MiB | 0.050 | 191.647 | 3/3 | OK / KV q8_0 |
| Qwen3.5-4B-Q4_K_M | 0.050 | 3.358 | 158.979 | 232/610 | 526/1,500 | GPU | 4,492 MiB | 0.105 | 186.265 | 3/3 | OK / KV q8_0 |
| Qwen3.5-4B-bf16 | 0.156 | 2.105 | 78.494 | 231/610 | 153/1,500 | GPU | 9,924 MiB | 0.107 | 127.068 | 3/3 | OK / KV q8_0 |
| Qwen3.5-9B-Q4_K_M | 0.054 | 4.215 | 115.116 | 232/610 | 479/1,500 | GPU | 6,757 MiB | 0.119 | 146.893 | 3/3 | OK / KV q8_0 |
| Qwen3.5-9B-bf16 | 0.181 | 14.084 | 48.551 | 231/610 | 675/1,500 | GPU | 17,234 MiB | 0.169 | 60.955 | 3/3 | OK / KV q8_0 |
| Qwen3.5-27B-Q4_K_M | 0.227 | 10.574 | 40.782 | 231/610 | 422/1,500 | GPU | 17,364 MiB | 0.449 | 75.381 | 2/3 | OK / KV q8_0 |
| Qwen3.5-35B-A3B-Q4_K_M | 0.120 | 4.465 | 150.747 | 231/610 | 655/1,500 | GPU | 22,029 MiB | 0.208 | 245.378 | 3/3 | OK / KV q8_0 |
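The Tokens/s column appears to measure generation speed only: for the rows checked below, output tokens divided by (Duration - TTFT) reproduces it. This is an inference from the numbers, not something the benchmark states. A quick Python check using values copied from the tables:

```python
# Check the hypothesis that "Tokens/s" = output tokens / (Duration - TTFT),
# i.e. generation speed after the first token. Values copied from the tables.
rows = [
    # (label, TTFT s, Duration s, reported Tokens/s, output tokens)
    ("0.8B-Q4_K_M @ 2048", 0.044, 0.619, 375.926, 216),
    ("9B-bf16 @ 2048", 0.181, 14.084, 48.551, 675),
    ("35B-A3B-Q4_K_M @ 2048", 0.120, 4.465, 150.747, 655),
    ("27B-Q4_K_M @ 262144 (offloaded)", 426.260, 1691.617, 0.735, 930),
]


def decode_tok_s(ttft_s: float, duration_s: float, output_tokens: int) -> float:
    """Output tokens divided by the time spent after the first token arrived."""
    return output_tokens / (duration_s - ttft_s)


for label, ttft, duration, reported, out_tokens in rows:
    estimate = decode_tok_s(ttft, duration, out_tokens)
    error_pct = 100 * abs(estimate - reported) / reported
    print(f"{label}: estimate {estimate:.3f}, reported {reported:.3f} ({error_pct:.2f}% off)")
```

The first row is off by less than 0.1% and the other three agree to within 0.01%, including an offloaded run, so Tokens/s does not include the prompt-processing time captured by TTFT.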

## 4096-token context

| Model | TTFT (s) | Duration (s) | Tokens/s | Input (Tokens/Characters) | Output Tokens (Total/Limit) | Offload Mode | VRAM/Memory Used | Warm Avg TTFT (s) | Warm Avg Tokens/s | Warm Follow-ups | Status |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Qwen3.5-0.8B-Q4_K_M | 0.125 | 1.275 | 385.367 | 2,134/6,789 | 443/1,500 | GPU | 2,553 MiB | 0.056 | 435.469 | 3/3 | OK / KV q8_0 |
| Qwen3.5-0.8B-bf16 | 0.122 | 1.332 | 282.593 | 2,134/6,789 | 342/1,500 | GPU | 3,476 MiB | 0.055 | 320.716 | 3/3 | OK / KV q8_0 |
| Qwen3.5-2B-Q4_K_M | 0.137 | 2.651 | 294.707 | 2,134/6,789 | 741/1,500 | GPU | 3,260 MiB | | | 0/3 | FAIL (exit=0) BAD_OUTPUT / KV q8_0 |
| Qwen3.5-2B-bf16 | 0.346 | 2.735 | 167.438 | 2,134/6,789 | 400/1,500 | GPU | 5,651 MiB | 0.074 | 199.315 | 3/3 | OK / KV q8_0 |
| Qwen3.5-4B-Q4_K_M | 0.249 | 4.682 | 166.677 | 2,134/6,789 | 739/1,500 | GPU | 4,741 MiB | 0.126 | 191.268 | 3/3 | OK / KV q8_0 |
| Qwen3.5-4B-bf16 | 0.368 | 4.462 | 83.772 | 2,133/6,789 | 343/1,500 | GPU | 10,159 MiB | 0.132 | 129.516 | 3/3 | OK / KV q8_0 |
| Qwen3.5-9B-Q4_K_M | 0.296 | 5.895 | 116.629 | 2,134/6,789 | 653/1,500 | GPU | 7,019 MiB | 0.136 | 144.258 | 3/3 | OK / KV q8_0 |
| Qwen3.5-9B-bf16 | 0.479 | 9.694 | 50.351 | 2,133/6,789 | 464/1,500 | GPU | 17,301 MiB | 0.178 | 82.479 | 3/3 | OK / KV q8_0 |
| Qwen3.5-27B-Q4_K_M | 0.930 | 18.381 | 42.002 | 2,133/6,789 | 733/1,500 | GPU | 17,437 MiB | 0.464 | 63.299 | 3/3 | OK / KV q8_0 |
| Qwen3.5-35B-A3B-Q4_K_M | 0.423 | 7.448 | 152.038 | 2,133/6,789 | 1,068/1,500 | GPU | 22,063 MiB | 0.235 | 251.778 | 3/3 | OK / KV q8_0 |

## 8192-token context

| Model | TTFT (s) | Duration (s) | Tokens/s | Input (Tokens/Characters) | Output Tokens (Total/Limit) | Offload Mode | VRAM/Memory Used | Warm Avg TTFT (s) | Warm Avg Tokens/s | Warm Follow-ups | Status |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Qwen3.5-0.8B-Q4_K_M | 0.287 | 1.469 | 384.035 | 6,401/19,289 | 454/1,500 | GPU | 2,595 MiB | 0.063 | 431.217 | 3/3 | OK / KV q8_0 |
| Qwen3.5-0.8B-bf16 | 0.289 | 1.623 | 284.878 | 6,401/19,289 | 380/1,500 | GPU | 3,535 MiB | 0.063 | 328.010 | 3/3 | OK / KV q8_0 |
| Qwen3.5-2B-Q4_K_M | 0.345 | 1.534 | 292.663 | 6,401/19,289 | 348/1,500 | GPU | 3,353 MiB | 0.071 | 331.428 | 3/3 | OK / KV q8_0 |
| Qwen3.5-2B-bf16 | 0.362 | 3.800 | 164.005 | 6,401/19,289 | 564/1,500 | GPU | 5,736 MiB | | | 0/3 | OK / KV q8_0 |
| Qwen3.5-4B-Q4_K_M | 0.684 | 6.697 | 163.980 | 6,401/19,289 | 986/1,500 | GPU | 4,853 MiB | 0.138 | 193.457 | 3/3 | OK / KV q8_0 |
| Qwen3.5-4B-bf16 | 0.869 | 6.472 | 83.538 | 6,400/19,289 | 468/1,500 | GPU | 10,311 MiB | 0.146 | 126.097 | 3/3 | OK / KV q8_0 |
| Qwen3.5-9B-Q4_K_M | 0.831 | 7.765 | 115.809 | 6,401/19,289 | 803/1,500 | GPU | 7,172 MiB | 0.162 | 142.116 | 3/3 | OK / KV q8_0 |
| Qwen3.5-9B-bf16 | 1.134 | 12.418 | 49.010 | 6,400/19,289 | 553/1,500 | GPU | 17,185 MiB | 0.196 | 56.686 | 3/3 | OK / KV q8_0 |
| Qwen3.5-27B-Q4_K_M | 2.834 | 32.264 | 41.760 | 6,400/19,289 | 1,229/1,500 | GPU | 17,361 MiB | 0.558 | 55.781 | 3/3 | OK / KV q8_0 |
| Qwen3.5-35B-A3B-Q4_K_M | 1.020 | 10.347 | 152.244 | 6,400/19,289 | 1,420/1,500 | GPU | 21,889 MiB | 0.253 | 200.981 | 3/3 | OK / KV q8_0 |

## 32768-token context

| Model | TTFT (s) | Duration (s) | Tokens/s | Input (Tokens/Characters) | Output Tokens (Total/Limit) | Offload Mode | VRAM/Memory Used | Warm Avg TTFT (s) | Warm Avg Tokens/s | Warm Follow-ups | Status |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Qwen3.5-0.8B-Q4_K_M | 1.404 | 3.313 | 354.508 | 30,145/87,387 | 677/1,500 | GPU | 2,573 MiB | 0.107 | 376.167 | 3/3 | OK / KV q8_0 |
| Qwen3.5-0.8B-bf16 | 1.419 | 3.328 | 267.675 | 30,145/87,387 | 511/1,500 | GPU | 3,515 MiB | 0.106 | 301.668 | 3/3 | OK / KV q8_0 |
| Qwen3.5-2B-Q4_K_M | 1.663 | 4.279 | 275.162 | 30,145/87,387 | 720/1,500 | GPU | 3,279 MiB | | | 0/3 | OK / KV q8_0 |
| Qwen3.5-2B-bf16 | 1.802 | 5.468 | 162.553 | 30,145/87,387 | 596/1,500 | GPU | 5,673 MiB | 0.114 | 193.866 | 3/3 | OK / KV q8_0 |
| Qwen3.5-4B-Q4_K_M | 3.267 | 8.212 | 150.871 | 30,145/87,387 | 746/1,500 | GPU | 5,125 MiB | 0.189 | 175.456 | 3/3 | OK / KV q8_0 |
| Qwen3.5-4B-bf16 | 3.785 | 9.778 | 81.090 | 30,144/87,387 | 486/1,500 | GPU | 10,543 MiB | 0.205 | 122.046 | 3/3 | OK / KV q8_0 |
| Qwen3.5-9B-Q4_K_M | 4.123 | 13.142 | 109.556 | 30,145/87,387 | 988/1,500 | GPU | 7,403 MiB | 0.220 | 131.940 | 3/3 | OK / KV q8_0 |
| Qwen3.5-9B-bf16 | 5.200 | 21.924 | 49.569 | 30,144/87,387 | 829/1,500 | GPU | 17,681 MiB | 0.274 | 77.373 | 3/3 | OK / KV q8_0 |
| Qwen3.5-27B-Q4_K_M | 12.706 | 46.262 | 39.665 | 30,144/87,387 | 1,331/1,500 | GPU | 18,265 MiB | 0.687 | 60.307 | 3/3 | OK / KV q8_0 |
| Qwen3.5-35B-A3B-Q4_K_M | 5.274 | 15.705 | 135.841 | 30,144/87,387 | 1,417/1,500 | GPU | 22,185 MiB | 0.329 | 212.903 | 3/3 | OK / KV q8_0 |

## 65536-token context

| Model | TTFT (s) | Duration (s) | Tokens/s | Input (Tokens/Characters) | Output Tokens (Total/Limit) | Offload Mode | VRAM/Memory Used | Warm Avg TTFT (s) | Warm Avg Tokens/s | Warm Follow-ups | Status |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Qwen3.5-0.8B-Q4_K_M | 3.475 | 4.115 | 317.593 | 60,968/175,242 | 203/1,500 | GPU | 2,835 MiB | 0.151 | 336.910 | 3/3 | OK / KV q8_0 |
| Qwen3.5-0.8B-bf16 | 3.541 | 5.197 | 246.881 | 60,968/175,242 | 409/1,500 | GPU | 3,775 MiB | 0.152 | 271.347 | 3/3 | OK / KV q8_0 |
| Qwen3.5-2B-Q4_K_M | 4.034 | 5.557 | 253.523 | 60,968/175,242 | 386/1,500 | GPU | 3,542 MiB | 0.168 | 252.156 | 3/3 | OK / KV q8_0 |
| Qwen3.5-2B-bf16 | 4.344 | 10.720 | 154.475 | 60,968/175,242 | 985/1,500 | GPU | 5,934 MiB | 0.181 | 178.903 | 3/3 | OK / KV q8_0 |
| Qwen3.5-4B-Q4_K_M | 7.717 | 15.461 | 135.081 | 60,968/175,242 | 1,046/1,500 | GPU | 5,787 MiB | 0.280 | 154.809 | 3/3 | OK / KV q8_0 |
| Qwen3.5-4B-bf16 | 8.704 | 15.925 | 76.309 | 60,967/175,242 | 551/1,500 | GPU | 11,205 MiB | 0.281 | 112.744 | 3/3 | OK / KV q8_0 |
| Qwen3.5-9B-Q4_K_M | 9.496 | 18.296 | 101.139 | 60,968/175,242 | 890/1,500 | GPU | 8,050 MiB | 0.294 | 119.245 | 3/3 | OK / KV q8_0 |
| Qwen3.5-9B-bf16 | 11.543 | 32.649 | 47.759 | 60,967/175,242 | 1,008/1,500 | GPU | 18,311 MiB | 0.364 | 62.923 | 3/3 | OK / KV q8_0 |
| Qwen3.5-27B-Q4_K_M | 28.878 | 57.836 | 36.708 | 60,967/175,242 | 1,063/1,500 | GPU | 19,455 MiB | 0.793 | 55.760 | 3/3 | OK / KV q8_0 |
| Qwen3.5-35B-A3B-Q4_K_M | 11.014 | 20.892 | 124.218 | 60,967/175,242 | 1,227/1,500 | GPU | 22,563 MiB | 0.398 | 187.822 | 3/3 | OK / KV q8_0 |

## 98304-token context

| Model | TTFT (s) | Duration (s) | Tokens/s | Input (Tokens/Characters) | Output Tokens (Total/Limit) | Offload Mode | VRAM/Memory Used | Warm Avg TTFT (s) | Warm Avg Tokens/s | Warm Follow-ups | Status |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Qwen3.5-0.8B-Q4_K_M | 6.463 | 8.627 | 285.162 | 93,687/285,023 | 617/1,500 | GPU | 3,083 MiB | 0.229 | 290.598 | 3/3 | OK / KV q8_0 |
| Qwen3.5-0.8B-bf16 | 6.523 | 11.335 | 226.945 | 93,687/285,023 | 1,092/1,500 | GPU | 4,025 MiB | 0.248 | 242.517 | 3/3 | OK / KV q8_0 |
| Qwen3.5-2B-Q4_K_M | 7.276 | 11.731 | 232.081 | 93,687/285,023 | 1,034/1,500 | GPU | 3,791 MiB | 0.254 | 243.791 | 3/3 | OK / KV q8_0 |
| Qwen3.5-2B-bf16 | 7.777 | 10.211 | 146.664 | 93,687/285,023 | 357/1,500 | GPU | 6,181 MiB | 0.240 | 147.893 | 3/3 | OK / KV q8_0 |
| Qwen3.5-4B-Q4_K_M | 13.694 | 21.893 | 121.471 | 93,687/285,023 | 996/1,500 | GPU | 6,438 MiB | 0.367 | 136.295 | 3/3 | OK / KV q8_0 |
| Qwen3.5-4B-bf16 | 15.102 | 21.445 | 71.741 | 93,686/285,023 | 455/1,500 | GPU | 11,855 MiB | 0.380 | 75.546 | 3/3 | OK / KV q8_0 |
| Qwen3.5-9B-Q4_K_M | 16.474 | 28.608 | 93.205 | 93,687/285,023 | 1,131/1,500 | GPU | 8,701 MiB | 0.403 | 108.561 | 3/3 | OK / KV q8_0 |
| Qwen3.5-9B-bf16 | 19.836 | 45.383 | 45.094 | 93,686/285,023 | 1,152/1,500 | GPU | 18,981 MiB | 0.466 | 70.104 | 3/3 | OK / KV q8_0 |
| Qwen3.5-27B-Q4_K_M | 50.307 | 87.361 | 33.384 | 93,686/285,023 | 1,237/1,500 | GPU | 20,667 MiB | 1.012 | 51.319 | 3/3 | OK / KV q8_0 |
| Qwen3.5-35B-A3B-Q4_K_M | 18.947 | 30.146 | 111.705 | 93,686/285,023 | 1,251/1,500 | GPU | 22,965 MiB | 0.512 | 164.361 | 3/3 | OK / KV q8_0 |

## 131072-token context

| Model | TTFT (s) | Duration (s) | Tokens/s | Input (Tokens/Characters) | Output Tokens (Total/Limit) | Offload Mode | VRAM/Memory Used | Warm Avg TTFT (s) | Warm Avg Tokens/s | Warm Follow-ups | Status |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Qwen3.5-0.8B-Q4_K_M | 10.240 | 14.367 | 259.988 | 126,355/386,704 | 1,073/1,500 | GPU | 3,349 MiB | 0.312 | 260.293 | 3/3 | OK / KV q8_0 |
| Qwen3.5-0.8B-bf16 | 10.329 | 11.601 | 210.755 | 126,355/386,704 | 268/1,500 | GPU | 4,287 MiB | | | 0/3 | OK / KV q8_0 |
| Qwen3.5-2B-Q4_K_M | 11.368 | 13.575 | 215.199 | 126,355/386,704 | 475/1,500 | GPU | 4,055 MiB | 0.295 | 213.083 | 3/3 | OK / KV q8_0 |
| Qwen3.5-2B-bf16 | 12.101 | 14.526 | 139.817 | 126,355/386,704 | 339/1,500 | GPU | 6,447 MiB | 0.286 | 157.101 | 3/3 | OK / KV q8_0 |
| Qwen3.5-4B-Q4_K_M | 21.021 | 31.203 | 110.390 | 126,355/386,704 | 1,124/1,500 | GPU | 7,109 MiB | 0.458 | 122.940 | 3/3 | OK / KV q8_0 |
| Qwen3.5-4B-bf16 | 22.926 | 36.506 | 67.603 | 126,354/386,704 | 918/1,500 | GPU | 12,527 MiB | 0.441 | 98.674 | 3/3 | OK / KV q8_0 |
| Qwen3.5-9B-Q4_K_M | 24.776 | 34.712 | 86.548 | 126,355/386,704 | 860/1,500 | GPU | 9,369 MiB | 0.453 | 99.789 | 3/3 | OK / KV q8_0 |
| Qwen3.5-9B-bf16 | 28.795 | 52.389 | 44.164 | 126,354/386,704 | 1,042/1,500 | GPU | 19,654 MiB | 0.538 | 49.494 | 3/3 | OK / KV q8_0 |
| Qwen3.5-27B-Q4_K_M | 74.408 | 118.242 | 31.642 | 126,354/386,704 | 1,387/1,500 | GPU | 21,886 MiB | 1.184 | 47.411 | 3/3 | OK / KV q8_0 |
| Qwen3.5-35B-A3B-Q4_K_M | 28.394 | 42.485 | 100.920 | 126,354/386,704 | 1,422/1,500 | GPU | 23,371 MiB | 0.599 | 147.896 | 3/3 | OK / KV q8_0 |
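Comparing cold TTFT with Warm Avg TTFT shows how much sooner follow-up turns start. The gist does not say how follow-ups are built; the large gap is consistent with the server reusing the already-processed prompt instead of reading it again. A small sketch of the ratio, with values from the 131072-token table; rows with 0/3 warm follow-ups have no warm TTFT and are reported as missing:

```python
from typing import Optional

# (model, cold TTFT s, Warm Avg TTFT s or None when Warm Follow-ups is 0/3),
# copied from the 131072-token table.
rows = [
    ("Qwen3.5-0.8B-Q4_K_M", 10.240, 0.312),
    ("Qwen3.5-0.8B-bf16", 10.329, None),
    ("Qwen3.5-9B-Q4_K_M", 24.776, 0.453),
    ("Qwen3.5-27B-Q4_K_M", 74.408, 1.184),
    ("Qwen3.5-35B-A3B-Q4_K_M", 28.394, 0.599),
]


def warm_speedup(cold_ttft_s: float, warm_ttft_s: Optional[float]) -> Optional[float]:
    """How many times sooner the first token arrives on a warm follow-up."""
    if warm_ttft_s is None or warm_ttft_s <= 0:
        return None
    return cold_ttft_s / warm_ttft_s


for model, cold, warm in rows:
    speedup = warm_speedup(cold, warm)
    note = "no warm data" if speedup is None else f"{speedup:.0f}x faster first token"
    print(f"{model}: {note}")
```

The ratio is largest for the slowest cold start (27B, about 63x), since the warm TTFT stays around a second while the cold TTFT scales with the prompt.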

## 196608-token context

| Model | TTFT (s) | Duration (s) | Tokens/s | Input (Tokens/Characters) | Output Tokens (Total/Limit) | Offload Mode | VRAM/Memory Used | Warm Avg TTFT (s) | Warm Avg Tokens/s | Warm Follow-ups | Status |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Qwen3.5-0.8B-Q4_K_M | 20.429 | 23.736 | 220.699 | 192,017/618,857 | 730/1,500 | GPU | 4,097 MiB | 0.432 | 212.364 | 3/3 | OK / KV q8_0 |
| Qwen3.5-0.8B-bf16 | 20.583 | 22.234 | 182.956 | 192,017/618,857 | 302/1,500 | GPU | 5,037 MiB | 0.411 | 185.109 | 3/3 | OK / KV q8_0 |
| Qwen3.5-2B-Q4_K_M | 22.134 | 30.139 | 187.386 | 192,017/618,857 | 1,500/1,500 | GPU | 4,809 MiB | 0.494 | 184.858 | 3/3 | OK / KV q8_0 |
| Qwen3.5-2B-bf16 | 23.189 | 25.920 | 127.398 | 192,017/618,857 | 348/1,500 | GPU | 7,199 MiB | 0.412 | 127.149 | 3/3 | OK / KV q8_0 |
| Qwen3.5-4B-Q4_K_M | 40.483 | 54.738 | 91.125 | 192,017/618,857 | 1,299/1,500 | GPU | 8,696 MiB | 0.662 | 102.090 | 3/3 | OK / KV q8_0 |
| Qwen3.5-4B-bf16 | 42.742 | 57.630 | 60.787 | 192,016/618,857 | 905/1,500 | GPU | 14,116 MiB | 0.713 | 85.640 | 3/3 | OK / KV q8_0 |
| Qwen3.5-9B-Q4_K_M | 45.568 | 62.267 | 75.576 | 192,017/618,857 | 1,262/1,500 | GPU | 10,965 MiB | 0.695 | 86.143 | 3/3 | OK / KV q8_0 |
| Qwen3.5-9B-bf16 | 52.225 | 75.437 | 40.409 | 192,016/618,857 | 938/1,500 | GPU | 21,245 MiB | 0.776 | 46.366 | 3/3 | OK / KV q8_0 |
| Qwen3.5-27B-Q4_K_M | | 8.168 | | 192,005/618,857 | 0/1,500 | GPU | - | | | 0/3 | FAIL (exit=-6) SERVER_BUSY / KV q8_0 |
| Qwen3.5-35B-A3B-Q4_K_M | 175.214 | 365.299 | 7.823 | 192,016/618,857 | 1,487/1,500 | GPU + RAM | 8,612 MiB offloaded | 2.301 | 13.572 | 3/3 | OFFLOAD_RETRY_DONE (OK) / NGL 27 / KV q8_0 |
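From this context size on, some runs only succeed after an offload retry, recorded as `OFFLOAD_RETRY_DONE` with an NGL value. NGL presumably means the number of layers kept on the GPU (the name matches llama.cpp's `-ngl`/`--n-gpu-layers` option, though the gist does not name its backend). The retry logic itself is not shown, so the sketch below is a hypothetical policy, not the author's harness: for each KV cache type, start fully on the GPU and move 2 more layers to RAM after each failed load (the NGL values in the tables drop in steps of 2, and some offloaded runs use KV q4_0). `find_offload_config`, `fake_try_load`, and all numbers in the toy loader are made up for illustration.

```python
from typing import Callable, Optional, Sequence, Tuple


def find_offload_config(
    try_load: Callable[[int, str], bool],
    total_layers: int,
    step: int = 2,
    kv_types: Sequence[str] = ("q8_0", "q4_0"),
) -> Optional[Tuple[int, str]]:
    """Return the first (gpu_layers, kv_type) for which try_load succeeds.

    Hypothetical policy: for each KV cache type in order, start with every
    layer on the GPU and move `step` more layers to system RAM after each
    failed attempt. Returns None if nothing fits.
    """
    for kv in kv_types:
        for ngl in range(total_layers, -1, -step):
            if try_load(ngl, kv):
                return ngl, kv
    return None


# Toy stand-in for "launch the server and check it answered": pretend each GPU
# layer costs 0.7 GiB and the KV cache a fixed size, on a 24 GiB card.
def fake_try_load(ngl: int, kv: str) -> bool:
    kv_gib = {"q8_0": 9.0, "q4_0": 5.0}[kv]
    return ngl * 0.7 + kv_gib <= 24.0


print(find_offload_config(fake_try_load, total_layers=40))  # (20, 'q8_0')
```

Trying q8_0 at every layer count before falling back to q4_0 favors cache quality over speed; swapping the two loops would instead favor keeping more layers on the GPU.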

## 262144-token context

| Model | TTFT (s) | Duration (s) | Tokens/s | Input (Tokens/Characters) | Output Tokens (Total/Limit) | Offload Mode | VRAM/Memory Used | Warm Avg TTFT (s) | Warm Avg Tokens/s | Warm Follow-ups | Status |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Qwen3.5-0.8B-Q4_K_M | 33.036 | 37.061 | 191.767 | 254,584/835,742 | 772/1,500 | GPU | 5,011 MiB | 0.589 | 184.410 | 3/3 | OK / KV q8_0 |
| Qwen3.5-0.8B-bf16 | 33.317 | 35.175 | 163.678 | 254,584/835,742 | 304/1,500 | GPU | 5,954 MiB | 0.525 | 164.390 | 3/3 | OK / KV q8_0 |
| Qwen3.5-2B-Q4_K_M | 35.391 | 44.412 | 166.275 | 254,584/835,742 | 1,500/1,500 | GPU | 5,725 MiB | 0.654 | 165.825 | 3/3 | OK / KV q8_0 |
| Qwen3.5-2B-bf16 | 36.821 | 41.295 | 117.123 | 254,584/835,742 | 524/1,500 | GPU | 8,115 MiB | 0.578 | 117.334 | 3/3 | OK / KV q8_0 |
| Qwen3.5-4B-Q4_K_M | 62.845 | 70.341 | 81.110 | 254,584/835,742 | 608/1,500 | GPU | 10,411 MiB | 0.791 | 88.832 | 3/3 | OK / KV q8_0 |
| Qwen3.5-4B-bf16 | 66.620 | 82.779 | 55.327 | 254,583/835,742 | 894/1,500 | GPU | 15,833 MiB | 0.830 | 77.136 | 3/3 | OK / KV q8_0 |
| Qwen3.5-9B-Q4_K_M | 71.074 | 91.795 | 66.021 | 254,584/835,742 | 1,368/1,500 | GPU | 12,683 MiB | 0.884 | 90.038 | 3/3 | OK / KV q8_0 |
| Qwen3.5-9B-bf16 | 78.672 | 90.815 | 38.704 | 254,583/835,742 | 470/1,500 | GPU | 22,961 MiB | 0.780 | 43.060 | 3/3 | OK / KV q8_0 |
| Qwen3.5-27B-Q4_K_M | 426.260 | 1691.617 | 0.735 | 254,583/835,742 | 930/1,500 | GPU + RAM | 13,240 MiB offloaded | 5.620 | 1.203 | 3/3 | OFFLOAD_RETRY_DONE (OK) / NGL 33 / KV q8_0 |
| Qwen3.5-35B-A3B-Q4_K_M | 266.218 | 508.060 | 5.615 | 254,583/835,742 | 1,358/1,500 | GPU + RAM | 10,253 MiB offloaded | 2.762 | 9.352 | 3/3 | OFFLOAD_RETRY_DONE (OK) / NGL 25 / KV q8_0 |
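The Status column packs several fields into one string separated by ` / `: the outcome, an optional NGL count, the KV cache type, and on one row a PROXY note. If you load these tables into a script, a small parser can split them. This is a sketch; `parse_status`, `Status`, and its field names are illustrative labels, not part of the benchmark's output format.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Status:
    outcome: str               # e.g. "OK", "OFFLOAD_RETRY_DONE (OK)", "FAIL (exit=0) BAD_OUTPUT"
    ok: bool                   # True when the run ultimately succeeded
    gpu_layers: Optional[int]  # number after "NGL"; None for fully-on-GPU rows
    kv_type: Optional[str]     # KV cache type after "KV", e.g. "q8_0" or "q4_0"
    proxy_from: Optional[int]  # context size copied from, for "PROXY from N" rows


def parse_status(cell: str) -> Status:
    """Split a Status cell such as 'OFFLOAD_RETRY_DONE (OK) / NGL 27 / KV q8_0'."""
    outcome = ""
    gpu_layers: Optional[int] = None
    kv_type: Optional[str] = None
    proxy_from: Optional[int] = None
    for part in (p.strip() for p in cell.split("/")):
        if m := re.fullmatch(r"NGL (\d+)", part):
            gpu_layers = int(m.group(1))
        elif m := re.fullmatch(r"KV (\S+)", part):
            kv_type = m.group(1)
        elif m := re.fullmatch(r"PROXY from (\d+)", part):
            proxy_from = int(m.group(1))
        else:
            outcome = part  # whatever is left is the run outcome
    ok = outcome == "OK" or outcome.endswith("(OK)")
    return Status(outcome, ok, gpu_layers, kv_type, proxy_from)


print(parse_status("OFFLOAD_RETRY_DONE (OK) / NGL 27 / KV q8_0"))
print(parse_status("FAIL (exit=-6) SERVER_BUSY / KV q8_0"))
```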

## 327680-token context

| Model | TTFT (s) | Duration (s) | Tokens/s | Input (Tokens/Characters) | Output Tokens (Total/Limit) | Offload Mode | VRAM/Memory Used | Warm Avg TTFT (s) | Warm Avg Tokens/s | Warm Follow-ups | Status |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Qwen3.5-0.8B-Q4_K_M | 48.405 | 57.156 | 171.408 | 314,931/1,010,583 | 1,500/1,500 | GPU | 5,924 MiB | 0.736 | 164.170 | 3/3 | OK / KV q8_0 |
| Qwen3.5-0.8B-bf16 | 49.344 | 59.444 | 148.505 | 314,931/1,010,583 | 1,500/1,500 | GPU | 6,863 MiB | 0.602 | 142.982 | 3/3 | OK / KV q8_0 |
| Qwen3.5-2B-Q4_K_M | 50.920 | 60.899 | 150.329 | 314,931/1,010,583 | 1,500/1,500 | GPU | 6,635 MiB | 0.666 | 146.074 | 3/3 | OK / KV q8_0 |
| Qwen3.5-2B-bf16 | 52.907 | 66.671 | 108.981 | 314,931/1,010,583 | 1,500/1,500 | GPU | 9,027 MiB | 0.760 | 108.819 | 3/3 | OK / KV q8_0 |
| Qwen3.5-4B-Q4_K_M | 89.757 | 109.572 | 71.916 | 314,931/1,010,583 | 1,425/1,500 | GPU | 12,123 MiB | 0.999 | 78.016 | 3/3 | OK / KV q8_0 |
| Qwen3.5-4B-bf16 | 94.385 | 100.718 | 51.156 | 314,930/1,010,583 | 324/1,500 | GPU | 17,537 MiB | | | 0/3 | OK / KV q8_0 |
| Qwen3.5-9B-Q4_K_M | 99.190 | 117.762 | 61.006 | 314,931/1,010,583 | 1,133/1,500 | GPU | 14,391 MiB | 0.996 | 67.923 | 3/3 | OK / KV q8_0 |
| Qwen3.5-9B-bf16 | 186.548 | 245.591 | 4.691 | 314,930/1,010,583 | 277/1,500 | GPU + RAM | 16,404 MiB offloaded | 1.574 | 5.390 | 3/3 | OFFLOAD_RETRY_DONE (OK) / NGL 27 / KV q4_0 |
| Qwen3.5-27B-Q4_K_M | 557.792 | 2856.256 | 0.653 | 314,930/1,010,583 | 1,500/1,500 | GPU + RAM | 12,454 MiB offloaded | 5.949 | 0.981 | 3/3 | OFFLOAD_RETRY_DONE (OK) / NGL 31 / KV q4_0 |
| Qwen3.5-35B-A3B-Q4_K_M | 371.220 | 702.425 | 4.529 | 314,930/1,010,583 | 1,500/1,500 | GPU + RAM | 11,763 MiB offloaded | 3.148 | 7.560 | 3/3 | OFFLOAD_RETRY_DONE (OK) / NGL 23 / KV q8_0 |

## 360448-token context

| Model | TTFT (s) | Duration (s) | Tokens/s | Input (Tokens/Characters) | Output Tokens (Total/Limit) | Offload Mode | VRAM/Memory Used | Warm Avg TTFT (s) | Warm Avg Tokens/s | Warm Follow-ups | Status |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Qwen3.5-0.8B-Q4_K_M | 59.770 | 61.871 | 158.940 | 352,956/1,098,327 | 334/1,500 | GPU | 6,393 MiB | 0.721 | 159.362 | 3/3 | OK / KV q8_0 |
| Qwen3.5-0.8B-bf16 | 59.358 | 70.084 | 139.852 | 352,956/1,098,327 | 1,500/1,500 | GPU | 7,335 MiB | 0.808 | 137.787 | 3/3 | OK / KV q8_0 |
| Qwen3.5-2B-Q4_K_M | 62.274 | 64.357 | 141.599 | 352,956/1,098,327 | 295/1,500 | GPU | 7,105 MiB | 0.705 | 138.169 | 3/3 | OK / KV q8_0 |
| Qwen3.5-2B-bf16 | 64.798 | 71.382 | 104.340 | 352,956/1,098,327 | 687/1,500 | GPU | 9,496 MiB | 0.736 | 107.240 | 3/3 | OK / KV q8_0 |
| Qwen3.5-4B-Q4_K_M | 109.352 | 122.623 | 67.216 | 352,956/1,098,327 | 892/1,500 | GPU | 13,001 MiB | | | 0/3 | OK / KV q8_0 |
| Qwen3.5-4B-bf16 | 114.468 | 125.626 | 48.576 | 352,955/1,098,327 | 542/1,500 | GPU | 18,423 MiB | 0.953 | 66.584 | 3/3 | OK / KV q8_0 |
| Qwen3.5-9B-Q4_K_M | 120.036 | 134.366 | 57.569 | 352,956/1,098,327 | 825/1,500 | GPU | 15,273 MiB | 1.052 | 63.500 | 3/3 | OK / KV q8_0 |
| Qwen3.5-9B-bf16 | 251.061 | 433.318 | 2.870 | 352,955/1,098,327 | 523/1,500 | GPU + RAM | 17,143 MiB offloaded | 2.130 | 3.317 | 3/3 | OFFLOAD_RETRY_DONE (OK) / NGL 25 / KV q4_0 |
| Qwen3.5-27B-Q4_K_M | 717.573 | 3707.420 | 0.502 | 352,955/1,098,327 | 1,500/1,500 | GPU + RAM | 16,871 MiB offloaded | 7.432 | 0.841 | 3/3 | OFFLOAD_RETRY_DONE (OK) / NGL 29 / KV q8_0 |
| Qwen3.5-35B-A3B-Q4_K_M | 462.873 | 766.514 | 3.287 | 352,955/1,098,327 | 998/1,500 | GPU + RAM | 13,376 MiB offloaded | 3.770 | 5.744 | 3/3 | OFFLOAD_RETRY_DONE (OK) / NGL 21 / KV q8_0 |

## 393216-token context

| Model | TTFT (s) | Duration (s) | Tokens/s | Input (Tokens/Characters) | Output Tokens (Total/Limit) | Offload Mode | VRAM/Memory Used | Warm Avg TTFT (s) | Warm Avg Tokens/s | Warm Follow-ups | Status |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Qwen3.5-0.8B-Q4_K_M | 69.477 | 70.995 | 150.896 | 385,558/1,186,284 | 229/1,500 | GPU | 6,853 MiB | 0.737 | 149.442 | 3/3 | OK / KV q8_0 |
| Qwen3.5-0.8B-bf16 | 70.458 | 75.042 | 132.202 | 385,558/1,186,284 | 606/1,500 | GPU | 7,994 MiB | 0.770 | 131.874 | 3/3 | OK / KV q8_0 |
| Qwen3.5-2B-Q4_K_M | 73.396 | 84.485 | 135.264 | 385,558/1,186,284 | 1,500/1,500 | GPU | 7,764 MiB | 0.723 | 131.521 | 3/3 | OK / KV q8_0 |
| Qwen3.5-2B-bf16 | 75.604 | 79.248 | 100.700 | 385,558/1,186,284 | 367/1,500 | GPU | 10,154 MiB | 0.763 | 100.617 | 3/3 | OK / KV q8_0 |
| Qwen3.5-4B-Q4_K_M | 128.518 | 140.221 | 63.315 | 385,558/1,186,284 | 741/1,500 | GPU | 13,870 MiB | 1.162 | 67.743 | 3/3 | OK / KV q8_0 |
| Qwen3.5-4B-bf16 | 135.879 | 149.339 | 46.508 | 385,557/1,186,284 | 626/1,500 | GPU | 19,284 MiB | 1.097 | 63.971 | 3/3 | OK / KV q8_0 |
| Qwen3.5-9B-Q4_K_M | 139.675 | 158.286 | 54.752 | 385,558/1,186,284 | 1,019/1,500 | GPU | 16,136 MiB | 1.146 | 60.322 | 3/3 | OK / KV q8_0 |
| Qwen3.5-9B-bf16 | 317.738 | 579.910 | 2.475 | 385,557/1,186,284 | 649/1,500 | GPU + RAM | 17,604 MiB offloaded | 2.539 | 2.641 | 3/3 | OFFLOAD_RETRY_DONE (OK) / NGL 23 / KV q4_0 |
| Qwen3.5-27B-Q4_K_M | 881.275 | 3815.060 | 0.464 | 385,557/1,186,284 | 1,362/1,500 | GPU + RAM | 18,063 MiB offloaded | 8.179 | 0.626 | 3/3 | OFFLOAD_RETRY_DONE (OK) / NGL 27 / KV q8_0 |
| Qwen3.5-35B-A3B-Q4_K_M | 554.589 | 1019.831 | 3.142 | 385,557/1,186,284 | 1,462/1,500 | GPU + RAM | 14,590 MiB offloaded | 3.950 | 4.463 | 3/3 | OFFLOAD_RETRY_DONE (OK) / NGL 19 / KV q8_0 |
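The VRAM column for fully-on-GPU runs can be turned into a rough memory cost per token of context by comparing the same model at two context sizes. This sketch assumes the section heading is the configured context size and that the growth is mostly the q8_0 KV cache; any compute buffers that grow with context are lumped in. Values are copied from the 131072- and 393216-token tables.

```python
# Marginal VRAM per token of configured context, from two fully-on-GPU runs of
# the same model (assumption: headings are the configured context sizes).
CTX_LO, CTX_HI = 131_072, 393_216

vram_mib = {
    # model: (VRAM MiB at CTX_LO, VRAM MiB at CTX_HI)
    "Qwen3.5-0.8B-Q4_K_M": (3_349, 6_853),
    "Qwen3.5-2B-Q4_K_M": (4_055, 7_764),
    "Qwen3.5-4B-Q4_K_M": (7_109, 13_870),
    "Qwen3.5-4B-bf16": (12_527, 19_284),
    "Qwen3.5-9B-Q4_K_M": (9_369, 16_136),
}


def kib_per_ctx_token(vram_lo: float, vram_hi: float,
                      ctx_lo: int = CTX_LO, ctx_hi: int = CTX_HI) -> float:
    """Slope of VRAM vs. context size, converted from MiB to KiB per token."""
    return (vram_hi - vram_lo) * 1024 / (ctx_hi - ctx_lo)


for model, (lo, hi) in vram_mib.items():
    print(f"{model}: ~{kib_per_ctx_token(lo, hi):.1f} KiB per context token")
```

The 4B Q4_K_M and bf16 builds grow by almost the same amount (about 26.4 KiB per token), which fits the fact that KV-cache size depends on the model architecture and KV cache type, not on how the weights are quantized.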

## 400000-token context

| Model | TTFT (s) | Duration (s) | Tokens/s | Input (Tokens/Characters) | Output Tokens (Total/Limit) | Offload Mode | VRAM/Memory Used | Warm Avg TTFT (s) | Warm Avg Tokens/s | Warm Follow-ups | Status |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Qwen3.5-0.8B-Q4_K_M | 72.553 | 73.141 | 148.013 | 392,508/1,207,925 | 87/1,500 | GPU | 6,938 MiB | | | 0/3 | OK / KV q8_0 |
| Qwen3.5-0.8B-bf16 | 72.523 | 83.914 | 131.688 | 392,508/1,207,925 | 1,500/1,500 | GPU | 7,883 MiB | 0.908 | 130.917 | 3/3 | OK / KV q8_0 |
| Qwen3.5-2B-Q4_K_M | 76.160 | 87.390 | 133.566 | 392,508/1,207,925 | 1,500/1,500 | GPU | 7,653 MiB | 0.889 | 129.413 | 3/3 | FAIL (exit=0) BAD_OUTPUT / KV q8_0 |
| Qwen3.5-2B-bf16 | 78.707 | 85.010 | 97.731 | 392,508/1,207,925 | 616/1,500 | GPU | 10,038 MiB | 0.769 | 91.179 | 3/3 | OK / KV q8_0 |
| Qwen3.5-4B-Q4_K_M | 131.646 | 148.772 | 62.709 | 392,508/1,207,925 | 1,074/1,500 | GPU | 14,035 MiB | | | 0/3 | OK / KV q8_0 |
| Qwen3.5-4B-bf16 | 137.578 | 170.110 | 46.108 | 392,507/1,207,925 | 1,500/1,500 | GPU | 19,459 MiB | 1.236 | 62.996 | 3/3 | OK / KV q8_0 |
| Qwen3.5-9B-Q4_K_M | 143.699 | 163.881 | 54.158 | 392,508/1,207,925 | 1,093/1,500 | GPU | 16,306 MiB | 1.175 | 59.408 | 3/3 | OK / KV q8_0 |
| Qwen3.5-9B-bf16 | 357.536 | 724.675 | 1.803 | 392,507/1,207,925 | 662/1,500 | GPU + RAM | 18,088 MiB offloaded | 2.776 | 2.094 | 3/3 | OFFLOAD_RETRY_DONE (OK) / NGL 21 / KV q4_0 |
| Qwen3.5-27B-Q4_K_M | 883.476 | 3671.606 | 0.419 | 392,507/1,207,925 | 1,167/1,500 | GPU + RAM | 19,464 MiB offloaded | 8.437 | 0.702 | 3/3 | OFFLOAD_RETRY_DONE (OK) / NGL 25 / KV q8_0 |
| Qwen3.5-35B-A3B-Q4_K_M | 599.124 | 1189.472 | 2.541 | 392,507/1,207,925 | 1,500/1,500 | GPU + RAM | 16,043 MiB offloaded | 4.204 | 4.150 | 3/3 | OFFLOAD_RETRY_DONE (OK) / NGL 17 / KV q8_0 |
| Qwen3.5-35B-A3B-Q4_K_M (Proxy) | 554.589 | 1019.831 | 3.142 | 385,557/1,186,284 | 1,462/1,500 | GPU + RAM | 14,590 MiB offloaded | 3.950 | 4.463 | 3/3 | PROXY from 393216 / OFFLOAD_RETRY_DONE (OK) / NGL 19 / KV q8_0 |
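The Input and TTFT columns together give a rough prompt-processing (prefill) rate. A sketch for Qwen3.5-0.8B-Q4_K_M across several context sizes; treating TTFT as pure prefill time is an approximation, since it also includes request overhead and the first generated token.

```python
# Rough prefill rate: input tokens / TTFT. TTFT also includes request overhead
# and the first decode step, so this slightly understates pure prefill speed.
samples = {
    # context size: (input tokens, TTFT in s) for Qwen3.5-0.8B-Q4_K_M
    32_768: (30_145, 1.404),
    65_536: (60_968, 3.475),
    131_072: (126_355, 10.240),
    262_144: (254_584, 33.036),
    400_000: (392_508, 72.553),
}


def prefill_tok_s(input_tokens: int, ttft_s: float) -> float:
    """Prompt tokens processed per second before the first output token."""
    return input_tokens / ttft_s


rates = {ctx: prefill_tok_s(n, t) for ctx, (n, t) in samples.items()}
for ctx, rate in rates.items():
    print(f"context {ctx:>7,}: ~{rate:,.0f} prompt tokens/s")
```

For this model the rate falls from roughly 21,000 to 5,400 prompt tokens/s between the 32768 and 400000 runs: the prompt grows about 13x while TTFT grows about 52x, so TTFT rises faster than linearly with prompt length.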