Benchmark Showcase
Real-World Performance
zigllama is optimized for high-performance inference on CPU-only systems. By leveraging Zig's efficiency and llama.cpp's foundational power, we deliver a responsive experience even without a dedicated GPU.
Our showcase features in-depth case studies and benchmarks for the latest models in the ecosystem.
Qwen 3.5 Small Series Benchmarks
On March 2, 2026, Alibaba released the Qwen 3.5 Small Model Series (0.8B, 2B, 4B, 9B). We've benchmarked the entire family to find the "sweet spot" for reasoning-heavy tasks on mid-range hardware.
Highlights:
- 0.8B: 23.01 tok/s – Ultra-fast edge AI.
- 4B: 8.48 tok/s – The "sweet spot" for agentic reasoning.
- 9B: 6.45 tok/s – Maximum intelligence for consumer CPUs.
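For context, throughput figures like the ones above are just decoded tokens divided by wall-clock decode time. A minimal sketch of that calculation (the token count and timing below are illustrative, not taken from our runs):

```python
def tokens_per_second(n_tokens: int, seconds: float) -> float:
    """Decode throughput: generated tokens / wall-clock decode time."""
    return n_tokens / seconds

# Illustrative: 128 tokens generated in 15.1 s of decode time.
print(round(tokens_per_second(128, 15.1), 2))  # -> 8.48
```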
Qwen 3.5 35B MoE Showcase
A deep dive into running the 35B Mixture-of-Experts (MoE) model on a 16-core CPU server. This showcase details the thread count optimizations that led to a 52% performance increase.
Highlights:
- 5.56 tok/s on a 16-core AMD EPYC (Rome) server.
- 8 threads identified as the memory-bandwidth optimum.
- Detailed analysis of memory channel alignment.
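The tuning itself reduces to sweeping the thread count and keeping the setting with the best measured throughput; past the memory-bandwidth knee, additional threads stop helping. A sketch of that selection step, using made-up sweep numbers for every point except the 8-thread result reported above:

```python
# Hypothetical (threads, tok/s) sweep results; only the 8-thread
# figure matches the measurement reported in this showcase.
sweep = {4: 4.90, 8: 5.56, 12: 5.20, 16: 3.70}

def best_thread_count(results: dict[int, float]) -> int:
    """Pick the thread count with the highest measured throughput."""
    return max(results, key=results.get)

print(best_thread_count(sweep))  # -> 8
```

The non-monotonic shape is the point: on a memory-bandwidth-bound workload, saturating all 16 cores can be slower than the optimum.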
Performance Principles
All our benchmarks are conducted using the same set of core principles:
- CPU-Only: No GPU acceleration used.
- Standard Quants: Using Q4_K_XL or Q6_K GGUFs.
- Reproducible: Full CLI commands and flags provided.
- Transparent: Detailed hardware specs for every run.
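As a concrete example of the kind of command such a run involves, here is a thread-sweep invocation in the style of upstream llama.cpp's llama-bench tool (the flags are llama-bench's; the model path and parameter values are placeholders, and zigllama's own CLI may differ):

```shell
# CPU-only thread sweep: 512-token prompt, 128-token generation,
# averaged over 5 repetitions, with machine-readable output.
./llama-bench \
  -m models/model-Q4_K_XL.gguf \
  -t 4,8,16 \
  -p 512 -n 128 \
  -r 5 -o json
```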