Philosophy
Why igllama exists and the principles that guide its development.
Built on the Shoulders of Giants š¤
igllama stands on the foundation laid by incredible open-source projects and communities:
- llama.cpp by Georgi Gerganov - the foundational C++ library for efficient local LLM inference
- The GGML team for the tensor library that powers llama.cpp
- The open-source AI community for developing and sharing model weights in GGUF format
Why Zig? š»
We chose Zig as the implementation language for critical technical reasons:
No Garbage Collection
Zig provides manual memory management without hidden allocator calls or GC pauses. For a CLI tool that loads multi-gigabyte models and generates tokens in tight loops, predictable memory behavior is essential. Every allocation is explicit, making it easier to reason about memory usage and avoid leaks.
First-Class C Interop
llama.cpp is written in C++, but exposes a C API. Zig can directly @cImport C headers without writing any bindings by hand. This means:
- No FFI overhead
- Automatic type translation
- Compile-time verification that our calls match the C API
Build System Integration
Zigās built-in build system can compile C/C++ code with the same toolchain. We compile llama.cpp directly into igllama without needing CMake, Make, or external dependencies. A single zig build produces a statically-linked binary.
Cross-Compilation
Zig can cross-compile to any target from any host. This makes it straightforward to produce binaries for Linux, macOS, and Windows from a single development machine.
Safety Without Runtime Cost
Zig catches many classes of bugs at compile time (undefined behavior, integer overflow in debug builds) while generating code as fast as C. Thereās no hidden runtime - what you write is what runs.
Why GGUF? šļø
igllama exclusively supports the GGUF (Georgi Gerganov Unified Format) model format:
Transparency
GGUF is an open, documented format. Model files are self-describing containers that include:
- Model architecture metadata
- Tokenizer configuration
- Quantization type
- All tensor data
You can inspect any GGUF file to understand exactly what model youāre running. There are no proprietary blobs or phone-home calls.
Ecosystem Compatibility
GGUF is the standard format for the llama.cpp ecosystem. Models on Hugging Face, community quantizations, and open-source releases all use GGUF. This means igllama works with thousands of existing models out of the box.
Quantization Options
GGUF supports a wide range of quantization levels (Q2_K through Q8_0, plus IQ variants). Users can choose the trade-off between model size, memory usage, and quality that works for their hardware.
Single-File Distribution
A GGUF file contains everything needed to run inference. No separate tokenizer files, no config JSONs, no additional downloads. One file, one model.
Local-First Design š
igllama is designed for running LLMs locally, with no dependency on external services:
Privacy
Your prompts and responses never leave your machine. Thereās no telemetry, no API calls to external servers, no data collection of any kind.
Offline Operation
Once youāve downloaded a model, igllama works completely offline. No internet connection required.
Reproducibility
With a fixed model file and seed, igllama produces deterministic outputs. This is important for testing, debugging, and research.
No API Keys
No accounts, no subscriptions, no rate limits. Just download a model and start chatting.
Design Philosophy š§
Simplicity
igllama aims to be the ācurlā of local LLM inference: a single binary with sensible defaults that just works. Complex features are optional; the basic use case should be trivial.
Transparency
Every operation is visible. Model loading logs show whatās happening. The --verbose flag provides detailed information. Thereās no magic.
Unix Philosophy
igllama is designed to work well in pipelines:
- Piped input is automatically detected
- JSON output mode for scripting
- Single-shot mode with
--promptflag - Streaming output by default
Minimal Dependencies
The release binary is statically linked with no runtime dependencies. It should run on any matching OS/architecture combination without additional setup.
Comparison to Ollama
Ollama is an excellent project that inspired igllama. The key differences:
| Aspect | Ollama | igllama |
|---|---|---|
| Architecture | Client-server | Single binary |
| Language | Go | Zig |
| Model Format | Modelfile abstraction | Direct GGUF |
| Background Service | Required | Not needed |
| Container Support | Docker-first | No container needed |
| Model Discovery | Ollama registry | Hugging Face Hub |
igllama is for users who want a single CLI tool without background services, direct control over GGUF files, integration with the Hugging Face ecosystem, and minimal footprint. Both tools use llama.cpp under the hood and produce similar inference results.