# Philosophy
Why igllama exists and the principles that guide its development.
## Built on the Shoulders of Giants 🤝
igllama stands on the foundation laid by incredible open-source projects and communities:
- llama.cpp by Georgi Gerganov - the foundational C++ library for efficient local LLM inference
- The GGML team for the tensor library that powers llama.cpp
- The open-source AI community for developing and sharing model weights in GGUF format
## Why Zig? 💻
We chose Zig as the implementation language for several concrete technical reasons:
### No Garbage Collection
Zig provides manual memory management without hidden allocator calls or GC pauses. For a CLI tool that loads multi-gigabyte models and generates tokens in tight loops, predictable memory behavior is essential. Every allocation is explicit, making it easier to reason about memory usage and avoid leaks.
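To illustrate the explicit-allocation model, here is a minimal, self-contained sketch (standard Zig, no igllama-specific code): every allocation names its allocator, and every `alloc` is paired with a visible `free`.

```zig
const std = @import("std");

pub fn main() !void {
    // A general-purpose allocator that reports leaks on deinit in debug builds.
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    // The allocation is explicit; nothing is allocated behind your back.
    const buf = try allocator.alloc(u8, 4096);
    defer allocator.free(buf); // paired free, visible at the call site

    std.debug.print("allocated {} bytes\n", .{buf.len});
}
```

Because the allocator is an ordinary value passed around by the program, a model-loading path can use one arena for the whole model lifetime and free it in a single operation.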
### First-Class C Interop
llama.cpp is written in C++ but exposes a C API. Zig can `@cImport` C headers directly, with no hand-written bindings. This means:
- No FFI overhead
- Automatic type translation
- Compile-time verification that our calls match the C API
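As a sketch of what this looks like in practice (assuming `llama.h` is on the include path; exact function names vary across llama.cpp versions):

```zig
// Translate the C header at compile time; no hand-written bindings.
const c = @cImport({
    @cInclude("llama.h");
});

pub fn main() void {
    // Calls are type-checked against the translated C declarations,
    // so a mismatched argument is a compile error, not a runtime crash.
    c.llama_backend_init();
    defer c.llama_backend_free();
}
```

The `c` namespace contains every declaration from the header as ordinary Zig functions and types, so the C API is browsable with normal editor tooling.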
### Build System Integration
Zig’s built-in build system compiles C/C++ code with the same toolchain, so llama.cpp is compiled directly into igllama with no CMake, Make, or other external build dependencies. A single `zig build` produces a statically linked binary.
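A minimal `build.zig` sketch of this pattern, using the Zig 0.13-era build API (the vendor paths and flags here are illustrative, not igllama's actual layout):

```zig
const std = @import("std");

pub fn build(b: *std.Build) void {
    const target = b.standardTargetOptions(.{});
    const optimize = b.standardOptimizeOption(.{});

    const exe = b.addExecutable(.{
        .name = "igllama",
        .root_source_file = b.path("src/main.zig"),
        .target = target,
        .optimize = optimize,
    });

    // The same Zig toolchain compiles the vendored C++ sources.
    exe.addIncludePath(b.path("vendor/llama.cpp/include"));
    exe.addCSourceFiles(.{
        .files = &.{"vendor/llama.cpp/src/llama.cpp"},
        .flags = &.{"-std=c++17"},
    });
    exe.linkLibCpp();

    b.installArtifact(exe);
}
```

Because C++ compilation is just another build step, there is no separate configure stage and no generated makefiles to keep in sync.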
### Cross-Compilation
Zig can cross-compile to any target from any host. This makes it straightforward to produce binaries for Linux, macOS, and Windows from a single development machine.
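For example, since the build script uses the standard target option, cross-compiling is a matter of passing a target triple (triples shown here are standard Zig `arch-os[-abi]` forms):

```shell
# Build for each platform from the same machine.
zig build -Dtarget=x86_64-linux-gnu
zig build -Dtarget=aarch64-macos
zig build -Dtarget=x86_64-windows
```

Zig ships the headers and compiler-rt pieces for its supported targets, which is what makes this work without installing per-platform toolchains.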
### Safety Without Runtime Cost
Zig catches many classes of bugs early: illegal behavior such as integer overflow and out-of-bounds indexing traps with a clear panic in Debug and ReleaseSafe builds, while the generated code remains as fast as C. There is no hidden runtime - what you write is what runs.
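A small self-contained example of the safety-mode behavior described above:

```zig
const std = @import("std");

pub fn main() void {
    var x: u8 = 255;
    // In Debug and ReleaseSafe builds, this overflow triggers a runtime
    // panic ("integer overflow") instead of silently wrapping.
    // In ReleaseFast it is undefined behavior, as in C.
    x += 1;
    std.debug.print("{}\n", .{x});
}
```

Wrapping arithmetic is still available when it is intended (`x +%= 1`), which keeps the checked form as the default without giving up low-level control.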
## Why GGUF? 🗃️
igllama exclusively supports the GGUF (Georgi Gerganov Unified Format) model format: