Run large language models on iOS and Android devices. Manage the entire process from quantization to deployment in one unified platform.
Challenges to overcome when running LLMs on mobile devices
GPTQ, QAT, mixed precision: each model requires a different quantization configuration
iOS and Android each require different conversion tools and workflows
Reducing model size degrades accuracy, making it hard to find the optimal trade-off between size and quality
Every configuration change requires running long commands in the terminal
Handle all optimization tasks in one unified platform
Manage Qualcomm (Android) and Apple ANE (iOS) conversions from a single dashboard, from one-click setup to monitoring.
Just enter your target model size and Automatic Mixed Precision finds the optimal quantization settings. No more manual layer-by-layer bitwidth adjustments.
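To illustrate the idea behind a size-budgeted mixed-precision search (this is a minimal sketch, not GenKit's actual algorithm, which is not public; the layer names and sensitivity scores below are hypothetical), one simple strategy is to start every layer at 8-bit and greedily lower the bitwidth of the least quantization-sensitive layers until the model fits the target size:

```python
# Greedy mixed-precision bitwidth assignment under a size budget.
# Illustrative sketch only; layer stats here are made up.

def assign_bitwidths(layers, target_bytes, widths=(8, 4)):
    """Lower the bitwidth of the least-sensitive layers, one step at a
    time, until the quantized weight size fits the target budget."""
    bits = {name: widths[0] for name in layers}  # start everything at 8-bit

    def size(assignment):
        return sum(layers[n]["params"] * assignment[n] // 8 for n in assignment)

    while size(bits) > target_bytes:
        # layers that can still be lowered one more step
        cands = [n for n in bits if widths.index(bits[n]) + 1 < len(widths)]
        if not cands:
            break  # budget unreachable even at the lowest width
        # downgrade the layer least sensitive to quantization error
        victim = min(cands, key=lambda n: layers[n]["sensitivity"])
        bits[victim] = widths[widths.index(bits[victim]) + 1]
    return bits

layers = {  # hypothetical per-layer parameter counts and sensitivities
    "embed":  {"params": 60_000_000, "sensitivity": 0.9},
    "attn.0": {"params": 30_000_000, "sensitivity": 0.2},
    "mlp.0":  {"params": 80_000_000, "sensitivity": 0.1},
}
print(assign_bitwidths(layers, target_bytes=120_000_000))
# → {'embed': 8, 'attn.0': 4, 'mlp.0': 4}
```

Real systems typically estimate sensitivity from calibration data (e.g. per-layer quantization error on sample activations) rather than fixed scores, but the budget-driven search loop is the same shape.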
Visualize quantization error (MSE), per-layer memory usage, and GPTQ errors to intuitively understand model quality.
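The quantization MSE metric mentioned above can be computed directly: quantize a weight tensor to n bits and measure the mean squared reconstruction error. A minimal sketch (symmetric per-tensor quantization; not GenKit's internals):

```python
import numpy as np

def quantize_mse(w, bits):
    """Quantize w to signed n-bit integers with a symmetric per-tensor
    scale, dequantize, and return the mean squared error."""
    qmax = 2 ** (bits - 1) - 1                        # e.g. 7 for 4-bit signed
    scale = np.abs(w).max() / qmax                    # per-tensor scale factor
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)  # integer codes
    w_hat = q * scale                                 # dequantized weights
    return float(np.mean((w - w_hat) ** 2))

rng = np.random.default_rng(0)
w = rng.standard_normal(4096).astype(np.float32)
print(f"8-bit MSE: {quantize_mse(w, 8):.2e}")
print(f"4-bit MSE: {quantize_mse(w, 4):.2e}")  # higher error at lower bitwidth
```

Plotting this per layer is what makes it easy to spot which layers tolerate aggressive quantization and which need higher precision.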
Unified conversion pipeline supporting both iOS and Android
Complex optimization process made simple
1. Enter a HuggingFace model ID or local path
2. Set the target platform, device, and context length
3. Optimize with Automatic or Custom mode
4. Deploy the DLC/mlpackage model to the device
Verified optimization results across various models
| Model | Original Size | Optimized | Size Reduction | Quality Retained |
|---|---|---|---|---|
| LLaMA 3.2 1B | 2.4 GB | 620 MB | 74% | 98.5% |
| EXAONE 4.0 | 5.8 GB | 950 MB | 84% | 97.8% |
| Qwen 2.5 3B | 6.2 GB | 1.2 GB | 81% | 98.2% |
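The "Size Reduction" column follows from the two size columns, assuming decimal units (1 GB = 1000 MB): reduction = 1 − optimized/original.

```python
# Arithmetic check of the Size Reduction column above (decimal units assumed).
rows = [
    ("LLaMA 3.2 1B", 2400, 620),   # original MB, optimized MB
    ("EXAONE 4.0",   5800, 950),
    ("Qwen 2.5 3B",  6200, 1200),
]
for name, orig_mb, opt_mb in rows:
    reduction = 100 * (1 - opt_mb / orig_mb)
    print(f"{name}: {reduction:.0f}% smaller")  # 74, 84, 81 — matches the table
```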
Stable and scalable architecture
Complex mobile LLM optimization made simple with GenKit