Bring LLMs to Mobile,
Simplify Complex Optimization

Run large language models on iOS and Android devices. Manage the entire process from quantization to deployment in one unified platform.

2+
Supported Platforms
84%
Max Size Reduction
98%+
Quality Retention

Why is On-Device AI So Difficult?

Challenges to overcome when running LLMs on mobile devices

01

Complex Quantization Pipeline

GPTQ, QAT, Mixed Precision... each model requires a different configuration

02

Platform-Specific Workflows

iOS and Android each require different conversion tools and workflows

03

Quality vs Size Trade-off

Reducing model size degrades accuracy, making the optimal balance hard to find

04

Repetitive Manual Work

Every configuration change requires running long commands in the terminal

GenKit Solves All the Complexity

Handle all optimization tasks in one unified platform

Unified Platform

Manage Qualcomm (Android) and Apple ANE (iOS) conversions from a single dashboard. One-click setup to monitoring.

Auto Optimization

Just enter a target model size, and Automatic Mixed Precision finds the optimal quantization settings. No more manual layer-by-layer bitwidth adjustments.

Real-time Analysis

Visualize quantization error (MSE), per-layer memory usage, and GPTQ errors to intuitively understand model quality.

Multi-Platform, One Workflow

Unified conversion pipeline supporting both iOS and Android

Qualcomm

Snapdragon 8 Gen 2/3 | Android

  • AIMET Quantization
  • QNN SDK Integration
  • Mixed Precision Support
  • DLC Model Generation

Apple Neural Engine

iPhone / iPad / Mac

  • CoreML Conversion
  • GPTQ / LUT Quantization
  • Mixed Precision Support
  • mlpackage Generation
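
LUT quantization (palettization) replaces each weight with a small index into a shared lookup table of representative values. The core idea can be sketched as simple 1-D k-means clustering; this is illustrative only, not the pipeline's actual implementation:

```python
def lut_quantize(weights, bits=4, iters=10):
    """Cluster weights into 2**bits shared values (simple 1-D k-means),
    then store each weight as an index into that lookup table."""
    k = 2 ** bits
    lo, hi = min(weights), max(weights)
    # Initialize centroids evenly across the weight range.
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    for _ in range(iters):
        # Assign each weight to its nearest centroid.
        assign = [min(range(k), key=lambda j: abs(w - centroids[j])) for w in weights]
        # Move each centroid to the mean of its assigned weights.
        for j in range(k):
            members = [w for w, a in zip(weights, assign) if a == j]
            if members:
                centroids[j] = sum(members) / len(members)
    indices = [min(range(k), key=lambda j: abs(w - centroids[j])) for w in weights]
    return indices, centroids

def lut_dequantize(indices, centroids):
    """Reconstruct approximate weights from the lookup table."""
    return [centroids[i] for i in indices]
```

A 4-bit LUT stores only 16 shared float values plus one 4-bit index per weight, cutting storage from 16 bits to roughly 4 bits per weight.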

Supported Model Architectures

LLaMA 3.1 / 3.2
EXAONE 3.5 / 4.0
Qwen 2.5
Transformer LLM

Powerful Features for Developers

Tools supporting every stage of model optimization

Dashboard

Intuitive Dashboard

From model management to experiment comparison, all in one place

  • Model list and status management
  • Experiment result comparison
  • Real-time conversion log streaming
[Dashboard Screenshot]
Auto

Automatic Mixed Precision

Automatically find optimal quantization settings based on target size

  • SeqMSE-based layer analysis
  • Error-rate based bitwidth allocation
  • Target size-based auto optimization
[Auto Quantization Screenshot]
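
The idea behind target-size-driven allocation can be sketched as a greedy loop: start every layer at full precision, then repeatedly downgrade the least sensitive layer until the size budget is met. A toy illustration under assumed inputs (GenKit's SeqMSE-based allocator is presumably more sophisticated; all names here are hypothetical):

```python
def allocate_bitwidths(layers, target_bytes):
    """layers: dict name -> (num_params, sensitivity), where sensitivity
    approximates the quality cost of quantizing that layer aggressively.
    Returns a dict name -> bits chosen from {16, 8, 4}."""
    steps = {16: 8, 8: 4}            # allowed precision downgrades
    bits = {name: 16 for name in layers}

    def total_bytes():
        return sum(layers[n][0] * bits[n] // 8 for n in bits)

    while total_bytes() > target_bytes:
        # Layers that can still be downgraded.
        candidates = [n for n in bits if bits[n] in steps]
        if not candidates:
            break                    # budget unreachable even at 4-bit
        # Downgrade the layer whose quantization hurts quality least.
        victim = min(candidates, key=lambda n: layers[n][1])
        bits[victim] = steps[bits[victim]]
    return bits
```

A highly sensitive embedding layer stays at 16-bit while tolerant MLP layers drop to 4-bit, which is exactly the per-layer mix the dashboard automates.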
Custom

Custom Quantization

Fine-grained bitwidth control for each layer

  • INT4 / INT8 / INT16 selection
  • Individual Embedding, Attention, MLP settings
  • Batch apply by layer groups
[Custom Tab Screenshot]
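
Conceptually, a custom scheme is group-level defaults plus per-layer overrides. A minimal sketch with hypothetical key and layer names (not GenKit's actual schema):

```python
# Hypothetical custom-quantization spec: group defaults plus overrides.
custom_config = {
    "defaults": {"embedding": 8, "attention": 4, "mlp": 4},
    "overrides": {
        "model.layers.0.self_attn": 8,   # keep the first block more precise
        "model.layers.0.mlp": 8,
    },
}

def bits_for(layer_name, group, config):
    """Resolve the bitwidth for one layer: an override wins over the group default."""
    return config["overrides"].get(layer_name, config["defaults"][group])
```

Group defaults give the one-click batch apply; overrides give the fine-grained per-layer control.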
Analysis

Quantization Quality Analysis

Visualize model quality to derive optimal settings

  • GPTQ error visualization
  • SeqMSE heatmap
  • PRE vs AUTO comparison analysis
[Analysis Charts Screenshot]
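
The per-layer quantization error behind these charts is just the mean squared difference between original weights and their quantized-then-dequantized values. A minimal symmetric uniform "fake quant" sketch of how such an MSE is measured (illustrative, not the product's internals):

```python
def fake_quantize(values, bits):
    """Symmetric uniform quantization: round to the nearest representable
    level, then map back to floats so the error can be measured."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(v) for v in values) / qmax
    return [round(v / scale) * scale for v in values]

def quant_mse(values, bits):
    """Mean squared error introduced by quantizing to the given bitwidth."""
    quantized = fake_quantize(values, bits)
    return sum((v - q) ** 2 for v, q in zip(values, quantized)) / len(values)
```

Fewer bits means a coarser grid and a larger MSE, which is exactly the size-vs-quality trade-off the charts visualize.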

Mobile LLM in 4 Steps

Complex optimization process made simple

1

Select Model

Enter a HuggingFace model ID or local path

2

Configure Platform

Set target platform, device, and context length

3

Run Quantization

Optimize with Automatic or Custom mode

4

Deploy

Deploy DLC/mlpackage model to device

Proven by Real Performance

Verified optimization results across various models

Model           Original Size   Optimized Size   Reduction   Quality Retained
LLaMA 3.2 1B    2.4 GB          620 MB           74%         98.5%
EXAONE 4.0      5.8 GB          950 MB           84%         97.8%
Qwen 2.5 3B     6.2 GB          1.2 GB           81%         98.2%
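
The reductions above follow directly from bits-per-parameter arithmetic: a ~1.24B-parameter model at FP16 occupies about 2.48 GB, while pure 4-bit weights take about 620 MB. (Real pipelines also store scales and keep some layers at higher precision, so measured reductions like the 74% above differ slightly from the flat 75% of this back-of-the-envelope sketch.)

```python
def model_size_gb(num_params, bits_per_param):
    """Raw weight storage in gigabytes (1 GB = 1e9 bytes), ignoring
    quantization metadata such as per-group scales or lookup tables."""
    return num_params * bits_per_param / 8 / 1e9

fp16_gb = model_size_gb(1.24e9, 16)   # ~2.48 GB
int4_gb = model_size_gb(1.24e9, 4)    # ~0.62 GB
```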

Proven Technology Stack

Stable and scalable architecture

Frontend Layer

Dashboard

React + TypeScript

REST API

FastAPI

CLI Tool

Python

Quantization Engine

Qualcomm Pipeline

AIMET, QNN SDK, SeqMSE

Apple ANE Pipeline

CoreML, GPTQ, LUT Quantizer

Backend Layer

HuggingFace

Transformers

PyTorch

Model Loading

ONNX

Runtime

Get Started Today

Complex mobile LLM optimization made simple with GenKit

# Quick Start
git clone https://github.com/your-org/genkit
cd genkit && ./start_genkit.sh