Case Study Details

Global

Building an open-source AI tool that puts LLM power on any device

Industries:

AI, edge computing, open-source

Duration:

1 year

Team:

3 core contributors (2 ML engineers, 1 OSS architect)

Services:

CLI tool in Docker, edge-optimized ONNX output

Technologies:

GitHub Actions, LLMs, ONNX.js, Rust, Swift

Integrations:

Community-led, stewarded by our team

DevOps:

GitHub CI, Flake8, Pytest, Pre-commit

Project Overview

The WhiteLightning project started with a question: Does text classification need the cloud every time?

It began as an internal experiment during our ML hack days, aimed at exploring what could be done with less. LLMs are great, but for most real-world use cases, you don’t need a 175B-parameter model on standby. Instead, you need something fast, portable, and private. Something that works offline, ships inside your app, and doesn’t rack up API bills.

Instead of running LLMs at runtime, we use them once to generate synthetic training data, then distill that data into a compact, ONNX-based model that runs anywhere. No cloud, no lock-in, no friction. Just a simple way to go from idea to working classifier on your terms.
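
As a rough illustration of the distill step, here is a minimal Python sketch. It is not WhiteLightning's actual training code: the two hard-coded examples stand in for LLM-generated data, and scikit-learn plus skl2onnx are assumptions chosen for brevity, not project dependencies.

# Minimal sketch of "distill into ONNX": train a tiny text classifier on
# LLM-labeled data, then export it as a portable ONNX file.
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import LogisticRegression
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType

# Stand-ins for the synthetic labeled data a one-time LLM call would produce.
texts = ["great product, works offline", "terrible, keeps crashing"]
labels = [1, 0]  # 1 = positive, 0 = negative

# A stateless, fixed-size featurizer keeps the exported model self-contained.
vec = HashingVectorizer(n_features=2**12)
X = vec.transform(texts).toarray().astype("float32")

clf = LogisticRegression().fit(X, labels)

# Export to ONNX so the same model runs under any ONNX runtime.
onnx_model = convert_sklearn(
    clf, initial_types=[("features", FloatTensorType([None, X.shape[1]]))]
)
with open("classifier.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())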

95% cheaper

Uses LLMs once for data generation (~$0.01 vs $1–10 per query)

1 MB model size

Easily fits in mobile apps, kiosks, or embedded firmware

10–15 min training time

Generate a binary classifier on a laptop in minutes

2,520 texts/sec inference speed

That’s about 0.4 ms per input on commodity CPUs (see the timing sketch after this list)

512 kB RAM usage

Runs on low-power hardware like Raspberry Pi Zero

8 languages and runtimes supported

Identical logits across Python, Rust, Swift, and more

100% offline-ready

No cloud, no vendor lock-in, no latency risks

All-platform deployment

ONNX.js (web), iOS/Android (mobile), MCUs (embedded), laptops (desktop)
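
Throughput numbers like these are straightforward to sanity-check. The timing loop below is a hedged sketch using onnxruntime on CPU; it reuses the classifier.onnx file, the "features" input name, and the 4096-dimension feature size from the distillation sketch above, which are illustrative assumptions rather than WhiteLightning's real interface.

import time
import numpy as np
import onnxruntime as ort

# Time single-input inference on CPU for the model exported above.
sess = ort.InferenceSession("classifier.onnx", providers=["CPUExecutionProvider"])
x = np.random.rand(1, 4096).astype(np.float32)

n = 1000
start = time.perf_counter()
for _ in range(n):
    sess.run(None, {"features": x})
elapsed = time.perf_counter() - start
print(f"{elapsed / n * 1e3:.2f} ms per input, {n / elapsed:.0f} inputs/sec")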

Who it’s for

WhiteLightning is made for builders who don’t want to rent intelligence from the
cloud.

Indie developer

Add sentiment analysis to a desktop app without paying per query or relying on the cloud — it just works offline, out of the box.

Real-world use cases

WhiteLightning was developed to make intelligent text classification possible
anywhere, even in environments where cloud access is limited, restricted, or simply
not allowed.

Personal productivity and desktop apps

Smart quick-add (e.g., calendar vs. task vs. reminder)
Auto-tagging notes in Obsidian or Notion
Gmail-style inbox tabs, fully offline

Comms safety and moderation

Console games with offline parental controls
Secure chat platforms enforcing code of conduct
SMS spam filtering on Android ROMs

Healthcare and life sciences

Patient triage kiosks (e.g., refill request vs. symptoms)
Symptom classifiers for medical wearables
Transcription flaggers for allergy or dosage mentions

Customer support and compliance

On-prem ticket routing for banks and hospitals
VoIP transcription classifiers
Contact-center QA inside closed networks

IoT, automotive, and smart devices

Offline voice commands for home automation
In-car NLP for media or navigation
Industrial alarm logs classified by risk level

Developer and DevOps tools

GitHub bots tagging issue types
CI pipelines detecting secrets or tone
IDE extensions nudging for better commit messages

Education

Adaptive e-readers
Captioning systems detecting topic shifts

OEM/Embedded hardware

Router firmware with built-in parental filters
3D printer UI explaining G-code errors

Yes, it runs even on a potato

WhiteLightning was built to work even on extremely low-spec hardware. With models under 1 MB, no runtime dependencies, and ONNX compatibility, it runs smoothly on:

Raspberry Pi Zero
Old laptops
Budget Android phones
In-browser via ONNX.js
Microcontrollers with limited RAM

If it can run Python or Rust, it can run WhiteLightning.

Why WhiteLightning delivers

1. Pay once for LLM access (via OpenRouter or another API)

2. Generate synthetic labeled data from your task prompt (sketched after this list)

3. Distill into a 1 MB model that runs forever, for free, anywhere
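
For step 2, the one-time call might look like the sketch below. Only the OpenRouter endpoint is real; the model name, prompt, and expected JSON shape are illustrative stand-ins, not WhiteLightning's actual prompts.

import json
import requests

# One paid LLM call generates a batch of synthetic labeled examples.
resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": "Bearer <OPENROUTER_API_KEY>"},
    json={
        "model": "openai/gpt-4o-mini",  # any OpenRouter-served model works
        "messages": [{
            "role": "user",
            "content": (
                "Generate 50 short product reviews as a JSON array of "
                'objects like {"text": "...", "label": "positive"} or '
                '{"text": "...", "label": "negative"}. Return only JSON.'
            ),
        }],
    },
    timeout=120,
)
examples = json.loads(resp.json()["choices"][0]["message"]["content"])
print(f"Got {len(examples)} synthetic examples for distillation")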

OSS as a collaboration model

WhiteLightning is 100% open-source under GPL-3.0. The classifiers it generates are
MIT-licensed and yours to use in commercial apps.

Our team maintains it publicly:

GitHub issues, PRs, and test matrix are open

CI/CD runs on every PR via GitHub Actions

Dev chat happens on Discord

Docker image is published on GHCR (ghcr.io)

Contact us

Get in touch with us

Whether you need assistance or want to discuss our services,
feel free to get in touch with us.