Ollama setup
Lume uses Ollama for local AI inference. Everything runs on your machine — no API keys, no cloud, no data leaving your infrastructure.
Install Ollama
macOS / Linux:
curl -fsSL https://ollama.com/install.sh | sh
Windows:
Download the installer from ollama.com/download.
Pull a model
ollama pull qwen2.5
This downloads the qwen2.5 7B model (~4.7GB). See model recommendations for other options.
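To confirm the download finished, you can look for the model in the output of ollama list. The helper below is a minimal sketch: has_model is a hypothetical name, and it assumes the usual ollama list layout of one header row followed by rows whose first column is name:tag.

```shell
# Hypothetical helper: succeeds if the model named in $1 appears in
# `ollama list` output read from stdin. Assumes a header row, then rows
# whose first column is name:tag (for example qwen2.5:latest).
has_model() {
  awk -v m="$1" '
    NR > 1 {
      split($1, parts, ":")              # separate the name from the tag
      if (parts[1] == m || $1 == m) found = 1
    }
    END { exit !found }                  # exit 0 only when a row matched
  '
}
```

Usage: ollama list | has_model qwen2.5 && echo "qwen2.5 is installed"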
Start Ollama
ollama serve
# Listening on http://localhost:11434
Lume connects to http://localhost:11434 by default. The widget header shows a live connection status:
- 🟡 Yellow, pulsing: connecting
- 🟢 Green: connected
- 🔴 Red: Ollama not reachable. The tooltip shows "run: ollama serve"
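If the indicator stays red, the same reachability question can be asked from a terminal. This is a rough sketch, not Lume's actual check: ollama_status is a hypothetical helper, and it assumes that a successful response from Ollama's /api/version endpoint is a fair proxy for "connected".

```shell
# Hypothetical helper: prints "connected" if an Ollama server answers at the
# given base URL (default http://localhost:11434), otherwise "unreachable".
ollama_status() {
  base="${1:-http://localhost:11434}"
  # -f makes curl fail on HTTP errors; --max-time keeps the probe short.
  if curl -fsS --max-time 2 "$base/api/version" >/dev/null 2>&1; then
    echo "connected"
  else
    echo "unreachable"
  fi
}
```

Calling ollama_status with no argument probes the default address; pass a different base URL if Ollama listens elsewhere.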
If your app is served from an origin other than localhost, allow that origin with the OLLAMA_ORIGINS environment variable:
# allow all origins
OLLAMA_ORIGINS="*" ollama serve
# allow a specific domain
OLLAMA_ORIGINS="https://myapp.com" ollama serve
Verify it’s working
Run this to confirm Ollama is reachable and qwen2.5 is installed:
ollama run qwen2.5 "reply only with: ok"
If it replies ok, you’re good to go.
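If you set OLLAMA_ORIGINS for an app on another origin, you can also check cross-origin access by sending a request with an Origin header. The sketch below assumes Ollama answers an allowed origin with an Access-Control-Allow-Origin response header and omits it otherwise; has_cors_header and check_origin are hypothetical helper names.

```shell
# Hypothetical helper: succeeds if the HTTP response headers on stdin
# include an Access-Control-Allow-Origin header (case-insensitive).
has_cors_header() {
  grep -qi '^access-control-allow-origin:'
}

# Hypothetical helper: sends a request carrying the given Origin header and
# reports whether the response looks allowed. $1 is the origin to test,
# $2 is an optional Ollama base URL.
check_origin() {
  origin="$1"
  base="${2:-http://localhost:11434}"
  # -i includes the response headers in the output so they can be inspected.
  if curl -sS -i --max-time 2 -H "Origin: $origin" "$base/api/version" 2>/dev/null | has_cors_header; then
    echo "allowed"
  else
    echo "blocked or unreachable"
  fi
}
```

Usage: check_origin "https://myapp.com"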
Apple Silicon
On M1/M2/M3 Macs, Ollama automatically uses the GPU via Metal. No configuration needed.
A MacBook Air M2 with 16GB RAM runs qwen2.5 (7B) at ~40–60 tokens/sec — fast enough to feel instant in a chat widget.
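To measure throughput on your own hardware, ollama run qwen2.5 --verbose prints timing statistics after each reply, including an eval rate. The same figure can be derived from the eval_count and eval_duration fields that Ollama's generate API returns, where eval_duration is in nanoseconds. The helper name tokens_per_sec below is hypothetical, and the sketch only does the arithmetic.

```shell
# Hypothetical helper: tokens/sec = eval_count / eval_duration, with
# eval_duration given in nanoseconds. $1 is eval_count, $2 is eval_duration.
tokens_per_sec() {
  awk -v n="$1" -v ns="$2" 'BEGIN {
    if (ns <= 0) exit 1                  # reject a zero or negative duration
    printf "%.1f\n", n * 1e9 / ns
  }'
}
```

For example, 100 tokens generated in 2,000,000,000 ns (2 seconds) works out to 50.0 tokens/sec, which falls inside the range quoted above.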
- Model recommendations — compare available models
- Getting started — add the widget to your app