Ollama setup

Lume uses Ollama for local AI inference. Everything runs on your machine — no API keys, no cloud, no data leaving your infrastructure.

Install Ollama

macOS / Linux:

curl -fsSL https://ollama.com/install.sh | sh

Windows:

Download the installer from ollama.com/download.

Then pull a model:

ollama pull qwen2.5

This downloads the 7B qwen2.5 model (about 4.7 GB). See model recommendations for other options.

Start the Ollama server:

ollama serve
# Listening on http://localhost:11434

Lume connects to http://localhost:11434 by default. The widget header shows a live connection status.
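To check the same connection from a terminal, a minimal sketch along these lines may help. It assumes curl is installed and probes Ollama's /api/version endpoint. The names OLLAMA_URL and ollama_status are hypothetical, chosen for illustration, and are not part of Lume or Ollama.

```shell
#!/bin/sh
# Sketch: report whether an Ollama server answers at a given base URL.
# OLLAMA_URL and ollama_status are hypothetical names used for illustration.
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"

ollama_status() {
  # $1 = base URL. -f makes curl fail on HTTP errors; --max-time avoids hanging.
  if curl -fsS --max-time 3 "$1/api/version" >/dev/null 2>&1; then
    echo "connected"
  else
    echo "disconnected"
  fi
}

ollama_status "$OLLAMA_URL"
```

If this prints disconnected, make sure ollama serve is running and that nothing else is using port 11434.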

If your app is served from an origin other than localhost, allow that origin by setting the OLLAMA_ORIGINS environment variable when you start Ollama:

# allow all origins
OLLAMA_ORIGINS="*" ollama serve

# allow a specific domain
OLLAMA_ORIGINS="https://myapp.com" ollama serve
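One way to confirm the setting took effect is to send a request with an Origin header and look for Access-Control-Allow-Origin in the response headers. The sketch below assumes curl is available and that Ollama returns that header only for origins it accepts. The helpers allows_origin and check_origin are hypothetical, and https://myapp.com stands in for your own domain.

```shell
#!/bin/sh
# Sketch: ask Ollama whether it accepts requests from a given browser origin.
# allows_origin and check_origin are hypothetical helpers, not Ollama commands.

# Succeeds if the HTTP headers on stdin include Access-Control-Allow-Origin.
allows_origin() {
  grep -qi '^access-control-allow-origin:'
}

check_origin() {
  # $1 = origin to test, $2 = Ollama base URL (defaults to the address on this page)
  # -D - writes response headers to stdout; -o /dev/null discards the body.
  curl -s -o /dev/null -D - --max-time 3 -H "Origin: $1" \
    "${2:-http://localhost:11434}/api/tags" | allows_origin
}

if check_origin "https://myapp.com"; then
  echo "origin allowed"
else
  echo "origin not allowed (or Ollama is not running)"
fi
```

Allowing all origins with "*" is convenient during development, but a specific domain is the safer choice once the app is deployed.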

Run this to confirm Ollama is reachable and qwen2.5 is installed:

ollama run qwen2.5 "reply only with: ok"

If it replies ok, you’re good to go.
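The same check can be run over Ollama's HTTP API, which is presumably closer to how a browser widget talks to it. This is a rough sketch, assuming curl is installed and the server is at the default address. build_payload is a hypothetical helper and does no JSON escaping, so it only suits simple prompts.

```shell
#!/bin/sh
# Sketch: send one non-streaming prompt to Ollama's /api/generate endpoint.
# build_payload is a hypothetical helper used for illustration.
build_payload() {
  # $1 = model name, $2 = prompt (must not contain double quotes or backslashes)
  printf '{"model":"%s","prompt":"%s","stream":false}' "$1" "$2"
}

curl -s --max-time 60 http://localhost:11434/api/generate \
  -d "$(build_payload qwen2.5 'reply only with: ok')" \
  || echo "request failed: is ollama serve running?"
```

A successful call returns a JSON object whose response field holds the model's reply.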

On Apple Silicon Macs (M1 and later), Ollama automatically uses the GPU via Metal. No configuration needed.

A MacBook Air M2 with 16GB RAM runs qwen2.5 (7B) at ~40–60 tokens/sec — fast enough to feel instant in a chat widget.
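To measure throughput on your own hardware, a non-streaming /api/generate response includes eval_count (tokens generated) and eval_duration (in nanoseconds), and tokens/sec is the first divided by the second converted to seconds. The sketch below does only that arithmetic. tokens_per_sec is a hypothetical helper, and the numbers passed to it are illustrative, not measurements.

```shell
#!/bin/sh
# Sketch: compute tokens/sec from the eval_count and eval_duration fields
# of an Ollama /api/generate response. tokens_per_sec is a hypothetical helper.
tokens_per_sec() {
  # $1 = eval_count (tokens generated), $2 = eval_duration in nanoseconds
  awk -v n="$1" -v ns="$2" 'BEGIN { printf "%.1f\n", n / (ns / 1e9) }'
}

# Illustrative input: 120 tokens generated in 2.4 seconds.
tokens_per_sec 120 2400000000   # prints 50.0
```

Running ollama run qwen2.5 --verbose should also print timing statistics after each reply, including an eval rate, without any scripting.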