
Ollama setup

Lume uses Ollama for local AI inference. Everything runs on your machine — no API keys, no cloud, no data leaving your infrastructure.


macOS / Linux:

curl -fsSL https://ollama.com/install.sh | sh

Windows:

Download the installer from ollama.com/download.


ollama pull qwen2.5

This downloads the default qwen2.5 tag, which is the 7B model (about 4.7 GB). See model recommendations for other options.


ollama serve
# Listening on http://localhost:11434

Lume connects to http://localhost:11434 by default. The widget header shows a live connection status:

  • 🟡 Yellow, pulsing: connecting
  • 🟢 Green: connected
  • 🔴 Red: Ollama is not reachable. The tooltip tells you to run ollama serve
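
To check from a terminal what the status indicator is reporting, you can probe the same address yourself. The following is a minimal sketch, not Lume's actual implementation: ollama_status is a hypothetical helper name, and it assumes curl is installed and that Ollama's /api/version endpoint answers on the default port.

```shell
# Print "connected" if an Ollama server answers at the given base URL,
# otherwise "unreachable". ollama_status is a hypothetical helper name.
ollama_status() {
  base_url="${1:-http://localhost:11434}"
  # -f: treat HTTP errors as failures, -s: silent, --max-time: never hang
  if curl -fs --max-time 3 "$base_url/api/version" >/dev/null 2>&1; then
    echo "connected"
  else
    echo "unreachable"
  fi
}
```

Calling ollama_status with no argument checks http://localhost:11434. Pass another base URL as the first argument if Ollama runs elsewhere.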

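If your app is served from an origin other than localhost, Ollama rejects its browser requests by default. Allow the origin with the OLLAMA_ORIGINS environment variable when starting the server: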

# allow all origins
OLLAMA_ORIGINS="*" ollama serve
# allow a specific domain
OLLAMA_ORIGINS="https://myapp.com" ollama serve
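
To confirm that an origin is accepted, you can send a request with an Origin header and look at the response headers. This sketch assumes curl is installed and that Ollama follows standard CORS behavior, returning an Access-Control-Allow-Origin header for permitted origins. check_cors is a hypothetical helper name.

```shell
# Usage: check_cors ORIGIN [BASE_URL]
# Prints "allowed" if the response carries an Access-Control-Allow-Origin
# header, otherwise "not-allowed" (which also covers an unreachable server).
check_cors() {
  origin="$1"
  base_url="${2:-http://localhost:11434}"
  # -D - dumps response headers to stdout; -o /dev/null discards the body
  headers=$(curl -s --max-time 3 -o /dev/null -D - \
    -H "Origin: $origin" "$base_url/api/version" 2>/dev/null || true)
  if printf '%s\n' "$headers" | grep -qi '^access-control-allow-origin:'; then
    echo "allowed"
  else
    echo "not-allowed"
  fi
}
```

For example, check_cors "https://myapp.com" should print allowed once Ollama has been started with that origin in OLLAMA_ORIGINS.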

Run this to confirm Ollama is reachable and qwen2.5 is installed:

ollama run qwen2.5 "reply only with: ok"

If it replies ok, you’re good to go.
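
Since Lume connects to Ollama over HTTP, it can also be useful to run the same check against the HTTP API rather than the CLI. The sketch below uses Ollama's /api/generate endpoint with streaming disabled. generate_payload and ask_ollama are hypothetical helper names, and the payload builder does no JSON escaping, so it only suits simple prompts without quotes or backslashes.

```shell
# Build the JSON body for a non-streaming /api/generate request.
# Note: no JSON escaping is done; keep model and prompt free of quotes.
generate_payload() {
  printf '{"model":"%s","prompt":"%s","stream":false}' "$1" "$2"
}

# Usage: ask_ollama MODEL PROMPT [BASE_URL]
# Prints the raw JSON reply; the generated text is in its "response" field.
ask_ollama() {
  base_url="${3:-http://localhost:11434}"
  curl -fsS --max-time 120 "$base_url/api/generate" \
    -d "$(generate_payload "$1" "$2")"
}
```

With the server running, ask_ollama qwen2.5 "reply only with: ok" prints a JSON object whose response field holds the model's answer.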


On Apple Silicon Macs (M1 and later), Ollama automatically uses the GPU via Metal. No configuration is needed.

A MacBook Air M2 with 16GB RAM runs qwen2.5 (7B) at ~40–60 tokens/sec — fast enough to feel instant in a chat widget.