- Problem: Local-first multi-LLM consensus system using Ollama: parallel answers → anonymized peer review → chairman synthesis.
- Role: Systems engineering demo
- Timeframe: Prototype build
- Stack: Node.js • TypeScript • Ollama • Zod
- Focus: Local AI • Distributed Systems • Ollama
- Results: A local-first orchestration pattern that turns multi-LLM disagreement into a traceable, reviewable consensus flow without relying on cloud inference.
Problem
Single-model answers can be brittle, biased, or inconsistent. Teams struggle to compare models in a repeatable way, and it’s hard to explain why a final answer was chosen.
Local-first matters: sensitive data stays on the machine, inference works offline, costs stay predictable, and you avoid vendor lock-in.
- Local-only inference (Ollama) with no cloud calls
- Strict JSON/Zod contracts between services
- Observability + run history for reproducible demos
Context
Architecture at a glance — A client UI submits a prompt and policy to the orchestrator. The orchestrator fans out to Ollama peers, anonymizes answers, collects reviews, and passes a ranked signal to the chairman service for a final JSON response. Health/heartbeat metrics and run history persist locally.
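As a rough sketch of what one council member does in Stage 1, the snippet below calls Ollama's local /api/generate endpoint; the helper name, the MemberAnswer shape, and the idea that each member wraps exactly one such call are illustrative assumptions, not the project's actual interfaces.

```ts
// Sketch of one council member producing an independent answer (Stage 1).
// Assumes Ollama's default local endpoint on port 11434; the helper name and
// MemberAnswer shape are illustrative, not the project's API.
interface MemberAnswer {
  memberId: string;
  model: string;
  answer: string;
  latencyMs: number;
}

async function generateAnswer(
  memberId: string,
  model: string,
  prompt: string,
): Promise<MemberAnswer> {
  const started = Date.now();
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, prompt, stream: false }),
  });
  if (!res.ok) throw new Error(`Ollama returned ${res.status} for ${model}`);
  const data = (await res.json()) as { response: string };
  return { memberId, model, answer: data.response, latencyMs: Date.now() - started };
}
```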
- Quickstart (local)
npm install
npm --prefix orchestrator install
npm --prefix chairman install
npm --prefix ui install
cp .env.example .env
cp orchestrator/.env.example orchestrator/.env
cp chairman/.env.example chairman/.env
cp ui/.env.example ui/.env
./scripts/run_all_local.sh
open http://localhost:5173
Local multi-LLM orchestration with Ollama and strict JSON contracts
LLM Council runs fully on-device, coordinating multiple models with Zod-validated JSON between services.
This keeps inference private, offline-capable, and reproducible.
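A minimal sketch of what a Zod-validated contract between services could look like; the field names and payload shape are illustrative assumptions rather than the project's actual schemas.

```ts
import { z } from "zod";

// Illustrative inter-service contract: every payload is parsed at the
// boundary so schema drift fails fast instead of propagating downstream.
export const MemberAnswerSchema = z.object({
  memberId: z.string(),
  model: z.string(),
  answer: z.string().min(1),
  latencyMs: z.number().nonnegative(),
});

export const CouncilRoundSchema = z.object({
  runId: z.string().uuid(),
  prompt: z.string(),
  answers: z.array(MemberAnswerSchema),
});

export type CouncilRound = z.infer<typeof CouncilRoundSchema>;

// At a service boundary: reject malformed payloads before any stage runs.
export function parseCouncilRound(payload: unknown): CouncilRound {
  const result = CouncilRoundSchema.safeParse(payload);
  if (!result.success) {
    throw new Error(`Invalid council payload: ${result.error.message}`);
  }
  return result.data;
}
```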
Consensus pipeline: anonymized peer review and chairman synthesis
Responses are anonymized, reviewed, and ranked before the chairman produces the final structured answer.
This makes the selection process transparent and auditable.
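A sketch of how anonymized review and ranking might be wired; the alias labels, the shuffle step, and the mean-score policy are illustrative assumptions, since the project does not spell out its exact ranking rule.

```ts
// Stage 2 sketch: strip author identity before review, then rank by mean score.
// Alias labels and the averaging policy are illustrative assumptions.
interface AnonymizedAnswer {
  alias: string;   // e.g. "Answer A": hides which model produced it
  answer: string;
}

interface Review {
  alias: string;
  score: number;   // reviewer-assigned score for that alias
}

function anonymize(answers: { memberId: string; answer: string }[]): {
  anonymized: AnonymizedAnswer[];
  aliasToMember: Map<string, string>;
} {
  // Fisher-Yates shuffle so aliases do not leak submission order.
  const shuffled = [...answers];
  for (let i = shuffled.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [shuffled[i], shuffled[j]] = [shuffled[j], shuffled[i]];
  }
  const aliasToMember = new Map<string, string>();
  const anonymized = shuffled.map((a, i) => {
    const alias = `Answer ${String.fromCharCode(65 + i)}`; // A, B, C, ...
    aliasToMember.set(alias, a.memberId);
    return { alias, answer: a.answer };
  });
  return { anonymized, aliasToMember };
}

function rank(reviews: Review[]): { alias: string; meanScore: number }[] {
  const totals = new Map<string, { sum: number; n: number }>();
  for (const r of reviews) {
    const t = totals.get(r.alias) ?? { sum: 0, n: 0 };
    totals.set(r.alias, { sum: t.sum + r.score, n: t.n + 1 });
  }
  return [...totals.entries()]
    .map(([alias, t]) => ({ alias, meanScore: t.sum / t.n }))
    .sort((a, b) => b.meanScore - a.meanScore);
}
```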
Architecture
- Pipeline stages (Stage 1 → 3)
- Stage 1: council members generate independent answers.
- Stage 2: anonymized peer review ranks responses without identity bias.
- Stage 3: chairman synthesizes the final response with strict JSON output.
- Partial failure tolerance: missing peers can be skipped without blocking the run (see the sketch after this list).
- Service boundaries & contracts
- Orchestrator coordinates stages, anonymizes responses, and persists run metadata.
- Each service exchanges validated JSON payloads via Zod schemas to prevent drift.
- Run IDs and latency telemetry make outputs traceable for audits and demos.
- Observability & run history
- UI surfaces stage timing, peer health, and final synthesis details.
- SQLite stores run summaries, rankings, and exportable JSON artifacts.
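A rough sketch of how the fan-out, partial-failure tolerance, and run persistence could fit together, reusing the generateAnswer helper sketched earlier. Promise.allSettled gives the skip-on-failure behavior described above; better-sqlite3 is assumed as the SQLite driver and the runs table/columns are illustrative, since the project only states that SQLite stores run summaries.

```ts
import Database from "better-sqlite3"; // assumed driver; the project only states SQLite

// Orchestration sketch reusing generateAnswer and MemberAnswer from the Stage 1 snippet.
// Promise.allSettled lets failed peers be skipped without blocking the run;
// the runs table and its columns are illustrative, not the project's schema.
async function runCouncil(
  runId: string,
  prompt: string,
  members: { memberId: string; model: string }[],
  db: Database.Database,
) {
  const settled = await Promise.allSettled(
    members.map((m) => generateAnswer(m.memberId, m.model, prompt)),
  );

  const answers = settled
    .filter((s): s is PromiseFulfilledResult<MemberAnswer> => s.status === "fulfilled")
    .map((s) => s.value);
  const failedPeers = settled.length - answers.length;

  // Persist a run summary so demos can be replayed and audited later.
  db.prepare(
    `INSERT INTO runs (run_id, prompt, answer_count, failed_count, created_at)
     VALUES (?, ?, ?, ?, ?)`,
  ).run(runId, prompt, answers.length, failedPeers, new Date().toISOString());

  return { runId, answers, failedPeers };
}
```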
Security / Threat Model
- Local-only inference via Ollama; no cloud model calls or external data egress.
- Run history is stored in SQLite and can be disabled via environment settings for sensitive demos.
- No auth or rate limiting by default (appropriate for local demos; add gateway controls for shared LAN use).
Results
A local-first orchestration pattern that turns multi-LLM disagreement into a traceable, reviewable consensus flow without relying on cloud inference.
Stack
Node.js • TypeScript • Ollama • Zod
FAQ
Why local-first instead of a cloud LLM API?
Local inference keeps sensitive data on-device, works offline, and avoids variable per‑request costs or vendor lock‑in.
Does the orchestrator call LLMs directly?
No. The orchestrator coordinates requests and aggregates responses, while member services handle LLM calls.
What makes the results reproducible?
Strict JSON/Zod contracts plus SQLite run history make it easy to replay and audit outputs.
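As a rough illustration of that replay path (table and column names are assumptions), a stored artifact can be reloaded and re-validated against the same Zod contract sketched earlier:

```ts
// Replay sketch: load a persisted run artifact and re-validate it against the
// same boundary contract. Table and column names are illustrative assumptions.
function replayRun(db: Database.Database, runId: string): CouncilRound {
  const row = db
    .prepare("SELECT payload_json FROM run_artifacts WHERE run_id = ?")
    .get(runId) as { payload_json: string } | undefined;
  if (!row) throw new Error(`No stored artifact for run ${runId}`);
  return parseCouncilRound(JSON.parse(row.payload_json));
}
```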
