LLM Council (Local, Distributed)

Local-first multi-LLM consensus system using Ollama: parallel answers → anonymized peer review → chairman synthesis.

  • Local-only inference (Ollama) with no cloud calls
  • Strict JSON/Zod contracts between services
  • Observability + run history for reproducible demos

Teaser demo

What you’re seeing in 30 seconds: Stage 1 answers, Stage 2 anonymized reviews, Stage 3 synthesis, health + latency.

Screenshots:
  • Overview + health: UI overview with health and run controls
  • Stage 1 answers: candidate answers with latency and peer status
  • Stage 2 reviews: anonymized reviews and critiques
  • Aggregated ranking: ranking across anonymized peers
  • Stage 3 synthesis: chairman synthesis result
  • Run history + export: run history and export panel

Problem

Single-model answers can be brittle, biased, or inconsistent. Teams struggle to compare models in a repeatable way, and it’s hard to explain why a final answer was chosen.

Local-first matters: sensitive data stays on the machine, inference works offline, costs stay predictable, and you avoid vendor lock-in.

Solution overview

LLM Council separates roles into independent services: members generate answers, the orchestrator coordinates and anonymizes, the chairman synthesizes, and the UI visualizes each stage.

The orchestrator does not call LLMs directly; it only coordinates requests and aggregates responses from member services.
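
As a rough illustration of that split, the sketch below shows how the orchestrator might fan a prompt out to the member services on ports 8001–8003. The /answer endpoint name and payload shape are assumptions for illustration, not the project's actual contract:

// Hypothetical fan-out (TypeScript, Node 18+ fetch): the orchestrator forwards
// the prompt to every member service and keeps whichever answers come back.
// Promise.allSettled tolerates members that are down or time out.
const MEMBER_URLS = [
  "http://localhost:8001",
  "http://localhost:8002",
  "http://localhost:8003",
];

async function fanOut(prompt: string, policy: unknown) {
  const results = await Promise.allSettled(
    MEMBER_URLS.map(async (base) => {
      const res = await fetch(`${base}/answer`, {   // endpoint name is illustrative
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ prompt, policy }),
      });
      if (!res.ok) throw new Error(`member ${base} returned ${res.status}`);
      return res.json();
    }),
  );
  // Missing members are simply skipped; the run proceeds with what came back.
  return results
    .filter((r): r is PromiseFulfilledResult<unknown> => r.status === "fulfilled")
    .map((r) => r.value);
}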

Architecture

Architecture diagram: local-first multi-LLM consensus flow. Client UI (prompt + policy) → Orchestrator (Zod contracts) → Ollama peers (anonymized votes) → Chairman (synthesis + JSON), with SQLite run history, health/heartbeat LAN status, and local-only inference (no cloud).

Architecture at a glance — A client UI submits a prompt and policy to the orchestrator. The orchestrator fans out to Ollama peers, anonymizes answers, collects reviews, and passes a ranked signal to the chairman service for a final JSON response. Health/heartbeat metrics and run history persist locally.

Ports & services
  • UI: 5173 • Orchestrator API: 9000 • Chairman: 9100
  • Member services: 8001–8003 • Ollama: 11434
  • Traffic is local only; no external inference calls are made.
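
For quick orientation, the small sketch below pings each local service on these ports. The orchestrator's aggregated /health is described under Engineering decisions; the per-service /health paths and the use of Ollama's /api/tags as a liveness probe are assumptions:

// Hypothetical LAN status check: ping each local service on the ports above.
// Per-service /health paths are assumed; Ollama's model-list endpoint
// (/api/tags) is used here as a simple liveness probe.
const SERVICES: Record<string, string> = {
  orchestrator: "http://localhost:9000/health",
  chairman: "http://localhost:9100/health",
  member1: "http://localhost:8001/health",
  member2: "http://localhost:8002/health",
  member3: "http://localhost:8003/health",
  ollama: "http://localhost:11434/api/tags",
};

for (const [name, url] of Object.entries(SERVICES)) {
  fetch(url)
    .then((res) => console.log(`${name}: ${res.ok ? "up" : `HTTP ${res.status}`}`))
    .catch(() => console.log(`${name}: unreachable`));
}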

Pipeline walkthrough (Stages 1–3)

Stage 1 — Parallel answers
  • Input: prompt + policy from the UI.
  • Action: member services call local Ollama models to produce answers.
  • Output: N candidate responses with timing metadata.
  • Failure tolerance: missing members are skipped; the run still proceeds.
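
A minimal sketch of the Stage 1 member call against the local Ollama HTTP API; /api/generate is Ollama's standard non-streaming endpoint, while the model name and the shape of the timing metadata are illustrative:

// Stage 1 sketch: a member service asks its local Ollama model for an answer
// and records wall-clock latency. The model name is illustrative.
async function memberAnswer(prompt: string) {
  const started = Date.now();
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: "llama3", prompt, stream: false }),
  });
  const data = (await res.json()) as { response: string };
  return { answer: data.response, latencyMs: Date.now() - started };
}
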
Stage 2 — Anonymized peer review
  • Input: Stage 1 answers mapped to A/B/C IDs.
  • Action: members review anonymized peers and score them (prevents identity bias).
  • Output: rankings + critiques with no identity leakage.
  • Failure tolerance: requires at least two answers for meaningful ranking.
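
A sketch of the anonymization step, assuming answers arrive tagged with a member id; the letter-to-member mapping stays on the orchestrator so reviewers never see real identities (field names are illustrative):

// Stage 2 sketch: map member answers to anonymous letters before review.
// The orchestrator keeps the letter -> member mapping private, so critiques
// and rankings carry no identity information.
type Answer = { memberId: string; text: string };

function anonymize(answers: Answer[]) {
  const letters = ["A", "B", "C", "D", "E"];
  const key = new Map<string, string>();      // letter -> real member id (kept private)
  const anonymized = answers.map((a, i) => {
    key.set(letters[i], a.memberId);
    return { id: letters[i], text: a.text };  // only this is sent to reviewers
  });
  return { anonymized, key };
}
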
Stage 3 — Chairman synthesis
  • Input: top-ranked answers + reviews.
  • Action: chairman produces the final response with strict JSON schema.
  • Output: single synthesized answer + explanation metadata.
  • Failure tolerance: if reviews are incomplete, synthesis still runs on available signals.
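
The sketch below shows the kind of Zod contract enforced at this boundary; the field names are illustrative rather than the project's actual schema:

import { z } from "zod";

// Illustrative chairman output contract: malformed model output fails fast at
// the service boundary instead of leaking into the UI or run history.
const ChairmanResponse = z
  .object({
    finalAnswer: z.string().min(1),
    rationale: z.string(),
    sources: z.array(z.string()).default([]),
  })
  .strict();

type ChairmanResponse = z.infer<typeof ChairmanResponse>;

function parseChairmanOutput(raw: string): ChairmanResponse {
  return ChairmanResponse.parse(JSON.parse(raw)); // throws on schema violations
}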

Engineering decisions

  • Strict JSON schemas (Zod) at every boundary to reject malformed outputs.
  • Borda-style aggregation to convert peer rankings into a stable ordering (see the sketch after this list).
  • Heartbeat monitoring and aggregated /health for quick node visibility.
  • SQLite persistence with WAL for reproducible demos and fast local writes.
  • Partial failure tolerance: Stage 2 needs at least two answers; Stage 1 can run with fewer peers.
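
A minimal sketch of the Borda-style aggregation, assuming each reviewer submits an ordered list of anonymous answer ids (the project's exact scoring and tie-breaking may differ):

// Borda-style aggregation sketch: a first-place vote among n candidates is
// worth n-1 points, second place n-2, and so on. Totals are sorted descending;
// ties break alphabetically so the ordering is stable across runs.
function bordaAggregate(rankings: string[][]): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    const n = ranking.length;
    ranking.forEach((id, position) => {
      scores.set(id, (scores.get(id) ?? 0) + (n - 1 - position));
    });
  }
  return [...scores.entries()]
    .sort(([idA, a], [idB, b]) => b - a || idA.localeCompare(idB))
    .map(([id]) => id);
}

// Example: bordaAggregate([["A", "B", "C"], ["B", "A", "C"]]) gives A and B
// 3 points each and C 0, so the tie breaks alphabetically: ["A", "B", "C"].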

Reproducibility

npm install
npm --prefix orchestrator install
npm --prefix chairman install
npm --prefix ui install

cp .env.example .env
cp orchestrator/.env.example orchestrator/.env
cp chairman/.env.example chairman/.env
cp ui/.env.example ui/.env

./scripts/run_all_local.sh
open http://localhost:5173

Tip: click Run Pipeline, then open the Stage 1–3 tabs.

Security & privacy

  • Local-only inference via Ollama; no cloud model calls or external data egress.
  • Run history is stored in SQLite and can be disabled via environment settings for sensitive demos.
  • No auth or rate limiting by default (appropriate for local demos; add gateway controls for shared LAN use).

Limitations & roadmap

Limitations
  • No auth or rate limiting by default (demo scope).
  • No distributed persistence; all state is local.
  • Output quality depends on local models and hardware.
Roadmap
  • Retry/backoff policies for unstable local models.
  • Auth + role-based access for shared LAN demos.
  • Consolidated inference client to standardize model config.
  • Expanded test coverage for stage transitions and schema failures.
  • Optional OpenTelemetry tracing for multi-node debugging.

What this demonstrates

Capability                 | Evidence in project              | Where to look
Distributed orchestration  | Stage coordination + fan-out     | orchestrator/src/server.ts
Strict contracts           | Zod schemas + JSON validation    | orchestrator/src/contracts/*
Observability              | Heartbeat + /health aggregation  | orchestrator/src/health.ts
UI visualization           | Stage views + run history        | ui/