LLM Council (Local, Distributed)

Local-first multi-LLM consensus system using Ollama: parallel answers → anonymized peer review → chairman synthesis.

  • Local-only inference (Ollama) with no cloud calls
  • Strict JSON/Zod contracts between services
  • Observability + run history for reproducible demos

Teaser demo

What you’re seeing in 30 seconds: Stage 1 answers, Stage 2 anonymized reviews, Stage 3 synthesis, health + latency.

Screenshots:
  • Overview + health: UI overview with health and run controls
  • Stage 1 answers: candidate answers with latency and peer status
  • Stage 2 reviews: anonymized reviews and critiques
  • Aggregated ranking: ranking across anonymized peers
  • Stage 3 synthesis: chairman synthesis result
  • Run history + export: run history and export panel

Problem

Single-model answers can be brittle, biased, or inconsistent. Teams struggle to compare models in a repeatable way, and it’s hard to explain why a final answer was chosen.

Local-first matters: sensitive data stays on the machine, inference works offline, costs stay predictable, and you avoid vendor lock-in.

Solution overview

LLM Council separates roles into independent services: members generate answers, the orchestrator coordinates and anonymizes, the chairman synthesizes, and the UI visualizes each stage.

The orchestrator does not call LLMs directly; it only coordinates requests and aggregates responses from member services.
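
As a rough illustration of that split, the sketch below shows how the orchestrator might fan a prompt out to the member services on ports 8001–8003. The /answer endpoint name and payload shape are assumptions for illustration, not the project's actual contract:

// Hypothetical fan-out (TypeScript, Node 18+ fetch): the orchestrator forwards
// the prompt to every member service and keeps whichever answers come back.
// Promise.allSettled tolerates members that are down or time out.
const MEMBER_URLS = [
  "http://localhost:8001",
  "http://localhost:8002",
  "http://localhost:8003",
];

async function fanOut(prompt: string, policy: unknown) {
  const results = await Promise.allSettled(
    MEMBER_URLS.map(async (base) => {
      const res = await fetch(`${base}/answer`, {   // endpoint name is illustrative
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ prompt, policy }),
      });
      if (!res.ok) throw new Error(`member ${base} returned ${res.status}`);
      return res.json();
    }),
  );
  // Missing members are simply skipped; the run proceeds with what came back.
  return results
    .filter((r): r is PromiseFulfilledResult<unknown> => r.status === "fulfilled")
    .map((r) => r.value);
}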

Architecture

Architecture diagram: local-first multi-LLM consensus flow. Client UI (prompt + policy) → Orchestrator (Zod contracts) → Ollama peers (anonymized votes) → Chairman (synthesis + JSON), with SQLite run history, health/heartbeat LAN status, and local-only inference (no cloud).

Architecture at a glance — A client UI submits a prompt and policy to the orchestrator. The orchestrator fans out to Ollama peers, anonymizes answers, collects reviews, and passes a ranked signal to the chairman service for a final JSON response. Health/heartbeat metrics and run history persist locally.

Ports & services
  • UI: 5173 • Orchestrator API: 9000 • Chairman: 9100
  • Member services: 8001–8003 • Ollama: 11434
  • Traffic is local only; no external inference calls are made.
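
For quick orientation, the small sketch below pings each local service on these ports. The orchestrator's aggregated /health is described under Engineering decisions; the per-service /health paths and the use of Ollama's /api/tags as a liveness probe are assumptions:

// Hypothetical LAN status check: ping each local service on the ports above.
// Per-service /health paths are assumed; Ollama's model-list endpoint
// (/api/tags) is used here as a simple liveness probe.
const SERVICES: Record<string, string> = {
  orchestrator: "http://localhost:9000/health",
  chairman: "http://localhost:9100/health",
  member1: "http://localhost:8001/health",
  member2: "http://localhost:8002/health",
  member3: "http://localhost:8003/health",
  ollama: "http://localhost:11434/api/tags",
};

for (const [name, url] of Object.entries(SERVICES)) {
  fetch(url)
    .then((res) => console.log(`${name}: ${res.ok ? "up" : `HTTP ${res.status}`}`))
    .catch(() => console.log(`${name}: unreachable`));
}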

Pipeline walkthrough (Stages 1–3)

Stage 1 — Parallel answers
  • Input: prompt + policy from the UI.
  • Action: member services call local Ollama models to produce answers.
  • Output: N candidate responses with timing metadata.
  • Failure tolerance: missing members are skipped; the run still proceeds.
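
A minimal sketch of the Stage 1 member call against the local Ollama HTTP API; /api/generate is Ollama's standard non-streaming endpoint, while the model name and the shape of the timing metadata are illustrative:

// Stage 1 sketch: a member service asks its local Ollama model for an answer
// and records wall-clock latency. The model name is illustrative.
async function memberAnswer(prompt: string) {
  const started = Date.now();
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: "llama3", prompt, stream: false }),
  });
  const data = (await res.json()) as { response: string };
  return { answer: data.response, latencyMs: Date.now() - started };
}
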
Stage 2 — Anonymized peer review
  • Input: Stage 1 answers mapped to A/B/C IDs.
  • Action: members review anonymized peers and score them (prevents identity bias).
  • Output: rankings + critiques with no identity leakage.
  • Failure tolerance: requires at least two answers for meaningful ranking.
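
A sketch of the anonymization step, assuming answers arrive tagged with a member id; the letter-to-member mapping stays on the orchestrator so reviewers never see real identities (field names are illustrative):

// Stage 2 sketch: map member answers to anonymous letters before review.
// The orchestrator keeps the letter -> member mapping private, so critiques
// and rankings carry no identity information.
type Answer = { memberId: string; text: string };

function anonymize(answers: Answer[]) {
  const letters = ["A", "B", "C", "D", "E"];
  const key = new Map<string, string>();      // letter -> real member id (kept private)
  const anonymized = answers.map((a, i) => {
    key.set(letters[i], a.memberId);
    return { id: letters[i], text: a.text };  // only this is sent to reviewers
  });
  return { anonymized, key };
}
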
Stage 3 — Chairman synthesis
  • Input: top-ranked answers + reviews.
  • Action: chairman produces the final response with strict JSON schema.
  • Output: single synthesized answer + explanation metadata.
  • Failure tolerance: if reviews are incomplete, synthesis still runs on available signals.
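
The sketch below shows the kind of Zod contract enforced at this boundary; the field names are illustrative rather than the project's actual schema:

import { z } from "zod";

// Illustrative chairman output contract: malformed model output fails fast at
// the service boundary instead of leaking into the UI or run history.
const ChairmanResponse = z
  .object({
    finalAnswer: z.string().min(1),
    rationale: z.string(),
    sources: z.array(z.string()).default([]),
  })
  .strict();

type ChairmanResponse = z.infer<typeof ChairmanResponse>;

function parseChairmanOutput(raw: string): ChairmanResponse {
  return ChairmanResponse.parse(JSON.parse(raw)); // throws on schema violations
}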

Engineering decisions

  • Strict JSON schemas (Zod) at every boundary to reject malformed outputs.
  • Borda-style aggregation to convert peer rankings into a stable ordering (see the sketch after this list).
  • Heartbeat monitoring and aggregated /health for quick node visibility.
  • SQLite persistence with WAL for reproducible demos and fast local writes.
  • Partial failure tolerance: Stage 2 needs at least two answers; Stage 1 can run with fewer peers.
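
A minimal sketch of the Borda-style aggregation, assuming each reviewer submits an ordered list of anonymous answer ids (the project's exact scoring and tie-breaking may differ):

// Borda-style aggregation sketch: a first-place vote among n candidates is
// worth n-1 points, second place n-2, and so on. Totals are sorted descending;
// ties break alphabetically so the ordering is stable across runs.
function bordaAggregate(rankings: string[][]): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    const n = ranking.length;
    ranking.forEach((id, position) => {
      scores.set(id, (scores.get(id) ?? 0) + (n - 1 - position));
    });
  }
  return [...scores.entries()]
    .sort(([idA, a], [idB, b]) => b - a || idA.localeCompare(idB))
    .map(([id]) => id);
}

// Example: bordaAggregate([["A", "B", "C"], ["B", "A", "C"]]) gives A and B
// 3 points each and C 0, so the tie breaks alphabetically: ["A", "B", "C"].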

Reproducibility

npm install
npm --prefix orchestrator install
npm --prefix chairman install
npm --prefix ui install

cp .env.example .env
cp orchestrator/.env.example orchestrator/.env
cp chairman/.env.example chairman/.env
cp ui/.env.example ui/.env

./scripts/run_all_local.sh
open http://localhost:5173

Tip: click Run Pipeline, then open the Stage 1–3 tabs.

Security & privacy

  • Local-only inference via Ollama; no cloud model calls or external data egress.
  • Run history is stored in SQLite and can be disabled via environment settings for sensitive demos.
  • No auth or rate limiting by default (appropriate for local demos; add gateway controls for shared LAN use).

Limitations & roadmap

Limitations
  • No auth or rate limiting by default (demo scope).
  • No distributed persistence; all state is local.
  • Output quality depends on local models and hardware.
Roadmap
  • Retry/backoff policies for unstable local models.
  • Auth + role-based access for shared LAN demos.
  • Consolidated inference client to standardize model config.
  • Expanded test coverage for stage transitions and schema failures.
  • Optional OpenTelemetry tracing for multi-node debugging.

What this demonstrates

Capability                 | Evidence in project              | Where to look
Distributed orchestration  | Stage coordination + fan-out     | orchestrator/src/server.ts
Strict contracts           | Zod schemas + JSON validation    | orchestrator/src/contracts/*
Observability              | Heartbeat + /health aggregation  | orchestrator/src/health.ts
UI visualization           | Stage views + run history        | ui/