- Problem: Local-first multi-LLM consensus system using Ollama: parallel answers → anonymized peer review → chairman synthesis.
- Role: Systems engineering demo
- Timeframe: Prototype build
- Stack: Node.js • TypeScript • Ollama • Zod
- Focus: Local AI • Distributed Systems • Ollama
- Results: A local-first orchestration pattern that turns multi-LLM disagreement into a traceable, reviewable consensus flow without relying on cloud inference.
Problem
Single-model answers can be brittle, biased, or inconsistent. Teams struggle to compare models in a repeatable way, and it’s hard to explain why a final answer was chosen.
Local-first matters: sensitive data stays on the machine, inference works offline, costs stay predictable, and you avoid vendor lock-in.
- Local-only inference (Ollama) with no cloud calls
- Strict JSON/Zod contracts between services
- Observability + run history for reproducible demos
Context
Architecture at a glance — A client UI submits a prompt and policy to the orchestrator. The orchestrator fans out to Ollama peers, anonymizes answers, collects reviews, and passes a ranked signal to the chairman service for a final JSON response. Health/heartbeat metrics and run history persist locally.
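As a rough sketch of what one council member does in Stage 1, the snippet below calls Ollama's local /api/generate endpoint; the helper name, the MemberAnswer shape, and the idea that each member wraps exactly one such call are illustrative assumptions, not the project's actual interfaces.

```ts
// Sketch of one council member producing an independent answer (Stage 1).
// Assumes Ollama's default local endpoint on port 11434; the helper name and
// MemberAnswer shape are illustrative, not the project's API.
interface MemberAnswer {
  memberId: string;
  model: string;
  answer: string;
  latencyMs: number;
}

async function generateAnswer(
  memberId: string,
  model: string,
  prompt: string,
): Promise<MemberAnswer> {
  const started = Date.now();
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, prompt, stream: false }),
  });
  if (!res.ok) throw new Error(`Ollama returned ${res.status} for ${model}`);
  const data = (await res.json()) as { response: string };
  return { memberId, model, answer: data.response, latencyMs: Date.now() - started };
}
```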
- Quickstart (local)
npm install
npm --prefix orchestrator install
npm --prefix chairman install
npm --prefix ui install
cp .env.example .env
cp orchestrator/.env.example orchestrator/.env
cp chairman/.env.example chairman/.env
cp ui/.env.example ui/.env
./scripts/run_all_local.sh
open http://localhost:5173
Local multi-LLM orchestration with Ollama and strict JSON contracts
LLM Council runs fully on-device, coordinating multiple models with Zod-validated JSON between services.
This keeps inference private, offline-capable, and reproducible.
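A minimal sketch of what a Zod-validated contract between services could look like; the field names and payload shape are illustrative assumptions rather than the project's actual schemas.

```ts
import { z } from "zod";

// Illustrative inter-service contract: every payload is parsed at the
// boundary so schema drift fails fast instead of propagating downstream.
export const MemberAnswerSchema = z.object({
  memberId: z.string(),
  model: z.string(),
  answer: z.string().min(1),
  latencyMs: z.number().nonnegative(),
});

export const CouncilRoundSchema = z.object({
  runId: z.string().uuid(),
  prompt: z.string(),
  answers: z.array(MemberAnswerSchema),
});

export type CouncilRound = z.infer<typeof CouncilRoundSchema>;

// At a service boundary: reject malformed payloads before any stage runs.
export function parseCouncilRound(payload: unknown): CouncilRound {
  const result = CouncilRoundSchema.safeParse(payload);
  if (!result.success) {
    throw new Error(`Invalid council payload: ${result.error.message}`);
  }
  return result.data;
}
```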
Consensus pipeline: anonymized peer review and chairman synthesis
Responses are anonymized, reviewed, and ranked before the chairman produces the final structured answer.
This makes the selection process transparent and auditable.
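A sketch of how anonymized review and ranking might be wired; the alias labels, the shuffle step, and the mean-score policy are illustrative assumptions, since the project does not spell out its exact ranking rule.

```ts
// Stage 2 sketch: strip author identity before review, then rank by mean score.
// Alias labels and the averaging policy are illustrative assumptions.
interface AnonymizedAnswer {
  alias: string;   // e.g. "Answer A": hides which model produced it
  answer: string;
}

interface Review {
  alias: string;
  score: number;   // reviewer-assigned score for that alias
}

function anonymize(answers: { memberId: string; answer: string }[]): {
  anonymized: AnonymizedAnswer[];
  aliasToMember: Map<string, string>;
} {
  // Fisher-Yates shuffle so aliases do not leak submission order.
  const shuffled = [...answers];
  for (let i = shuffled.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [shuffled[i], shuffled[j]] = [shuffled[j], shuffled[i]];
  }
  const aliasToMember = new Map<string, string>();
  const anonymized = shuffled.map((a, i) => {
    const alias = `Answer ${String.fromCharCode(65 + i)}`; // A, B, C, ...
    aliasToMember.set(alias, a.memberId);
    return { alias, answer: a.answer };
  });
  return { anonymized, aliasToMember };
}

function rank(reviews: Review[]): { alias: string; meanScore: number }[] {
  const totals = new Map<string, { sum: number; n: number }>();
  for (const r of reviews) {
    const t = totals.get(r.alias) ?? { sum: 0, n: 0 };
    totals.set(r.alias, { sum: t.sum + r.score, n: t.n + 1 });
  }
  return [...totals.entries()]
    .map(([alias, t]) => ({ alias, meanScore: t.sum / t.n }))
    .sort((a, b) => b.meanScore - a.meanScore);
}
```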
Architecture
- Pipeline stages (Stage 1 → 3)
- Stage 1: council members generate independent answers.
- Stage 2: anonymized peer review ranks responses without identity bias.
- Stage 3: chairman synthesizes the final response with strict JSON output.
- Partial failure tolerance: missing peers can be skipped without blocking the run (see the sketch after this list).
- Service boundaries & contracts
- Orchestrator coordinates stages, anonymizes responses, and persists run metadata.
- Each service exchanges validated JSON payloads via Zod schemas to prevent drift.
- Run IDs and latency telemetry make outputs traceable for audits and demos.
- Observability & run history
- UI surfaces stage timing, peer health, and final synthesis details.
- SQLite stores run summaries, rankings, and exportable JSON artifacts.
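A rough sketch of how the fan-out, partial-failure tolerance, and run persistence could fit together, reusing the generateAnswer helper sketched earlier. Promise.allSettled gives the skip-on-failure behavior described above; better-sqlite3 is assumed as the SQLite driver and the runs table/columns are illustrative, since the project only states that SQLite stores run summaries.

```ts
import Database from "better-sqlite3"; // assumed driver; the project only states SQLite

// Orchestration sketch reusing generateAnswer and MemberAnswer from the Stage 1 snippet.
// Promise.allSettled lets failed peers be skipped without blocking the run;
// the runs table and its columns are illustrative, not the project's schema.
async function runCouncil(
  runId: string,
  prompt: string,
  members: { memberId: string; model: string }[],
  db: Database.Database,
) {
  const settled = await Promise.allSettled(
    members.map((m) => generateAnswer(m.memberId, m.model, prompt)),
  );

  const answers = settled
    .filter((s): s is PromiseFulfilledResult<MemberAnswer> => s.status === "fulfilled")
    .map((s) => s.value);
  const failedPeers = settled.length - answers.length;

  // Persist a run summary so demos can be replayed and audited later.
  db.prepare(
    `INSERT INTO runs (run_id, prompt, answer_count, failed_count, created_at)
     VALUES (?, ?, ?, ?, ?)`,
  ).run(runId, prompt, answers.length, failedPeers, new Date().toISOString());

  return { runId, answers, failedPeers };
}
```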
Security / Threat Model
- Local-only inference via Ollama; no cloud model calls or external data egress.
- Run history is stored in SQLite and can be disabled via environment settings for sensitive demos.
- No auth or rate limiting by default (appropriate for local demos; add gateway controls for shared LAN use).
Results
A local-first orchestration pattern that turns multi-LLM disagreement into a traceable, reviewable consensus flow without relying on cloud inference.
Stack
Node.js • TypeScript • Ollama • Zod
FAQ
Why local-first instead of a cloud LLM API?
Local inference keeps sensitive data on-device, works offline, and avoids variable per‑request costs or vendor lock‑in.
Does the orchestrator call LLMs directly?
No. The orchestrator coordinates requests and aggregates responses, while member services handle LLM calls.
What makes the results reproducible?
Strict JSON/Zod contracts plus SQLite run history make it easy to replay and audit outputs.
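As a rough illustration of that replay path (table and column names are assumptions), a stored artifact can be reloaded and re-validated against the same Zod contract sketched earlier:

```ts
// Replay sketch: load a persisted run artifact and re-validate it against the
// same boundary contract. Table and column names are illustrative assumptions.
function replayRun(db: Database.Database, runId: string): CouncilRound {
  const row = db
    .prepare("SELECT payload_json FROM run_artifacts WHERE run_id = ?")
    .get(runId) as { payload_json: string } | undefined;
  if (!row) throw new Error(`No stored artifact for run ${runId}`);
  return parseCouncilRound(JSON.parse(row.payload_json));
}
```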
