Video Intelligence

Thales Video Intelligence Indexing Platform

A full-stack, production-style system that turns raw video into searchable, analyst-ready structured data using CV + NLP + async pipelines.

Full-stack video intelligence system: frames + entities + timelines + transcripts + search.

  • Upload video or URL → async pipeline → shareable read-only report
  • Timeline + semantic search UX for fast triage
  • Deterministic JSON/PDF/CSV artifacts for auditability
  • Local-first, offline-capable processing on CPU

Local demo walkthrough

What you’ll see in 60 seconds: upload → progress → report → timeline click → search → export → share link.

  • Overview: global search, upload actions, and system stats
  • Upload: file and optional URL inputs
  • Library: processing and completed statuses
  • Report + exports: media preview and export actions
  • Frame evidence: frame gallery with detection overlays and pagination
  • Timeline: consolidated view of detected entities
  • Entity ranges: time ranges with confidence and sources
  • Transcript: summary with audio analysis and full text
  • Search: unified entity search with filters and semantic matching
No hosted cloud demo: heavy compute + large models + hosting cost + privacy make a public instance impractical. Evaluate locally.

Problem

Before: analysts scrub videos manually, pause/play repeatedly, and take notes by hand. There is no structured output, no timeline correlation, and no searchable archive.

After: a single upload yields entities, timestamps, frames, transcripts, a searchable index, exportable reports, and shareable read-only links.

Success criteria: search across videos, jump directly to moments of interest, and export analyst-ready artifacts with explainable evidence.

Solution overview

Frontend (React) for upload and review, FastAPI for orchestration, Celery/Redis for heavy processing, and SQLite + filesystem for persistence.

Compute is asynchronous by design: long-running jobs are queued, progress is tracked, and results are stored as artifacts for replay and export.
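
This pattern can be sketched as a Celery task that runs the stages in order and publishes coarse progress as task state; the task name, stage weights, and metadata fields below are illustrative assumptions, not the project's actual definitions.

from celery import Celery

# Broker and result backend on the local Redis instance from the architecture diagram.
celery_app = Celery("pipeline",
                    broker="redis://localhost:6379/0",
                    backend="redis://localhost:6379/0")

@celery_app.task(bind=True)
def process_video(self, video_id: str, video_path: str) -> dict:
    """Run the pipeline stages in order and publish coarse progress as task state."""
    stages = [
        ("extracting_frames", 20),
        ("transcribing_audio", 20),
        ("detecting_entities", 80),
        ("aggregating_report", 95),
        ("indexing_search", 100),
    ]
    for stage, progress in stages:
        # ... run the real stage here and persist artifacts under data/entity_indexing/ ...
        self.update_state(state="PROGRESS",
                          meta={"stage": stage, "progress": progress})
    return {"video_id": video_id, "status": "completed"}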

The system is built end-to-end by one engineer to simulate production-style video intelligence without paid cloud services.

Architecture

graph LR
  UI[React UI :5173] --> API[FastAPI :8010]
  API --> REDIS[(Redis :6379)]
  API --> DB[(SQLite)]
  API --> FS[data/entity_indexing]
  WORKER[Celery worker] --> REDIS
  WORKER --> FS
  WORKER --> DB
  API --> SHARE[Read-only share link]

  FS --> FRAMES[frames/]
  FS --> REPORTS[reports/]
  FS --> TRANSCRIPTS[transcripts/]
  FS --> INDEX[index.db]

  • Frontend (React/Vite): 5173
  • Backend (FastAPI): 8010 (or 8000 depending on env)
  • Redis: 6379
  • Celery worker: async pipeline
  • Data paths: data/entity_indexing/{frames,reports,transcripts,index.db}
  • Share links are read-only, tokenized URLs stored in SQLite.
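
A minimal sketch of that read-only share flow, assuming hypothetical routes, a share_links table, and the index.db location (the actual endpoints live in backend/main.py):

import secrets
import sqlite3

from fastapi import FastAPI, HTTPException

app = FastAPI()
DB_PATH = "data/entity_indexing/index.db"  # assumed location; the real app may use a separate SQLite file

@app.post("/videos/{video_id}/share")
def create_share_link(video_id: str) -> dict:
    # Generate an unguessable token and persist the mapping; the token grants read access only.
    token = secrets.token_urlsafe(32)
    with sqlite3.connect(DB_PATH) as conn:
        conn.execute("CREATE TABLE IF NOT EXISTS share_links (token TEXT PRIMARY KEY, video_id TEXT)")
        conn.execute("INSERT INTO share_links (token, video_id) VALUES (?, ?)", (token, video_id))
    return {"share_url": f"/share/{token}"}

@app.get("/share/{token}")
def read_shared_report(token: str) -> dict:
    with sqlite3.connect(DB_PATH) as conn:
        row = conn.execute("SELECT video_id FROM share_links WHERE token = ?", (token,)).fetchone()
    if row is None:
        raise HTTPException(status_code=404, detail="Unknown or revoked share token")
    # Serve the stored report artifact for this video; no mutating routes accept the token.
    return {"video_id": row[0], "read_only": True}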

Pipeline walkthrough (stages + progress)

extracting_frames (0–20)
  • Input: video file or URL.
  • Output: sampled frames + metadata.
  • Failure cases: unsupported codec, missing file, ffmpeg errors.
  • Persisted: frames/ + job metadata.
transcribing_audio (20)
  • Input: extracted audio track.
  • Output: transcript + timestamps.
  • Failure cases: no audio, low quality; pipeline continues with empty transcript.
  • Persisted: transcripts/ + JSON.
detecting_entities (20–80)
  • Input: frames + model configs.
  • Output: detected entities + confidence scores + OCR text.
  • Failure cases: model load errors; pipeline records partial results.
  • Persisted: detection JSON + frame-level indices.
aggregating_report (80–95)
  • Input: detections + transcript.
  • Output: time ranges, counts, evidence frames, summaries (time-range aggregation sketched after this list).
  • Failure cases: partial stage data; reports still generated with warnings.
  • Persisted: JSON/PDF/CSV reports.
indexing_search (95–100)
  • Input: report entities + transcript segments.
  • Output: exact + semantic index for fast search.
  • Failure cases: embedding model errors; falls back to exact search only.
  • Persisted: index.db + metadata.
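
As a concrete illustration of the aggregating_report stage, the sketch below merges per-frame detections of the same label into contiguous time ranges with an averaged confidence; the record fields and the 2-second gap threshold are illustrative assumptions rather than the project's actual schema.

from collections import defaultdict

def aggregate_time_ranges(detections: list[dict], max_gap_s: float = 2.0) -> list[dict]:
    """detections: [{"label": "truck", "timestamp": 12.5, "confidence": 0.81}, ...]"""
    if not detections:
        return []
    by_label = defaultdict(list)
    for det in detections:
        by_label[det["label"]].append(det)

    ranges = []
    for label, dets in by_label.items():
        dets.sort(key=lambda d: d["timestamp"])
        start = end = dets[0]["timestamp"]
        confs = [dets[0]["confidence"]]
        for det in dets[1:]:
            if det["timestamp"] - end <= max_gap_s:
                # Close enough to the previous sighting: extend the current range.
                end = det["timestamp"]
                confs.append(det["confidence"])
            else:
                # Gap too large: close the current range and start a new one.
                ranges.append({"label": label, "start": start, "end": end,
                               "confidence": sum(confs) / len(confs)})
                start = end = det["timestamp"]
                confs = [det["confidence"]]
        ranges.append({"label": label, "start": start, "end": end,
                       "confidence": sum(confs) / len(confs)})
    return sorted(ranges, key=lambda r: r["start"])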

Engineering decisions

  • Async workloads with Celery + Redis keep the UI responsive during heavy CV/NLP processing.
  • CPU-first model choices (YOLOv8, MiniLM embeddings, faster-whisper base) enable offline, local deployment.
  • Hybrid search combines exact entity matches with semantic similarity to improve recall (see the sketch after this list).
  • Entity aggregation into time ranges + confidence scoring make timelines explainable.
  • Verification pass (CLIP) + consecutive-frame filtering improve precision.
  • OCR + audio cleanup improve downstream transcript and search quality.
  • Deterministic outputs (JSON/PDF/CSV) make results auditable and repeatable.
  • Dataset exporter to COCO + YOLO with video-level splits prevents leakage.
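
The hybrid search decision above can be sketched as an exact label match combined with MiniLM embedding similarity; the entity record shape, similarity threshold, and helper name are illustrative assumptions, not the project's actual search module.

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # the CPU-friendly embedding model named above

def hybrid_search(query: str, entities: list[dict], top_k: int = 10) -> list[dict]:
    """entities: [{"label": "cargo truck", "video_id": "vid1", "start": 12.0}, ...]"""
    if not entities:
        return []

    # Exact pass: substring match on the entity label.
    exact = [dict(e, score=1.0, match="exact")
             for e in entities if query.lower() in e["label"].lower()]

    # Semantic pass: cosine similarity between the query and each label embedding.
    labels = [e["label"] for e in entities]
    query_emb = model.encode(query, convert_to_tensor=True)
    label_embs = model.encode(labels, convert_to_tensor=True)
    scores = util.cos_sim(query_emb, label_embs)[0]

    semantic = sorted(
        (dict(e, score=float(s), match="semantic")
         for e, s in zip(entities, scores) if float(s) >= 0.4),
        key=lambda r: r["score"], reverse=True)

    # Exact hits rank first; semantic hits fill the remainder without duplicates.
    seen = {(r["label"], r.get("video_id")) for r in exact}
    results = exact + [r for r in semantic if (r["label"], r.get("video_id")) not in seen]
    return results[:top_k]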

What this demonstrates

Capability               | Evidence                  | Where to look
Full-stack system design | UI + API + worker         | ui/app.py, backend/
Async processing         | Celery task queue         | backend/worker/
CV/NLP pipeline          | frames + detection + STT  | backend/src/entity_indexing/
FastAPI endpoints        | upload + report APIs      | backend/main.py
Dataset export           | COCO + YOLO exporter      | scripts/export_training_dataset.py

Reproducibility

Docker quickstart:
docker-compose up --build
open http://localhost:5173

Local dev quickstart:
brew install ffmpeg tesseract
python -m venv .venv && source .venv/bin/activate
pip install -r backend/requirements.txt
redis-server
uvicorn backend.main:app --host 0.0.0.0 --port 8010
celery -A backend.worker.celery_app worker --loglevel=info
cd frontend && npm install && npm run dev

Reviewer tasks:
1) Upload a sample video (or script the upload as sketched below)
2) Open the timeline and click a range
3) Run a semantic search and export the report
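
The upload step can also be scripted over HTTP; the route and response shape below are assumptions made for illustration, so check backend/main.py for the actual endpoints.

import requests

# Scripted version of reviewer task 1; "/api/videos/upload" is an assumed route.
with open("sample.mp4", "rb") as f:
    resp = requests.post(
        "http://localhost:8010/api/videos/upload",
        files={"file": ("sample.mp4", f, "video/mp4")},
    )
resp.raise_for_status()
print(resp.json())  # expected to return a job/video id that can be polled for progress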

Demo kit (no cloud)

Use a small MP4 sample (or a public MP4 URL) to keep demos fast. Run the pipeline locally, upload the sample, then review the timeline, search results, and exportable report.

If keys are required for any model downloads, set them via env vars locally only. Do not commit secrets.

Recorded demo (placeholder)

Add a Loom/YouTube link here when a recorded demo is available.

Limitations & roadmap

Limitations
  • CPU-bound performance; larger videos can take 1–2 minutes per minute of footage on a laptop.
  • Single-node deployment; no distributed workers or shared cache yet.
  • No auth/rate limiting if exposed publicly.
  • URL uploads can fail (403) depending on the source host.
Roadmap
  • GPU support for faster inference + larger Whisper models.
  • Auth + rate limiting for shared deployments.
  • Distributed workers + Postgres for multi-user scale.
  • Object storage (S3-compatible) for large artifacts.
  • Caching, retry/backoff, and improved search ranking.
  • Expanded tests + CI for pipeline regressions.

Security & privacy

  • Optional API keys must be injected via env/.env/Docker secrets and never committed.
  • Share links are read-only tokens stored in SQLite.
  • If deployed publicly, add auth + rate limiting and isolate storage.