Video Intelligence

Thales Video Intelligence Indexing Platform

A full-stack, production-style system that turns raw video into searchable, analyst-ready structured data using CV + NLP + async pipelines.

Full-stack video intelligence system: frames + entities + timelines + transcripts + search.

  • Upload video or URL → async pipeline → shareable read-only report
  • Timeline + semantic search UX for fast triage
  • Deterministic JSON/PDF/CSV artifacts for auditability
  • Local-first, offline-capable processing on CPU

Local demo walkthrough

What you’ll see in 60 seconds: upload → progress → report → timeline click → search → export → share link.

  • Overview: global search, upload actions, and system stats
  • Upload: file and optional URL inputs
  • Library: processing and completed statuses
  • Report + exports: media preview and export actions
  • Frame evidence: frame gallery with detection overlays and pagination
  • Timeline: consolidated view of detected entities
  • Entity ranges: time ranges with confidence and sources
  • Transcript: summary with audio analysis and full text
  • Search: unified entity search with filters and semantic matching
No hosted cloud demo: heavy compute + large models + hosting cost + privacy make a public instance impractical. Evaluate locally.

Problem

Before: analysts scrub videos manually, pause/play repeatedly, and take notes by hand. There is no structured output, no timeline correlation, and no searchable archive.

After: a single upload yields entities, timestamps, frames, transcripts, a searchable index, exportable reports, and shareable read-only links.

Success criteria: search across videos, jump directly to moments of interest, and export analyst-ready artifacts with explainable evidence.

Solution overview

Frontend (React) for upload and review, FastAPI for orchestration, Celery/Redis for heavy processing, and SQLite + filesystem for persistence.

Compute is asynchronous by design: long-running jobs are queued, progress is tracked, and results are stored as artifacts for replay and export.
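
This pattern can be sketched as a Celery task that runs the stages in order and publishes coarse progress as task state; the task name, stage weights, and metadata fields below are illustrative assumptions, not the project's actual definitions.

from celery import Celery

# Broker and result backend on the local Redis instance from the architecture diagram.
celery_app = Celery("pipeline",
                    broker="redis://localhost:6379/0",
                    backend="redis://localhost:6379/0")

@celery_app.task(bind=True)
def process_video(self, video_id: str, video_path: str) -> dict:
    """Run the pipeline stages in order and publish coarse progress as task state."""
    stages = [
        ("extracting_frames", 20),
        ("transcribing_audio", 20),
        ("detecting_entities", 80),
        ("aggregating_report", 95),
        ("indexing_search", 100),
    ]
    for stage, progress in stages:
        # ... run the real stage here and persist artifacts under data/entity_indexing/ ...
        self.update_state(state="PROGRESS",
                          meta={"stage": stage, "progress": progress})
    return {"video_id": video_id, "status": "completed"}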

The system is built end-to-end by one engineer to simulate production-style video intelligence without paid cloud services.

Architecture

graph LR
  UI[React UI :5173] --> API[FastAPI :8010]
  API --> REDIS[(Redis :6379)]
  API --> DB[(SQLite)]
  API --> FS[data/entity_indexing]
  WORKER[Celery worker] --> REDIS
  WORKER --> FS
  WORKER --> DB
  API --> SHARE[Read-only share link]

  FS --> FRAMES[frames/]
  FS --> REPORTS[reports/]
  FS --> TRANSCRIPTS[transcripts/]
  FS --> INDEX[index.db]

  • Frontend (React/Vite): 5173
  • Backend (FastAPI): 8010 (or 8000 depending on env)
  • Redis: 6379
  • Celery worker: async pipeline
  • Data paths: data/entity_indexing/{frames,reports,transcripts,index.db}
  • Share links are read-only, tokenized URLs stored in SQLite.
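
A minimal sketch of that read-only share flow, assuming hypothetical routes, a share_links table, and the index.db location (the actual endpoints live in backend/main.py):

import secrets
import sqlite3

from fastapi import FastAPI, HTTPException

app = FastAPI()
DB_PATH = "data/entity_indexing/index.db"  # assumed location; the real app may use a separate SQLite file

@app.post("/videos/{video_id}/share")
def create_share_link(video_id: str) -> dict:
    # Generate an unguessable token and persist the mapping; the token grants read access only.
    token = secrets.token_urlsafe(32)
    with sqlite3.connect(DB_PATH) as conn:
        conn.execute("CREATE TABLE IF NOT EXISTS share_links (token TEXT PRIMARY KEY, video_id TEXT)")
        conn.execute("INSERT INTO share_links (token, video_id) VALUES (?, ?)", (token, video_id))
    return {"share_url": f"/share/{token}"}

@app.get("/share/{token}")
def read_shared_report(token: str) -> dict:
    with sqlite3.connect(DB_PATH) as conn:
        row = conn.execute("SELECT video_id FROM share_links WHERE token = ?", (token,)).fetchone()
    if row is None:
        raise HTTPException(status_code=404, detail="Unknown or revoked share token")
    # Serve the stored report artifact for this video; no mutating routes accept the token.
    return {"video_id": row[0], "read_only": True}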

Pipeline walkthrough (stages + progress)

extracting_frames (0–20)
  • Input: video file or URL.
  • Output: sampled frames + metadata.
  • Failure cases: unsupported codec, missing file, ffmpeg errors.
  • Persisted: frames/ + job metadata.
transcribing_audio (20)
  • Input: extracted audio track.
  • Output: transcript + timestamps.
  • Failure cases: no audio, low quality; pipeline continues with empty transcript.
  • Persisted: transcripts/ + JSON.
detecting_entities (20–80)
  • Input: frames + model configs.
  • Output: detected entities + confidence scores + OCR text.
  • Failure cases: model load errors; pipeline records partial results.
  • Persisted: detection JSON + frame-level indices.
aggregating_report (80–95)
  • Input: detections + transcript.
  • Output: time ranges, counts, evidence frames, summaries (time-range aggregation sketched after this list).
  • Failure cases: partial stage data; reports still generated with warnings.
  • Persisted: JSON/PDF/CSV reports.
indexing_search (95–100)
  • Input: report entities + transcript segments.
  • Output: exact + semantic index for fast search.
  • Failure cases: embedding model errors; falls back to exact search only.
  • Persisted: index.db + metadata.
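
As a concrete illustration of the aggregating_report stage, the sketch below merges per-frame detections of the same label into contiguous time ranges with an averaged confidence; the record fields and the 2-second gap threshold are illustrative assumptions rather than the project's actual schema.

from collections import defaultdict

def aggregate_time_ranges(detections: list[dict], max_gap_s: float = 2.0) -> list[dict]:
    """detections: [{"label": "truck", "timestamp": 12.5, "confidence": 0.81}, ...]"""
    if not detections:
        return []
    by_label = defaultdict(list)
    for det in detections:
        by_label[det["label"]].append(det)

    ranges = []
    for label, dets in by_label.items():
        dets.sort(key=lambda d: d["timestamp"])
        start = end = dets[0]["timestamp"]
        confs = [dets[0]["confidence"]]
        for det in dets[1:]:
            if det["timestamp"] - end <= max_gap_s:
                # Close enough to the previous sighting: extend the current range.
                end = det["timestamp"]
                confs.append(det["confidence"])
            else:
                # Gap too large: close the current range and start a new one.
                ranges.append({"label": label, "start": start, "end": end,
                               "confidence": sum(confs) / len(confs)})
                start = end = det["timestamp"]
                confs = [det["confidence"]]
        ranges.append({"label": label, "start": start, "end": end,
                       "confidence": sum(confs) / len(confs)})
    return sorted(ranges, key=lambda r: r["start"])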

Engineering decisions

  • Async workloads with Celery + Redis keep the UI responsive during heavy CV/NLP processing.
  • CPU-first model choices (YOLOv8, MiniLM embeddings, faster-whisper base) enable offline, local deployment.
  • Hybrid search combines exact entity matches with semantic similarity to improve recall (see the sketch after this list).
  • Entity aggregation into time ranges + confidence scoring make timelines explainable.
  • Verification pass (CLIP) + consecutive-frame filtering improve precision.
  • OCR + audio cleanup improve downstream transcript and search quality.
  • Deterministic outputs (JSON/PDF/CSV) make results auditable and repeatable.
  • Dataset exporter to COCO + YOLO with video-level splits prevents leakage.
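
The hybrid search decision above can be sketched as an exact label match combined with MiniLM embedding similarity; the entity record shape, similarity threshold, and helper name are illustrative assumptions, not the project's actual search module.

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # the CPU-friendly embedding model named above

def hybrid_search(query: str, entities: list[dict], top_k: int = 10) -> list[dict]:
    """entities: [{"label": "cargo truck", "video_id": "vid1", "start": 12.0}, ...]"""
    if not entities:
        return []

    # Exact pass: substring match on the entity label.
    exact = [dict(e, score=1.0, match="exact")
             for e in entities if query.lower() in e["label"].lower()]

    # Semantic pass: cosine similarity between the query and each label embedding.
    labels = [e["label"] for e in entities]
    query_emb = model.encode(query, convert_to_tensor=True)
    label_embs = model.encode(labels, convert_to_tensor=True)
    scores = util.cos_sim(query_emb, label_embs)[0]

    semantic = sorted(
        (dict(e, score=float(s), match="semantic")
         for e, s in zip(entities, scores) if float(s) >= 0.4),
        key=lambda r: r["score"], reverse=True)

    # Exact hits rank first; semantic hits fill the remainder without duplicates.
    seen = {(r["label"], r.get("video_id")) for r in exact}
    results = exact + [r for r in semantic if (r["label"], r.get("video_id")) not in seen]
    return results[:top_k]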

What this demonstrates

Capability               | Evidence                  | Where to look
Full-stack system design | UI + API + worker         | ui/app.py, backend/
Async processing         | Celery task queue         | backend/worker/
CV/NLP pipeline          | frames + detection + STT  | backend/src/entity_indexing/
FastAPI endpoints        | upload + report APIs      | backend/main.py
Dataset export           | COCO + YOLO exporter      | scripts/export_training_dataset.py

Reproducibility

Docker quickstart:
docker-compose up --build
open http://localhost:5173

Local dev quickstart:
brew install ffmpeg tesseract
python -m venv .venv && source .venv/bin/activate
pip install -r backend/requirements.txt
redis-server
uvicorn backend.main:app --host 0.0.0.0 --port 8010
celery -A backend.worker.celery_app worker --loglevel=info
cd frontend && npm install && npm run dev

Reviewer tasks:
1) Upload a sample video (or script the upload as sketched below)
2) Open the timeline and click a range
3) Run a semantic search and export the report
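
The upload step can also be scripted over HTTP; the route and response shape below are assumptions made for illustration, so check backend/main.py for the actual endpoints.

import requests

# Scripted version of reviewer task 1; "/api/videos/upload" is an assumed route.
with open("sample.mp4", "rb") as f:
    resp = requests.post(
        "http://localhost:8010/api/videos/upload",
        files={"file": ("sample.mp4", f, "video/mp4")},
    )
resp.raise_for_status()
print(resp.json())  # expected to return a job/video id that can be polled for progress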

Demo kit (no cloud)

Use a small MP4 sample (or a public MP4 URL) to keep demos fast. Run the pipeline locally, upload the sample, then review the timeline, search results, and exportable report.

If keys are required for any model downloads, set them via env vars locally only. Do not commit secrets.

Recorded demo (placeholder)

Add a Loom/YouTube link here when a recorded demo is available.

Limitations & roadmap

Limitations
  • CPU-bound performance; larger videos can take 1–2 minutes per minute of footage on a laptop.
  • Single-node deployment; no distributed workers or shared cache yet.
  • No auth/rate limiting if exposed publicly.
  • URL uploads can fail (403) depending on the source host.
Roadmap
  • GPU support for faster inference + larger Whisper models.
  • Auth + rate limiting for shared deployments.
  • Distributed workers + Postgres for multi-user scale.
  • Object storage (S3-compatible) for large artifacts.
  • Caching, retry/backoff, and improved search ranking.
  • Expanded tests + CI for pipeline regressions.

Security & privacy

  • Optional API keys must be injected via env/.env/Docker secrets and never committed.
  • Share links are read-only tokens stored in SQLite.
  • If deployed publicly, add auth + rate limiting and isolate storage.