Thales Video Intelligence Indexing Platform
A full-stack, production-style system that turns raw video into searchable, analyst-ready structured data using CV + NLP + async pipelines: frames, entities, timelines, transcripts, and search.
Local demo walkthrough
What you’ll see in 60 seconds: upload → progress → report → timeline click → search → export → share link.
Problem
Before: analysts scrub videos manually, pause/play repeatedly, and take notes by hand. There is no structured output, no timeline correlation, and no searchable archive.
After: a single upload yields entities, timestamps, frames, transcripts, a searchable index, exportable reports, and shareable read-only links.
Success criteria: search across videos, jump directly to moments of interest, and export analyst-ready artifacts with explainable evidence.
Solution overview
Frontend (React) for upload and review, FastAPI for orchestration, Celery/Redis for heavy processing, and SQLite + filesystem for persistence.
Compute is asynchronous by design: long-running jobs are queued, progress is tracked, and results are stored as artifacts for replay and export.
The system is built end-to-end by one engineer to simulate production-style video intelligence without paid cloud services.
Architecture
graph LR
UI[React UI :5173] --> API[FastAPI :8010]
API --> REDIS[(Redis :6379)]
API --> DB[(SQLite)]
API --> FS[data/entity_indexing]
WORKER[Celery worker] --> REDIS
WORKER --> FS
WORKER --> DB
API --> SHARE[Read-only share link]
FS --> FRAMES[frames/]
FS --> REPORTS[reports/]
FS --> TRANSCRIPTS[transcripts/]
FS --> INDEX[index.db]
- Frontend (React/Vite): 5173
- Backend (FastAPI): 8010 (or 8000 depending on env)
- Redis: 6379
- Celery worker: async pipeline
- Data paths: data/entity_indexing/{frames,reports,transcripts,index.db}
- Share links are read-only, tokenized URLs stored in SQLite.
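To make the API-to-worker handoff concrete, here is a minimal sketch of an upload endpoint that stores the file and queues the pipeline as a Celery task so the request returns immediately. The endpoint path, task name, and directory layout are illustrative assumptions, not the project's exact code; the real versions live in backend/main.py and backend/worker/.

```python
# Illustrative upload -> queue handoff (names are assumptions, not the real code).
from pathlib import Path
from uuid import uuid4

from celery import Celery
from fastapi import FastAPI, File, UploadFile

DATA_DIR = Path("data/entity_indexing")
celery_app = Celery("worker", broker="redis://localhost:6379/0",
                    backend="redis://localhost:6379/1")
app = FastAPI()

@celery_app.task(name="pipeline.process_video")
def process_video(job_id: str, video_path: str) -> str:
    # Placeholder for the frame -> transcript -> detection -> report -> index stages.
    return job_id

@app.post("/jobs")
async def create_job(file: UploadFile = File(...)):
    job_id = uuid4().hex
    dest = DATA_DIR / "uploads" / f"{job_id}_{file.filename}"
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_bytes(await file.read())
    # Heavy work is queued; the UI polls job status while the worker runs.
    process_video.delay(job_id, str(dest))
    return {"job_id": job_id, "status": "queued"}
```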
Pipeline walkthrough (stages + progress)
Stage 1: Frame extraction
- Input: video file or URL.
- Output: sampled frames + metadata.
- Failure cases: unsupported codec, missing file, ffmpeg errors.
- Persisted: frames/ + job metadata.
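A minimal sketch of this stage, assuming frames are sampled with the ffmpeg CLI; the 1 fps rate and filename pattern are assumptions, not the pipeline's actual settings.

```python
# Illustrative frame sampling with ffmpeg (1 fps is an assumed default).
import subprocess
from pathlib import Path

def extract_frames(video_path: str, out_dir: str, fps: float = 1.0) -> list[Path]:
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    cmd = [
        "ffmpeg", "-y", "-i", video_path,
        "-vf", f"fps={fps}",              # sample N frames per second
        str(out / "frame_%06d.jpg"),
    ]
    # A non-zero exit (unsupported codec, missing file) raises CalledProcessError,
    # which the worker can record as a failed stage.
    subprocess.run(cmd, check=True, capture_output=True)
    return sorted(out.glob("frame_*.jpg"))
```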
Stage 2: Audio transcription (speech-to-text)
- Input: extracted audio track.
- Output: transcript + timestamps.
- Failure cases: no audio, low quality; pipeline continues with empty transcript.
- Persisted: transcripts/ + JSON.
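A sketch of the transcription step using faster-whisper's base model on CPU (named in the engineering decisions below); the int8 quantization and output shape are assumptions.

```python
# Illustrative speech-to-text pass with faster-whisper on CPU.
from faster_whisper import WhisperModel

def transcribe(audio_path: str) -> list[dict]:
    model = WhisperModel("base", device="cpu", compute_type="int8")
    segments, _info = model.transcribe(audio_path)
    # Keep start/end so transcript text can be joined to the frame timeline.
    return [{"start": s.start, "end": s.end, "text": s.text.strip()} for s in segments]
```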
Stage 3: Entity detection + OCR
- Input: frames + model configs.
- Output: detected entities + confidence scores + OCR text.
- Failure cases: model load errors; pipeline records partial results.
- Persisted: detection JSON + frame-level indices.
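A sketch of the per-frame detection + OCR pass, assuming YOLOv8 via ultralytics and Tesseract via pytesseract as stand-ins for the configured models.

```python
# Illustrative per-frame detection + OCR (model choices are assumed stand-ins).
import pytesseract
from PIL import Image
from ultralytics import YOLO

model = YOLO("yolov8n.pt")

def analyze_frame(frame_path: str) -> dict:
    result = model(frame_path, verbose=False)[0]
    detections = [
        {
            "label": result.names[int(box.cls)],
            "confidence": float(box.conf),
            "bbox": [float(v) for v in box.xyxy[0]],
        }
        for box in result.boxes
    ]
    ocr_text = pytesseract.image_to_string(Image.open(frame_path)).strip()
    return {"frame": frame_path, "detections": detections, "ocr_text": ocr_text}
```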
Stage 4: Timeline + report generation
- Input: detections + transcript.
- Output: time ranges, counts, evidence frames, summaries.
- Failure cases: partial stage data; reports still generated with warnings.
- Persisted: JSON/PDF/CSV reports.
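A sketch of the aggregation idea: consecutive frame-level hits for the same entity are merged into time ranges with counts and a representative evidence frame. The 2-second gap threshold and field names are illustrative assumptions.

```python
# Merge per-frame hits into explainable time ranges (gap_s is an assumed threshold).
def build_timeline(hits: list[dict], gap_s: float = 2.0) -> list[dict]:
    """hits: [{"label": str, "t": float, "confidence": float, "frame": str}, ...]"""
    ranges: list[dict] = []
    for hit in sorted(hits, key=lambda h: (h["label"], h["t"])):
        last = ranges[-1] if ranges else None
        if last and last["label"] == hit["label"] and hit["t"] - last["end"] <= gap_s:
            last["end"] = hit["t"]
            last["count"] += 1
            if hit["confidence"] > last["max_confidence"]:
                # Keep the highest-confidence frame as the evidence frame.
                last["max_confidence"] = hit["confidence"]
                last["evidence_frame"] = hit["frame"]
        else:
            ranges.append({
                "label": hit["label"], "start": hit["t"], "end": hit["t"],
                "count": 1, "max_confidence": hit["confidence"],
                "evidence_frame": hit["frame"],
            })
    return ranges
```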
Stage 5: Search indexing
- Input: report entities + transcript segments.
- Output: exact + semantic index for fast search.
- Failure cases: embedding model errors; falls back to exact search only.
- Persisted: index.db + metadata.
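A sketch of the hybrid lookup, assuming MiniLM sentence embeddings for the semantic half; the real index is persisted in index.db, while this in-memory version is purely illustrative.

```python
# Hybrid search sketch: exact substring matches ranked first, then cosine similarity
# over MiniLM embeddings (in-memory numpy stands in for the persisted index).
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

def search(query: str, docs: list[str], top_k: int = 5) -> list[tuple[str, float]]:
    exact = [(d, 1.0) for d in docs if query.lower() in d.lower()]
    emb = model.encode([query] + docs, normalize_embeddings=True)
    sims = emb[1:] @ emb[0]                     # cosine similarity on unit vectors
    semantic = [(docs[i], float(sims[i])) for i in np.argsort(-sims)[:top_k]]
    seen, ranked = set(), []
    for doc, score in exact + semantic:
        if doc not in seen:
            seen.add(doc)
            ranked.append((doc, score))
    return ranked[:top_k]
```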
Engineering decisions
- Async workloads with Celery + Redis keep the UI responsive during heavy CV/NLP processing.
- CPU-first model choices (YOLOv8, MiniLM embeddings, faster-whisper base) enable offline, local deployment.
- Hybrid search combines exact entity matches with semantic similarity for recall.
- Entity aggregation into time ranges + confidence scoring makes timelines explainable.
- Verification pass (CLIP) + consecutive-frame filtering improve precision.
- OCR + audio cleanup improve downstream transcript/search quality.
- Deterministic outputs (JSON/PDF/CSV) make results auditable and repeatable.
- Dataset exporter to COCO + YOLO formats uses video-level splits to prevent train/val leakage (see the sketch below).
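A sketch of that video-level split: frames are grouped by source video before splitting, so near-duplicate frames from one video cannot land in both train and validation sets. The ratio and seed are illustrative, not the exporter's defaults.

```python
# Split by video id so all frames from one video stay in the same split.
import random

def split_by_video(frames: list[dict], train_ratio: float = 0.8, seed: int = 42):
    """frames: [{"video_id": str, "frame": str, ...}, ...]"""
    video_ids = sorted({f["video_id"] for f in frames})
    random.Random(seed).shuffle(video_ids)
    cut = int(len(video_ids) * train_ratio)
    train_videos = set(video_ids[:cut])
    train = [f for f in frames if f["video_id"] in train_videos]
    val = [f for f in frames if f["video_id"] not in train_videos]
    return train, val
```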
What this demonstrates
| Capability | Evidence | Where to look |
|---|---|---|
| Full-stack system design | UI + API + worker | ui/app.py, backend/ |
| Async processing | Celery task queue | backend/worker/ |
| CV/NLP pipeline | frames + detection + STT | backend/src/entity_indexing/ |
| FastAPI endpoints | upload + report APIs | backend/main.py |
| Dataset export | COCO + YOLO exporter | scripts/export_training_dataset.py |
Reproducibility
Docker quickstart:
docker-compose up --build
open http://localhost:5173
Local dev quickstart:
brew install ffmpeg tesseract
python -m venv .venv && source .venv/bin/activate
pip install -r backend/requirements.txt
redis-server
uvicorn backend.main:app --host 0.0.0.0 --port 8010
celery -A backend.worker.celery_app worker --loglevel=info
cd frontend && npm install && npm run dev
Reviewer tasks:
1) Upload a sample video
2) Open the timeline and click a range
3) Run a semantic search and export the report
Demo kit (no cloud)
Use a small MP4 sample (or a public MP4 URL) to keep demos fast. Run the pipeline locally, upload the sample, then review the timeline, search results, and exportable report.
If keys are required for any model downloads, set them via env vars locally only. Do not commit secrets.
Recorded demo (placeholder)
Limitations & roadmap
Limitations:
- CPU-bound performance; larger videos can take 1–2 minutes of processing per minute of footage on a laptop.
- Single-node deployment; no distributed workers or shared cache yet.
- No auth/rate limiting if exposed publicly.
- URL uploads can fail (403) depending on the source host.
Roadmap:
- GPU support for faster inference + larger Whisper models.
- Auth + rate limiting for shared deployments.
- Distributed workers + Postgres for multi-user scale.
- Object storage (S3-compatible) for large artifacts.
- Caching, retry/backoff, and improved search ranking.
- Expanded tests + CI for pipeline regressions.
Security & privacy
- Optional API keys must be injected via env/.env/Docker secrets and never committed.
- Share links are read-only tokens stored in SQLite.
- If deployed publicly, add auth + rate limiting and isolate storage.
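For illustration, a read-only share link can be issued as a random token keyed to a report id in SQLite; the table, column, and URL shape below are assumptions, not the project's actual schema.

```python
# Sketch of read-only share-link issuance (schema and URL shape are assumptions).
import secrets
import sqlite3

def create_share_link(db_path: str, report_id: str, base_url: str) -> str:
    token = secrets.token_urlsafe(32)
    with sqlite3.connect(db_path) as conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS share_links (token TEXT PRIMARY KEY, report_id TEXT)"
        )
        conn.execute(
            "INSERT INTO share_links (token, report_id) VALUES (?, ?)",
            (token, report_id),
        )
    # The share endpoint resolves the token to a report and only serves reads.
    return f"{base_url}/share/{token}"
```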