One frame is a hypothesis. Evidence over time becomes an incident.
The naive approach is to ask the strongest model to inspect every frame of every camera. That is expensive, slow, and wasteful, because most frames show ordinary traffic. VESU stages the work: low-cost checks run continuously, lightweight perception casts a wide net, temporal reasoning waits for evidence, and strong verification is reserved for credible candidates.
Each stage is narrower than the last.
VESU does not spend the strongest model on empty roads. It spends compute in proportion to evidence.
Stage 0. Deterministic CV checks for a frozen feed, blur, poor exposure, obstruction, PTZ motion, and view mismatch. A blind camera is worse than an empty alert.
Stage 1. A lightweight VLM reads sampled frames against the camera's learned scene model, with strict structured output. It produces hypotheses, not decisions.
Stage 2. Rules and open-set reasoning over a rolling window. The gate is evidence (persistence, region stability, frame count, camera-health state), not model confidence.
Stage 3. A strong multimodal model reviews a short clip or keyframe set and either confirms or retracts the candidate. Rare by design: only credible candidates reach this stage.
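The four stages behave like a funnel, and a small sketch makes the "each stage is narrower than the last" claim concrete. This is a minimal illustration under stated assumptions, not VESU's implementation: `run_cascade`, `StageCounts`, `PersistenceGate`, and the dictionary frame format are hypothetical, and each stage is passed in as a plain callable. Stage 0 runs on every frame, Stage 1 only on healthy frames, Stage 2 only on hypotheses, and Stage 3 only on what Stage 2 promotes, so the most expensive call is made least often.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

# Hypothetical frame record; the real VESU data model is not described here.
Frame = Dict[str, object]


@dataclass
class StageCounts:
    frames: int = 0         # every frame seen
    passed_health: int = 0  # survived Stage 0
    hypotheses: int = 0     # emitted by Stage 1
    candidates: int = 0     # promoted by Stage 2
    verified: int = 0       # confirmed by Stage 3


def run_cascade(
    frames: Iterable[Frame],
    camera_ok: Callable[[Frame], bool],
    perceive: Callable[[Frame], List[str]],
    promote: Callable[[str], bool],
    verify: Callable[[str], bool],
) -> StageCounts:
    counts = StageCounts()
    for frame in frames:
        counts.frames += 1
        if not camera_ok(frame):      # Stage 0: cheap deterministic check, every frame
            continue
        counts.passed_health += 1
        for hyp in perceive(frame):   # Stage 1: wide net, hypotheses only
            counts.hypotheses += 1
            if not promote(hyp):      # Stage 2: wait for evidence over time
                continue
            counts.candidates += 1
            if verify(hyp):           # Stage 3: strong model, credible candidates only
                counts.verified += 1
    return counts


class PersistenceGate:
    """Hypothetical Stage 2 stand-in: promote a label the moment it has been
    observed `needed` times in healthy frames, and never again after that."""

    def __init__(self, needed: int) -> None:
        self.needed = needed
        self.seen: Dict[str, int] = {}

    def __call__(self, label: str) -> bool:
        self.seen[label] = self.seen.get(label, 0) + 1
        return self.seen[label] == self.needed


if __name__ == "__main__":
    # Six empty frames, four frames with a stopped vehicle, then one frozen frame.
    frames = [{"frozen": False, "labels": []} for _ in range(6)]
    frames += [{"frozen": False, "labels": ["stopped_vehicle"]} for _ in range(4)]
    frames.append({"frozen": True, "labels": ["stopped_vehicle"]})
    counts = run_cascade(
        frames,
        camera_ok=lambda f: not f["frozen"],
        perceive=lambda f: list(f["labels"]),
        promote=PersistenceGate(needed=3),
        verify=lambda hyp: True,
    )
    print(counts)
    # prints StageCounts(frames=11, passed_health=10, hypotheses=4, candidates=1, verified=1)
```

In the toy run, eleven frames yield four hypotheses but only one Stage 3 call: the frozen frame never reaches perception, and the gate promotes the stopped vehicle once rather than on every frame. A real Stage 2 would use a rolling window and the richer evidence listed for that stage instead of a simple count.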
A stopped vehicle, from first observation to incident.
A stopped vehicle should not page an operator just because it appeared in one frame. VESU waits until the evidence holds up, and only then publishes. Timings are representative.
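The stopped-vehicle case can be made concrete with a gate that checks the evidence types named for Stage 2: persistence, region stability, frame count, and camera-health state. This is a sketch under assumptions: `StoppedVehicleGate`, `Observation`, the IoU-based stability test, and every threshold default are hypothetical choices for illustration, not VESU's settings.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Tuple

# Axis-aligned box (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two boxes; 1.0 means identical regions."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


@dataclass
class Observation:
    t: float              # timestamp in seconds
    box: Box              # where the stopped-vehicle hypothesis sits in the frame
    camera_healthy: bool  # Stage 0 verdict for this frame


class StoppedVehicleGate:
    """Hypothetical rolling-window evidence gate; all defaults are illustrative."""

    def __init__(self, window_s: float = 60.0, min_duration_s: float = 20.0,
                 min_frames: int = 5, min_iou: float = 0.7) -> None:
        self.window_s = window_s
        self.min_duration_s = min_duration_s
        self.min_frames = min_frames
        self.min_iou = min_iou
        self.obs: Deque[Observation] = deque()

    def update(self, ob: Observation) -> str:
        # Camera-health state: evidence from an unhealthy camera is not trusted,
        # so this sketch discards what has accumulated and starts over.
        if not ob.camera_healthy:
            self.obs.clear()
            return "not enough evidence"

        self.obs.append(ob)
        # Rolling window: forget observations older than window_s.
        while ob.t - self.obs[0].t > self.window_s:
            self.obs.popleft()

        # Region stability: every observation must overlap the oldest one.
        anchor = self.obs[0].box
        if any(iou(anchor, o.box) < self.min_iou for o in self.obs):
            self.obs.clear()
            self.obs.append(ob)  # restart evidence from the new position
            return "not enough evidence"

        # Persistence (elapsed time) and frame count.
        duration = ob.t - self.obs[0].t
        if len(self.obs) >= self.min_frames and duration >= self.min_duration_s:
            return "candidate"
        return "not enough evidence"
```

Resetting on an unhealthy frame or a moved box is the conservative choice; a production system might pause instead. The gate keeps returning `candidate` while the evidence holds, so a downstream step would promote the track only once.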
Six possible states. All of them are explicit outputs.
"Candidate suppressed" and "not enough evidence" are valid outcomes, not silent dropouts.
Persistence/evidence threshold met. Not yet verified or published.
Stage 3 confirmed. Incident package built with clip, reason, provenance.
Delivered to the operator system. Lifecycle tracking begins.
Severity or extent changed. Update delivered through the same event path.
Cleared in the scene. Resolution event sent.
Evidence retracted before publish, or class did not meet a publication gate.
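The six states can be modeled as an explicit state machine in which every transition, including suppression, is recorded as an event rather than dropped. The state names, the transition table, and the `Incident` class below are hypothetical labels inferred from the state descriptions above; VESU's actual schema is not given here.

```python
from enum import Enum
from typing import Dict, FrozenSet, List, Tuple


class State(Enum):
    CANDIDATE = "candidate"    # evidence threshold met, not yet verified
    VERIFIED = "verified"      # Stage 3 confirmed, incident package built
    PUBLISHED = "published"    # delivered to the operator system
    UPDATED = "updated"        # severity or extent changed after publication
    RESOLVED = "resolved"      # cleared in the scene
    SUPPRESSED = "suppressed"  # retracted, or failed a publication gate


# Hypothetical transition table: which states may follow each state.
ALLOWED: Dict[State, FrozenSet[State]] = {
    State.CANDIDATE: frozenset({State.VERIFIED, State.SUPPRESSED}),
    State.VERIFIED: frozenset({State.PUBLISHED, State.SUPPRESSED}),
    State.PUBLISHED: frozenset({State.UPDATED, State.RESOLVED}),
    State.UPDATED: frozenset({State.UPDATED, State.RESOLVED}),
    State.RESOLVED: frozenset(),    # terminal
    State.SUPPRESSED: frozenset(),  # terminal, but still an explicit output
}


class Incident:
    def __init__(self, incident_id: str) -> None:
        self.incident_id = incident_id
        self.state = State.CANDIDATE
        # Every state change is logged as (id, state, reason); nothing is silent.
        self.events: List[Tuple[str, str, str]] = [
            (incident_id, State.CANDIDATE.value, "evidence threshold met")
        ]

    def transition(self, new: State, reason: str) -> None:
        if new not in ALLOWED[self.state]:
            raise ValueError(f"illegal transition {self.state.value} -> {new.value}")
        self.state = new
        self.events.append((self.incident_id, new.value, reason))
```

Rejecting illegal transitions, such as publishing after suppression, is one way to guarantee that a retracted candidate can never reach an operator queue.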
The road will always produce cases that do not fit a closed list.
Stage 2 carries an open-set path for evidence that looks safety-relevant but does not match a named incident type. The goal is not to invent a label; it is to route evidence for review and improve the ontology over time.
Stage 2 flags persistent visual evidence that the closed ontology cannot explain.
Stage 3 confirms the evidence is real and safety-relevant; describes it in plain English.
Surfaced for review with full evidence — not auto-published as an invented type.
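The open-set path amounts to a routing rule: a confirmed, safety-relevant observation that matches a named type continues toward publication, while one with no matching type goes to review with its plain-English description and no invented label. The ontology contents, class names, and the `route` function below are hypothetical illustrations.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical closed ontology; the real list of named incident types is not given here.
KNOWN_TYPES = frozenset({"stopped_vehicle", "debris", "wrong_way_driver"})


@dataclass
class Stage2Flag:
    label: Optional[str]  # None (or an unknown string) when the ontology cannot explain it
    persistent: bool      # did the evidence hold across the rolling window?


@dataclass
class Stage3Verdict:
    real: bool
    safety_relevant: bool
    description: str      # plain-English description from the strong model


@dataclass
class Route:
    destination: str      # "publish", "review", or "suppressed"
    label: Optional[str]
    description: str


def route(flag: Stage2Flag, verdict: Stage3Verdict) -> Route:
    if not flag.persistent or not (verdict.real and verdict.safety_relevant):
        # Retracted or never credible: an explicit outcome, not a silent drop.
        return Route("suppressed", flag.label, verdict.description)
    if flag.label in KNOWN_TYPES:
        # Named type: continues toward publication (publication gates still apply).
        return Route("publish", flag.label, verdict.description)
    # Open-set path: confirmed and safety-relevant but unnamed.
    # Leave the label empty rather than inventing one; a human reviews it.
    return Route("review", None, verdict.description)
```

Reviewed open-set cases are the raw material for extending `KNOWN_TYPES` over time, which is how the ontology improves without the system guessing labels.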
The cascade is the part that earns operator trust.
We are happy to walk through how a credible candidate moves through the four stages — and how Stage 0 and the suppression path keep noise out of operator queues.