Cascade · detection architecture

One frame is a hypothesis. Evidence over time becomes an incident.

The naive approach is to ask the strongest model to inspect every frame of every camera. That is expensive, slow, and wasteful, because most frames show ordinary traffic. VESU stages the work instead: low-cost checks run on every frame, lightweight perception casts a wide net, temporal reasoning waits for evidence, and strong verification is reserved for credible candidates.

01 · The four stages

Each stage is narrower than the last.

VESU does not spend the strongest model on empty roads. It spends compute in proportion to evidence.

STAGE 00
Camera health

Deterministic CV: frozen feed, blur, exposure, obstruction, PTZ motion, view mismatch. A blind camera is worse than an empty alert.

freq: every frame
cost: < 1 ms
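
As a rough illustration of what a deterministic Stage 0 check can look like, the sketch below flags a frozen feed and bad exposure on grayscale frames. It is a minimal example under stated assumptions, not VESU's implementation: frames are flat lists of 0-255 pixel values, and the function names and thresholds (`frozen_eps`, `dark`, `bright`) are hypothetical.

```python
from statistics import mean


def mean_abs_diff(prev, curr):
    """Mean absolute per-pixel difference between two equal-sized grayscale frames."""
    return mean(abs(a - b) for a, b in zip(prev, curr))


def camera_health(prev, curr, frozen_eps=0.5, dark=20, bright=235):
    """Return the list of faults for the current frame; an empty list means healthy.

    Thresholds are illustrative assumptions, not tuned values.
    """
    faults = []
    # A live feed always carries some sensor noise, so near-identical
    # consecutive frames suggest the feed has frozen.
    if mean_abs_diff(prev, curr) < frozen_eps:
        faults.append("frozen_feed")
    level = mean(curr)
    if level < dark:
        faults.append("underexposed")
    elif level > bright:
        faults.append("overexposed")
    return faults
```

Checks like these need no model call, which is consistent with running them on every frame before Stage 1 is allowed to run.
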
STAGE 01
Frame perception

Lightweight VLM reads sampled frames against the camera's learned scene model. Strict structured output. Hypotheses, not decisions.

freq: every ~5s
cost: low
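
"Strict structured output" can be enforced by validating every model response against a fixed schema before it enters the pipeline. The sketch below is one way to do that, assuming the model returns JSON. The field names and the class list are hypothetical, except `stopped_vehicle`, which appears in the walkthrough.

```python
import json
from dataclasses import dataclass

# Hypothetical closed list; only "stopped_vehicle" comes from the walkthrough.
ALLOWED_CLASSES = {"stopped_vehicle", "debris", "none"}
EXPECTED_KEYS = {"camera_id", "t", "incident_class", "region"}


@dataclass(frozen=True)
class Hypothesis:
    """One Stage 1 observation: a hypothesis, not a decision."""
    camera_id: str
    t: float
    incident_class: str
    region: str


def parse_hypothesis(raw: str) -> Hypothesis:
    """Parse one model response and reject anything outside the schema."""
    data = json.loads(raw)  # malformed JSON raises a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if set(data) != EXPECTED_KEYS:
        raise ValueError(f"unexpected or missing keys: {sorted(set(data) ^ EXPECTED_KEYS)}")
    for key in ("camera_id", "incident_class", "region"):
        if not isinstance(data[key], str):
            raise ValueError(f"{key} must be a string")
    if isinstance(data["t"], bool) or not isinstance(data["t"], (int, float)):
        raise ValueError("t must be a number")
    if data["incident_class"] not in ALLOWED_CLASSES:
        raise ValueError(f"unknown class: {data['incident_class']!r}")
    return Hypothesis(data["camera_id"], float(data["t"]),
                      data["incident_class"], data["region"])
```

The schema deliberately has no "alert" or "decision" field: a response that tries to add one is rejected rather than trusted.
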
STAGE 02
Temporal reasoning

Rules and open-set reasoning over a rolling window. Persistence, region stability, frame count, camera-health state — evidence, not confidence.

freq: rolling ~30s
cost: medium
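
The persistence idea can be sketched with a rolling window of Stage 1 hypotheses. This is a simplified illustration, not VESU's actual rule set: the class name, the `window_s` and `min_frames` defaults, and the choice to discard evidence when the camera is unhealthy are all assumptions.

```python
from collections import deque


class PersistenceRule:
    """Promote a hypothesis to a candidate once it persists in one region."""

    def __init__(self, window_s=30.0, min_frames=4):
        self.window_s = window_s      # rolling window length in seconds
        self.min_frames = min_frames  # matching frames required in the window
        self.obs = deque()            # (t, incident_class, region), oldest first

    def observe(self, t, incident_class, region, camera_healthy=True):
        """Record one hypothesis; return True when the candidate threshold is met."""
        if not camera_healthy:
            # Assumption: evidence gathered around a degraded feed is discarded.
            self.obs.clear()
            return False
        self.obs.append((t, incident_class, region))
        # Drop observations that have fallen out of the rolling window.
        while self.obs and t - self.obs[0][0] > self.window_s:
            self.obs.popleft()
        matching = [o for o in self.obs
                    if o[1] == incident_class and o[2] == region]
        return len(matching) >= self.min_frames
```

The rule counts frames and checks region stability; it never reads a model confidence score, which matches the "evidence, not confidence" framing above.
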
STAGE 03
Clip verification

Strong multimodal model on a short clip or keyframe set. Confirms or retracts. Rare by design — only credible candidates reach here.

freq: on escalation
cost: high
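
Taken together, the four stages form a chain of gates in which each stage runs only if the previous one passed. The sketch below shows that control flow with the stages passed in as plain callables. The function name, the stage signatures, and the outcome strings are hypothetical, and frame sampling is omitted for brevity.

```python
from typing import Any, Callable, Optional


def run_cascade(
    frame: Any,
    health_ok: Callable[[Any], bool],         # Stage 0: deterministic checks
    perceive: Callable[[Any], Optional[Any]],  # Stage 1: returns a hypothesis or None
    is_candidate: Callable[[Any], bool],       # Stage 2: temporal evidence gate
    verify: Callable[[Any], bool],             # Stage 3: strong model, called rarely
) -> str:
    """Run one frame through the gates and return an explicit outcome."""
    if not health_ok(frame):
        return "camera_unhealthy"
    hypothesis = perceive(frame)
    if hypothesis is None:
        return "no_hypothesis"
    if not is_candidate(hypothesis):
        return "not_enough_evidence"
    # Only credible candidates pay for the expensive verification call.
    return "verified" if verify(hypothesis) else "suppressed"
```

Every branch returns a named outcome, so a frame that never reaches Stage 3 still leaves a record of why it stopped.
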
02 · Walkthrough

A stopped vehicle, from first observation to incident.

A stopped vehicle should not page an operator because it appeared in one frame. VESU waits for evidence; only then does it publish. Timings are representative.

t = 0s
STAGE 0
Camera healthy
Feed is decoding, exposure normal, no PTZ motion. Stage 1 may run.
t = 5s
STAGE 1
Possible stopped vehicle
Vehicle observed in shoulder region. Hypothesis recorded; no incident yet.
t = 30s
STAGE 1
Still in shoulder
Same region. Persistence builds. Camera health still OK.
t = 50s
STAGE 2
Candidate
Persistence threshold met across 8 frames. Region stable. Class-specific rule passes.
t = 51s
STAGE 3
Verified
Strong model reviews a short clip. Confirms class: stopped_vehicle. Severity: medium.
t = 51.4s
PUBLISH
Incident delivered
Incident package built. Coverage state OK. Posted to ATMS event queue.
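
The final publish step can be pictured as building a small, serializable package and placing it on a queue. The sketch below is illustrative only: the field names are assumptions, and a standard-library `Queue` stands in for the ATMS event queue, whose real interface is not described here.

```python
import json
from dataclasses import asdict, dataclass, field
from queue import Queue


@dataclass
class IncidentPackage:
    """Hypothetical shape of a published incident."""
    camera_id: str
    incident_class: str
    severity: str
    reason: str
    clip_uri: str
    coverage_state: str
    provenance: list = field(default_factory=list)  # e.g. per-stage events

    def to_json(self) -> str:
        """Serialize with sorted keys so output is stable across runs."""
        return json.dumps(asdict(self), sort_keys=True)


def publish(package: IncidentPackage, event_queue: Queue) -> None:
    """Post the serialized package to the (stand-in) event queue."""
    event_queue.put(package.to_json())
```

For the walkthrough above, the package would carry `incident_class="stopped_vehicle"`, `severity="medium"`, and a coverage state of OK, with the Stage 1 to Stage 3 events listed under `provenance`.
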
03 · Incident lifecycle

Six possible states. All of them are explicit outputs.

"Candidate suppressed" and "not enough evidence" are valid outcomes, not silent dropouts.

01
Candidate

Persistence/evidence threshold met. Not yet verified or published.

02
Verified

Stage 3 confirmed. Incident package built with clip, reason, provenance.

03
Published

Delivered to the operator system. Lifecycle tracking begins.

04
Updated

Severity or extent changed. Update delivered through the same event path.

05
Resolved

Cleared in the scene. Resolution event sent.

06
Suppressed

Evidence retracted before publish, or class did not meet a publication gate.
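
The six states can be modeled as a small state machine with an explicit transition table. The states come from the list above; the allowed transitions are an assumption inferred from the descriptions, so treat this as a sketch rather than VESU's actual lifecycle logic.

```python
from enum import Enum


class State(Enum):
    CANDIDATE = "candidate"
    VERIFIED = "verified"
    PUBLISHED = "published"
    UPDATED = "updated"
    RESOLVED = "resolved"
    SUPPRESSED = "suppressed"


# Assumed transitions: suppression is only possible before publish,
# and RESOLVED and SUPPRESSED are terminal.
TRANSITIONS = {
    State.CANDIDATE: {State.VERIFIED, State.SUPPRESSED},
    State.VERIFIED: {State.PUBLISHED, State.SUPPRESSED},
    State.PUBLISHED: {State.UPDATED, State.RESOLVED},
    State.UPDATED: {State.UPDATED, State.RESOLVED},
    State.RESOLVED: set(),
    State.SUPPRESSED: set(),
}


def advance(current: State, new: State) -> State:
    """Return the new state, or raise if the transition is not allowed."""
    if new not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {new.value}")
    return new
```

Making the table explicit means a suppressed candidate ends in a recorded state instead of silently disappearing.
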

04 · The open-set path

The road will always produce cases that do not fit a closed list.

Stage 2 carries an open-set path for evidence that looks safety-relevant but does not match a named incident type. The goal is not to invent a label; it is to route evidence for review and improve the ontology over time.

01
Detect

Stage 2 flags persistent visual evidence that the closed ontology cannot explain.

02
Verify

Stage 3 confirms the evidence is real and safety-relevant, then describes it in plain English.

03
Route

Surfaced for review with full evidence — not auto-published as an invented type.
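
The three steps above can be sketched as a routing function. The names (`Evidence`, `route`, `KNOWN_CLASSES`) and the outcome strings are hypothetical; the point is only that unexplained evidence ends up in a review queue with a description, never under an invented incident type.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical closed ontology; "stopped_vehicle" is the class from the walkthrough.
KNOWN_CLASSES = {"stopped_vehicle"}


@dataclass
class Evidence:
    matched_class: Optional[str]  # None when no named incident type fits
    persistent: bool              # Stage 2: did the evidence persist?
    verified_real: bool           # Stage 3: real and safety-relevant?
    description: str              # Stage 3: plain-English description


def route(evidence: Evidence) -> str:
    """Decide where a piece of evidence goes next."""
    if evidence.matched_class in KNOWN_CLASSES:
        return "closed_set_path"      # handled by the normal lifecycle
    if not evidence.persistent:
        return "not_enough_evidence"  # Detect: nothing persistent to flag
    if not evidence.verified_real:
        return "suppressed"           # Verify: Stage 3 did not confirm it
    return "review_queue"             # Route: surfaced with its description
```

Note that no branch returns a new class label: growing the ontology is left to the review step.
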

The cascade is the part that earns operator trust.

We are happy to walk through how a credible candidate moves through the four stages — and how Stage 0 and the suppression path keep noise out of operator queues.