Evaluation · model governance · drift

Model changes are releases.

A safety camera system cannot be judged by a single accuracy number. VESU measures coverage, evidence completeness, publication latency, false-positive burden, replay gate results, and Stage 2 / Stage 3 agreement. Strategies earn promotion through replay and shadow evidence, not because a demo looked better.

01 · Quality signals

No single accuracy number. Six categories, watched continuously.

Coverage availability

What fraction of monitored cameras delivered usable observation time, broken down by camera-health states.
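A minimal sketch of the coverage figure, assuming camera health arrives as per-camera intervals labelled with a state. The state names (`healthy`, `degraded`, `offline`), the rule that only `healthy` time counts as usable, and all function names are illustrative assumptions, not VESU's actual taxonomy.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical labels; which states count as "usable" is an assumption.
USABLE_STATES = {"healthy"}


@dataclass
class HealthInterval:
    camera_id: str
    state: str       # e.g. "healthy", "degraded", "offline" (assumed labels)
    seconds: float   # time spent in this state during the window


def coverage_availability(intervals, monitored_cameras):
    """Return (fraction of monitored cameras with usable time, seconds per state)."""
    seconds_by_state = defaultdict(float)
    cameras_with_usable_time = set()
    for iv in intervals:
        seconds_by_state[iv.state] += iv.seconds
        if iv.state in USABLE_STATES and iv.seconds > 0:
            cameras_with_usable_time.add(iv.camera_id)
    # Only cameras in the monitored set count toward the fraction.
    covered = cameras_with_usable_time & set(monitored_cameras)
    fraction = len(covered) / len(monitored_cameras) if monitored_cameras else 0.0
    return fraction, dict(seconds_by_state)
```

Reporting seconds per state next to the fraction keeps a camera that sat in a degraded state all day from disappearing inside one availability percentage.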

Evidence completeness

Every published incident must carry clip, reason, coverage state, and provenance. Missing fields are a defect.
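A sketch of the completeness check as a defect list. The field names mirror the four required items above; the dictionary representation and the function names are assumptions.

```python
# Required evidence per published incident; key names are hypothetical.
REQUIRED_EVIDENCE = ("clip", "reason", "coverage_state", "provenance")


def missing_evidence(incident: dict) -> list:
    """Return required fields that are absent or empty on one incident."""
    return [f for f in REQUIRED_EVIDENCE if not incident.get(f)]


def completeness_defects(incidents: list) -> dict:
    """Map incident id -> missing fields, for every incident that has a defect."""
    defects = {}
    for inc in incidents:
        missing = missing_evidence(inc)
        if missing:
            defects[inc["id"]] = missing
    return defects
```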

Publication latency

From Stage 2 candidate to ATMS delivery, by class. Operator-relevant — not just a model-call timer.
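One way to compute the per-class figure, assuming each delivered incident carries a Stage 2 candidate timestamp and an ATMS delivery timestamp in seconds. The nearest-rank 95th percentile and the class labels are illustrative choices, not documented VESU settings.

```python
import math
from collections import defaultdict


def nearest_rank(values, pct):
    """Nearest-rank percentile (pct in (0, 100]) of a non-empty list."""
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def latency_by_class(events, pct=95):
    """events: (incident_class, stage2_candidate_ts, atms_delivered_ts) tuples.

    Measures the whole path an operator waits on, not a single model call.
    """
    per_class = defaultdict(list)
    for cls, candidate_ts, delivered_ts in events:
        per_class[cls].append(delivered_ts - candidate_ts)
    return {cls: nearest_rank(lat, pct) for cls, lat in per_class.items()}
```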

False-positive burden

Per-class FP per camera-day. Tracks whether the system is teaching operators to mute it.
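A sketch of the normalisation, assuming a false positive is an alert an operator rejected and that camera-days are monitored cameras multiplied by days in the window. Both definitions and the function name are assumptions.

```python
from collections import Counter


def fp_per_camera_day(rejected_classes, camera_days):
    """Per-class false positives per camera-day.

    rejected_classes: incident class of each alert operators rejected (assumed FP definition)
    camera_days: monitored cameras x days covered by the same window
    """
    if camera_days <= 0:
        raise ValueError("camera_days must be positive")
    counts = Counter(rejected_classes)
    return {cls: n / camera_days for cls, n in counts.items()}
```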

Replay gate results

A candidate strategy must beat the incumbent across replay corpora — by class — before promotion.
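A sketch of a per-class gate, assuming replay results arrive as one score per class per corpus (for example an F1-style score where higher is better). The score, the optional margin, and the function name are hypothetical; the rule it encodes is the one above: the candidate must beat the incumbent on every class in every corpus.

```python
def replay_gate(candidate, incumbent, corpora, min_margin=0.0):
    """Pass only if the candidate beats the incumbent on every class in every corpus.

    candidate / incumbent: {corpus: {incident_class: score}}, higher is better.
    A class missing from the candidate's results counts as a failure.
    """
    failures = []
    for corpus in corpora:
        for cls, inc_score in incumbent[corpus].items():
            cand_score = candidate.get(corpus, {}).get(cls)
            if cand_score is None or cand_score <= inc_score + min_margin:
                failures.append((corpus, cls))
    return not failures, failures
```

For instance, with incumbent scores of 0.80 and 0.90 (debris, stopped vehicle) and candidate scores of 0.85 and 0.88 on one corpus, the candidate wins on average but fails the gate on the stopped-vehicle class.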

Stage 2 / Stage 3 agreement

When the low-cost stage and the expensive stage disagree often, something is drifting. The disagreement rate is itself a signal.
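A sketch of tracking disagreement per class against a baseline, assuming Stage 2 emits a class label and Stage 3 either confirms it, relabels it, or retracts it (`None`). The baseline, the tolerance, and the function names are assumptions.

```python
from collections import defaultdict


def disagreement_by_class(pairs):
    """pairs: (stage2_class, stage3_class) per candidate; stage3_class None = retracted.

    Returns {stage2_class: fraction of candidates where Stage 3 disagreed}.
    """
    totals, disagreements = defaultdict(int), defaultdict(int)
    for s2_cls, s3_cls in pairs:
        totals[s2_cls] += 1
        if s3_cls != s2_cls:  # relabels and retractions both count
            disagreements[s2_cls] += 1
    return {cls: disagreements[cls] / totals[cls] for cls in totals}


def drifting_classes(current, baseline, tolerance=0.05):
    """Classes whose disagreement rate rose more than `tolerance` above baseline."""
    return sorted(c for c, r in current.items() if r > baseline.get(c, 0.0) + tolerance)
```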

02 · Strategy lifecycle

Five steps. Promotion is earned.

A VESU strategy is a versioned bundle, not a setting. The lifecycle below is the same for every change that affects what operators see.
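A minimal sketch of what "versioned bundle" can mean in practice: every component VESU lists for a candidate (prompt, schema, model, sampling, thresholds, scene model, gate policy) goes into one immutable record whose version is a hash of its contents. The class name, field types, and the SHA-256 content hash are assumptions for illustration.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class StrategyBundle:
    """Hypothetical shape of a strategy bundle; fields follow the components VESU names."""
    prompt: str
    schema: dict
    model: str
    sampling: dict
    thresholds: dict
    scene_model: str
    gate_policy: dict

    def version_id(self) -> str:
        # Canonical JSON (sorted keys, fixed separators) so equal contents hash equally.
        payload = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]
```

Because the version is derived from the contents, changing a single threshold yields a new version that has to go through the same lifecycle as any other change.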

Candidate

A new strategy version is built. Versioned bundle: prompt, schema, model, sampling, thresholds, scene model, gate policy.

Shadow

Runs beside production without paging operators. Outputs are recorded, not delivered.

Evaluated

Replayed against historical corpora and hard negatives. Compared per class against the incumbent.

Promoted

Earns operator delivery only after passing the replay gate per class, not on average.

Rolled back

Any class regression triggers automatic rollback to the prior production strategy.
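The five steps can be sketched as a small state machine. The transition table, the gate flag, the per-class regression map, and all names are illustrative assumptions; the two rules taken from the text are that only a promoted strategy reaches operators and that any class regression rolls back to the prior production strategy.

```python
from enum import Enum


class Stage(Enum):
    CANDIDATE = "candidate"
    SHADOW = "shadow"
    EVALUATED = "evaluated"
    PROMOTED = "promoted"
    ROLLED_BACK = "rolled_back"


# Allowed moves; any other transition is refused.
TRANSITIONS = {
    Stage.CANDIDATE: {Stage.SHADOW},
    Stage.SHADOW: {Stage.EVALUATED},
    Stage.EVALUATED: {Stage.PROMOTED},
    Stage.PROMOTED: {Stage.ROLLED_BACK},
    Stage.ROLLED_BACK: set(),
}


class StrategyRecord:
    def __init__(self, version, stage=Stage.CANDIDATE):
        self.version = version
        self.stage = stage

    def advance(self, target):
        if target not in TRANSITIONS[self.stage]:
            raise ValueError(f"{self.stage.value} -> {target.value} is not allowed")
        self.stage = target

    def delivers_to_operators(self):
        # Shadow and evaluated strategies record outputs but never page operators.
        return self.stage is Stage.PROMOTED


def promote_if_gate_passed(record, gate_passed):
    """Promotion requires the per-class replay gate; otherwise stay EVALUATED."""
    if gate_passed:
        record.advance(Stage.PROMOTED)
    return record.stage


def monitor(record, prior_production, regressed_by_class):
    """Roll back on any per-class regression; return the strategy that should serve."""
    if record.stage is Stage.PROMOTED and any(regressed_by_class.values()):
        record.advance(Stage.ROLLED_BACK)
        return prior_production
    return record
```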

03 · Drift detection

Roads change. Cameras change. Models change.

VESU watches for these changes and degrades coverage early rather than publishing alerts built on the wrong model of reality.

Scene similarity

The current frame's structure is compared to the learned scene. Divergence triggers re-learning.
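A sketch of the similarity check, assuming the learned scene and each frame are summarised as numeric structural descriptors (edge statistics, embeddings, or similar; VESU's representation is not specified). Cosine similarity, the threshold, and the `patience` streak, which keeps one occluded frame from forcing re-learning, are all assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length descriptors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SceneMonitor:
    """Requests re-learning after `patience` consecutive frames below `threshold`."""

    def __init__(self, learned_descriptor, threshold=0.85, patience=3):
        self.learned = learned_descriptor
        self.threshold = threshold
        self.patience = patience
        self.low_streak = 0

    def observe(self, frame_descriptor) -> bool:
        """Return True when the scene should be re-learned."""
        if cosine_similarity(frame_descriptor, self.learned) < self.threshold:
            self.low_streak += 1
        else:
            self.low_streak = 0  # one matching frame resets the streak
        return self.low_streak >= self.patience
```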

Flow consistency

If observed motion no longer matches the learned direction, the scene model is flagged as stale.
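A sketch of the flow check, assuming tracked objects yield 2-D displacement vectors and the scene model stores one learned direction. The angle and fraction thresholds are hypothetical; the fraction test is there so that a single wrong-way vehicle, which is an incident, does not by itself mark the scene as stale.

```python
import math


def angle_between(u, v):
    """Unsigned angle in degrees between 2-D vectors u and v."""
    a = math.atan2(u[1], u[0]) - math.atan2(v[1], v[0])
    a = (a + math.pi) % (2 * math.pi) - math.pi  # wrap into [-pi, pi)
    return abs(math.degrees(a))


def flow_is_stale(motion_vectors, learned_direction, max_angle=60.0, max_fraction=0.3):
    """Flag the scene model as stale when too much motion disagrees with it.

    motion_vectors: (dx, dy) per tracked object over a window (assumed input).
    """
    moving = [v for v in motion_vectors if v[0] or v[1]]  # ignore stationary objects
    if not moving:
        return False  # no traffic, no evidence either way
    off = sum(angle_between(v, learned_direction) > max_angle for v in moving)
    return off / len(moving) > max_fraction
```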

Verification disagreement

If Stage 3 retracts a high rate of Stage 2 candidates, something has shifted in perception or the world.
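A sketch of the retraction alarm as a sliding window over recent verification outcomes. The window length, alarm rate, minimum sample count, and class name are illustrative assumptions, not VESU settings.

```python
from collections import deque


class RetractionAlarm:
    """Sliding-window share of Stage 2 candidates that Stage 3 retracted."""

    def __init__(self, window=200, alarm_rate=0.4, min_samples=50):
        self.outcomes = deque(maxlen=window)  # True = retracted by Stage 3
        self.alarm_rate = alarm_rate
        self.min_samples = min_samples

    def record(self, retracted: bool) -> bool:
        """Add one verification outcome; return True if the alarm should fire."""
        self.outcomes.append(retracted)
        if len(self.outcomes) < self.min_samples:
            return False  # too few samples to call it drift
        return sum(self.outcomes) / len(self.outcomes) >= self.alarm_rate
```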

Operator feedback

Confirmed, rejected, and ambiguous events feed the evaluation set. Patterns become hard negatives.
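A sketch of how feedback might be sorted into evaluation-set updates. Treating a "pattern" as repeated rejections of the same class on the same camera, the `min_repeats` cutoff, and routing ambiguous events to a review bucket are all assumptions, as are the names.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Feedback:
    event_id: str
    camera_id: str
    incident_class: str
    verdict: str  # "confirmed", "rejected", or "ambiguous"


def build_eval_updates(feedback, min_repeats=3):
    """Split operator feedback into evaluation-set additions.

    Repeated rejections of one class on one camera (assumed pattern definition)
    become hard negatives; one-off rejections stay ordinary negatives.
    """
    positives = [f.event_id for f in feedback if f.verdict == "confirmed"]
    review = [f.event_id for f in feedback if f.verdict == "ambiguous"]
    rejected = [f for f in feedback if f.verdict == "rejected"]
    pattern_counts = Counter((f.camera_id, f.incident_class) for f in rejected)
    hard = [f.event_id for f in rejected
            if pattern_counts[(f.camera_id, f.incident_class)] >= min_repeats]
    negatives = [f.event_id for f in rejected if f.event_id not in hard]
    return {"positives": positives, "negatives": negatives,
            "hard_negatives": hard, "review": review}
```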

04 · The improvement loop

Observe. Replay. Compare. Promote. Monitor.

Reliability is what makes the rest credible.

We are happy to walk through how a strategy change reaches operators — and what would have to be true for it to be rolled back automatically.

Talk to Tollscopic →