Camera learning · the differentiator

VESU learns the camera before it claims the road.

A traffic camera is not useful to VESU until the system understands what it is looking at. Is the visible pavement mainline, shoulder, ramp, median, work zone, or gore area? Which direction does traffic move? What recurring objects should not be mistaken for hazards? VESU learns that context from the feed itself.

01 · The onboarding problem

Manual per-camera annotation does not scale.

Drawing lane masks and shoulder polygons for every camera is slow and expensive, and the result goes stale as soon as a camera moves or the work zone changes. A system that requires a human to draw every region before it can work will never cover the fleet.

Manual approach
Annotate every camera

Operators draw polygons for roadway, shoulder, gore, median. Configure lane masks. Set expected direction. Re-do it when the camera moves.

VESU approach
Learn the scene from the feed

Specialty segmentation, lane detection, motion analysis, and vision-language labeling produce a versioned scene model automatically, with confidence and maturity tracked per field.

02 · What VESU learns

A versioned per-camera scene model.

The learned scene model becomes the shared reference for perception prompts, temporal rules, verification, coverage checks, and incident routing.

[Figure: raw camera frame (cam_l2 · sidefire · RTSP · 1080p · 25 fps) shown beside the learned scene model overlay. Overlay legend: roadway, shoulder, gore, flow, sign, confounder. Scene model scn-2026.04 · ACTIVE · tier MAINLINE · 3 lanes · 1 shoulder · 1 gore.]

scene_model · v scn-2026.04
regions: roadway · shoulder · gore · median
lane_geometry: lane lines · sub-pixel
flow_direction: per region · from optical flow
lighting: per-hour appearance
confounders: recurring · learned
criticality: mainline / ramp / urban
ptz_behavior: preset list · drift threshold
maturity: learning → active
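
The field list above can be sketched as a small data structure. This is a minimal illustration, not VESU's actual schema: the class names, the 0.0 to 1.0 confidence scale, and the example confidence values are all assumptions made for the sketch. Only the field names, the camera id, and the version string come from the page.

```python
from dataclasses import dataclass, field


@dataclass
class LearnedField:
    """One learned field plus the confidence attached to it (hypothetical type)."""
    value: object
    confidence: float  # assumed 0.0 to 1.0 scale; the page does not specify one


@dataclass
class SceneModel:
    """Versioned per-camera scene model (illustrative shape only)."""
    camera_id: str
    version: str
    maturity: str  # "learning", "advisory", "active", or "stale"
    fields: dict[str, LearnedField] = field(default_factory=dict)

    def weakest_field(self) -> str:
        """Return the name of the field with the lowest confidence."""
        return min(self.fields, key=lambda name: self.fields[name].confidence)


# Confidence numbers below are made up for illustration.
model = SceneModel(
    camera_id="cam_l2",
    version="scn-2026.04",
    maturity="active",
    fields={
        "regions": LearnedField(["roadway", "shoulder", "gore"], 0.94),
        "flow_direction": LearnedField({"roadway": "toward_camera"}, 0.88),
        "criticality": LearnedField("mainline", 0.97),
    },
)
```

Keeping confidence next to each value, rather than one score for the whole model, lets a downstream stage ask which field is holding the model back, for example with `model.weakest_field()`.
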
03 · How VESU learns

Focused perception. Not a VLM mega-prompt.

VESU does not ask a generic language model to infer pixel geometry from scratch. It uses each model for what that model is good at — and assigns meaning only when the visual evidence supports it.

01 Specialty segmentation

Identifies candidate regions: roadway, shoulder, sky, structure. Uses specialty models, not a generic prompt.

02 Lane + motion signals

Lane detection and optical-flow analysis. Geometry and direction come from the right tools.

03 Vision-language labeling

Used where it is strongest: semantic labeling over structured visual evidence. Not pixel-level geometry guessing.

04 Versioned scene model

Maturity and confidence per field. Stored, versioned, and referenced by every downstream stage.
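
As one illustration of assigning meaning only when the evidence supports it, the sketch below derives a flow direction for a region from motion vectors and declines to answer when the motion is inconsistent. The page does not say how VESU computes direction; this uses a standard circular-mean approach, and the function name, the `min_consistency` parameter, and its default are assumptions. The motion vectors are assumed to come from an optical-flow step that is not shown.

```python
import math


def flow_direction(vectors, min_consistency=0.8):
    """Return (angle_degrees, consistency), or (None, consistency) if evidence is weak.

    vectors: iterable of (dx, dy) motion vectors sampled inside one region.
    consistency is the mean resultant length of the unit vectors:
    1.0 when all motion agrees, near 0.0 when directions cancel out.
    """
    units = []
    for dx, dy in vectors:
        norm = math.hypot(dx, dy)
        if norm > 0:  # skip stationary points
            units.append((dx / norm, dy / norm))
    if not units:
        return None, 0.0

    mean_x = sum(u[0] for u in units) / len(units)
    mean_y = sum(u[1] for u in units) / len(units)
    consistency = math.hypot(mean_x, mean_y)

    if consistency < min_consistency:
        # Motion does not agree well enough to assign a direction.
        return None, consistency

    angle = math.degrees(math.atan2(mean_y, mean_x)) % 360
    return angle, consistency


steady = flow_direction([(1, 0), (2, 0), (1, 0)])
mixed = flow_direction([(1, 0), (-1, 0)])
```

Here `steady` carries an angle with full consistency, while `mixed` returns `None` for the angle because the two vectors cancel.
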

04 · Confidence and maturity

A scene model has states, not just a value.

Not every camera goes from feed to production-grade detection on day one. VESU exposes maturity explicitly and uses it to gate which claims it is willing to make.

Learning: Scene model is being built. Detection runs in advisory mode only.
Advisory: Scene model has confidence in most fields. Incidents may surface for review but not for publication.
Active: Scene model is mature. Detection publishes to operator systems.
Stale / Re-learn: PTZ drift, work-zone change, or scene mismatch detected. Coverage degraded; re-learn triggered.
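
The gating described by these states can be sketched as a lookup from maturity to what happens with a detected incident. The enum, the route labels, and the function names are hypothetical. Mapping the stale state to "gated" is one reading of the page, which says only that coverage is degraded and re-learning is triggered.

```python
from enum import Enum


class Maturity(Enum):
    LEARNING = "learning"
    ADVISORY = "advisory"
    ACTIVE = "active"
    STALE = "stale"


# Assumed route labels; the page describes the behavior, not these names.
ROUTES = {
    Maturity.LEARNING: "advisory_only",  # model still being built
    Maturity.ADVISORY: "review",         # surface for review, never publish
    Maturity.ACTIVE: "publish",          # publish to operator systems
    Maturity.STALE: "gated",             # hold until the scene is re-learned
}


def route_incident(maturity: Maturity) -> str:
    """Decide what to do with an incident given the scene model's maturity."""
    return ROUTES[maturity]


def may_publish(maturity: Maturity) -> bool:
    """Only a mature scene model may publish to operator systems."""
    return maturity is Maturity.ACTIVE
```

Because every state has an explicit entry in the table, a new state added to the enum without a route raises a `KeyError` rather than defaulting to publication.
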
05 · PTZ and drift

Cameras move. VESU treats that as a state change, not a nuisance.

If the current view no longer matches the learned scene, VESU degrades coverage, triggers re-learning where appropriate, and records that previous claims no longer apply to the current view.

01 Scene-similarity check

Each new frame's structure is compared to the learned scene. A divergence above threshold flags a possible view change.

02 Coverage degraded

When divergence persists, coverage state moves to degraded or gap. Incidents are gated until the scene is reconciled.

03 Re-learn or revert

If the new view is a known preset, VESU switches to the matching scene model. Otherwise, re-learning begins; the old claims are not silently re-applied.
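
The three steps above can be sketched as a small monitor that counts consecutive divergent frames, degrades coverage when the divergence persists, and then either switches to a known preset's scene model or starts re-learning. This is a sketch under stated assumptions: the class name, the threshold and persistence values, and the string labels are hypothetical. How divergence is measured and how a view is matched to a preset are not described on the page, so both are passed in as inputs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DriftMonitor:
    threshold: float = 0.3   # assumed divergence threshold
    persist_frames: int = 5  # assumed number of consecutive divergent frames
    streak: int = 0
    coverage: str = "ok"

    def observe(self, divergence: float) -> str:
        """Steps 01 and 02: track divergence and degrade coverage if it persists."""
        if self.coverage == "degraded":
            return self.coverage  # stays degraded until reconcile() is called
        self.streak = self.streak + 1 if divergence > self.threshold else 0
        if self.streak >= self.persist_frames:
            self.coverage = "degraded"
        return self.coverage

    def reconcile(self, matched_model: Optional[str]) -> str:
        """Step 03: switch to a known preset's scene model, or begin re-learning."""
        self.streak = 0
        if matched_model is not None:
            self.coverage = "ok"
            return f"switch:{matched_model}"
        # Unknown view: coverage stays degraded and old claims are not re-applied.
        return "relearn"


monitor = DriftMonitor(threshold=0.3, persist_frames=3)
states = [monitor.observe(d) for d in [0.1, 0.5, 0.1, 0.5, 0.6, 0.7]]
```

A single divergent frame does not change coverage in this sketch; only three in a row do, which keeps a passing truck or a brief glare from being treated as a camera move.
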

Bring a feed. We will learn the camera.

A scene model is not a luxury feature — it is what makes every downstream claim trustworthy. We are happy to walk through a learned scene on one of your cameras.