Camera learning · the differentiator

VESU learns the camera before it claims the road.

A traffic camera is not useful to VESU until the system understands what it is looking at. Is the visible pavement mainline, shoulder, ramp, median, work zone, or gore area? Which direction does traffic move? What recurring objects should not be mistaken for hazards? VESU learns that context from the feed itself.

01 · The onboarding problem

Manual per-camera annotation does not scale.

Drawing lane masks and shoulder polygons for every camera is slow and expensive, and the result goes stale as soon as a camera moves or the work zone changes. A system that requires a human to draw every region before it can work will never cover the fleet.

Manual approach

Annotate every camera

Operators draw polygons for roadway, shoulder, gore, median. Configure lane masks. Set expected direction. Re-do it when the camera moves.

VESU approach

Learn the scene from the feed

Specialty segmentation, lane detection, motion analysis, and vision-language labeling produce a versioned scene model automatically, with per-field confidence and an explicit maturity state.

02 · What VESU learns

A versioned per-camera scene model.

The learned scene model becomes the shared reference for perception prompts, temporal rules, verification, coverage checks, and incident routing.

Raw camera frame: cam_l2 · sidefire · RTSP · 1080p · 25 fps · 14:33:05 EDT

Learned scene model overlay: scene_model scn-2026.04 · ACTIVE · tier MAINLINE · 3 lanes · 1 shoulder · 1 gore

scene_model · version scn-2026.04

regions: roadway · shoulder · gore · median

lane_geometry: lane lines · sub-pixel

flow_direction: per region · from optical flow

lighting: per-hour appearance

confounders: recurring · learned

criticality: mainline / ramp / urban

ptz_behavior: preset list · drift threshold

maturity: learning → active
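
To make the field list above concrete, here is a minimal sketch of how such a scene model could be represented. The class names, field types, and the default drift threshold are illustrative assumptions, not VESU's actual schema; only the field names come from the list above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Region:
    # Hypothetical region record; VESU's real representation is not documented here.
    kind: str                                   # "roadway", "shoulder", "gore", or "median"
    polygon: list[tuple[float, float]]          # image-space vertices, in pixels
    flow_direction_deg: Optional[float] = None  # None until motion analysis has run
    confidence: float = 0.0                     # per-field confidence, assumed to be 0..1


@dataclass
class SceneModel:
    camera_id: str
    version: str
    criticality: str                            # "mainline", "ramp", or "urban"
    maturity: str = "learning"                  # "learning", "advisory", "active", or "stale"
    regions: list[Region] = field(default_factory=list)
    lane_lines: list[list[tuple[float, float]]] = field(default_factory=list)
    lighting_by_hour: dict[int, str] = field(default_factory=dict)
    confounders: list[str] = field(default_factory=list)
    ptz_presets: list[str] = field(default_factory=list)
    drift_threshold: float = 0.35               # placeholder value, not a VESU default

    def count(self, kind: str) -> int:
        """Number of learned regions of the given kind."""
        return sum(1 for r in self.regions if r.kind == kind)


# Usage: a freshly created model starts in the learning state.
model = SceneModel(camera_id="cam_l2", version="scn-2026.04", criticality="mainline")
model.regions.append(Region("shoulder", [(0, 900), (400, 900), (400, 1080), (0, 1080)]))
```

Keeping confidence on each region, rather than one score for the whole model, mirrors the idea that some fields can be trusted before others.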

03 · How VESU learns

Focused perception. Not a VLM mega-prompt.

VESU does not ask a generic language model to infer pixel geometry from scratch. It uses each model for what that model is good at — and assigns meaning only when the visual evidence supports it.

01 Specialty segmentation

Identifies candidate regions. Roadway, shoulder, sky, structure. Specialty models, not a generic prompt.

02 Lane + motion signals

Lane detection and optical-flow analysis. Geometry and direction come from the right tools.

03 Vision-language labeling

Used where it is strongest: semantic labeling over structured visual evidence. Not pixel-level geometry guessing.

04 Versioned scene model

Maturity and confidence per field. Stored, versioned, and referenced by every downstream stage.
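
As one illustration of the motion step, the sketch below derives a dominant travel direction for a region from a set of motion vectors, using a circular mean. It assumes the vectors have already been produced by some optical-flow method; the function name, the `min_magnitude` cutoff, and the consistency score are hypothetical and are not taken from VESU.

```python
import math


def dominant_direction(vectors, min_magnitude=0.5):
    """Return (angle_deg, consistency) for a list of (dx, dy) motion vectors.

    angle_deg is measured in image coordinates (y grows downward), in [0, 360).
    consistency is the length of the mean unit vector: 1.0 when every vector
    points the same way, near 0.0 when directions cancel out.
    Returns (None, 0.0) if no vector is long enough to count.
    """
    sum_x = sum_y = 0.0
    used = 0
    for dx, dy in vectors:
        magnitude = math.hypot(dx, dy)
        if magnitude < min_magnitude:
            continue  # ignore near-static pixels; they carry no direction
        sum_x += dx / magnitude
        sum_y += dy / magnitude
        used += 1
    if used == 0:
        return None, 0.0
    consistency = math.hypot(sum_x, sum_y) / used
    angle_deg = math.degrees(math.atan2(sum_y, sum_x)) % 360.0
    return angle_deg, consistency


# Usage: vectors that all point right give a confident direction,
# while opposing vectors give a consistency near zero.
one_way = dominant_direction([(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)])
two_way = dominant_direction([(1.0, 0.0), (-1.0, 0.0)])
```

A low consistency score is a reason to withhold a direction label rather than guess, which fits the principle of assigning meaning only when the evidence supports it.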

04 · Confidence and maturity

A scene model has states, not just a value.

Not every camera goes from feed to production-grade detection on day one. VESU exposes maturity explicitly — and uses it to gate what claims it is willing to make.

Learning

Scene model is being built. Detection runs in advisory mode only.

Advisory

The scene model is confident in most fields. Incidents may surface for review but are not published.

Active

Scene model is mature. Detection publishes to operator systems.

Stale / Re-learn

PTZ drift, work-zone change, or scene mismatch detected. Coverage degraded; re-learn triggered.
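
The four states above can be sketched as a small state machine that gates publication. The transition table is an assumption made for illustration; the text only establishes that publishing happens in the Active state and that a stale model triggers re-learning.

```python
from enum import Enum


class Maturity(Enum):
    LEARNING = "learning"
    ADVISORY = "advisory"
    ACTIVE = "active"
    STALE = "stale"


# Hypothetical transition table; the real promotion rules are not described here.
ALLOWED = {
    Maturity.LEARNING: {Maturity.ADVISORY},
    Maturity.ADVISORY: {Maturity.ACTIVE, Maturity.STALE},
    Maturity.ACTIVE: {Maturity.STALE},
    Maturity.STALE: {Maturity.LEARNING},  # re-learn starts the cycle again
}


def transition(current: Maturity, target: Maturity) -> Maturity:
    """Return the target state, or raise ValueError if the move is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"{current.value} -> {target.value} is not allowed")
    return target


def may_publish(state: Maturity) -> bool:
    """Only a mature (Active) scene model publishes to operator systems."""
    return state is Maturity.ACTIVE


# Usage: a model must pass through Advisory before it can publish.
state = transition(Maturity.LEARNING, Maturity.ADVISORY)
state = transition(state, Maturity.ACTIVE)
```

Making the gate a function of the state, rather than a flag set by hand, keeps a stale model from publishing by accident.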

05 · PTZ and drift

Cameras move. VESU treats that as a state change, not a nuisance.

If the current view no longer matches the learned scene, VESU degrades coverage, triggers re-learning where appropriate, and records that previous claims no longer apply to the current view.

Scene-similarity check

Each new frame's structure is compared to the learned scene. A divergence above the threshold flags a possible view change.

Coverage degraded

When divergence persists, coverage state moves to degraded or gap. Incidents are gated until the scene is reconciled.

Re-learn or revert

If the new view is a known preset, VESU switches to the matching scene model. Otherwise, re-learning begins; the old claims are not silently re-applied.
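
The three steps above can be sketched as follows. The sketch assumes a divergence score between 0 and 1 is already available for each frame; how VESU computes that score is not described here, and the class name, threshold, and persistence window are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DriftMonitor:
    threshold: float = 0.35      # hypothetical divergence threshold
    persist_frames: int = 3      # hypothetical: frames before coverage degrades
    _streak: int = field(default=0, init=False, repr=False)

    def update(self, divergence: float) -> str:
        """Return "ok", "flagged" (possible view change), or "degraded"."""
        if divergence > self.threshold:
            self._streak += 1
        else:
            self._streak = 0     # the view matches again, so reset the count
        if self._streak == 0:
            return "ok"
        if self._streak < self.persist_frames:
            return "flagged"
        return "degraded"


def reconcile(preset_divergences: dict[str, float], threshold: float) -> Optional[str]:
    """Pick the known preset whose scene model best matches the current view.

    Returns the preset name, or None when nothing matches closely enough,
    in which case the caller should start re-learning.
    """
    if not preset_divergences:
        return None
    name, score = min(preset_divergences.items(), key=lambda item: item[1])
    return name if score <= threshold else None


# Usage: a single divergent frame is only flagged; persistence degrades coverage.
monitor = DriftMonitor(threshold=0.3, persist_frames=2)
states = [monitor.update(d) for d in (0.1, 0.5, 0.6)]
```

Returning None instead of the closest preset is the point of the design: an unknown view leads to re-learning, not to reusing claims from a scene that no longer applies.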

Bring a feed. We will learn the camera.

A scene model is not a luxury feature — it is what makes every downstream claim trustworthy. We are happy to walk through a learned scene on one of your cameras.

Talk to Tollscopic →