Catalog decoding · the intellectual spine

One wrong digit can name the wrong carrier.

Tollscopic DOT# preserves uncertainty across frames and fields, retrieves plausible FMCSA candidates, and scores the joint evidence before it names a carrier. The product is catalog-constrained decoding, not OCR-then-lookup.

01 · Naive vs catalog decoding

Same hard image. Different architecture. Different result.

A real side-fire frame can be hard to read even for a careful human. The naive path turns one fragile string into a lookup. The Tollscopic DOT# path keeps the evidence open until the catalog and the field agreement decide.

NAIVE · OCR-then-lookup

OCR · single frame · single field

USDOT 3 8 4 7 1 2 8

model committed to 8 · alternate readings discarded

EXACT LOOKUP · wrong carrier

Atlas Long-Haul LLC

Phoenix, AZ · MC mismatch · name mismatch · domicile mismatch

↳ result · wrong invoice target

DOT# · catalog-constrained decoding

TRACK · 2 of 6 kept


MULTI-FIELD · multi-frame · with uncertainty

USDOT · 3847128 / 3847129 / 3847139 · last two digits ambiguous

Name · Northridge Freight Co · consensus across 5 frames

MC · MC-71458 · clean read, 4 frames

City/state · Toledo, OH · 3 frames agree

CATALOG · ranked candidates

Northridge Freight Co · name + state + MC + DOT variant · 0.94

Atlas Long-Haul LLC · literal DOT match, name mismatch · 0.21

Northland Carriers Inc · partial name, weak agreement · 0.08

↳ result · verified · gap 0.73 over runner-up

Synthetic carriers. The point is the architecture, not the specific carrier. The naive path commits to one fragile string; Tollscopic DOT# keeps multiple frames and multiple fields open until the catalog says which carrier explains the joint evidence best.

02 · The bad bets

Three places the naive architecture commits too early.

And one constraint the architecture forgets to use.

One string

A single digit error changes the carrier. The literal lookup commits to a fragile string before any other evidence can confirm it.

One frame

The truck is visible across time. Picking one frame and dropping the rest discards independent evidence about the same physical vehicle.

One field

The panel carries more than the USDOT number. Name, authority, and domicile have different error modes — they should be read together.

Open text

The carrier is not any arbitrary string in the world. It is a candidate in a public catalog, plus an explicit "not in catalog" option.
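The first two bad bets (one string, one frame) can be sidestepped by keeping a probability distribution per digit position and fusing it across frames before any lookup. The sketch below is illustrative only: it assumes a hypothetical OCR output format (one probability dict per character position, per frame), a hypothetical floor value for digits a frame did not propose, and reads of equal length. A beam search then keeps the k most probable strings instead of a single argmax.

```python
import heapq
import math

# Hypothetical OCR output: one dict per character position, mapping each
# candidate digit to the model's probability for that frame.
FrameRead = list[dict[str, float]]

FLOOR = 1e-3  # assumed probability for a digit this frame did not propose


def fuse_frames(frames: list[FrameRead]) -> list[dict[str, float]]:
    """Combine equal-length reads by summing per-position log-probabilities."""
    fused = []
    for pos in range(len(frames[0])):
        digits = set().union(*(f[pos].keys() for f in frames))
        scores = {d: sum(math.log(f[pos].get(d, FLOOR)) for f in frames) for d in digits}
        # Renormalise so each position is again a probability distribution.
        top = max(scores.values())
        z = sum(math.exp(s - top) for s in scores.values())
        fused.append({d: math.exp(s - top) / z for d, s in scores.items()})
    return fused


def top_k_strings(fused: list[dict[str, float]], k: int = 3) -> list[tuple[str, float]]:
    """Beam search: keep the k most probable full strings, not just the argmax."""
    beam = [(0.0, "")]  # (negative log-probability, prefix)
    for dist in fused:
        expanded = [
            (nlp - math.log(p), prefix + d)
            for nlp, prefix in beam
            for d, p in dist.items()
        ]
        beam = heapq.nsmallest(k, expanded)
    return [(s, math.exp(-nlp)) for nlp, s in beam]
```

Summing log-probabilities treats frames as independent evidence, so near-identical frames would need to be collapsed first or they would be double-counted.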

03 · Multi-field agreement

Fields don't need to be perfect individually. They need to agree.

A DOT digit can be hard to read while the legal name is clean. A city/state may be easy to read even when the authority line is blurred. The scorer rewards agreement across independent fields, especially agreement on values that would rarely match by chance.

Field · Observed · Candidate (Northridge Freight) · Agreement

USDOT · 3847128 / 3847129 / 3847139 (single-digit confusions, OCR-plausible) · 3847139 · 0.45

Name · "NORTHRIDGE FREIGHT CO" (consensus across 5 frames) · NORTHRIDGE FREIGHT CO · 0.96

Authority · MC-71458 (clean read, 4 frames) · MC-71458 · 0.92

Domicile · TOLEDO, OH (tie-breaker, rare combination) · Toledo, OH · 0.88

Joint score · 0.94

04 · The catalog shortlist

Retrieval before scoring.

Tollscopic DOT# does not score against millions of carriers per pass. It retrieves a plausible shortlist from each observed field, unions them, and includes an explicit out-of-catalog option.

Digit confusions

USDOT and authority numbers expand into close digit-substitution variants before retrieval.

Fuzzy name match

Legal and trade names are matched with edit-distance and tokenization to absorb noisy OCR.

Domicile support

City and state narrow the shortlist when other fields are ambiguous.

Out-of-catalog null

The "not in this catalog snapshot" option always competes with the candidates.
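A minimal sketch of shortlist retrieval, assuming an in-memory catalog of hypothetical `Carrier` records. The confusable-digit table, the `max_size` threshold, and the `NOT_IN_CATALOG` label are illustrative assumptions, not Tollscopic's values. Python's `difflib.get_close_matches` (a similarity ratio, not true edit distance) stands in for the fuzzy name matcher, and domicile is matched on state only for brevity. Each field retrieves independently, the hits are unioned, and the null option is always appended.

```python
import difflib
from dataclasses import dataclass

NULL = "NOT_IN_CATALOG"  # hypothetical label for the out-of-catalog option

# Hypothetical confusion table for digits that look alike on a side-fire panel.
CONFUSABLE = {"0": "689", "1": "7", "2": "3", "3": "28", "5": "6",
              "6": "05", "7": "1", "8": "039", "9": "08"}


@dataclass(frozen=True)
class Carrier:
    usdot: str
    name: str
    mc: str
    state: str


def digit_variants(reads: set[str]) -> set[str]:
    """Each read plus every single-digit substitution drawn from CONFUSABLE."""
    out = set(reads)
    for r in reads:
        for i, ch in enumerate(r):
            for alt in CONFUSABLE.get(ch, ""):
                out.add(r[:i] + alt + r[i + 1:])
    return out


def shortlist(catalog, usdot_reads, mc_reads, name_read, state_read, max_size=50):
    by_dot = {c.usdot: c for c in catalog}
    by_mc = {c.mc: c for c in catalog}
    by_name = {c.name.upper(): c for c in catalog}

    # Retrieve from each field separately, then union the hits.
    hits = {by_dot[v] for v in digit_variants(usdot_reads) if v in by_dot}
    hits |= {by_mc[v] for v in digit_variants(mc_reads) if v in by_mc}
    for n in difflib.get_close_matches(name_read.upper(), list(by_name), n=5, cutoff=0.8):
        hits.add(by_name[n])

    # Domicile narrows an oversized shortlist, but never empties it.
    if len(hits) > max_size:
        same_state = {c for c in hits if c.state == state_read.upper()}
        if same_state:
            hits = same_state

    # The out-of-catalog option always competes with the retrieved carriers.
    return sorted(hits, key=lambda c: c.usdot) + [NULL]
```

Expanding from every fused reading, not just the top one, is what lets a two-digit difference from the first read still reach the right record.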

05 · Joint scoring

The scorer asks how likely the evidence is, given each candidate.

Common words and common digits are not overvalued. Agreement on rare values across independent fields counts for more. Near-identical frames are not double-counted.

The intuition

carrier score = field agreement + catalog rarity + frame diversity − ambiguity

The real scorer is a record-linkage likelihood, not addition. This simplified form shows how the system behaves qualitatively: more agreement, rarer matches, and more diverse frames all push the score up, while ambiguity (multiple strong candidates) pulls it down.
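One standard record-linkage likelihood is the Fellegi-Sunter model. The section does not say which model Tollscopic uses, so the sketch below is a stand-in, not the product's scorer. Each field comparison contributes log(m/u) on agreement and log((1-m)/(1-u)) on disagreement, where u is how often a random catalog carrier shares the value, so rare values earn more. A softmax over the shortlist plus a hypothetical out-of-catalog score turns the sums into posteriors, which amounts to assuming equal priors. Frame deduplication is left out.

```python
import math

NULL = "NOT_IN_CATALOG"  # hypothetical label for the out-of-catalog option


def field_weight(agrees: bool, m: float, u: float) -> float:
    """Fellegi-Sunter log-likelihood ratio for one field comparison.

    m: P(field agrees | candidate is the true carrier), i.e. read reliability.
    u: P(field agrees | candidate is not the carrier), i.e. how common the
       value is in the catalog. A small u (rare value) gives a large reward.
    """
    if agrees:
        return math.log(m / u)
    return math.log((1 - m) / (1 - u))


def candidate_score(comparisons: list[tuple[bool, float, float]]) -> float:
    # Summing log-ratios assumes the fields err independently of each other.
    return sum(field_weight(a, m, u) for a, m, u in comparisons)


def posterior(scores: dict[str, float], null_score: float) -> dict[str, float]:
    """Softmax over the shortlist plus the out-of-catalog option."""
    all_scores = {**scores, NULL: null_score}
    top = max(all_scores.values())
    z = sum(math.exp(s - top) for s in all_scores.values())
    return {k: math.exp(s - top) / z for k, s in all_scores.items()}
```

A literal DOT match on its own cannot outweigh disagreement on name, authority, and domicile, which is the failure the naive path falls into.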

WHAT IT MEANS

The runner-up gap matters.

A high score for the top candidate is not enough. If the runner-up scores almost as high, the evidence is ambiguous. The scorer surfaces both the top score and the gap. The decision layer uses both to decide between verified, probable, and ambiguous.

top · 0.94
runner-up · 0.21
gap · 0.73 → verified
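A sketch of a decision layer that uses both the top score and the runner-up gap. The thresholds (`min_top`, `verified_gap`, `probable_gap`) are hypothetical placeholders; the section does not state the real cut-offs.

```python
def decide(scores: dict[str, float], min_top: float = 0.8,
           verified_gap: float = 0.5, probable_gap: float = 0.2):
    """Return (label, top_id, gap) from candidate scores."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    top_id, top = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    gap = top - runner_up
    # A high top score alone is not enough: the gap must also be wide.
    if top >= min_top and gap >= verified_gap:
        label = "verified"
    elif gap >= probable_gap:
        label = "probable"
    else:
        label = "ambiguous"
    return label, top_id, gap
```

With the scores above (0.94 and 0.21), the gap is 0.73 and these placeholder thresholds return `verified`; two candidates at 0.50 and 0.45 would return `ambiguous`.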

06 · The refusal path

Tollscopic DOT# does not force the nearest carrier.

If the evidence is readable but unsupported by the catalog, the system emits a catalog-miss event. If multiple candidates remain plausible, it emits ambiguous. If the panel cannot be read, it emits unreadable. These states keep an honest record of what happened, for review and for measuring system performance.

catalog_miss

Evidence is readable. No carrier in the local FMCSA snapshot supports it strongly enough. The event records what was seen and which catalog version was checked.

ambiguous

Multiple carriers remain plausible. The system reports the top candidates with scores and gap, and routes to review rather than committing.

unreadable

The selected frames could not produce usable evidence. The pass is recorded; the identity field is empty by design.
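The three refusal states can be represented as events that carry the observed evidence and the catalog version, so nothing is forced onto the nearest carrier. This sketch assumes a hypothetical `PassEvent` record, a `NOT_IN_CATALOG` key that is always present in the posteriors (because the shortlist always includes it), and placeholder gap thresholds.

```python
from dataclasses import dataclass, field
from typing import Optional

NULL = "NOT_IN_CATALOG"  # hypothetical label for the out-of-catalog option


@dataclass
class PassEvent:
    status: str                      # verified | probable | ambiguous | catalog_miss | unreadable
    catalog_version: str             # which FMCSA snapshot was checked
    carrier_id: Optional[str] = None
    observed: dict = field(default_factory=dict)
    candidates: list = field(default_factory=list)


def emit(observed: dict, posteriors: dict, catalog_version: str,
         verified_gap: float = 0.5, ambiguous_gap: float = 0.2, top_k: int = 3) -> PassEvent:
    if not observed:
        # Nothing usable was read: record the pass, leave identity empty.
        return PassEvent("unreadable", catalog_version)
    ranked = sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)
    top_id, top = ranked[0]
    if top_id == NULL:
        # Readable, but no catalog carrier beats "not in catalog".
        return PassEvent("catalog_miss", catalog_version, observed=observed)
    gap = top - (ranked[1][1] if len(ranked) > 1 else 0.0)
    if gap < ambiguous_gap:
        # Several carriers remain plausible: report them, commit to none.
        return PassEvent("ambiguous", catalog_version, observed=observed,
                         candidates=ranked[:top_k])
    status = "verified" if gap >= verified_gap else "probable"
    return PassEvent(status, catalog_version, carrier_id=top_id,
                     observed=observed, candidates=ranked[:top_k])
```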

The architecture is the product.

Single-frame OCR is the baseline Tollscopic DOT# is built to beat. Catalog-constrained, multi-field decoding is why DOT# resolves the cases the naive path gets wrong.
