Catalog decoding · the intellectual spine

One wrong digit can name the wrong carrier.

Tollscopic DOT# preserves uncertainty across frames and fields, retrieves plausible FMCSA candidates, and scores the joint evidence before it names a carrier. The product is catalog-constrained decoding, not OCR-then-lookup.

01 · Naive vs catalog decoding

Same hard image. Different architecture. Different result.

A real side-fire frame can be hard to read even for a careful human. The naive path turns one fragile string into a lookup. The Tollscopic DOT# path keeps the evidence open until the catalog and the field agreement decide.

NAIVE · OCR-then-lookup
OCR · single frame · single field
USDOT 3 8 4 7 1 2 8
model committed to 8 · alternate readings discarded
EXACT LOOKUP · wrong carrier
Atlas Long-Haul LLC
Phoenix, AZ · MC mismatch · name mismatch · domicile mismatch
↳ result · wrong invoice target
DOT# · catalog-constrained decoding
TRACK · 2 of 6 kept
MULTI-FIELD · multi-frame · with uncertainty
USDOT        3847128 · 3847129 · 3847139   last digit ambiguous
Name         Northridge Freight Co         consensus across 5 frames
MC           MC-71458                      clean read · 4 frames
City/state   Toledo · OH                   3 frames agree
CATALOG · ranked candidates
Rank  Candidate                Evidence                            Score
1     Northridge Freight Co    name + state + MC + DOT variant     0.94
2     Atlas Long-Haul LLC      literal DOT match · name mismatch   0.21
3     Northland Carriers Inc   partial name · weak agreement       0.08
↳ result · verified · gap 0.73 over runner-up
Synthetic carriers. The point is the architecture, not the specific carrier. The naive path commits to one fragile string; Tollscopic DOT# keeps multiple frames and multiple fields open until the catalog says which carrier explains the joint evidence best.
02 · The bad bets

Three places the naive architecture commits too early.

And one constraint the architecture forgets to use.

01
One string

A digit error changes the carrier. The literal lookup commits to a fragile string before any evidence supports it.

02
One frame

The truck is visible across time. Picking one frame and dropping the rest discards independent evidence about the same physical vehicle.

03
One field

The panel carries more than the USDOT number. Name, authority, and domicile have different error modes — they should be read together.

04
Open text

The carrier is not any arbitrary string in the world. It is a candidate in a public catalog, plus an explicit "not in catalog" option.
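
The first two bets (one string, one frame) both come down to discarding alternatives too early. Here is a minimal sketch of the opposite approach. It assumes hypothetical per-frame USDOT reads of equal length and an illustrative `min_share` cutoff, neither of which comes from the product: every frame votes on every digit position, and any digit with enough support stays open.

```python
from collections import Counter

def position_votes(readings):
    """Count per-position digit votes across equal-length frame readings."""
    length = len(readings[0])
    if any(len(r) != length for r in readings):
        raise ValueError("this sketch assumes equal-length readings")
    return [Counter(r[i] for r in readings) for i in range(length)]

def plausible_digits(votes, min_share=0.15):
    """Keep every digit whose vote share at a position reaches min_share."""
    kept = []
    for counter in votes:
        total = sum(counter.values())
        kept.append(sorted(d for d, n in counter.items() if n / total >= min_share))
    return kept

# Hypothetical reads of the same panel from five frames (synthetic data).
frames = ["3847128", "3847129", "3847139", "3847139", "3847129"]
alternatives = plausible_digits(position_votes(frames))

for index, digits in enumerate(alternatives):
    print(index, digits)
```

Positions where the frames disagree keep more than one digit, so later stages can weigh each variant against the catalog instead of inheriting a single early guess.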

03 · Multi-field agreement

Fields don't need to be perfect individually. They need to agree.

A DOT digit can be hard to read while the legal name is clean. A city/state may be easy even when the authority line is blurred. The scorer rewards rare agreement across independent fields.

Field         Observed                      Candidate · Northridge Freight   Agreement   Note
USDOT         3847128 · 3847129 · 3847139   3847139                          0.45        1-digit confusion · OCR plausible
Name          "NORTHRIDGE FREIGHT CO"       NORTHRIDGE FREIGHT CO            0.96        consensus across 5 frames
Authority     MC-71458                      MC-71458                         0.92        clean read · 4 frames
Domicile      TOLEDO · OH                   Toledo, OH                       0.88        tie-breaker · rare combination
JOINT SCORE                                                                  0.94

04 · The catalog shortlist

Retrieval before scoring.

Tollscopic DOT# does not score against millions of carriers per pass. It retrieves a plausible shortlist from each observed field, unions them, and includes an explicit out-of-catalog option.

01
Digit confusions

USDOT and authority numbers expand into close digit-substitution variants before retrieval.

02
Fuzzy name match

Legal and trade names are matched with edit-distance and tokenization to absorb noisy OCR.

03
Domicile support

City and state narrow the shortlist when other fields are ambiguous.

04
Out-of-catalog null

The "not in this catalog snapshot" option always competes with the candidates.
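
The four retrieval steps above can be sketched roughly as follows. Everything here is an assumption for illustration: the in-memory `CATALOG`, the Northland row's number and domicile, the `name_threshold` and `max_size` values, and the use of `difflib` as a stand-in for the edit-distance and tokenization matching the page describes.

```python
from difflib import SequenceMatcher

# Synthetic catalog rows; a real FMCSA snapshot would be indexed, not scanned.
CATALOG = [
    {"dot": "3847139", "name": "NORTHRIDGE FREIGHT CO", "city": "TOLEDO", "state": "OH"},
    {"dot": "3847128", "name": "ATLAS LONG-HAUL LLC", "city": "PHOENIX", "state": "AZ"},
    {"dot": "5550001", "name": "NORTHLAND CARRIERS INC", "city": "DULUTH", "state": "MN"},
]
# Explicit "not in this catalog snapshot" candidate.
NOT_IN_CATALOG = {"dot": None, "name": None, "city": None, "state": None}

def digit_variants(number):
    """Return the number plus every single-digit substitution of it."""
    variants = {number}
    for i, ch in enumerate(number):
        for d in "0123456789":
            if d != ch:
                variants.add(number[:i] + d + number[i + 1:])
    return variants

def name_similarity(a, b):
    """Similarity ratio in [0, 1]; a stand-in for a tuned fuzzy matcher."""
    return SequenceMatcher(None, a.upper(), b.upper()).ratio()

def shortlist(dot_readings, name, city, state, name_threshold=0.8, max_size=10):
    """Union DOT-variant and fuzzy-name hits, narrow by domicile, add the null."""
    dots = set().union(*(digit_variants(r) for r in dot_readings))
    hits = [
        row for row in CATALOG
        if row["dot"] in dots or name_similarity(name, row["name"]) >= name_threshold
    ]
    place = (city.upper(), state.upper())
    # Stable sort: rows matching the observed domicile come first.
    hits.sort(key=lambda row: (row["city"], row["state"]) != place)
    return hits[:max_size] + [NOT_IN_CATALOG]

result = shortlist(["3847128", "3847129"], "NORTHRIDGE FRE1GHT CO", "Toledo", "OH")
for row in result:
    print(row["name"])
```

The null candidate is appended after truncation, so it always competes no matter how crowded the shortlist is.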

05 · Joint scoring

The scorer asks how likely the evidence is, given each candidate.

Common words and common digits do not get overvalued. Rare agreement across independent fields matters more. Near-identical frames do not double-count.

The intuition
carrier score = (field agreement + catalog rarity + frame diversity) / ambiguity

The real scorer is a record-linkage likelihood, not a sum. The simplified form only shows the direction of each effect: more agreement, rarer matches, and more diverse frames push the score up, while ambiguity (multiple strong candidates) pulls it down.
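
One common way to build a record-linkage likelihood is to sum per-field log-likelihood ratios, in the style of Fellegi-Sunter. The sketch below is only an illustration of that idea, not the product's scorer. The `m` and `u` probabilities are invented, fields are treated as simply agreeing or not, and the null option is given a neutral log-ratio of zero.

```python
import math

# Hypothetical probabilities (not from the source):
# m = P(field agrees | same carrier), u = P(field agrees | different carrier).
# A smaller u means agreement is rarer by chance, so it counts for more.
FIELD_PARAMS = {
    "dot": (0.90, 0.001),
    "name": (0.95, 0.0005),
    "mc": (0.90, 0.001),
    "domicile": (0.95, 0.02),
}

def log_likelihood_ratio(agreements):
    """Sum per-field log-likelihood ratios; agreements maps field -> bool."""
    total = 0.0
    for field, agrees in agreements.items():
        m, u = FIELD_PARAMS[field]
        total += math.log(m / u) if agrees else math.log((1 - m) / (1 - u))
    return total

def normalized_scores(candidates):
    """Softmax over candidate log-ratios, so strong rivals lower the top score."""
    llrs = {name: log_likelihood_ratio(a) for name, a in candidates.items()}
    peak = max(llrs.values())
    weights = {name: math.exp(v - peak) for name, v in llrs.items()}
    norm = sum(weights.values())
    return {name: w / norm for name, w in weights.items()}

scores = normalized_scores({
    "Northridge Freight Co": {"dot": True, "name": True, "mc": True, "domicile": True},
    "Atlas Long-Haul LLC": {"dot": True, "name": False, "mc": False, "domicile": False},
    "NOT_IN_CATALOG": {},  # no field evidence either way: log-ratio 0
})

# Two candidates that explain the evidence equally well split the score.
tied = normalized_scores({
    "A": {"dot": True, "name": True},
    "B": {"dot": True, "name": True},
})
print(scores, tied)
```

Normalizing across the shortlist is what makes ambiguity visible: a second candidate with the same agreements halves the top score instead of leaving it untouched.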

WHAT IT MEANS
The runner-up gap matters.

A high score for the top candidate is not enough. If the runner-up scores almost as high, the evidence is ambiguous. The scorer surfaces both the top score and the gap. The decision layer uses both to decide between verified, probable, and ambiguous.

top · 0.94
runner-up · 0.21
gap · 0.73 → verified
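
A decision layer that uses both numbers might look like the sketch below. The four thresholds are hypothetical placeholders, since the page gives only the example values 0.94, 0.21, and 0.73.

```python
def decide(scores, verified_top=0.9, verified_gap=0.5,
           probable_top=0.6, probable_gap=0.2):
    """Label a pass from the top score and its gap over the runner-up."""
    if not scores:
        raise ValueError("scores must contain at least one candidate")
    ranked = sorted(scores.values(), reverse=True)
    top = ranked[0]
    runner_up = ranked[1] if len(ranked) > 1 else 0.0
    gap = top - runner_up
    if top >= verified_top and gap >= verified_gap:
        label = "verified"
    elif top >= probable_top and gap >= probable_gap:
        label = "probable"
    else:
        label = "ambiguous"
    return label, round(gap, 2)

# Scores from the worked example above.
example = decide({"Northridge Freight Co": 0.94,
                  "Atlas Long-Haul LLC": 0.21,
                  "Northland Carriers Inc": 0.08})
# A high top score with a close rival is still ambiguous.
close_call = decide({"Carrier A": 0.91, "Carrier B": 0.88})
print(example, close_call)
```
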
06 · The refusal path

Tollscopic DOT# does not force the nearest carrier.

If the evidence is readable but unsupported by the catalog, the system emits a catalog-miss event. If multiple candidates remain plausible, it emits ambiguous. If the panel cannot be read, it emits unreadable. These states preserve truth for review and measurement.

01
catalog_miss

Evidence is readable. No carrier in the local FMCSA snapshot supports it strongly enough. The event records what was seen and which catalog version was checked.

02
ambiguous

Multiple carriers remain plausible. The system reports the top candidates with scores and gap, and routes to review rather than committing.

03
unreadable

The selected frames could not produce usable evidence. The pass is recorded; the identity field is empty by design.
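
The three refusal states could be carried in an event record along these lines. This is a sketch under assumptions: the `PassEvent` fields, the `NULL` key, the `min_gap` value, and the catalog version strings are all invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional

NULL = "NOT_IN_CATALOG"  # hypothetical key for the out-of-catalog option

@dataclass
class PassEvent:
    status: str                 # verified | ambiguous | catalog_miss | unreadable
    carrier: Optional[str]      # stays None for every refusal state
    observed: dict              # what was read from the panel
    catalog_version: str        # which snapshot was checked
    candidates: list = field(default_factory=list)

def resolve(observed, scores, catalog_version, min_gap=0.3):
    """Map evidence and candidate scores to an event without forcing a carrier."""
    if not observed:
        return PassEvent("unreadable", None, {}, catalog_version)
    # The null option always competes, even if the caller left it out.
    merged = dict({NULL: 0.0}, **scores)
    ranked = sorted(merged.items(), key=lambda kv: kv[1], reverse=True)
    best, best_score = ranked[0]
    if best == NULL:
        return PassEvent("catalog_miss", None, observed, catalog_version)
    runner_up = ranked[1][1]
    if best_score - runner_up < min_gap:
        return PassEvent("ambiguous", None, observed, catalog_version, ranked[:3])
    return PassEvent("verified", best, observed, catalog_version, ranked[:3])

seen = {"name": "NORTHRIDGE FREIGHT CO", "state": "OH"}
miss = resolve(seen, {NULL: 0.7, "Atlas Long-Haul LLC": 0.2}, "snapshot-A")
unsure = resolve(seen, {"Carrier A": 0.5, "Carrier B": 0.45, NULL: 0.05}, "snapshot-A")
blank = resolve({}, {}, "snapshot-A")
print(miss.status, unsure.status, blank.status)
```

Only the verified branch fills the `carrier` field. The refusal branches keep the observed evidence and the catalog version, so a reviewer can see what was read and what it was checked against.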

The architecture is the product.

Single-frame OCR is the baseline Tollscopic DOT# is built to beat. Catalog-constrained, multi-field decoding is why DOT# resolves the cases the naive path drops.