Scoring

The scoring model defines how repository health is measured. Each dimension contributes a weighted score, and the overall result is a verdict of Pass, Warn, or Fail.

The core idea is to make scoring concrete. The score is deterministic and explainable: you should always be able to see exactly why it is what it is. There is no hidden heuristic, and lexicon verify shows precisely how much each dimension contributed.

Scoring supports weighted dimensions across areas like:

  • correctness
  • contract pass rate and coverage
  • conformance coverage
  • behavior pass rate
  • lint quality
  • documentation completeness
  • panic safety

The scoring model is stored at specs/scoring/model.toml.

A score model contains:

  • dimensions — A list of scoring dimensions, each with a weight and category
  • thresholds — Pass and warn thresholds (values between 0.0 and 1.0)

Each dimension has:

| Field | Type | Description |
|---|---|---|
| id | string | Unique identifier (e.g., correctness) |
| label | string | Human-readable label |
| weight | u32 | Weight in total score calculation |
| category | enum | required, scored, or advisory |
| source | enum | Where the value comes from: gate, test_suite, coverage, or manual |

  • Required: Must pass. If any required dimension fails, the overall verdict is Fail regardless of the numeric score. Its weight still counts toward the numeric score.
  • Scored: Contributes to the weighted numeric score.
  • Advisory: Informational only. It is excluded from the weighted calculation, so it affects neither the verdict nor the numeric score.
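
As an illustration, the dimension fields and categories described above could be modeled as follows. This is a minimal Python sketch, not lexicon's own implementation: the names Dimension, counts_toward_score, and can_force_fail are hypothetical, and the weight check simply mirrors the u32 type from the field table.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    REQUIRED = "required"
    SCORED = "scored"
    ADVISORY = "advisory"


class Source(Enum):
    GATE = "gate"
    TEST_SUITE = "test_suite"
    COVERAGE = "coverage"
    MANUAL = "manual"


@dataclass(frozen=True)
class Dimension:
    id: str
    label: str
    weight: int  # u32 in the model: 0 .. 2**32 - 1
    category: Category
    source: Source

    def __post_init__(self):
        # Reject weights that would not fit in a u32.
        if not 0 <= self.weight <= 2**32 - 1:
            raise ValueError(f"weight out of u32 range: {self.weight}")

    def counts_toward_score(self) -> bool:
        # Required and scored dimensions enter the weighted average;
        # advisory dimensions do not.
        return self.category is not Category.ADVISORY

    def can_force_fail(self) -> bool:
        # Only a required dimension can force a Fail verdict on its own.
        return self.category is Category.REQUIRED


correctness = Dimension(
    "correctness", "Correctness", 30, Category("required"), Source("gate")
)
docs = Dimension(
    "doc-completeness", "Documentation Completeness", 10,
    Category("advisory"), Source("manual"),
)
```

Building the enums from their string values (Category("required")) means an unknown category or source in the model raises a ValueError instead of being silently accepted.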

| Threshold | Default | Meaning |
|---|---|---|
| pass | 0.8 | Score at or above this value is Pass |
| warn | 0.6 | Score at or above this but below pass is Warn |

Below the warn threshold, the verdict is Fail.

The verdict is determined in order:

  1. If any required dimension failed, verdict is Fail
  2. If total score >= pass threshold, verdict is Pass
  3. If total score >= warn threshold, verdict is Warn
  4. Otherwise, verdict is Fail
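
The four steps above can be sketched as a single function. This is a minimal Python illustration under stated assumptions: the function name and parameters are hypothetical, and the default thresholds are the documented defaults (0.8 and 0.6).

```python
def verdict(total, required_failed, pass_threshold=0.8, warn_threshold=0.6):
    """Return "Pass", "Warn", or "Fail" using the documented order of checks."""
    # 1. A failed required dimension overrides the numeric score.
    if required_failed:
        return "Fail"
    # 2. At or above the pass threshold.
    if total >= pass_threshold:
        return "Pass"
    # 3. At or above the warn threshold, but below pass.
    if total >= warn_threshold:
        return "Warn"
    # 4. Everything else.
    return "Fail"
```

The order matters: checking the required dimensions first is what makes a total of 0.95 with a failed required dimension come out as Fail.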

The total score is a weighted average of all non-advisory dimensions:

total = sum(dimension.value * dimension.weight) / sum(weights)

Advisory dimensions are excluded from both the numerator and denominator.

Each dimension value is between 0.0 and 1.0. For gate-sourced dimensions, a passing gate scores 1.0 and a failing gate scores 0.0.
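
The weighted average can be sketched in a few lines of Python. The function names and the tuple layout here are hypothetical, and returning 0.0 when no dimension counts is an assumption made for the sketch; the documentation above does not say how that case is handled.

```python
def gate_value(passed):
    """Map a gate result to a dimension value: pass -> 1.0, fail -> 0.0."""
    return 1.0 if passed else 0.0


def total_score(dimensions):
    """Weighted average over non-advisory dimensions.

    `dimensions` is an iterable of (value, weight, category) tuples,
    where value is in [0.0, 1.0] and category is "required", "scored",
    or "advisory".
    """
    counted = [(v, w) for v, w, c in dimensions if c != "advisory"]
    denominator = sum(w for _, w in counted)
    if denominator == 0:
        # Assumption: an empty or all-advisory model scores 0.0.
        return 0.0
    return sum(v * w for v, w in counted) / denominator


example = [
    (gate_value(True), 30, "required"),  # passing gate
    (0.5, 10, "scored"),                 # half of a test suite passing
    (0.0, 60, "advisory"),               # ignored entirely
]
score = total_score(example)  # (1.0 * 30 + 0.5 * 10) / (30 + 10)
```

Because the advisory entry is dropped from both sums, its large weight of 60 has no effect on the result.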

The scoring system is designed to resist gaming by both humans and AI. Silently loosening score thresholds, weakening assertions, or rewriting scoring dimensions without acknowledgment are policy violations that lexicon detects.

Score changes are tracked in audit records with before and after values, so any drift in scoring policy is visible and attributable.
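
To make the idea of before and after values concrete, here is a small Python sketch that compares two snapshots of a scoring policy and lists every changed field. It is purely illustrative: the function name, the flat key layout, and the record shape are hypothetical and do not represent lexicon's actual audit record format.

```python
def diff_policy(before, after):
    """Return one record per field whose value differs between snapshots."""
    changes = []
    for field in sorted(set(before) | set(after)):
        old, new = before.get(field), after.get(field)
        if old != new:
            # Keep both values so the change is visible and attributable.
            changes.append({"field": field, "before": old, "after": new})
    return changes


# Hypothetical snapshots: the pass threshold was loosened from 0.8 to 0.7.
before = {"thresholds.pass": 0.8, "thresholds.warn": 0.6, "correctness.weight": 30}
after = {"thresholds.pass": 0.7, "thresholds.warn": 0.6, "correctness.weight": 30}
changes = diff_policy(before, after)
```

Here changes holds a single record for thresholds.pass, with 0.8 as the before value and 0.7 as the after value.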

The default model created by lexicon init includes six dimensions:

```toml
schema_version = "1.0"

[[dimensions]]
id = "correctness"
label = "Correctness"
weight = 30
category = "required"
source = "gate"

[[dimensions]]
id = "conformance-coverage"
label = "Conformance Coverage"
weight = 25
category = "scored"
source = "test_suite"

[[dimensions]]
id = "behavior-pass-rate"
label = "Behavior Pass Rate"
weight = 15
category = "scored"
source = "test_suite"

[[dimensions]]
id = "lint-quality"
label = "Lint Quality"
weight = 10
category = "scored"
source = "gate"

[[dimensions]]
id = "doc-completeness"
label = "Documentation Completeness"
weight = 10
category = "advisory"
source = "manual"

[[dimensions]]
id = "panic-safety"
label = "Panic Safety"
weight = 10
category = "scored"
source = "gate"

[thresholds]
pass = 0.8
warn = 0.6
```

Weights sum to 100 across all dimensions, but only non-advisory weights (90) are used in the denominator.
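
A worked example may help with the arithmetic. The Python sketch below copies the ids, weights, and categories from the default model above; the measured values are invented for illustration and do not come from a real run.

```python
# (id, weight, category) copied from the default model.
DEFAULT_MODEL = [
    ("correctness", 30, "required"),
    ("conformance-coverage", 25, "scored"),
    ("behavior-pass-rate", 15, "scored"),
    ("lint-quality", 10, "scored"),
    ("doc-completeness", 10, "advisory"),
    ("panic-safety", 10, "scored"),
]

# Hypothetical measured values for one run, each in [0.0, 1.0].
VALUES = {
    "correctness": 1.0,           # gate passed
    "conformance-coverage": 0.8,
    "behavior-pass-rate": 0.9,
    "lint-quality": 1.0,          # gate passed
    "doc-completeness": 0.2,      # advisory: ignored by the calculation
    "panic-safety": 0.0,          # gate failed
}

all_weights = sum(w for _, w, _ in DEFAULT_MODEL)
counted = [(d, w) for d, w, c in DEFAULT_MODEL if c != "advisory"]
denominator = sum(w for _, w in counted)             # non-advisory weights only
numerator = sum(VALUES[d] * w for d, w in counted)   # 30 + 20 + 13.5 + 10 + 0
total = numerator / denominator

print(all_weights, denominator, round(total, 4))  # prints: 100 90 0.8167
```

With the default thresholds, a total of about 0.817 is a Pass even though the panic-safety gate failed, because that dimension is scored rather than required. Changing the doc-completeness value would not move the total at all.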

The scoring model is created automatically by lexicon init. Score results are displayed as part of the lexicon verify output; for a detailed breakdown, review the score section of that output.