Scoring

The scoring model defines how repository health is measured. Each dimension contributes a weighted score, and the overall result is a verdict of Pass, Warn, or Fail.

The core idea is to make scoring functions concrete. Scoring is deterministic and explainable: you can always see exactly why a score is what it is, because there are no hidden heuristics. lexicon verify shows precisely how much each dimension contributed.

Scoring supports weighted dimensions across areas like:

correctness
contract pass rate and coverage
conformance coverage
behavior pass rate
lint quality
documentation completeness
panic safety

The scoring model is stored at specs/scoring/model.toml.

Score Model Structure

A score model contains:

dimensions — A list of scoring dimensions, each with a weight and category
thresholds — Pass and warn thresholds (values between 0.0 and 1.0)

Dimensions

Each dimension has:

Field	Type	Description
`id`	string	Unique identifier (e.g., `correctness`)
`label`	string	Human-readable label
`weight`	u32	Weight in total score calculation
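`category`	enum	`required` (counted in the total; if it fails, the verdict is Fail), `scored` (counted in the total), or `advisory` (excluded from the total)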
`source`	enum	Where the value comes from: `gate`, `test_suite`, `coverage`, or `manual`
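One way to model a dimension entry in Rust, as a sketch: the enums mirror the string values allowed for `category` and `source`, and `counts_toward_total` encodes the rule that advisory dimensions are excluded from the total score. The type and method names are assumptions for illustration, not lexicon's API, and reading model.toml itself (which would need a TOML crate) is omitted.

```rust
use std::str::FromStr;

/// Mirrors the `category` field (hypothetical type, not lexicon's API).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Category {
    Required,
    Scored,
    Advisory,
}

/// Mirrors the `source` field (hypothetical type, not lexicon's API).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Source {
    Gate,
    TestSuite,
    Coverage,
    Manual,
}

impl FromStr for Category {
    type Err = String;
    // Accepts exactly the strings documented for `category`.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "required" => Ok(Category::Required),
            "scored" => Ok(Category::Scored),
            "advisory" => Ok(Category::Advisory),
            other => Err(format!("unknown category: {other}")),
        }
    }
}

impl FromStr for Source {
    type Err = String;
    // Accepts exactly the strings documented for `source`.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "gate" => Ok(Source::Gate),
            "test_suite" => Ok(Source::TestSuite),
            "coverage" => Ok(Source::Coverage),
            "manual" => Ok(Source::Manual),
            other => Err(format!("unknown source: {other}")),
        }
    }
}

/// One `[[dimensions]]` entry from the score model.
#[derive(Debug, Clone, PartialEq)]
pub struct Dimension {
    pub id: String,
    pub label: String,
    pub weight: u32,
    pub category: Category,
    pub source: Source,
}

impl Dimension {
    /// Advisory dimensions are left out of both numerator and denominator.
    pub fn counts_toward_total(&self) -> bool {
        self.category != Category::Advisory
    }
}
```

Keeping `category` and `source` as enums rather than strings means an unknown value is rejected when the model is loaded instead of silently scoring as zero.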

Thresholds

Threshold	Default	Meaning
`pass`	0.8	Score at or above this value is Pass
`warn`	0.6	Score at or above this but below pass is Warn

Below the warn threshold, the verdict is Fail.

Verdict Logic

The verdict is determined in order:

If any required dimension failed, verdict is Fail
If total score >= pass threshold, verdict is Pass
If total score >= warn threshold, verdict is Warn
Otherwise, verdict is Fail
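The ordered rules above translate directly into a chain of checks. This is a minimal sketch that assumes the total score has already been computed and that "any required dimension failed" is available as a single flag; `Verdict`, `Thresholds`, and `verdict` are illustrative names, not lexicon's API. The defaults mirror the 0.8 and 0.6 thresholds.

```rust
/// Possible outcomes of scoring.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    Pass,
    Warn,
    Fail,
}

/// Pass and warn thresholds, both in 0.0..=1.0.
#[derive(Debug, Clone, Copy)]
pub struct Thresholds {
    pub pass: f64,
    pub warn: f64,
}

impl Default for Thresholds {
    // Same values as the default model.
    fn default() -> Self {
        Thresholds { pass: 0.8, warn: 0.6 }
    }
}

/// Applies the verdict rules in order; the first matching rule wins.
pub fn verdict(total: f64, any_required_failed: bool, t: Thresholds) -> Verdict {
    if any_required_failed {
        Verdict::Fail // a failed required dimension overrides any score
    } else if total >= t.pass {
        Verdict::Pass
    } else if total >= t.warn {
        Verdict::Warn
    } else {
        Verdict::Fail
    }
}
```

Checking the required dimensions first is what makes them hard requirements: a total of 0.95 still yields Fail if a required gate failed.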

Score Computation

The total score is a weighted average of all non-advisory dimensions:

total = sum(dimension.value * dimension.weight) / sum(dimension.weight)

Advisory dimensions are excluded from both the numerator and denominator.

Each dimension value is between 0.0 and 1.0. For gate-sourced dimensions, a passing gate scores 1.0 and a failing gate scores 0.0.
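The computation can be sketched in Rust as a single pass that accumulates the numerator and denominator over non-advisory dimensions. `Measured`, `gate_value`, and `total_score` are hypothetical names. Returning `None` when no non-advisory weight exists is an assumption, since the model does not say what happens in that case.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Category {
    Required,
    Scored,
    Advisory,
}

/// A dimension's weight, category, and measured value (expected in 0.0..=1.0).
#[derive(Debug, Clone, Copy)]
pub struct Measured {
    pub weight: u32,
    pub category: Category,
    pub value: f64,
}

/// Gate-sourced dimensions: a passing gate is 1.0, a failing gate is 0.0.
pub fn gate_value(passed: bool) -> f64 {
    if passed { 1.0 } else { 0.0 }
}

/// Weighted average over non-advisory dimensions.
/// Returns None if the denominator would be zero (assumed behavior).
pub fn total_score(dims: &[Measured]) -> Option<f64> {
    let (num, den) = dims
        .iter()
        .filter(|d| d.category != Category::Advisory) // advisory: excluded from both sums
        .fold((0.0_f64, 0_u32), |(num, den), d| {
            (num + d.value * f64::from(d.weight), den + d.weight)
        });
    if den == 0 { None } else { Some(num / f64::from(den)) }
}
```

For example, a passing required gate of weight 30, a scored dimension of weight 10 at 0.5, and an advisory dimension of weight 10 give (30 × 1.0 + 10 × 0.5) / 40 = 0.875; the advisory value has no effect.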

Safety Against Gaming

The scoring system is designed to resist gaming by both humans and AI agents. Silently loosening score thresholds, weakening assertions, or rewriting scoring dimensions without acknowledgment are policy violations that lexicon detects.

Score changes are tracked in audit records with before and after values, so any drift in scoring policy is visible and attributable.
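This section does not describe how lexicon implements the detection. As a purely hypothetical illustration of what a before-and-after record makes possible, the sketch below flags a threshold change that lowers either threshold without acknowledgment. `Thresholds`, `ThresholdChange`, its `acknowledged` field, and `is_violation` are invented for this example and are not lexicon's audit format.

```rust
/// Pass and warn thresholds at one point in time.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Thresholds {
    pub pass: f64,
    pub warn: f64,
}

/// Hypothetical audit entry: the thresholds before and after a change.
#[derive(Debug, Clone, Copy)]
pub struct ThresholdChange {
    pub before: Thresholds,
    pub after: Thresholds,
    pub acknowledged: bool, // hypothetical: was the change explicitly acknowledged?
}

impl ThresholdChange {
    /// Lowering either threshold makes Pass or Warn easier to reach.
    pub fn loosens(&self) -> bool {
        self.after.pass < self.before.pass || self.after.warn < self.before.warn
    }

    /// Hypothetical rule: loosening is only a violation when unacknowledged.
    pub fn is_violation(&self) -> bool {
        self.loosens() && !self.acknowledged
    }
}
```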

Default Model

The default model created by lexicon init includes six dimensions:

```toml
schema_version = "1.0"

[[dimensions]]
id = "correctness"
label = "Correctness"
weight = 30
category = "required"
source = "gate"

[[dimensions]]
id = "conformance-coverage"
label = "Conformance Coverage"
weight = 25
category = "scored"
source = "test_suite"

[[dimensions]]
id = "behavior-pass-rate"
label = "Behavior Pass Rate"
weight = 15
category = "scored"
source = "test_suite"

[[dimensions]]
id = "lint-quality"
label = "Lint Quality"
weight = 10
category = "scored"
source = "gate"

[[dimensions]]
id = "doc-completeness"
label = "Documentation Completeness"
weight = 10
category = "advisory"
source = "manual"

[[dimensions]]
id = "panic-safety"
label = "Panic Safety"
weight = 10
category = "scored"
source = "gate"

[thresholds]
pass = 0.8
warn = 0.6
```

Weights sum to 100 across all dimensions, but only non-advisory weights (90) are used in the denominator.
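To make the arithmetic concrete, the sketch below hard-codes the default model's ids, weights, and categories (it does not parse model.toml) and computes a total from a map of dimension values. `DEFAULT_DIMENSIONS`, `weight_sum`, and `default_total` are hypothetical names, and treating a missing value as 0.0 is an assumption.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Category {
    Required,
    Scored,
    Advisory,
}

/// The six dimensions of the default model: (id, weight, category).
pub const DEFAULT_DIMENSIONS: [(&str, u32, Category); 6] = [
    ("correctness", 30, Category::Required),
    ("conformance-coverage", 25, Category::Scored),
    ("behavior-pass-rate", 15, Category::Scored),
    ("lint-quality", 10, Category::Scored),
    ("doc-completeness", 10, Category::Advisory),
    ("panic-safety", 10, Category::Scored),
];

/// Sum of weights, optionally including advisory dimensions.
pub fn weight_sum(include_advisory: bool) -> u32 {
    DEFAULT_DIMENSIONS
        .iter()
        .filter(|(_, _, c)| include_advisory || *c != Category::Advisory)
        .map(|(_, w, _)| *w)
        .sum()
}

/// Weighted average over non-advisory dimensions; a missing id counts as 0.0.
pub fn default_total(values: &HashMap<&str, f64>) -> f64 {
    let num: f64 = DEFAULT_DIMENSIONS
        .iter()
        .filter(|(_, _, c)| *c != Category::Advisory)
        .map(|(id, w, _)| values.get(id).copied().unwrap_or(0.0) * f64::from(*w))
        .sum();
    num / f64::from(weight_sum(false)) // denominator is 90 for the default model
}
```

With hypothetical values of 1.0 for correctness and lint-quality (gates passed), 0.6 for conformance-coverage, 0.9 for behavior-pass-rate, and 0.0 for panic-safety (gate failed), the total is (30 × 1.0 + 25 × 0.6 + 15 × 0.9 + 10 × 1.0 + 10 × 0.0) / 90 = 68.5 / 90 ≈ 0.761, whatever value doc-completeness has. Since correctness, the only required dimension, passed, that total would give a verdict of Warn.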

Usage

The scoring model is created automatically by lexicon init. Score results, including the per-dimension breakdown, appear in the score section of lexicon verify output.