API Extraction

API extraction is the process of scanning Rust source files to build a structured snapshot of the public API surface. This snapshot captures every public struct, enum, trait, function, method, constant, type alias, module, and impl block, along with its signature, visibility, doc summary, and source location.

The purpose is twofold: to detect unintended API drift and to classify changes as breaking, dangerous, or additive before they ship.

How It Works

The extraction engine uses the `syn` crate to parse Rust source files into an abstract syntax tree (AST), then walks the tree with a visitor that collects public items. For each item it records:

| Field | Description |
| --- | --- |
| `kind` | Struct, Enum, Trait, Function, Method, Module, Constant, TypeAlias, or Impl |
| `name` | The item identifier |
| `module_path` | Nested module path (e.g., `["outer", "inner"]`) |
| `signature` | The full type or function signature |
| `visibility` | Public, Crate, Restricted, or Private |
| `doc_summary` | First line of the doc comment, if present |
| `span_file` | Source file path |
| `span_line` | Line number in the source file |
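A record with these fields can be modeled roughly as follows. This is a minimal sketch, not the engine's actual definition. The names `ApiItem`, `ItemKind`, and `Visibility`, the `usize` line number, and the `doc_summary` helper are hypothetical and simply mirror the table above.

```rust
/// Kinds of items the extractor records (mirrors the `kind` column).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ItemKind {
    Struct,
    Enum,
    Trait,
    Function,
    Method,
    Module,
    Constant,
    TypeAlias,
    Impl,
}

/// Visibility as written in source (mirrors the `visibility` column).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Visibility {
    Public,     // `pub`
    Crate,      // `pub(crate)`
    Restricted, // e.g. `pub(super)` or `pub(in some::path)`
    Private,    // no visibility modifier
}

/// One extracted item (hypothetical shape, not the real type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ApiItem {
    pub kind: ItemKind,
    pub name: String,
    pub module_path: Vec<String>, // e.g. ["outer", "inner"]
    pub signature: String,
    pub visibility: Visibility,
    pub doc_summary: Option<String>,
    pub span_file: String,
    pub span_line: usize,
}

/// Summarize a doc comment as its first non-blank line, trimmed.
/// Skipping leading blank lines is an assumption of this sketch.
pub fn doc_summary(doc: &str) -> Option<String> {
    doc.lines()
        .map(str::trim)
        .find(|line| !line.is_empty())
        .map(String::from)
}
```

In this sketch, `module_path` is kept as a vector rather than a `::`-joined string so that nested paths compare component by component.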

Extraction can target a single file, a single source string, or an entire directory (recursively walking all `.rs` files). Files that fail to parse are silently skipped, so a single syntax error does not block the entire scan.
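Directory mode can be sketched with only the standard library. The parser is passed in as a closure; in the real engine it would be `syn`-based. The names `collect_rs_files` and `extract_dir` are hypothetical. The point of the sketch is that an error on one file is skipped instead of aborting the whole scan. Skipping unreadable files as well as unparsable ones is an extra assumption.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Recursively collect every `.rs` file under `dir`, sorted for stable output.
/// `DirEntry::file_type` does not follow symlinks, so symlinked directories
/// are not descended into (this avoids cycles).
pub fn collect_rs_files(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut files = Vec::new();
    let mut pending = vec![dir.to_path_buf()];
    while let Some(current) = pending.pop() {
        for entry in fs::read_dir(&current)? {
            let entry = entry?;
            let file_type = entry.file_type()?;
            let path = entry.path();
            if file_type.is_dir() {
                pending.push(path);
            } else if file_type.is_file() && path.extension().map_or(false, |ext| ext == "rs") {
                files.push(path);
            }
        }
    }
    files.sort();
    Ok(files)
}

/// Extract items from every `.rs` file under `dir` using the supplied parser.
/// A file that cannot be read or parsed is skipped instead of failing the scan.
pub fn extract_dir<T, E, F>(dir: &Path, parse: F) -> io::Result<Vec<T>>
where
    F: Fn(&Path, &str) -> Result<Vec<T>, E>,
{
    let mut items = Vec::new();
    for path in collect_rs_files(dir)? {
        let Ok(source) = fs::read_to_string(&path) else {
            continue; // unreadable file: skip it
        };
        if let Ok(mut parsed) = parse(&path, &source) {
            items.append(&mut parsed);
        }
        // An Err from the parser falls through: the file is silently skipped.
    }
    Ok(items)
}
```

Only errors from walking the directory itself are propagated. That matches the intent that a bad file, but not a missing directory, should be tolerated.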

The result is an `ApiSnapshot`: a JSON-serializable structure containing the crate name, an optional version, all extracted items, and a timestamp.

Baseline Management

A baseline is a saved API snapshot that represents the “known good” state of your public API. Baselines are stored as JSON files in `.lexicon/api/`.

The workflow:

1. Scan: extract the current API and save it as `current.json`.
2. Baseline: promote the current scan to `baseline.json`.
3. Diff: compare a new scan against the saved baseline.
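The three steps map onto plain file operations. The sketch below assumes the snapshot has already been serialized to a JSON string. The helper names (`save_current`, `promote_baseline`, `load_pair`) are hypothetical. Only the `.lexicon/api/` directory and the `current.json` / `baseline.json` file names come from the text above.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Directory that holds the snapshots: `<project>/.lexicon/api/`.
fn api_dir(project_root: &Path) -> PathBuf {
    project_root.join(".lexicon").join("api")
}

/// Scan step: persist an already-serialized snapshot as `current.json`.
pub fn save_current(project_root: &Path, snapshot_json: &str) -> io::Result<PathBuf> {
    let dir = api_dir(project_root);
    fs::create_dir_all(&dir)?;
    let path = dir.join("current.json");
    fs::write(&path, snapshot_json)?;
    Ok(path)
}

/// Baseline step: promote the latest scan by copying it over `baseline.json`.
pub fn promote_baseline(project_root: &Path) -> io::Result<PathBuf> {
    let dir = api_dir(project_root);
    let baseline = dir.join("baseline.json");
    fs::copy(dir.join("current.json"), &baseline)?;
    Ok(baseline)
}

/// Diff step (file level): load `(baseline, current)` so they can be compared.
/// Returns `None` when no baseline has been approved yet.
pub fn load_pair(project_root: &Path) -> io::Result<Option<(String, String)>> {
    let dir = api_dir(project_root);
    let baseline = dir.join("baseline.json");
    if !baseline.exists() {
        return Ok(None);
    }
    let current = fs::read_to_string(dir.join("current.json"))?;
    Ok(Some((fs::read_to_string(baseline)?, current)))
}
```

Promotion is a copy rather than a rename in this sketch, so `current.json` stays available for the next diff.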

Baselines let you answer the question: “has the public API changed since the last time I explicitly approved it?”

API Drift Detection

The diff engine compares two snapshots item by item, keyed on `(kind, name, module_path)`:

- Added items: present in the current snapshot but not in the baseline
- Removed items: present in the baseline but not in the current snapshot
- Changed items: present in both, but with a different signature or visibility

For changed items, the diff records the specific fields that changed (signature, visibility, or both).
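Here is a minimal sketch of this keyed comparison, assuming a simplified `Item` type with string-valued `kind` and `visibility`. The names `Item`, `ApiDiff`, and `diff` are hypothetical. Both snapshots are indexed by `(kind, name, module_path)`, and for matched items only `signature` and `visibility` are compared.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Item {
    pub kind: String,
    pub name: String,
    pub module_path: Vec<String>,
    pub signature: String,
    pub visibility: String,
}

/// Identity key used to match items across snapshots.
type Key = (String, String, Vec<String>);

fn key(item: &Item) -> Key {
    (item.kind.clone(), item.name.clone(), item.module_path.clone())
}

#[derive(Debug, Default)]
pub struct ApiDiff {
    pub added: Vec<Item>,
    pub removed: Vec<Item>,
    /// (baseline item, current item, names of the fields that differ)
    pub changed: Vec<(Item, Item, Vec<&'static str>)>,
}

pub fn diff(baseline: &[Item], current: &[Item]) -> ApiDiff {
    let old: BTreeMap<Key, &Item> = baseline.iter().map(|i| (key(i), i)).collect();
    let new: BTreeMap<Key, &Item> = current.iter().map(|i| (key(i), i)).collect();
    let mut out = ApiDiff::default();

    // Baseline items: either removed, or matched and checked field by field.
    for (k, before) in &old {
        match new.get(k) {
            None => out.removed.push((*before).clone()),
            Some(after) => {
                let mut fields = Vec::new();
                if before.signature != after.signature {
                    fields.push("signature");
                }
                if before.visibility != after.visibility {
                    fields.push("visibility");
                }
                if !fields.is_empty() {
                    out.changed.push(((*before).clone(), (*after).clone(), fields));
                }
            }
        }
    }
    // Current items with no baseline counterpart are additions.
    for (k, after) in &new {
        if !old.contains_key(k) {
            out.added.push((*after).clone());
        }
    }
    out
}
```

Using `BTreeMap` rather than `HashMap` keeps the output order deterministic, which keeps reports stable from run to run.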

Breaking Change Classification

Every item is classified into one of four levels:

| Level | Meaning | Examples |
| --- | --- | --- |
| Breaking | Downstream code that depends on the item will fail to compile | Removing a public item, narrowing visibility (e.g., `pub` to `pub(crate)`) |
| Dangerous | May break downstream code, depending on how the item is used | Changing a function signature |
| Additive | Safe for downstream code | Adding a new public item, widening visibility |
| Unchanged | No difference | Item is identical in both snapshots |

Removed items are always classified as breaking. Visibility changes are breaking if the new visibility is more restrictive than the old one, and additive otherwise. Signature changes are classified as dangerous because they may or may not break callers depending on the specific change.
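These rules can be expressed as a small function. The sketch assumes visibility widens in the order `Private < Restricted < Crate < Public`; deriving `Ord` on the enum encodes that order. When a signature and a visibility change together, it keeps the more severe level. Both choices are assumptions, as are the names `Level`, `Change`, and `classify`.

```rust
/// Declared most severe first, so `min()` picks the most severe level.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Level {
    Breaking,
    Dangerous,
    Additive,
    Unchanged,
}

/// Declared from narrowest to widest, so `<` means "more restrictive".
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Visibility {
    Private,
    Restricted,
    Crate,
    Public,
}

pub enum Change {
    Added,
    Removed,
    Modified { old_vis: Visibility, new_vis: Visibility, signature_changed: bool },
}

pub fn classify(change: &Change) -> Level {
    match change {
        Change::Added => Level::Additive,
        Change::Removed => Level::Breaking,
        Change::Modified { old_vis, new_vis, signature_changed } => {
            let visibility_level = if new_vis < old_vis {
                Some(Level::Breaking) // narrowed
            } else if new_vis > old_vis {
                Some(Level::Additive) // widened
            } else {
                None
            };
            let signature_level = if *signature_changed { Some(Level::Dangerous) } else { None };
            // Keep the most severe of the applicable levels; nothing changed => Unchanged.
            visibility_level
                .into_iter()
                .chain(signature_level)
                .min()
                .unwrap_or(Level::Unchanged)
        }
    }
}
```

Encoding severity as enum order keeps the "most severe wins" rule to a single `min()` call.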

Diff Reports

The diff engine produces both human-readable and JSON reports. The human-readable report groups changes by breaking level:

```text
API Diff Summary: 1 added, 1 removed, 1 changed
============================================================

[BREAKING] Removed items:
  - function old_helper (pub)

[DANGEROUS] Changed items:
  ~ function process
    signature: fn process(x: i32) -> fn process(x: i32, y: i32)

[ADDITIVE] Added items:
  + struct NewConfig (pub)
```

The JSON report contains the full structured diff, suitable for programmatic consumption in CI pipelines.
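The grouping in the human-readable report can be sketched as a plain string builder that approximates the layout of the sample above. The `Entry`, `Changed`, and `render` names are hypothetical. The sketch also simplifies by treating every changed item as a signature change, so changed items always appear under `[DANGEROUS]`; a visibility change would instead be grouped by its own level.

```rust
use std::fmt::Write;

/// An added or removed item, as shown in the report.
pub struct Entry {
    pub kind: &'static str,
    pub name: String,
    pub vis: &'static str,
}

/// A changed item whose signature differs between snapshots.
pub struct Changed {
    pub kind: &'static str,
    pub name: String,
    pub old_sig: String,
    pub new_sig: String,
}

/// Render sections in severity order; empty sections are omitted.
pub fn render(added: &[Entry], removed: &[Entry], changed: &[Changed]) -> String {
    let mut out = String::new();
    writeln!(
        out,
        "API Diff Summary: {} added, {} removed, {} changed",
        added.len(),
        removed.len(),
        changed.len()
    )
    .unwrap();
    writeln!(out, "{}", "=".repeat(60)).unwrap();
    if !removed.is_empty() {
        writeln!(out, "\n[BREAKING] Removed items:").unwrap();
        for e in removed {
            writeln!(out, "  - {} {} ({})", e.kind, e.name, e.vis).unwrap();
        }
    }
    if !changed.is_empty() {
        writeln!(out, "\n[DANGEROUS] Changed items:").unwrap();
        for c in changed {
            writeln!(out, "  ~ {} {}", c.kind, c.name).unwrap();
            writeln!(out, "    signature: {} -> {}", c.old_sig, c.new_sig).unwrap();
        }
    }
    if !added.is_empty() {
        writeln!(out, "\n[ADDITIVE] Added items:").unwrap();
        for e in added {
            writeln!(out, "  + {} {} ({})", e.kind, e.name, e.vis).unwrap();
        }
    }
    out
}
```

Writing to a `String` through `fmt::Write` cannot fail, which is why the `unwrap()` calls are safe here.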

Integration with Verify

API extraction fits into the broader `lexicon` verification pipeline: API drift is checked automatically during `lexicon verify`. Use the `API_SCAN` and `API_BASELINE` actions in `lexicon chat` to scan your API and save baselines interactively.