API Extraction

API extraction is the process of scanning Rust source files to build a structured snapshot of the public API surface. This snapshot captures every public struct, enum, trait, function, constant, type alias, and module — along with their signatures, visibility, doc summaries, and source locations.

The purpose is twofold: detect unintended API drift, and classify changes as breaking, dangerous, or additive before they ship.

The extraction engine uses the syn crate to parse Rust source files into an AST, then walks the tree with a visitor that collects public items. For each item it records:

| Field | Description |
|---|---|
| kind | Struct, Enum, Trait, Function, Method, Module, Constant, TypeAlias, or Impl |
| name | The item identifier |
| module_path | Nested module path (e.g., ["outer", "inner"]) |
| signature | The full type or function signature |
| visibility | Public, Crate, Restricted, or Private |
| doc_summary | First line of the doc comment, if present |
| span_file | Source file path |
| span_line | Line number in the source file |
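
As a rough illustration, the record described above could be modelled like this. It is a minimal sketch, not the tool's actual definitions: the type names `ApiItem`, `ItemKind`, and `Visibility`, the concrete field types, and the `doc_summary` helper are all assumptions made for this example. The helper also assumes that "first line" means the first non-blank line.

```rust
/// Hypothetical mirror of the `kind` field.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ItemKind {
    Struct,
    Enum,
    Trait,
    Function,
    Method,
    Module,
    Constant,
    TypeAlias,
    Impl,
}

/// Hypothetical mirror of the `visibility` field.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Visibility {
    Public,
    Crate,
    Restricted,
    Private,
}

/// One extracted item; field names follow the documented record,
/// field types are assumptions.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ApiItem {
    pub kind: ItemKind,
    pub name: String,
    pub module_path: Vec<String>,
    pub signature: String,
    pub visibility: Visibility,
    pub doc_summary: Option<String>,
    pub span_file: String,
    pub span_line: usize,
}

/// Returns the first non-blank line of a doc comment, trimmed,
/// or `None` when the comment is empty.
pub fn doc_summary(doc: &str) -> Option<String> {
    doc.lines()
        .map(str::trim)
        .find(|line| !line.is_empty())
        .map(str::to_owned)
}
```

Keeping `doc_summary` as an `Option` makes "no doc comment" distinguishable from an empty summary.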

Extraction can target a single file, a single source string, or an entire directory (recursively walking all .rs files). Files that fail to parse are silently skipped so that a single syntax error does not block the entire scan.
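
The directory mode and the skip-on-failure behavior might look roughly like the sketch below. It uses only the standard library. The function names `collect_rs_files` and `extract_all` are hypothetical, and the parser is passed in as a closure so the example does not assume any particular parsing API.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Recursively collects every `.rs` file under `dir` into `out`.
pub fn collect_rs_files(dir: &Path, out: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            collect_rs_files(&path, out)?;
        } else if path.extension().map_or(false, |ext| ext == "rs") {
            out.push(path);
        }
    }
    Ok(())
}

/// Runs `parse` on each file and keeps only the successes.
/// Files that cannot be read or parsed are skipped without an error,
/// so one bad file does not abort the scan.
pub fn extract_all<T, E>(
    files: &[PathBuf],
    parse: impl Fn(&str) -> Result<T, E>,
) -> Vec<(PathBuf, T)> {
    files
        .iter()
        .filter_map(|path| {
            let source = fs::read_to_string(path).ok()?;
            let parsed = parse(&source).ok()?;
            Some((path.clone(), parsed))
        })
        .collect()
}
```

One consequence of skipping silently is that a file with a syntax error contributes no items, so its public items can show up as removed in a later diff.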

The result is an ApiSnapshot — a JSON-serializable structure containing the crate name, optional version, all extracted items, and a timestamp.

A baseline is a saved API snapshot that represents the “known good” state of your public API. Baselines are stored as JSON files in .lexicon/api/.

The workflow:

  1. Scan — Extract the current API and save it as current.json
  2. Baseline — Promote the current scan to baseline.json
  3. Diff — Compare a new scan against the saved baseline

Baselines let you answer the question: “has the public API changed since the last time I explicitly approved it?”

The diff engine compares two snapshots item by item, keyed on (kind, name, module_path):

  • Added items — present in current but not in baseline
  • Removed items — present in baseline but not in current
  • Changed items — present in both but with different signatures or visibility

For changed items, the diff records the specific fields that changed (signature, visibility, or both).
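
The keyed comparison can be sketched as follows. This is an illustrative version under stated assumptions, not the engine's real code: `Item`, `Change`, `index`, and `diff` are hypothetical names, and `kind` and `visibility` are plain strings to keep the example short.

```rust
use std::collections::BTreeMap;

/// Simplified item with only the fields the diff needs.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Item {
    pub kind: String,
    pub name: String,
    pub module_path: Vec<String>,
    pub signature: String,
    pub visibility: String,
}

/// The identity of an item: (kind, name, module_path).
type Key = (String, String, Vec<String>);

#[derive(Debug, PartialEq, Eq)]
pub enum Change {
    Added(Item),
    Removed(Item),
    /// The two flags record which fields differ.
    Changed { old: Item, new: Item, signature: bool, visibility: bool },
}

/// Indexes a snapshot's items by their identity key.
fn index(items: &[Item]) -> BTreeMap<Key, &Item> {
    items
        .iter()
        .map(|i| ((i.kind.clone(), i.name.clone(), i.module_path.clone()), i))
        .collect()
}

/// Compares two snapshots: removed and changed items first (in key order),
/// then added items (in key order).
pub fn diff(baseline: &[Item], current: &[Item]) -> Vec<Change> {
    let (old, new) = (index(baseline), index(current));
    let mut changes = Vec::new();
    for (key, o) in &old {
        match new.get(key) {
            None => changes.push(Change::Removed((*o).clone())),
            Some(n) => {
                let signature = o.signature != n.signature;
                let visibility = o.visibility != n.visibility;
                if signature || visibility {
                    changes.push(Change::Changed {
                        old: (*o).clone(),
                        new: (*n).clone(),
                        signature,
                        visibility,
                    });
                }
            }
        }
    }
    for (key, n) in &new {
        if !old.contains_key(key) {
            changes.push(Change::Added((*n).clone()));
        }
    }
    changes
}
```

Because the key includes the name and module path, a rename or a move to another module appears as one removed item plus one added item rather than as a change.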

Every change is classified into one of four levels:

| Level | Meaning | Examples |
|---|---|---|
| Breaking | Downstream code will fail to compile | Removing a public item, narrowing visibility (e.g., pub to pub(crate)) |
| Dangerous | May break downstream code depending on usage | Changing a function signature |
| Additive | Safe for downstream code | Adding a new public item, widening visibility |
| Unchanged | No difference | Item is identical in both snapshots |

Removed items are always classified as breaking. Visibility changes are breaking if the new visibility is more restrictive than the old one, and additive otherwise. Signature changes are classified as dangerous because they may or may not break callers depending on the specific change.
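
These rules can be expressed as a small function. The sketch below makes two assumptions that the text above does not state: visibilities are ordered Private < Restricted < Crate < Public, and when both the signature and the visibility change, the more severe of the two levels wins. The names `Level`, `Change`, and `classify` are hypothetical.

```rust
/// Declared from most to least restrictive, so the derived ordering
/// can compare visibilities (this ordering is an assumption).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Visibility {
    Private,
    Restricted,
    Crate,
    Public,
}

/// Declared from least to most severe, so `max` picks the worse level.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Level {
    Unchanged,
    Additive,
    Dangerous,
    Breaking,
}

pub enum Change<'a> {
    Added,
    Removed,
    Changed {
        old_sig: &'a str,
        new_sig: &'a str,
        old_vis: Visibility,
        new_vis: Visibility,
    },
}

pub fn classify(change: &Change) -> Level {
    match change {
        Change::Added => Level::Additive,
        Change::Removed => Level::Breaking,
        Change::Changed { old_sig, new_sig, old_vis, new_vis } => {
            // Narrowing visibility is breaking, widening is additive.
            let vis = if new_vis < old_vis {
                Level::Breaking
            } else if new_vis > old_vis {
                Level::Additive
            } else {
                Level::Unchanged
            };
            // Any signature difference is treated as dangerous.
            let sig = if old_sig != new_sig {
                Level::Dangerous
            } else {
                Level::Unchanged
            };
            // Assumption: report the more severe of the two.
            vis.max(sig)
        }
    }
}
```

Under the second assumption, a change that both widens visibility and alters the signature is reported as dangerous, since Dangerous outranks Additive.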

The diff engine produces both human-readable and JSON reports. The human-readable report groups changes by breaking level:

API Diff Summary: 1 added, 1 removed, 1 changed
============================================================
[BREAKING] Removed items:
- function old_helper (pub)
[DANGEROUS] Changed items:
~ function process
signature: fn process(x: i32) -> fn process(x: i32, y: i32)
[ADDITIVE] Added items:
+ struct NewConfig (pub)

The JSON report contains the full structured diff, suitable for programmatic consumption in CI pipelines.

API extraction is part of the broader lexicon verification pipeline: lexicon verify checks for API drift automatically. To scan your API and save baselines interactively, use the API_SCAN and API_BASELINE actions in lexicon chat.