Measuring code decoupling as a leading metric

A note for the team on how I'm thinking about scoring modularity

Where this thinking came from

Reference

Deconstructing the Monolith: Designing Software that Maximizes Developer Productivity

Kirsten Westeinde, Shopify Engineering — Feb 21, 2019

Shopify spent years on one of the largest Ruby on Rails codebases in existence, hit the ceiling of the monolithic model around 2016, and consciously chose not to break into microservices. Instead they pursued a modular monolith: one deployment unit, but with strictly enforced boundaries between business domains.

The part of the post that stuck with me is the tool they built called Wedge. Wedge hooks into Ruby tracepoints during CI to capture the full call graph, then sorts callers and callees by component. Cross-component calls, associations, and inheritance get classified as either OK (going through a public API) or a violation (reaching into private internals). Each component gets a score and a trend line. The breakthrough wasn’t the rules. It was that they made modularity measurable on every build instead of leaving it as an aspirational architecture goal.

The frame I’m borrowing: if you want modularity to compound, you have to score it like you score test coverage or build time. Otherwise the work drifts.

Three layers of metrics

1. Structural coupling — the “Wedge score” equivalent

For each domain / module / package:

Boundary violation rate — imports that reach into a module’s internals rather than going through its public entry point. Expressed as violations / total cross-module imports.
Data ownership purity — the share of a domain’s data schemas / tables / repositories that are referenced only from inside the owning domain.
Public API ratio — of all inbound calls into a module, the percentage that resolve through its declared public surface vs. deep imports. This is the closest direct analog to Wedge’s headline number.
Fan-in / fan-out — for each domain, how many other domains import it (fan-in) vs. how many it imports (fan-out). Plot it. High fan-in hubs are usually correct (auth, users); high fan-out leaf modules are usually a smell.
Inheritance & cross-module associations — Shopify flagged these as “always violations.” ORM associations or base classes that span module boundaries silently couple deployment, testing, and refactor surface area.

These are leading indicators because they move with every PR. They can be computed in CI and reported as a trend long before they need to be enforced as gates.
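To make the aggregation concrete, here is a minimal sketch of how three of these numbers could be computed once cross-module references have been collected. Everything named here is hypothetical: the `Edge` record, the `structural_metrics` function, and the assumption that some scanner has already tagged each reference as public or deep. I'm also assuming boundary violation rate is counted on a module's outbound imports and public API ratio on its inbound ones, which the definitions above don't pin down.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """One reference between modules, as reported by a (hypothetical) scanner."""
    src_module: str       # module that contains the import or call
    dst_module: str       # module being reached into
    via_public_api: bool  # True if the target is on dst_module's declared public surface


def structural_metrics(edges):
    """Return per-module public API ratio, violation rate, fan-in and fan-out."""
    inbound = defaultdict(list)
    outbound = defaultdict(list)
    fan_in = defaultdict(set)
    fan_out = defaultdict(set)

    for e in edges:
        if e.src_module == e.dst_module:
            continue  # references inside one module are not coupling
        inbound[e.dst_module].append(e)
        outbound[e.src_module].append(e)
        fan_in[e.dst_module].add(e.src_module)
        fan_out[e.src_module].add(e.dst_module)

    report = {}
    for module in sorted(set(inbound) | set(outbound)):
        inc, out = inbound[module], outbound[module]
        public_in = sum(1 for e in inc if e.via_public_api)
        violations_out = sum(1 for e in out if not e.via_public_api)
        report[module] = {
            # None means "no data", which is different from a score of zero.
            "public_api_ratio": public_in / len(inc) if inc else None,
            "violation_rate": violations_out / len(out) if out else None,
            "fan_in": len(fan_in[module]),
            "fan_out": len(fan_out[module]),
        }
    return report
```

Returning `None` for modules with no inbound or outbound references keeps an unused module from showing up on the trend line as either perfect or broken.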

2. Behavioral coupling — the part Wedge missed

Static imports lie. Two modules can be import-clean and still be tangled through:

Shared database tables / collections / queries
Event handlers that mutate another module’s state without a contract
Tests that require booting unrelated services to pass

The metric I’d propose here: test isolation ratio — the percentage of a module’s tests that run green using only its own code plus mocks, with no need to boot the rest of the system. This catches the “clean on paper, tangled in practice” cases that purely static metrics miss.
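As a rough sketch of the arithmetic, assuming CI can already tell us, per test, whether it passed with the rest of the system stubbed out: the `IsolatedRun` record and `isolation_ratio` function are names I made up for illustration, and how a run gets marked as isolated is left open.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IsolatedRun:
    """Outcome of running one test with only its own module's code plus mocks."""
    module: str
    passed_isolated: bool


def isolation_ratio(runs):
    """Return, per module, the share of its tests that pass in isolation."""
    totals, isolated = {}, {}
    for run in runs:
        totals[run.module] = totals.get(run.module, 0) + 1
        isolated[run.module] = isolated.get(run.module, 0) + int(run.passed_isolated)
    return {module: isolated[module] / totals[module] for module in totals}
```

A module that shows a high public API ratio but a low isolation ratio is the "clean on paper, tangled in practice" case.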

3. Developer-experience outcomes — lagging, but they tell you it’s working

These shouldn’t drive the work, but they should be the proof:

Median files changed per PR (per team — should trend down)
Median number of CODEOWNERS teams blocking a PR (should trend toward one)
Time-to-first-PR for an engineer newly onboarded to a specific module
Build-graph size per PR — how much rebuild/retest a typical change triggers
Incident blast radius — the number of modules implicated per production incident over time
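The first two of these are cheap to compute from PR metadata. A minimal sketch, assuming that metadata has already been exported: the `PullRequest` record, its fields, and `pr_medians` are hypothetical names, not any particular code host's API.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class PullRequest:
    team: str               # team that authored the PR
    files_changed: int
    owner_teams: frozenset  # CODEOWNERS teams whose approval was required


def pr_medians(prs):
    """Return per-team medians for files changed and owning teams per PR."""
    by_team = {}
    for pr in prs:
        by_team.setdefault(pr.team, []).append(pr)
    return {
        team: {
            "median_files_changed": median(p.files_changed for p in items),
            "median_owner_teams": median(len(p.owner_teams) for p in items),
        }
        for team, items in by_team.items()
    }
```

Medians rather than means, so that one sweeping rename PR doesn't swamp a month of small changes.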

What I’d actually do

Short version

The infrastructure to detect coupling violations usually already exists in a mature codebase. The missing piece is almost always aggregation and a trend line.

Build a “Wedge dashboard.” One report per module with four of the structural numbers above: boundary violation rate, data ownership purity, public API ratio, and fan-in/fan-out. Wire it into CI output and pipe it to a dashboard so the trend is visible.
Anchor the leading metric on public API ratio. It is the single number that most directly corresponds to “we can change module X without coordinating with module Y.” Track it weekly per module. Set per-module targets, not a global one — some modules legitimately have high fan-in and that’s fine.
Pair it with a migration-progress metric. If the team is actively moving toward a target architecture (domain packages, repositories, service boundaries), track that completion percentage alongside the coupling score. Migration progress is the cause; coupling score is the effect. If progress climbs while score stays flat, the migration is checking boxes without actually decoupling — that’s a signal worth catching early.
Don’t gate yet. Shopify deliberately tracked Wedge before enforcing anything. Enforcement on a moving target punishes the wrong people. The sequence is score → trend → enforce, in that order.
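The "progress climbs while score stays flat" check is simple enough to automate. A minimal sketch, assuming both series are recorded weekly as fractions between 0 and 1: the function name and both thresholds are placeholders I picked for illustration, not tuned values.

```python
def migration_without_decoupling(progress, api_ratio,
                                 min_progress_gain=0.10, min_ratio_gain=0.02):
    """Flag a module whose migration advances while its coupling score stalls.

    progress  -- weekly migration completion, oldest first, values in 0..1
    api_ratio -- weekly public API ratio for the same weeks, values in 0..1
    The two thresholds are illustrative assumptions.
    """
    if len(progress) != len(api_ratio) or len(progress) < 2:
        raise ValueError("need two aligned series with at least two points each")
    progress_gain = progress[-1] - progress[0]
    ratio_gain = api_ratio[-1] - api_ratio[0]
    return progress_gain >= min_progress_gain and ratio_gain < min_ratio_gain
```

Because nothing is gated yet, a `True` here is a prompt for a conversation with the owning team, not a failed build.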

The honest caveat

Any structural metric will undercount the worst kind of coupling: implicit coupling through shared mutable state, undocumented event-ordering assumptions, or “everyone-just-knows” conventions. The dashboard will always look healthier than reality. So treat it as necessary, not sufficient, and keep the incident-blast-radius signal as the honest backstop. If the structural numbers improve but incidents still cascade across modules, the metric is being gamed or is measuring the wrong thing.