Measuring code decoupling as a leading metric

A note for the team on how I'm thinking about scoring modularity

Where this thinking came from

Reference
Kirsten Westeinde, "Deconstructing the Monolith: Designing Software that Maximizes Developer Productivity," Shopify Engineering, Feb 21, 2019

Shopify spent years on one of the largest Ruby on Rails codebases in existence, hit the ceiling of the monolithic model around 2016, and consciously chose not to break into microservices. Instead they pursued a modular monolith: one deployment unit, but with strictly enforced boundaries between business domains.

The part of the post that stuck with me is the tool they built called Wedge. Wedge hooks into Ruby tracepoints during CI to capture the full call graph, then sorts callers and callees by component. Cross-component calls, associations, and inheritance get classified as either OK (going through a public API) or a violation (reaching into private internals). Each component gets a score and a trend line. The breakthrough wasn’t the rules — it was that they made modularity measurable per build instead of an aspirational architecture goal.
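To make the classification step concrete, here is a minimal sketch of the idea, not Wedge's actual implementation. It assumes the call graph has already been captured and reduced to edges tagged with caller component, callee component, and whether the target is public. The `Call` type, the component names, and the choice to charge violations to the caller are all my assumptions for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """One edge of the captured call graph (hypothetical shape)."""
    caller_component: str
    callee_component: str
    callee_is_public: bool  # True when the target belongs to the callee's public API


def classify(call):
    """Return 'internal', 'ok', or 'violation' for a single edge."""
    if call.caller_component == call.callee_component:
        return "internal"
    return "ok" if call.callee_is_public else "violation"


def violation_rate_by_component(calls):
    """Share of each component's outgoing cross-component calls that are violations."""
    totals = Counter()
    violations = Counter()
    for call in calls:
        kind = classify(call)
        if kind == "internal":
            continue  # calls inside one component never count against it
        totals[call.caller_component] += 1
        if kind == "violation":
            violations[call.caller_component] += 1
    return {component: violations[component] / totals[component] for component in totals}


# Made-up sample edges, purely for illustration.
calls = [
    Call("orders", "billing", True),    # ok: goes through the public API
    Call("orders", "billing", False),   # violation: reaches into internals
    Call("orders", "orders", False),    # internal: ignored
    Call("shipping", "orders", True),   # ok
]
rates = violation_rate_by_component(calls)
```

Recording `rates` once per build is what turns the classification into a trend line.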

The frame I’m borrowing: if you want modularity to compound, you have to score it like you score test coverage or build time. Otherwise the work drifts.

Three layers of metrics

1. Structural coupling — the “Wedge score” equivalent

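For each domain / module / package, track four numbers: boundary violation rate, data ownership purity, public API ratio, and fan-in/fan-out.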

These are leading indicators because they move with every PR. They can be computed in CI and reported as a trend long before they need to be enforced as gates.
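As a rough sketch of how two of the structural numbers in this note, public API ratio and fan-in/fan-out, could be computed in CI from a list of cross-module references. The definitions are assumptions on my part: fan-in and fan-out count distinct modules, and public API ratio is the share of inbound cross-module references that land on a public entry point. The edge format and module names are hypothetical.

```python
from collections import defaultdict

# (source_module, target_module, target_is_public): hypothetical edge format
edges = [
    ("orders", "billing", True),
    ("orders", "billing", False),
    ("shipping", "billing", True),
    ("billing", "ledger", True),
]


def structural_metrics(edges):
    """Per-module fan-in, fan-out, and inbound public API ratio."""
    fan_in = defaultdict(set)
    fan_out = defaultdict(set)
    inbound_total = defaultdict(int)
    inbound_public = defaultdict(int)

    for src, dst, is_public in edges:
        if src == dst:
            continue  # only cross-module references matter here
        fan_out[src].add(dst)
        fan_in[dst].add(src)
        inbound_total[dst] += 1
        if is_public:
            inbound_public[dst] += 1

    report = {}
    for module in sorted(set(fan_in) | set(fan_out)):
        total = inbound_total[module]
        report[module] = {
            "fan_in": len(fan_in[module]),
            "fan_out": len(fan_out[module]),
            # None when nothing references the module from outside
            "public_api_ratio": inbound_public[module] / total if total else None,
        }
    return report


report = structural_metrics(edges)
```

Emitting `report` as a CI artifact on every build is enough to start a trend; no gate is needed to get value from it.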

2. Behavioral coupling — the part Wedge missed

Static imports lie. Two modules can be import-clean and still be tangled through:

The metric I’d propose here: test isolation ratio — the percentage of a module’s tests that run green using only its own code plus mocks, with no need to boot the rest of the system. This catches the “clean on paper, tangled in practice” cases that purely static metrics miss.
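A minimal sketch of the test isolation ratio, assuming the test runner can already tell us, per test, whether it passed with only the module's own code plus mocks. How that flag gets produced is the hard part and is not shown; the `IsolatedRun` record and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IsolatedRun:
    """Outcome of running one test in isolation (hypothetical record shape)."""
    module: str
    name: str
    passed_isolated: bool  # green with only the module's own code plus mocks


def isolation_ratio(runs):
    """Fraction of each module's tests that pass without booting the rest of the system."""
    totals = {}
    isolated = {}
    for run in runs:
        totals[run.module] = totals.get(run.module, 0) + 1
        isolated[run.module] = isolated.get(run.module, 0) + int(run.passed_isolated)
    return {module: isolated[module] / totals[module] for module in totals}


# Made-up results, purely for illustration.
runs = [
    IsolatedRun("billing", "charges_card", True),
    IsolatedRun("billing", "refunds_card", True),
    IsolatedRun("billing", "applies_tax", True),
    IsolatedRun("billing", "emails_receipt", False),  # needs another module to boot
    IsolatedRun("orders", "creates_order", True),
    IsolatedRun("orders", "reserves_stock", False),
]
ratios = isolation_ratio(runs)
```

A module that looks clean on imports but scores low here is exactly the "clean on paper, tangled in practice" case.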

3. Developer-experience outcomes — lagging, but they tell you it’s working

These shouldn’t drive the work, but they should be the proof:

What I’d actually do

Short version
The infrastructure to detect coupling violations usually already exists in a mature codebase. The missing piece is almost always aggregation and a trend line.
  1. Build a “Wedge dashboard.” One report per module with the four structural numbers above — boundary violation rate, data ownership purity, public API ratio, fan-in/fan-out. Wire it into CI output and pipe it to a dashboard so the trend is visible.
  2. Anchor the leading metric on public API ratio. It is the single number that most directly corresponds to “we can change module X without coordinating with module Y.” Track it weekly per module. Set per-module targets, not a global one — some modules legitimately have high fan-in and that’s fine.
  3. Pair it with a migration-progress metric. If the team is actively moving toward a target architecture (domain packages, repositories, service boundaries), track that completion percentage alongside the coupling score. Migration progress is the cause; coupling score is the effect. If progress climbs while score stays flat, the migration is checking boxes without actually decoupling — that’s a signal worth catching early.
  4. Don’t gate yet. Shopify deliberately tracked Wedge before enforcing anything. Enforcement on a moving target punishes the wrong people. The sequence is score → trend → enforce, in that order.
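The check in step 3 can be sketched as a small comparison over two weekly series. This is an illustration under assumptions: both series are fractions in [0, 1], a higher coupling score is better (for example, public API ratio), and the two thresholds are placeholder values I picked, not recommendations.

```python
def stalled_decoupling(progress, score, min_progress_gain=0.10, min_score_gain=0.02):
    """Return True when migration progress climbed but the coupling score stayed flat.

    progress, score: weekly values, oldest first, both in [0, 1].
    Thresholds are placeholder assumptions to tune per module.
    """
    if len(progress) != len(score) or len(progress) < 2:
        raise ValueError("need two series of equal length with at least two points")
    progress_gain = progress[-1] - progress[0]
    score_gain = score[-1] - score[0]
    return progress_gain >= min_progress_gain and score_gain < min_score_gain


# Made-up weekly numbers for one module.
box_checking = stalled_decoupling([0.20, 0.30, 0.45], [0.61, 0.61, 0.62])
really_decoupling = stalled_decoupling([0.20, 0.30, 0.45], [0.61, 0.66, 0.72])
```

Here `box_checking` flags the first module for a closer look, while the second module's score moved with its migration. Like the rest of the dashboard, this is a prompt for a conversation, not a gate.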

The honest caveat

Any structural metric will undercount the worst kind of coupling: implicit coupling through shared mutable state, undocumented event-ordering assumptions, or “everyone-just-knows” conventions. The dashboard will always look healthier than reality. So treat it as necessary, not sufficient, and keep the incident-blast-radius signal as the honest backstop. If the structural numbers improve but incidents still cascade across modules, the metric is being gamed or is measuring the wrong thing.