Where this thinking came from
Reference
Kirsten Westeinde, Shopify Engineering — Feb 21, 2019
Shopify spent years on one of the largest Ruby on Rails codebases in existence, hit the
ceiling of the monolithic model around 2016, and consciously chose not to break it up
into microservices. Instead, they pursued a modular monolith: one
deployment unit, but with strictly enforced boundaries between business domains.
The part of the post that stuck with me is the tool they built called
Wedge. Wedge hooks into Ruby tracepoints during CI to capture the full
call graph, then sorts callers and callees by component. Cross-component calls,
associations, and inheritance get classified as either OK (going through a public API) or
a violation (reaching into private internals). Each component gets a score and a trend
line. The breakthrough wasn’t the rules; it was that they made modularity
measurable per build instead of leaving it as an aspirational architecture goal.
The frame I’m borrowing: if you want modularity to compound, you have to score it
like you score test coverage or build time. Otherwise the work drifts.
Three layers of metrics
1. Structural coupling — the “Wedge score” equivalent
For each domain / module / package:
- Boundary violation rate: imports that reach into a module’s
internals rather than going through its public entry point. Expressed as
violations / total cross-module imports.
- Data ownership purity: the share of a domain’s data
schemas / tables / repositories that are referenced only from inside the owning domain.
- Public API ratio: of all inbound calls into a module, the
percentage that resolve through its declared public surface vs. deep imports. This is
the closest direct analog to Wedge’s headline number.
- Fan-in / fan-out: for each domain, how many other domains
import it (fan-in) vs. how many it imports (fan-out). Plot it. High fan-in hubs are
usually correct (auth, users); high fan-out in a module that is supposed to be a leaf
is usually a smell.
- Inheritance & cross-module associations: Shopify flagged
these as “always violations.” ORM associations or base classes that span
module boundaries silently couple deployment, testing, and refactor surface area.
These are leading indicators because they move with every PR. They can
be computed in CI and reported as a trend long before they need to be enforced as gates.
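As a rough illustration of how three of these numbers could fall out of one pass over import data, here is a minimal sketch. It assumes some scanner already emits one record per import with a flag saying whether it went through the target’s public entry point; `ImportEdge`, `structural_scores`, and the module names are hypothetical, not part of Wedge. It also assumes the violation rate is measured on the inbound side, which makes it the complement of the public API ratio for each module.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ImportEdge:
    """One import statement, as a hypothetical scanner might report it."""
    source_module: str    # module that contains the import
    target_module: str    # module being imported from
    via_public_api: bool  # True if it goes through the target's public entry point


def structural_scores(edges):
    """Compute per-module public API ratio, violation rate, fan-in, and fan-out."""
    inbound_total = defaultdict(int)
    inbound_public = defaultdict(int)
    importers = defaultdict(set)  # target -> distinct modules importing it
    imported = defaultdict(set)   # source -> distinct modules it imports

    for edge in edges:
        if edge.source_module == edge.target_module:
            continue  # imports inside one module never cross a boundary
        inbound_total[edge.target_module] += 1
        inbound_public[edge.target_module] += int(edge.via_public_api)
        importers[edge.target_module].add(edge.source_module)
        imported[edge.source_module].add(edge.target_module)

    report = {}
    for module in sorted(set(importers) | set(imported)):
        total = inbound_total[module]
        public = inbound_public[module]
        report[module] = {
            # None means "no inbound imports", which is not the same as a score of 0.
            "public_api_ratio": public / total if total else None,
            "violation_rate": (total - public) / total if total else None,
            "fan_in": len(importers[module]),
            "fan_out": len(imported[module]),
        }
    return report


# Illustrative data only: one deep import into billing out of four inbound imports.
edges = [
    ImportEdge("checkout", "billing", True),
    ImportEdge("checkout", "billing", False),
    ImportEdge("orders", "billing", True),
    ImportEdge("orders", "billing", True),
    ImportEdge("billing", "auth", True),
]
report = structural_scores(edges)
```

Emitting `report` as a CI artifact on every build is enough to start a trend line; nothing here needs to fail the build.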
2. Behavioral coupling — the part Wedge missed
Static imports lie. Two modules can be import-clean and still be tangled through:
- Shared database tables / collections / queries
- Event handlers that mutate another module’s state without a contract
- Tests that require booting unrelated services to pass
The metric I’d propose here: test isolation ratio — the
percentage of a module’s tests that run green using only its own code plus mocks,
with no need to boot the rest of the system. This catches the “clean on paper,
tangled in practice” cases that purely static metrics miss.
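The ratio itself is simple arithmetic once each test is labeled. The sketch below assumes a hypothetical CI step has already run every test in an isolated mode (own code plus mocks) and recorded whether it passed; how that labeling is produced is the hard part and is not shown. The function name and sample data are illustrative.

```python
from collections import defaultdict


def isolation_ratio_by_module(results):
    """results: iterable of (module, passed_in_isolation) pairs, one per test."""
    totals = defaultdict(int)
    isolated = defaultdict(int)
    for module, passed_in_isolation in results:
        totals[module] += 1
        isolated[module] += int(passed_in_isolation)
    # Share of each module's tests that pass without booting the rest of the system.
    return {module: isolated[module] / totals[module] for module in totals}


# Illustrative data only.
sample_results = [
    ("billing", True), ("billing", True), ("billing", False), ("billing", True),
    ("orders", False), ("orders", True),
]
ratios = isolation_ratio_by_module(sample_results)
```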
3. Developer-experience outcomes — lagging, but they tell you it’s working
These shouldn’t drive the work, but they should be the proof:
- Median files changed per PR (per team — should trend down)
- Median number of CODEOWNERS teams blocking a PR (should trend toward one)
- Time-to-first-PR for an engineer newly onboarded to a specific module
- Build-graph size per PR — how much rebuild/retest a typical change triggers
- Incident blast radius: the number of modules implicated per production incident
over time
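Most of these reduce to a median per group. A minimal sketch, assuming PR and incident records have already been exported as dictionaries; the field names (`files_changed`, `owner_teams`, `modules_implicated`) and all values are made up for illustration.

```python
from collections import defaultdict
from statistics import median


def median_per_group(records, group_key, value_key):
    """Group records by group_key and return the median of value_key per group."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record[group_key]].append(record[value_key])
    return {group: median(values) for group, values in grouped.items()}


# Illustrative data only.
pull_requests = [
    {"team": "payments", "files_changed": 4, "owner_teams": 1},
    {"team": "payments", "files_changed": 12, "owner_teams": 3},
    {"team": "payments", "files_changed": 6, "owner_teams": 1},
    {"team": "orders", "files_changed": 3, "owner_teams": 1},
    {"team": "orders", "files_changed": 9, "owner_teams": 2},
]
incidents = [
    {"quarter": "2024-Q1", "modules_implicated": 5},
    {"quarter": "2024-Q1", "modules_implicated": 3},
    {"quarter": "2024-Q2", "modules_implicated": 2},
    {"quarter": "2024-Q2", "modules_implicated": 1},
    {"quarter": "2024-Q2", "modules_implicated": 2},
]

files_per_pr = median_per_group(pull_requests, "team", "files_changed")
owners_per_pr = median_per_group(pull_requests, "team", "owner_teams")
blast_radius = median_per_group(incidents, "quarter", "modules_implicated")
```

The median is used rather than the mean so that one sweeping refactor PR or one unusually bad incident does not dominate the number.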
What I’d actually do
Short version
The infrastructure to detect coupling violations usually already exists in a mature
codebase. The missing piece is almost always
aggregation and a trend line.
- Build a “Wedge dashboard.” One report per module with
four of the structural numbers above: boundary violation rate, data ownership purity,
public API ratio, and fan-in/fan-out. Wire it into CI output and pipe it to a dashboard so
the trend is visible.
- Anchor the leading metric on public API ratio. It is the single
number that most directly corresponds to “we can change module X without
coordinating with module Y.” Track it weekly per module. Set
per-module targets, not a global one; some modules legitimately have
high fan-in and that’s fine.
- Pair it with a migration-progress metric. If the team is actively
moving toward a target architecture (domain packages, repositories, service
boundaries), track that completion percentage alongside the coupling score. Migration
progress is the cause; coupling score is the effect. If progress climbs while score
stays flat, the migration is checking boxes without actually decoupling, and
that’s a signal worth catching early.
- Don’t gate yet. Shopify deliberately tracked Wedge before
enforcing anything. Enforcement on a moving target punishes the wrong people. The
sequence is score → trend → enforce, in that order.
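The “progress climbs while score stays flat” check can be automated once both series are tracked weekly. The sketch below fits a least-squares slope to each series and flags the divergence. The function names and both thresholds are assumptions chosen for illustration; real cutoffs would need tuning per module.

```python
def weekly_slope(values):
    """Least-squares slope of values against week index 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two weekly samples to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    return numerator / denominator


def migration_is_checking_boxes(progress, public_api_ratio,
                                min_progress_slope=0.02, min_score_slope=0.005):
    """True when migration progress is rising but the coupling score is not.

    Both inputs are weekly fractions in [0, 1]. The default thresholds are
    arbitrary placeholders, not recommended values.
    """
    return (weekly_slope(progress) >= min_progress_slope
            and weekly_slope(public_api_ratio) < min_score_slope)


# Illustrative data only: progress climbs 10 points a week, score barely moves.
progress = [0.10, 0.20, 0.30, 0.40]
flat_score = [0.60, 0.61, 0.60, 0.61]
improving_score = [0.60, 0.65, 0.70, 0.75]

stalled = migration_is_checking_boxes(progress, flat_score)
healthy = migration_is_checking_boxes(progress, improving_score)
```

This only reports; it fits the score and trend steps of the sequence above and leaves enforcement for later.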
The honest caveat
Any structural metric will undercount the worst kind of coupling: implicit coupling
through shared mutable state, undocumented event-ordering assumptions, or
“everyone-just-knows” conventions. The dashboard will always look healthier
than reality. So treat it as necessary, not sufficient, and keep the
incident-blast-radius signal as the honest backstop. If the structural numbers improve
but incidents still cascade across modules, the metric is being gamed or is measuring
the wrong thing.