Root Cause Metrics

Theory: Newman 2004, graph community detection.

Measures how well the dependency graph decomposes into independent clusters. Compares actual intra-module edge density against a random graph with the same degree sequence.

Q = (1/m) × Σ [A_ij - k_out_i × k_in_j / m] × δ(c_i, c_j)
where m is the total edge count, A_ij the adjacency matrix, k_out_i and k_in_j the out- and in-degrees, and δ(c_i, c_j) = 1 when nodes i and j share a cluster.
  • Range: [-0.5, 1.0]. Q > 0.3 indicates significant modular structure.
  • Ungameable: Adding useless edges moves the graph closer to random, which decreases Q. Only genuine modular restructuring improves Q.
  • Language-fair: Works on ANY graph. Uses both import edges and call edges.
  • Replaces: coupling + cohesion + god files + hotspots (all symptoms of low Q).
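
The formula above can be sketched directly. A minimal Python sketch, assuming dependency edges arrive as (src, dst) pairs and cluster assignments are precomputed (how clusters are found is out of scope here); the O(n²) pair loop is for clarity, not scale:

```python
from collections import defaultdict

def directed_modularity(edges, community):
    """Directed modularity Q = (1/m) * sum[A_ij - k_out_i * k_in_j / m] * delta(c_i, c_j).

    edges:     list of (src, dst) dependency edges.
    community: dict mapping node -> cluster id.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    k_out, k_in, a = defaultdict(int), defaultdict(int), defaultdict(int)
    for u, v in edges:
        k_out[u] += 1   # out-degree of the source
        k_in[v] += 1    # in-degree of the target
        a[(u, v)] += 1  # adjacency count A_ij
    q = 0.0
    nodes = list(community)
    for u in nodes:
        for v in nodes:
            if community[u] == community[v]:  # delta(c_i, c_j)
                q += a[(u, v)] - k_out[u] * k_in[v] / m
    return q / m
```

On two tight two-node clusters this yields Q = 0.5; adding a cross-cluster edge to the same graph lowers Q, illustrating the ungameability claim above.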

Theory: Martin 2003, Acyclic Dependencies Principle.

Measures absence of circular dependencies. Cycles make build order undefined, change propagation unpredictable, and testing difficult.

  • Computation: Tarjan’s SCC algorithm counts strongly connected components with >1 member.
  • Normalization: score = 1 / (1 + cycle_count) — a reciprocal decay that maps the unbounded count into (0, 1].
  • Fundamental: A depends on B depends on A — neither can be understood or tested independently.
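
The computation and normalization above can be sketched as follows; a minimal recursive Tarjan, assuming the graph is a dict of node → successor list (an iterative variant would be needed for very deep graphs):

```python
def cycle_score(graph):
    """Tarjan's SCC: count strongly connected components with >1 member,
    then normalize as score = 1 / (1 + cycle_count)."""
    index, low = {}, {}
    stack, on_stack = [], set()
    counter = 0
    cycle_count = 0

    def strongconnect(v):
        nonlocal counter, cycle_count
        index[v] = low[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                strongconnect(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of an SCC
            size = 0
            while True:
                w = stack.pop()
                on_stack.discard(w)
                size += 1
                if w == v:
                    break
            if size > 1:  # multi-member SCC = a dependency cycle
                cycle_count += 1

    nodes = set(graph)
    for succs in graph.values():
        nodes.update(succs)
    for v in nodes:
        if v not in index:
            strongconnect(v)
    return 1.0 / (1.0 + cycle_count)
```

A graph with one A↔B cycle scores 0.5; a fully acyclic graph scores 1.0.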

Theory: Lakos 1996, levelization and layered architecture.

Measures the longest dependency chain in the DAG. Deep chains mean a change at the bottom propagates through many layers.

  • Computation: Iterative longest-path DFS from entry points.
  • Normalization: score = 1 / (1 + depth / 8) — midpoint 8 (depth 8 = score 0.5).
  • Independent from Q: A graph can have perfect modularity but still have a chain of 20 modules depending sequentially.
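
A minimal sketch of the iterative longest-path DFS and its normalization, assuming the graph is a DAG given as a dict of node → successor list and depth is counted in edges:

```python
def depth_score(graph, entry_points):
    """Longest dependency chain (in edges) reachable from the entry points
    of a DAG, via iterative post-order DFS with memoization.
    Normalized as score = 1 / (1 + depth / 8), so depth 8 -> 0.5."""
    memo = {}  # node -> longest path length (in edges) starting at node
    for start in entry_points:
        stack = [(start, False)]
        while stack:
            node, expanded = stack.pop()
            if expanded:
                # all successors are memoized by now (post-order)
                memo[node] = max(
                    (memo[s] + 1 for s in graph.get(node, ())), default=0)
            elif node not in memo:
                stack.append((node, True))  # revisit after children
                for s in graph.get(node, ()):
                    if s not in memo:
                        stack.append((s, False))
    depth = max((memo[e] for e in entry_points if e in memo), default=0)
    return 1.0 / (1.0 + depth / 8)
```

An 8-link chain scores exactly 0.5, matching the stated midpoint; an isolated entry point scores 1.0.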

Theory: Gini 1912, originally from economics (wealth inequality).

Measures how evenly complexity is distributed across functions. A codebase where one god function has CC=200 and all others have CC=2 has high Gini. A codebase where all functions have CC=5-10 has low Gini.

Sort values ascending (x_1 ≤ … ≤ x_n, where i is the 1-based rank and n the function count):
G = Σ (2i - n - 1) × x_i / (n × Σ x_i)
  • Score: 1 - G (lower Gini = better equality = higher score).
  • Why it matters: God files are the #1 source of AI agent confusion. When 40% of complexity is in one file, the agent can’t reason about it effectively.
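
The sorted-rank Gini formula above is a few lines of Python; a minimal sketch, assuming per-function cyclomatic complexities arrive as a flat list:

```python
def equality_score(complexities):
    """Gini coefficient G over per-function complexity, scored as 1 - G.

    Uses the sorted-rank form: G = sum((2i - n - 1) * x_i) / (n * sum(x_i)),
    with i the 1-based rank after ascending sort.
    """
    xs = sorted(complexities)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 1.0  # no functions, or all zero complexity: perfect equality
    g = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1)) / (n * total)
    return 1.0 - g
```

A uniform codebase scores 1.0; the god-function case from the text (one CC=200 among CC=2 functions) drops well below 0.3.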

Theory: Kolmogorov complexity — the gap between actual code and minimum equivalent code.

Combines dead functions (unreferenced by any call site) and duplicate functions (identical body hashes).

R = (dead_count + duplicate_count) / total_functions
score = 1 - R
  • Fundamental: Every line of dead or duplicate code is structural waste — increases the AI agent’s search space without contributing to behavior.
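
The R formula above can be sketched as follows, assuming functions arrive as a dict of name → normalized body source and call references as a set of names (both would come from a prior parsing pass):

```python
import hashlib

def redundancy_score(functions, call_sites):
    """R = (dead_count + duplicate_count) / total_functions; score = 1 - R.

    functions:  dict of function name -> normalized body source.
    call_sites: set of function names referenced by any call site.
    """
    total = len(functions)
    if total == 0:
        return 1.0
    dead = sum(1 for name in functions if name not in call_sites)
    seen_hashes = set()
    duplicates = 0
    for body in functions.values():
        h = hashlib.sha256(body.encode()).hexdigest()  # identical-body hash
        if h in seen_hashes:
            duplicates += 1  # every copy after the first counts
        else:
            seen_hashes.add(h)
    return 1.0 - (dead + duplicates) / total
```

With four functions, one unreferenced and one an exact duplicate, R = 2/4 and the score is 0.5.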

A directed graph with attributed nodes has these independent structural properties:

Dimension    What it captures             Property of
----------   ---------------------------  -----------
Modularity   Edge clustering              Edges
Acyclicity   Circular edges               Edges
Depth        Edge chain length            Edges
Equality     Node property concentration  Nodes
Redundancy   Unnecessary nodes            Nodes

3 edge properties + 2 node properties = 5 total. Adding more would either overlap (entropy overlaps with Gini) or measure something outside static analysis (runtime behavior).