Root Cause Metrics
1. Modularity (Newman’s Q)
Theory: Newman 2004, graph community detection.
Measures how well the dependency graph decomposes into independent clusters. Compares actual intra-module edge density against a random graph with the same degree sequence.
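`Q = (1/m) × Σ [A_ij - k_out_i × k_in_j / m] × δ(c_i, c_j)`

Here m is the number of edges, A_ij is 1 if there is an edge from i to j (0 otherwise), k_out_i is the out-degree of i, k_in_j is the in-degree of j, and δ(c_i, c_j) is 1 when i and j belong to the same cluster (0 otherwise). The sum runs over all node pairs (i, j).
- Range: [-0.5, 1.0]. Q > 0.3 = significant modular structure.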
- Ungameable: Adding useless edges moves the graph closer to random, which decreases Q. Only genuine modular restructuring improves Q.
- Language-fair: Works on ANY graph. Uses both import edges and call edges.
- Replaces: coupling + cohesion + god files + hotspots (all symptoms of low Q).
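
As a rough illustration of the formula above, the following sketch computes Q for a small directed graph. The function name `directed_modularity`, the edge-list input, and the `community` mapping are assumptions made for this example, not part of any described tool. It relies on the fact that the expected-edge term factorizes per cluster, so no pairwise loop is needed.

```python
from collections import defaultdict


def directed_modularity(edges, community):
    """Directed modularity Q for a list of (src, dst) edges.

    `community` maps every node to a cluster label (hypothetical input shape).
    """
    m = len(edges)
    if m == 0:
        return 0.0
    out_by_cluster = defaultdict(int)  # sum of out-degrees per cluster
    in_by_cluster = defaultdict(int)   # sum of in-degrees per cluster
    intra = 0                          # edges whose endpoints share a cluster
    for src, dst in edges:
        out_by_cluster[community[src]] += 1
        in_by_cluster[community[dst]] += 1
        if community[src] == community[dst]:
            intra += 1
    # Sum of k_out_i * k_in_j over same-cluster pairs, divided by m.
    expected = sum(out_by_cluster[c] * in_by_cluster[c] for c in out_by_cluster) / m
    return (intra - expected) / m


# Two 3-node cycles joined by a single edge.
edges = [("a", "b"), ("b", "c"), ("c", "a"),
         ("d", "e"), ("e", "f"), ("f", "d"),
         ("c", "d")]
community = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}

q_clean = directed_modularity(edges, community)                               # about 0.367
q_noisy = directed_modularity(edges + [("a", "e"), ("b", "f")], community)    # about 0.222
print(q_clean, q_noisy)
```

In this toy graph, adding two extra cross-cluster edges lowers Q, which matches the "ungameable" point above for this particular case.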
2. Acyclicity
Theory: Martin 2003, Acyclic Dependencies Principle.
Measures absence of circular dependencies. Cycles make build order undefined, change propagation unpredictable, and testing difficult.
- Computation: Tarjan’s SCC algorithm counts strongly connected components with >1 member.
- Normalization: `score = 1 / (1 + cycle_count)`. Because the count is unbounded, this reciprocal squashing function maps it into (0, 1].
- Fundamental: If A depends on B and B depends on A, neither can be understood or tested independently.
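
A minimal sketch of this computation, assuming the dependency graph is given as an adjacency dict (`graph`, `count_cycles`, and `acyclicity_score` are illustrative names). It uses a recursive form of Tarjan's algorithm for brevity; very deep graphs would need an iterative version to avoid Python's recursion limit.

```python
def count_cycles(graph):
    """Count strongly connected components with more than one node (Tarjan)."""
    index = {}        # discovery order of each node
    lowlink = {}      # smallest index reachable from the node's DFS subtree
    stack = []
    on_stack = set()
    counter = 0
    cyclic_sccs = 0

    def visit(v):
        nonlocal counter, cyclic_sccs
        index[v] = lowlink[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                visit(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif w in on_stack:
                lowlink[v] = min(lowlink[v], index[w])
        if lowlink[v] == index[v]:
            # v is the root of an SCC: pop its members off the stack.
            size = 0
            while True:
                w = stack.pop()
                on_stack.discard(w)
                size += 1
                if w == v:
                    break
            if size > 1:
                cyclic_sccs += 1

    nodes = set(graph) | {w for targets in graph.values() for w in targets}
    for v in sorted(nodes):
        if v not in index:
            visit(v)
    return cyclic_sccs


def acyclicity_score(graph):
    return 1 / (1 + count_cycles(graph))


cyclic = {"a": ["b"], "b": ["a", "c"], "c": ["d"], "d": []}
layered = {"a": ["b"], "b": ["c"], "c": []}
print(acyclicity_score(cyclic), acyclicity_score(layered))
```

Here `cyclic` contains one two-node cycle (a and b), so its score is 0.5, while `layered` has no cycles and scores 1.0.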
3. Depth
Theory: Lakos 1996, levelization and layered architecture.
Measures the longest dependency chain in the DAG. Deep chains mean a change at the bottom propagates through many layers.
- Computation: Iterative longest-path DFS from entry points.
- Normalization: `score = 1 / (1 + depth / 8)`, with midpoint 8 (depth 8 gives score 0.5).
- Independent from Q: A graph can have perfect modularity but still contain a chain of 20 modules that depend on one another sequentially.
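
The sketch below shows one way an iterative longest-path DFS with this normalization could look. It assumes the graph is acyclic and counts depth in edges; both are assumptions of this example, as are the names `longest_chain` and `depth_score`.

```python
def longest_chain(graph, entry_points):
    """Length in edges of the longest dependency chain from any entry point.

    Assumes `graph` (node -> list of dependencies) has no cycles.
    """
    depth = {}  # node -> longest chain that starts at that node
    for root in entry_points:
        stack = [(root, False)]
        while stack:
            node, expanded = stack.pop()
            if node in depth:
                continue
            deps = graph.get(node, ())
            if expanded:
                # All dependencies were resolved before this entry was popped.
                depth[node] = max((depth[d] + 1 for d in deps), default=0)
            else:
                stack.append((node, True))
                stack.extend((d, False) for d in deps if d not in depth)
    return max((depth[r] for r in entry_points), default=0)


def depth_score(depth, midpoint=8):
    return 1 / (1 + depth / midpoint)


graph = {"app": ["api", "util"], "api": ["db"], "db": ["util"], "util": []}
chain = longest_chain(graph, ["app"])   # app -> api -> db -> util
print(chain, depth_score(chain))
```

For this graph the longest chain has 3 edges, so the score is 8/11, roughly 0.727.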
4. Equality (Gini Coefficient)
Theory: Gini 1912, originally from economics (wealth inequality).
Measures how evenly complexity is distributed across functions. A codebase where one god function has CC=200 and all others have CC=2 has high Gini. A codebase where all functions have CC=5-10 has low Gini.
Sort values ascending, then compute `G = Σ (2i - n - 1) × x_i / (n × Σ x_i)`, where x_i is the i-th smallest value (counting from 1) and n is the number of values.
- Score: `1 - G` (lower Gini = better equality = higher score).
- Why it matters: God files are the #1 source of AI agent confusion. When 40% of complexity is in one file, the agent can’t reason about it effectively.
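
To make the formula concrete, here is a small sketch that applies it to per-function complexity values. The function names and the choice to return 0 for empty or all-zero input are assumptions of this example.

```python
def gini(values):
    """Gini coefficient via the sorted-rank formula, with 1-based ranks."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0  # assumption: treat empty or all-zero input as perfectly equal
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)


def equality_score(values):
    return 1 - gini(values)


god_function = [2] * 9 + [200]   # one function with CC=200, nine with CC=2
balanced = [5, 6, 7, 8, 9, 10]   # all functions in the CC 5-10 range

print(gini(god_function), gini(balanced))
```

The god-function case yields a Gini of about 0.817 (score about 0.183), while the balanced case yields about 0.130.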
5. Redundancy
Theory: Kolmogorov complexity, i.e. the gap between actual code and minimum equivalent code.
Combines dead functions (unreferenced by any call site) and duplicate functions (identical body hashes).
`R = (dead_count + duplicate_count) / total_functions`

`score = 1 - R`
- Fundamental: Every line of dead or duplicate code is structural waste: it increases the AI agent’s search space without contributing to behavior.
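
The following sketch shows one possible reading of this formula. Several details are assumptions, since the text does not specify them: function bodies are hashed with SHA-256, each group of identical bodies contributes its extra copies (all but one) to `duplicate_count`, entry points are expected to appear in the `called` set, and a function that is both dead and a duplicate is counted twice. The input shapes and names are hypothetical.

```python
import hashlib
from collections import Counter


def redundancy_score(bodies, called):
    """Score = 1 - R for a mapping of function name -> body source text.

    `called` is the set of function names referenced by at least one call site.
    """
    total = len(bodies)
    if total == 0:
        return 1.0  # assumption: no functions means no redundancy
    dead_count = sum(1 for name in bodies if name not in called)
    body_hashes = Counter(
        hashlib.sha256(body.encode("utf-8")).hexdigest() for body in bodies.values()
    )
    # Assumption: count every copy beyond the first in each identical-body group.
    duplicate_count = sum(count - 1 for count in body_hashes.values())
    r = (dead_count + duplicate_count) / total
    return 1 - r


bodies = {
    "parse": "return json.loads(s)",
    "load": "return json.loads(s)",        # identical body to parse
    "render": "return tpl.format(**ctx)",
    "legacy": "raise NotImplementedError",  # never called
    "main": "render(parse(load()))",
}
called = {"parse", "load", "render", "main"}

print(redundancy_score(bodies, called))
```

With one dead function and one duplicate among five, R is 0.4 and the score is 0.6.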
Why exactly 5?
A directed graph with attributed nodes has these independent structural properties:
| Dimension | What it captures | Property of |
|---|---|---|
| Modularity | Edge clustering | Edges |
| Acyclicity | Circular edges | Edges |
| Depth | Edge chain length | Edges |
| Equality | Node property concentration | Nodes |
| Redundancy | Unnecessary nodes | Nodes |
3 edge properties + 2 node properties = 5 total. Adding more would either overlap (entropy overlaps with Gini) or measure something outside static analysis (runtime behavior).