Root Cause Metrics

Theory: Newman 2004, graph community detection.

Measures how well the dependency graph decomposes into independent clusters. Compares actual intra-module edge density against a random graph with the same degree sequence.

Q = (1/m) × Σ [A_ij - k_out_i × k_in_j / m] × δ(c_i, c_j)
where m is the total edge count, A_ij the adjacency matrix, k_out_i and k_in_j the out- and in-degrees, and δ(c_i, c_j) = 1 when nodes i and j share a cluster.
  • Range: [-0.5, 1.0]. Q > 0.3 indicates significant modular structure.
  • Ungameable: Adding useless edges moves the graph closer to random, which decreases Q. Only genuine modular restructuring improves Q.
  • Language-fair: Works on ANY graph. Uses both import edges and call edges.
  • Replaces: coupling + cohesion + god files + hotspots (all symptoms of low Q).
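
The formula above can be sketched directly. A minimal Python sketch, assuming dependency edges arrive as (src, dst) pairs and cluster assignments are precomputed (how clusters are found is out of scope here); the O(n²) pair loop is for clarity, not scale:

```python
from collections import defaultdict

def directed_modularity(edges, community):
    """Directed modularity Q = (1/m) * sum[A_ij - k_out_i * k_in_j / m] * delta(c_i, c_j).

    edges:     list of (src, dst) dependency edges.
    community: dict mapping node -> cluster id.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    k_out, k_in, a = defaultdict(int), defaultdict(int), defaultdict(int)
    for u, v in edges:
        k_out[u] += 1   # out-degree of the source
        k_in[v] += 1    # in-degree of the target
        a[(u, v)] += 1  # adjacency count A_ij
    q = 0.0
    nodes = list(community)
    for u in nodes:
        for v in nodes:
            if community[u] == community[v]:  # delta(c_i, c_j)
                q += a[(u, v)] - k_out[u] * k_in[v] / m
    return q / m
```

On two tight two-node clusters this yields Q = 0.5; adding a cross-cluster edge to the same graph lowers Q, illustrating the ungameability claim above.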

Theory: Martin 2003, Acyclic Dependencies Principle.

Measures absence of circular dependencies. Cycles make build order undefined, change propagation unpredictable, and testing difficult.

  • Computation: Tarjan’s SCC algorithm counts strongly connected components with >1 member.
  • Normalization: score = 1 / (1 + cycle_count) — a reciprocal decay that maps the unbounded count into (0, 1].
  • Fundamental: A depends on B depends on A — neither can be understood or tested independently.
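
The computation and normalization above can be sketched as follows; a minimal recursive Tarjan, assuming the graph is a dict of node → successor list (an iterative variant would be needed for very deep graphs):

```python
def cycle_score(graph):
    """Tarjan's SCC: count strongly connected components with >1 member,
    then normalize as score = 1 / (1 + cycle_count)."""
    index, low = {}, {}
    stack, on_stack = [], set()
    counter = 0
    cycle_count = 0

    def strongconnect(v):
        nonlocal counter, cycle_count
        index[v] = low[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                strongconnect(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of an SCC
            size = 0
            while True:
                w = stack.pop()
                on_stack.discard(w)
                size += 1
                if w == v:
                    break
            if size > 1:  # multi-member SCC = a dependency cycle
                cycle_count += 1

    nodes = set(graph)
    for succs in graph.values():
        nodes.update(succs)
    for v in nodes:
        if v not in index:
            strongconnect(v)
    return 1.0 / (1.0 + cycle_count)
```

A graph with one A↔B cycle scores 0.5; a fully acyclic graph scores 1.0.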

Theory: Lakos 1996, levelization and layered architecture.

Measures the longest dependency chain in the DAG. Deep chains mean a change at the bottom propagates through many layers.

  • Computation: Iterative longest-path DFS from entry points.
  • Normalization: score = 1 / (1 + depth / 8) — midpoint 8 (depth 8 = score 0.5).
  • Independent from Q: A graph can have perfect modularity but still have a chain of 20 modules depending sequentially.
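
A minimal sketch of the iterative longest-path DFS and its normalization, assuming the graph is a DAG given as a dict of node → successor list and depth is counted in edges:

```python
def depth_score(graph, entry_points):
    """Longest dependency chain (in edges) reachable from the entry points
    of a DAG, via iterative post-order DFS with memoization.
    Normalized as score = 1 / (1 + depth / 8), so depth 8 -> 0.5."""
    memo = {}  # node -> longest path length (in edges) starting at node
    for start in entry_points:
        stack = [(start, False)]
        while stack:
            node, expanded = stack.pop()
            if expanded:
                # all successors are memoized by now (post-order)
                memo[node] = max(
                    (memo[s] + 1 for s in graph.get(node, ())), default=0)
            elif node not in memo:
                stack.append((node, True))  # revisit after children
                for s in graph.get(node, ()):
                    if s not in memo:
                        stack.append((s, False))
    depth = max((memo[e] for e in entry_points if e in memo), default=0)
    return 1.0 / (1.0 + depth / 8)
```

An 8-link chain scores exactly 0.5, matching the stated midpoint; an isolated entry point scores 1.0.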

Theory: Gini 1912, originally from economics (wealth inequality).

Measures how evenly complexity is distributed across functions. A codebase where one god function has CC=200 and all others have CC=2 has high Gini. A codebase where all functions have CC=5-10 has low Gini.

Sort values ascending (x_1 ≤ … ≤ x_n, where i is the 1-based rank and n the function count):
G = Σ (2i - n - 1) × x_i / (n × Σ x_i)
  • Score: 1 - G (lower Gini = better equality = higher score).
  • Why it matters: God files are the #1 source of AI agent confusion. When 40% of complexity is in one file, the agent can’t reason about it effectively.
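
The sorted-rank Gini formula above is a few lines of Python; a minimal sketch, assuming per-function cyclomatic complexities arrive as a flat list:

```python
def equality_score(complexities):
    """Gini coefficient G over per-function complexity, scored as 1 - G.

    Uses the sorted-rank form: G = sum((2i - n - 1) * x_i) / (n * sum(x_i)),
    with i the 1-based rank after ascending sort.
    """
    xs = sorted(complexities)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 1.0  # no functions, or all zero complexity: perfect equality
    g = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1)) / (n * total)
    return 1.0 - g
```

A uniform codebase scores 1.0; the god-function case from the text (one CC=200 among CC=2 functions) drops well below 0.3.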

Theory: Kolmogorov complexity — the gap between actual code and minimum equivalent code.

Combines dead functions (unreferenced by any call site) and duplicate functions (identical body hashes).

R = (dead_count + duplicate_count) / total_functions
score = 1 - R
  • Fundamental: Every line of dead or duplicate code is structural waste — increases the AI agent’s search space without contributing to behavior.
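
The R formula above can be sketched as follows, assuming functions arrive as a dict of name → normalized body source and call references as a set of names (both would come from a prior parsing pass):

```python
import hashlib

def redundancy_score(functions, call_sites):
    """R = (dead_count + duplicate_count) / total_functions; score = 1 - R.

    functions:  dict of function name -> normalized body source.
    call_sites: set of function names referenced by any call site.
    """
    total = len(functions)
    if total == 0:
        return 1.0
    dead = sum(1 for name in functions if name not in call_sites)
    seen_hashes = set()
    duplicates = 0
    for body in functions.values():
        h = hashlib.sha256(body.encode()).hexdigest()  # identical-body hash
        if h in seen_hashes:
            duplicates += 1  # every copy after the first counts
        else:
            seen_hashes.add(h)
    return 1.0 - (dead + duplicates) / total
```

With four functions, one unreferenced and one an exact duplicate, R = 2/4 and the score is 0.5.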

A directed graph with attributed nodes has these independent structural properties:

Dimension    What it captures             Property of
----------   ---------------------------  -----------
Modularity   Edge clustering              Edges
Acyclicity   Circular edges               Edges
Depth        Edge chain length            Edges
Equality     Node property concentration  Nodes
Redundancy   Unnecessary nodes            Nodes

3 edge properties + 2 node properties = 5 total. Adding more would either overlap (entropy overlaps with Gini) or measure something outside static analysis (runtime behavior).