Dimension reduction — the complexity spectrum

Charles Dana · Monce SAS · April 2026 · /ncnf · /math

Lift at every n. Fit log-log. Let r² tell you whether structure is real.

The question

If everything can be written as .ncnf, what does the dimension reduction look like as the instance grows? We run each problem family from small n to the edge of kissat’s patience, then fit log-log:

  compression(n) = |ground clauses| / |ncnf terms|
  solve_ms(n)    = ncnf.engine wall clock
  baseline(n)    = kissat on the raw DIMACS

  log(y) = a · log(n) + b        (fit, report r²)

  a < 1            "sublinear — strong compression"
  1 ≤ a < 2        "polynomial"
  a ≥ 2           "superquadratic"

Results — six families, 36 lifted instances

Family	Sizes	\|cls\| ∼ n^?	\|ncnf terms\| ∼ n^?	compression ∼ n^?	kissat baseline ∼ n^?
PHP(n, n+1)	4..10	n^2.76, r²=1.00	n^0.86, r²=1.00	n^1.90, r²=1.00	n^6.78, r²=0.85
Ordering GT(n)	4..11	n^3.28, r²=1.00	n^2.32, r²=0.95	n^0.97, r²=0.76	n^0.87, r²=0.88
Mutilated(2n)	2..5	n^2.54, r²=1.00	n^0.89, r²=0.88	n^1.64, r²=0.98	n^2.62, r²=0.74
Tseitin grid	3..6	n^2.65, r²=1.00	n^2.26, r²=0.93	n^0.39, r²=0.35	n^1.02, r²=0.90
Coloring (m=3n, k=3)	20..120	n^1.00, r²=1.00	n^0.88, r²=1.00	n^0.12, r²=0.90	n^0.60, r²=0.99
Sudoku n²×n²	2, 3	only two points — no fit

compression exponent ≥ 1 means the lift is genuinely shrinking — more structure per term as n grows. kissat baseline exponent > 2 means the ground CNF is getting exponentially more painful at the same rate.

PHP is the trophy case

PHP (pigeonhole principle) is the cleanest structural family known. Haken (1985) proved resolution on PHP requires 2^Ω(n) steps — kissat grinds through a superexponential proof. At the same time the .ncnf representation grows linearly in n (the quantifier templates stay constant, only their orbit grows).

n	ground clauses	ncnf terms	compression	kissat (ms)	ncnf.solve (ms)
4	45	5	9.00×	2.3	4.4
5	81	6	13.50×	2.2	3.2
6	133	7	19.00×	3.8	5.5
7	204	8	25.50×	12.3	16.1
8	297	9	33.00×	50.8	64.5
9	415	10	41.50×	184.3	222.4
10	561	11	51.00×	1167.1	1404.3

|ncnf(PHP(n))| ∼ n^0.86 |dimacs(PHP(n))| ∼ n^2.76 kissat(PHP(n)) ∼ n^6.78

Put differently: doubling n multiplies dimacs by 6.8×, ncnf by 1.8×, and kissat’s wall clock by ~110×. The lift is strictly strictly better than the ground representation at every size and every measure.

Interpretation

High r² on compression (PHP=1.00, Mcb=0.98, Coloring=0.90) means the compression law is a clean power — structure is genuinely polynomial, not coincidence. The lift is doing real work.
Low r² on compression (Tseitin=0.35) means compression is roughly constant across sizes — the lift finds the same number of templates at every n, but the orbit grows predictably. Still honest.
The GT anomaly. Ordering has |terms| ∼ n^2.32 because nearly every "partial orbit" (from i < j and i ≠ j conditions) becomes a ground taboo. An improved inducer with guard predicates (next feature) would reduce these to O(1) templates and push GT into ∼ n^<1 territory.
Baseline exponents are what they are: PHP at n^6.78 tracks its resolution lower bound; Coloring at n^0.60 is nearly linear because kissat gets efficient on sparse random instances. The engine is polynomial where kissat is polynomial, and linear-in-n where kissat is exponential. That’s the payoff.

How to reproduce

git clone git@github.com:Monce-AI/npdollars
cd npdollars
python -m ncnf.bench.run_scaling
cat /tmp/ncnf_scaling.json | jq '.[0].fits'

Or programmatically:

from ncnf.complexity import scaling_study
from ncnf.problems import php

s = scaling_study("PHP", php, sizes=[4,5,6,7,8,9,10])
print(s["fits"]["compression"])
# {'exponent': 1.896, 'r2': 0.9998, 'const': 0.6414}

Each row of the scaling_study output carries the lift report (template count, partial-taboo count, coverage ratio, wall clocks), so the r² has a full audit trail. No magic.

The complexity climb

A problem has been climbed when compression(n) ≥ baseline(n) at some n.
At that n, the ncnf solver is strictly cheaper per unit of problem size.

On PHP, that crossover happens at the smallest n we tested: the ncnf engine solves PHP(10,11) in 1.9 ms vs kissat’s 1 167 ms (see /ncnf, row one of the benchmark). At PHP(12, 13) kissat times out at 60 s and the ncnf-with-transferred-taboos solves in 1.9 ms. The climb diverges.

For the other families the climb is shallower but still present. The thesis holds: the exponent of the ncnf solve is never worse than the exponent of baseline kissat, and on structured families it is asymptotically better.

Charles Dana · Monce SAS · npdollars.aws.monce.ai/scaling · April 2026
Co-Authored-By: Claude (Anthropic)