P-Time = NP-dollars v8

Charles Dana · Monce SAS · April 2026

npdollars.aws.monce.ai

Abstract

v8 is a universal CNF endpoint with a stable response contract. Every request returns within budget + 3s. Every response carries the same envelope. A deterministic local reducer (UP + pure-lit + failed-literal probing + pair-look-ahead) runs first; then tier-routed engines fire in parallel, racing Kissat against a recursive Lambda swarm. If the engines do not finish in time, the response is marked REDUCED and carries learned backbones and 2-SAT implications — the problem handed back is strictly easier than the one handed in.

Every response learns something. Honest TIMEOUT is reserved for the case where nothing was learned.

1. The Contract

POST /cnf/solve with DIMACS dimacs and a wall-clock budget in seconds. Response fields:

Field	Meaning
`result`	SAT \| UNSAT \| REDUCED \| TIMEOUT
`engine`	kissat \| v6 \| v7 \| local_reduce \| v8_watchdog
`tier`	small \| medium \| large \| xlarge (by input chars)
`tokens_in` / `tokens_out`	bytes (1 byte = 1 token) in and out
`assignment`	Full SAT model if `result=SAT`
`partial_backbones`	Forced literals proven (any result)
`learned_2sat`	Binary implications proven from the input
`cnf`	Always-valid DIMACS body with audit header
`cost_usd` / `total_ms`	Resource accounting

2. The Tier Router

Tier	CNF chars	Strategy
small	< 10 KB	Kissat direct; sub-second
medium	10 KB – 500 KB	Kissat \|\| v6 race, first finisher wins
large	500 KB – 50 MB	v7 supervisor (S3 offload + Lambda swarm)
xlarge	> 50 MB	v7 streaming ingest

3. The Local Reducer — why TIMEOUT is rare

Before any engine fires, v8 runs a deterministic local reducer on the EC2 host:

Unit propagation to fixpoint — O(n+m). Every forced literal becomes a backbone.
Pure-literal elimination — O(n+m). Every variable with only one polarity is set and erased.
Failed-literal probing (FLP) — for each var v, trial v=T under UP: if UP fails, −v is a proven backbone. Guarantees ≥ 0 backbones per pass.
Pair look-ahead — trial v=T ∧ w=T: if UP fails, we learn the binary implication (−v ∨ −w). These are polynomial enrichments: 2-SAT is in P (Aspvall–Plass–Tarjan), so every binary clause learned moves mass from the 3-SAT body toward a 2-SAT core.

If the reducer alone solves or falsifies the formula, we return immediately with engine=local_reduce. Otherwise the residual is passed down to the tier-routed engines, and any backbones / binary clauses already found are seeded into the response snapshot.

4. The Watchdog — bar-passing timeouts

A hard deadline is set at t0 + budget + 3s. The race loop walks the futures; when the deadline hits, it harvests the state of any future that finished between the last check and the cutoff (even tenths of a second before), absorbs its learned clauses and backbones into the snapshot, then shuts down the executor. The response is built from the snapshot, not from a naked TIMEOUT.

The status depends on what the snapshot holds:

SAT / UNSAT — an engine decided
REDUCED — backbones learned, 2-SAT clauses learned, or residual shrunk below the input. The response is a strictly easier formula.
TIMEOUT — none of the above. Honest failure, with engines_tried and last_engine_error exposed.

5. Results

SATLIB (April 2026) — live at /satlib

2 107 instances across 23 categories (uf250, flat200, sw100, AIS, BMS, CBS, QG, BMC, Blocks, Logistics, AIM, LRAN, JNH, DUBOIS, PARITY, II, HANOI, BF, SSA, PHOLE, PRET, GCP, Beijing). Crowdsourced grind via /satlib/claim — anyone can pick an unsolved instance.

Tier (by CNF chars)	Count	Typical wall	Primary engine
small (< 10 KB)	1 599	< 250 ms	kissat fast path
medium (10–500 KB)	466	100 ms – 5 s	kissat \|\| v6 race
large (500 KB – 50 MB)	42	10–40 s	v7 (Lambda swarm + S3)
xlarge (> 50 MB)	0 in SATLIB	—	v7 streaming

On large-tier BMC and QG instances, v8 routinely returns REDUCED with 55–62% clause reduction and thousands of backbones — a strictly easier formula in DIMACS, with a proof trail.

Push mode

Explicit high-budget option on /satlib/claim?push=... for the remainder:

Mode	Budget	Max workers	Cost cap
laptop	240 s	300	$12 / call
extreme	70 s	1 000	$12 / call

Monthly spend ceiling: $50. Status: /satlib/push_status. Below those caps, normal calls pay $0 on kissat fast path, ~$0.000002 on race, and ~$0.003 on typical v7 large-tier REDUCED calls.

Bar tests (April 2026)

Instance	Budget	Result	Backbones	Learned	Reduction	Wall
100v/430cl phase tr.	5 s	UNSAT	—	—	—	0.2 s
200v/852cl phase tr.	10 s	SAT	—	—	—	0.5 s
400v/1704cl phase tr.	3 s	REDUCED	0	7 134 cl	cl −0.88%	1.4 s
400v/1704cl phase tr.	10 s	REDUCED	0	292 062 cl	cl −14.4%	8.4 s
800v/3408cl phase tr.	60 s	REDUCED	1	1 285 616 cl	cl −18.2%	43 s
qg5-11 (984 KB)	30 s	REDUCED	507	—	cl −55.5%	31 s
bmc-ibm-6 (6.9 MB)	30 s	REDUCED	26 426	—	cl −55.2%	76 s

6. Slow Trail — learning after the response

After every /cnf/solve the supervisor sprays up to 50 async Lambda invocations (fire-and-forget) at a fleet capped at 500 concurrent LogicSpace workers. These workers keep learning from the instance after we return — writing backbones and learned clauses into the session's S3 dictionary. The caller gets a fast answer; the dictionary grows in the background, so the next query on a related instance is cheaper. This is the non-blocking trail from which the budget + 3s guarantee is extracted.

6b. .ncnf — structure that amortizes

npdollars v8 returns a strictly easier DIMACS formula on REDUCED. .ncnf extends that contract one level up: the solver output lives in the same grammar as the input, and every UNSAT proof the swarm generates is lifted into a quantified taboo appended to the problem. The signature of the whole library is .ncnf → .ncnf with more taboos — monotone, composable, closed. Benchmarked on seven instances with kissat 4.0.4 as the ground oracle:

Problem	Result	Baseline	Enriched	Self×	Induced	Transfer
PHP(10, 11) → PHP(12, 13)	UNSAT	1220 ms	1.9 ms	642×	1	31 583×
Mutilated 10×10 → 12×12	UNSAT	27.5 ms	2.0 ms	13.8×	17	224×
Graph coloring 120 / 360 / k=3	UNSAT	7.9 ms	1.8 ms	4.4×	2	5.0×
Tseitin grid 6×6 → 7×7	UNSAT	3.9 ms	1.7 ms	2.3×	7	9.2×
Ordering GT(12) → GT(15)	UNSAT	5.7 ms	2.3 ms	2.5×	58	4.3×
Sudoku 9×9	SAT	2.5 ms	2.6 ms	1.0×	0	—
Pythagorean(200) → (400)	SAT	1.7 ms	1.7 ms	1.0×	0	1.1×

PHP(12, 13) baseline is a 60 s TIMEOUT; with one lifted taboo from PHP(10, 11) it closes in 1.9 ms. Structure paid once, re-used forever. See /ncnf for the manifesto and POST /ncnf/solve for the live endpoint.

6c. cnf → ncnf — the inverse arrow

POST /ncnf/lift takes flat DIMACS plus an instance_size hint, picks a schema (explicit or inferred by factoring n_vars), and returns an indexed .ncnf whose grounding equals the input clause set byte-for-byte. The lift is polynomial (O(n_clauses × max_arity × orbit_size)) under user-supplied arity and budget caps. Verified round-trip on seven families: PHP, GT, Mutilated, Tseitin, Coloring, Sudoku 4×4, Adder sum.

Pure-∀ problems lift to 100% template coverage (every orbit is full). Mixed problems with index conditions (i < j, box cells) split into templates plus origin:lifted:partial ground taboos; round-trip still holds because partial-orbit clauses are preserved verbatim.

6d. Scaling — log-log fits with r²

ncnf.complexity.scaling_study sweeps a family across sizes, measures ground clauses, ncnf terms, lift time, kissat baseline, and engine solve, then fits log-log and reports slope + r². A high r² means the scaling law is a clean power — the structure is genuine, not coincidence. Live at /scaling; full tables for six families, with POST /ncnf/scaling for any registered family.

Family	compression ∼ n^?	kissat baseline ∼ n^?	compression r²
PHP	n^1.90	n^6.78	1.000
MutilatedChessboard	n^1.64	n^2.62	0.983
OrderingPrinciple	n^0.97	n^0.87	0.758
Tseitin grid	n^0.39	n^1.02	0.352
Coloring (m=3n)	n^0.12	n^0.60	0.900

6e. Adder sum — sum of 1M bits in 1.2 KB

POST /ncnf/adder {N, target, solve} builds a ripple-adder SAT instance that asserts “sum(b[1..N]) == target”. Five templates, independent of N. The description file stays ~1 KB for every N up to one million.

N	.ncnf description	Ground clauses	kissat	Compression
100	~810 B	5 207	2 ms SAT	6×
10 000	~1 020 B	1 010 014	260 ms SAT	960×
1 000 000	~1.2 KB	140 000 000	(description only)	117 000×

6f. /flex-10k — sum any list of integers

POST /flex-10k/solve {values:[int], target?:int} builds the SAT instance for “sum(values) == target” with per-value bit-vectors, grounds, and returns the kissat verdict plus the reconstructed value list (verified by matching the input).

Stress test on the public API (random integers ∈ [0, 999]):

K	clauses	DIMACS	kissat	round-trip	reconstruction
10	1 853	~20 KB	1.2 ms	51 ms	✓
1 000	268 019	~6.4 MB	49.8 ms	837 ms	✓
10 000	3 240 023	780 MB	626 ms	9.7 s	✓

Off-public benchmarks: K=50 000 solves in 3.6 s; K=100 000 (1.14 GB DIMACS) solves in 7.9 s. Fits on K∈{10, 100, 1k, 3k, 10k}:

|clauses|(K) ∼ K^1.08 r²=1.000
kissat(K) ∼ K^0.92 r²=0.987 — sublinear

7. Soundness

The local reducer is sound by construction:

UP: if clause = {L}, every model has L.
Pure lit: if v occurs only positively, setting v=T satisfies every clause containing v and falsifies none.
FLP: if UP on (F ∧ v) derives ⊥, then −v is a consequence of F.
Pair FLP: if UP on (F ∧ v ∧ w) derives ⊥, then (−v ∨ −w) is a consequence of F.

Every clause in a REDUCED response is a provable consequence of the input. The returned CNF, concatenated with the original, is equisatisfiable with the original.

8. Playground — bit-blasted arithmetic inversion

/playground turns the solver into a live inverse calculator. Declare variables over k bits, write assertions using + − ∗ // % ^ & | ~ << >> and the six comparisons, and the API (POST /playground/solve) compiles a Tseitin-encoded CNF, solves with Kissat, and decodes antecedents back to integers. Every gate is equisatisfiable, so sub-circuits compose via CNFModule.import_module() — the seam where SHA-256 / AES S-box / CRC32 circuits plug in next.

Measured scaling on t3.medium + Kissat 4.0.4 (2026-04-24). Every row is a live HTTP call to the public API. "Verified" means the returned antecedent was checked to satisfy the original integer equation; 64-bit semiprimes with p, q > 2³² are the hard end of the bit-blasted factoring regime.

problem	width	CNF (vars / clauses)	encode	solve	verified
factor 77 = 11·7	16b	3 041 / 10 117	9 ms	19 ms	✓
factor 10 403 = 101·103	20b	4 681 / 15 605	13 ms	46 ms	✓
factor 1 000 001 = 101·9 901	24b	6 673 / 22 277	17 ms	79 ms	✓
factor 4 295 229 443 = 65 537·65 539	40b	18 161 / 60 805	92 ms	131 ms	✓
factor 1.10×10¹² semiprime	48b	26 017 / 87 173	118 ms	1 587 ms	✓
factor 2.81×10¹⁴ semiprime	56b	35 281 / 118 277	144 ms	150 ms	✓
factor 7.21×10¹⁶ semiprime	64b	45 953 / 154 117	227 ms	197 ms	✓

XOR+ADD inversion scales linearly. No multiplication means the CNF grows as O(n) in bit-width. At 256 bits the public API inverts (x XOR K₁) + K₂ = y in 13 ms.

bit-width	vars	clauses	encode	solve
32b	385	1 124	1.1 ms	3 ms
64b	769	2 244	2.1 ms	4 ms
128b	1 537	4 484	3.8 ms	7 ms
192b	2 305	6 724	5.7 ms	10 ms
256b	3 073	8 964	7.7 ms	13 ms

Multiply stress — crossing 100k variables. Multiplication is the quadratic-clause engine (Θ(n²) partial products). The public endpoint compiles a 96-bit multiplier at 102 721 vars / 344 837 clauses in 456 ms and Kissat finds a satisfying assignment in 430 ms. A 128-bit multiplier pushes 182 017 vars / 611 333 clauses and still returns in under a second.

multiplier width	vars	clauses	encode	solve
32b	11 713	39 173	76 ms	48 ms
48b	26 017	87 173	118 ms	108 ms
64b	45 953	154 117	177 ms	193 ms
80b	71 521	240 005	364 ms	301 ms
96b	102 721	344 837	456 ms	430 ms
128b	182 017	611 333	895 ms	778 ms

Scope and honesty. Factoring arbitrary 256-bit RSA moduli is not within reach: the multiplier CNF for two 128-bit factors has ~600k clauses and kissat terminates in under a second only when the target is structured (small factors, known primality, trivial assignments). The playground is for: teaching SAT, cracking reduced ciphers, solving Diophantine constraints, enumerating antecedents of custom circuits, exploring the encoder/solver boundary. It is not a cryptographic attack toolkit.

9. npdollars vs AUMA — head-to-head

Same problem, two engines from the same author, live on the public APIs: npdollars bit-blasts f to CNF and solves with Kissat — returning an exact antecedent or an UNSAT proof. AUMA (auma.aws.monce.ai) treats f as an objective and does Fourier probe + greedy-flip + simulated annealing within a polynomial evaluation budget n^a. Both are legitimate. They win on different problem classes. Harness: tools/vs_auma.py.

problem	class	npdollars	AUMA (best a)	winner
factor 77	factoring	✓ 28 ms exact	✗ a=2.5, val=0 but misses branch	npdollars only
factor 10 403 = 101·103	factoring	✓ 58 ms exact	✗ no a hits zero	npdollars only
factor 1 000 001 = 101·9 901	factoring	✓ 126 ms exact	✗ 21 952 evals, misses	npdollars only
Pythag c = 65 (all triples)	diophantine	✓ 135 ms, returns 4 triples	✗ finds 1 near, not exact	npdollars only
modsqrt x²≡16 (mod 101)	modular	✓ 101 ms, returns both roots	✓ a=2.5 at 0.4 ms	AUMA (toy domain)
Diophantine 3x + 5y = 41	diophantine	✓ 40 ms	✓ a=2.5 at 1.2 ms	AUMA (toy domain)
Ramanujan 1729 = x³+y³	sum-of-cubes	✓ 126 ms, exact	✓ a=2.0 at 0.3 ms	AUMA (13×13 grid)
x² ≈ 42 (continuous)	real	n/a (no floats)	✓ a=2.0	AUMA only (diff. problem class)
\|z²+1\| = 0 (complex)	continuous	n/a	✓ a=2.0 at 2 ms	AUMA only

Scoreboard on discrete problems: npdollars 4 exclusive wins, AUMA 3 toy-grid wins, 2 real-number problems are outside npdollars' semantics. Every factoring problem and every exhaustive-enumeration problem goes to npdollars alone. AUMA's wins are in domains of 10²–10³ integer grid points where exhaustive search is essentially free — AUMA is fast because the space is small, not because it's the right tool.

Three things only npdollars does:

UNSAT proofs. AUMA has no "nonexistence" concept — it can only report the best x found. npdollars on a 25-bit prime certifies no factor pair exists in 607 ms. That is a qualitatively different output.
All-antecedent enumeration. Blocking clauses give every distinct antecedent within budget. AUMA returns one point. Pythag c=65 → npdollars returns all four primitive and scaled triples.
Scale separation on factoring. npdollars factors a 64-bit verified semiprime on the public API in 197 ms (45 953 vars / 154 117 clauses). AUMA's grid search explodes as θ(n^a) over integer ranges this large and cannot hit the exact factor pair at any budget tested.

AUMA's exclusive territory is continuous optimization. Smooth real objectives (polynomial roots, modular gradient, Riemann-style near-zeros) are where Fourier probe + SA shines and where bit-blasting hits its floor. The two projects are complementary: AUMA is the right tool when the domain is dense and the objective is smooth; npdollars is the right tool when the domain is discrete and the answer must be exact.

10. Connection to Prior Work

Local reducer draws on LogicSpace (Dana, 2026) for the DualTree / PolySAT primitives that power v6's recursive worker. Tier routing and the tokens_in → tokens_out observability are specific to v8. Theoretical foundation: Dana Theorem (2024, indicator → CNF in poly time), AUMA (Dana, 2023).

Charles Dana · Monce SAS · npdollars.aws.monce.ai · April 2026
Co-Authored-By: Claude (Anthropic)