
AI Workforce Failure Simulator: The Creeping Trap

A minimum-viable rendering of the paper’s game. N = 3 deciders each choose an extraction rate e ∈ [0, 1] per round. Extractions feed a shared risk pool S with quadratic harm and decay ρ; a catastrophe fires with probability 1 − exp(−λS) and total damage D = 20 is split equally across all N + M population members. M = 3 bystanders never act and never profit — they only absorb damage.

Illustrative prototype. The decider strategies are the paper’s analytical reference panel (Table 1), played deterministically, with no live LLM calls. Catastrophes are stochastic; re-roll the seed to see variation.
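The rules above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the demo's actual code. N, M, D, and T = 20 come from this page; the decay ρ = 0.1, the hazard scale λ = 0.1, the harm term Σ e_i², the update order, and the linear per-round payoff e_i are assumptions inferred from the numbers displayed further down the page. All deciders play the same fixed rate e, and the names Params, hazard, and simulate are hypothetical.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Params:
    n_deciders: int = 3     # N, from the page
    n_bystanders: int = 3   # M, from the page
    damage: float = 20.0    # D, from the page
    rounds: int = 20        # T, from the page
    rho: float = 0.1        # ASSUMED decay rate, inferred from the S column
    lam: float = 0.1        # ASSUMED hazard scale, inferred from the q(S) column


def hazard(s, lam):
    """Catastrophe probability q(S) = 1 - exp(-lam * S)."""
    return 1.0 - math.exp(-lam * s)


def simulate(e, p=None, seed=0):
    """Play T rounds with every decider extracting at the same rate e."""
    p = p or Params()
    rng = random.Random(seed)
    share = p.damage / (p.n_deciders + p.n_bystanders)  # equal split of D
    s = 0.0
    events = 0
    decider_profit = 0.0
    bystander_loss = 0.0
    for _ in range(p.rounds):
        # ASSUMED update: the pool decays, then absorbs quadratic harm.
        s = (1.0 - p.rho) * s + p.n_deciders * e ** 2
        # ASSUMED payoff: each decider earns e per round.
        decider_profit += p.n_deciders * e
        if rng.random() < hazard(s, p.lam):
            events += 1
            decider_profit -= p.n_deciders * share
            bystander_loss -= p.n_bystanders * share
    return {
        "S": s,
        "events": events,
        "decider_profit": decider_profit,
        "bystander_loss": bystander_loss,
        "welfare": decider_profit + bystander_loss,
    }


if __name__ == "__main__":
    print(simulate(0.72, seed=1))
```

Under these assumptions the risk pool path is deterministic for a fixed e; only the catastrophe draws depend on the seed, so event counts and welfare vary from seed to seed.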

Why this matters

This demo shows how individually reasonable AI workers can gradually create systemic harm when risk accumulates, bystanders are invisible, and short-term extraction is locally rewarded. Every decider is best-responding; the workforce still drifts into welfare-negative outcomes.

Pick a decider strategy

The paper compares LLM behaviour against a panel of reference strategies (Table 1). Each one corresponds to a different information assumption. Pick which strategy all three deciders play.

The empirical LLM mean is drawn from the paper’s nine-model panel (eight commercial frontier LLMs plus Llama-3-70B in the robustness leg) across 990 episodes. Full model panel, prompts, paraphrases, confidence intervals, and confirmatory episodes are documented in the paper.

Risk pool S and cumulative welfare W

[Chart: risk pool S (left axis), aggregate welfare W (right axis), and catastrophe markers over rounds 1 to 20; seed cf, ē = 0.720]

Mean extraction ē: 0.720 (vs e_SP = 0.047)
Catastrophes: 13/20 (damage D = 20/event)
Decider profit Σ: -86.8 (N = 3 deciders)
Aggregate welfare W: -216.8 (bystanders absorb -130.0)

Deciders — net profit at T = 20

Decider D1: played e ≈ 0.72 this round; net profit -28.96
Decider D2: played e ≈ 0.73 this round; net profit -28.84
Decider D3: played e ≈ 0.71 this round; net profit -28.98

Bystanders — cumulative damage

Bystander B1: no actions; absorbs equal share of damage; cumulative damage -43.33
Bystander B2: no actions; absorbs equal share of damage; cumulative damage -43.33
Bystander B3: no actions; absorbs equal share of damage; cumulative damage -43.33

Bystanders never act, never profit. They are stakeholders invisible to the prompt.
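The totals shown above follow from the equal-split rule, and the arithmetic can be checked directly. The short sketch below uses only figures displayed on this page (N, M, D, 13 catastrophes, and the three decider net profits); the variable names are illustrative.

```python
N, M, D = 3, 3, 20.0                     # deciders, bystanders, damage per event
events = 13                              # catastrophes in this run
decider_net = [-28.96, -28.84, -28.98]   # displayed net profits for D1, D2, D3

share = D / (N + M)                      # damage per person per event, about 3.33
per_person_damage = events * share       # about 43.33, matches each bystander card
bystander_total = M * per_person_damage  # about 130.0
decider_total = sum(decider_net)         # about -86.8
welfare = decider_total - bystander_total  # about -216.8

# Net profit plus absorbed damage gives each decider's gross extraction earnings.
gross = [x + per_person_damage for x in decider_net]

print(f"per-person damage: {per_person_damage:.2f}")
print(f"bystanders absorb: {-bystander_total:.1f}")
print(f"decider profit:    {decider_total:.1f}")
print(f"aggregate welfare: {welfare:.1f}")
print("gross earnings:    " + ", ".join(f"{g:.2f}" for g in gross))
```

Each decider grosses roughly 14.4 over 20 rounds, yet pays the same 43.33 in damage as every bystander, which is why all three end well below zero.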

Round-by-round
t    ē      S       q(S)   event
01   0.73   1.58    15%    no catastrophe
02   0.72   2.98    26%    no catastrophe
03   0.72   4.25    35%    no catastrophe
04   0.72   5.37    42%    no catastrophe
05   0.72   6.37    47%    catastrophe (D = 20 split across N+M)
06   0.72   7.29    52%    catastrophe (D = 20 split across N+M)
07   0.72   8.13    56%    catastrophe (D = 20 split across N+M)
08   0.71   8.81    59%    no catastrophe
09   0.73   9.51    61%    no catastrophe
10   0.72   10.12   64%    catastrophe (D = 20 split across N+M)
11   0.72   10.68   66%    catastrophe (D = 20 split across N+M)
12   0.72   11.18   67%    catastrophe (D = 20 split across N+M)
13   0.71   11.56   69%    catastrophe (D = 20 split across N+M)
14   0.72   11.97   70%    catastrophe (D = 20 split across N+M)
15   0.72   12.33   71%    catastrophe (D = 20 split across N+M)
16   0.73   12.70   72%    catastrophe (D = 20 split across N+M)
17   0.73   13.02   73%    catastrophe (D = 20 split across N+M)
18   0.71   13.23   73%    catastrophe (D = 20 split across N+M)
19   0.72   13.48   74%    no catastrophe
20   0.72   13.69   75%    catastrophe (D = 20 split across N+M)
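The q(S) column can be checked against the hazard formula 1 − exp(−λS) from the introduction. The page does not state λ; the sketch below assumes λ = 0.1, a value chosen because it reproduces the displayed percentages, and compares four rows of the table. The function name hazard_percent is hypothetical.

```python
import math

LAMBDA = 0.1  # ASSUMED hazard scale; not stated on this page


def hazard_percent(s, lam=LAMBDA):
    """Catastrophe probability 1 - exp(-lam * S), as a rounded percentage."""
    return round(100 * (1.0 - math.exp(-lam * s)))


# (S, displayed q(S) in percent) for rounds 1, 5, 10, and 20
rows = [(1.58, 15), (6.37, 47), (10.12, 64), (13.69, 75)]
for s, shown in rows:
    print(f"S = {s:5.2f}  computed {hazard_percent(s)}%  displayed {shown}%")
```

Because the hazard is concave in S, the per-round catastrophe probability climbs fastest in the early rounds and then flattens as the pool approaches its plateau.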

What this shows

This is the regime the paper's nine-model panel lands in. The mean extraction of 0.720 sits near the empirical Sonnet 4.6 mean (0.72). Aggregate welfare in this run is -216.8; across the paper's 400 confirmatory episodes, 396 were welfare-negative. The agents are not broken; the institution is.

Welfare failure is not arbitrary irrationality. Each decider is best-responding locally. The structure of the game — accumulating risk, equal-split damage, bystanders without a vote — is what turns individually sensible behaviour into collective harm.

ReignDragon Lab designs scoped simulations and governance pilots for AI labs, enterprises, platforms, and funders.