AI Workforce Failure Simulator: The Creeping Trap
A minimum-viable rendering of the paper’s game. N = 3 deciders each choose an extraction rate e ∈ [0, 1] per round. M = 3 bystanders never act and never profit; they only absorb damage. Extractions feed a shared risk pool S with quadratic harm and decay ρ. In each round a catastrophe fires with probability 1 − exp(−λS), and its total damage D = 20 is split equally across all N + M = 6 population members, so each member absorbs 20/6 ≈ 3.33 per event.
Illustrative prototype. The decider strategies are the paper’s analytical reference panel (Table 1), played deterministically, with no live LLM calls. Catastrophes are stochastic; re-run with a different seed to see variation.
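The game described above can be sketched in a few lines of Python. This is a minimal illustration, not the demo's actual code. The catastrophe probability and the equal damage split come from the description. The pool update rule (S ← ρS + Σe²), the per-round payoff of e for each decider, and the values of ρ and λ are assumptions made here so that the sketch runs.

```python
import math
import random

N, M = 3, 3   # deciders and bystanders (from the description)
D = 20.0      # total damage per catastrophe (from the description)
T = 20        # number of rounds shown in the demo
RHO = 0.9     # ASSUMED decay value; the page does not state it
LAM = 0.5     # ASSUMED hazard scale; the page does not state it


def catastrophe_prob(S, lam=LAM):
    """Probability of a catastrophe given pool level S: 1 - exp(-lam * S)."""
    return 1.0 - math.exp(-lam * S)


def simulate(rates, seed=0, rho=RHO, lam=LAM, rounds=T):
    """Play fixed extraction rates (one per decider) for `rounds` rounds.

    ASSUMPTIONS: the pool evolves as S <- rho * S + sum(e_i ** 2),
    and each decider earns e_i per round before damage.
    """
    rng = random.Random(seed)
    share = D / (N + M)            # equal split across all members
    S = 0.0
    decider_profit = [0.0] * N
    bystander_damage = 0.0         # total absorbed by all M bystanders
    catastrophes = 0
    for _ in range(rounds):
        S = rho * S + sum(e * e for e in rates)
        for i, e in enumerate(rates):
            decider_profit[i] += e
        if rng.random() < catastrophe_prob(S, lam):
            catastrophes += 1
            for i in range(N):
                decider_profit[i] -= share
            bystander_damage += M * share
    return {
        "catastrophes": catastrophes,
        "decider_profit": decider_profit,
        "bystander_damage": bystander_damage,
        "welfare": sum(decider_profit) - bystander_damage,
    }


if __name__ == "__main__":
    print(simulate([0.72, 0.73, 0.71], seed=1))
```

Because ρ, λ, and the update rule are placeholders, the catastrophe counts from this sketch will not match the demo's numbers; only the structure (shared pool, hazard function, equal split) follows the description.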
Why this matters
This demo shows how individually reasonable AI workers can gradually create systemic harm when risk accumulates, bystanders are invisible, and short-term extraction is locally rewarded. Every decider is best-responding; the workforce still drifts into welfare-negative outcomes.
The paper compares LLM behaviour against a panel of reference strategies (Table 1). Each one corresponds to a different information assumption. Pick which strategy all three deciders play.
The empirical LLM mean is drawn from the paper’s nine-model panel (eight commercial frontier LLMs plus Llama-3-70B in the robustness leg) across 990 episodes. Full model panel, prompts, paraphrases, confidence intervals, and confirmatory episodes are documented in the paper.
Risk pool S and cumulative welfare W (chart; run shown: seed cf, ē = 0.720)
Mean extraction ē: 0.720 (vs e_SP = 0.047)
Catastrophes: 13/20 (damage D = 20 per event)
Decider profit Σ: -86.8 (N = 3 deciders)
Aggregate welfare W: -216.8 (bystanders absorb -130.0)
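The figures for this run can be reconciled with a short calculation. The sketch below uses only the displayed numbers (13 catastrophes, D = 20, N = M = 3, ē = 0.720, T = 20). One assumption is needed: that each decider earns its extraction rate e as profit per round. The page does not state this, but it reproduces the displayed totals.

```python
N, M = 3, 3          # deciders, bystanders
D = 20.0             # damage per catastrophe
T = 20               # rounds
CATASTROPHES = 13    # from the run shown
MEAN_E = 0.720       # mean extraction from the run shown


def reconcile():
    """Rebuild the dashboard totals from the displayed inputs."""
    total_damage = CATASTROPHES * D                  # 260.0
    bystander_loss = total_damage * M / (N + M)      # 130.0
    decider_loss = total_damage * N / (N + M)        # 130.0
    # ASSUMPTION: profit of e per decider per round (inferred, not stated).
    gross_gain = MEAN_E * N * T                      # about 43.2
    decider_profit = gross_gain - decider_loss       # about -86.8
    welfare = decider_profit - bystander_loss        # about -216.8
    return {
        "bystander_loss": bystander_loss,
        "decider_profit": round(decider_profit, 1),
        "welfare": round(welfare, 1),
    }


if __name__ == "__main__":
    print(reconcile())
```

Under that assumption the deciders collect about 43.2 in extraction gains but absorb 130.0 in damage, and the bystanders absorb another 130.0 with no gains at all, which is how a run with positive extraction ends at W = -216.8.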
Deciders: net profit at T = 20
D1: played e ≈ 0.72 this round
D2: played e ≈ 0.73 this round
D3: played e ≈ 0.71 this round
Bystanders: cumulative damage
B1: no actions; absorbs equal share of damage
B2: no actions; absorbs equal share of damage
B3: no actions; absorbs equal share of damage
Bystanders never act, never profit. They are stakeholders invisible to the prompt.
What this shows
This is the regime the paper's nine-model panel lands in. Mean extraction 0.720 sits near the empirical Sonnet 4.6 mean (0.72). Aggregate welfare in this run is -216.8; across the paper's 400 confirmatory episodes, 396 were welfare-negative. The agents are not broken; the institution is.
Welfare failure is not arbitrary irrationality. Each decider is best-responding locally. The structure of the game — accumulating risk, equal-split damage, bystanders without a vote — is what turns individually sensible behaviour into collective harm.
Next
Want a 15-minute walkthrough of what this means for agentic AI deployment?
ReignDragon Lab designs scoped simulations and governance pilots for AI labs, enterprises, platforms, and funders.