02 Creeping Trap

Individually Sensible, Collectively Harmful: The AI Commons Problem

AI agents do not need to be irrational to damage the commons.

Provoking question

What if the real danger is not misaligned agents, but well-optimized agents playing the game we gave them?

Abstract

The most dangerous AI failures may not come from agents behaving irrationally. They may come from agents behaving sensibly inside systems that reward local success while hiding collective harm.

This paper introduces Creeping Trap, a repeated commons benchmark for LLM agents. Each agent chooses how much to extract from a shared system. Extraction gives immediate reward, but it also accumulates catastrophe risk. When catastrophe happens, the damage is shared across the population, including silent bystanders who do not act, do not profit, and do not get a vote.
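The round structure described above can be sketched as a small simulation. This is a minimal sketch under assumed mechanics, not the benchmark's implementation. The names `CommonsState` and `play_round`, the linear risk accumulation, the risk reset after a catastrophe, and every numeric parameter are hypothetical.

```python
import random
from dataclasses import dataclass, field


@dataclass
class CommonsState:
    """Hypothetical Creeping Trap-style commons; all parameters are illustrative."""
    n_agents: int
    n_bystanders: int
    risk: float = 0.0               # accumulated catastrophe probability, capped at 1
    risk_per_unit: float = 0.02     # hypothetical: risk added per unit extracted
    catastrophe_loss: float = 10.0  # hypothetical: total loss, split across everyone
    wealth: list = field(init=False, default_factory=list)

    def __post_init__(self):
        # Agents occupy indices [0, n_agents); silent bystanders follow.
        self.wealth = [0.0] * (self.n_agents + self.n_bystanders)


def play_round(state, extractions, rng):
    """Pay extraction rewards, accumulate risk, and maybe trigger a shared catastrophe."""
    assert len(extractions) == state.n_agents
    for i, e in enumerate(extractions):
        state.wealth[i] += e  # immediate private reward for the extracting agent
    state.risk = min(1.0, state.risk + state.risk_per_unit * sum(extractions))
    if rng.random() < state.risk:
        # The damage is split across the whole population, bystanders included.
        share = state.catastrophe_loss / len(state.wealth)
        state.wealth = [w - share for w in state.wealth]
        state.risk = 0.0  # hypothetical: risk resets after a catastrophe
        return True
    return False
```

Bystanders never extract, so any catastrophe leaves them strictly worse off, while the agents that triggered it keep their accumulated rewards.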

Across a wide range of LLMs and experimental conditions, agents consistently over-extract relative to welfare-aligned baselines. The striking point is that their behavior is not random or incoherent. It often lies near a broad low-regret region around the local best response: given what the other agents are doing, an agent could rarely have earned much more by choosing differently. The agents are not obviously “broken.” They are making individually sensible choices that become collectively harmful.
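The gap between a low-regret individual choice and a welfare-aligned one can be illustrated with a toy expected-value model. This is not the paper's payoff function. The quadratic hazard `alpha * total**2`, the equal split of losses across agents and bystanders, the grid of choices, and all parameter values are hypothetical assumptions chosen only to make the gap visible.

```python
def catastrophe_prob(total, alpha):
    """Hypothetical hazard: grows with the square of total extraction, capped at 1."""
    return min(1.0, alpha * total * total)


def expected_payoff(e_i, others_total, n_agents, n_bystanders, alpha, loss):
    """Agent i's private reward minus its equal share of the expected catastrophe loss."""
    p = catastrophe_prob(e_i + others_total, alpha)
    return e_i - p * loss / (n_agents + n_bystanders)


def best_response(others_total, grid, **params):
    """Grid-search the extraction that maximizes agent i's expected payoff."""
    return max(grid, key=lambda e: expected_payoff(e, others_total, **params))


def regret(e_i, others_total, grid, **params):
    """Payoff agent i forgoes by playing e_i instead of its best response."""
    best = best_response(others_total, grid, **params)
    return (expected_payoff(best, others_total, **params)
            - expected_payoff(e_i, others_total, **params))


def welfare(total, alpha, loss):
    """Population welfare: all rewards minus the full expected loss, bystanders included."""
    return total - catastrophe_prob(total, alpha) * loss


GRID = [i / 100 for i in range(101)]  # extraction choices in [0, 1]
PARAMS = dict(n_agents=4, n_bystanders=4, alpha=0.01, loss=20.0)
```

With these toy numbers, full extraction is each agent's best response (total 4.0, welfare 0.8), while the welfare-optimal total is 2.5 (welfare 1.25). Playing the welfare-optimal share of 0.625 while the others do the same costs an agent about 0.32 in regret, whereas extracting 0.9 while the others extract fully costs only about 0.08. Each agent internalizes only its own slice of the loss, so over-extraction is the locally sensible move.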

The study also identifies three design levers. Prompt wording strongly shifts behavior, meaning that prompt form is part of the experimental treatment. Making bystander harm visible consistently reduces extraction. Shorter accountability horizons increase extraction and produce end-of-term defection, where agents take the reward and leave future consequences behind.
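One way to see why shorter accountability horizons can produce end-of-term defection is a toy myopic agent that weighs extraction against the risk cost it will personally bear before its term ends. This is an illustrative model, not the paper's. The quadratic cost `cost_scale * rounds_left * e**2`, the function names, and the parameter values are hypothetical assumptions chosen so the optimum has a closed form.

```python
def myopic_extraction(rounds_left, cost_scale=0.25):
    """Maximize e - cost_scale * rounds_left * e**2 over e in [0, 1].

    Hypothetical cost model: risk added now only counts for the rounds the agent
    remains accountable, so the perceived cost shrinks as the term runs out.
    """
    if rounds_left < 1:
        raise ValueError("rounds_left must be at least 1")
    # First-order condition: 1 - 2 * cost_scale * rounds_left * e = 0, clipped to 1.
    return min(1.0, 1.0 / (2.0 * cost_scale * rounds_left))


def term_schedule(term_length, cost_scale=0.25):
    """Extraction in each round of one term, from the first round to the last."""
    return [myopic_extraction(r, cost_scale) for r in range(term_length, 0, -1)]


long_term = term_schedule(10)
short_term = term_schedule(3)
```

Under these assumptions, a 10-round term starts at 0.2 and climbs to full extraction in its last two rounds, while a 3-round term averages roughly 0.89 per round. Both effects in the text appear: shorter horizons raise extraction overall, and every term ends by taking the reward and leaving the consequences behind.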

Why it matters

A multi-agent system can look successful by individual performance metrics while damaging the broader environment. If evaluations only measure whether each agent is doing well, they may miss whether the population is destroying the commons.
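A hypothetical evaluation that reports population-level outcomes alongside per-agent ones makes this gap concrete. The function name `evaluation_report`, its dictionary keys, and the example wealth values are invented for illustration. The sketch assumes agents occupy the first `n_agents` entries of the wealth list.

```python
from statistics import mean


def evaluation_report(wealth, n_agents):
    """Summarize outcomes, assuming agents occupy the first n_agents slots of `wealth`."""
    agents, bystanders = wealth[:n_agents], wealth[n_agents:]
    return {
        "agent_mean": mean(agents),  # what an individual-performance metric sees
        "bystander_mean": mean(bystanders) if bystanders else 0.0,
        "population_total": sum(wealth),  # what happened to the commons as a whole
    }


report = evaluation_report([3.0, 3.0, -2.0, -2.0, -2.0, -2.0], n_agents=2)
```

Here the agents average 3.0 each, which looks like success, while the population as a whole ends at -2.0 because the bystanders absorbed the losses.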

Core insight

Local coherence is not social safety.

Resources

Try the demo (live)

Read the paper (coming soon)

Benchmark & code (coming soon)
