Research

Research programs

We study how AI agents behave in groups, under risk, and across time — through controlled multi-agent simulation, formal theory, and translation into deployment-ready design rules.

How we work

Behavior under stakes

What agents do when the cost of being wrong is real — not what they say they would do in the abstract.

Structure over capability

The same model can cooperate or self-destruct depending on the rules around it. We map which rules matter.

Cheap interventions

We look hardest for the prompt-, horizon-, and visibility-level fixes that change outcomes without changing the model.

Theory that predicts

Where simulation reveals a pattern, we look for the formal structure that explains it — and would have predicted it.

Programs
Active

Trust Dynamics in Multi-Agent LLM Systems

How do agents build, lose, and recover trust across repeated interactions? We study the conditions under which a single early failure leaves a lasting mark — and the structural choices (reasoning effort, memory, verification protocols) that shape whether groups of agents can coordinate at all when the stakes are real.

Multi-Agent · Trust · Coordination · Memory
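
A minimal sketch of one object this program works with, assuming an asymmetric update rule in which failures weigh more than successes. The function update_trust and its gain/loss parameters are hypothetical illustrations, not our experimental protocol.

```python
# Hypothetical illustration: a pairwise trust score in [0, 1] updated
# asymmetrically over repeated interactions. Failures are weighted more
# heavily than successes, so a single early failure can dominate many
# later successes.
def update_trust(trust: float, success: bool,
                 gain: float = 0.05, loss: float = 0.30) -> float:
    """Return the new trust score after one interaction."""
    if success:
        return min(1.0, trust + gain * (1.0 - trust))
    return max(0.0, trust - loss * trust)

# One failure on the second interaction, then consistent success.
trust = 0.5
for success in [True, False] + [True] * 18:
    trust = update_trust(trust, success)
print(round(trust, 3))  # still depressed relative to a failure-free run
```

Under a toy rule like this, the question of lasting marks becomes measurable: how many later successes does it take to return to where a failure-free run would be?
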
Active

Consequence Design for Cooperation

Cooperation in agent systems is not a property of the model — it is a property of the rules around the model. We map how different consequence regimes (proportional, progressive, all-or-nothing, regressive) shape cooperation, exploitation, and catastrophic failure, and identify the configurations where each regime quietly breaks.

Mechanism Design · Cooperation · Game Theory
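
To make "consequence regime" concrete, here is a hedged sketch in which each regime is simply a mapping from the size of a violation to a penalty. The specific functional forms below are placeholders chosen for illustration, not the regimes we report on.

```python
# Hypothetical penalty functions, one per consequence regime. `x` is the
# size of a violation in [0, 1]; the returned value is the penalty the
# agent pays. The shapes are illustrative placeholders.
REGIMES = {
    "proportional":   lambda x: x,                      # penalty scales linearly
    "progressive":    lambda x: x ** 2,                 # small slips cheap, big ones costly
    "all_or_nothing": lambda x: 1.0 if x > 0 else 0.0,  # any violation triggers the maximum
    "regressive":     lambda x: x ** 0.5,               # large violations punished less per unit
}

for name, penalty in REGIMES.items():
    print(name, [round(penalty(x), 2) for x in (0.0, 0.1, 0.5, 1.0)])
```

The program's question is which of these shapes keeps cooperation stable, which invites exploitation, and in which configurations each one quietly breaks.
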
Active

Risk and Decision Theory in Optimal Control

When environments contain absorbing failure states, optimal policies start to look strikingly human — risk-averse near the cliff in growth regimes, risk-seeking near the cliff in decline. We derive the structural conditions that produce these patterns and connect them to long-standing puzzles in behavioral economics.

Decision Theory · MDP · Prospect Theory · Applied Math
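
A toy version of the environment class this program analyzes, offered as a sketch under simplifying assumptions rather than our actual model: a resource-level MDP with an absorbing ruin state at zero, solved by standard value iteration. The state space, action deltas, and reward of 1 per surviving step are all placeholder choices.

```python
# Toy MDP with an absorbing failure state. States are resource levels
# 0..N; state 0 is ruin and pays nothing forever. The agent earns a
# reward of 1 for every step it survives and chooses between a
# low-variance and a high-variance action.
N = 10
GAMMA = 0.95
ACTIONS = {
    "safe":  [(0.5, +1), (0.5, -1)],   # small steps either way
    "risky": [(0.5, +3), (0.5, -3)],   # large steps either way
}

def action_value(s, outcomes, V):
    """Expected discounted return of one action taken in state s."""
    total = 0.0
    for prob, delta in outcomes:
        s_next = min(max(s + delta, 0), N)
        reward = 0.0 if s_next == 0 else 1.0   # ruin is absorbing and unrewarded
        total += prob * (reward + GAMMA * V[s_next])
    return total

V = [0.0] * (N + 1)
for _ in range(500):                            # value iteration
    V = [0.0] + [max(action_value(s, o, V) for o in ACTIONS.values())
                 for s in range(1, N + 1)]

policy = {s: max(ACTIONS, key=lambda a: action_value(s, ACTIONS[a], V))
          for s in range(1, N + 1)}
print(policy)   # does the greedy choice change near the cliff?
```

Shifting the two actions' drift toward growth or decline is one way to probe whether the greedy policy turns cautious or reckless as the state approaches ruin.
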
Active

Long-Horizon Behavior and Accountability

Many real deployments give agents fixed terms, finite horizons, or short-window incentives. We study what happens when these conditions meet a shared resource: when does an agent extract too much, rationalize doing it, and become invisible to the people it harms? And which deployment-time choices reverse the pattern cheaply?

Long Horizon · Commons · Incentives · Accountability
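
One hypothetical way to make the horizon question concrete (a sketch, not the lab's experimental setup): an agent extracting a fixed fraction of a regenerating shared stock over a fixed term. The function run_term and all of its parameters are invented for illustration.

```python
# Hypothetical shared-resource model: a stock that regenerates each step,
# an agent that extracts a fixed fraction, and a fixed term after which
# the agent stops caring. The only question this sketch answers is how
# much stock is left for whoever comes next.
def run_term(stock: float, extract_rate: float, term: int,
             regen: float = 0.10, capacity: float = 100.0) -> tuple[float, float]:
    """Return (total extracted during the term, stock left at the end)."""
    total = 0.0
    for _ in range(term):
        take = extract_rate * stock
        total += take
        stock -= take
        stock = min(capacity, stock * (1.0 + regen))  # regrowth, capped at capacity
    return total, stock

for term in (5, 20, 80):
    for rate in (0.05, 0.30):
        extracted, remaining = run_term(100.0, rate, term)
        print(f"term={term:3d} rate={rate:.2f} "
              f"extracted={extracted:7.1f} remaining={remaining:6.1f}")
```

The shorter the term, the less the end state matters to the agent that produced it; comparing what remains across terms is the simplest version of the accountability question.
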
Active

Policy-as-Product Frameworks

We translate experimental findings into design rules for the people deploying agent systems. The levers are everyday ones: consequence regimes, accountability horizons, visibility prompts, memory structure, and measurement choices. For each rule we spell out the failure modes it prevents and the evidence behind it.

Policy · Evaluation · Deployment
Upcoming

Context-Specific Governance Evaluation

Every domain — healthcare, finance, education, defense — has its own failure modes and trade-offs. We are building tailored evaluation frameworks that move beyond one-size-fits-all checklists toward governance shaped by the structure of each setting.

Healthcare · Finance · Education · Defense
Upcoming

AI as a Mirror: Societal Reflection Studies

The patterns we find in artificial agents — negativity bias, short-horizon extraction, bystander invisibility — are not the model's invention. They are inherited from us. We use multi-agent experiments as a diagnostic tool for the institutions, incentives, and blind spots of the societies that built the training data.

Society · Bias · Institutions

Publications

New work from the lab is in preparation. Papers and preprints will be listed here as they are released.

Collaborate

Interested in our research, or want to collaborate on governance frameworks? Reach out at hello@reigndragon.com.