<style type="text/css">
table.prisonTable{
margin: auto;
border: 1px solid;
border-collapse: collapse;
border-spacing: 1px;
caption-side: bottom;
}
table.prisonTable tr{
border: 1px solid;
border-collapse: collapse;
padding: 5px;
}
table.prisonTable th{
border: 1px solid;
border-collapse: collapse;
padding: 3px;
}
table.prisonTable td{
border: 1px solid;
padding: 5px;
}
</style>
<style>
.clearfix::after {
content: "";
clear: both;
display: table;
}
.img-container {
float: left;
width: 48%;
padding: 5px;
}
</style>
<h1 id="game-theory">7.2 Game Theory</h1>
<h2 id="overview">7.2.1 Overview</h2>
<p>This chapter explores the dynamics that may arise when AI and human
agents interact. These interactions create risks distinct from those
generated by any individual AI agent acting in isolation. One way we can
study the strategic interdependence of agents is with the framework of
<em>game theory</em>. Using game theory, we can examine formal models of
how agents interact with each other under varying conditions and predict
the outcomes of these interactions.<p>
Here, we use game theory to present natural dynamics in biological and
social systems that involve multiple agents. In particular, we explore
what might cause agents to come into conflict with one another, rather
than cooperate. We show how these multi-agent dynamics can generate
undesirable outcomes, sometimes for all the agents involved. We consider
risks created by interactions within and between human and AI agents,
from human-directed companies and militaries engaging in perilous races
to autonomous AIs using threats for extortion. These risks can be reduced
if mechanisms such as institutions are used to ensure that human agencies and AI
agents are able to cooperate with one another and avoid conflict. We will
be exploring means of overcoming commitment and information problems in
the Conflict and Cooperation section of this chapter.</p>
<p><strong>We start with an overview of the fundamentals of game
theory.</strong> We begin this section by setting out the
characteristics of game theoretic agents. We also categorize the
different kinds of games we are exploring.</p>
<p><strong>We then focus on the Prisoner’s Dilemma.</strong> The
Prisoner’s Dilemma is a simple example of how an interaction between two
agents can generate an equilibrium state that is bad for both, even when
each acts rationally and in their own self-interest. We explore how
agents may arrive at the outcome where neither chooses to cooperate. We
use this to model real-world phenomena, such as negative political
campaigns. Finally, we examine ways we might foster rational cooperation
between self-interested AI agents, such as by altering the values in the
underlying payoff matrices. The key upshot is that intelligent and
rational agents do not always achieve good outcomes.</p>
<p><strong>We next add in the element of time by examining the
Iterated Prisoner’s Dilemma.</strong> AI agents are unlikely to interact
with others only once. When agents engage with each other multiple
times, this creates its own hazards. We begin by examining how iterating
the Prisoner’s Dilemma alters the agents’ incentives—when an agent’s
behavior in the present can influence that of their partner in the
future, this creates an opportunity for rational cooperation. We study
the effects of altering some of the variables in this basic model:
uncertainty about future engagement and the necessity to switch between
multiple different partners. We look at why the cooperative strategy
<em>tit-for-tat</em> is usually so successful, and in what circumstances
it is less so. Finally, we explore iterated multi-agent social dynamics
amongst humans, such as corporate AI races and military AI arms races.
The key upshot is that cooperation cannot be ensured merely by iterating
interactions through time.</p>
<p><strong>We then move on to consider group-level interactions.</strong> AI
agents might not interact with others in a neat, pairwise fashion, as
assumed by the models previously explored. In the real world, social
behavior is rarely so straightforward. Interactions can take place
between more than two agents at the same time. A group of agents creates
an environmental structure that may alter the incentives directing
individual behavior. Human societies are rife with dynamics generated by
group-level interactions that result in undesirable outcomes. We begin
by formalizing “collective action problems.” We consider real-world
examples such as anthropogenic climate change and fishery depletion.
Multi-agent dynamics such as these generate AI risk in several ways.
Races between human agents and agencies could trigger flash wars between
AI agents or the automation of economies to the point of human
enfeeblement. The key upshot is that achieving cooperation and ensuring
collectively good outcomes is even more difficult in interactions
involving more than two agents.</p>
<h2 id="game-theory-fundamentals">7.2.2 Game Theory Fundamentals</h2>
<p>In this section, we briefly run through some of the fundamental
principles of game theory. Game theory is the branch of mathematics
concerned with agents’ choices and strategies in multi-agent
interactions. Game theory is so-called because we reduce complex
situations to abstract games where agents maximize their payoffs. Using
game theory, we can study how altering incentives influences the
strategies that these agents use.</p>
<p><strong>Agents in game theory.</strong> We usually assume that the
agents in these games are self-interested and rational. Agents are
“self-interested” if they make decisions in view of their own utility,
regardless of the consequences to others. Agents are said to be
“rational” if they act as though they are maximizing their utility.</p>
<p><strong>Games can be “zero sum” or “non-zero sum.”</strong> We can
categorize the games we are studying in different ways. One distinction
is between zero sum and non-zero sum games. A
<strong>zero sum</strong> game is one where, in every outcome, the
agents’ payoffs all sum to zero. An example is “tug of war”: any benefit
to one party from their pull is necessarily a cost to the other.
Therefore, these wins and losses cancel out. In other
words, there is never any net change in total value. Poker is a zero sum
game if the players’ payoffs are the money they each finish with. The
total amount of money at a poker game’s beginning and end is the same —
it has simply been redistributed between the players.<p>
By contrast, many games are non-zero sum. In <em>non-zero</em> sum
games, the total amount of value is not fixed and may be changed by
playing the game. Thus, one agent’s win does not necessarily require
another’s loss. For instance, in cooperation games such as those where
players must meet at an undetermined location, players only get the
payoff together if they manage to find each other. As we shall see, the
Prisoner’s Dilemma is a non-zero sum game, as the sum of payoffs changes
across different outcomes.</p>
<p><strong>Non-zero sum games can have “positive sum” or “negative sum”
outcomes.</strong> We can categorize the outcomes of non-zero sum games
as <em>positive sum</em> and <em>negative sum</em>. In a positive sum
outcome, the total gains and losses of the agents sum to greater than
zero. Positive sum outcomes can arise when particular interactions
result in an increase in value. This includes instances of
mutually-beneficial cooperation. For example, if one agent has flour and
another has water and heat, the two together can cooperate to make
bread, which is more valuable than the raw materials. As a real-world
example, many view the stock market as positive sum because the overall
value of the stock market tends to increase over time. Though gains are
unevenly distributed, and some investors lose money, the average
investor becomes richer. This demonstrates an important point: positive
sum outcomes are not necessarily “win-win.” Cooperating does not
guarantee a benefit to all involved. Even if extra total value is
created, its distribution between the agents involved in its creation
can take any shape, including one where some agents have negative
payoffs.<p>
In a negative sum outcome, some amount of value is lost by playing the
game. Many competitive interactions in the real world are negative sum.
For instance, consider “oil wars”—wars fought over a valuable
hydrocarbon resource. Oil wars are zero-sum with regard to oil since
only the distribution (not the amount) of oil changes. However, the
process of conflict itself incurs costs to both sides, such as loss of
life and infrastructure damage. This reduces the total amount of value.
If AI development has the potential to result in catastrophic outcomes
for humanity, then accelerating development to gain short-term profits
in exchange for long-term losses to everyone involved would be a
negative sum outcome.</p>
<h2 id="the-prisoners-dilemma">7.2.3 The Prisoner’s Dilemma</h2>
<p>Our aim in this section is to investigate how interactions between
rational agents, both human and AI, may negatively impact everyone
involved. To this end, we focus on a simple game: the Prisoner’s
Dilemma. We first explore how the game works, and its different possible
outcomes. We then examine why agents may choose not to cooperate even if
they know this will lead to a collectively suboptimal outcome. We run
through several real-world phenomena which we can model using the
Prisoner’s Dilemma, before exploring ways in which cooperation can be
promoted in these kinds of interactions. We end by briefly discussing
the risk of AI agents tending towards undesirable equilibrium
states.</p>
<h3 id="the-game-fundamentals">The Game Fundamentals</h3>
<p>In the Prisoner’s Dilemma, two agents must each decide whether or not
to cooperate. The costs and benefits are structured such that for each
agent, defection is the best strategy regardless of what their partner
chooses to do. This motivates both agents to defect.</p>
<p><strong>The Prisoner’s Dilemma.</strong> In game theory, the
<em>Prisoner’s Dilemma</em> is a classic example of the decisions of
rational agents leading to suboptimal outcomes. The basic setup is as
follows. The police have arrested two would-be thieves. We will call
them Alice and Bob. The suspects were caught breaking into a house. The
police are now detaining them in separate holding cells, so they cannot
communicate with each other. The police suspect that the pair were
planning <em>burglary</em> (which carries a lengthy jail sentence). But
they only have enough evidence to charge them with <em>trespassing</em>
(which carries a shorter jail sentence). However, the testimony of
either one of the suspects would be enough to charge the other with
burglary, so the police offer each suspect the following deal. If only
one of them rats out their partner by confessing that they had intended
to commit burglary, the confessor will be released with <em>no jail
time</em> and their partner will spend <em>eight years</em> in jail.
However, if they each attempt to rat out the other by both confessing,
they will both serve a medium prison sentence of <em>three years</em>.
If neither suspect confesses, they will both serve a short jail sentence
of only <em>one year</em>.</p>
<p><strong>The four possible outcomes.</strong> We assume that Alice and
Bob are both rational and self-interested: each only cares about
minimizing their own jail time. We define the decision facing each as
follows. They can either “cooperate” with their partner by remaining
silent or “defect” on their partner by confessing to burglary. Each
suspect faces four possible outcomes, which we can split into two
possible scenarios. Let’s term these “World 1” and “World 2”; see Figure
7.1. In World 1, their partner chooses to cooperate with them; in World 2,
their partner chooses to defect. In both scenarios, the suspect decides
whether to cooperate or defect themself. They do not know what their
partner will decide to do.<p>
</p>
<figure id="fig:pris-dillema">
<p><img src="https://raw.githubusercontent.com/WilliamHodgkins/AISES/main/images/world-one-and-two.png" alt="image"
class="tb-img-full" style="width: 80%"/></p>
<p class="tb-caption">Figure 7.1: The possible outcomes for Alice in the Prisoner’s Dilemma.</p>
</figure>
<br> <br>
<p><strong>Defection is the dominant strategy.</strong> Alice does not
know whether Bob will choose to cooperate or defect. She does not know
whether she will find herself in World 1 or World 2; see Figure 7.1. She
can only decide whether to cooperate or defect herself. This means she
is making one of two possible decisions. If she defects, she is…<p>
</p>
<div class="blockquote">
<p>…in World 1: Bob cooperates and she goes free instead of spending a
year in jail.<p>
…in World 2: Bob defects and she gets a 3-year sentence instead of an
8-year one.<p>
</p>
</div>
<p>Alice only cares about minimizing her own jail time, so she can save
herself jail time in either scenario by choosing to defect. She saves
herself one year if her partner cooperates or five years if her partner
defects. A rational agent under these circumstances will do best if they
decide to defect, regardless of what they expect their partner to do. We
call this the <em>dominant strategy</em>: a rational agent playing the
Prisoner’s Dilemma should choose to defect <em>no matter what their
partner does</em>.<p>
One way to think about strategic dominance is through the following
thought experiment. Someone in the Arctic during winter is choosing what
to wear for that day’s excursion. They have only two options: a coat or
a t-shirt. The coat is thick and waterproof; the t-shirt is thin and
absorbent. Though this person cannot control or predict the weather,
they know there are only two possibilities: either rain or cold. If it
rains, the coat will keep them drier than the t-shirt. If it is cold,
the coat will keep them warmer than the t-shirt. Either way, the coat is
the better option, so “wearing the coat” is their dominant strategy.</p>
<p><strong>Defection is the dominant strategy for both agents.</strong>
Importantly, both the suspects face this decision in a symmetric
fashion. Each is deciding between identical outcomes, and each wishes to
minimize their own jail time. Let’s consider the four possible outcomes
now in terms of both the suspects’ jail sentences. We can
display this information in a <em>payoff matrix</em>, as shown in Table
7.1. Payoff matrices are commonly
used to visualize games. They show all the possible outcomes of a game
in terms of the value of that outcome for each of the agents involved.
In the Prisoner’s Dilemma, we show the decision outcomes as the payoffs
to each suspect: note that since more jail time is worse than less,
these payoffs are negative. Each cell of the matrix shows the outcome of
the two suspects’ decisions as the payoff to each suspect.<p>
</p>
<div id="tab:payoff-matrix">
<table class="prisonTable">
<caption>Table 7.1: Each cell in this payoff matrix represents a payoff. If Alice cooperates and Bob defects,
the top right cell tells us that Alice gets 8 years in jail while Bob goes free.</caption>
<thead>
<tr class="header">
<th style="text-align: center;"></th>
<th style="text-align: center;">Bob cooperates</th>
<th style="text-align: center;">Bob defects</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: center;">Alice cooperates</td>
<td style="text-align: center;">-1, -1</td>
<td style="text-align: center;">-8, 0 </td>
</tr>
<tr class="even">
<td style="text-align: center;"> Alice defects</td>
<td style="text-align: center;">0, -8</td>
<td style="text-align: center;">-3, -3</td>
</tr>
</tbody>
</table>
</div>
<br>
<p><em>Each cell of the matrix quantifies the decision outcome in terms
of the payoff to each: the numbers are negative, because more jail time
represents a worse payoff. For example, if Alice cooperates and Bob
defects, the outcome secured is shown in the top right cell (-8, 0): this
means Alice gets 8 years in jail, and Bob gets no jail time.</em></p>
<br> <br>
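<p>To make this reasoning concrete, the short Python sketch below (our own illustration; the payoff encoding and function names are not from the original text) writes Table 7.1 as a dictionary of payoffs and checks Alice’s best response to each of Bob’s possible actions. In both cases the best response is to defect, which is what it means for defection to be a dominant strategy.</p>
<pre><code>
# Payoffs from Table 7.1, written as (Alice's payoff, Bob's payoff).
# More jail time is a worse (more negative) payoff.
PAYOFFS = {
    ("cooperate", "cooperate"): (-1, -1),
    ("cooperate", "defect"):    (-8,  0),
    ("defect",    "cooperate"): ( 0, -8),
    ("defect",    "defect"):    (-3, -3),
}

def alice_payoff(alice_action, bob_action):
    return PAYOFFS[(alice_action, bob_action)][0]

# Alice's best response to each of Bob's possible actions.
for bob_action in ("cooperate", "defect"):
    best = max(("cooperate", "defect"),
               key=lambda a: alice_payoff(a, bob_action))
    print(f"If Bob {bob_action}s, Alice's best response is to {best}.")
# Defecting is best in both cases, so it is Alice's dominant strategy.
</code></pre>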
<h3 id="nash-equilibria-and-pareto-efficiency">Nash Equilibria and
Pareto Efficiency</h3>
<p>The stable equilibrium state in the Prisoner’s Dilemma is for both
agents to defect. Neither agent would choose to go back in time and
change their decision (to switch to cooperating) if they could not also
alter their partner’s behavior by doing so. This is often considered
counterintuitive, as the agents would benefit if they were both to
switch to cooperating.</p>
<p><strong>Nash Equilibrium: both agents will choose to defect.</strong>
Defection is the best strategy for Alice, regardless of what Bob opts to
do. The same is true for Bob. Therefore, if both are behaving in a
rational and self-interested fashion, they will both defect. This will
secure the outcome of 3 years of jail time each (the bottom-right
outcome of the payoff matrix above). Neither would wish to change their
decision, even if their partner were to change theirs. This is known as
the <em>Nash equilibrium</em>: the strategy choices from which no agent
can benefit by unilaterally choosing a different strategy. When
interacting with one another, rational agents will tend towards picking
strategies that are part of Nash equilibria.</p>
<p><strong>Pareto improvement: both agents would do better if they
cooperated.</strong> As we can see in the payoff matrix, there is a
possible outcome that is better for both suspects. If both choose the
cooperate strategy, they will secure the top-left outcome of the payoff
matrix. Each would serve 2 years less jail time at no cost to the other.
Yet, as we have seen, selecting this strategy is irrational; the
<em>defect</em> strategy is dominant and so Alice and Bob each want to
defect instead. We call this outcome <em>Pareto inefficient</em>,
meaning that it could be altered to make some of those involved better
off without making anyone else worse off. In the Prisoner’s Dilemma, the
<em>both defect</em> outcome is Pareto inefficient because it is
suboptimal for both Alice and Bob, who would both be better off if they
both cooperated instead. Where there is an outcome that is better for
some or all agents involved, and not worse for any, we call the switch
to this more efficient outcome a <em>Pareto improvement</em>. In the
Prisoner’s Dilemma, the <em>both cooperate</em> outcome is better for
both agents than the Nash equilibrium of <em>both defect</em>; see
Figure 7.2. The only Pareto
improvement possible in this game is the move from the <em>both
defect</em> to the <em>both cooperate</em> outcome; see Figure 7.3.<p>
</p>
<figure id="fig:choices">
<p><img src="https://raw.githubusercontent.com/WilliamHodgkins/AISES/main/images/Alice and Bob choices red green.png"
alt="image" class="tb-img-full" style="width: 80%"/>
<p class="tb-caption">Figure 7.2: Looking at the possible outcomes for both suspects in the Prisoner’s Dilemma, we can
see that there is a possible Pareto improvement from the Nash equilibrium. The numbers represent
their payoffs (rather than the length of their jail sentence).</p>
</figure>
<p><em>A) Shown is the same decision tree as in Figure 7.1, but for both
suspects. Rather than jail sentences, we show payoffs (negative numbers,
rather than positive). B) The outcome where both suspects get the “-3”
payoff is the Nash equilibrium, since defection is the dominant strategy
for both. However, this outcome is Pareto inefficient, as both suspects
would do better if both chose instead to cooperate, securing the outcome
in which both get the “-1” payoff. Both switching to cooperation would
produce a Pareto improvement.</em><p>
</p>
<br> <br>
<figure id="fig:pareto-efficiency">
<img
src="https://raw.githubusercontent.com/WilliamHodgkins/AISES/main/images/Alice-bob-payout-purple-green.png" class="tb-img-full" style="width: 80%"/>
<p class="tb-caption">Figure 7.3: Both suspects’ payoffs, in each of the four decision outcomes. Moving right increases
Alice’s payoff, and moving up improves Bob’s payoff. A Pareto improvement requires moving right
and up, as shown by the green arrow. <span class="citation"
data-cites="kuhn2019prisoner">[1]</span></p>
</figure>
<p><em>Both suspects’ payoffs, in each of the four decision outcomes.
Movement right through the graphspace represents a better payoff for
Alice; movement up represents a better payoff for Bob. A Pareto
improvement must therefore be a movement both right and up. There is
only one such move possible, shown as a green arrow: from “-3,-3” (both
defect) to “-1,-1” (both cooperate).</em></p>
<br> <br>
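<p>The claims about equilibria and efficiency can also be checked mechanically. The sketch below (again a toy illustration of our own, assuming the same payoff encoding as before) enumerates the four outcomes, flags those that are Nash equilibria (no agent gains by unilaterally switching strategies) and those that are Pareto inefficient (some other outcome is at least as good for both agents and strictly better in total).</p>
<pre><code>
from itertools import product

ACTIONS = ("cooperate", "defect")
# (Alice's payoff, Bob's payoff) for each pair of actions, as in Table 7.1.
PAYOFFS = {
    ("cooperate", "cooperate"): (-1, -1),
    ("cooperate", "defect"):    (-8,  0),
    ("defect",    "cooperate"): ( 0, -8),
    ("defect",    "defect"):    (-3, -3),
}

def is_nash(a, b):
    """Neither agent can do strictly better by unilaterally deviating."""
    pa, pb = PAYOFFS[(a, b)]
    alice_ok = all(pa >= PAYOFFS[(a2, b)][0] for a2 in ACTIONS)
    bob_ok = all(pb >= PAYOFFS[(a, b2)][1] for b2 in ACTIONS)
    return alice_ok and bob_ok

def pareto_inefficient(a, b):
    """Some other outcome is at least as good for both and better overall."""
    pa, pb = PAYOFFS[(a, b)]
    return any(qa >= pa and qb >= pb and qa + qb > pa + pb
               for qa, qb in PAYOFFS.values())

for a, b in product(ACTIONS, repeat=2):
    print(f"Alice {a}s, Bob {b}s: "
          f"Nash={is_nash(a, b)}, Pareto inefficient={pareto_inefficient(a, b)}")
# Only (defect, defect) is a Nash equilibrium, and it is the only Pareto
# inefficient outcome: (cooperate, cooperate) is better for both suspects.
</code></pre>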
<h3 id="real-world-examples-of-the-prisoners-dilemma">Real-World
Examples of the Prisoner’s Dilemma</h3>
<p>The Prisoner’s Dilemma has many simplifying assumptions.
Nevertheless, it can be a helpful lens through which to understand
social dynamics in the real world. Rational and self-interested parties
often produce states that are Pareto inefficient. There exist
alternative states that would be better for all involved, but reaching
these requires individually irrational action. To illustrate this, let’s
explore some real-world examples.</p>
<p><strong>Mud-slinging.</strong> Consider the practice of mud-slinging.
Competing political parties often use negative campaign tactics,
producing significant reputational costs. By running negative ads to
attack and undermine the public image of their opponents, all parties
end up with tarnished reputations. If we assume that politicians value
their reputation in an absolute sense, not merely in relation to their
contemporary competitors, then mud-slinging is undesirable for all. A
Pareto improvement to this situation would be switching to the outcome
where they all cooperate. With no one engaging in mud-slinging, all the
parties would have better reputations. The reason this does not happen
is that mud-slinging is the dominant strategy. If a party’s opponent
<em>doesn’t</em> use negative ads, the party will boost their reputation
relative to their opponent’s by using them. If their opponent
<em>does</em> use negative ads, the party will reduce the difference
between their reputations by using them too. Thus, both parties converge
on the Nash equilibrium of mutual mud-slinging, at avoidable detriment
to all.</p>
<p><strong>Shopkeeper price cuts.</strong> Another example is price
racing dynamics between different goods providers. Consider two rival
shopkeepers selling similar produce at similar prices. They are
competing for local customers. Each shopkeeper calculates that lowering
their prices below that of their rival will attract more customers away
from the other shop and result in a higher total profit for themselves.
If their competitor drops their prices and they do not, then the
competitor will gain extra customers, leaving the first shopkeeper with
almost none. Thus, “dropping prices” is the dominant strategy for both.
This leads to a Nash equilibrium in which both shops have low prices,
but the local custom is divided much the same as it would be if they had
both kept their prices high. If they were both to raise their prices,
they would both benefit by increasing their profits: this would be a
Pareto improvement. Note that, just as how the interests of the police
do not count in the Prisoner’s Dilemma, we are only considering the
interests of the shopkeepers in this example. We are ignoring the
interests of the customers and wider society.</p>
<p><strong>Arms races.</strong> Nations’ expenditure on military arms
development is another example. It would be better for all these
nations’ governments if they were all simultaneously to reduce their
military budgets. No nation would become more vulnerable if they were
all to do this, and each could then redirect these resources to areas
such as education and healthcare. Instead, we have widespread military
arms races. We might prefer for all the nations to turn some military
spending to their other budgets, but for any one nation to do so would
be irrational. Here, the dominant strategy for each nation is to opt for
high military expenditure. So we achieve a Nash equilibrium in which all
nations must decrease spending in other valuable sectors. It would be a
Pareto improvement for all to have lower military spending, freeing
money and resources for other domains. We will consider races in the
context of AI development in the following section.</p>
<h3 id="promoting-cooperation">Promoting Cooperation</h3>
<p>So far we have focused on the sources of undesirable multi-agent
dynamics in games like the Prisoner’s Dilemma. Here, we turn to the
mechanisms by which we can promote cooperation over defection.</p>
<p><strong>Reasons to cooperate.</strong> There are many reasons why
real-world agents might cooperate in situations which resemble the
Prisoner’s Dilemma <span class="citation"
data-cites="parfit1984reasons">[2]</span>, as shown in Figure 7.4. These can broadly be categorized
by whether the agents have a choice, or whether defection is impossible.
If the agents do have a choice, we can further divide the possibilities
into those where they act in their own self-interest, and those where
they do not (altruism). Finally, we can differentiate two reasons why self-interested agents may choose to cooperate: a tendency toward this, such as a conscience or guilt, and future reward/punishment. We will explore
two possibilities in this section — payoff changes and altruistic
dispositions — and then “future reward/punishment” in the next section.
Note that we effectively discuss “Defection is impossible” in the Single Agent Safety
chapter, and “AI consciences” in the Beneficial AI and Machine Ethics chapter.<p>
</p>
<figure id="fig:cooperate">
<img
src="https://raw.githubusercontent.com/WilliamHodgkins/AISES/main/images/why_cooperate.png" class="tb-img-full"/>
<p class="tb-caption">Figure 7.4: Four possible reasons why agents may cooperate in prisoner’s Dilemma-like scenarios.
This section explores two: changes to the payoff matrix and increased agent altruism. <span class="citation" data-cites="parfit1984reasons">[2]</span></p>
</figure>
<p><strong>External consideration: changing the payoffs to incentivize
cooperation.</strong> By adjusting the values in the payoff matrix, we
may more easily steer agents away from undesirable equilibria. As shown
in Table 7.2, incentive structures are important.
A Prisoner’s Dilemma-like scenario may arise wherever an individual
agent will do better to defect whether their partner cooperates (<span
class="math inline"><em>c</em> > <em>a</em></span>) or defects (<span
class="math inline"><em>d</em> > <em>b</em></span>). Avoiding this
situation requires altering these constants where they underlie critical
social interactions in the real world: changing the costs and benefits
associated with different activities so as to encourage cooperative
behavior.<p>
</p>
<div id="tab:abstract">
<table class="prisonTable">
<thead>
<tr class="header">
<th style="text-align: center;"></th>
<th style="text-align: center;">Agent B
cooperates</th>
<th style="text-align: center;">Agent B</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: center;">Agent A
cooperates</td>
<td style="text-align: center;">a, a</td>
<td style="text-align: center;">b, c</td>
</tr>
<tr class="even">
<td style="text-align: center;">Agent A
defects</td>
<td style="text-align: center;">c, b</td>
<td style="text-align: center;">d, d</td>
</tr>
</tbody>
<caption>Table 7.2: if <span
class="math inline"><em>c > a</em></span> and <span class="math inline"><em>d > b</em></span>,
the highest payoff for either agent is to defect, regardless of what their opponent does:
Defection is the dominant strategy. Fostering cooperation requires avoiding this structure.
</caption>
</table>
</div>
<br>
<p><em>Shown is the payoff matrix for the Prisoner’s Dilemma, in the
abstract. Notice that if <span
class="math inline"><em>c</em> > <em>a</em></span> and <span
class="math inline"><em>d</em> > <em>b</em></span>, the highest
payoff for either agent is to defect, regardless of what their opponent
does: Defection is the dominant strategy. Therefore, fostering
cooperation requires that we avoid structuring incentives such that
<span class="math inline"><em>c</em> > <em>a</em></span> and <span
class="math inline"><em>d</em> > <em>b</em></span>.</em><p>
There are two ways to reduce the expected value of defection: lower the
<em>probability</em> of defection success or lower the <em>benefit</em>
of a successful defection. Consider a strategy commonly used by
organized crime groups: threatening members with extreme punishment if
they ‘snitch’ to the police. In the Prisoner’s Dilemma game, we can
model this by adding a punishment equivalent to three years of jail time
for “snitching,” leading to the altered payoff matrix as shown in Figure
7.5. The Pareto efficient outcome
(-1,-1) is now also a Nash Equilibrium because snitching when the other
player cooperates is worse than mutually cooperating (<span
class="math inline"><em>c</em> < <em>a</em></span>).<p>
</p>
<figure id="fig:snitches">
<img src="https://raw.githubusercontent.com/WilliamHodgkins/AISES/main/images/Alice-payoff-with-graphs.png" class="tb-img-half" style="width: 100%"/>
</figure>
<figure>
<img src="https://raw.githubusercontent.com/WilliamHodgkins/AISES/main/images/Alice-bob-payoff-graphs2.png" class="tb-img-half" style="width: 100%"/>
<p class="tb-caption">Figure 7.5: Altering the payoff matrix to punish snitches, we can move from a Prisoner’s Dilemma
(left) to a Stag Hunt (right), in which there is an additional Nash equilibrium. </p>
</figure>
<p><em>A) The Prisoner’s Dilemma payoff matrix, with the single Nash
equilibrium highlighted. B) If we add a punishment of three years jail
time for being a “snitch,” the outcome (-1,-1) becomes a second Nash
Equilibrium. Note that this is known as the “Stag Hunt” in game
theory.</em><p>
<br><br>
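<p>The effect of changing the payoffs can be checked with the same kind of toy calculation. The sketch below (our own illustration; the penalty value follows the “three years for snitching” example above) subtracts the penalty from any agent who defects and then recomputes the Nash equilibria. With the penalty in place, mutual cooperation becomes a second equilibrium alongside mutual defection, which is the Stag Hunt structure shown in Figure 7.5.</p>
<pre><code>
ACTIONS = ("cooperate", "defect")
# Original Prisoner's Dilemma payoffs (Alice, Bob), as in Table 7.1.
PD = {
    ("cooperate", "cooperate"): (-1, -1),
    ("cooperate", "defect"):    (-8,  0),
    ("defect",    "cooperate"): ( 0, -8),
    ("defect",    "defect"):    (-3, -3),
}

# Subtract a 3-year penalty from any agent who defects ("snitches").
PENALTY = 3
STAG_HUNT = {
    (a, b): (pa - (PENALTY if a == "defect" else 0),
             pb - (PENALTY if b == "defect" else 0))
    for (a, b), (pa, pb) in PD.items()
}

def nash_equilibria(payoffs):
    """Outcomes where neither agent gains by unilaterally switching."""
    return [
        (a, b) for (a, b), (pa, pb) in payoffs.items()
        if all(pa >= payoffs[(a2, b)][0] for a2 in ACTIONS)
        and all(pb >= payoffs[(a, b2)][1] for b2 in ACTIONS)
    ]

print("Original game:", nash_equilibria(PD))
print("With penalty: ", nash_equilibria(STAG_HUNT))
# The original game has a single equilibrium, (defect, defect); with the
# penalty, (cooperate, cooperate) becomes a second Nash equilibrium.
</code></pre>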
<p><strong>Internal consideration: making agents more altruistic to promote
cooperation.</strong> A second potential mechanism to foster cooperation is to
make agents more altruistic. If each agent also values the outcome for
their partner, this effectively changes the payoff matrix. Now, the
length of their partner’s jail sentence matters to each of them. In the
Prisoner’s Dilemma payoff matrix, the <em>both cooperate</em> outcome
earns the lowest total jail time, so agents who valued their partners’
payoffs equally to their own would converge on cooperation.</p>
<p><strong>Parallels to AI safety.</strong> One possible example of such
a strategy would be to target the values held by AI companies
themselves. Improving corporate regulation effectively changes a
company’s expected payoffs from pursuing risky strategies. If
successful, it could encourage the companies building AI systems to behave
in a less purely self-interested fashion. Rather than caring solely
about maximizing their shareholders’ financial interests, AI companies
might cooperate more with each other to steer away from Pareto
inefficient outcomes, and avoid corporate AI races. We explore this in
more detail in the AI Races section below.</p>
<h3 id="summary">Summary</h3>
<p><strong>Cooperation is not always rational, so intelligence alone may
not ensure good outcomes.</strong> We have seen that rational and
self-interested agents may not interact in such a way as to achieve good
results, even for themselves. Under certain conditions, such as in the
Prisoner’s Dilemma, they will converge on a Nash equilibrium of both
defecting. Both agents would be better off if they both cooperated.
However, it is hard to secure this Pareto improvement because
cooperation is not rational when defection is the dominant strategy.</p>
<p><strong>Conflict with or between future AI agents may be extremely
harmful.</strong> One source of concern regarding future AI systems is
inter-agent conflict eroding the value of the future. Rational AI agents
faced with a Prisoner’s Dilemma-type scenario might end up in stable
equilibrium states that are far from optimal, perhaps for all the
parties involved. Possible avenues to reduce these risks include
restructuring the payoff matrices for the interactions in which these
agents may be engaged or altering the agents’ dispositions.<p>
</p>
<h2 id="the-iterated-prisoners-dilemma">7.2.4 The Iterated Prisoner’s
Dilemma</h2>
<p>In our discussion of the Prisoner’s Dilemma, we saw how rational
agents may converge to equilibrium states that are bad for all involved.
In the real world, however, agents rarely interact with one another only
once. Our aim in this section is to understand how cooperative behavior
can be promoted and maintained as multiple agents (both human and AI)
interact with each other over time, when they expect repeated future
interactions. We address some common misconceptions in this section, such
as the idea that simply getting agents to interact repeatedly is
sufficient to foster cooperation, because “nice” and “forgiving”
strategies always win out. As we shall see, things are not so simple. We
explore how iterated interactions can lead to progressively worse
outcomes for all.<p>
In the real world, we can observe this in “AI races”, where businesses
cut corners on safety due to competitive pressures, and militaries adopt
and deploy potentially unsafe AI technologies, making the world less
safe. These AI races could produce catastrophic consequences, including
more frequent or destructive wars, economic enfeeblement, and the
potential for catastrophic accidents from malfunctioning or misused AI
weapons.</p>
<h3 id="introduction">Introduction</h3>
<p>Agents who engage with one another many times do not always coexist
harmoniously. Iterating interactions is not sufficient to ensure
cooperation. To see why, we explore what happens when rational,
self-interested agents play the Prisoner’s Dilemma game against each
other repeatedly. In a single-round Prisoner’s Dilemma, defection is
always the rational move. But understanding the success of different
strategies is more complicated when agents play multiple rounds.</p>
<p><strong>In the Iterated Prisoner’s Dilemma, agents play
repeatedly.</strong> The dominant strategy for a rational agent in a
one-off interaction such as the Prisoner’s Dilemma is to defect. The
seeming paradox is that both agents would prefer the cooperate-cooperate
outcome to the defect-defect one. An agent cannot influence their
partner’s actions in a one-off interaction, but in an iterated scenario,
one agent’s behavior in one round may influence how their partner
responds in the next. We call this the <em>Iterated Prisoner’s
Dilemma</em>; see Figure 7.6. This
provides an opportunity for the agents to cooperate with each other.</p>
<p><strong>Iterating the Prisoner’s Dilemma opens the door to rational
cooperation.</strong> In an Iterated Prisoner’s Dilemma, both agents can
achieve higher payoffs by fostering a cooperative relationship with each
other than they would if both were to defect every round. There are two basic mechanisms by which iteration can promote
cooperative behavior: punishing defection and rewarding cooperation. To
see why, let us follow an example game of the Iterated Prisoner’s
Dilemma in sequence.</p>
<p><strong>Punishment.</strong> Recall Alice and Bob from the previous
section, the two would-be thieves caught by the police. Alice decides to
defect in the first round of the Prisoner’s Dilemma, while Bob opts to
cooperate. This achieves a good outcome for Alice, and a poor one for
Bob, who punishes this behavior by choosing to defect himself in the
second round. What makes this a punishment is that Alice’s score will
now be lower than it would be if Bob had opted to cooperate instead,
whether Alice chooses to cooperate or defect.</p>
<p><strong>Reward.</strong> Alice, having been punished, decides to
cooperate in the third round. Bob rewards this action by cooperating in
turn in the fourth. What makes this a reward is that Alice’s score will
now be higher than if Bob had instead opted to defect, whether Alice
chooses to cooperate or defect. Thus, the expectation that their
defection will be punished and their cooperation rewarded incentivizes
both agents to cooperate with each other.<p>
</p>
<figure id="fig:iterated">
<img src="https://raw.githubusercontent.com/WilliamHodgkins/AISES/main/images/Tit-for-tat.png" class="tb-img-full"/>
<p class="tb-caption">Figure 7.6: Across six rounds, both players gain better payoffs if they consistently cooperate. But
defecting creates short-term gains.</p>
</figure>
<p>
<em>In Figure 7.6, each panel shows a six-round
Iterated Prisoner’s Dilemma, with purple squares for defection and blue
for cooperation. On the left is <em>Tit-for-tat</em>: An agent using
this strategy tends to score the same as or worse than its partners in
each match. On the right, <em>always defect</em> tends to score the same
as or better than its partner in each match. The average payoff attained
by each strategy is shown at the bottom: <em>Tit-for-tat</em>
attains a better payoff (lower jail sentence) on average—and so is more
successful in a tournament—than <em>always defect</em>.</em></p>
<br><br>
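<p>A small simulation can make this reward-and-punishment dynamic concrete. The sketch below (an illustrative toy model with our own function names, not code from the original text) plays a fixed number of rounds between two strategies, where each strategy sees only its partner’s past moves, and reports the total payoffs for a few pairings.</p>
<pre><code>
# Per-round payoffs (player A, player B), matching the Prisoner's Dilemma above.
PAYOFFS = {
    ("cooperate", "cooperate"): (-1, -1),
    ("cooperate", "defect"):    (-8,  0),
    ("defect",    "cooperate"): ( 0, -8),
    ("defect",    "defect"):    (-3, -3),
}

def tit_for_tat(partner_history):
    # Cooperate first; afterwards, copy the partner's previous move.
    return partner_history[-1] if partner_history else "cooperate"

def always_defect(partner_history):
    return "defect"

def always_cooperate(partner_history):
    return "cooperate"

def play_match(strategy_a, strategy_b, rounds=6):
    """Play repeated rounds; return each player's total payoff."""
    history_a, history_b = [], []
    total_a = total_b = 0
    for _ in range(rounds):
        move_a = strategy_a(history_b)  # each player sees the partner's history
        move_b = strategy_b(history_a)
        pa, pb = PAYOFFS[(move_a, move_b)]
        total_a, total_b = total_a + pa, total_b + pb
        history_a.append(move_a)
        history_b.append(move_b)
    return total_a, total_b

print(play_match(tit_for_tat, always_cooperate))  # (-6, -6): sustained cooperation
print(play_match(tit_for_tat, always_defect))     # (-23, -15): exploited once, then mutual defection
print(play_match(always_defect, always_defect))   # (-18, -18): mutual defection every round
</code></pre>
<p>As in Figure 7.6, tit-for-tat never outscores its partner within a match, but a pair of cooperative players ends up far better off than a pair of defectors.</p>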
<p><strong>Defection is still the dominant strategy if agents know how
many times they will interact.</strong> If the agents know when they are
about to play the Prisoner’s Dilemma with each other for the final time,
both will choose to defect in that final round. This is because their
defection is no longer punishable by their partner. If Alice defects in
the last round of the Iterated Prisoner’s Dilemma, Bob cannot punish her
by retaliating, as there are no future rounds in which to do so. The
same is of course true for Bob. Thus, <em>defection is the dominant
strategy for each agent in the final round</em>, just as it is in the
single-round version of the dilemma.<p>
Moreover, if each agent expects their partner to defect in the final
round, <em>then there is no incentive for them to cooperate in the
penultimate round either</em>. This is for the same reason: Defecting in
the penultimate round will not influence their partner’s behavior in the
final round. Whatever an agent decides to do, they expect that their
partner will choose to defect next round, so they might as well defect
now. We can extend this argument by reasoning backwards through all the
iterations. In each round, the certainty that their partner will defect
in the next round regardless of their own behavior in the current round
incentivizes each agent to defect. The reward for cooperation and
punishment of defection have been removed. Ultimately, this removal
pushes the agents to defect in every round of the Iterated Prisoner’s
Dilemma.</p>
<p><strong>Uncertainty about future engagement enables rational
cooperation.</strong> In the real world, an agent can rarely be sure
that they will never again engage with a given partner. Wherever there
is sufficient uncertainty about the future of their relationship,
rational agents may be more cooperative. This is for the simple reason
that uncooperative behavior may yield less valuable outcomes in the long
term, because others may retaliate in kind in the future. This tells us
that AIs interacting with each other repeatedly may cooperate, but only
if they are sufficiently uncertain about whether their interactions are
about to end.<p>
Other forms of uncertainty can also create opportunities for rational
cooperation, such as uncertainty about what strategies others will use.
These are most important where the Iterated Prisoner’s Dilemma involves
a population of more than two agents, in which each agent interacts
sequentially with multiple partners. We turn to examining the dynamics
of these more complicated games next.</p>
<h3 id="sec:tournaments">Tournaments</h3>
<p>So far, we have considered the Iterated Prisoner’s Dilemma between
only two agents: each plays repeatedly against a single partner.
However, in the real world, we expect AIs will engage with multiple
other agents. In this section, we consider interactions of this kind,
where each agent not only interacts with their partner repeatedly, but
also switches partners over time. Understanding the success of a
strategy is more complicated in repeated rounds against many partners.
Note that in this section, we define a “match” to mean repeated rounds
of the Prisoner’s Dilemma between the same two agents; see Figure 7.6. We define a “tournament” to mean a
population of more than two agents engaged in a set of pairwise
matches.</p>
<p><strong>In Iterated Prisoner’s Dilemma tournaments, each agent
interacts with multiple partners.</strong> In the late 1970s and early 1980s, the political
scientist Robert Axelrod held a series of tournaments to pit different
agents against one another in the Iterated Prisoner’s Dilemma. The
tournament winner was whichever agent had the highest total payoff after
completing all matches. Each agent in an Iterated Prisoner’s Dilemma
tournament plays multiple rounds against multiple partners. These agents
employed a range of different strategies. For example, an agent using
the strategy named <em>random</em> would randomly determine whether to
cooperate or defect in each round, entirely independently of previous
interactions with a given partner. By contrast, an agent using the
<em>grudger</em> strategy would start out cooperating, but switch to
defecting for all future interactions if its partner defected even once.
See Table 7.3 for examples of these
strategies.<p>
</p>
<br>
<div id="tab:strategies">
<table class="prisonTable">
<caption>Table 7.3: Popular strategies’ descriptions.</caption>
<thead>
<tr class="header">
<th style="text-align: left;"><strong>Strategy</strong></th>
<th style="text-align: left;">Characteristics</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: left;"><em>Random</em></td>
<td style="text-align: left;">Randomly defect or cooperate, regardless
of your partner’s strategy</td>
</tr>
<tr class="even">
<td style="text-align: left;"><em>Always defect</em></td>
<td style="text-align: left;">Always choose to defect, regardless of
your partner’s strategy</td>
</tr>
<tr class="odd">
<td style="text-align: left;"><em>Always cooperate</em></td>
<td style="text-align: left;">Always choose to defect, regardless of
your partner’s strategy</td>
</tr>
<tr class="even">
<td style="text-align: left;"><em>Grudger</em></td>
<td style="text-align: left;">Start by cooperating, but if your partner
defects, defect in every subsequent round, regardless of your partner’s
subsequent behavior</td>
</tr>
<tr class="odd">
<td style="text-align: left;"><em>Tit-for-tat</em></td>
<td style="text-align: left;">Start cooperating; then always do whatever
your partner did last</td>
</tr>
<tr class="even">
<td style="text-align: left;"><em>Generous tit-for-tat</em></td>
<td style="text-align: left;">Same as <em>tit-for-tat</em>, but
occasionally cooperate in response to your partner’s defection</td>
</tr>
</tbody>
</table>
</div>
<br>
<p><strong>The strategy “<em>Tit-for-tat</em>” frequently won Axelrod’s
tournaments <span class="citation"
data-cites="axelrod1980effective">[3]</span>.</strong> The most famous
strategy used in Axelrod’s tournaments was <em>Tit-for-tat</em>. This
was the strategy of starting by cooperating, then repeating the
partner’s most recent move: if they cooperated, <em>Tit-for-tat</em>
cooperated too; if they defected, <em>Tit-for-tat</em> did likewise.
Despite its simplicity, this strategy was extremely successful, and very
frequently won tournaments. An agent playing <em>Tit-for-tat</em>
exemplified both mechanisms for promoting cooperation: it rewarded
cooperation and punished defection. Importantly,
<em>Tit-for-tat</em> did not hold a grudge—it forgave each defection
after retaliating just once by defecting in return. This process
of one defection for one defection is captured in the famous idiom “an eye
for an eye.” The <em>Tit-for-tat</em> strategy became emblematic of
one way to escape a cycle of mutual defection.</p>
<p><strong>The success of <em>Tit-for-tat</em> is
counterintuitive.</strong> In any given match, an agent playing
<em>Tit-for-tat</em> will tend to score slightly worse than or the same
as their partner; see Figure 7.6. By
contrast, an agent who employs an uncooperative strategy such as
<em>always defect</em> usually scores the same as or better than its
partner. In a match between a cooperative
agent and an uncooperative one, the uncooperative agent tends to end up
with the better score.<p>
However, it is an agent’s <em>average</em> score which dictates its
success in a tournament, not its score in any particular match or with
any particular partner. Two uncooperative partners will score worse on
average than cooperative ones. Thus, the success of cooperative
strategies such as in Figure 7.6 depends on the population strategy
composition (the assortment of strategies used by the agents in the
population). If there are enough cooperative partners, cooperative
agents may be more successful than uncooperative ones.</p>
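<p>To see how these average scores play out, here is a rough round-robin tournament sketch in the spirit of Axelrod’s tournaments (a toy reconstruction with our own parameters and round counts, not his actual code), using several of the strategies from Table 7.3.</p>
<pre><code>
import random
from itertools import combinations

random.seed(0)  # make the "random" strategy reproducible

PAYOFFS = {
    ("cooperate", "cooperate"): (-1, -1),
    ("cooperate", "defect"):    (-8,  0),
    ("defect",    "cooperate"): ( 0, -8),
    ("defect",    "defect"):    (-3, -3),
}

# Each strategy maps the partner's move history to this round's move.
STRATEGIES = {
    "random":           lambda history: random.choice(("cooperate", "defect")),
    "always defect":    lambda history: "defect",
    "always cooperate": lambda history: "cooperate",
    "grudger":          lambda history: "defect" if "defect" in history else "cooperate",
    "tit-for-tat":      lambda history: history[-1] if history else "cooperate",
}

def play_match(strat_a, strat_b, rounds=50):
    history_a, history_b = [], []
    total_a = total_b = 0
    for _ in range(rounds):
        move_a, move_b = strat_a(history_b), strat_b(history_a)
        pa, pb = PAYOFFS[(move_a, move_b)]
        total_a, total_b = total_a + pa, total_b + pb
        history_a.append(move_a)
        history_b.append(move_b)
    return total_a, total_b

def tournament(strategies):
    """Round-robin: every strategy plays one match against every other."""
    scores = {name: 0 for name in strategies}
    for name_a, name_b in combinations(strategies, 2):
        score_a, score_b = play_match(strategies[name_a], strategies[name_b])
        scores[name_a] += score_a
        scores[name_b] += score_b
    return scores

for name, score in sorted(tournament(STRATEGIES).items(), key=lambda kv: -kv[1]):
    print(f"{name:>17}: {score}")
# The ranking depends on the population mix: with exploitable partners like
# "always cooperate" and "random" in the pool, "always defect" scores well,
# but adding more reciprocating players (extra tit-for-tat or grudger entries)
# pushes the cooperative strategies ahead of it.
</code></pre>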
<h3 id="sec:AI-races">AI Races</h3>
<p>Iterated interactions can generate “AI races.” We discuss two kinds
of races concerning AI development: corporate AI races and military AI
arms races. Both kinds center around competing parties participating in
races for individual, short-term gains at a collective, long-term
detriment. Where individual incentives clash with collective interests,
the outcome can be bad for all. As we discuss here, in the context of AI
races, these outcomes could even be catastrophic.</p>
<p><strong>AI races are the result of intense competitive
pressures.</strong> During the Cold War, the US and the Soviet Union
were involved in a costly nuclear arms race. The effects of their
competition persist today, leaving the world in a state of heightened
nuclear threat. Competitive races of this kind entail repeated
back-and-forth actions that can result in progressively worse outcomes
for all involved. We can liken this example to the Iterated Prisoner’s
Dilemma, where the nations must decide whether to increase (defect) or
decrease (cooperate) their nuclear spending. Both the US and the Soviet
Union often chose to increase spending. They would have created a safer
and less expensive world for both nations (as well as others) if they
had cooperated to reduce their nuclear stockpiles. We discuss this in
more detail in International Governance.</p>
<p><strong>Two kinds of AI races: corporate and military <span
class="citation" data-cites="hendrycks2023overview">[4]</span>.</strong>
Competition between different parties—nations or corporations—is
incentivizing each to develop, deploy, and adopt AIs rapidly, at the
expense of other values and safety precautions. Corporate AI races
consist of businesses prioritizing their own survival or power expansion
over ensuring that AIs are developed and released safely. Military AI
arms races consist of nations building and adopting powerful and
dangerous military applications of AI technologies to gain military
power, increasing the risks of more frequent or damaging wars, misuse,
or catastrophic accidents. We can understand these two kinds of AI races
using two game-theoretic models of iterated interactions. First, we use
the <em>Attrition</em> model to understand why AI corporations are
cutting corners on safety. Second, we use the <em>Security
Dilemma</em> model to understand why militaries are escalating the use
of—and reliance on—AI in warfare.</p>
<h3 id="corporate-ai-races">Corporate AI Races</h3>
<p>Competition between AI research companies is promoting the creation
and use of more appealing and profitable systems, often at the cost of
safety measures. Consider the public release of large language
model-based chatbots. Some AI companies delayed releasing their chatbots
out of safety concerns, like avoiding the generation of harmful
misinformation. We can view the companies that released their chatbots
first as having switched from cooperating to defecting in an Iterated
Prisoner’s Dilemma. The defectors gained public attention and secured
future investment. This competitive pressure caused other companies to
rush their AI products to market, compromising safety measures in the
process.<p>
Corporate AI races arise because competitors sacrifice their values to
gain an advantage, even if this harms others. As the race heats up,
corporations might increasingly need to prioritize profits by cutting
corners on safety, in order to survive in a world where their
competitors are very likely to do the same. The worst outcome for an
agent in the Prisoner’s Dilemma is the one where only they cooperated
while their partner defected. Competitive pressures motivate AI
companies to avoid this outcome, even at the cost of exacerbating
large-scale risks.<p>
Ultimately, corporate AI races could produce societal-scale harms, such
as mass unemployment and dangerous dependence on AI systems. We consider
one such example later in this chapter. This risk is particularly acute
for an emerging industry like AI, which lacks the better-established
safeguards found in industries like pharmaceuticals, such as mature
regulation and widespread awareness of the harm that unsafe products can
cause.</p>
<p><strong>Attrition model: a multi-player game of “Chicken.”</strong>
We can model this corporate AI race using an “Attrition” model <span
class="citation" data-cites="smith1974theory">[5]</span>, which frames
the race as a kind of auction in which competitors bid against one
another for a valuable prize. Rather than bidding money, the competitors
bid for the risk level they are willing to tolerate. This is similar to
the game “Chicken,” in which two competitors drive headlong at each
other. Assuming one swerves out of the way, the winner is the one who
does not (demonstrating that they can tolerate a higher level of risk
than the loser). Similarly, in the Attrition model, each competitor bids
the level of risk—the probability of bringing about a catastrophic
outcome—they are willing to tolerate. Whichever competitor is willing to
tolerate the most risk will win the entire prize, as long as the
catastrophe they are risking does not actually happen. We can consider
this to be an “all pay” auction: both competitors must pay what they
bid, whether they win or not. This is because all of those involved must
bear the risk they are leveraging, and once they have made their bid
they cannot retract it.</p>
<p><strong>The Attrition model shows why AI corporations may cut corners
on safety.</strong> Let us assume that there are only two competitors
and that both of them have the same understanding of the state of their
competition. In this case, the Attrition model predicts that they will
race each other up to a loss of one-third in expected value <span
class="citation" data-cites="nisan2007algorithmic">[6]</span>. If the
value of the prize to one competitor is “X”, they will be willing to
risk a 33% chance of bringing about an outcome equally disvaluable (of
value “-X”) in order to win the race <span class="citation"
data-cites="dafoe2022governance">[7]</span>.<p>
As we have discussed previously, market pressures may motivate
corporations to behave as though they value what they are competing for
almost as highly as survival itself. According to this toy model, we
might then expect AI stakeholders engaged in a corporate race to risk a
33% chance of existential catastrophe in order to “win the prize” of
their continued existence. With multiple AI races, long time horizons,
and ever-increasing risks, the repeated erosion of safety assurances
down to only 66% generates a vast potential for catastrophe.</p>
<p><strong>Real-world actors may mistakenly erode safety precautions
even further.</strong> Moreover, real-world AI races could produce even
worse outcomes than the one predicted by the Attrition model <span
class="citation" data-cites="dafoe2022governance">[7]</span>. One reason
for this is that competing corporations may not have a correct
understanding of the state of the race. Precisely predicting these kinds
of risks can be extremely challenging: high-risk situations are
inherently difficult to predict accurately, even in fields far more
well-understood than AI. Incorrect risk calibration could cause the
competitors to take actions that accidentally exceed even the 33% risk
level. Like newcomers to an “all pay” auction who often overbid, uneven
comprehension or misinformation could motivate the competitors to take
even greater risks of bringing about catastrophic outcomes. In fact, we
might even expect selection for competitors who tend to underestimate
the risks of these races. All these factors may further erode safety
assurances.</p>
<h3 id="military-ai-arms-races">Military AI Arms Races</h3>
<p>Global interest in military applications for AI technologies is
increasing. Some hail this as the “third revolution in warfare” <span
class="citation" data-cites="lee2021visions">[8]</span>, predicting
impact at the scale of the historical development of gunpowder and
nuclear weapons. There are many causes for concern about the adoption of
AI technologies in military contexts. These include increased rates of
weapon development, lethal autonomous weapons usage, advanced
cyberattack execution, and automation of decision-making. These could
in turn produce more frequent and destructive wars, acts of terrorism,
and catastrophic accidents. Perhaps even more important than the
immediate dangers from military deployment of AI is the possibility that
nations will continue to race each other along a path towards
ever-increasing risks of catastrophe. In this section, we explore this
possibility using another game theoretic model.<p>
First, let us consider a few different sources of risk from military AI
<span class="citation"
data-cites="hendrycks2023overview">[4]</span>:</p>
<ol>
<li><p><strong>AI-developed weapons.</strong> AI technologies could be
used to engineer weapons. Military research and development offers many
opportunities for acceleration using AI tools. For instance, AI could be
used to expedite processes in dual-use biological and chemical research,
furthering the development of programs to build weapons of mass
destruction.</p></li>
<li><p><strong>AI-controlled weapons.</strong> AI might also be used to
control weapons directly. “Lethal autonomous weapons” have been in use
since March 2020, when a self-directing and armed drone “hunted down”
soldiers in Libya without human supervision. Autonomous weapons may be
faster or more reliable than human soldiers for certain tasks, as well
as being far more expendable. Autonomous weapons systems thus
effectively motivate militaries to reduce human oversight. In a context
as morally salient as warfare, the ethical implications of this could be
severe. Increasing AI weapon development may also impact international
warfare dynamics. The ability to deploy lethal autonomous weapons in
place of human soldiers could drastically lower the threshold for
nations to engage in war, by reducing the expected body count—of the
nation’s own citizens, at least. These altered warfare dynamics could
usher in a future with more frequent and destructive wars than has yet
been seen in human history.</p></li>
<li><p><strong>AI cyberwarfare.</strong> Another military application is
the use of AI in cyberwarfare. AI systems might be used to defend
against cyberattacks. However, we do not yet know whether this will
outweigh the offensive potential of AI in this context. Cyberattacks can
be used to wreak enormous harm, such as by damaging crucial systems and
infrastructure to disrupt supply chains. AIs could make cyberattacks
more effective in a number of ways, motivating more frequent attempts
and more destructive successes. For example, AIs could directly aid in
writing or improving offensive programs. They could also execute
cyberattacks at superhuman scales by implementing vast numbers of
offensive programs simultaneously. By democratizing the power to execute
large-scale cyberattacks, AIs would also increase the difficulty of
verification. With many more actors capable of carrying out attacks at
such scales, attributing attacks to perpetrators would be much more
challenging.</p></li>
<li><p><strong>Automated executive decision-making.</strong> Executive
control might be delegated to AIs at higher levels of military
procedures. The development of AIs with superhuman strategic
capabilities may incentivize nations to adopt these systems and
increasingly automate military processes. One example of this is
“automated retaliation”: AI systems that are granted the ability to
respond to offensive threats they identify with counterattacks, without
human supervision. Examples of this include the NSA cyber defense
program known as “MonsterMind.” When this program identified an
attempted cyberattack, it interrupted it and prevented its execution.
However, it would then launch an offensive cyberattack of its own in
return. It could take this retaliatory action without consulting human
supervisors. More powerful AI systems, more destructive weapons, and
greater automation or delegation of military control to AI systems,
would all deplete our ability to intervene.</p></li>
<li><p><strong>Catastrophic accidents.</strong> Lethal Autonomous
Weapons and automated decision-making systems both carry risks of
resulting in catastrophic accidents. If a nation were to lose control of
powerful military AI technologies, the outcome could be calamitous.
Outsourcing executive command of military procedures to AI — such as by
automating retaliatory action — would put powerful arsenals on