<style type="text/css">
table.prisonTable{
margin: auto;
border: 1px solid;
border-collapse: collapse;
border-spacing: 1px;
caption-side: bottom;
}
table.prisonTable tr{
border: 1px solid;
border-collapse: collapse;
padding: 5px;
}
table.prisonTable th{
border: 1px solid;
border-collapse: collapse;
padding: 3px;
}
table.prisonTable td{
border: 1px solid;
padding: 5px;
}
</style>
<style>
.clearfix::after {
content: "";
clear: both;
display: table;
}
.img-container {
float: left;
width: 48%;
padding: 5px;
}
</style>
<h1 id="game-theory">7.2 Game Theory</h1>
<h2 id="overview">7.2.1 Overview</h2>
<p>This chapter explores the dynamics that may arise when AI and human
agents interact. These interactions create risks distinct from those
generated by any individual AI agent acting in isolation. One way we can
study the strategic interdependence of agents is with the framework of
<em>game theory</em>. Using game theory, we can examine formal models of
how agents interact with each other under varying conditions and predict
the outcomes of these interactions.<p>
Here, we use game theory to present natural dynamics in biological and
social systems that involve multiple agents. In particular, we explore
what might cause agents to come into conflict with one another, rather
than cooperate. We show how these multi-agent dynamics can generate
undesirable outcomes, sometimes for all the agents involved. We consider
risks created by interactions within and between human and AI agents,
from human-directed companies and militaries engaging in perilous races
to autonomous AIs using threats for extortion. These risks can be reduced
if mechanisms such as institutions are used to ensure that human agencies and AI
agents are able to cooperate with one another and avoid conflict. We will
be exploring means of overcoming commitment and information problems in
the Conflict and Cooperation section of this chapter.</p>
<p><strong>We start with an overview of the fundamentals of game
theory.</strong> We begin this section by setting out the
characteristics of game theoretic agents. We also categorize the
different kinds of games we are exploring.</p>
<p><strong>We then focus on the Prisoner’s Dilemma.</strong> The
Prisoner’s Dilemma is a simple example of how an interaction between two
agents can generate an equilibrium state that is bad for both, even when
each acts rationally and in their own self-interest. We explore how
agents may arrive at the outcome where neither chooses to cooperate. We
use this to model real-world phenomena, such as negative political
campaigns. Finally, we examine ways we might foster rational cooperation
between self-interested AI agents, such as by altering the values in the
underlying payoff matrices. The key upshot is that intelligent and
rational agents do not always achieve good outcomes.</p>
<p><strong>We next add in the element of time by examining the
Iterated Prisoner’s Dilemma.</strong> AI agents are unlikely to interact
with others only once. When agents engage with each other multiple
times, this creates its own hazards. We begin by examining how iterating
the Prisoner’s Dilemma alters the agents’ incentives—when an agent’s
behavior in the present can influence that of their partner in the
future, this creates an opportunity for rational cooperation. We study
the effects of altering some of the variables in this basic model:
uncertainty about future engagement and the necessity to switch between
multiple different partners. We look at why the cooperative strategy
<em>tit-for-tat</em> is usually so successful, and in what circumstances
it is less so. Finally, we explore iterated multi-agent social dynamics
amongst humans, such as corporate AI races and military AI arms races.
The key upshot is that cooperation cannot be ensured merely by iterating
interactions through time.</p>
<p><strong>We then move on to consider group-level interactions.</strong> AI
agents might not interact with others in a neat, pairwise fashion, as
assumed by the models previously explored. In the real world, social
behavior is rarely so straightforward. Interactions can take place
between more than two agents at the same time. A group of agents creates
an environmental structure that may alter the incentives directing
individual behavior. Human societies are rife with dynamics generated by
group-level interactions that result in undesirable outcomes. We begin
by formalizing “collective action problems.” We consider real-world
examples such as anthropogenic climate change and fishery depletion.
Multi-agent dynamics such as these generate AI risk in several ways.
Races between human agents and agencies could trigger flash wars between
AI agents or the automation of economies to the point of human
enfeeblement. The key upshot is that achieving cooperation and ensuring
collectively good outcomes is even more difficult in interactions
involving more than two agents.</p>
<h2 id="game-theory-fundamentals">7.2.2 Game Theory Fundamentals</h2>
<p>In this section, we briefly run through some of the fundamental
principles of game theory. Game theory is the branch of mathematics
concerned with agents’ choices and strategies in multi-agent
interactions. Game theory is so-called because we reduce complex
situations to abstract games where agents maximize their payoffs. Using
game theory, we can study how altering incentives influences the
strategies that these agents use.</p>
<p><strong>Agents in game theory.</strong> We usually assume that the
agents in these games are self-interested and rational. Agents are
“self-interested” if they make decisions in view of their own utility,
regardless of the consequences to others. Agents are said to be
“rational” if they act as though they are maximizing their utility.</p>
<p><strong>Games can be “zero sum” or “non-zero sum.”</strong> We can
categorize the games we are studying in different ways. One distinction
is between zero sum and non-zero sum games. A
<strong>zero sum</strong> game is one where, in every outcome, the
agents’ payoffs all sum to zero. An example is “tug of war”: any benefit
to one party from their pull is necessarily a cost to the other.
Therefore, these wins and losses cancel out. In other
words, there is never any net change in total value. Poker is a zero sum
game if the players’ payoffs are the money they each finish with. The
total amount of money at a poker game’s beginning and end is the same —
it has simply been redistributed between the players.<p>
By contrast, many games are non-zero sum. In <em>non-zero</em> sum
games, the total amount of value is not fixed and may be changed by
playing the game. Thus, one agent’s win does not necessarily require
another’s loss. For instance, in cooperation games such as those where
players must meet at an undetermined location, players only get the
payoff together if they manage to find each other. As we shall see, the
Prisoner’s Dilemma is a non-zero sum game, as the sum of payoffs changes
across different outcomes.</p>
<p><strong>Non-zero sum games can have “positive sum” or “negative sum”
outcomes.</strong> We can categorize the outcomes of non-zero sum games
as <em>positive sum</em> and <em>negative sum</em>. In a positive sum
outcome, the total gains and losses of the agents sum to greater than
zero. Positive sum outcomes can arise when particular interactions
result in an increase in value. This includes instances of
mutually-beneficial cooperation. For example, if one agent has flour and
another has water and heat, the two together can cooperate to make
bread, which is more valuable than the raw materials. As a real-world
example, many view the stock market as positive sum because the overall
value of the stock market tends to increase over time. Though gains are
unevenly distributed, and some investors lose money, the average
investor becomes richer. This demonstrates an important point: positive
sum outcomes are not necessarily “win-win.” Cooperating does not
guarantee a benefit to all involved. Even if extra total value is
created, its distribution between the agents involved in its creation
can take any shape, including one where some agents have negative
payoffs.<p>
In a negative sum outcome, some amount of value is lost by playing the
game. Many competitive interactions in the real world are negative sum.
For instance, consider “oil wars”—wars fought over a valuable
hydrocarbon resource. Oil wars are zero-sum with regard to oil since
only the distribution (not the amount) of oil changes. However, the
process of conflict itself incurs costs to both sides, such as loss of
life and infrastructure damage. This reduces the total amount of value.
If AI development has the potential to result in catastrophic outcomes
for humanity, then accelerating development to gain short-term profits
in exchange for long-term losses to everyone involved would be a
negative sum outcome.</p>
<h2 id="the-prisoners-dilemma">7.2.3 The Prisoner’s Dilemma</h2>
<p>Our aim in this section is to investigate how interactions between
rational agents, both human and AI, may negatively impact everyone
involved. To this end, we focus on a simple game: the Prisoner’s
Dilemma. We first explore how the game works, and its different possible
outcomes. We then examine why agents may choose not to cooperate even if
they know this will lead to a collectively suboptimal outcome. We run
through several real-world phenomena which we can model using the
Prisoner’s Dilemma, before exploring ways in which cooperation can be
promoted in these kinds of interactions. We end by briefly discussing
the risk of AI agents tending towards undesirable equilibrium
states.</p>
<h3 id="the-game-fundamentals">The Game Fundamentals</h3>
<p>In the Prisoner’s Dilemma, two agents must each decide whether or not
to cooperate. The costs and benefits are structured such that for each
agent, defection is the best strategy regardless of what their partner
chooses to do. This motivates both agents to defect.</p>
<p><strong>The Prisoner’s Dilemma.</strong> In game theory, the
<em>Prisoner’s Dilemma</em> is a classic example of the decisions of
rational agents leading to suboptimal outcomes. The basic setup is as
follows. The police have arrested two would-be thieves. We will call
them Alice and Bob. The suspects were caught breaking into a house. The
police are now detaining them in separate holding cells, so they cannot
communicate with each other. The police suspect that the pair were
planning <em>burglary</em> (which carries a lengthy jail sentence). But
they only have enough evidence to charge them with <em>trespassing</em>
(which carries a shorter jail sentence). However, the testimony of
either one of the suspects would be enough to charge the other with
burglary, so the police offer each suspect the following deal. If only
one of them rats out their partner by confessing that they had intended
to commit burglary, the confessor will be released with <em>no jail
time</em> and their partner will spend <em>eight years</em> in jail.
However, if they each attempt to rat out the other by both confessing,
they will both serve a medium prison sentence of <em>three years</em>.
If neither suspect confesses, they will both serve a short jail sentence
of only <em>one year</em>.</p>
<p><strong>The four possible outcomes.</strong> We assume that Alice and
Bob are both rational and self-interested: each only cares about
minimizing their own jail time. We define the decision facing each as
follows. They can either “cooperate” with their partner by remaining
silent or “defect” on their partner by confessing to burglary. Each
suspect faces four possible outcomes, which we can split into two
possible scenarios. Let’s term these “World 1” and “World 2”; see Figure
7.1. In World 1, their partner chooses to cooperate with them; in World 2,
their partner chooses to defect. In both scenarios, the suspect decides
whether to cooperate or defect themself. They do not know what their
partner will decide to do.<p>
</p>
<figure id="fig:pris-dillema">
<p><img src="https://raw.githubusercontent.com/WilliamHodgkins/AISES/main/images/world-one-and-two.png" alt="image"
class="tb-img-full" style="width: 80%"/></p>
<p class="tb-caption">Figure 7.1: The possible outcomes for Alice in the Prisoner’s Dilemma.</p>
</figure>
<br> <br>
<p><strong>Defection is the dominant strategy.</strong> Alice does not
know whether Bob will choose to cooperate or defect. She does not know
whether she will find herself in World 1 or World 2; see Figure 7.1. She
can only decide whether to cooperate or defect herself. This means she
is making one of two possible decisions. If she defects, she is…<p>
</p>
<div class="blockquote">
<p>…in World 1: Bob cooperates and she goes free instead of spending a
year in jail.<p>
…in World 2: Bob defects and she gets a 3-year sentence instead of an
8-year one.<p>
</p>
</div>
<p>Alice only cares about minimizing her own jail time, so she can save
herself jail time in either scenario by choosing to defect. She saves
herself one year if her partner cooperates or five years if her partner
defects. A rational agent under these circumstances will do best if they
decide to defect, regardless of what they expect their partner to do. We
call this the <em>dominant strategy</em>: a rational agent playing the
Prisoner’s Dilemma should choose to defect <em>no matter what their
partner does</em>.<p>
One way to think about strategic dominance is through the following
thought experiment. Someone in the Arctic during winter is choosing what
to wear for that day’s excursion. They have only two options: a coat or
a t-shirt. The coat is thick and waterproof; the t-shirt is thin and
absorbent. Though this person cannot control or predict the weather,
they know there are only two possibilities: either rain or cold. If it
rains, the coat will keep them drier than the t-shirt. If it is cold,
the coat will keep them warmer than the t-shirt. Either way, the coat is
the better option, so “wearing the coat” is their dominant strategy.</p>
<p><strong>Defection is the dominant strategy for both agents.</strong>
Importantly, both the suspects face this decision in a symmetric
fashion. Each is deciding between identical outcomes, and each wishes to
minimize their own jail time. Let’s consider the four possible outcomes
now in terms of both the suspects’ jail sentences. We can
display this information in a <em>payoff matrix</em>, as shown in Table
7.1. Payoff matrices are commonly
used to visualize games. They show all the possible outcomes of a game
in terms of the value of that outcome for each of the agents involved.
In the Prisoner’s Dilemma, we show the decision outcomes as the payoffs
to each suspect: note that since more jail time is worse than less,
these payoffs are negative. Each cell of the matrix shows the outcome of
the two suspects’ decisions as the payoff to each suspect.<p>
</p>
<div id="tab:payoff-matrix">
<table class="prisonTable">
<caption>Table 7.1: Each cell in this payoff matrix represents a payoff. If Alice cooperates and Bob defects,
the top right cell tells us that Alice gets 8 years in jail while Bob goes free.</caption>
<thead>
<tr class="header">
<th style="text-align: center;"></th>
<th style="text-align: center;">Bob cooperates</th>
<th style="text-align: center;">Bob defects</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: center;">Alice cooperates</td>
<td style="text-align: center;">-1, -1</td>
<td style="text-align: center;">-8, 0 </td>
</tr>
<tr class="even">
<td style="text-align: center;"> Alice defects</td>
<td style="text-align: center;">0, -8</td>
<td style="text-align: center;">-3, -3</td>
</tr>
</tbody>
</table>
</div>
<br>
<p><em>Each cell of the matrix quantifies the decision outcome in terms
of the payoff to each: the numbers are negative, because more jail time
represents a worse payoff. For example, if Alice cooperates and Bob
defects, the outcome secured is shown in the top right cell (-8, 0): this
means Alice gets 8 years in jail, and Bob gets no jail time.</em></p>
<br> <br>
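<p>To make this reasoning concrete, the short Python sketch below (our own illustration; the payoff encoding and function names are not from the original text) writes Table 7.1 as a dictionary of payoffs and checks Alice’s best response to each of Bob’s possible actions. In both cases the best response is to defect, which is what it means for defection to be a dominant strategy.</p>
<pre><code>
# Payoffs from Table 7.1, written as (Alice's payoff, Bob's payoff).
# More jail time is a worse (more negative) payoff.
PAYOFFS = {
    ("cooperate", "cooperate"): (-1, -1),
    ("cooperate", "defect"):    (-8,  0),
    ("defect",    "cooperate"): ( 0, -8),
    ("defect",    "defect"):    (-3, -3),
}

def alice_payoff(alice_action, bob_action):
    return PAYOFFS[(alice_action, bob_action)][0]

# Alice's best response to each of Bob's possible actions.
for bob_action in ("cooperate", "defect"):
    best = max(("cooperate", "defect"),
               key=lambda a: alice_payoff(a, bob_action))
    print(f"If Bob {bob_action}s, Alice's best response is to {best}.")
# Defecting is best in both cases, so it is Alice's dominant strategy.
</code></pre>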
<h3 id="nash-equilibria-and-pareto-efficiency">Nash Equilibria and
Pareto Efficiency</h3>
<p>The stable equilibrium state in the Prisoner’s Dilemma is for both
agents to defect. Neither agent would choose to go back in time and
change their decision (to switch to cooperating) if they could not also
alter their partner’s behavior by doing so. This is often considered
counterintuitive, as the agents would benefit if they were both to
switch to cooperating.</p>
<p><strong>Nash Equilibrium: both agents will choose to defect.</strong>
Defection is the best strategy for Alice, regardless of what Bob opts to
do. The same is true for Bob. Therefore, if both are behaving in a
rational and self-interested fashion, they will both defect. This will
secure the outcome of 3 years of jail time each (the bottom-right
outcome of the payoff matrix above). Neither would wish to change their
decision, even if their partner were to change theirs. This is known as
the <em>Nash equilibrium</em>: the strategy choices from which no agent
can benefit by unilaterally choosing a different strategy. When
interacting with one another, rational agents will tend towards picking
strategies that are part of Nash equilibria.</p>
<p><strong>Pareto improvement: both agents would do better if they
cooperated.</strong> As we can see in the payoff matrix, there is a
possible outcome that is better for both suspects. If both choose the
cooperate strategy, they will secure the top-left outcome of the payoff
matrix. Each would serve 2 years less jail time at no cost to the other.
Yet, as we have seen, selecting this strategy is irrational; the
<em>defect</em> strategy is dominant and so Alice and Bob each want to
defect instead. We call this outcome <em>Pareto inefficient</em>,
meaning that it could be altered to make some of those involved better
off without making anyone else worse off. In the Prisoner’s Dilemma, the
<em>both defect</em> outcome is Pareto inefficient because it is
suboptimal for both Alice and Bob, who would both be better off if they
both cooperated instead. Where there is an outcome that is better for
some or all agents involved, and not worse for any, we call the switch
to this more efficient outcome a <em>Pareto improvement</em>. In the
Prisoner’s Dilemma, the <em>both cooperate</em> outcome is better for
both agents than the Nash equilibrium of <em>both defect</em>; see
Figure 7.2. The only Pareto
improvement possible in this game is the move from the <em>both
defect</em> to the <em>both cooperate</em> outcome; see Figure 7.3.<p>
</p>
<figure id="fig:choices">
<p><img src="https://raw.githubusercontent.com/WilliamHodgkins/AISES/main/images/Alice and Bob choices red green.png"
alt="image" class="tb-img-full" style="width: 80%"/>
<p class="tb-caption">Figure 7.2: Looking at the possible outcomes for both suspects in the Prisoner’s Dilemma, we can
see that there is a possible Pareto improvement from the Nash equilibrium. The numbers represent
their payoffs (rather than the length of their jail sentence).</p>
</figure>
<p><em>A) Shown is the same decision tree as in Figure 7.1, but for both
suspects. Rather than jail sentences, we show payoffs (negative numbers,
rather than positive). B) The outcome where both suspects get the “-3”
payoff is the Nash equilibrium, since defection is the dominant strategy
for both. However, this outcome is Pareto inefficient, as both suspects
would do better if both chose instead to cooperate, securing the outcome
in which both get the “-1” payoff. Both switching to cooperation would
produce a Pareto improvement.</em><p>
</p>
<br> <br>
<figure id="fig:pareto-efficiency">
<img
src="https://raw.githubusercontent.com/WilliamHodgkins/AISES/main/images/Alice-bob-payout-purple-green.png" class="tb-img-full" style="width: 80%"/>
<p class="tb-caption">Figure 7.3: Both suspects’ payoffs, in each of the four decision outcomes. Moving right increases
Alice’s payoff, and moving up improves Bob’s payoff. A Pareto improvement requires moving right
and up, as shown by the green arrow. <span class="citation"
data-cites="kuhn2019prisoner">[1]</span></p>
</figure>
<p><em>Both suspects’ payoffs, in each of the four decision outcomes.
Movement right through the graphspace represents a better payoff for
Alice; movement up represents a better payoff for Bob. A Pareto
improvement must therefore be a movement both right and up. There is
only one such move possible, shown as a green arrow: from “-3,-3” (both
defect) to “-1,-1” (both cooperate).</em></p>
<br> <br>
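<p>The claims about equilibria and efficiency can also be checked mechanically. The sketch below (again a toy illustration of our own, assuming the same payoff encoding as before) enumerates the four outcomes, flags those that are Nash equilibria (no agent gains by unilaterally switching strategies) and those that are Pareto inefficient (some other outcome is at least as good for both agents and strictly better in total).</p>
<pre><code>
from itertools import product

ACTIONS = ("cooperate", "defect")
# (Alice's payoff, Bob's payoff) for each pair of actions, as in Table 7.1.
PAYOFFS = {
    ("cooperate", "cooperate"): (-1, -1),
    ("cooperate", "defect"):    (-8,  0),
    ("defect",    "cooperate"): ( 0, -8),
    ("defect",    "defect"):    (-3, -3),
}

def is_nash(a, b):
    """Neither agent can do strictly better by unilaterally deviating."""
    pa, pb = PAYOFFS[(a, b)]
    alice_ok = all(pa >= PAYOFFS[(a2, b)][0] for a2 in ACTIONS)
    bob_ok = all(pb >= PAYOFFS[(a, b2)][1] for b2 in ACTIONS)
    return alice_ok and bob_ok

def pareto_inefficient(a, b):
    """Some other outcome is at least as good for both and better overall."""
    pa, pb = PAYOFFS[(a, b)]
    return any(qa >= pa and qb >= pb and qa + qb > pa + pb
               for qa, qb in PAYOFFS.values())

for a, b in product(ACTIONS, repeat=2):
    print(f"Alice {a}s, Bob {b}s: "
          f"Nash={is_nash(a, b)}, Pareto inefficient={pareto_inefficient(a, b)}")
# Only (defect, defect) is a Nash equilibrium, and it is the only Pareto
# inefficient outcome: (cooperate, cooperate) is better for both suspects.
</code></pre>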
<h3 id="real-world-examples-of-the-prisoners-dilemma">Real-World
Examples of the Prisoner’s Dilemma</h3>
<p>The Prisoner’s Dilemma has many simplifying assumptions.
Nevertheless, it can be a helpful lens through which to understand
social dynamics in the real world. Rational and self-interested parties
often produce states that are Pareto inefficient. There exist
alternative states that would be better for all involved, but reaching
these requires individually irrational action. To illustrate this, let’s
explore some real-world examples.</p>
<p><strong>Mud-slinging.</strong> Consider the practice of mud-slinging.
Competing political parties often use negative campaign tactics,
producing significant reputational costs. By running negative ads to
attack and undermine the public image of their opponents, all parties
end up with tarnished reputations. If we assume that politicians value
their reputation in an absolute sense, not merely in relation to their
contemporary competitors, then mud-slinging is undesirable for all. A
Pareto improvement to this situation would be switching to the outcome
where they all cooperate. With no one engaging in mud-slinging, all the
parties would have better reputations. The reason this does not happen
is that mud-slinging is the dominant strategy. If a party’s opponent
<em>doesn’t</em> use negative ads, the party will boost their reputation
relative to their opponent’s by using them. If their opponent
<em>does</em> use negative ads, the party will reduce the difference
between their reputations by using them too. Thus, both parties converge
on the Nash equilibrium of mutual mud-slinging, at avoidable detriment
to all.</p>
<p><strong>Shopkeeper price cuts.</strong> Another example is price
racing dynamics between different goods providers. Consider two rival
shopkeepers selling similar produce at similar prices. They are
competing for local customers. Each shopkeeper calculates that lowering
their prices below that of their rival will attract more customers away
from the other shop and result in a higher total profit for themselves.
If their competitor drops their prices and they do not, then the
competitor will gain extra customers, leaving the first shopkeeper with
almost none. Thus, “dropping prices” is the dominant strategy for both.
This leads to a Nash equilibrium in which both shops have low prices,
but the local custom is divided much the same as it would be if they had
both kept their prices high. If they were both to raise their prices,
they would both benefit by increasing their profits: this would be a
Pareto improvement. Note that, just as how the interests of the police
do not count in the Prisoner’s Dilemma, we are only considering the
interests of the shopkeepers in this example. We are ignoring the
interests of the customers and wider society.</p>
<p><strong>Arms races.</strong> Nations’ expenditure on military arms
development is another example. It would be better for all these
nations’ governments if they were all simultaneously to reduce their
military budgets. No nation would become more vulnerable if they were
all to do this, and each could then redirect these resources to areas
such as education and healthcare. Instead, we have widespread military
arms races. We might prefer for all the nations to turn some military
spending to their other budgets, but for any one nation to do so would
be irrational. Here, the dominant strategy for each nation is to opt for
high military expenditure. So we achieve a Nash equilibrium in which all
nations must decrease spending in other valuable sectors. It would be a
Pareto improvement for all to have lower military spending, freeing
money and resources for other domains. We will consider races in the
context of AI development in the following section.</p>
<h3 id="promoting-cooperation">Promoting Cooperation</h3>
<p>So far we have focused on the sources of undesirable multi-agent
dynamics in games like the Prisoner’s Dilemma. Here, we turn to the
mechanisms by which we can promote cooperation over defection.</p>
<p><strong>Reasons to cooperate.</strong> There are many reasons why
real-world agents might cooperate in situations which resemble the
Prisoner’s Dilemma <span class="citation"
data-cites="parfit1984reasons">[2]</span>, as shown in Figure 7.4. These can broadly be categorized
by whether the agents have a choice, or whether defection is impossible.
If the agents do have a choice, we can further divide the possibilities
into those where they act in their own self-interest, and those where
they do not (altruism). Finally, we can differentiate two reasons why self-interested agents may choose to cooperate: a tendency toward this, such as a conscience or guilt, and future reward/punishment. We will explore
two possibilities in this section — payoff changes and altruistic
dispositions — and then “future reward/punishment” in the next section.
Note that we effectively discuss “Defection is impossible” in the Single Agent Safety
chapter, and “AI consciences” in the Beneficial AI and Machine Ethics chapter.<p>
</p>
<figure id="fig:cooperate">
<img
src="https://raw.githubusercontent.com/WilliamHodgkins/AISES/main/images/why_cooperate.png" class="tb-img-full"/>
<p class="tb-caption">Figure 7.4: Four possible reasons why agents may cooperate in prisoner’s Dilemma-like scenarios.
This section explores two: changes to the payoff matrix and increased agent altruism. <span class="citation" data-cites="parfit1984reasons">[2]</span></p>
</figure>
<p><strong>External consideration: changing the payoffs to incentivize
cooperation.</strong> By adjusting the values in the payoff matrix, we
may more easily steer agents away from undesirable equilibria. As shown
in Table 7.2, incentive structures are important.
A Prisoner’s Dilemma-like scenario may arise wherever an individual
agent will do better to defect whether their partner cooperates (<span
class="math inline"><em>c</em> > <em>a</em></span>) or defects (<span
class="math inline"><em>d</em> > <em>b</em></span>). Avoiding this
situation requires altering these constants where they underlie critical
social interactions in the real world: changing the costs and benefits
associated with different activities so as to encourage cooperative
behavior.<p>
</p>
<div id="tab:abstract">
<table class="prisonTable">
<thead>
<tr class="header">
<th style="text-align: center;"></th>
<th style="text-align: center;">Agent B
cooperates</th>
<th style="text-align: center;">Agent B</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: center;">Agent A
cooperates</td>
<td style="text-align: center;">a, a</td>
<td style="text-align: center;">b, c</td>
</tr>
<tr class="even">
<td style="text-align: center;">Agent A
defects</td>
<td style="text-align: center;">c, b</td>
<td style="text-align: center;">d, d</td>
</tr>
</tbody>
<caption>Table 7.2: if <span
class="math inline"><em>c > a</em></span> and <span class="math inline"><em>d > b</em></span>,
the highest payoff for either agent is to defect, regardless of what their opponent does:
Defection is the dominant strategy. Fostering cooperation requires avoiding this structure.
</caption>
</table>
</div>
<br>
<p><em>Shown is the payoff matrix for the Prisoner’s Dilemma, in the
abstract. Notice that if <span
class="math inline"><em>c</em> > <em>a</em></span> and <span
class="math inline"><em>d</em> > <em>b</em></span>, the highest
payoff for either agent is to defect, regardless of what their opponent
does: Defection is the dominant strategy. Therefore, fostering
cooperation requires that we avoid structuring incentives such that
<span class="math inline"><em>c</em> > <em>a</em></span> and <span
class="math inline"><em>d</em> > <em>b</em></span>.</em><p>
There are two ways to reduce the expected value of defection: lower the
<em>probability</em> of defection success or lower the <em>benefit</em>
of a successful defection. Consider a strategy commonly used by
organized crime groups: threatening members with extreme punishment if
they ‘snitch’ to the police. In the Prisoner’s Dilemma game, we can
model this by adding a punishment equivalent to three years of jail time
for “snitching,” leading to the altered payoff matrix as shown in Figure
7.5. The Pareto efficient outcome
(-1,-1) is now also a Nash Equilibrium because snitching when the other
player cooperates is worse than mutually cooperating (<span
class="math inline"><em>c</em> < <em>a</em></span>).<p>
</p>
<figure id="fig:snitches">
<img src="https://raw.githubusercontent.com/WilliamHodgkins/AISES/main/images/Alice-payoff-with-graphs.png" class="tb-img-half" style="width: 100%"/>
</figure>
<figure>
<img src="https://raw.githubusercontent.com/WilliamHodgkins/AISES/main/images/Alice-bob-payoff-graphs2.png" class="tb-img-half" style="width: 100%"/>
<p class="tb-caption">Figure 7.5: Altering the payoff matrix to punish snitches, we can move from a Prisoner’s Dilemma
(left) to a Stag Hunt (right), in which there is an additional Nash equilibrium. </p>
</figure>
<p><em>A) The Prisoner’s Dilemma payoff matrix, with the single Nash
equilibrium highlighted. B) If we add a punishment of three years jail
time for being a “snitch,” the outcome (-1,-1) becomes a second Nash
Equilibrium. Note that this is known as the “Stag Hunt” in game
theory.</em><p>
<br><br>
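<p>The effect of changing the payoffs can be checked with the same kind of toy calculation. The sketch below (our own illustration; the penalty value follows the “three years for snitching” example above) subtracts the penalty from any agent who defects and then recomputes the Nash equilibria. With the penalty in place, mutual cooperation becomes a second equilibrium alongside mutual defection, which is the Stag Hunt structure shown in Figure 7.5.</p>
<pre><code>
ACTIONS = ("cooperate", "defect")
# Original Prisoner's Dilemma payoffs (Alice, Bob), as in Table 7.1.
PD = {
    ("cooperate", "cooperate"): (-1, -1),
    ("cooperate", "defect"):    (-8,  0),
    ("defect",    "cooperate"): ( 0, -8),
    ("defect",    "defect"):    (-3, -3),
}

# Subtract a 3-year penalty from any agent who defects ("snitches").
PENALTY = 3
STAG_HUNT = {
    (a, b): (pa - (PENALTY if a == "defect" else 0),
             pb - (PENALTY if b == "defect" else 0))
    for (a, b), (pa, pb) in PD.items()
}

def nash_equilibria(payoffs):
    """Outcomes where neither agent gains by unilaterally switching."""
    return [
        (a, b) for (a, b), (pa, pb) in payoffs.items()
        if all(pa >= payoffs[(a2, b)][0] for a2 in ACTIONS)
        and all(pb >= payoffs[(a, b2)][1] for b2 in ACTIONS)
    ]

print("Original game:", nash_equilibria(PD))
print("With penalty: ", nash_equilibria(STAG_HUNT))
# The original game has a single equilibrium, (defect, defect); with the
# penalty, (cooperate, cooperate) becomes a second Nash equilibrium.
</code></pre>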
<p><strong>Internal consideration: making agents more altruistic to promote
cooperation.</strong> A second potential mechanism to foster cooperation is to
make agents more altruistic. If each agent also values the outcome for
their partner, this effectively changes the payoff matrix. Now, the
length of their partner’s jail sentence matters to each of them. In the
Prisoner’s Dilemma payoff matrix, the <em>both cooperate</em> outcome
earns the lowest total jail time, so agents who valued their partners’
payoffs equally to their own would converge on cooperation.</p>
<p><strong>Parallels to AI safety.</strong> One possible example of such
a strategy would be to target the values held by AI companies
themselves. Improving corporate regulation effectively changes a
company’s expected payoffs from pursuing risky strategies. If
successful, it could encourage the companies building AI systems to behave
in a less purely self-interested fashion. Rather than caring solely
about maximizing their shareholders’ financial interests, AI companies
might cooperate more with each other to steer away from Pareto
inefficient outcomes, and avoid corporate AI races. We explore this in
more detail in the AI Races section below.</p>
<h3 id="summary">Summary</h3>
<p><strong>Cooperation is not always rational, so intelligence alone may
not ensure good outcomes.</strong> We have seen that rational and
self-interested agents may not interact in such a way as to achieve good
results, even for themselves. Under certain conditions, such as in the
Prisoner’s Dilemma, they will converge on a Nash equilibrium of both
defecting. Both agents would be better off if they both cooperated.
However, it is hard to secure this Pareto improvement because
cooperation is not rational when defection is the dominant strategy.</p>
<p><strong>Conflict with or between future AI agents may be extremely
harmful.</strong> One source of concern regarding future AI systems is
inter-agent conflict eroding the value of the future. Rational AI agents
faced with a Prisoner’s Dilemma-type scenario might end up in stable
equilibrium states that are far from optimal, perhaps for all the
parties involved. Possible avenues to reduce these risks include
restructuring the payoff matrices for the interactions in which these
agents may be engaged or altering the agents’ dispositions.<p>
</p>
<h2 id="the-iterated-prisoners-dilemma">7.2.4 The Iterated Prisoner’s
Dilemma</h2>
<p>In our discussion of the Prisoner’s Dilemma, we saw how rational
agents may converge to equilibrium states that are bad for all involved.
In the real world, however, agents rarely interact with one another only
once. Our aim in this section is to understand how cooperative behavior
can be promoted and maintained as multiple agents (both human and AI)
interact with each other over time, when they expect repeated future
interactions. We address some common misconceptions in this section, such
as the idea that simply getting agents to interact repeatedly is
sufficient to foster cooperation, because “nice” and “forgiving”
strategies always win out. As we shall see, things are not so simple. We
explore how iterated interactions can lead to progressively worse
outcomes for all.<p>
In the real world, we can observe this in “AI races”, where businesses
cut corners on safety due to competitive pressures, and militaries adopt
and deploy potentially unsafe AI technologies, making the world less
safe. These AI races could produce catastrophic consequences, including
more frequent or destructive wars, economic enfeeblement, and the
potential for catastrophic accidents from malfunctioning or misused AI
weapons.</p>
<h3 id="introduction">Introduction</h3>
<p>Agents who engage with one another many times do not always coexist
harmoniously. Iterating interactions is not sufficient to ensure
cooperation. To see why, we explore what happens when rational,
self-interested agents play the Prisoner’s Dilemma game against each
other repeatedly. In a single-round Prisoner’s Dilemma, defection is
always the rational move. But understanding the success of different
strategies is more complicated when agents play multiple rounds.</p>
<p><strong>In the Iterated Prisoner’s Dilemma, agents play
repeatedly.</strong> The dominant strategy for a rational agent in a
one-off interaction such as the Prisoner’s Dilemma is to defect. The
seeming paradox is that both agents would prefer the cooperate-cooperate
outcome to the defect-defect one. An agent cannot influence their
partner’s actions in a one-off interaction, but in an iterated scenario,
one agent’s behavior in one round may influence how their partner
responds in the next. We call this the <em>Iterated Prisoner’s
Dilemma</em>; see Figure 7.6. This
provides an opportunity for the agents to cooperate with each other.</p>
<p><strong>Iterating the Prisoner’s Dilemma opens the door to rational
cooperation.</strong> In an Iterated Prisoner’s Dilemma, both agents can
achieve higher payoffs by fostering a cooperative relationship with each
other than they would if both were to defect every round. There are two basic mechanisms by which iteration can promote
cooperative behavior: punishing defection and rewarding cooperation. To
see why, let us follow an example game of the Iterated Prisoner’s
Dilemma in sequence.</p>
<p><strong>Punishment.</strong> Recall Alice and Bob from the previous
section, the two would-be thieves caught by the police. Alice decides to
defect in the first round of the Prisoner’s Dilemma, while Bob opts to
cooperate. This achieves a good outcome for Alice, and a poor one for
Bob, who punishes this behavior by choosing to defect himself in the
second round. What makes this a punishment is that Alice’s score will
now be lower than it would be if Bob had opted to cooperate instead,
whether Alice chooses to cooperate or defect.</p>
<p><strong>Reward.</strong> Alice, having been punished, decides to
cooperate in the third round. Bob rewards this action by cooperating in
turn in the fourth. What makes this a reward is that Alice’s score will
now be higher than if Bob had instead opted to defect, whether Alice
chooses to cooperate or defect. Thus, the expectation that their
defection will be punished and their cooperation rewarded incentivizes
both agents to cooperate with each other.<p>
</p>
<figure id="fig:iterated">
<img src="https://raw.githubusercontent.com/WilliamHodgkins/AISES/main/images/Tit-for-tat.png" class="tb-img-full"/>
<p class="tb-caption">Figure 7.6: Across six rounds, both players gain better payoffs if they consistently cooperate. But
defecting creates short-term gains.</p>
</figure>
<p>
<em>In Figure 7.6, each panel shows a six-round
Iterated Prisoner’s Dilemma, with purple squares for defection and blue
for cooperation. On the left is <em>Tit-for-tat</em>: An agent using
this strategy tends to score the same as or worse than its partners in
each match. On the right, <em>always defect</em> tends to score the same
as or better than its partner in each match. The average payoff attained
by each strategy is shown at the bottom: <em>Tit-for-tat</em>
attains a better payoff (lower jail sentence) on average—and so is more
successful in a tournament—than <em>always defect</em>.</em></p>
<br><br>
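<p>A small simulation can make this reward-and-punishment dynamic concrete. The sketch below (an illustrative toy model with our own function names, not code from the original text) plays a fixed number of rounds between two strategies, where each strategy sees only its partner’s past moves, and reports the total payoffs for a few pairings.</p>
<pre><code>
# Per-round payoffs (player A, player B), matching the Prisoner's Dilemma above.
PAYOFFS = {
    ("cooperate", "cooperate"): (-1, -1),
    ("cooperate", "defect"):    (-8,  0),
    ("defect",    "cooperate"): ( 0, -8),
    ("defect",    "defect"):    (-3, -3),
}

def tit_for_tat(partner_history):
    # Cooperate first; afterwards, copy the partner's previous move.
    return partner_history[-1] if partner_history else "cooperate"

def always_defect(partner_history):
    return "defect"

def always_cooperate(partner_history):
    return "cooperate"

def play_match(strategy_a, strategy_b, rounds=6):
    """Play repeated rounds; return each player's total payoff."""
    history_a, history_b = [], []
    total_a = total_b = 0
    for _ in range(rounds):
        move_a = strategy_a(history_b)  # each player sees the partner's history
        move_b = strategy_b(history_a)
        pa, pb = PAYOFFS[(move_a, move_b)]
        total_a, total_b = total_a + pa, total_b + pb
        history_a.append(move_a)
        history_b.append(move_b)
    return total_a, total_b

print(play_match(tit_for_tat, always_cooperate))  # (-6, -6): sustained cooperation
print(play_match(tit_for_tat, always_defect))     # (-23, -15): exploited once, then mutual defection
print(play_match(always_defect, always_defect))   # (-18, -18): mutual defection every round
</code></pre>
<p>As in Figure 7.6, tit-for-tat never outscores its partner within a match, but a pair of cooperative players ends up far better off than a pair of defectors.</p>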
<p><strong>Defection is still the dominant strategy if agents know how
many times they will interact.</strong> If the agents know when they are
about to play the Prisoner’s Dilemma with each other for the final time,
both will choose to defect in that final round. This is because their
defection is no longer punishable by their partner. If Alice defects in
the last round of the Iterated Prisoner’s Dilemma, Bob cannot punish her
by retaliating, as there are no future rounds in which to do so. The
same is of course true for Bob. Thus, <em>defection is the dominant
strategy for each agent in the final round</em>, just as it is in the
single-round version of the dilemma.<p>
Moreover, if each agent expects their partner to defect in the final
round, <em>then there is no incentive for them to cooperate in the
penultimate round either</em>. This is for the same reason: Defecting in
the penultimate round will not influence their partner’s behavior in the
final round. Whatever an agent decides to do, they expect that their
partner will choose to defect next round, so they might as well defect
now. We can extend this argument by reasoning backwards through all the
iterations. In each round, the certainty that their partner will defect
in the next round regardless of their own behavior in the current round
incentivizes each agent to defect. The reward for cooperation and
punishment of defection have been removed. Ultimately, this removal
pushes the agents to defect in every round of the Iterated Prisoner’s
Dilemma.</p>
<p><strong>Uncertainty about future engagement enables rational
cooperation.</strong> In the real world, an agent can rarely be sure
that they will never again engage with a given partner. Wherever there
is sufficient uncertainty about the future of their relationship,
rational agents may be more cooperative. This is for the simple reason
that uncooperative behavior may yield less valuable outcomes in the long
term, because others may retaliate in kind in the future. This tells us
that AIs interacting with each other repeatedly may cooperate, but only
if they are sufficiently uncertain about whether their interactions are
about to end.<p>
Other forms of uncertainty can also create opportunities for rational
cooperation, such as uncertainty about what strategies others will use.
These are most important where the Iterated Prisoner’s Dilemma involves
a population of more than two agents, in which each agent interacts
sequentially with multiple partners. We turn to examining the dynamics
of these more complicated games next.</p>
<h3 id="sec:tournaments">Tournaments</h3>
<p>So far, we have considered the Iterated Prisoner’s Dilemma between
only two agents: each plays repeatedly against a single partner.
However, in the real world, we expect AIs will engage with multiple
other agents. In this section, we consider interactions of this kind,
where each agent not only interacts with their partner repeatedly, but
also switches partners over time. Understanding the success of a
strategy is more complicated in repeated rounds against many partners.
Note that in this section, we define a “match” to mean repeated rounds
of the Prisoner’s Dilemma between the same two agents; see Figure 7.6. We define a “tournament” to mean a
population of more than two agents engaged in a set of pairwise
matches.</p>
<p><strong>In Iterated Prisoner’s Dilemma tournaments, each agent
interacts with multiple partners.</strong> In the late 1970s and early 1980s, the political
scientist Robert Axelrod held a series of tournaments to pit different
agents against one another in the Iterated Prisoner’s Dilemma. The
tournament winner was whichever agent had the highest total payoff after
completing all matches. Each agent in an Iterated Prisoner’s Dilemma
tournament plays multiple rounds against multiple partners. These agents
employed a range of different strategies. For example, an agent using
the strategy named <em>random</em> would randomly determine whether to
cooperate or defect in each round, entirely independently of previous
interactions with a given partner. By contrast, an agent using the
<em>grudger</em> strategy would start out cooperating, but switch to
defecting for all future interactions if its partner defected even once.
See Table 7.3 for examples of these
strategies.<p>
</p>
<br>
<div id="tab:strategies">
<table class="prisonTable">
<caption>Table 7.3: Popular strategies’ descriptions.</caption>
<thead>
<tr class="header">
<th style="text-align: left;"><strong>Strategy</strong></th>
<th style="text-align: left;">Characteristics</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: left;"><em>Random</em></td>
<td style="text-align: left;">Randomly defect or cooperate, regardless
of your partner’s strategy</td>
</tr>
<tr class="even">
<td style="text-align: left;"><em>Always defect</em></td>
<td style="text-align: left;">Always choose to defect, regardless of
your partner’s strategy</td>
</tr>
<tr class="odd">
<td style="text-align: left;"><em>Always cooperate</em></td>
<td style="text-align: left;">Always choose to defect, regardless of
your partner’s strategy</td>
</tr>
<tr class="even">
<td style="text-align: left;"><em>Grudger</em></td>
<td style="text-align: left;">Start by cooperating, but if your partner
defects, defect in every subsequent round, regardless of your partner’s
subsequent behavior</td>
</tr>
<tr class="odd">
<td style="text-align: left;"><em>Tit-for-tat</em></td>
<td style="text-align: left;">Start cooperating; then always do whatever
your partner did last</td>
</tr>
<tr class="even">
<td style="text-align: left;"><em>Generous tit-for-tat</em></td>
<td style="text-align: left;">Same as <em>tit-for-tat</em>, but
occasionally cooperate in response to your partner’s defection</td>
</tr>
</tbody>
</table>
</div>
<br>
<p><strong>The strategy “<em>Tit-for-tat</em>” frequently won Axelrod’s
tournaments <span class="citation"
data-cites="axelrod1980effective">[3]</span>.</strong> The most famous
strategy used in Axelrod’s tournaments was <em>Tit-for-tat</em>. This
was the strategy of starting by cooperating, then repeating the
partner’s most recent move: if they cooperated, <em>Tit-for-tat</em>
cooperated too; if they defected, <em>Tit-for-tat</em> did likewise.
Despite its simplicity, this strategy was extremely successful, and very
frequently won tournaments. An agent playing <em>Tit-for-tat</em>
exemplified both mechanisms for promoting cooperation: it rewarded
cooperation and punished defection. Importantly,
<em>Tit-for-tat</em> did not hold a grudge—it forgave each defection
after retaliating just once by defecting in return. This process
of one defection for one defection is captured in the famous idiom “an eye
for an eye.” The <em>Tit-for-tat</em> strategy became emblematic of
one way to escape a cycle of mutual defection.</p>
<p><strong>The success of <em>Tit-for-tat</em> is
counterintuitive.</strong> In any given match, an agent playing
<em>Tit-for-tat</em> will tend to score slightly worse than or the same
as their partner; see Figure 7.6. By
contrast, an agent who employs an uncooperative strategy such as
<em>always defect</em> usually scores the same as or better than its
partner. In a match between a cooperative
agent and an uncooperative one, the uncooperative agent tends to end up
with the better score.<p>
However, it is an agent’s <em>average</em> score which dictates its
success in a tournament, not its score in any particular match or with
any particular partner. Two uncooperative partners will score worse on
average than cooperative ones. Thus, the success of cooperative
strategies such as in Figure 7.6 depends on the population strategy
composition (the assortment of strategies used by the agents in the
population). If there are enough cooperative partners, cooperative
agents may be more successful than uncooperative ones.</p>
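<p>To see how these average scores play out, here is a rough round-robin tournament sketch in the spirit of Axelrod’s tournaments (a toy reconstruction with our own parameters and round counts, not his actual code), using several of the strategies from Table 7.3.</p>
<pre><code>
import random
from itertools import combinations

random.seed(0)  # make the "random" strategy reproducible

PAYOFFS = {
    ("cooperate", "cooperate"): (-1, -1),
    ("cooperate", "defect"):    (-8,  0),
    ("defect",    "cooperate"): ( 0, -8),
    ("defect",    "defect"):    (-3, -3),
}

# Each strategy maps the partner's move history to this round's move.
STRATEGIES = {
    "random":           lambda history: random.choice(("cooperate", "defect")),
    "always defect":    lambda history: "defect",
    "always cooperate": lambda history: "cooperate",
    "grudger":          lambda history: "defect" if "defect" in history else "cooperate",
    "tit-for-tat":      lambda history: history[-1] if history else "cooperate",
}

def play_match(strat_a, strat_b, rounds=50):
    history_a, history_b = [], []
    total_a = total_b = 0
    for _ in range(rounds):
        move_a, move_b = strat_a(history_b), strat_b(history_a)
        pa, pb = PAYOFFS[(move_a, move_b)]
        total_a, total_b = total_a + pa, total_b + pb
        history_a.append(move_a)
        history_b.append(move_b)
    return total_a, total_b

def tournament(strategies):
    """Round-robin: every strategy plays one match against every other."""
    scores = {name: 0 for name in strategies}
    for name_a, name_b in combinations(strategies, 2):
        score_a, score_b = play_match(strategies[name_a], strategies[name_b])
        scores[name_a] += score_a
        scores[name_b] += score_b
    return scores

for name, score in sorted(tournament(STRATEGIES).items(), key=lambda kv: -kv[1]):
    print(f"{name:>17}: {score}")
# The ranking depends on the population mix: with exploitable partners like
# "always cooperate" and "random" in the pool, "always defect" scores well,
# but adding more reciprocating players (extra tit-for-tat or grudger entries)
# pushes the cooperative strategies ahead of it.
</code></pre>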
<h3 id="sec:AI-races">AI Races</h3>
<p>Iterated interactions can generate “AI races.” We discuss two kinds
of races concerning AI development: corporate AI races and military AI
arms races. Both kinds center around competing parties participating in
races for individual, short-term gains at a collective, long-term
detriment. Where individual incentives clash with collective interests,
the outcome can be bad for all. As we discuss here, in the context of AI
races, these outcomes could even be catastrophic.</p>
<p><strong>AI races are the result of intense competitive
pressures.</strong> During the Cold War, the US and the Soviet Union
were involved in a costly nuclear arms race. The effects of their
competition persist today, leaving the world in a state of heightened
nuclear threat. Competitive races of this kind entail repeated
back-and-forth actions that can result in progressively worse outcomes
for all involved. We can liken this example to the Iterated Prisoner’s
Dilemma, where the nations must decide whether to increase (defect) or
decrease (cooperate) their nuclear spending. Both the US and the Soviet
Union often chose to increase spending. They would have created a safer
and less expensive world for both nations (as well as others) if they
had cooperated to reduce their nuclear stockpiles. We discuss this in
more detail in International Governance.</p>
<p><strong>Two kinds of AI races: corporate and military <span
class="citation" data-cites="hendrycks2023overview">[4]</span>.</strong>
Competition between different parties—nations or corporations—is
incentivizing each to develop, deploy, and adopt AIs rapidly, at the
expense of other values and safety precautions. Corporate AI races
consist of businesses prioritizing their own survival or power expansion
over ensuring that AIs are developed and released safely. Military AI
arms races consist of nations building and adopting powerful and
dangerous military applications of AI technologies to gain military
power, increasing the risks of more frequent or damaging wars, misuse,
or catastrophic accidents. We can understand these two kinds of AI races
using two game-theoretic models of iterated interactions. First, we use
the <em>Attrition</em> model to understand why AI corporations are
cutting corners on safety. Second, we use the <em>Security
Dilemma</em> model to understand why militaries are escalating the use
of—and reliance on—AI in warfare.</p>
<h3 id="corporate-ai-races">Corporate AI Races</h3>
<p>Competition between AI research companies is promoting the creation
and use of more appealing and profitable systems, often at the cost of
safety measures. Consider the public release of large language
model-based chatbots. Some AI companies delayed releasing their chatbots
out of safety concerns, like avoiding the generation of harmful
misinformation. We can view the companies that released their chatbots
first as having switched from cooperating to defecting in an Iterated
Prisoner’s Dilemma. The defectors gained public attention and secured
future investment. This competitive pressure caused other companies to
rush their AI products to market, compromising safety measures in the
process.<p>
Corporate AI races arise because competitors sacrifice their values to
gain an advantage, even if this harms others. As the race heats up,
corporations might increasingly need to prioritize profits by cutting
corners on safety, in order to survive in a world where their
competitors are very likely to do the same. The worst outcome for an
agent in the Prisoner’s Dilemma is the one where only they cooperated
while their partner defected. Competitive pressures motivate AI
companies to avoid this outcome, even at the cost of exacerbating
large-scale risks.<p>
Ultimately, corporate AI races could produce societal-scale harms, such
as mass unemployment and dangerous dependence on AI systems. We consider
one such example later in this chapter. This risk is particularly acute
for an emerging industry like AI, which lacks the better-established
safeguards found in industries like pharmaceuticals, such as mature
regulation and widespread awareness of the harm that unsafe products can
cause.</p>
<p><strong>Attrition model: a multi-player game of “Chicken.”</strong>
We can model this corporate AI race using an “Attrition” model <span
class="citation" data-cites="smith1974theory">[5]</span>, which frames
the race as a kind of auction in which competitors bid against one
another for a valuable prize. Rather than bidding money, the competitors
bid for the risk level they are willing to tolerate. This is similar to
the game “Chicken,” in which two competitors drive headlong at each
other. Assuming one swerves out of the way, the winner is the one who
does not (demonstrating that they can tolerate a higher level of risk
than the loser). Similarly, in the Attrition model, each competitor bids
the level of risk—the probability of bringing about a catastrophic
outcome—they are willing to tolerate. Whichever competitor is willing to
tolerate the most risk will win the entire prize, as long as the
catastrophe they are risking does not actually happen. We can consider
this to be an “all pay” auction: both competitors must pay what they
bid, whether they win or not. This is because all of those involved must
bear the risk they are leveraging, and once they have made their bid
they cannot retract it.</p>
<p><strong>The Attrition model shows why AI corporations may cut corners
on safety.</strong> Let us assume that there are only two competitors
and that both of them have the same understanding of the state of their
competition. In this case, the Attrition model predicts that they will
race each other up to a loss of one-third in expected value <span
class="citation" data-cites="nisan2007algorithmic">[6]</span>. If the
value of the prize to one competitor is “X”, they will be willing to
risk a 33% chance of bringing about an outcome equally disvaluable (of
value “-X”) in order to win the race <span class="citation"
data-cites="dafoe2022governance">[7]</span>.<p>
As we have discussed previously, market pressures may motivate
corporations to behave as though they value what they are competing for
almost as highly as survival itself. According to this toy model, we
might then expect AI stakeholders engaged in a corporate race to risk a
33% chance of existential catastrophe in order to “win the prize” of
their continued existence. With multiple AI races, long time horizons,
and ever-increasing risks, the repeated erosion of safety assurances
down to only 66% generates a vast potential for catastrophe.</p>
<p><strong>Real-world actors may mistakenly erode safety precautions
even further.</strong> Moreover, real-world AI races could produce even
worse outcomes than the one predicted by the Attrition model <span
class="citation" data-cites="dafoe2022governance">[7]</span>. One reason
for this is that competing corporations may not have a correct
understanding of the state of the race. Precisely predicting these kinds
of risks can be extremely challenging: high-risk situations are
inherently difficult to predict accurately, even in fields far more
well-understood than AI. Incorrect risk calibration could cause the
competitors to take actions that accidentally exceed even the 33% risk
level. Like newcomers to an “all pay” auction who often overbid, uneven
comprehension or misinformation could motivate the competitors to take
even greater risks of bringing about catastrophic outcomes. In fact, we
might even expect selection for competitors who tend to underestimate
the risks of these races. All these factors may further erode safety
assurances.</p>
<h3 id="military-ai-arms-races">Military AI Arms Races</h3>
<p>Global interest in military applications for AI technologies is
increasing. Some hail this as the “third revolution in warfare” <span
class="citation" data-cites="lee2021visions">[8]</span>, predicting
impact at the scale of the historical development of gunpowder and
nuclear weapons. There are many causes for concern about the adoption of
AI technologies in military contexts. These include increased rates of
weapon development, lethal autonomous weapons usage, advanced
cyberattack execution, and automation of decision-making. These could
in turn produce more frequent and destructive wars, acts of terrorism,
and catastrophic accidents. Perhaps even more important than the
immediate dangers from military deployment of AI is the possibility that
nations will continue to race each other along a path towards
ever-increasing risks of catastrophe. In this section, we explore this
possibility using another game theoretic model.<p>
First, let us consider a few different sources of risk from military AI
<span class="citation"
data-cites="hendrycks2023overview">[4]</span>:</p>
<ol>
<li><p><strong>AI-developed weapons.</strong> AI technologies could be
used to engineer weapons. Military research and development offers many
opportunities for acceleration using AI tools. For instance, AI could be
used to expedite processes in dual-use biological and chemical research,
furthering the development of programs to build weapons of mass
destruction.</p></li>
<li><p><strong>AI-controlled weapons.</strong> AI might also be used to
control weapons directly. “Lethal autonomous weapons” have been in use
since March 2020, when a self-directing and armed drone “hunted down”
soldiers in Libya without human supervision. Autonomous weapons may be
faster or more reliable than human soldiers for certain tasks, as well
as being far more expendable. Autonomous weapons systems thus
effectively motivate militaries to reduce human oversight. In a context
as morally salient as warfare, the ethical implications of this could be
severe. Increasing AI weapon development may also impact international
warfare dynamics. The ability to deploy lethal autonomous weapons in
place of human soldiers could drastically lower the threshold for
nations to engage in war, by reducing the expected body count—of the
nation’s own citizens, at least. These altered warfare dynamics could
usher in a future with more frequent and destructive wars than has yet
been seen in human history.</p></li>
<li><p><strong>AI cyberwarfare.</strong> Another military application is
the use of AI in cyberwarfare. AI systems might be used to defend
against cyberattacks. However, we do not yet know whether this will
outweigh the offensive potential of AI in this context. Cyberattacks can
be used to wreak enormous harm, such as by damaging crucial systems and
infrastructure to disrupt supply chains. AIs could make cyberattacks
more effective in a number of ways, motivating more frequent attempts
and more destructive successes. For example, AIs could directly aid in
writing or improving offensive programs. They could also execute
cyberattacks at superhuman scales by implementing vast numbers of
offensive programs simultaneously. By democratizing the power to execute
large-scale cyberattacks, AIs would also increase the difficulty of
verification. With many more actors capable of carrying out attacks at
such scales, attributing attacks to perpetrators would be much more
challenging.</p></li>
<li><p><strong>Automated executive decision-making.</strong> Executive
control might be delegated to AIs at higher levels of military
procedures. The development of AIs with superhuman strategic
capabilities may incentivize nations to adopt these systems and
increasingly automate military processes. One example of this is
“automated retaliation”: AI systems that are granted the ability to
respond to offensive threats they identify with counterattacks, without
human supervision. Examples of this include the NSA cyber defense
program known as “MonsterMind.” When this program identified an
attempted cyberattack, it interrupted it and prevented its execution.
However, it would then launch an offensive cyberattack of its own in
return. It could take this retaliatory action without consulting human
supervisors. More powerful AI systems, more destructive weapons, and
greater automation or delegation of military control to AI systems,
would all deplete our ability to intervene.</p></li>
<li><p><strong>Catastrophic accidents.</strong> Lethal Autonomous
Weapons and automated decision-making systems both carry risks of
resulting in catastrophic accidents. If a nation were to lose control of
powerful military AI technologies, the outcome could be calamitous.
Outsourcing executive command of military procedures to AI — such as by
automating retaliatory action — would put powerful arsenals on