<!DOCTYPE html>
<html>
<head>
<title>Ch. 14 - Robust and
Stochastic Control</title>
<meta name="Ch. 14 - Robust and
Stochastic Control" content="text/html; charset=utf-8;" />
<link rel="canonical" href="http://underactuated.mit.edu/robust.html" />
<script src="https://hypothes.is/embed.js" async></script>
<script type="text/javascript" src="chapters.js"></script>
<script type="text/javascript" src="htmlbook/book.js"></script>
<script src="htmlbook/mathjax-config.js" defer></script>
<script type="text/javascript" id="MathJax-script" defer
src="htmlbook/MathJax/es5/tex-chtml.js">
</script>
<script>window.MathJax || document.write('<script type="text/javascript" src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-chtml.js" defer><\/script>')</script>
<link rel="stylesheet" href="htmlbook/highlight/styles/default.css">
<script src="htmlbook/highlight/highlight.pack.js"></script> <!-- http://highlightjs.readthedocs.io/en/latest/css-classes-reference.html#language-names-and-aliases -->
<script>hljs.initHighlightingOnLoad();</script>
<link rel="stylesheet" type="text/css" href="htmlbook/book.css" />
</head>
<body onload="loadChapter('underactuated');">
<div data-type="titlepage">
<header>
<h1><a href="index.html" style="text-decoration:none;">Underactuated Robotics</a></h1>
<p data-type="subtitle">Algorithms for Walking, Running, Swimming, Flying, and Manipulation</p>
<p style="font-size: 18px;"><a href="http://people.csail.mit.edu/russt/">Russ Tedrake</a></p>
<p style="font-size: 14px; text-align: right;">
© Russ Tedrake, 2022<br/>
Last modified <span id="last_modified"></span>.<br/>
<script>
var d = new Date(document.lastModified);
document.getElementById("last_modified").innerHTML = d.getFullYear() + "-" + (d.getMonth()+1) + "-" + d.getDate();</script>
<a href="misc.html">How to cite these notes, use annotations, and give feedback.</a><br/>
</p>
</header>
</div>
<p><b>Note:</b> These are working notes used for <a
href="http://underactuated.csail.mit.edu/Spring2022/">a course being taught
at MIT</a>. They will be updated throughout the Spring 2022 semester. <a
href="https://www.youtube.com/channel/UChfUOAhz7ynELF-s_1LPpWg">Lecture videos are available on YouTube</a>.</p>
<table style="width:100%;"><tr style="width:100%">
<td style="width:33%;text-align:left;"><a class="previous_chapter" href=feedback_motion_planning.html>Previous Chapter</a></td>
<td style="width:33%;text-align:center;"><a href=index.html>Table of contents</a></td>
<td style="width:33%;text-align:right;"><a class="next_chapter" href=output_feedback.html>Next Chapter</a></td>
</tr></table>
<script type="text/javascript">document.write(notebook_header('robust'))
</script>
<!-- EVERYTHING ABOVE THIS LINE IS OVERWRITTEN BY THE INSTALL SCRIPT -->
<chapter style="counter-reset: chapter 13"><h1>Robust and
Stochastic Control</h1>
<p>So far in the notes, we have concerned ourselves with only known,
deterministic systems. In this chapter we will begin to consider uncertainty.
This uncertainty can come in many forms... we may not know the governing
equations (e.g. the coefficient of friction in the joints), our robot may be
<a href="https://youtu.be/cNZPRsrwumQ?t=32">walking on unknown terrain,
subject to unknown disturbances</a>, or even be picking up unknown objects.
There are a number of mathematical frameworks for considering this
uncertainty; for our purposes, this chapter will generalize our thinking to
equations of the form: $$\dot\bx = {\bf f}(\bx, \bu, \bw, t) \qquad \text{or}
\qquad \bx[n+1] = {\bf f}(\bx[n], \bu[n], \bw[n], n),$$ where $\bw$ is a new
<i>random</i> input signal to the equations capturing all of this potential
variability. Although it is certainly possible to work in continuous time, and
treat $\bw(t)$ as a continuous-time random signal (c.f. <a
href="https://en.wikipedia.org/wiki/Wiener_process">Wiener process</a>), it is
notationally simpler to work with $\bw[n]$ as a discrete-time random signal.
For this reason, we'll devote our attention in this chapter to
discrete-time systems.</p>
<p>In order to simulate equations of this form, or to design controllers
against them, we need to define the random process that generates $\bw[n]$. It
is typical to assume the values $\bw[n]$ are independent and identically
distributed (i.i.d.), meaning that $\bw[i]$ and $\bw[j]$ are independent (and therefore uncorrelated) when
$i \neq j$. As a result, we typically define our distribution via a
probability density $p_{\bf w}(\bw[n])$. This is not as limiting as it may
sound... if we wish to treat temporally-correlated noise (e.g. "colored
noise") the format of our equations is rich enough to support this by adding
additional state variables; we'll give an example below of a "whitening
filter" for modeling wind gusts. The other source of randomness that we will
now consider in the equations is randomness in the initial conditions; we will
similarly define a probability density $p_\bx(\bx[0]).$</p>
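<p>For example, here is a minimal sketch (in Python/NumPy; the particular
dynamics, noise covariance, and initial-condition distribution are
illustrative assumptions, not taken from the text) of simulating a
discrete-time system of this form by drawing $\bw[n]$ i.i.d. from a Gaussian
density and $\bx[0]$ from a Gaussian initial-condition density:</p>
<pre><code class="language-python">import numpy as np

def simulate(f, x0_mean, x0_cov, w_cov, N, seed=0):
    """Simulate x[n+1] = f(x[n], w[n], n) with i.i.d. Gaussian w[n] and a
    Gaussian initial condition x[0] (control input omitted for brevity)."""
    rng = np.random.default_rng(seed)
    x = rng.multivariate_normal(x0_mean, x0_cov)      # draw x[0] ~ p_x(x[0])
    xs = [x]
    for n in range(N):
        w = rng.multivariate_normal(np.zeros(w_cov.shape[0]), w_cov)  # w[n] ~ p_w
        x = f(x, w, n)
        xs.append(x)
    return np.array(xs)

# Illustrative example: a stable linear system with additive disturbance.
A = np.array([[0.9, 0.2], [0.0, 0.8]])
xs = simulate(lambda x, w, n: A @ x + w,
              x0_mean=np.array([1.0, 0.0]), x0_cov=0.1 * np.eye(2),
              w_cov=0.01 * np.eye(2), N=50)
</code></pre>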
<p>This modeling framework is rich enough for us to convey the key ideas; but
it is not quite sufficient for all of the systems I am interested in. In
<drake></drake> we go to additional lengths to support more general cases of
<a
href="https://drake.mit.edu/doxygen_cxx/group__stochastic__systems.html">stochastic
systems</a>. This includes modeling system parameters that are drawn at
random each time the model is initialized, but are fixed over the duration of
a simulation; it is possible but inefficient to model these as additional
state variables that have no dynamics. In other problems, even the
<i>dimension of the state vector</i> may change in different realizations of
the problem! Consider, for instance, the case of a robot manipulating random
numbers of dishes in a sink. I do not know many control formulations that
handle this type of randomness well, and I consider this a top priority to
think more about! (We'll begin to address it in the <a href="output_feedback.html">output feedback chapter</a>.)</p>
<p>Roughly speaking, I will refer to "stochastic control" as the discipline of
synthesizing controllers that govern the probabilistic evolution of the
equations. "Stochastic optimal control" defines a cost function (now a random
variable), and tries to find controllers that optimize some metric such as the
expected cost. When we use the terms "robust control", we are typically
referring to a class of techniques that try to guarantee a worst-case
performance, or a worst-case bound on the effect of randomness at the input on
the randomness at the output. Interestingly, for many robust control
formulations we do not attempt to know the precise probability distribution of
$\bx[0]$ and $\bw[n]$, but instead only define the <i>sets</i>
of possible values that they can take. This modeling is powerful, but can
lead to conservative controllers and pessimistic estimates of performance.</p>
<p>My goal of presenting a relatively consumable survey of a few of the main
ideas is perhaps more important in this chapter than any other. It's been
said that "robust control is encrypted" (as in you need to know the secret
code to get in). The culture in the robust control community has been to
leverage high-powered mathematics, sometimes at the cost of offering simpler
explanations. This is unfortunate, I think, because robotics and
machine learning would benefit from a richer connection to these tools, and
are perhaps destined to reinvent many of them.</p>
<p>The classic reference for robust control is <elib>Zhou97</elib>.
<elib>Petersen00</elib> has a treatment that does more of its development in
the time-domain and via Riccati equations.</p>
<section><h1>Finite Markov Decision Processes</h1>
<p>We already had a quick preview of stochastic optimal control in one of the cases where it is particularly easy: <a href="dp.html#mdp">finite Markov Decision Processes (MDPs)</a>.</p>
<todo>Perhaps an example of something other than expected-value cost (worst case, e.g. Katie's metastability?)</todo>
</section>
<section><h1>Linear optimal control</h1>
<subsection><h1>Analysis</h1>
<subsubsection><h1>Common Lyapunov functions</h1>
<p>We've already seen one nice example of robustness analysis for
linear systems when we wrote a small optimization to find a <a
href="lyapunov.html#common-lyapunov-linear">common Lyapunov function
for uncertain linear systems</a>. That example studied the dynamics
$\bx[n+1] = \bA \bx[n]$ where the coefficients of $\bA$ were unknown
but bounded.</p>
<p><a href="lyapunov.html#common-lyapunov">We also saw</a> that
essentially the same technique can be used to certify stability against
disturbances, e.g.: $$\bx[n+1] = \bA\bx[n] + \bw[n], \qquad \bw[n] \in
\mathcal{W},$$ where $\mathcal{W}$ describes some bounded set of
possible uncertainties. In order to be compatible with convex
optimization, we often choose to describe $\mathcal{W}$ as either an
ellipsoid or as a convex polytope<elib>Boyd04a</elib>. But let's
remember a very important point: in this situation with additive
disturbances, we can no longer expect the system to converge stably to
the origin. Our example before used the common Lyapunov function only
to certify the <i>invariance</i> of a region (if I start inside some region, then I'll never leave).</p>
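<p>As a concrete sketch of that analysis (using cvxpy here purely for
illustration, with made-up vertex matrices; the same semidefinite program can
be written with any SDP interface), we can search for a single ${\bf P} \succ 0$
satisfying the discrete-time Lyapunov condition at every vertex of the
uncertainty set; since $\bA^T {\bf P} \bA$ is matrix-convex in $\bA$, the
condition then holds for every $\bA$ in the convex hull of the vertices:</p>
<pre><code class="language-python">import cvxpy as cp
import numpy as np

# Vertices of the uncertain dynamics x[n+1] = A x[n], with A anywhere in
# the convex hull of these matrices (numbers are illustrative only).
A_vertices = [np.array([[0.8, 0.5], [0.0, 0.7]]),
              np.array([[0.8, -0.5], [0.1, 0.7]])]

P = cp.Variable((2, 2), symmetric=True)
eps = 1e-4
constraints = [P >> eps * np.eye(2)]
for A in A_vertices:
    # Discrete-time Lyapunov decrease condition P - A' P A > 0 at each vertex.
    constraints.append(P - A.T @ P @ A >> eps * np.eye(2))

prob = cp.Problem(cp.Minimize(0), constraints)
prob.solve()
print(prob.status)
print(P.value)  # a common Lyapunov certificate when the status is optimal
</code></pre>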
</subsubsection>
<subsubsection><h1>$L_2$ gain</h1>
<p>In some sense, the common-Lyapunov analysis above is probably the
wrong analysis for linear systems (perhaps other systems as well). It
might be unreasonable to assume that disturbances are bounded.
Moreover, we know that the response to an input (including the
disturbance input) is linear, so we expect the magnitude of the
deviation in $\bx$ compared to the undisturbed case to be proportional
to the magnitude of the disturbance, $\bw$. A more natural bound for a
linear system's response is to bound the magnitude of the response
(from zero initial conditions) relative to the magnitude of the
disturbance.
</p>
<!--
<example><h1>Gain bounds for linear maps</h1>
<p>Allow me to make the point with a much simpler problem. Imagine I
have just a affine function (not even a system): $x = aw$. One could certainly make the statement: if $|w|\le 1$, then $|x| \le |a|$</p>
</example>
-->
<p>
Typically, this is done with a scalar "$L_2$
gain", $\gamma$, defined as: \begin{align*}\argmin_\gamma \quad
\subjto& \quad \sup_{\bw(\cdot):\, \int \|\bw(t)\|^2 dt < \infty}
\frac{\int_0^T \| \bx(t) \|^2 dt}{\int_0^T \| \bw(t) \|^2dt} \le
\gamma^2, \qquad \text{or} \\ \argmin_\gamma \quad \subjto&
\sup_{\bw[\cdot]:\, \sum_n \|\bw[n]\|^2 < \infty} \frac{\sum_0^N
\|\bx[n]\|^2}{\sum_0^N \| \bw[n] \|^2} \le \gamma^2.\end{align*} The
name "$L_2$ gain" comes from the use of the $L_2$ norm on the
signals $\bw(t)$ and $\bx(t)$ (the $\ell_2$ norm in discrete time), which is
assumed only to be finite.</p>
<p>More often, these gains are written not in terms of $\bx[n]$
directly, but in terms of some "performance output", $\bz[n]$. For
instance, if we would like to bound the cost of a quadratic
regulator objective as a function of the magnitude of the disturbance,
we can minimize $$ \min_\gamma \quad \subjto \quad \sup_{\bw[n]}
\frac{\sum_0^N \|\bz[n]\|^2}{\sum_0^N \| \bw[n] \|^2} \le \gamma^2,
\qquad \bz[n] = \begin{bmatrix}\sqrt{\bQ} \bx[n] \\ \sqrt{\bR}
\bu[n]\end{bmatrix}.$$ This is a simple but important idea, and
understanding it is the key to understanding the language around robust
control. In particular the $H_2$ norm of a system (from input $\bw$ to
output $\bz$) is the energy of the impulse response; when $\bz$ is
chosen to represent the quadratic regulator cost as above, it
corresponds to the expected LQR cost. The $H_\infty$ norm of a system
(from $\bw$ to $\bz$) is the largest singular value of the transfer
function, maximized over frequency; it corresponds to the $L_2$ gain.</p>
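<p>To make these system norms concrete, here is a small numerical sketch (the
system matrices are arbitrary illustrative choices): the $H_2$ norm is
computed from the controllability Gramian via a discrete Lyapunov equation,
and the $H_\infty$ norm (the $L_2$ gain) is approximated by gridding the
largest singular value of the transfer function over frequency (in practice
one would use a bisection algorithm rather than a grid):</p>
<pre><code class="language-python">import numpy as np
from scipy.linalg import solve_discrete_lyapunov

# Illustrative discrete-time system x[n+1] = A x[n] + B w[n], z[n] = C x[n].
A = np.array([[0.9, 0.3], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])

# H2 norm (energy of the impulse response): ||G||_2^2 = trace(C P C'),
# where P solves the discrete Lyapunov equation P = A P A' + B B'.
P = solve_discrete_lyapunov(A, B @ B.T)
h2 = np.sqrt(np.trace(C @ P @ C.T))

# H-infinity norm (= L2 gain): max over frequency of the largest singular
# value of G(e^{jw}) = C (e^{jw} I - A)^{-1} B.
freqs = np.linspace(0, np.pi, 2000)
hinf = max(np.linalg.svd(C @ np.linalg.inv(np.exp(1j * w) * np.eye(2) - A) @ B,
                         compute_uv=False)[0] for w in freqs)
print("H2 norm:", round(float(h2), 3),
      " Hinf norm (L2 gain):", round(float(hinf), 3))
</code></pre>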
</subsubsection>
<subsubsection><h1>Small-gain theorem</h1>
<todo>Robust control diagram</todo>
<p>Coming soon...</p>
</subsubsection>
<subsubsection><h1>Dissipation inequalities</h1>
<p>Coming soon... See, for instance, <elib>Ebenbauer09</elib> or <elib part="Ch. 2">Scherer15</elib>.</p>
</subsubsection>
<todo>IQCs</todo>
</subsection>
<subsection><h1>$H_2$ design</h1>
</subsection>
<subsection><h1>$H_\infty$ design</h1>
</subsection>
<todo> Loftberg 2003 for
constrained version (w/ disturbance feedback).
</todo>
<subsection><h1>Linear Exponential-Quadratic Gaussian (LEQG)</h1>
<p><elib>Jacobson73</elib> observed that it is also straightforward to
minimize the objective: $$J = E\left[ \prod_{n=0}^\infty e^{\bx^T[n] {\bf
Q} \bx[n]} e^{\bu^T[n] {\bf R} \bu[n]} \right] = E\left[
e^{\sum_{n=0}^\infty \bx^T[n] {\bf Q} \bx[n] + \bu^T[n] {\bf R} \bu[n]}
\right],$$ with ${\bf Q} = {\bf Q}^T \ge {\bf 0}, {\bf R} = {\bf R}^T >
0,$ by observing that the cost is monotonically related to $\log{J}$, and
therefore has the same minima (this same trick forms the basis for
"geometric programming" <elib>Boyd07</elib>). This is known as the
"Linear Exponential-Quadratic Gaussian" (LEG or LEQG), and for the
deterministic version of the problem (no process nor measurement noise)
the solution is identical to the LQR problem; it adds no new modeling
power. But with noise, the LEQG optimal controllers are different from
the LQG controllers; they depend explicitly on the covariance of the
noise.
<todo>introduce the coefficient \sigma here, instead of just throwing it
in from the start. The coefficient $\sigma$ in the objective is referred
to as the "risk-sensitivity parameter" <elib>Whittle90</elib>.
</todo>
</p>
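<p>To spell out why the deterministic problem reduces to LQR (a short aside
using only the definitions above): since $\log$ is monotonic, minimizing $J$
is equivalent to minimizing $\log J$, and with no noise the expectation
disappears, leaving $$\log J = \sum_{n=0}^\infty \left( \bx^T[n] {\bf Q}
\bx[n] + \bu^T[n] {\bf R} \bu[n] \right),$$ which is exactly the LQR
objective.</p>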
<todo>Insert simplest form of the derivation here, plus an example.</todo>
<p><elib>Whittle90</elib> provides an extensive treatment of this framework;
nearly all of the analysis from LQR/LQG (including Riccati equations,
Hamiltonian formulations, etc.) has an analog for the version with
exponential cost, and he argues that LQG and H-infinity control can
(should?) be understood as special cases of this approach.</p>
</subsection>
<subsection><h1>Adaptive control</h1>
<p>The standard criticism of $H_2$ optimal control is that minimizing the
expected value does not allow any guarantees on performance. The
standard criticism of $H_\infty$ optimal control is that it concerns
itself with the worst case, and may therefore be conservative, especially
because distributions over possible disturbances chosen a priori may be
unnecessarily conservative. One might hope that we could get some of
this performance back if we are able to update our models of uncertainty
online, adapting to the statistics of the disturbances we actually
receive. This is one of the goals of adaptive control.</p>
<p>One of the fundamental problems in online adaptive control is the
trade-off between exploration and exploitation. Some inputs might drive
the system to build more accurate models of the dynamics / uncertainty
quickly, which could lead to better performance. But how can we
formalize this trade-off?
</p>
<p>There has been some nice progress on this challenge in machine
learning in the setting of (contextual) <a
href="https://en.wikipedia.org/wiki/Multi-armed_bandit">multi-armed
bandit</a> problems. For our purposes, you can think of bandits as a
limiting case of an optimal control problem where there are no dynamics
(the effects of one control action do not affect the results of the next
control action). In this simpler setting, the online optimization
community has developed exploration-exploitation strategies based on the
notion of minimizing <i>regret</i> -- typically the accumulated
difference in the performance achieved by my online algorithm vs the
performance that would have been achieved if I had been privy to the data
from my experiments before I started. This has led to methods that make
use of concepts like the upper confidence bound (UCB) and, more recently,
least-squares confidence bounds
<elib>Foster20</elib> to provide bounds on the regret.</p>
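<p>To give a flavor of these strategies, here is a minimal sketch of the
classic UCB1 algorithm for a (non-contextual) stochastic multi-armed bandit;
the Gaussian reward model and the specific arm means are assumptions made
only for this illustration, and the setting is far simpler than the
regression-oracle approach of <elib>Foster20</elib>:</p>
<pre><code class="language-python">import numpy as np

def ucb1(reward_fn, num_arms, T, seed=0):
    """UCB1: pull the arm maximizing (empirical mean + exploration bonus)."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(num_arms)
    means = np.zeros(num_arms)
    for t in range(1, T + 1):
        if counts.min() == 0:
            arm = int(np.argmin(counts))      # pull each arm once to start
        else:
            bonus = np.sqrt(2.0 * np.log(t) / counts)
            arm = int(np.argmax(means + bonus))
        r = reward_fn(arm, rng)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]   # running average
    return means, counts

# Illustrative bandit: three arms whose means are unknown to the algorithm.
true_means = [0.2, 0.5, 0.4]
reward = lambda arm, rng: rng.normal(true_means[arm], 0.1)
means, counts = ucb1(reward, num_arms=3, T=2000)
print("estimated means:", np.round(means, 2), " pulls per arm:", counts)
</code></pre>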
<p>In the last few years, we've seen these results translated into the setting of linear optimal control... </p>
<todo>finish this... cite Elad / Max</todo>
</subsection>
<subsection><h1>Structured uncertainty</h1>
<todo>$\mu$ synthesis. D-K iterations.</todo>
</subsection>
<subsection><h1>Linear parameter-varying (LPV) control</h1>
<todo>
Pendulum example from Briat15 1.4.1 (and the references therein).
</todo>
</subsection>
</section>
<section><h1>Trajectory optimization</h1>
<subsection><h1>Monte-carlo trajectory optimization</h1>
</subsection>
<subsection><h1>Iterative $H_2$</h1>
<todo>cover iLQR and perhaps Scott's UKF version in the output-feedback chapter?</todo>
</subsection>
<subsection><h1>Finite-time (reachability) analysis</h1>
<p>Coming soon...</p>
<todo>Finite-time (reachability) analysis. Sadra's polytopic
containment: Sadraddini19a</todo>
</subsection>
<subsection><h1>Robust MPC</h1>
<todo>Tube MPC, Sadra</todo>
</subsection>
<todo>iLEQG/iLEG.</todo>
</section>
<section><h1>Nonlinear analysis and control</h1>
<todo>Stochastic verification of nonlinear systems. (Having bounds on
expected value does yield bounds on system state).
Uncertainty quantification (e.g., linear Gaussian
approximation; discretize then closed-form solutions using the transition
matrix...). Monte-Carlo. Particle sim/filter. Rare event simulation.
L2-gain with dissipation inequalities. Finite-time verification with
sums of squares.
Occupation measures.
SOS-based design.
</todo>
</section>
<section><h1>Domain randomization</h1>
<todo>and empirical risk?</todo>
</section>
<section><h1>Extensions</h1>
<subsection><h1>Alternative risk/robustness metrics</h1>
<todo>e.g. Ani + Pavone</todo>
</subsection>
</section>
</chapter>
<!-- EVERYTHING BELOW THIS LINE IS OVERWRITTEN BY THE INSTALL SCRIPT -->
<div id="references"><section><h1>References</h1>
<ol>
<li id=Zhou97>
<span class="author">Kemin Zhou and John C. Doyle</span>,
<span class="title">"Essentials of Robust Control"</span>, Prentice Hall
, <span class="year">1997</span>.
</li><br>
<li id=Petersen00>
<span class="author">I. R. Petersen and V. A. Ugrinovskii and A. V. Savkin</span>,
<span class="title">"Robust Control Design using H-infinity Methods"</span>, Springer-Verlag
, <span class="year">2000</span>.
</li><br>
<li id=Boyd04a>
<span class="author">Stephen Boyd and Lieven Vandenberghe</span>,
<span class="title">"Convex Optimization"</span>, Cambridge University Press
, <span class="year">2004</span>.
</li><br>
<li id=Ebenbauer09>
<span class="author">Christian Ebenbauer and Tobias Raff and Frank Allgower</span>,
<span class="title">"Dissipation inequalities in systems theory: An introduction and recent results"</span>,
<span class="publisher">Invited Lectures of the International Congress on Industrial and Applied Mathematics 2007</span> , pp. 23-42, <span class="year">2009</span>.
</li><br>
<li id=Scherer15>
<span class="author">Carsten Scherer and Siep Weiland</span>,
<span class="title">"Linear {Matrix} {Inequalities} in {Control}"</span>, Online Draft
, pp. 293, <span class="year">2015</span>.
</li><br>
<li id=Jacobson73>
<span class="author">D. Jacobson</span>,
<span class="title">"Optimal stochastic linear systems with exponential performance criteria and their relation to deterministic differential games"</span>,
<span class="publisher">IEEE Transactions on Automatic Control</span>, vol. 18, no. 2, pp. 124--131, apr, <span class="year">1973</span>.
</li><br>
<li id=Boyd07>
<span class="author">S. Boyd and S.-J. Kim and L. Vandenberghe and A. Hassibi</span>,
<span class="title">"A Tutorial on Geometric Programming"</span>,
<span class="publisher">Optimization and Engineering</span>, vol. 8, no. 1, pp. 67-127, <span class="year">2007</span>.
</li><br>
<li id=Whittle90>
<span class="author">Peter Whittle</span>,
<span class="title">"Risk-sensitive optimal control"</span>, Wiley New York
, vol. 20, <span class="year">1990</span>.
</li><br>
<li id=Foster20>
<span class="author">Dylan Foster and Alexander Rakhlin</span>,
<span class="title">"Beyond {UCB}: Optimal and Efficient Contextual Bandits with Regression Oracles"</span>,
<span class="publisher">Proceedings of the 37th International Conference on Machine Learning</span> , vol. 119, pp. 3199--3210, 13--18 Jul, <span class="year">2020</span>.
</li><br>
</ol>
</section><p/>
</div>
<table style="width:100%;"><tr style="width:100%">
<td style="width:33%;text-align:left;"><a class="previous_chapter" href=feedback_motion_planning.html>Previous Chapter</a></td>
<td style="width:33%;text-align:center;"><a href=index.html>Table of contents</a></td>
<td style="width:33%;text-align:right;"><a class="next_chapter" href=output_feedback.html>Next Chapter</a></td>
</tr></table>
<div id="footer">
<hr>
<table style="width:100%;">
<tr><td><a href="https://accessibility.mit.edu/">Accessibility</a></td><td style="text-align:right">© Russ
Tedrake, 2022</td></tr>
</table>
</div>
</body>
</html>