% DA1_Chap5.tex
%
\chapter{LINEAR (MATRIX) ALGEBRA}
\epigraph{``Never send a human to do a machine's job.''}{\textit{Agent Smith} in \textit{The Matrix}}
\label{ch:matrix}
A large subset of data analysis techniques is simply the practical application of
linear algebra. Undergraduate students in the natural sciences often lack any formal introduction
to this material, or they suffered through an overly theoretical presentation in a course
offered by a mathematics department. In this book we will not dwell too much on the finer
theoretical points of linear algebra but instead present a simple overview of the aspects that are particularly
pertinent in data analysis. There are of course an infinite number of books on matrix and
linear algebra that the eager reader could consult beyond this brief introduction.
\section{Matrix Algebra Terminology}
\index{Matrix!definition}
\index{Matrix!order}
A \emph{matrix} is simply a rectangular array of elements arranged in a series of $n$ rows and $k$
columns. The \emph{order} of a matrix is the specification of the number of rows by the number of
columns. \emph{Elements} of a matrix are denoted $a_{ij}$, where index $i$ specifies the \emph{row} position
and index $j$ specifies the \emph{column} position; thus $a_{ij}$ identifies the element at position $i,j$.
\index{Matrix!element}
An element can be a number (real or complex), an algebraic expression, or (with some
restrictions) another matrix or matrix expression. As an example of a real matrix, consider
\begin{equation}
\mathbf{A} = \left[ \begin{array}{rcc}
12 & 4 & 10 \\
8 & 1 & 11\\
15 & 3 & 11\\
14 & 1 & 11
\end{array} \right]
\ = \
\left[ \begin{array}{ccc}
a_{11} & a_{12} & a_{13} \\
a_{21} & a_{22} & a_{23}\\
a_{31} & a_{32} & a_{33} \\
a_{41} & a_{42} & a_{43}
\end{array} \right].
\end{equation}
This matrix, $\mathbf A$, is of order $4 \times 3$, with elements $a_{11}$ = 12, $a_{21}$ = 8, and so on. The notation used for matrices is
not always consistent, but it is usually one of the following schemes:
\begin{description}
\item [Matrix:] Designated by a bold, uppercase letter (the most common scheme), brackets, or hat $(\hat{ \ })$,
sometimes with one or (more commonly) two underscores.
The order is also sometimes explicitly given. E.g., $\mathbf{A}_{(4,3)}$ means
$\mathbf A$ is of order $4 \times 3$.
\item [Order:] Always given as rows $\times$ columns, although different authors use the letters $n,k,m,p$ differently, e.g., $n$ (rows) $\times$ $k$ (columns) or $k$ (rows) $\times$ $n$ (columns).
\item [Element:] Most commonly $a_{ij}$, with $i =$ row; $j =$ column (sometimes other dummy indices like $l,p,q$ are used).
\end{description}
The advantage of matrix algebra lies mainly in the fact that it provides a concise and simple
method for manipulating large sets of numbers and computations, making it ideal for computers.
Furthermore,
\begin{enumerate}
\item The compact form of matrices allows a convenient notation for describing large tables of data.
\item Matrix operations allow complex relationships to be seen that would otherwise be obscured by the
sheer size of the data (i.e., they aid in clarification).
\item Most matrix manipulations involve just a
few standard operations for which standard subroutines are readily available.
\end{enumerate}
MATLAB, which stands for ``Matrix Laboratory'', is ideally suited to perform such manipulations,
as is its open-source clone, Octave. Python with NumPy, R, and Julia are also good choices.
\index{MATLAB}
\index{Octave}
As a convention with data matrices (i.e., when the elements are data values), the columns
usually represent the different \emph{variables} (e.g., one column contains temperatures, another salinity,
etc.) while rows contain the \emph{observations} (e.g., the values of the variables at different depths, times, or positions). Since
there are usually more observations than variables, such data matrices are typically rectangular, having
more rows $(n)$ than columns $(k)$, i.e., the matrix has an order $n \times k$ where $n > k$.
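As a quick illustration of these conventions, here is a minimal sketch in Python with NumPy (one of the environments mentioned above); the numbers are those of the $4 \times 3$ example matrix $\mathbf{A}$, and note that NumPy numbers rows and columns from zero rather than one.
\begin{verbatim}
import numpy as np

# Data matrix: 4 observations (rows) of 3 variables (columns).
A = np.array([[12,  4, 10],
              [ 8,  1, 11],
              [15,  3, 11],
              [14,  1, 11]])

n, k = A.shape      # order of the matrix: n rows by k columns
print(n, k)         # 4 3
print(A[0, 0])      # element a_11 = 12 (zero-based indexing)
print(A[1, 0])      # element a_21 = 8
\end{verbatim}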
\section{Matrix Definitions}
Matrices whose smallest dimension equals one are called \emph{vectors} and are typically designated
by a bold, lowercase letter, but sometimes they may be typeset normally with either an
arrow above them or a single underscore beneath. By having only one dimension, one of the two indices (row or column) is dropped.
A \emph{column vector} is a matrix containing only a single column of elements, such as
\index{Column vector}
\index{Vector!column}
\begin{equation}
\mathbf{a} = \left[ \begin{array}{c}
a_1\\
a_2\\
\vdots\\
a_n
\end{array} \right] .
\end{equation}
\noindent
A \emph{row vector} is a matrix containing only a single row of elements, e.g.,
\index{Row vector}
\index{Vector!row}
\begin{equation}
\mathbf{b} = \left[ \begin{array}{cccc}
b_1 & b_2 & \cdots & b_n
\end{array} \right] .
\end{equation}
The size of a vector is simply the number of elements it contains ($n$, in both examples above).
\index{Null matrix}
\index{Matrix!null}
The \emph{null matrix}, written as $\mathbf{0}$ or $\mathbf{0}_{(k,n)}$, has all its elements set equal to 0 --- it plays the
role of ``zero'' in matrix algebra.
A \emph{square matrix} has the same number of rows as columns, so its order is $n \times n$.
\index{Square matrix}
\index{Matrix!square}
A \emph{diagonal matrix} is a square matrix with zeros in all positions except along the principal (or
leading) diagonal:
\begin{equation}
\index{Diagonal matrix}
\index{Matrix!diagonal}
\mathbf{D} = \left[
\begin{array}{ccc}
3 & 0 & 0 \\
0 & 1 & 0\\
0 & 0 & 6
\end{array} \right]
\end{equation}
or
\index{Matrix!identity}
\index{Identity matrix}
\begin{equation}
d_{ij} = \left \{ \begin{array}{cl}
0 & \mbox{for } i \neq j\\
\mbox{nonzero} & \mbox{for } i = j
\end{array} \right.
\end{equation}
This type of matrix is important for scaling the rows or columns of other matrices. The \emph{identity
matrix} is a diagonal matrix with all of its nonzero elements equal to one. Written as $\mathbf{I}$ or $\mathbf{I}_n$,
it plays the role of ``one'' in matrix algebra.
A \emph{lower triangular matrix} is a square
matrix with all elements equal to zero \emph{above} the principal diagonal, e.g.,
\begin{equation}
\index{Matrix!lower triangular}
\index{Lower triangular matrix}
\mathbf{L} = \left [ \begin{array}{ccc}
1 & 0 & 0 \\
3 & 7 & 0 \\
8 & 2 & 6
\end{array}
\right ] =
\left[ \begin{array}{ccc}
1 \\
3 & 7 \\
8 & 2 & 6 \end{array} \right]
\end{equation}
or
\begin{equation}
\ell_{ij} = \left \{ \begin{array}
{cl}
0 & \mbox{for } i < j\\
\mbox{nonzero} & \mbox{for } i \geq j
\end{array} \right .
\end{equation}
An \emph{upper triangular matrix} is thus a square matrix with all elements equal to zero \emph{below} the principal
diagonal
\begin{equation}
\index{Matrix!upper triangular}
\index{Upper triangular matrix}
u_{ij} = \left \{ \begin{array}{cl}
0 & \mbox{for } i > j\\
\mbox{nonzero} & \mbox{for } i \leq j
\end{array} \right .
\end{equation}
If one multiplies two triangular matrices of the same form, the result is a third matrix of the same
form.
\index{Fully populated matrix}
\index{Matrix!fully populated}
\index{Matrix!sparse}
\index{Sparse matrix}
We also have the \emph{fully populated matrix}, a matrix with all of its elements nonzero;
the \emph{sparse matrix}, a matrix with only a small proportion of its elements nonzero; and the \emph{scalar},
which is simply a number (i.e., a matrix of order $1 \times 1$, representing a single element).
A \emph{matrix transpose} (or the transpose of a matrix) is obtained by \emph{interchanging} the rows and columns
of the matrix. Thus, row $i$ becomes column $i$ and column $j$ becomes row $j$. As a consequence, the order of the matrix
is reversed:
\begin{equation}
\index{Matrix!transpose}
\index{Transpose of matrix}
\mathbf{A} = \left[ \begin{array}{cc}
1 & 14 \\
6 & 7 \\
8 & 2
\end{array} \right]
\Rightarrow
\mathbf{A}^T = \left[ \begin{array}{ccr}
1 & 6 & 8 \\
14 & 7 & 2 \end{array} \right]
\end{equation}
As shown, taking the transpose is indicated by the superscript $^T$.
Repeated transposing yields the original matrix, i.e.,
\begin{equation}
(\mathbf{A}^T)^T = \mathbf{A}.
\end{equation}
A diagonal matrix is its own transpose: $\mathbf{D}^T = \mathbf{D}$. In general, we find the transpose rule
\begin{equation}
a_{ij} \Leftrightarrow a_{ji}.
\end{equation}
A \emph{symmetric matrix} is a square matrix that is symmetric about its principal diagonal, so
$a_{ij} = a_{ji}$. Therefore, a symmetric matrix is equal to its own transpose:
\begin{equation}
\index{Matrix!symmetric}
\index{Symmetric matrix}
\mathbf{A} = \left[ \begin{array}{ccc}
1 & 2 & 5 \\
2 & 6 & 3 \\
5 & 3 & 4
\end{array}
\right]
= \mathbf{A}^T .
\end{equation}
A \emph{skew symmetric matrix} is a matrix in which
\begin{equation}
a_{ij} = -a_{ji}.
\end{equation}
Therefore, $\mathbf{A}^T = - \mathbf{A}$. Thus, $a_{ii}$, the principal diagonal elements, must all be zero.
The following matrix is skew symmetric:
\begin{equation}
\index{Matrix!skew symmetric}
\index{Skew symmetric matrix}
\mathbf{A} = \left[ \begin{array}{ccc}
0 & 4 & -5 \\
-4 & 0 & 3 \\
5 & -3 & 0
\end{array}
\right].
\end{equation}
Any square matrix can be decomposed into the \emph{sum} of a symmetric and a skew-symmetric matrix:
\begin{equation}
\mathbf{A} = \frac{1}{2} (\mathbf{A} + \mathbf{A}^T) + \frac{1}{2} (\mathbf{A} - \mathbf{A}^T).
\end{equation}
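The decomposition is easily verified numerically; the following minimal NumPy sketch (using an arbitrary example matrix, not one from the text) splits a square matrix into its symmetric and skew-symmetric parts and checks that they add back up to the original.
\begin{verbatim}
import numpy as np

A = np.array([[1.0, 4.0, 2.0],
              [0.0, 3.0, 5.0],
              [7.0, 6.0, 9.0]])

S = 0.5 * (A + A.T)            # symmetric part
K = 0.5 * (A - A.T)            # skew-symmetric part

print(np.allclose(S, S.T))     # True: S is symmetric
print(np.allclose(K, -K.T))    # True: K is skew symmetric
print(np.allclose(S + K, A))   # True: the two parts sum to A
\end{verbatim}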
The \emph{trace} of a square matrix is simply the sum of the elements along the principal diagonal. It
is symbolized as tr ($\mathbf{A}$).\index{Matrix!trace}\index{Trace of matrix}
This property is useful in calculating various quantities from matrices.
\index{Matrix!submatrix}
\index{Submatrix}
\index{Matrix!supermatrix}
\index{Supermatrix}
\emph{Submatrices} are smaller matrix \emph{partitions} of the larger \emph{supermatrix}, i.e.,
\begin{equation}
\left [ \frac{\mbox{Supermatrix}}{\mathbf{F}} \right] = \left[ \frac{\mathbf{A} | \mathbf{B}}{\mathbf{C} | \mathbf{D}} \right].
\end{equation}
Such partitioning will frequently be useful.
\section{Matrix Addition}
\index{Matrix!addition}
Matrix addition and subtraction require matrices of the \emph{same order} since each operation
simply involves the addition or subtraction of corresponding elements. So, if $\mathbf{C} = \mathbf{A} + \mathbf{B}$ then
\begin{equation}
\mathbf{A} = \left[ \begin{array}{cc}
a_{11} & a_{12}\\
a_{21} & a_{22}\\
a_{31} & a_{32} \end{array} \right] , \mathbf{B} =
\left[ \begin{array}{cc}
b_{11} & b_{12}\\
b_{21} & b_{22}\\
b_{31} & b_{32} \end{array} \right] ,
\mathbf{C} = \left[ \begin{array}{cc}
a_{11}+ b_{11} & a_{12} + b_{12}\\
a_{21}+ b_{21} & a_{22} + b_{22}\\
a_{31} + b_{31} & a_{32} + b_{32} \end{array} \right] ,
\end{equation}
and (with apologies to ABBA fans)
\begin{equation}
\mathbf{A} + \mathbf{B} = \mathbf{B} + \mathbf{A},
\end{equation}
\begin{equation}
(\mathbf{A} + \mathbf{B}) + \mathbf{C} = \mathbf{A} + (\mathbf{B} + \mathbf{C}),
\end{equation}
where all matrices must be of the same order. \emph{Scalar multiplication} of a matrix is achieved by multiplying all elements of a matrix
by a constant (the scalar):
\begin{equation}
\beta \mathbf{A} = \beta \left[ \begin{array}{cc}
a_{11} & a_{12} \\
a_{21} & a_{22}\\
a_{31} & a_{32}
\end{array}
\right]
=
\left[ \begin{array}{cc}
\beta a_{11} & \beta a_{12} \\
\beta a_{21} & \beta a_{22}\\
\beta a_{31} & \beta a_{32}
\end{array}
\right],
\end{equation}
where $\beta$ is a scalar.
Thus, every element is multiplied by the scalar.
\section{Dot Product}
\index{Vector!product|(}
\index{Dot product|(}
The \emph{scalar product} (or \emph{dot product} or
\emph{inner product}) is the product of two vectors of the same size, e.g.,
\begin{equation}
\mathbf{a}\cdot \mathbf{b} = \beta,
\end{equation}
where $\mathbf a$ is a row vector (or the transpose of a column vector) of length $n$, $\mathbf{b}$ is a column vector (or
the transpose of a row vector), also of length $n$, and $\beta$ is the scalar product $\mathbf a \cdot \mathbf b$.
Given the two 3-D vectors
\begin{equation}
\mathbf{a} = [a_1 \ a_2 \ a_3 ], \quad \mathbf{b} = \left[ \begin{array}{c}
b_1\\
b_2\\
b_3
\end{array}
\right],
\end{equation}
we sum the products of corresponding elements in the two vectors, obtaining
\begin{equation}
\beta = a_1 b_1 + a_2b_2 + a_3b_3.
\end{equation}
We may visualize this multiplication, as illustrated in Figure~\ref{fig:Fig1_dotproduct} for two 4-D vectors.
\PSfig[H]{Fig1_dotproduct}{The dot product of the two 4-D vectors $\mathbf{a} = [2 \ 1 \ 4 \ 5]$
and $\mathbf{b} = [1 \ 3 \ 4 \ 2]$
is obtained by multiplying the component pairs and calculating the sum of these products.}
Geometrically, this product can be thought of as multiplying the length of one vector by the
component of the other vector that is parallel to the first, as shown in Figure~\ref{fig:Fig1_dotvector}:
\PSfig[H]{Fig1_dotvector}{Geometrical meaning of the dot product of two vectors. Regardless of dimension, the
dot product is proportional to the cosine of the angle between the two vectors.}
As an example, think of $\mathbf b$ as a force and $|\mathbf a|$ as the magnitude of displacement, with their product equal to the work in the
direction of $\mathbf a$. Thus:
\begin{equation}
\mathbf{a \cdot b = |a||b|} \cos (\theta),
\end{equation}
where the \emph{magnitude} of a vector $\mathbf x$ is given by
\begin{equation}
|\mathbf{x}| = \sqrt{x^2_1 + x ^2_2 + \ldots + x^2_n}.
\end{equation}
The maximum principle says that the unit vector $\hat{\mathbf{n}}$ making $\mathbf{a} \cdot \hat{\mathbf{n}}$ a maximum is the unit vector
pointing in the same direction as $\mathbf{a}$: if $\hat{\mathbf{n}} \parallel \mathbf{a}$ then $\cos(\theta) = \cos(0^{\circ}) = 1$ and $\mathbf{a}\cdot \hat{\mathbf{n}} = |\mathbf{a}| |\hat{\mathbf{n}}|\cos(\theta) =
|\mathbf{a}||\hat{\mathbf{n}}| = |\mathbf{a}|$. The same is true for any vector $\mathbf{d}$ of a given magnitude --- the vector $\mathbf{d}$ that
parallels $\mathbf{a}$ gives the largest scalar product $\mathbf{a} \cdot \mathbf{d}$.
Parallel vectors thus have $\cos(\theta) = 1$, so that $\mathbf{a \cdot b = |a||b|}$ and
$\mathbf{a = \beta b}$ (i.e., two vectors are parallel if
one is simply a scalar multiple of the other --- this property comes from equating direction cosines),
where
\begin{equation}
\beta = \mathbf{|a|/|b|}.
\end{equation}
In contrast, \emph{perpendicular} vectors have $\cos \theta = \cos 90^\circ = 0$, so that $\mathbf{ a \cdot b} = 0$, where $\mathbf{a \bot b}$.
Squaring vectors is simply the dot product of a vector with its own transpose, i.e.,
\begin{equation}
\mathbf{a}^2 = \mathbf{ a \cdot a}^T \mbox{ for row vectors}
\end{equation}
and
\begin{equation}
\mathbf{a}^2 = \mathbf{ a}^T \cdot \mathbf{a} \mbox{ for column vectors}.
\end{equation}
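The following minimal NumPy sketch reproduces the dot product of the two 4-D vectors of Figure~\ref{fig:Fig1_dotproduct} and recovers the angle between them from the cosine relation above.
\begin{verbatim}
import numpy as np

a = np.array([2.0, 1.0, 4.0, 5.0])   # the two 4-D vectors from the figure
b = np.array([1.0, 3.0, 4.0, 2.0])

beta = np.dot(a, b)                  # 2 + 3 + 16 + 10 = 31
mag_a = np.linalg.norm(a)            # |a|
mag_b = np.linalg.norm(b)            # |b|

# a . b = |a||b| cos(theta)  =>  theta = arccos(a . b / (|a||b|))
theta = np.arccos(beta / (mag_a * mag_b))
print(beta, np.degrees(theta))
\end{verbatim}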
\index{Vector!product|)}
\index{Dot product|)}
\section{Matrix Multiplication}
\index{Matrix!multiplication|(}
Matrix multiplication requires ``conformable'' matrices. Matrices are conformable when
there are as many columns in the first matrix as there are rows in the second matrix. Consider
\index{Matrix!conformable}
\begin{equation}
\mathbf{C}_{(n,k)} = \mathbf{A}_{(n,p)} \cdot \mathbf{B}_{(p,k)}.
\end{equation}
The matrix product $\mathbf{C}$ is of order $n \times k$ and has elements $c_{ij}$, given by
\begin{equation}
c_{ij} = \sum^p _{q=1} a_{iq}b_{qj}.
\end{equation}
This is an extension of the scalar product --- in this case, each element of $\mathbf{C}$ is the scalar product of a row
vector in $\mathbf{A}$ and a column vector in $\mathbf{B}$. For instance, if
\begin{equation}
\left[
\begin{array}{cc}
c_{11} & c_{12} \\
c_{21} & c_{22}
\end{array}
\right ]
=
\left[
\begin{array}{ccc}
a_{11}& a_{12} & a_{13}\\
a_{21}& a_{22} & a_{23}
\end{array}
\right ]
\left [
\begin{array}{cc}
b_{11} & b_{12}\\
b_{21} & b_{22}\\
b_{31} & b_{32}
\end{array}
\right],
\end{equation}
then
\begin{equation}
c_{12} = a_{11}b_{12} + a_{12}b_{22} + a_{13}b_{32}.
\end{equation}
We illustrate the situation in Figure~\ref{fig:Fig1_matprod1}.
\PSfig[h]{Fig1_matprod1}{The matrix product of two matrices $\mathbf{A}$ and $\mathbf{B}$. The light blue row in $\mathbf{A}$ is ``dotted'' with
the light blue column in $\mathbf{B}$, resulting in the single, light blue element in $\mathbf{C}$. Similarly, the dot
product of the light green vectors results in the single, light green element. This process is repeated by letting
all rows in $\mathbf{A}$ be ``dotted'' with all the columns in $\mathbf{B}$.}
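A minimal NumPy sketch of this product rule (the matrices are arbitrary examples) confirms that each element of $\mathbf{C}$ is the dot product of a row of $\mathbf{A}$ with a column of $\mathbf{B}$.
\begin{verbatim}
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])        # order 2 x 3
B = np.array([[1.0, 4.0],
              [2.0, 5.0],
              [3.0, 6.0]])             # order 3 x 2

C = A @ B                              # matrix product, order 2 x 2

# c_12 is the dot product of row 1 of A with column 2 of B:
c12 = np.dot(A[0, :], B[:, 1])
print(C[0, 1], c12)                    # 32.0 32.0
\end{verbatim}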
The order of multiplication is critical. Usually
\begin{equation}
\mathbf{A \cdot B \neq B \cdot A},
\end{equation}
and unless $\mathbf{A}$ and $\mathbf{B}$ are square matrices or the order of $\mathbf{A}^T$ is the same as the order of $\mathbf{B}$ (or vice
versa), one of the two products cannot even be formed. The multiplication order is specified by stating
\begin{description}
\item [A] is \emph{pre}-multiplied by $\mathbf{B}$ (yielding $\mathbf{B \cdot A}$)
\index{Matrix!premultiply}
\item [A] is \emph{post}-multiplied by $\mathbf{B}$ (yielding $\mathbf{ A \cdot B}$)
\index{Matrix!postmultiply}
\end{description}
The order in which the pairs are multiplied is not important \emph{mathematically}, i.e.,
\begin{equation}
\mathbf{ D = (A \cdot B) \cdot C = A \cdot (B \cdot C)},
\end{equation}
but we will see later that the order matters \emph{computationally}.
The \emph{transpose of a matrix product} is simply the product of the transposes of the individual matrices taken in
reverse order, i.e.,
\begin{equation}
\mathbf{D = A \cdot B \cdot C},
\end{equation}
\begin{equation}
\mathbf{D}^T = \mathbf{C}^T \cdot \mathbf{B}^T \cdot \mathbf{A}^T.
\label{eq:transposerule}
\end{equation}
\emph{Multiplication by $\mathbf{I}$} leaves the matrix unchanged, i.e.,
\begin{equation}
\mathbf{A \cdot I = I \cdot A = A}.
\end{equation}
For example,
\begin{equation}
\left[ \begin{array}{ccc}
3 & 6 & 9 \\
2 & 8 & 7
\end{array} \right]
\left[ \begin{array}{ccc}
1 & 0 & 0\\
0 & 1 & 0\\
0 & 0 & 1 \end{array}
\right]
=
\left[
\begin{array}{ccc}
3 & 6 & 9 \\
2 & 8 & 7
\end{array}
\right] .
\end{equation}
\emph{Premultiplication by a diagonal matrix} is written
$\mathbf{C = D \cdot A}$,
where $\mathbf{D}$ is a diagonal matrix. Here, $\mathbf{C}$ is the $\mathbf{A}$ matrix with each \emph{row} scaled by the corresponding diagonal
element of $\mathbf{D}$:
\begin{equation}
\mathbf{D} = \left[ \begin{array}{ccc}
d_{11}\\
& d_{22} \\
& & d_{33}
\end{array} \right] , \quad \mathbf{A} = \left[ \begin{array}{ccc}
a_{11} & a_{12} & a_{13}\\
a_{21} & a_{22} & a_{23}\\
a_{31} & a_{32} & a_{33}
\end{array} \right ]
\end{equation}
\begin{equation}
\mathbf{C = D \cdot A} = \left[ \begin{array}{ccc}
a_{11}d_{11} & a_{12}d_{11} & a_{13}d_{11}\\
a_{21}d_{22} & a_{22}d_{22} & a_{23}d_{22}\\
a_{31}d_{33} & a_{32}d_{33} & a_{33}d_{33}
\end{array} \right]
\quad
\begin{array}{l}
\leftarrow \mbox{ each element } \times d_{11}\\
\leftarrow \mbox{ each element } \times d_{22}\\
\leftarrow \mbox{ each element } \times d_{33}
\end{array}
\end{equation}
\emph{Postmultiplication by a diagonal matrix} produces a matrix in which each \emph{column} has been
scaled by the corresponding diagonal element of $\mathbf{D}$. Hence,
\begin{equation}
\mathbf{C = A \cdot D} = \left[\begin{array}{ccc} a_{11} d_{11} & a_{12} d_{22} & a_{13} d_{33}\\
a_{21} d_{11} & a_{22} d_{22} & a_{23} d_{33}\\
a_{31} d_{11} & a_{32}d_{22} & a_{33} d_{33}
\end{array} \right],
\end{equation}
where each column in $\mathbf{A}$ has been scaled by the corresponding diagonal matrix elements, $d_{ii}$.
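To make the row- versus column-scaling distinction concrete, here is a minimal NumPy sketch (with arbitrary example numbers) of pre- and postmultiplication by a diagonal matrix.
\begin{verbatim}
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0]])
D = np.diag([2.0, 10.0, 0.5])   # diagonal matrix of scale factors

rows_scaled = D @ A    # premultiplication: row i of A scaled by d_ii
cols_scaled = A @ D    # postmultiplication: column j of A scaled by d_jj

print(rows_scaled[0, :])   # [2. 4. 6.]   (first row times 2)
print(cols_scaled[:, 0])   # [2. 8. 14.]  (first column times 2)
\end{verbatim}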
\subsection{Computational considerations}
The matrix product
\begin{equation}
\mathbf{C}_{(n,k)} = \mathbf{A}_{(n,p)} \cdot \mathbf{B}_{(p,k)}
\end{equation}
involves $n \times k \times p$ multiplications and $n \times k \times (p-1)$ additions. Hence, forming the product in brackets first in
\begin{equation}
\mathbf{E}_{(n,k)} = [ \mathbf{A}_{(n,p)} \cdot \mathbf{B}_{(p,q)}] \cdot \mathbf{C}_{(q,k)}
\end{equation}
requires $n \times p \times q$ multiplications, after which
\begin{equation}
\mathbf{E} = [\mathbf{D}_{(n,q)}] \cdot \mathbf{C}_{(q,k)}
\end{equation}
requires a further $n\times q \times k$ multiplications. Alternatively,
\begin{equation}
\mathbf{E} _{(n,k)} = \mathbf{A}_{(n,p)} \cdot [\mathbf{B}_{(p,q)} \cdot \mathbf{C}_{(q,k)}]
\end{equation}
requires $p\times q \times k$ multiplications, after which
\begin{equation}
\mathbf{E}_{(n,k)} = \mathbf{A}_{(n,p)}\cdot [\mathbf{D}_{(p,k)} ]
\end{equation}
requires a further $n \times p \times k$ multiplications. Therefore, the total number of operations depends on the order of multiplication:
\begin{enumerate}
\item $\mathbf{(A \cdot B) \cdot C} \Rightarrow nq(p+k)$ total multiplications
\item $\mathbf{A \cdot (B \cdot C)} \Rightarrow pk(n+q)$ total multiplications
\end{enumerate}
If both $\mathbf{A}$ and $\mathbf{B}$ are $100 \times 100$ matrices and $\mathbf{C}$ is $100 \times 1$, then $n = p = q = 100$, and $k
= 1$. Multiplying using form (1) involves $\sim 1 \times 10^6$ multiplications, whereas form (2) involves $2 \times
10^4$; so computing $\mathbf{B \cdot C}$ first, then premultiplying by $\mathbf{A}$ saves almost a million multiplications
and almost an equal number of additions. Therefore, the order of operations is extremely important
computationally for both speed and accuracy, as more operations lead to a greater accumulation of
\emph{round-off errors}.
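The effect is easy to demonstrate; the sketch below (Python with NumPy, timed with the standard library) multiplies the same three matrices in both orders. Exact timings will of course vary by machine, and for matrices this small the gap may be modest, but it grows rapidly with the dimensions.
\begin{verbatim}
import time
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 100))
B = rng.standard_normal((100, 100))
c = rng.standard_normal((100, 1))

t0 = time.perf_counter(); e1 = (A @ B) @ c; t1 = time.perf_counter()
t2 = time.perf_counter(); e2 = A @ (B @ c); t3 = time.perf_counter()

print(np.allclose(e1, e2))   # True: same result, up to round-off
print(t1 - t0, t3 - t2)      # form (1) is typically the slower of the two
\end{verbatim}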
\index{Matrix!multiplication|)}
\section{Matrix Determinant}
\index{Matrix!determinant|(}
The \emph{determinant of a matrix} is a single number representing a property of a square matrix
(and is dependent upon what the matrix represents). The main use here is for finding the inverse of a
matrix or for solving simultaneous linear equations. Symbolically, the determinant is usually written as
det $(\mathbf{A})$, $\mathbf{|A|}$ or $\mathbf{||A||}$ (to differentiate from magnitude).
The calculation of a $2 \times 2$ determinant is carried out using the definition
\begin{equation}
|\mathbf{A}| = \left| \begin{array}{cc} a_{11} & a_{12} \\
a_{21} & a_{22}
\end{array} \right| = a_{11}a_{22} - a_{12}a_{21},
\end{equation}
which is the difference of the cross products. The calculation of an $n \times n$ determinant is given by
\begin{equation}
|\mathbf{A}| = a_{11} m_{11} - a_{12} m_{12} + a_{13} m_{13} - \cdots + (-1)^{n+1}a_{1n} m_{1n},
\end{equation}
where $m_{11}$ is the determinant of $\mathbf{A}$ with the first row and column missing; $m_{12}$ is the determinant with
the first row and second column missing, etc. For larger matrices the procedure is used recursively.
The determinant of a $1 \times 1$ matrix is just the
particular element. An example of a $3 \times 3$ determinant follows:
\begin{equation}
|\mathbf{A} | = \left | \begin{array}{ccc}
a_{11} & a_{12} & a_{13}\\
a_{21} & a_{22} & a_{23}\\
a_{31} & a_{32} & a_{33}
\end{array}\right |
\end{equation}
\begin{equation}
m_{11} = \left| \begin{array}{cc}
a_{22} & a_{23}\\
a_{32} & a_{33}
\end{array} \right |
= a_{22}a_{33} - a_{23}a_{32}
\end{equation}
\begin{equation}
m_{12} = \left| \begin{array}{cc}
a_{21} & a_{23}\\
a_{31} & a_{33}
\end{array} \right |
= a_{21}a_{33} - a_{23}a_{31}
\end{equation}
\begin{equation}
m_{13} = \left| \begin{array}{cc}
a_{21} & a_{22}\\
a_{31} & a_{32}
\end{array} \right |
= a_{21}a_{32} - a_{22}a_{31}
\end{equation}
So
\begin{equation}
\begin{array}{c}
|\mathbf{A}| = a_{11}m_{11} - a_{12}m_{12} + a_{13}m_{13} \\
= a_{11}(a_{22}a_{33} - a_{23}a_{32}) - a_{12}(a_{21}a_{33} - a_{23}a_{31}) + a_{13}(a_{21}a_{32} - a_{22}a_{31}).
\end{array}
\end{equation}
For a $4 \times 4$ determinant, each $m_{1i}$ would be an entire expansion given above for the
$3 \times 3$
determinant --- one quickly needs a computer.
\index{Matrix!singular}
\index{Singular matrix}
A \emph{singular matrix} is a square matrix whose determinant is zero. A determinant is zero if:
\begin{enumerate}
\item Any row or column is zero.
\item Any row or column is equal to a linear combination of other rows or columns.
\end{enumerate}
As an example of a singular matrix, consider
\begin{equation}
|{\mathbf A}| = \left | \begin{array}{ccc} 1 & 6 & 4 \\
2 & 1 & 0 \\
5 & -3 & -4
\end{array} \right |,
\end{equation}
where row $1 = 3\cdot$(row 2) $-$ row 3. Then, the determinant becomes
\begin{equation}
\begin{array}{ccl}
|\mathbf{A}| & = & a_{11}(a_{22}a_{33}-a_{23}a_{32})-a_{12}(a_{21}a_{33}-a_{23}a_{31})+a_{13}(a_{21}a_{32}-a_{22}a_{31}) \\
& = & 1[1(-4)-0(-3)]-6[2(-4)-0(5)]+4[2(-3)-1(5)]=-4+48-44=0. \\
\end{array}
\end{equation}
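The singular example above is easily checked numerically; note that the computed determinant of a singular matrix will in general come out as a tiny round-off value rather than exactly zero.
\begin{verbatim}
import numpy as np

A = np.array([[1.0,  6.0,  4.0],
              [2.0,  1.0,  0.0],
              [5.0, -3.0, -4.0]])

print(np.linalg.det(A))                    # ~0: A is singular
print(np.allclose(A[0], 3*A[1] - A[2]))    # True: row 1 = 3*(row 2) - row 3
\end{verbatim}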
\index{Matrix!degree of clustering}
\index{Matrix!rank}
The \emph{degree of clustering} symmetrically about the principal diagonal is another (of many)
properties of a determinant. The more the clustering, the higher the value of the determinant.
The \emph{rank} of a matrix is the number of linearly
independent vectors that it contains (either row or column vectors). Consider
\begin{equation}
\mathbf{A} = \left [ \begin{array}{cccc}
1 & 4 & 0 & 2 \\1 & 0 & 1 & -1\\
-3 & -4 & -2 & 0
\end{array} \right] .
\end{equation}
Since row $3 = -$(row 1)$ - 2\cdot$(row 2), or col $3 = $ col $1 - 1/4\cdot$(col 2) and col 4 $= -$(col 1)$ + 3/4\cdot$(col 2),
the matrix $\mathbf{A}$ has rank 2 (i.e., it has only two linearly independent vectors, independent of whether
viewed by rows or columns).
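Numerically, the rank of this example can be confirmed with NumPy's rank routine (which estimates the rank from the singular values of the matrix).
\begin{verbatim}
import numpy as np

A = np.array([[ 1.0,  4.0,  0.0,  2.0],
              [ 1.0,  0.0,  1.0, -1.0],
              [-3.0, -4.0, -2.0,  0.0]])

print(np.linalg.matrix_rank(A))   # 2: only two linearly independent vectors
\end{verbatim}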
\index{Matrix!rank}
The \emph{rank} of a \emph{matrix product} must be less than or equal to the
smallest rank of the matrices being multiplied:
\begin{equation}
\mathbf{A}_{\mbox{(rank 2)}}\cdot \mathbf{B}_{\mbox{(rank 1)}} = \mathbf{C}_{\mbox{(rank 1)}}.
\end{equation}
Therefore (and seen from another angle), if a matrix has rank $r$ then any matrix factor of it must have
rank of at least $r$. Since the rank cannot be greater than the smallest of $k$ or $n$ in a $k \times n$ matrix,
this definition also limits the size (order) of factor matrices. (That is, one cannot factor a matrix
of rank 2 into two matrices of which either has rank less than 2, so $k$ and $n$ of each factor must
also be $\geq 2$.)
\index{Matrix!determinant|)}
\section{Matrix Division (Matrix Inverse)}
\index{Matrix!division}
\index{Matrix!inverse}
%Was {Fig1_matrixalien}{Abducted by an alien circus company, Professor Wessel
%is forced to write Linear Algebra equations in center ring.}
\emph{Matrix division} can be thought of as multiplying by the \emph{inverse}. Consider the scalar division
\begin{equation}
\frac{x}{b}=x\frac{1}{b}=xb^{-1},
\end{equation}
where we can write
\begin{equation}
bb^{-1}=1.
\end{equation}
Likewise, matrices can be effectively divided by multiplying by an inverse matrix. A \emph{nonsingular square
matrix} has an inverse, symbolized as $\mathbf{A}^{-1}$ and satisfying $\mathbf{AA}^{-1} = \mathbf{A}^{-1}\mathbf{A} = \mathbf{I}$.
The calculation of a matrix inverse is usually done using elimination methods on the computer.
For a simple $2 \times 2$ matrix, its inverse is given by
\begin{equation}
\mathbf{A}^{-1} = \frac{1}{|\mathbf{A}|}
\left [ \begin{array}{cc} a_{22} & -a_{12} \\
-a_{21} & a_{11} \\
\end{array}
\right ] .
\end{equation}
As an example, let
\begin{equation}
\mathbf{A}=
\left [\begin{array}{cc} 7 & 2\\
10 & 3\\
\end{array}
\right].
\end{equation}
We solve for the inverse as
\begin{equation}
\mathbf{A}^{-1}=\frac{1}{21-20}
\left [\begin{array}{cc}3 & -2\\
-10 & 7\\
\end{array}
\right]=
\left [\begin{array}{cc}3 & -2\\
-10 & 7\\
\end{array}
\right],
\end{equation}
and as a check we note that
\begin{equation}
\mathbf{AA}^{-1}=
\left [\begin{array}{cc} 7 & 2\\ 10 & 3\\
\end{array} \right] \left [\begin{array}{cc}3 & -2\\ -10 & 7\\ \end{array} \right]=
\left [\begin{array}{ccr} 1 & 0\\ 0 & 1\\ \end{array} \right]=\mathbf{I}.
\end{equation}
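The same $2 \times 2$ example is easily verified with NumPy, which computes inverses using elimination-based routines rather than the cofactor formula.
\begin{verbatim}
import numpy as np

A = np.array([[ 7.0, 2.0],
              [10.0, 3.0]])

Ainv = np.linalg.inv(A)
print(Ainv)                                # [[  3.  -2.]
                                           #  [-10.   7.]]
print(np.allclose(A @ Ainv, np.eye(2)))    # True: A A^{-1} = I
\end{verbatim}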
Given the concept of a matrix inverse we may summarize a few useful matrix properties:
\begin{equation}
(\mathbf{A}^{-1})^{-1} = \mathbf{A},
\end{equation}
\begin{equation}
(\mathbf{A}^{-1})^T = (\mathbf{A}^T)^{-1} = \mathbf{A}^{-T},
\end{equation}
\begin{equation}
\mbox{if } \mathbf{D} = \mathbf{ABC}, \mbox{ then } \mathbf{D}^{-1} = \mathbf{C}^{-1} \mathbf{B}^{-1} \mathbf{A}^{-1}.
\end{equation}
This ``reversal rule'' for inverse products may be useful for eliminating or minimizing the number
of matrix inverses requiring calculation.
\section{Matrix Manipulation and Normal Scores}
\index{Normal scores|(}
We will look at a few examples of matrix manipulations. Consider the data matrix
\begin{equation}
\mathbf{A} = \left[ \begin{array}{c}
1 \ 2 \ 3 \\
4 \ 5 \ 6 \\7 \ 8 \ 9
\end{array} \right ]
\end{equation}
and unit row vector
\begin{equation}
\mathbf{j}^T_3 = [1 \ 1 \ 1].
\end{equation}
To compute the mean of each column vector in $\mathbf{A}$ (here, each column has length $n = 3$), we note that
\begin{equation}
\mathbf{\bar{x}} _c = \frac{1}{3} \mathbf{j}^T_3 \mathbf{A}, \mbox{ and in general } \mathbf{\bar{x}} _c = \frac{1}{n} \mathbf{j}^T_n \mathbf{A}.
\end{equation}
For our example, we find
\begin{equation}
\mathbf{\bar{x}}_c = \frac{1}{3} \left[ 1 \ 1\ 1 \right ] \cdot
\left[ \begin{array}{c}
1 \ 2 \ 3 \\ 4 \ 5 \ 6 \\ 7 \ 8 \ 9 \end{array} \right ]
= \frac{1}{3} \left[ 12 \ 15 \ 18 \right] = \left[ 4 \ 5 \ 6 \right].
\end{equation}
To compute the mean of each row vector in $\mathbf{A}$ (here, each row has length $k = 3$), let
\begin{equation}
\mathbf{\bar{x}}_r = \frac{1}{3} \mathbf{Aj}_3, \mbox{ and in general } \mathbf{\bar{x}}_r = \frac{1}{k} \mathbf{Aj}_k .
\end{equation}
Again, for our example, we find
\begin{equation}
\mathbf{\bar{x}}_r = \frac{1}{3}
\left [ \begin{array}{c}
1 \ 2 \ 3 \\
4 \ 5 \ 6 \\
7 \ 8 \ 9
\end{array} \right ] \cdot
\left [ \begin{array}{c}
1\\
1\\
1
\end{array} \right ] = \frac{1}{3}
\left[
\begin{array}{c}
6 \\
15\\
24 \end{array} \right ]
=
\left[
\begin{array}{c}
2 \\
5\\
8 \end{array} \right ].
\end{equation}
Given these terms, how can we compute normal scores for a data table (or matrix)? What we want in
each cell are the elements
\begin{equation}
z_{ij} = \frac{a_{ij} - \bar{a}_j}{s_j}.
\end{equation}
In matrix terminology we would first need to form the difference matrix, $\mathbf{D}$, given as
\begin{equation}
\mathbf{D = A} - \frac{1}{n}\mathbf{JA},
\end{equation}
where $\mathbf{J}$ is the $n \times n$ unit matrix (all entries equal 1).
Given the diagonal matrix $\mathbf{S}$ containing the standard deviation of each column defined as
\begin{equation}
\mathbf{S} =
\left [ \begin{array}{cccc}
s_1 & 0 & \ldots & 0 \\
0 & s_2 & \ldots & 0 \\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \ldots & s_k
\end{array} \right]
\end{equation}
we get the normal scores (here, $\mathbf{I}$ is of size $n \times n$) as
\begin{equation}
\mathbf{Z = DS}^{-1} = \left ( \mathbf{A} - \frac{1}{n} \mathbf{JA} \right) \mathbf{S}^{-1} = \left ( \mathbf{I} - \frac{1}{n} \mathbf{J} \right)
\mathbf{AS}^{-1}.
\end{equation}
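A minimal NumPy sketch of this computation for the $3 \times 3$ example matrix follows; it uses the sample standard deviation for $\mathbf{S}$ (an assumption --- the population version would use \texttt{ddof=0}) and checks the matrix expression against the element-by-element formula.
\begin{verbatim}
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0]])
n = A.shape[0]

J = np.ones((n, n))                   # unit matrix J (all entries 1)
D = A - (1.0 / n) * (J @ A)           # difference matrix: remove column means
s = A.std(axis=0, ddof=1)             # column standard deviations
Z = D @ np.linalg.inv(np.diag(s))     # normal scores, Z = D S^{-1}

# Same result, computed directly element by element:
print(np.allclose(Z, (A - A.mean(axis=0)) / s))   # True
\end{verbatim}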
\index{Normal scores|)}
\section{Solution of Simultaneous Linear Equations}
\index{Solution of simultaneous linear equations|(}
A system of four simultaneous linear equations in four unknowns $x_1, x_2, x_3, x_4$ can be written
\begin{equation}
\begin{array}{c}
a_{11}x_1+a_{12}x_2+a_{13}x_3+a_{14}x_4=b_1\\
a_{21}x_1+a_{22}x_2+a_{23}x_3+a_{24}x_4=b_2\\
a_{31}x_1+a_{32}x_2+a_{33}x_3+a_{34}x_4=b_3\\
a_{41}x_1+a_{42}x_2+a_{43}x_3+a_{44}x_4=b_4\\
\end{array}
\end{equation}
or, in matrix form,
\begin{equation}
\mathbf {Ax=b},
\end{equation}
where
\begin{equation}
\mathbf{A}=
\left[\begin{array}{cccc}
a_{11} & a_{12} & a_{13} & a_{14}\\
a_{21} & a_{22} & a_{23} & a_{24}\\
a_{31} & a_{32} & a_{33} & a_{34}\\
a_{41} & a_{42} & a_{43} & a_{44}\\
\end{array}
\right]
\end{equation}
is called the coefficient matrix,
\begin{equation}
\mathbf{x}= \left[\begin{array}{c}x_1\\x_2\\x_3\\x_4\\
\end{array}\right]
\end{equation}
is the unknown vector, and
\begin{equation}
\mathbf{b}=\left[\begin{array}{c}b_1\\b_2\\b_3\\b_4\\
\end{array}
\right]
\end{equation}
is the right hand side (i.e., the observations). Premultiplying both sides by $\mathbf{A}^{-1}$ yields
\begin{equation}
\mathbf{A}^{-1}\mathbf{Ax}=\mathbf{A}^{-1}\mathbf{b},
\end{equation}
hence
\begin{equation}
\mathbf{Ix=x=A}^{-1}\mathbf{b}
\end{equation}
gives the values of $x_1, x_2, x_3, x_4$ that solve the system. For simplicity, the following example
solves for two simultaneous equations only. Consider two equations in two unknowns (e.g., equations of lines
in the $x_1-x_2$ plane):
\begin{equation}
\begin{array}{c}
5x_1 + 7x_{2} = 19\\
3x_1 - 2 x_2 = -1
\end{array} .
\end{equation}
In matrix form this system translates to
\begin{equation}
\left[ \begin{array}{cc}
5 & 7\\
3 & -2
\end{array}
\right ]
\left[
\begin{array}{c}
x_1\\
x_2
\end{array}
\right ] =
\left[ \begin{array}{c}
19\\
-1
\end{array}
\right ]
\end{equation}
or
\begin{equation}
\mathbf{A \cdot x = b}.
\end{equation}
To solve this matrix equation we need the inverse of $\mathbf{A}$, which is simply
\begin{equation}
\mathbf{A}^{-1} = \frac{1}{-10 - 21} \quad
\left[ \begin{array}{cc}
-2 & -7\\
-3 & 5
\end{array}\right ] = \left[ \begin{array}{cc}
\frac{2}{31} & \frac{7}{31}\\[4pt]
\frac{3}{31} & \frac{-5}{31}
\end{array} \right ] .
\end{equation}
Then, $\mathbf{x = A}^{-1}\cdot \mathbf{b}$, where
\begin{equation}
\mathbf{x = A}^{-1}\mathbf{b} = \left[ \begin{array}{cc}
\frac{2}{31} & \frac{7}{31}\\[4pt]
\frac{3}{31} & \frac{-5}{31}
\end{array}
\right]
\left[ \begin{array}{c}
19\\
-1
\end{array} \right ] = \left [ \begin{array}{c}
\frac{38}{31} - \frac{7}{31} \\[4pt]
\frac{57}{31} + \frac{5}{31}
\end{array} \right ]
= \left[ \begin{array}{c}
1 \\
2 \end{array} \right ] .
\end{equation}
So, the values $x_1 = 1$ and $x_2 = 2$ solve the above system, or
\begin{equation}
\mathbf{x} = \left[ \begin{array}{c}
x_1\\ x_2 \end{array} \right]
=
\left[ \begin{array}{c}
1\\ 2 \end{array}
\right ] .
\end{equation}
While this approach may seem burdensome, it has the advantage of being extremely general and
allows for a straightforward solution of very large systems. However,
direct (elimination) methods are in fact quicker for fully populated
matrices:
\begin{enumerate}
\item A solution using the inverse matrix approach involves $n^3$ multiplications for the inversion and $n^2k$
more multiplications to finish the solution, where $n$ is the number of equations per set, and $k$
is the number of sets of equations (each of the same form but different $\mathbf{b}$ vector). The total
number of multiplications is $n^3 + n^2k$.
\item A solution by directly solving the linear equations involves $n^3/3 + n^2k$ multiplications.
\end{enumerate}
Hence, while the matrix form is easy to handle, one should not necessarily always use it blindly. We
will consider many situations for which matrix solutions are ideal. For sparse or symmetrical
matrices, the above relationships may not hold.
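As a concrete illustration, the following minimal NumPy sketch solves the $2 \times 2$ system above with a direct solver and, for comparison, via the explicit inverse; both recover $x_1 = 1$, $x_2 = 2$.
\begin{verbatim}
import numpy as np

A = np.array([[5.0,  7.0],
              [3.0, -2.0]])
b = np.array([19.0, -1.0])

x = np.linalg.solve(A, b)       # direct (elimination-based) solution
print(x)                        # [1. 2.]

print(np.linalg.inv(A) @ b)     # same answer via the inverse, at greater cost
\end{verbatim}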
\index{Solution of simultaneous linear equations|)}
\subsection{Simple regression and curve fitting}
\index{Simple regression|(}
\index{Regression!simple|(}
\index{Curve fitting|(}
\PSfig[h]{Fig1_L2_error}{Graphical representation of the regression errors used in least-squares procedures.
We measure misfit vertically in the $d$-direction from data point to regression curve.}
Whereas an interpolant fits each data point exactly, it is frequently advantageous to produce a
smoothed fit to the data --- not exactly fitting each point, but producing a ``best'' fit. A popular (and
convenient) method for producing such fits is known as the \emph{method of least squares}.
\index{Method of least squares}
\index{Least squares method}
The method of least squares produces a fit of a specified (usually continuous) basis to a set of
data points which minimizes the sum of the squared misfit (error) between the fitted curve
and the data. The misfit can be measured vertically, as in Figure~\ref{fig:Fig1_L2_error}.
\index{Regression}
This \emph{regression} of $d$ on $x$ is the most commonly used method. Less common methods (i.e., more work
involved) are the regression of $x$ on $d$ and even orthogonal regression (which we will return to later;
see Figure~\ref{fig:Fig1_y_and_ortho_error}).
\PSfig[h]{Fig1_y_and_ortho_error}{Two other regression methods: regressing $x$ on $d$ and orthogonal regression.
Here we measure misfits horizontally from data point to regression line or orthogonally onto the regression line,
respectively.}
Consider fitting a single ``best'' linear curve to $n$ data points. This can be a scatter plot of $x(t)$, $d(t)$
plotted at similar values of $t$, or a simple $d = f(x)$ relationship. At any rate, $d$ (our data) are considered a
function of $x$ (which may be a spatial coordinate or time). We wish to fit a line of the form
\begin{equation}
d(x) = m_1 + m_2 (x-x_0)
\end{equation}
and must therefore determine values for the model coefficients $m_1$ and $m_2$ that produce a line that minimizes the sum
of the squared misfits (here, $x_0$ is a constant specified beforehand). In other words,
\begin{equation}
\mbox{minimize } \sum ^n _{i=1} \left [d_{\mbox{computed}}(x_i) - d_{\mbox{observed}}(x_i) \right ]^2.
\end{equation}
Ideally, for each observation $d_i$ at location $x_i$ we should have
\begin{equation}
\begin{array}{c}
m_1 + m_2(x_1 - x_0) = d_1\\
m_1 + m_2(x_2 - x_0) = d_2\\
m_1 + m_2(x_3 - x_0) = d_3\\
\vdots\\
m_1 + m_2 (x_n - x_0) = d_n
\end{array}
\end{equation}
There are many more equations ($n$ --- one for each observed value of $d$) than unknowns (2 --- $m_1$ and
$m_2$). Such a system is \emph{overdetermined} and in general has no exact solution (unless all the $d_i$'s happen to
lie exactly on a single line, in which case any two equations will uniquely determine $m_1$ and $m_2$).
In matrix form,
\index{Overdetermined system of equations}
\begin{equation}
\left[
\begin{array}{cc}
1 & (x_1 - x_0) \\
1 & (x_2 - x_0) \\
\vdots & \vdots \\
1 & (x_n - x_0)
\end{array} \right]
\ \left [ \begin{array}{c}
m_1\\
m_2
\end{array} \right ] =
\left [ \begin{array}{c}
d_1\\
d_2\\
\vdots\\
d_n