% DA1_Chap8.tex
%
\chapter{SPECTRAL ANALYSIS}
\label{ch:spectralanallysis}
\epigraph{``Science is spectral analysis. Art is light synthesis.''}{\textit{Karl Kraus, Writer}}
In Chapter~\ref{ch:sequences} we were preoccupied with the topic of time-series analysis in the time-domain
and learned a few things about the autocorrelation and cross-correlation techniques.
In this chapter, we will take a different perspective and study the \emph{periodicities} present in a time
series.
\index{Frequency content}
At the heart of spectral analysis lies the notion of a signal's \emph{frequency content}. This concept is
utilized to decompose an observed signal into simpler components of known shape. Because many real
observations in fact contain periodic components that fluctuate in a predictable way (e.g., yearly, monthly,
daily), it is desirable to use periodic functions as the basic building blocks of the time-series.
The most obvious choices are the trigonometric functions \emph{sine} and \emph{cosine}.
\index{Fourier!series}
The use of sines and cosines to approximate data and functions goes back to the early 1700s
but was given mathematical rigor and extensive treatment by Joseph Fourier\index{Fourier, J.} early in the 19th century.
Fourier proved that any continuous, single-valued function could be represented by a series of
sinusoids --- today we know such series by the name \emph{Fourier series}. Thus, spectral analysis
involves finding the components of the Fourier series and interpreting the frequency content
represented by the series. Spectral analysis also goes by other names, such as frequency analysis and
harmonic analysis. Before we get into the details we must review some terminology and basic
trigonometry.
\section{Basic Terminology}
\PSfig[h]{Fig1_sincos}{The periodic functions cosine and sine are defined as the $x$ and $y$-components of
the counter-clockwise spinning unit vector $r$ as a function of the rotation angle $\phi$. a)
Spinning vector at a specific time $t_0$, yielding an angle $\phi_0$ and the corresponding $x$ and $y$
values indicated by the dashed lines. b) Over time, these components trace the sine and cosine functions.}
Consider a unit vector that rotates counterclockwise (Figure~\ref{fig:Fig1_sincos}). The time it takes to
complete one cycle is
called the \emph{period}, $T$. The $y$ and $x$ coordinates are then periodic functions of $t$ and are given by the
sine and the cosine, respectively. The \emph{radial frequency}, $f$, of the signal is the number of complete
revolutions per second. Hence,
\index{Period}
\index{Sinusoid!period}
\index{Sinusoid!frequency}
\index{Sinusoid!wavelength}
\index{Wavelength}
\index{Frequency!radial}
\begin{equation}
x(t) = \cos (2 \pi ft) = \cos (\omega t) \quad y(t) = \sin (2\pi ft) = \sin (\omega t)
\end{equation}
The period $T = 1/f$ has units of seconds per cycle. Instead of radial frequency we may use the \emph{angular
frequency}, $\omega = 2 \pi f$, which has units of radians/sec. For spatial data, the period $T$ corresponds to
the \emph{wavelength}, $\lambda$, and the angular frequency is referred to as the \emph{wavenumber}, $k = 2\pi/\lambda$.
The \emph{amplitude}, \index{Wavenumber}\index{Frequency!angular}\index{Sinusoid!amplitude}$A$,
of the signal is the length of the radial vector, $r$.
Instead of requiring that the sine curve go
through zero at even multiples of $\pi$, we can shift it horizontally by subtracting a constant $\phi$
from the argument.
\PSfig[H]{Fig1_phase}{A phase-shifted sine curve (dotted line) is shifted along the $t$-axis.}
The constant $\phi$ is called the \emph{phase} of the signal (Figure~\ref{fig:Fig1_phase}).
It is clear that the cosine and sine are out of phase
by 90$^\circ$. Let us assume that a particular time-series has one single periodic component with angular
frequency $\omega$. An example of such a series is shown in Figure~\ref{fig:Fig1_onecos},
where we would need to find both $A$ and $\phi$. Unfortunately, while this model is linear in $A$ it is \emph{nonlinear}
in $\phi$. However, using the trigonometric identity for the cosine of a difference between two angles we find
\index{Phase}
\index{Sinusoid!phase}
\PSfig[H]{Fig1_onecos}{Sinusoid with arbitrary phase can be considered a sum of a sine and a cosine, both with zero phase.}
\noindent
\begin{equation}
d(t) = A\cos (\omega t -\phi ) = A [ \cos \phi \cos \omega t + \sin \phi \sin \omega t ] = a \cos \omega t + b \sin \omega t,
\label{eq:sinusoid}
\end{equation}
which \emph{is} linear in both $a$ and $b$. Thus, instead of finding one amplitude and a phase we find the two
amplitudes $a$ and $b$ for a cosine and sine pair, respectively, each with \emph{no} phase shift. We may then easily recover the original
parameters $A$ and $\phi$ using
\begin{equation}
\index{Sinusoid!amplitude}
\index{Amplitude!sinusoid}
\index{Sinusoid!phase}
\index{Phase!sinusoid}
A = \sqrt{a^2 + b^2 }, \quad \phi = \tan^{-1} (b/a).
\end{equation}
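As a quick numerical illustration, consider the following minimal Python sketch (with arbitrary, made-up values for $a$, $b$, and $\omega$; the code is ours, not part of the derivation) that recovers $A$ and $\phi$ and verifies the identity:
\begin{verbatim}
import numpy as np

# Hypothetical coefficients for d(t) = a*cos(wt) + b*sin(wt)
a, b, w = 1.5, -0.8, 2.0 * np.pi * 0.1

A = np.hypot(a, b)        # A = sqrt(a^2 + b^2)
phi = np.arctan2(b, a)    # atan2 resolves the quadrant of tan^-1(b/a)

t = np.linspace(0.0, 20.0, 500)
d1 = a * np.cos(w * t) + b * np.sin(w * t)
d2 = A * np.cos(w * t - phi)
print(np.allclose(d1, d2))   # True: the two forms are identical
\end{verbatim}
Note that the two-argument arctangent is preferable in practice, since $\tan^{-1}(b/a)$ alone cannot distinguish the quadrant of $\phi$.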
More often than not, the observed signal will contain many different sinusoids of different
periods, phases, and amplitudes. We can use the subscript $j$ to indicate the $j$'th component of the
series. A complete Fourier series for a signal $d(t)$ could therefore be written
\begin{equation}
d(t) = \sum^\infty_{j=0} a_{j} \cos \omega_{j}t + b_{j} \sin \omega_{j}t,
\end{equation}
where the $\omega_j$ represent the various angular frequency components. The $a_j$ and $b_j$ coefficients could then be found
by standard least squares techniques. However, if we try to solve this system for many
components we quickly run into computational problems. To avoid this problem we must look
at \emph{harmonics}.
\subsection{Harmonics}
\index{Harmonics}
\index{Fundamental frequency}
\index{Frequency!fundamental}
\PSfig[h]{Fig1_fundamental}{The fundamental frequency (solid line) has period $T$ and typically represents the length of our data. Also shown
is the second harmonic (dashed line).}
The \emph{fundamental} frequency of a signal has period $T$ (scaled to $2\pi$; see Figure~\ref{fig:Fig1_fundamental})
and reproduces a full cycle corresponding to the length of the data signal. Consequently,
\index{First harmonic}
\index{Second harmonic}
\begin{equation}
f_{F} = 1/T, \quad \omega_{F} = 2 \pi f_{F} = 2 \pi/T.
\end{equation}
The first harmonic is the sinusoid which makes two complete oscillations over the period $T$. However,
because its frequency is $2f_F$, this sinusoid is usually called the \emph{second harmonic} (though
some people still refer to it as the first harmonic). Using that notation,
\begin{equation}
\begin{array}{ll}
f_1 = f_F = 1/T & \omega_{1} = 2\pi f_F = 2\pi/T\\
f_2 = 2f_F = 2/T & \omega_{2} = 4\pi f_F = 4\pi/T\\
\vdots & \vdots \\
f_n = nf_F = n/T & \omega_{n} = 2n\pi f_F= 2n\pi/T\\
\end{array}
\end{equation}
Superposition of harmonics will always produce a new periodic function with period $T$.
\subsection{Beats}
Nearby frequency components can interact in interesting ways. Consider
\begin{equation}
d(t) = \cos \omega_1t + \cos \omega_2 t,
\end{equation}
where
\begin{equation}
\omega_1 = \omega + \delta \omega \quad \omega_2 = \omega - \delta \omega,
\end{equation}
and $\delta \omega$ is small. Using trigonometric identities,
\begin{equation}
d(t) = 2 \cos (\delta \omega t)\cos(\omega t).
\end{equation}
\index{Amplitude!modulation (AM)}
\index{Beat (amplitude modulation)}
\index{Modulation!amplitude (AM)}
Thus, the component $\cos (\omega t)$ (with $T = 2 \pi/\omega$) has a slowly varying amplitude according to $\cos
( \delta \omega t)$, which is the \emph{modulation} function. This phenomenon is referred to as a \emph{beat} (Figure~\ref{fig:Fig1_AM}).
\PSfig[h]{Fig1_AM}{Graphical representation of a ``beat'' or amplitude modulation. Modulating the amplitude
of a constant frequency carrier wave is the basis for AM radio, while FM broadcasts use a constant amplitude
carrier wave and modulate the frequency instead.}
As $\delta \omega$ gets smaller, the period of the beat curve gets longer and the phenomenon becomes more
noticeable. In acoustics, the two frequencies $\omega _1$ and $\omega_ 2$ are often too high to hear but the beat is
within an audible range. \emph{Modulation} is the general phenomenon of a sinusoid with a varying
amplitude (of any functional form).
\index{Modulation!frequency (FM)}
\index{Frequency!modulation (FM)}
\index{AM radio signals}
\index{FM radio signals}
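The beat identity is easy to verify numerically. Here is a minimal Python sketch (with arbitrarily chosen carrier and offset frequencies of our own) confirming that the sum of the two nearby components equals the modulated carrier:
\begin{verbatim}
import numpy as np

w  = 2.0 * np.pi * 5.0    # carrier frequency (rad/s), arbitrary choice
dw = 2.0 * np.pi * 0.1    # small frequency offset (rad/s)
t = np.linspace(0.0, 10.0, 2000)

lhs = np.cos((w + dw) * t) + np.cos((w - dw) * t)  # two nearby components
rhs = 2.0 * np.cos(dw * t) * np.cos(w * t)         # slowly modulated carrier
print(np.allclose(lhs, rhs))                       # True
\end{verbatim}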
\section{Fitting the Fourier Series}
\label{sec:FFS}
Consider fitting a set of data, $d_i$, using a set of sine and cosines as the basis functions.
At each observation time, $t_i$, our model predictions would be
\begin{equation}
\hat{d_i} = m_0 + m_1 \sin 2 \pi t_i/T + m_2 \cos 2 \pi t_i/T + m_3 \sin 4 \pi t_i/T + m_4 \cos 4 \pi t_i/T + \dots,
\end{equation}
where $T = n \Delta t$ if $t_i$ are evenly spaced. We want to interpolate the values exactly, hence at our observations,
\begin{equation}
\begin{array}{c}
m_0 + m_1 \sin 2 \pi t_1/T + m_2 \cos 2 \pi t_1/T + m_3 \sin 4 \pi t_1/T + \dots = d_1\\
m_0 + m_1 \sin 2 \pi t_2/T + m_2 \cos 2 \pi t_2/T + m_3 \sin 4 \pi t_2/T + \dots = d_2\\
\vdots\\
m_0 + m_1 \sin 2 \pi t_n/T + m_2 \cos 2 \pi t_n/T + m_3 \sin 4 \pi t_n/T + \dots = d_n\\
\end{array},
\end{equation}
and this gives us $n$ equations in $n$ unknowns. Written as a matrix equation,
\begin{equation}
\left [ \begin{array}{ccccc}
1 & \sin 2 \pi t_1/T & \cos 2 \pi t_1/T & \sin 4 \pi t_1/T & \dots \\
\vdots & \vdots & \vdots & \vdots & \dots \\
1 & \sin 2 \pi t_n/T & \cos 2 \pi t_n/T & \sin 4 \pi t_n/T & \dots \\
\end{array} \right ]
\times
\left[ \begin{array}{c}
m_0 \\
\vdots \\
m_{n-1} \\
\end{array} \right]
=
\left[ \begin{array}{c}
d_1\\
\vdots\\
d_n \\
\end{array} \right] .
\end{equation}
This standard matrix equation, $\mathbf{G\cdot m = d}$, can now be solved for the $n$ coefficients of the sine and cosine
series (i.e., the Fourier series) in the usual (matrix) way (i.e., $\mathbf{m} = (\mathbf{G}^T\mathbf{G})^{-1}\mathbf{G}^T\mathbf{d}$).
However, because of orthogonality
relationships between harmonics, the coefficients can be found analytically (this was first shown
by Lagrange in the 1760s using just the sine components over a range of $x$ from 0 to $\pi$).
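Before exploiting orthogonality, it is worth confirming that the brute-force matrix solution does work. The following Python sketch (our own illustration, using synthetic random data and an odd $n$ so that the column count is simply $1 + 2(n-1)/2 = n$) builds $\mathbf{G}$ and solves the square system:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
n, dt = 9, 1.0              # odd n keeps the bookkeeping simple
t = np.arange(n) * dt       # t_i = (i-1)*dt
T = n * dt
d = rng.standard_normal(n)  # synthetic observations

# Columns: constant, then sin/cos pairs at frequencies 2*pi*j/T
cols = [np.ones(n)]
for j in range(1, (n - 1) // 2 + 1):
    cols.append(np.sin(2.0 * np.pi * j * t / T))
    cols.append(np.cos(2.0 * np.pi * j * t / T))
G = np.column_stack(cols)

m = np.linalg.solve(G, d)     # n equations in n unknowns
print(np.allclose(G @ m, d))  # True: exact interpolation of the data
\end{verbatim}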
First, we rewrite the Fourier series as
\begin{equation}
\hat{d_i} = a_0 + \sum^{\leq n/2}_{j=1} \left[ a_j \cos \frac{2 \pi jt_i}{n \Delta t}
+ b_j \sin \frac{2 \pi jt_i}{n \Delta t} \right],\quad i = 1,n,
\label{eq:thesummation}
\end{equation}
where $\leq n/2$ means the largest whole integer resulting from the division $n/2$.
Thus, the unknown vector $\mathbf{m}$ would now contain the renamed components
\begin{equation}
\mathbf{m}^T = \left[ a_0 \quad a_1 \quad \dots a_ {\leq n/2} \quad b_1 \quad b_2 \quad \dots \quad b_{\leq n/2} \right] .
\end{equation}
The number of data points $n$ may be any integer value, but we will see that it makes a minor difference whether or not $n$ is divisible by two.
If $n$ is an even number, then for $j = n/2$ we find the two last terms in (\ref{eq:thesummation}) to be
\begin{equation}
a_{n/2} \cos \frac{2 \pi n t_i}{2n \Delta t} + b_{n/2} \sin \frac{2 \pi n t_i}{2n \Delta t} = a_{n/2} \cos \frac { \pi t_i}{ \Delta t} + b_{n/2} \sin \frac{ \pi t_i}{ \Delta t}.
\end{equation}
Since $t_i = (i-1) \Delta t$, where $i = 1, \cdots , n$, we have
\begin{equation}
a_{n/2} \cos (i-1) \pi + b_{n/2} \sin (i-1) \pi = \left[-1 \right]^{(i-1)} a_{n/2}.
\end{equation}
This is true, since $\sin (i-1) \pi$ = 0 for all $i$, hence the $b_{n/2}$ term drops out and we find
\begin{equation}
\mathbf{m}^T = \left[ a_0 \quad a_1 \quad \dots \quad a_{n/2} \quad b_1 \quad b_2 \quad \dots \quad b_{(n/2)-1} \right],
\end{equation}
which gives us $1 + (n/2) + (n/2) - 1 = n$ coefficients for $n$ unknowns, and thus a solvable system results.
On the other hand, if $n$ is \emph{odd} then the largest index is $j = (n-1)/2 < n/2$, so both the sine and cosine terms remain and we have
\begin{equation}
\mathbf{m}^T = \left[ a_0 \quad a_1 \quad \dots \quad a_{(n-1)/2} \quad b_1 \quad b_2 \quad \dots \quad b_{(n-1)/2} \right],
\end{equation}
which again yields $1 + (n-1)/2 + (n-1)/2 = n$ coefficients for $n$ unknowns. Note that, for $j = 0$,
\begin{equation}
a_0 \cos 0 + b_0 \sin 0 = a_0,
\end{equation}
and thus there is never a $b_0$ term. Using the convention $a_0 = \frac{a_0}{2}$ and $a_{n/2} = \frac{a_{n/2}}{2}$,
the Fourier series may be written as
\begin{equation}
\hat{d_i} = \sum^{ \leq n/2}_{j=0} \left[a_j \cos \frac{2 \pi jt_i}{n \Delta t} + b_j \sin \frac {2 \pi j t_i}{n \Delta t} \right],\quad i = 1,n,\mbox{ with } a_0 = \frac{a_0}{2},\quad a_{n/2} = \frac{a_{n/2}}{2}
\end{equation}
(the $a_{0}/2$ and $a_{n/2}/2$ convention is only a convenience that will become obvious later). The frequency
\index{Fourier frequency}
\index{Frequency!Fourier}
\index{Fourier!frequency}
\begin{equation}
\omega_j = \frac{2 \pi j}{n \Delta t} = \frac{2 \pi j}{T}
\end{equation}
is called the $j$'th \emph{Fourier frequency}. Notice if $n$ is odd, then $j < n/2$, so $\omega_j < \pi/\Delta t$ and hence $\pi/\Delta t$ is \emph{not}
a Fourier frequency, otherwise it \emph{is} a
Fourier frequency and the sine term is zero. Also, the highest frequency $f_{n/2} = (n/2)/ (n \Delta t) = 1/(2 \Delta t)$ is called the \emph{Nyquist}
frequency, which we will return to later.
\index{Nyquist frequency}
\index{Frequency!Nyquist}
\index{Fourier!series orthogonality|(}
\index{Orthogonality!Fourier series|(}
Lagrange took advantage (in a brute force and laborious manner) of the following five
relationships of harmonic components (which are easily shown using the corresponding integral
relationships):
\begin{equation}
\sum^{n}_{i=1} \cos \omega_{j}t_{i} = \left\{ \begin{array}{cc} 0, & j \neq 0 \\ n, & j = 0\\
\end{array} \right.,
\label{eq:cosonly}
\end{equation}
\begin{equation}
\sum^{n}_{i=1}\sin \omega_{j}t_{i} = 0,
\end{equation}
\begin{equation}
\sum^{n}_{i=1} \cos \omega_{j}t_{i} \cos \omega_{k}t_{i} =
\left\{ \begin{array}{ll} n/2, & j=k\neq 0,n/2\\ n, & j=k=0,n/2\\ 0, & j \neq k\\ \end{array} \right.,
\label{eq:coscos}
\end{equation}
\begin{equation}
\sum^{n}_{i=1} \sin \omega_{j}t_{i} \cos \omega_{k}t_{i} = 0,
\label{eq:sincos}
\end{equation}
and
\begin{equation}
\sum^{n}_{i=1} \sin \omega_{j}t_{i} \sin \omega_{k}t_{i} =
\left\{ \begin{array}{ll} n/2, & j=k\\ 0, & j \neq k\\ \end{array} \right. .
\label{eq:sinsin}
\end{equation}
For a proof, consider the integral corresponding to (\ref{eq:cosonly}):
\begin{equation}
\int^{T}_0 \cos \omega_{j} t \, dt = \frac{1}{\omega_{j}} \int^{T\omega_{j}}_{0} \cos u \, du = \frac{1}{\omega_{j}} \left. \sin u \right|^{T \omega_{j}}_0 = \frac{T}{2 \pi j} \left( \sin 2 \pi j - \sin 0\right) = 0 \quad \left( j \neq 0 \right).
\end{equation}
For $j = 0$,
\begin{equation}
\int^{T}_{0} \cos (0 \cdot t) \, dt = \int^{T}_{0} dt = T.
\end{equation}
It is also easy to visualize these relationships. For instance, see Figure~\ref{fig:Fig1_ortho} for a graphical proof of (\ref{eq:coscos}).
\PSfig[h]{Fig1_ortho}{Orthogonality of two cosine harmonics drawn as solid ($\omega_1$) and dashed ($\omega_3$) lines,
with green circles and squares as the hypothetical discrete samples. The products in (\ref{eq:coscos}) are represented by the
product of the red (negative) and blue (positive) signed lengths. For this pair, you can see that for each red--blue
line there is another of opposite orientation, thus canceling each other and yielding a final sum of zero.}
\index{Discrete Fourier transform|(}
\index{Fourier!discrete transform|(}
Returning to the Fourier series, we have
\begin{equation}
\hat{d_i} = \sum^{ \leq n/2}_{j=0}\left[ a_{j} \cos \omega_{j}t_{i} + b_{j} \sin \omega_{j}t_{i}\right],\quad i = 1,n
\label{eq:fourierseries}
\end{equation}
and once again we want to minimize the misfit between data and model:
\begin{equation}
E=\sum^{n}_{i=1}e^2_i = \sum^{n}_{i=1}\left[ d_i - \hat{d_i}\right]^2 = \sum^{n}_{i=1}\left[ d_{i}-\sum^{\leq \frac{n}{2}}_{j=0}\left( a_{j} \cos \omega_{j}t_{i} + b_{j} \sin \omega_{j}t_{i}\right)\right]^2.
\end{equation}
Our unknowns are the $n$ coefficients $a_j$ and $b_j$. Taking the partial derivatives of $E$ with respect to those parameters gives
\begin{equation}
\frac{\partial E}{\partial a_{k}}=2 \sum^{n}_{i=1}\left[d_{i}-\sum^{ \leq n/2}_{j=0} \left(a_{j} \cos \omega_{j} t_{i} + b_{j} \sin \omega_{j} t_{i}\right) \right] \cos \omega_{k} t_{i} = 0, \quad k =0, \dots, \leq \frac{n}{2}
\end{equation}
\begin{equation}
\frac{\partial E}{\partial b_{k}} = 2 \sum^{n}_{i=1}\left[ d_{i}-\sum^{< n/2}_{j=1} \left( a_{j} \cos \omega_{j} t_{i} + b_{j} \sin \omega_{j} t_{i}\right) \right] \sin \omega_{k} t_{i} =0, \quad k = 1, \dots, < \frac{n}{2}.
\end{equation}
Eliminating the factor of 2 and rearranging, we obtain
\begin{equation}
\sum^{n}_{i=1}d_{i} \cos \omega_{k} t_{i} = \sum^{\leq n/2}_{j=0}\left[ a_{j}\sum^{n}_{i=1} \cos \omega_{j} t_{i} \cos \omega_{k} t_i + b_{j} \sum^{n}_{i=1} \sin \omega_{j} t_{i}\cos \omega_{k} t_i \right ],
\label{eq.thecosterm}
\end{equation}
\begin{equation}
\sum^{n}_{i=1}d_{i} \sin \omega_{k}t_{i} = \sum^{< n/2}_{j=1}\left[ a_{j}\sum^{n}_{i=1} \cos \omega_{j}t_{i} \sin \omega_{k}t_{i} + b_{j}\sum^{n}_{i=1} \sin \omega_{j}t_{i} \sin \omega_{k}t_{i} \right] .
\label{eq:dft_n2}
\end{equation}
The five orthogonality relationships can now be employed. For $k = 0$, (\ref{eq.thecosterm}) becomes
\begin{equation}
\sum^{n}_{i=1}d_{i}\cos \omega_{0}t_{i} = \sum^{\leq n/2}_{j=0}\left[a_{j} \sum^{n}_{i=1} \cos \omega_{j}t_{i} \cos \omega_{0} t_{i} +b_{j} \sum^{n}_{i=1} \sin \omega_{j}t_{i} \cos \omega_{0}t_{i} \right],
\end{equation}
and since $\omega_{0} = 0$, $\cos \omega _0 t_i$ is unity. Furthermore, we already determined that $b_0 = 0$, thus
\begin{equation}
\begin{array}{c}
\sum^{n}_{i=1}d_{i} = a_{0} \sum^{n}_{i=1} \cos \omega_{0}t_{i} + a_{1}\sum^{n}_{i=1} \cos \omega_{1} t_{i} + a_2 \sum^{n}_{i=1} \cos \omega _2 t_i \\[4pt]
+ \cdots + b_1 \sum^n_{i=1} \sin \omega _1 t_i + b_2 \sum ^n_{i=1} \sin \omega_2 t_i + \cdots .
\end{array}
\end{equation}
Using the relationships (\ref{eq:coscos}) and (\ref{eq:sincos}), we find
\begin{equation}
\sum^n_{i=1} d_i = a_0n + a_1 0+ a_2 0 + \cdots b_1 0 + b_2 0 + \cdots,
\end{equation}
and with our convention $a_0 = a_0/2$ we obtain
\begin{equation}
\sum^n_{i=1} d_i = \frac{n}{2}a_0 \quad \Rightarrow \quad a_0 = \frac{2}{n}
\sum^n_{i=1} d_i = 2 \bar{d},
\end{equation}
i.e., the first coefficient $a_0$ is simply twice the mean of the data (a consequence of our convention for $a_0$
that includes the factor of 1/2).
Using the same approach for $k = 1$, we first obtain
\begin{equation}
\begin{array}{c}
\displaystyle \sum^n_{i=1} d_i \cos \omega_1 t_i = a_0 \sum^n_{i=1} \cos \omega _0 t_i \cos \omega _1 t_i + a_1 \sum^n_{i=1} \cos \omega_1 t_i \cos \omega _1 t_i + a_2 \sum^n_{i=1} \cos \omega_2 t_i \cos
\omega _1 t_i + \cdots \\*[2ex]
+ b_1 \displaystyle \sum^n_{i=1} \sin \omega_1 t_i \cos \omega_1 t_i + b_2 \displaystyle \sum^n_{i=1} \sin \omega_2 t_i \cos \omega_1 t_i + \cdots.
\end{array}
\end{equation}
Using the orthogonality relationships, we find
\begin{equation}
\displaystyle \sum^n_{i=1} d_i \cos \omega_1 t_i = a_0 0 + a_1 \frac{n}{2} + a_2 0 + \cdots + b_1 0 + b_2 0 + \cdots .
\end{equation}
Therefore,
\begin{equation}
a_1 = \frac{2}{n} \displaystyle \sum^n_{i=1} d_i \cos \omega_1 t_i.
\label{eq:a1coeff}
\end{equation}
Since the orthogonality relationships are the same for all $k = 1, 2, \cdots < n/2$ as they were for $k = 1$ in
(\ref{eq:a1coeff}), we find
\begin{equation}
a_k = \frac{2}{n} \displaystyle \sum^n_{i=1} d_i \cos \omega_{k} t_i, \quad k=0, 1, \cdots, < \frac{n}{2}.
\end{equation}
\PSfig[h]{Fig1_FourierFit}{Example of a data set (thin line with connected dots) and two fitted Fourier components. Here,
we show the least-squares solution for the two harmonics $\omega_3$ (solid line) and $\omega_9$ (dashed line). When
these and all other harmonics are evaluated they will sum to equal the original data set.}
Finally, we need to look at the last case, $k = n/2$ (for even $n$), for which
\begin{equation}
\begin{array}{c}
\displaystyle \sum^n_{i=1} d_i \cos \omega_{n/2} t_i = a_0 \sum^n_{i=1} \cos \omega_{0} t_i
\cos \omega_{n/2} t_i + a_1 \sum^n_{i=1}
\cos \omega_{1} t_i \cos \omega_{n/2} t_i + a_2
\sum^n_{i=1} \cos \omega_{2} t_i
\cos \omega_{n/2} t_i + \cdots \\*[2ex]
+ b_1 \displaystyle \sum^n_{i=1} \sin \omega_1 t_i \cos \omega _{n/2}t_i + b_2
\displaystyle \sum^n_{i=1} \sin \omega_2 t_i \cos \omega_{n/2} t_i + \cdots.
\end{array}
\end{equation}
Again, using the orthogonality relationships,
\begin{equation}
\displaystyle \sum^n_{i=1} d_i \cos \omega _{n/2} t_i = a_0 0 + a_1 0 + \cdots + na_{n/2} + \cdots + b_1 0 + b_2 0 + \cdots .
\end{equation}
Since we required $a_{n/2} = a_{n/2}/2$, we find
\begin{equation}
a_{n/2} = \frac{2}{n} \displaystyle \sum^n_{i=1} d_i \cos \omega_{n/2} t_i.
\end{equation}
Therefore, with the convention that $a_0 = a_0/2$ and $a_{n/2} = a_{n/2}/2$, and interchanging the dummy subscripts
$k$ and $j$, the $a_j$ can be defined as
\begin{equation}
\index{Discrete cosine transform}
\label{eq:DCT}
\boxed{a_j = \frac{2}{n} \displaystyle \sum^n_{i=1} d_i \cos \omega_j t_i, \quad 0 \leq j \leq \frac{n}{2}.}
\end{equation}
We now turn our attention to the other half of the normal equations (\ref{eq:dft_n2}) involving the $\sin \omega _j t_i$
terms. Previously, we have shown that $b_0 = b_{n/2} = 0$ so we will only need to look at one case $(k =
1)$ and generalize the result for all $k$. We find
\begin{equation}
\begin{array}{c}
\displaystyle \sum^n_{i=1} d_i \sin \omega_{1} t_i = a_0 \sum^n_{i=1} \cos \omega_{0} t_i
\sin \omega_{1} t_i + a_1 \sum^n_{i=1}
\cos \omega_{1} t_i \sin \omega_{1} t_i + a_2 \sum^n_{i=1} \cos \omega_{2} t_i
\sin \omega_{1} t_i + \cdots \\*[2ex]
+ b_1 \displaystyle \sum^n_{i=1} \sin \omega_1 t_i \sin \omega _{1} t_i + b_2
\displaystyle \sum^n_{i=1} \sin \omega_2 t_i \sin \omega_{1} t_i + \cdots.
\end{array}
\end{equation}
Thus,
\begin{equation}
\displaystyle \sum^n_{i=1} d_i \sin \omega_1 t_i = a_0 0 + a_1 0 + \cdots + b_1 \frac{n}{2} + b_2 0 + \cdots,
\end{equation}
and
\begin{equation}
b_1 = \frac{2}{n} \sum^n_{i=1} d_i \sin \omega_1 t_i.
\end{equation}
Since the orthogonality relationships hold for all $k$, we again interchange $k$ and $j$ and find
\begin{equation}
\index{Discrete sine transform}
\label{eq:DST}
\boxed{b_j = \frac{2}{n} \sum^n_{i=1} d_i \sin \omega_j t_i, \quad 0 < j < \frac{n}{2}.}
\end{equation}
The formulae for $a_j$ and $b_j$ are called the \emph{Discrete Cosine and Sine Transforms} and combined they
define the \emph{Discrete Fourier Transform}. Figure~\ref{fig:Fig1_FourierFit} shows an
example of a data set and the determination of two Fourier components.
\index{Discrete sine transform}
\index{Discrete cosine transform}
\index{Discrete Fourier transform|)}
\index{Fourier!discrete transform|)}
\begin{example}
Consider the time-series $d = [ 1 \ \ 0 \ \ {-2} \ \ {-1} \ \ 1 \ \ 2 \ \ 1 \ \ 0.5 ]^T$ with
$\Delta t = 1, n = 8, T = 8$. The Fourier frequencies are therefore
\begin{equation}
\begin{array}{lll}
\omega_j = 2 \pi j /T, & \omega_0 = 0, & \omega_1 = \pi/4, \\
\omega_2 = \pi/2, & \omega_3 = 3 \pi/4, & \omega_4 = \pi.
\end{array}
\end{equation}
Solving for the coefficients, we find
\begin{equation} \begin{array}{lllll}
a_0 = 0.3125, & a_1 = -0.0884, & a_2 = 0.75, & a_3 = 0.0884, & a_4 = -0.0625, \\
b_1 = -1.3687, & b_2 = 0.625, & b_3 = 0.1313.
\end{array}
\end{equation}
Thus, our Fourier series is
\begin{equation}
\begin{array}{lll}
d(t)& = & \displaystyle 0.3125 - 0.0884 \cos \frac{\pi}{4} t -1.3687 \sin \frac{\pi}{4} t + 0.75 \cos \frac{\pi}{2} t\\[2pt]
& & + \displaystyle 0.625 \sin \frac{\pi}{2}t + 0.0884 \cos \frac{3 \pi}{4}t + 0.1313 \sin \frac{3 \pi}{4} t - 0.0625 \cos \pi t \end{array}.
\end{equation}
\end{example}
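The boxed sums (\ref{eq:DCT}) and (\ref{eq:DST}) are straightforward to evaluate directly. As a minimal Python sketch (our own, not part of the text), the following reproduces the coefficients of the example, including the halving convention for $a_0$ and $a_{n/2}$:
\begin{verbatim}
import numpy as np

d = np.array([1.0, 0.0, -2.0, -1.0, 1.0, 2.0, 1.0, 0.5])
n, dt = len(d), 1.0
t = np.arange(n) * dt          # t_i = (i-1)*dt
T = n * dt

a = np.zeros(n // 2 + 1)
b = np.zeros(n // 2 + 1)
for j in range(n // 2 + 1):
    w = 2.0 * np.pi * j / T    # j'th Fourier frequency
    a[j] = (2.0 / n) * np.sum(d * np.cos(w * t))
    b[j] = (2.0 / n) * np.sum(d * np.sin(w * t))
a[0] /= 2.0                    # the a_0 = a_0/2 convention
a[n // 2] /= 2.0               # the a_{n/2} = a_{n/2}/2 convention (even n)

print(np.round(a, 4))  # approx. [ 0.3125 -0.0884  0.75  0.0884 -0.0625]
print(np.round(b, 4))  # approx. [ 0. -1.3687  0.625  0.1313  0.]
\end{verbatim}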
Note that all the terms have frequencies that are integer multiples of the fundamental frequency; hence the
Fourier representation must itself be periodic with the same fundamental period $T$. We will later
see that the assumption of periodic data may have grave consequences for our coefficients.
\subsection{The power of orthogonality}
Let us pause and lament the demise of our dear friend from Chapter~\ref{ch:matrix}, the design matrix $\mathbf{G}$.
What just happened to it in our analysis?
Well, recall equations (\ref{eq:gdotg}) and (\ref{eq:gdotd}), both overflowing with dot-products of the basis vectors $\mathbf{g}_j$.
Yet, with our choice of harmonics we found that these basis vectors were in fact orthogonal and thus their dot-products yielded
zero except along the matrix diagonal (where we got the simple constant $n/2$). With $\mathbf{G}^T\mathbf{G}$ being diagonal, the formidable $[\mathbf{G}^T\mathbf{G}]^{-1}$
collapsed to the identity matrix $\mathbf{I}$ scaled by $2/n$, as we just witnessed. Now \emph{that} is the power of
orthogonal functions (of which sine and cosine are just two possibilities) and why they are so widely used in data analysis as well as for modeling in the physical sciences.
\index{Fourier!series orthogonality|)}
\index{Orthogonality!Fourier series|)}
\section{The Periodogram}
\label{sec:periodogram}
\index{Periodogram|(}
\index{Spectrum|(}
We determined that the Fourier series expansion of our observed time-series $d_i$ could be
written
\begin{equation}
\hat{d_i} = \sum^{\leq n/2}_{j=0} [ a_j \cos \omega_j t_i + b_j \sin \omega_j t_i ].
\end{equation}
Remember that (\ref{eq:sinusoid}) started out by trying to fit a cosine of arbitrary amplitude $A_j$ and phase $\phi_j$,
but that we could rewrite this single term as a sum of a cosine and sine components with different amplitudes
and zero phases. We found
\begin{equation}
a_j = A_j \cos \phi_j, \quad b_j = A_j \sin \phi_j.
\end{equation}
From these expressions we readily find a component's full amplitude and phase. Dividing the
$b_j$ by $a_j$ gives
\begin{equation}
\tan \phi_j = b_j / a_j \quad \Rightarrow \quad \phi _j = \tan ^{-1} (b_j / a_j).
\end{equation}
Squaring $a_j$ and $b_j$ and adding them gives
\index{Power spectrum}
\begin{equation}
A^2_j = a^2_j + b^2_j.
\end{equation}
The \emph{periodogram} is constructed by plotting $A_j^2$ versus $j$, $f_j$, $\omega_j$, or $P_j$. While often called the
\emph{power spectrum}, it is strictly speaking a raw, discrete periodogram. The true spectrum is a
smoothed periodogram showing frequency components of statistical regularity. However, the
periodogram is the most common form of output of a Fourier transform. Figure~\ref{fig:Fig1_periodogram}
shows the periodogram for the function
\begin{equation}
d(t) = \frac{1}{2} \cos \omega_1 t + \frac{3}{4} \cos \omega_2 t + \frac{1}{2} \sin \omega_3 t
+ \frac{1}{4} \cos \omega_3 t + \frac{1}{3} \cos \omega_4 t + \frac{1}{5} \sin \omega_4 t + \frac{1}{3} \sin \omega_6 t - \frac{3}{5}.
\label{eq:periodogram}
\end{equation}
\PSfig[h]{Fig1_periodogram}{Raw periodogram of the function given in (\ref{eq:periodogram}). The peak corresponds to the
$A_0^2 = a_0^2$ term defined to be twice the mean (-0.6) squared.}
Let us look, for a moment, at the variance of the time series expansion. Recall, the variance is
given by
\begin{equation}
s^2 = \frac{1}{n-1} \sum^n _{i=1} (\hat{d_i} - \bar{d}) ^2.
\end{equation}
We shall write the Fourier series as
\begin{equation}
\hat{d_i} = \bar{d} + \sum_{j=1} ^{\leq \frac{n}{2}} \left (a_j \cos \omega _j t_i + b_j \sin \omega_j t_i \right ),
\end{equation}
by pulling the constant (mean) term out separately. Since the two means cancel, we find
\begin{equation}
s^2 = \frac{1}{n-1} \sum^n_{i=1} \left \{ \left [ \sum_{j=1} ^{\leq \frac{n}{2}} \left (a_j \cos \omega_{j} t_i + b_j \sin \omega_{j} t_i \right ) \right ] \left [ \sum_{q=1} ^{\leq \frac{n}{2}} \left (a_q \cos \omega_{q} t_i + b_q \sin \omega_{q} t_i \right ) \right ] \right \}.
\end{equation}
Also recall that, because of orthogonality, all the cross terms $(q \neq j)$ resulting from the full expansion
of the two squared expressions will be zero when summed over $i$, while the remaining terms will sum to $n/2$ (since $j,q > 0$).
Hence, we are left with
\begin{equation}
s^2 = \frac{n}{2(n-1)} \sum_{j=1} ^{\leq \frac{n}{2}} (a^2_j + b^2_j) \sim \frac{1}{2}\sum_{j=1} ^{\leq \frac{n}{2}} A^2_j.
\end{equation}
Therefore, the power spectrum (periodogram) of $(a_j^2 + b_j^2)$ versus $\omega_j$ is a plot showing the
contribution of individual frequency components to the total variance of the signal. For this reason,
the power spectrum is often called the variance spectrum. However, most of the time it is simply called ``the
spectrum.'' Hence, the Fourier transform converts a signal from the time domain to the frequency
domain (or wavenumber domain), where the signal can be viewed in terms of the contribution of
the different frequency components of which it is made. The phase spectrum ($\phi_j$ versus $\omega_j$)
shows the relative phase of each frequency component. In general, phase spectra are more difficult
to interpret than amplitude (or power) spectra.
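This variance decomposition can be checked numerically. The sketch below (Python, with synthetic random data of our own choosing, and an odd $n$ so that no Nyquist-term special case arises) compares $\frac{n}{2(n-1)}\sum A_j^2$ with the sample variance:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)
n = 63                          # odd n: every j < n/2, no Nyquist term
t = np.arange(n, dtype=float)   # dt = 1
d = rng.standard_normal(n)

k = (n - 1) // 2
a = np.zeros(k + 1)
b = np.zeros(k + 1)
for j in range(k + 1):
    w = 2.0 * np.pi * j / n
    a[j] = (2.0 / n) * np.sum(d * np.cos(w * t))
    b[j] = (2.0 / n) * np.sum(d * np.sin(w * t))

A2 = a**2 + b**2                           # raw periodogram ordinates
s2 = n / (2.0 * (n - 1)) * np.sum(A2[1:])  # spectral estimate of variance
print(s2, np.var(d, ddof=1))               # agree to rounding error
\end{verbatim}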
\subsection{Aliasing of higher frequencies}
\index{Aliasing}
\index{Nyquist frequency}
\index{Frequency!Nyquist}
We mentioned before that the highest frequency (or shortest period, or wavelength) that can be
estimated from the data is called the Nyquist frequency (or period, or wavelength), given by
\begin{equation}
f_N = f_{n/2} = \frac{1}{2\Delta t}, \quad \omega_N = 2\pi f_N = \frac{\pi}{\Delta t}, \quad P_{n/2} = 2 \Delta t.
\end{equation}
Higher frequencies, whose wavelengths are less than twice the spacing between sample points,
\emph{cannot be detected}. However, when we sample a signal every $\Delta t$ and the original signal has
higher frequencies than $f_{n/2}$, we introduce \emph{aliasing}. Aliasing means that some frequencies will
leak power into other frequencies. This concept is readily seen by sampling a high-frequency
signal at a spacing larger than the Nyquist interval.
\PSfig[h]{Fig1_aliasing}{Aliasing: A short-wavelength signal that is not sampled at the Nyquist frequency
or higher will instead appear as a longer-wavelength component that does not exist in the actual data.}
Sampling of the high-frequency signal actually results in a longer-period signal (Figure~\ref{fig:Fig1_aliasing}).
When Clint Eastwood's wagon wheels seem to spin backwards in an old Western movie --- that's aliasing: The 24
pictures/sec rate is simply too slow to capture the faster rotation of the wheels.
\subsection{Significance of a spectral peak}
\index{Test!spectral peak}
In some applications we may be interested in testing whether a particular
component is dominant or if its larger amplitude is due to chance. The statistician R. A. Fisher\index{Fisher, R. A.} devised a test that
calculates the probability that a spectral peak $s_j^2$ will exceed the value $\sigma_j^2$ of a hypothetical time series
composed of independent random points. We must evaluate the ratio of the variance contributed by the
maximum peak to the entire data variance:
\begin{equation}
g = \frac{s^2 _j}{2s^2},
\label{eq:computed_g}
\end{equation}
where $s^2_j$ is the largest peak in the periodogram (we divide by two to get its variance contribution)
and $s^2$ is the variance of the entire series. For
a prescribed confidence level, $\alpha$, the critical value that we wish to compare to our observed $g$ is
\begin{equation}
g_{\alpha,k} \approx 1 - \exp \left( \frac{\ln \alpha - \ln k}{k-1} \right ),
\label{eq:critical_g}
\end{equation}
with $k = n/2$ (for even $n$) or $k = (n-1)/2$ (for odd $n$). Should our observed $g$ (obtained via \ref{eq:computed_g}) exceed this
critical value we decide that the dominant component is real and reflects a true
characteristic of the phenomenon we are observing. Otherwise, $s^2_j$ may be large simply by
chance.
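A minimal sketch of this test in Python, following (\ref{eq:computed_g}) and (\ref{eq:critical_g}) directly (it assumes the periodogram ordinates $A_j^2$, $j = 1, \dots, k$, have already been computed as above):
\begin{verbatim}
import numpy as np

def fisher_g_test(A2, alpha=0.05):
    """Test the largest periodogram peak against chance.

    A2 : periodogram ordinates A_j^2 for j = 1, ..., k
    Returns (g, g_critical, peak_is_significant).
    """
    k = len(A2)
    s2 = 0.5 * np.sum(A2)            # total variance from the spectrum
    g = np.max(A2) / 2.0 / s2        # variance fraction of largest peak
    g_crit = 1.0 - np.exp((np.log(alpha) - np.log(k)) / (k - 1.0))
    return g, g_crit, g > g_crit
\end{verbatim}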
\subsection{Estimating the continuous spectrum}
The power spectrum or periodogram obtained from the Fourier coefficients is discrete, yet
we do not expect the power at frequency $\omega_j$ to equal the underlying continuous $P(\omega)$ at exactly
$\omega_j$, since the discrete spectrum must necessarily represent some average value of power at all frequencies between $\omega_{j-1}$ and
$\omega_{j+1}$. In other words, the computed power at $\omega_j$ also represents the power from nearby frequencies
not among the chosen harmonic frequencies $\omega_j$. Furthermore, the uncertainty in any individual
estimate $p^2_j$ is very large; in fact, its standard error is as large as $p^2_j$ itself.
Can we improve (i.e., reduce) the uncertainties in $p^2_j$ by using more data points or sample the
data more frequently? The unpleasant answer is that the periodogram estimates do not become
more accurate at all! The reason for this is that adding more points simply produces power
estimates at a greater number of frequencies $\omega_j$. The only way to reduce the uncertainty in the
power estimates is to smooth the periodogram over nearby discrete frequencies. This can be
achieved in one of two ways:
\begin{enumerate}
\item Use a time-series that is $M$ times longer (so $f_1' = f_1/M$) and \emph{sum} the $M$ power estimates $p^2_k$
straddling each original $\omega_j$ frequency to obtain a smooth estimate $p^2_j = \sum p^2_k$.
\item Split the original data into $M$ smaller series, find the $p^2_j$ for each series, and take the \emph{mean} of
the $M$ estimates for the same $j$ (i.e., the same frequency).
\end{enumerate}
\index{Windowing}
In both cases the variance of the power spectrum estimates drops by a factor of $M$, i.e., $s^2_j = p^2_j/M$.
The exact way the smoothing is achieved may vary among analysts. Several different
types of weights or spectral \emph{windows} have been proposed, but they are all relatively similar. These windows
arose because, historically, the power spectrum was estimated by taking the Fourier transform of
the \emph{autocorrelation} of the data; hence many windows operated in the lag-domain. The
introduction of the Fast Fourier Transform (FFT) made direct transformation the fastest way to obtain the spectrum,
which is then simply smoothed over nearby frequencies. The FFT is a very rapid algorithm for
computing the discrete Fourier transform, especially if $n$ is a power of 2. It can be shown that one can
always split the discrete transform into the sum of two discrete, scaled transforms of subsets of the data.
Applying this result recursively, we eventually end up with a sum of transforms of data sets
with one entry, whose transform equals itself. While mathematically equivalent, there is a huge
difference computationally: While the discrete Fourier transform's execution time is proportional to $n^2$,
the FFT only takes $n\cdot \log(n)$. For a data set of $10^6$ points, the speed-up is a factor of $> 75,000$.
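A minimal Python sketch of the second smoothing strategy above (splitting the record into $M$ segments and averaging their raw periodograms, here computed with NumPy's FFT; the scaling and data are our own illustrative choices) shows the variance reduction:
\begin{verbatim}
import numpy as np

def averaged_periodogram(d, M):
    """Average the raw periodograms of M equal-length segments of d."""
    seg = len(d) // M
    pieces = d[: M * seg].reshape(M, seg)
    P = np.abs(np.fft.rfft(pieces, axis=1)) ** 2 / seg
    return P.mean(axis=0)    # averaging cuts the estimate variance by ~M

rng = np.random.default_rng(2)
d = rng.standard_normal(4096)            # white noise: flat true spectrum
for M in (1, 8):
    p = averaged_periodogram(d, M)[1:]   # skip the j = 0 (mean) term
    print(M, np.var(p) / np.mean(p)**2)  # relative variance drops by ~M
\end{verbatim}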
By doing a Fourier Analysis, we have transformed our data from one domain (time or space)
to another (frequency or wavenumber). A physical analogy is the transformation of light sent
through a triangular prism. White light is composed of many frequencies, and the prism acts as a
frequency analyzer that separates the various frequency components, here represented by colors. Each color band
is separated from its neighbor by an amount proportional to their difference in wavelength, and
the intensity of each band reflects the amplitude of that component in the white light. We know
that by examining the spectrum we can learn much about the composition and temperature of the
source and the material the light passed through. Similarly, examining the power spectra of
other processes may tell us something about them that may not be apparent in the time domain.
Consequently, spectral analysis remains one of the most powerful techniques we have for examining
temporal or spatial sequences.
\subsection{First-Order Spectrum Interpretation}
\label{sec:firstorderspectrum}
\PSfig[h]{Fig1_spectratypes}{Simplified representations of typical spectra that are called ``white'' (left; equal power at all frequencies),
``red'' (middle; power falling off with increasing frequency), and ``blue'' (right; power increasing with frequency).}
Per Section~\ref{sec:periodogram}, the raw power spectrum, or \emph{periodogram}, is obtained by plotting the squared amplitude $A_j^2$ versus
frequency. Often, a spectrum will fall into one of three categories (see Figure~\ref{fig:Fig1_spectratypes}):
\begin{description}
\item [white:] This is a spectrum that shows little or no amplitude variation with frequency. Random values
such as independent samples drawn from a normal distribution will have a white spectrum.\index{White spectrum}\index{Spectrum!white}
\item [red:] This spectrum is dominated by long-wavelength (low-frequency) signals, with the spectrum tapering
off for higher frequencies. This is very common behavior in observed data, such as topography and potential fields (gravity, magnetics).
It may also be indicative of data that represent an integrated phenomenon.\index{Red spectrum}\index{Spectrum!red}
\item [blue:] This spectrum is dominated by short-wavelength (high-frequency) signal, with the spectrum tapering
off for lower frequencies. Data that depend on derivatives, such as slopes and curvatures, might behave this way, being higher-order
derivatives of a red-spectrum topography signal.\index{Blue spectrum}\index{Spectrum!blue}
\end{description}
One reason for the prevalence of red or blue spectra in natural phenomena becomes clear if we consider what effect a temporal derivative (e.g., $d/dt$) has in the frequency domain. Given that the Fourier series representation of data can be written
\begin{equation}
\hat{d}(t) = \sum_{j = 0}^{\leq n/2} A_j \cos \left (\omega_j t - \phi_j \right ),
\end{equation}
taking the derivative yields
\begin{equation}
\frac{d}{dt}\hat{d}(t) = \sum_{j = 0}^{\leq n/2} -\omega_j \cdot A_j \sin \left (\omega_j t - \phi_j \right ).
\end{equation}
In effect, we multiply each Fourier amplitude by its corresponding frequency, hence amplitudes at higher frequencies are preferentially enhanced while those at lower frequencies are attenuated. This scaling
will make the spectrum more ``blue''. By analogy, integration in the temporal domain has the effect of \emph{dividing} the spectrum by the frequency, thereby
``reddening'' the spectrum. This frequency effect is what we allude to when we say that taking a derivative typically makes data noisier (it amplifies
the short-wavelength or high-frequency components in the data) while integration tends to make data smoother by attenuating the same
components. Of course, these statements assume that the uncertainties in the data are mostly at high frequencies, but some data have more
uncertainty at low frequencies, in which case the situation is reversed.
These simple considerations may be useful when interpreting your observed spectra. Finally, note that the derivative
also introduces a phase change of $\pi/2$ (90\DS) since the cosine and the negative sine are shifted by 90\DS. A second multiplication (i.e., for a second-derivative result)
leads to a 180\DS\ change in phase since we are essentially multiplying by $-1$ (this does not affect power, which is proportional to amplitude squared).
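This frequency scaling is easy to demonstrate. In the Python sketch below (our own construction, using a few harmonics at exact Fourier frequencies and the analytic derivative), the transform amplitudes of $d'(t)$ are exactly $\omega_j$ times those of $d(t)$:
\begin{verbatim}
import numpy as np

n, dt = 256, 0.5
t = np.arange(n) * dt
T = n * dt
harmonics = [3, 10, 40]                       # arbitrary Fourier indices j

d  = sum(np.cos(2 * np.pi * j * t / T) for j in harmonics)
dd = sum(-(2 * np.pi * j / T) *
         np.sin(2 * np.pi * j * t / T) for j in harmonics)  # analytic d/dt

D, DD = np.abs(np.fft.rfft(d)), np.abs(np.fft.rfft(dd))
for j in harmonics:
    print(DD[j] / D[j], 2 * np.pi * j / T)    # ratio equals omega_j
\end{verbatim}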
\index{Periodogram|)}
\index{Spectrum|)}
\section{Convolution}
\index{Convolution|(}
\emph{Convolution} represents one of the most fundamental operations of time series analysis and
is one of the most physically meaningful. Consider the passage of a signal through a linear filter,
where the filter (a ``black box'') will modify a signal passing through it
(Figure~\ref{fig:Fig1_blackbox}). For instance, it may
\begin{enumerate}
\item Amplify, attenuate or delay the signal.
\item Modify or eliminate specific frequency components.
\end{enumerate}
\PSfig[H]{Fig1_blackbox}{Example of convolution between an input signal and a filter.}
Consider the propagation of a seismic pulse through the upper layers of the Earth's crust, as illustrated
in Figure~\ref{fig:Fig1_earthfilter}. The generated pulse may be sharp and thus have high
frequencies, yet the recorded signal that traveled through the crust may be much smoother and
include repeating signals that reflect internal boundaries.
\PSfig[H]{Fig1_earthfilter}{Convolving a seismic pulse with the Earth gives a seismic trace that may
reflect changing properties of the Earth with depth.}
Convolution is this process of linearly modifying one signal using another signal. In Figure~\ref{fig:Fig1_earthfilter} we
convolved the seismic pulse with the ``Earth filter'' to produce the observed returned seismogram.
Symbolically, we write the convolution of a signal $d(t)$ by a filter $p(t)$ as the integral
\index{Deconvolution}
\index{Inverse filtering}
\index{Filtering!inverse}
\begin{equation}
h(t) = d(t) * p(t) = \int_{-\infty}^{+\infty} d(u) \cdot p(t-u) du,
\label{eq:convolution}
\end{equation}
where $*$ represents the convolution operator.
\emph{Deconvolution}, or \emph{inverse filtering}, is the process of unscrambling the convolved signal to
determine the nature of the filter \emph{or} the nature of the input signal. Consider these two cases:
\begin{enumerate}
\item If we knew the exact shape of our seismic pulse $d(t)$ and seismic signal received, $h(t)$, we could
deconvolve the data with the pulse to determine the (filtering) properties of the upper layers of the Earth through
which the pulse passed (i.e., $p(t) = d^{-1}(t) * h(t)$).
\item If we wanted to determine the exact shape of our pulse $d(t)$, we could pass it through a known
filter $p(t)$ and deconvolve the output with the shape of the filter (i.e., $d(t) = p^{-1}(t) * h(t)$).
\end{enumerate}
The hard work here is to determine the inverse functions $d^{-1}(t)$ or $p^{-1}(t)$, which is akin to matrix inversion.
Other examples of convolution include:
\begin{enumerate}
\item Filtering data --- using running means, weighted means, removing specific frequency components,
etc.
\item Recording a phenomenon with an instrument that responds slower than the rate at which the
phenomenon changes, or which produces a weighted mean over a narrow interval of time,
or which has lower resolving power than the phenomenon requires.
\item Conduction and convection of heat.
\item Deformation and the resulting gravity anomalies caused by the flexural response of the lithosphere
to a volcano.
\end{enumerate}
Convolution is most easily understood by examining its effect on discrete functions.
First, consider the discrete impulse $d(t)$ sent through the filter $p(t)$, as illustrated in Figure~\ref{fig:Fig1_conv1}:
\PSfig[H]{Fig1_conv1}{A filter's impulse response is obtained by sending an impulse $d(t)$ through the filter $p(t)$.}
\noindent
The output $h(t)$ from the filter is
known as the \emph{impulse response function} since it represents the response of the filter to an
impulse, $d(t)$. It represents a fundamental property of the filter $p(t)$.
\index{Impulse response function}
Next, consider a more complicated input signal convolved with the filter, as shown in Figure~\ref{fig:Fig1_conv2}:
\PSfig[H]{Fig1_conv2}{Filtering seen as a convolution.}
\noindent
Since the filter is linear, we may think of the input as a series of individual impulses. The output
is thus the sum of several impulse responses scaled by their amplitudes and shifted in
time. Calculating convolutions is a lot like calculating cross-correlations, except
that the second time-series must be reversed. Consider the two signals as finite sequences on separate strips of
paper (Figure~\ref{fig:Fig1_conv3}).
\PSfig[H]{Fig1_conv3}{Graphical representation of a convolution. We write the discrete values of $d(t)$ and $p(t)$ on two separate strips of paper.}
\noindent
We obtain the zero lag output by aligning the paper strips as shown in Figure~\ref{fig:Fig1_conv4},
after reversing the red strip.
\PSfig[H]{Fig1_conv4}{Convolution, zero lag. Reverse one strip and arrange them to yield a single overlap.}
\noindent
The zero lag result $h_0$ is thus simply $d_0 \cdot p_0$. Moving on, the first lag results from the alignment shown in Figure~\ref{fig:Fig1_conv5}.
\PSfig[H]{Fig1_conv5}{Convolution, first lag. We shift one strip by one to increase the overlap.}
\noindent
This simple process is repeated, and for each lag $k$ we evaluate $h_k$ as the sum of the products of the overlapping
signal values. This is a graphic (or mechanical) representation of the discrete convolution
equation (compare this operation to the integral in \ref{eq:convolution}).
Consider the convolution of the two functions shown in Figure~\ref{fig:Fig1_conv6}.
\PSfig[H]{Fig1_conv6}{A moving average is obtained by the convolution of data with a rectangular function of unit area.}
\noindent
If we look at this convolution with the moving strips of paper approach, we get the setup illustrated in Figure~\ref{fig:Fig1_conv7}:
\PSfig[h]{Fig1_conv7}{The mechanics of convolutions, this time without the paper strips.}
\noindent
Given the simple nature of $p(t)$, we can estimate the values of $h_k$ directly:
\begin{equation}
\begin{array}{rcl}
h_0 & = & d_{0}/5 \\[4pt]
h_1 & = & \frac{1}{5} (d_0 + d_1)\\
& & \vdots \\
h_4 & = & \frac{1}{5} ( d_0 + d_1 + d_2 + d_3 + d_4)\\[4pt]
h_5 & = & \frac{1}{5} ( d_1 + d_2 + d_3 + d_4 + d_5)\\
& & \vdots \\
h_{18}& = & d_{14}/5 \end{array}
\end{equation}
\PSfig[h]{Fig1_conv8}{The final result of the convolution is a smoothed data set since any short-wavelength signal
will be greatly attenuated.}
\noindent
This is simply a five-point running (or moving) average of $d(t)$, and the result is shown in Figure~\ref{fig:Fig1_conv8}.
An $n$-point average would be the
result if $p(t)$ consisted of $n$ points, each with a value of $1/n$.
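The paper-strip procedure is exactly what a discrete convolution routine computes. Here is a minimal Python sketch of the five-point running mean (with made-up data values, since the figure's values are not tabulated here):
\begin{verbatim}
import numpy as np

d = np.array([2.0, 4.0, 3.0, 5.0, 7.0, 6.0, 4.0, 3.0,
              2.0, 4.0, 6.0, 8.0, 7.0, 5.0, 3.0])   # hypothetical d_0..d_14
p = np.full(5, 1.0 / 5.0)               # rectangular filter of unit area

h = np.convolve(d, p)                   # all 19 lags h_0 .. h_18
print(np.isclose(h[0], d[0] / 5))       # True: h_0 = d_0 / 5
print(np.isclose(h[4], d[:5].mean()))   # True: full five-point overlap
\end{verbatim}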
\subsection{Convolution theorem}
\index{Convolution theorem}
Although not shown here, it can be proven that a convolution of two functions $p(t)$ and $d(t)$
in the time-domain is equivalent to the product of $P(f)$ and $D(f)$ in the frequency domain
(here, uppercase letters indicate the Fourier transforms of the lowercase, time-domain functions).
The converse is also true, thus
\begin{equation}
\begin{array}{rcl}
p(t) * d(t) & = & h(t) \quad \leftrightarrow \quad P(f) \cdot D (f) = H(f),\\[4pt]
p(t) \cdot d(t) & = & z(t) \quad \leftrightarrow \quad P(f) * D (f) = Z(f).\\
\end{array}
\end{equation}
Because convolution is a slow calculation it is often advantageous to transform our data from one
domain to the other, perform the simpler multiplication, and transform the data back to the original
domain. The availability of \emph{fast Fourier transforms} (FFTs) makes this approach practical.
\index{Fast Fourier transform (FFT)}
\index{FFT (Fast Fourier transform)}
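A minimal Python sketch of this round trip (zero-padding both series to the full output length, so that the cyclic FFT product reproduces the linear convolution) confirms the theorem numerically:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(3)
d = rng.standard_normal(50)        # arbitrary signal
p = rng.standard_normal(20)        # arbitrary filter

direct = np.convolve(d, p)         # time-domain convolution

nfft = len(d) + len(p) - 1         # pad to the full output length
fast = np.fft.irfft(np.fft.rfft(d, nfft) * np.fft.rfft(p, nfft), nfft)
print(np.allclose(direct, fast))   # True: h(t) <-> P(f) . D(f)
\end{verbatim}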
\section{Sampling Theory}
\index{Sampling!theory|(}
\index{Sampling!theorem}
\index{Sampling!theorem}
\index{Band-limited}
The \emph{sampling theorem} states that if a function is \emph{band-limited} (i.e., the transform is zero for all
radial frequencies $f > f_N$), then the continuous function $d(t)$ can be uniquely determined from knowledge
of its sampled values given a sampling interval $\Delta t \leq 1/(2 f_N)$. From distribution theory, we have
\begin{equation}
d_t = \sum^{+\infty} _{j= - \infty} d(t) \delta (t-j \Delta t) = \sum^\infty _{j= - \infty}
d(j \Delta t) \delta ( t - j \Delta t) = d(t) \cdot \Delta (t),
\end{equation}
where
\index{Sampling!function}\index{Comb function}
\begin{equation}
\Delta (t) = \sum^{+\infty }_{j= - \infty} \delta ( t - j \Delta t)
\end{equation}
is the sampling or ``comb'' function in the time domain (Figure~\ref{fig:Fig1_sampl1}).
Thus, $d_t$ is the continuous function $d(t)$ sampled at the discrete times $j\Delta t$.
Consequently, it is true that the original signal $d(t)$ can be reconstructed exactly from
its sampled values $d_t$ via the \emph{Whittaker-Shannon} interpolation formula\index{Whittaker-Shannon interpolation}\index{Interpolation!Whittaker-Shannon}
\begin{equation}
d(t) = \sum_{j=-\infty}^{+\infty} d_j \sinc \left( \frac{t - j\Delta t}{\Delta t} \right),
\label{eq:WhittakerShannon}
\end{equation}
where $d_j = d(j\Delta t)$ are the sampled data values and the $\sinc$ function\index{$\sinc$ (sinc function)} is defined as
\begin{equation}
\sinc(x) = \frac{\sin \pi x}{\pi x}.
\label{eq:sincfunction}
\end{equation}
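In practice the infinite sum must be truncated, but a minimal Python sketch (with a single band-limited test cosine of our choosing, well below the Nyquist frequency) shows how well the formula recovers values between samples:
\begin{verbatim}
import numpy as np

dt = 0.1                            # sampling interval; f_N = 1/(2*dt) = 5
tj = np.arange(-50, 51) * dt        # finite stand-in for j = -inf..inf
f = 2.0                             # test frequency, safely below f_N
dj = np.cos(2.0 * np.pi * f * tj)   # sampled values d_j

def reconstruct(t):
    # np.sinc(x) = sin(pi*x)/(pi*x), matching the definition above
    return np.sum(dj * np.sinc((t - tj) / dt))

t = 0.123                           # an arbitrary time between samples
print(reconstruct(t), np.cos(2.0 * np.pi * f * t))  # nearly identical
\end{verbatim}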
Recall that the multiplication of two functions in
the time domain is equivalent to the convolution of their Fourier transforms in the frequency domain,
hence
\PSfig[h]{Fig1_sampl1}{The sampling or ``comb'' function, $\Delta (t)$, represents mathematically what we
do when we sample a continuous phenomenon $d(t)$ at discrete, equidistantly spaced times.}
\noindent
\begin{equation}
d(t) \cdot \Delta (t) \leftrightarrow D(f) * \Delta (f).
\end{equation}
The time-domain expression is visualized in Figure~\ref{fig:Fig1_sampl2}.
\PSfig[H]{Fig1_sampl2}{Sampling equals multiplication of a continuous signal $d(t)$ with a comb function $\Delta (t)$ in the time-domain.}
\noindent
The transformed function $\Delta(f)$ can be shown to be a series of impulses as well (Figure~\ref{fig:Fig1_sampl3}).
\PSfig[H]{Fig1_sampl3}{The Fourier transform of the comb function, $\Delta (t)$, is another comb function, $\Delta (f)$, with a spacing of $1/\Delta t$ between impulses.}
\noindent
In the frequency domain, $d(t)$ is represented as $D(f)$ and illustrated in Figure~\ref{fig:Fig1_sampl4}.
We note that while the time-domain comb function $\Delta(t)$ is a series of impulses spaced every $\Delta t$,
the frequency-domain comb function $\Delta(f)$ is also a series of impulses, but spaced every $1/\Delta t$.
The time and frequency domain spacings of the comb
functions are thus reciprocal: A finer sampling interval leads to a larger distance between the impulses in the
frequency domain.
\PSfig[H]{Fig1_sampl4}{The Fourier transform of our continuous phenomenon, $d(t)$. We assume it is band-limited so that the transform
goes to zero beyond the highest frequency, $f_N$.}
\noindent
Given $D(f)$ and $\Delta(f)$, the convolution $D(f) * \Delta(f)$ is shown schematically in Figure~\ref{fig:Fig1_sampl5}.
\PSfig[H]{Fig1_sampl5}{Replication of the transform, $D(f)$, due to its convolution with the comb function, $\Delta (f)$.}
\noindent
If the impulses in $\Delta(f)$ are spaced closer than $2 f_N$ (i.e., if $\Delta t > 1/(2 f_N)$) then there will be some overlap between the $D(f)$ replicas
that are centered at the location of each impulse (see Figure~\ref{fig:Fig1_sampl6}).
\PSfig[H]{Fig1_sampl6}{Aliasing in the frequency domain occurs when the sampling interval $\Delta t$ is too large.}
\noindent
This overlap introduces \emph{aliasing} (which we shall discuss more later). To prevent aliasing, we must ensure $\Delta t \leq 1/(2 f_N)$,
where $f_N$ is the highest (radial) frequency component present in the time series. As mentioned earlier, we call $f_N$ the
\emph{Nyquist frequency} and the Nyquist sampling interval is $\Delta t = 1/(2 f_N)$, hence $f_N = 1/(2 \Delta t)$.
\index{Aliasing}
\index{Frequency!Nyquist}
\index{Nyquist frequency}
As long as we follow the sampling theorem and select $\Delta t \leq 1/(2 f_N)$, with $f_N$ being the highest
frequency component, there will be no spectral overlap in $D(f)* \Delta (f)$ and we will be able to recover
$D(f)$ completely. Therefore (and to prove the sampling theorem) we recover $D(f)$ by truncating the signal:
\begin{equation}
D(f) = [ D (f) * \Delta (f) ] \cdot H(f),
\end{equation}
\index{Gate function}
\noindent
which is illustrated in Figure~\ref{fig:Fig1_sampl7} as a multiplication of the replicating spectrum with a \emph{gate} function, $H(f)$.
\PSfig[h]{Fig1_sampl7}{Truncation of the Fourier spectrum via multiplication with a rectangular gate function, $H(f)$.}
\subsection{Aliasing, again}
\PSfig[h]{Fig1_aliasing2}{Aliasing as seen in the time domain. Thin line shows a phenomenon with period $P$.
The circles and heavy dashed line show
the signal obtained using a sampling rate of $1.25P$, while the squares and dashed
line show a constant signal ($f = 0$) obtained with a sampling rate of $2P$.}
Aliasing can be viewed from several angles. Conceptually, if $\Delta t > 1/(2 f_N)$ (where $f_N$ is
the highest frequency component in the phenomenon of interest), then a high-frequency component will \emph{masquerade} in the
sampled series as a lower, artificial frequency component, as shown in Figure~\ref{fig:Fig1_aliasing2}.
\noindent
If $\Delta t$ is a multiple of $P$ (e.g., see the squares in Figure~\ref{fig:Fig1_aliasing2}), then this frequency component is indistinguishable from a
horizontal line (i.e., a constant, with frequency $f = 0$). If $\Delta t = 5 P/4$ (see circles in Figure~\ref{fig:Fig1_aliasing2}) then this frequency
component is indistinguishable from a component with frequency $1/(5P)$ (i.e., a period of $5P$). Therefore, the
under-sampled frequency components manifest themselves as lower frequency components
(hence the word alias). In fact, every frequency \emph{not} in the range
\index{Principal alias}
\begin{equation}
0 \leq f \leq 1/(2\Delta t)
\end{equation}
has an alias in that range --- this is its \emph{principal alias}. Furthermore, any frequency $f_H > f_N$
will be indistinguishable from its principal alias. That is, the actual frequency $f_H = f_N + \Delta f$ will appear as the aliased frequency $f_L = f_N - \Delta f$.
\index{Folding frequency}
\index{Frequency!folding}
Because of this relationship, the Nyquist frequency $(f_N)$ is often called the \emph{folding frequency} since the
aliased frequencies ($f > f_N$) will appear at their principal aliases, folded back into the range $f \leq f_N$ (Figure~\ref{fig:Fig1_nyquist}).
Therefore, when computing the transform of a data set,
any frequency components in the phenomenon with true frequencies $f > f_N$ have been folded back into
the resolved frequency range during sampling. Consequently, we must carefully choose $\Delta t$ so that the power at frequencies $f' > f_N$ is either small
or nonexistent, or we must ensure that $f_N$ is high enough that the aliased part of the spectrum only affects
frequencies higher than those of interest ($f \leq f_I$, see Figure~\ref{fig:Fig1_nyquist}).
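The folding rule can be expressed as a short function. The sketch below (our own code, in Python with NumPy) computes the principal alias of any frequency and reproduces both cases of Figure~\ref{fig:Fig1_aliasing2}: sampled at $\Delta t = 5P/4$, a component of frequency $1/P$ aliases to $1/(5P)$, while sampled at $\Delta t = 2P$ it aliases to $f = 0$; the final line verifies that the corresponding samples are literally indistinguishable:
\begin{verbatim}
import numpy as np

def principal_alias(f, dt):
    """Fold a frequency f into the range 0 <= f <= 1/(2 dt)."""
    fs = 1.0 / dt                     # sampling frequency
    f = f % fs                        # the sampled spectrum repeats every fs
    return fs - f if f > fs / 2 else f

P = 1.0                               # period of the phenomenon
print(principal_alias(1/P, 1.25 * P))   # ~0.2, i.e., frequency 1/(5P)
print(principal_alias(1/P, 2.0 * P))    # 0.0, i.e., a constant

# The aliased samples are literally indistinguishable:
t = np.arange(8) * 1.25 * P
print(np.allclose(np.sin(2 * np.pi * t / P),
                  np.sin(2 * np.pi * t / (5 * P))))   # True
\end{verbatim}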
\PSfig[h]{Fig1_nyquist}{Aliasing and folding frequency. Power at frequencies higher than the Nyquist ($f_N$) will
reappear as power at lower frequencies, ``folded'' around $f_N$. This extra power (orange) is then added to the
actual power, and the result is a distorted total power spectrum (red). Alternatively, the Nyquist frequency may be selected so that aliasing only affects frequencies higher
than the frequencies of interest $(f \leq f_I)$. In that case, the extra power (orange) that is folded around $f_N$
does not reach into the lower frequencies of interest, and consequently the total spectrum is unaffected for frequencies
lower than $f_I$.}
\index{Convolution|)}
\index{Sampling!theory|)}
\section{Aliasing and Leakage}
\index{Aliasing|(}
\index{Leakage|(}
\PSfig[H]{Fig1_AL}{The continuous and band-limited phenomenon of interest, represented both in the time and frequency domains.
Left column represents the time domain and the right column represents the frequency domain, separated by a vertical dashed gray line.
The multiply, convolve, and equal signs indicate the operations that are being performed. a) Continuous phenomenon, b) Sampling function,
c) Infinite discrete observations, d) Gate function, e) Truncated discrete observations, f) Assumed periodicity $T$, g) Aliasing and leakage of signal.}
We were exploring the relationship between the continuous and the discrete Fourier transform
and found that we could illustrate the process graphically. First, we found that we had to sample
the time series $d(t)$ (Figure~\ref{fig:Fig1_AL}a). The sampling of the phenomenon by the sampling function $\Delta(t)$
(Figure~\ref{fig:Fig1_AL}b) is a multiplication in the time domain, which implies a convolution in the frequency domain.
This sampling yields discrete observations in the time domain, while the corresponding convolution in the
frequency domain replicates the spectrum, making it periodic (Figure~\ref{fig:Fig1_AL}c).
Depending on the chosen sampling interval we may or may not have spectral overlap (aliasing).
This infinite discrete series must then be truncated to contain a finite number of observations.
\index{Gate function}
\index{Data!truncation}
Conceptually, this truncation amounts to multiplying our infinite time series in the time domain by a finite gate function, $h(t)$,
whose transform is
\begin{equation}
H(f) = \sinc (fT) = \frac{\sin (\pi f T)}{\pi f T},
\end{equation}
with both functions displayed in Figure~\ref{fig:Fig1_AL}d. This process results in the finite discrete observations
shown in Figure~\ref{fig:Fig1_AL}e.
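This transform pair is easy to check numerically. The sketch below (our own construction, in Python with NumPy) approximates the continuous transform of a unit-height gate of width $T$ by a Riemann sum and compares it with $T \sinc(fT)$; the factor $T$ appears because the text's $H(f) = \sinc(fT)$ corresponds to a gate normalized to unit area, and NumPy's \texttt{np.sinc} is the normalized $\sin(\pi x)/(\pi x)$ used here:
\begin{verbatim}
import numpy as np

T = 2.0                                # gate width
t = np.linspace(-T/2, T/2, 4001)       # fine sampling of the gate
dt = t[1] - t[0]
f = np.linspace(-5, 5, 201)            # frequencies to evaluate

# Riemann-sum approximation of H(f) = integral h(t) e^{-2 pi i f t} dt
H = np.array([np.sum(np.exp(-2j * np.pi * fk * t)) * dt for fk in f])

# Continuous result: T sinc(fT), with sinc(x) = sin(pi x)/(pi x)
print(np.max(np.abs(H - T * np.sinc(f * T))))   # small (~1e-3 or less)
\end{verbatim}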
It is this truncation that is responsible for introducing \emph{leakage}.
Leakage arises because the truncation implicitly
assumes that the time-series is periodic with period $T$ (Figure~\ref{fig:Fig1_AL}f). Consequently, the
discretization of frequencies is equivalent to enforcing a periodic signal (Figure~\ref{fig:Fig1_AL}g).
Because both the time and frequency domain functions have now been convolved with a series of
impulses (the time function with a comb of spacing $T$, via the discretization of frequencies, and the frequency function with a comb of spacing $1/\Delta t$, via the time-domain sampling), both functions are periodic and fully specified by $n$ discrete values, so
the final discrete spectrum (for a real series as shown here) between $0$ and $f_N$ represents the
discrete transform of the series on the left (which is periodic over $T$).
Following the procedure in Figure~\ref{fig:Fig1_AL} mathematically confirms what the graphical steps above suggest:
a discrete Fourier transform will \emph{differ} from the continuous transform by two effects:
\begin{enumerate}
\item Aliasing --- from discrete time domain sampling.
\item Leakage --- from finite time domain truncation.
\end{enumerate}
Aliasing can be prevented by choosing $\Delta t \leq 1/(2 f_N)$, or reduced as discussed previously. Leakage, in contrast, is
a problem for virtually all observed (and hence truncated) time series.
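Both effects are easy to exhibit with a discrete transform. In the following sketch (our own example in Python with NumPy, not from the text), a sinusoid whose frequency is an exact harmonic of the fundamental $1/T$ yields a single spectral line, whereas a non-harmonic sinusoid leaks power into many neighboring frequencies:
\begin{verbatim}
import numpy as np

n, dt = 128, 1.0
T = n * dt
t = np.arange(n) * dt

# A harmonic of the fundamental 1/T: exactly periodic over T
clean = np.abs(np.fft.rfft(np.sin(2 * np.pi * (8.0 / T) * t)))
# Not a harmonic: the implied periodic extension is discontinuous
leaky = np.abs(np.fft.rfft(np.sin(2 * np.pi * (8.5 / T) * t)))

print(np.sum(clean > 1.0))   # 1: all power sits in a single bin
print(np.sum(leaky > 1.0))   # dozens of bins: power has leaked
\end{verbatim}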
As discussed, leakage arises from truncation in the time domain, which corresponds to a
convolution with a $\sinc$ function in the frequency domain. Conceptually, consider the effect of time domain
truncation (Figure~\ref{fig:Fig1_trunc1}).
Fourier analysis is essentially fitting a series of sines and cosines (using the harmonics of the
fundamental frequency $1/T$) to the series $d(t)$. Since the Fourier series is necessarily periodic, it
follows that
\begin{equation}
d(T/2 + \Delta t) = d( - T/2).
\end{equation}
In other words, the transform is equivalent to that of a time series in which $d(t)$ is repeated every $T$ (Figure~\ref{fig:Fig1_trunc2}).
\PSfig[h]{Fig1_trunc1}{Truncation of a continuous signal, the equivalent of multiplying the signal
with a gate function $h(t)$, determines the fundamental frequency, $f = 1/T$.}
\PSfig[h]{Fig1_trunc2}{Artificial high frequencies are introduced due to the forced periodicity of a truncated time-series, which
produces a discontinuous signal (highlighted by the gray regions).}
\noindent
The leakage (conceptually) thus results from the frequency components that must be present to
allow the discontinuity, occurring every $T$, to be fit by the Fourier series. If the series $d(t)$
is perfectly periodic over $T$ then there is no leakage because $d(T + \Delta t) = d(\Delta t)$ and the transition will be
continuous and smooth across $T$.
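This condition can be checked directly. The short sketch below (our own illustration, with a made-up record length) evaluates the mismatch $|d(T + \Delta t) - d(\Delta t)|$ for a sinusoid that is, and one that is not, periodic over the record length $T$:
\begin{verbatim}
import numpy as np

T, dt = 10.0, 0.01

def periodicity_mismatch(f0):
    """|d(T + dt) - d(dt)| for d(t) = sin(2 pi f0 t); zero when
    d is exactly periodic over the record length T."""
    d = lambda t: np.sin(2 * np.pi * f0 * t)
    return abs(d(T + dt) - d(dt))

print(periodicity_mismatch(8.0 / T))   # ~0: periodic over T, no leakage
print(periodicity_mismatch(8.5 / T))   # ~0.11: a jump at every period
\end{verbatim}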
To minimize leakage we attempt to minimize the discontinuity (between $d(0)$ and $d(T)$) or
minimize the lobes of the $\sinc (fT)$ function convolving the spectrum. This is accomplished by
truncating the time series with a more gently sloping gate function (called a taper, fader, window,
etc.). In other words, we use a smoother function that has fewer high frequency components (Figure~\ref{fig:Fig1_trunc3}).
\noindent
\index{Gate function}
\index{Bartlett window}
\index{Windowing!Bartlett}
\index{Hanning window}
\index{Windowing!Hanning}
\index{Parzen window}
\index{Windowing!Parzen}
\index{Hamming window}
\index{Windowing!Hamming}
\index{Bartlett-Priestley window}
\index{Windowing!Bartlett-Priestley}
The triangular function is the \emph{Bartlett} window, which is the rectangle function convolved with
itself (hence its transform is $\sinc^2 (fT)$). The dashed line is the split cosine-bell window. Other
windows include: \emph{Hanning} (a cosine taper), \emph{Parzen} (similar to Hanning but decays sooner and
more steeply), \emph{Hamming} (like Hanning), and \emph{Bartlett-Priestley} (which is quadratic and has ``optimal'' properties
satisfying specific error considerations). All of these tapers have transforms that are less
oscillatory than the $\sinc$ function, but they are also wider. Therefore, multiplying the time series by one
of these gate functions results in a convolution in the frequency domain that smears spectral peaks
more than the $\sinc$ function did. In return, it will not introduce ripples far away from these spectral peaks.
\PSfig[h]{Fig1_trunc3}{Alternative gate functions and their spectral representations. The less abrupt
a gate function is in the time domain the less ringing it will introduce in the frequency domain.}
Note that multiplying by, say, a Hanning window will make $d(T/2+\Delta t) \sim d(-T/2)$, so
the bothersome discontinuity is eliminated. However, the damping of $d(t)$ toward the ends of the window acts like an amplitude modulation,
which accounts for the smearing of spectral peaks. Hence, leakage is still not completely eliminated.
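This trade-off is easy to demonstrate numerically. In the sketch below (our own example, using NumPy's \texttt{np.hanning} taper), tapering a non-harmonic sinusoid suppresses the far-field $\sinc$ ripples by orders of magnitude, at the cost of a lower, smeared spectral peak:
\begin{verbatim}
import numpy as np

n = 128
t = np.arange(n)
d = np.sin(2 * np.pi * 8.5 * t / n)   # not a harmonic of 1/T -> leakage

boxcar = np.abs(np.fft.rfft(d))
tapered = np.abs(np.fft.rfft(d * np.hanning(n)))

# Ripples far from the spectral peak (which sits near bins 8-9):
print(boxcar[30:].max())    # order 1: the sinc sidelobes
print(tapered[30:].max())   # orders of magnitude smaller
# The price: a lower peak, smeared over a slightly wider main lobe.
print(boxcar.max(), tapered.max())
\end{verbatim}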