-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathDA1_Chap9.tex
executable file
·696 lines (656 loc) · 40.4 KB
/
DA1_Chap9.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
% DA1_Chap9.tex
%
\chapter{ANALYSIS OF DIRECTIONAL DATA}
\label{ch:directional}
\epigraph{``Essentially, all models are wrong, but some are useful.''}{\textit{George E. P. Box, Statistician}}
Much environmental data contain information about orientations in the plane where the \emph{orientations}
and not the lengths of the features are the important attributes. Examples of such data types abound: Strikes and dips
of bedding planes, fault surfaces, and joints are familiar from structural geology, augmented by glacial striations, sole marks,
lineaments in satellite images, directions of winds and currents, and much more. First, we must distinguish between
\emph{directional} and \emph{oriented} data. Directional data can take on unique values in the entire 0--360$^{\circ}$
range, like drumlins and wind directions, while oriented data consist of ``two-headed'' vectors: there is a 180$^{\circ}$
ambiguity
inherent in the data. Examples of oriented data include fault traces and other lineaments on maps.
Such data require special care in the analysis.
However, environmental data involve not only directions in the plane but spatial directions as well, which
introduces another degree of complexity. Because 3-D vector data are very common, particularly in
the Earth and environmental sciences, we need to examine such data in more detail. We find that much of the
measurements used in structural geology, such as strike and dip of fault planes, can be expressed
as a normal vector to the fault plane. Other examples include vector measurements of the
geomagnetic field, palaeomagnetic measurements, stress directions, wind and current directions, and determinations of
crystallographic axes for petrofabric studies.
\index{Directional data}
\index{Data!directional}
\index{Oriented data}
\index{Data!oriented}
\section{Circular Data}
\subsection{Displaying directional distributions}
Directional data can be displayed in circular diagrams. One can either plot each direction as
a unit vector, or we may count the number of vectors within a given angular sector and draw a polar histogram of
the distribution (Figure~\ref{fig:Fig1_dir1}):
\index{Polar histogram}
\index{Histogram!polar}
\PSfig[H]{Fig1_dir1}{Two types of graphical presentations of the same directional data set. (a) Windrose diagram shows
all individual directions, (b) Sector diagram shows distribution via a polar histogram (a so-called \emph{rose} diagram).}
\noindent
Unfortunately, the sector diagram as described here is biased in the way it presents the data.
Nevertheless, it is still the most commonly used type of display for directional data. Where is this bias
coming from? Consider the area of a sector of width $\Delta \alpha$, given by
\begin{equation}
A=\frac{\pi r^{2}\Delta\alpha}{360} \propto r^2.
\end{equation}
We see that the area of a sector is proportional to the radius squared, whereas for a conventional Cartesian histogram the
column area is proportional to height, not height squared. Consequently, this leads to a visual distortion of
sectors with high counts. Therefore, we should let the sector
radius be proportional to the square root of the frequency (or count) to ensure that the final rose diagram
will have an area proportional to frequency. If this is not done, the larger counts for some sectors will completely
swamp smaller sectors due to the $r^2$ effect. Thus, small but significant trends may not be
detected or may simply be considered noise.
Before we can statistically analyze orientation data, the 180$^{\circ}$ ambiguity must be accounted
for. The simplest way to do this is to double all angles and analyze these doubled angles instead. For instance, if
two fault traces are reported as having strikes of 45$^{\circ}$ and 225$^{\circ}$ (which means the two strikes have the
same orientation), we double the angles and find $90^{\circ}$ and $450^{\circ} - 360^{\circ} = 90^{\circ}$.
Statistics derived from doubled angles can then be divided by two to recover their original meanings.
\index{Mean direction}
\index{Direction!mean}
\index{Resultant}
The dominant or \emph{mean direction} can be found by computing the \emph{vector resultant}
(i.e., the vector sum) of the unit vectors that represent the various directions in the data. Since the
coordinates of these vectors are given by
\begin{equation}
x_{i}=\cos \theta_{i},\quad y_{i}=\sin \theta_{i},
\label{eq:xy_components}
\end{equation}
we find the coordinates of the resultant $\mathbf{R} = (x_r, y_r)$ to be
\index{Vector!resultant}
\index{Vector!resultant length}
\begin{equation}
x_{r}=\displaystyle \sum^{n}_{i=1} \cos \theta_{i}, \quad y_{r}=\displaystyle \sum^{n}_{i=1} \sin \theta_{i}.
\end{equation}
The mean direction $\bar{\theta}$ is then simply given by
\begin{equation}
\index{Mean direction}
\index{Direction!mean}
\bar{\theta}= \tan^{-1}(y_{r}/x_{r})=\tan^{-1} \left(\sum^{n}_{i=1} \sin \theta_{i}/\sum^{n}_{i=1}\cos \theta_{i} \right).
\label{eq:meandir}
\end{equation}
Computer implementations of (\ref{eq:meandir}) should take care to use the \texttt{atan2} function rather than \texttt{atan}
so that the mean direction is found in the correct quadrant.
The magnitude $R$ of the resultant should be normalized by the number of vectors before it can be
used further. This gives us the \emph{mean resultant length}
\begin{equation}
\index{Mean resultant length}
\index{Vector!mean resultant length}
\bar{R} = \frac{R}{n} = \frac{\sqrt{x^2_r + y^2_r}}{n},
\end{equation}
which ranges from 0 to 1 and is a measure of (normalized) dispersion analogous
to the variance. Since it increases for focused distribution we often convert it to a
\emph{circular variance} via
\begin{equation}
\index{Variance!circular}
s^{2}_{c}=1- \bar{R}= \left( n-R \right) / n.
\end{equation}
\PSfig[H]{Fig1_vonMises}{The circular von Mises distribution. One population is centered on $\mu = 75$ (dashed line), with very low directionality
($\kappa = 1$), and shows the effect of wrapping the wide distribution around the full circle. The other, at $\mu = -75$, is much more directional ($\kappa = 10)$.)}
To test various statistical hypotheses about circularly distributed data we need a
probability density function that can be used to obtain critical values. A traditional distribution that has
been used extensively in studies of directional data is the \emph{von Mises} distribution, given by
\begin{equation}
\index{Probability distribution!von Mises}
\index{von Mises distribution}
p(\theta) = \frac{1}{2 \pi I_0 ( \kappa)} e^{\kappa \cos\left( \theta - \mu \right)},
\label{eg:von_Mises}
\end{equation}
where $\kappa$ is a measure of the \emph{concentration} of the distribution about the mean direction $\mu$, and $I_0$ is
the modified Bessel function of the first kind and order zero (it normalizes the
cumulative distribution of $p(\theta)$ over $360^{\circ}$ to unity.) A large $\kappa$ means we have a strongly preferred direction.
The concentration parameter $\kappa$ is obviously related to
$R$ (or $s_c^2$) and this relationship can be
derived if we assume that the data represent a random sample from a population having a von Mises
distribution. Statistical tables (e.g., Table~\ref{tbl:Critical_kappa2}) report solutions to (\ref{eq:kappa2d}) that relate $\kappa$ and $\bar{R}$.
\subsection{Test for a random direction}
\index{Test!Rayleigh}
\index{Test!random direction}
The simplest test for circular data is to determine whether the directional observations are
random or not, i.e., that there is no \emph{preferred direction}. In that case we have a \emph{uniform} distribution.
In terms of the von Mises distribution (\ref{eg:von_Mises}), a uniform distribution means that $ \kappa$ must be zero.
Hence, the hypothesis test becomes
\begin{equation}
H_0: \kappa =0 \quad \mbox{versus} \quad H_1: \kappa>0.
\end{equation}
If the data came from a uniform distribution (i.e., $ \kappa$ = 0), we would expect a small but not necessarily zero value for $\bar{R}$. Lord Rayleigh devised this
test, which gives critical values for $\bar{R}$ depending on the $\nu = n$ degrees of freedom and chosen level of significance, $\alpha$.
For $\nu \geq 10$ we can approximate the critical mean resultant via
\begin{equation}
\bar{R}_{\alpha,n} = \frac{1}{2n}\sqrt{1+4n(n+1)-\left (\log \alpha + 2n + 1 \right)^2},
\end{equation}
while for smaller samples we need to consult exact tables (e.g., Table~\ref{tbl:Critical_R2}).
If $\bar{R}$ exceeds the critical $\bar{R}_{\nu, \alpha}$, then we must reject the null hypothesis.
\begin{example}
We have measured the strikes of fault planes in an area
of the sea floor (called Area 1) imaged by a side-scan sonar device and found the following orientations:
$$
\theta_i : 110^{\circ}, 300^{\circ}, 310^{\circ}, 135^{\circ}, 138^{\circ}, 320^{\circ}, 141^{\circ}, 145^{\circ}, 330^{\circ}, 335^{\circ}, 280^{\circ}, 160^{\circ}, 170^{\circ} \quad n=13.
$$
Since these are oriented features, we double the angles to remove any ambiguity in the orientations:
$$
2\theta_i : 220^{\circ}, 240^{\circ}, 260^{\circ}, 270^{\circ}, 276^{\circ}, 280^{\circ}, 282^{\circ}, 290^{\circ}, 300^{\circ}, 310^{\circ}, 200^{\circ}, 320^{\circ}, 340^{\circ}.
$$
Summing up the sines and cosines via (\ref{eq:xy_components}), we find
\begin{equation}
x_r = 1.2972, \quad y_r = -10.349,
\end{equation}
which gives us
\begin{equation}
2\bar{\theta}_1 = 277^{\circ}, \quad \bar{R}_1 = 0.802.
\end{equation}
We convert back to original angles by dividing by 2 and find the mean orientation to be
\begin{equation}
\bar{\theta}_1 = 138.5^{\circ}.
\end{equation}
The null hypothesis is not affected by doubling the angles since we are simply testing if $\kappa = 0$.
At the $\alpha = 0.05$ level of significance we use Table~\ref{tbl:Critical_R2} to find critical $\bar{R}_{13,0.05} = 0.475$. Clearly, this value is greatly
exceeded by the observed $\bar{R} = 0.802$ and we must conclude that the fault strikes are not randomly oriented.
\end{example}
\subsection{Test for a specific direction}
\index{Test!specific direction}
There are instances when we would like to test whether an observed trend equals a
specified trend. The determination of critical values for such a scenario is more difficult and involves
using complex equations or charts. As an alternative approach we may find the \emph{confidence angle} $\Delta \theta$ around
the mean direction of the sample. We can then investigate whether this angular interval is large enough to accommodate the specified trend
we want to test against. The approximate standard error in $\bar{\theta}$ is given (in radians) by
\index{Confidence interval!mean direction}
\begin{equation}
s_e \approx 1/\sqrt{n \bar{R}k} = 1/\sqrt{Rk},
\end{equation}
where $k$ is our sample estimate of $\kappa$, the population parameter. If we assume that the estimation errors are normally distributed
then we may use critical $z$-values for a given confidence level to construct the angular interval as
\begin{equation}
\bar{\theta} \pm \left | z_{\frac{\alpha}{2}} \right | s_e,
\end{equation}
which will contain the true mean (double angle) orientation, $2\mu, 100\cdot\alpha$ \% of the time.
\begin{example}
In our fault strike data case above,
it is believed that the faults arose as tensional features in response to tectonic forces operating in the N$30^{\circ}$E
direction. Under such a stress regime we would theoretically expect the faults to trend at 90$^{\circ}$ to the direction of these tensile
stresses, i.e., in the N120$^{\circ}$E direction. From Table~\ref{tbl:Critical_kappa2} we find $\kappa =2.897$, which yields
\begin{equation}
s_{e_2} = \frac{1}{ \sqrt{13 \cdot 0.802 \cdot 2.897}} = 0.1820 = 10.4^{\circ}.
\end{equation}
Note that this is the standard deviation about the \emph{doubled} mean angle orientation of $277^{\circ}$. We divide this deviation by
2 to find the standard deviation associated with the original orientations, $s_{e_1} = 5.2^{\circ}$. The
$95 \%$ confidence interval corresponds to $z_{0.95} = 1.96$, so the confidence interval around $\bar{\theta}_1$ becomes
\begin{equation}
\bar{\theta}_1 = 138.5^{\circ} \pm 10.2^{\circ}.
\end{equation}
Since the predicted direction of $120^{\circ}$ is outside this interval we conclude that the observed
mean direction deviates from that predicted based on the orientation of current tectonic forces. It is possible that the
orientation of stresses may have changed since the formation of the faults.
\end{example}
\subsection{Test for equality of two mean directions}
\index{Test!equality of two mean directions}
\label{sec:twoDdir}
Another common situation where we may want to apply a statistical test is to determine if two
sample mean directions are equal at some prescribed level of confidence. This situation, of course, is
reminiscent of the two-sample mean test for scalars that we described in Section~\ref{sec:twomeans}. For example, we may
have obtained new side-scan sonar data for an area farther away and would like to test whether the
mean direction of fault strikes at the second location is similar to what was found at the first location.
We can carry out this
test by comparing the vector resultants of the two groups to that produced by pooling the
two data sets and recalculate a single grand resultant. The key principle here
is that if the mean directions are different, then the pooled resultant should be \emph{shorter} than
the sum of the two individual resultants. This expectation can be examined using an \emph{F}-test by
comparing observed $F$ to critical $F_{1,n-2,\alpha}$. Unfortunately,
due to approximations needed to relate Cartesian and circular critical values, the procedure to evaluate
the $F$-statistic changes with the dispersion parameter, $\kappa$, represented by our sample-based value, $k$.
For large $k > 10$, we may compute
\index{\emph{F}-test}
\index{Test!\emph{F}}
\begin{equation}
F = \frac{(n-2)(R_1 + R_2 - R_p)}{n - R_1 -R_2},
\end{equation}
with $n$ being the combined number of observations, $R_1$ and $R_2$ are the full resultants (not \emph{mean} resultants)
for each data set, and
$R_p$ is the resultant from the combined data. Here, $k$ is estimated from $\bar{R}_p$, the mean resultant.
However, if $2 < k < 10$ then a more accurate $F$-statistic is obtained via the adjustment
\begin{equation}
F^{\ast} = \left(1 + \frac{3}{8k} \right) F,
\label{eq:betterFk}
\end{equation}
while for even smaller values of $k$ special tables must be used.
\begin{example}
We will use the equality test to determine whether the mean
strike of the faults in our second area (Area 2) is the same as what was found for the first area
($\bar{\theta}_1 = 138.5^{\circ}$, double angle = $277^{\circ}$). The new data are:
$$
\theta_i = 91^{\circ}, 280^{\circ}, 111^{\circ}, 115^{\circ}, 118^{\circ}, 300^{\circ}, 122^{\circ}, 126^{\circ}, 130^{\circ}, 80^{\circ}, 320^{\circ}, 149^{\circ} \quad (n = 12),
$$
which we double to get
$$
2\theta_i = 182^{\circ}, 200^{\circ}, 222^{\circ}, 230^{\circ}, 236^{\circ}, 240^{\circ}, 240^{\circ}, 244^{\circ}, 252^{\circ}, 260^{\circ}, 160^{\circ}, 280^{\circ}, 298^{\circ}.
$$
Carrying out a similar analysis on our new data
we now find $2\bar{\theta}_2$ = 234.5$^{\circ}$ and $\bar{R}_2$ = 0.805. The hypothesis becomes
$H_0:\bar{\theta}_1 = \bar{\theta}_2$ versus $H_1:\bar{\theta}_1 \neq \bar{\theta}_2$, at
$\alpha = 0.05$. For the combined data set (with $n = 25$), we obtain
\begin{equation}
2\bar{\theta}_p =256.7^{\circ}, \quad \bar{R}_p =0.749.
\end{equation}
\index{\emph{F}-test}
\index{Test!\emph{F}}
We find observed $k$ from Table~\ref{tbl:Critical_kappa2} to be 2.36. Hence, we use (\ref{eq:betterFk}) to obtain
\begin{equation}
F = \left(1 + \frac{3}{8 \cdot 2.36} \right) \frac{23 \left(13\cdot0.802 + 12\cdot0.805 - 25\cdot0.749 \right)}{ 25-13\cdot0.802-12\cdot0.805}=7.42.
\end{equation}
Table~\ref{tbl:Critical_F95} gives the critical $F_{1,n-2,\alpha}$ value for 1 and 23
degrees of freedom as 4.28. Therefore, we must
reject the null hypothesis and conclude that the fracture directions in the two areas appear to have
different orientations at the $95 \%$ confidence level.
\end{example}
\subsection{Robust directions}
\index{Robust!directions}
\index{Direction!robust}
\PSfig[h]{Fig1_dir2}{The effect of outliers on the mean direction is most severe when the outlier is orthogonal to the bulk of the data directions.
Headed vectors represent the mean resultants.}
The concept of robustness is also applicable to directional data since such data may also contain outliers.
However, directional outliers cause less harm than their Cartesian counterparts discussed earlier since directional data are forced
to be periodic. It is clear from Figure~\ref{fig:Fig1_dir2} that an outlier causes most damage to estimates of mean direction when it is oriented
90$^{\circ}$ away from the trend of the bulk of the data.
We may find the circular analog of the median by finding the direction $\tilde{\theta}$ that minimizes the sum
\begin{equation}
\mbox{minimize} \ \sum^n_{i=1} d \left( \theta_i, \tilde{ \theta} \right),
\end{equation}
where $d\left( \theta_i, \tilde{\theta}\right)$ is the arc distance between $\theta_i$ and
$ \tilde{ \theta} $ measured along the perimeter of the unit circle. Since this distance is proportional
to $\mid \theta_i - \tilde{ \theta} \mid$ it is the equivalent of minimizing the sum
\begin{equation}
\mbox{minimize} \ \sum^n_{i=1} \mid \theta_i - \tilde{ \theta} \mid,
\end{equation}
which we know gives the median of the $ \theta_i$ data set. Similarly, a LMS mode estimate can be found by
determining the midpoint of the shortest arc containing $n/2+1$ points. Again, this would involve sorting
the directions first, possibly after doubling orientations.
\begin{example}
Consider the wind directions
$$
\theta_i: 75^{\circ}, 85^{\circ}, 90^{\circ}, 98^{\circ}, 170^{\circ} \quad
\bar{ \theta}=100.7^{\circ}.
$$
The shortest arc over $5/2+1=3$ points is located between 85$^{\circ}$ and 98$^{\circ}$, giving the mode
estimate $\hat{\theta} = 91.5^{\circ}$. This estimate suggests that 170$^{\circ}$ is an outlier
with respect to the rest of the atmospheric data.
\end{example}
\subsection{Data with length and direction}
\index{Digitizing}
\index{Uncertainty!digitizing}
In the analysis so far we have only considered the direction (or orientation) of a feature and not its length.
However in many cases, such as fracture data or fault traces, the features will have very different lengths.
The analysis described above, when applied to such data, would give both a one km long and a 100 km long fault
the same weight, which does not seem to be appropriate. We
can account for this bias by weighing the directions by the respective \emph{lengths} of the features. By keeping
track of the length of the faults (and not their numbers) per sector we can obtain a rose diagram that
reflects the proportions of the various fracture directions. The rationale employed here is that large and/or
long faults may be more representative of the tectonic stresses than a few short fractures. The
rose diagram may then be normalized by the total length of the fractures to give overall proportions in
percent.
Staying with fault strike data, it is clear that in many regions the faults are not entirely straight
lines but may actually curve or bend. From a directional analysis point of view, such fractures must first be approximated by shorter
straight line segments. It then becomes obvious that we must weigh the pieces by their lengths,
otherwise the directional frequencies would depend on the number of pieces used. Fault traces
must therefore be digitized prior to analysis on a computer. A typical fault trace is
displayed in Figure~\ref{fig:Fig1_dir3}.
\PSfig[h]{Fig1_dir3}{A digitized fault trace. Open circles indicate the digitized points. Digitizing too finely may lead to short-wavelength
noise in the representation of the fault. Points within the dashed box are examined in Figure~\ref{fig:Fig1_dir4}.}
\noindent
Depending on the angular width of the polar histogram, the digitized fault may end up in two
bins corresponding to the orientations $\alpha_1$ and $\alpha_2$. We must also be aware of the fact that the digitizing process will introduce uncertainties in the
digitized points. To see how this may affect the analysis, consider the line segment in Figure~\ref{fig:Fig1_dir4}.
\PSfig[h]{Fig1_dir4}{The uncertainty $r_s$ in the locations of two digitized points (A and B) from Figure~\ref{fig:Fig1_dir3} introduce uncertainties
$\Delta d$ and $\Delta \alpha$ in the length ($d_0$) and orientation ($\alpha_0$) of a line segment, respectively.}
\noindent
We may assume that the exact position of each point is uncertain, here represented by the one-sigma uncertainty estimate $s_r$ in radial position
(gray circles) for each point. There are two points of interest here: First, the length of the segment, $d$,
will have an uncertainty since it reflects a difference between two uncertain values. In Chapter~\ref{ch:error}, we found this to be
\begin{equation}
\Delta d = \sqrt{2 s_r}, \quad d= d_0 \pm \Delta d,
\label{eq:dig_error}
\end{equation}
assuming the uncertainties are \emph{independent}.
Second, we see that the direction (or orientation) $\alpha$ may be in error by $ \pm \Delta \alpha$, given by
\begin{equation}
\Delta \alpha = \tan^{-1} \left( 2 s_r/d \right), \quad \alpha = \alpha_0 \pm \Delta \alpha.
\end{equation}
Consequently, one should not digitize lines so frequently that $ \Delta \alpha$ exceeds the desired polar histogram interval.
For instance, if this interval is 10$^{\circ}$ then you are best served by making the average digitizing interval
$d > 10 s_r$. Alternatively, one can \emph{filter} the digitized track so that short-wavelength noise is smoothed out before binning the segments.
The length of all segments that have a direction within the width of a single bin is simply computed by adding
up all the individual lengths. Note that since internal nodes are shared by adjacent line segments the uncertainty
in the total length is independent of the number of line segments (e.g., Figure~\ref{fig:Fig1_digline}) and only depends on (\ref{eq:dig_error}). However,
the \emph{sum} of the lengths of the individual segments will have an uncertainty that is cumulative since the segments are no longer connected.
This method will provide a frequency distribution with error bars in which directions with many small segments will have higher
uncertainty than directions with fewer and longer faults.
\section{Spherical Data Distributions}
\index{Spherical data distributions|(}
\index{Data!spherical|(}
The analysis of 3-D directions and orientations is an extension of the methods used for circular data.
It is common to require that these 3-D vectors have unit lengths so that their endpoints all lie
on the surface of a sphere with unit radius --- hence the name spherical distributions.
Similar to the 2-D case for circular data, there will be spherical data that only reflect orientations (i.e., \emph{axes}) rather than directions.
\PSfig[h]{Fig1_3D}{The relation between the Cartesian ($x, y, z$) and spherical ($\phi, \theta, r$) coordinate systems.}
\noindent
We need to use a 3-D Cartesian coordinate system to describe the unit vectors (Figure~\ref{fig:Fig1_3D}). Thus, any
vector $\mathbf{v}$ is uniquely determined by the triplet ($x,y,z$). We could also use spherical angles
$\theta$ (colatitude) \index{Colatitude} and $\phi$ (longitude) to specify the vector direction, assuming the length $r=1$.
We relate the Cartesian coordinates and the spherical angles as follows:
\begin{equation}\begin{array}{rll}
x & = & \sin \theta \cos \phi,\\
y & = & \sin \theta \sin \phi,\\
z & = & \cos \theta.\\
\end{array}
\end{equation}
However, geological measurements like \emph{strike}\index{Strike} and \emph{dip}\index{Dip} are more commonly used than Cartesian
coordinates and spherical angles and these follow their own convention. We define a new local
coordinate system in which $x$ points toward north, $y$ points east, and $z$ points vertically down
(i.e., in order to maintain a right-handed coordinate system). In such a system, fault plane dips are
expressed as positive angles. For the fault plane in Figure~\ref{fig:Fig1_AD} we find that the angle $A$ is the
azimuth of the strike of the plane and $D$ is the dip, measured positive down. The slip-vector
\emph{OP} is then given by its components
\begin{equation} \begin{array}{lll}
x = - \sin A \cos D,\\
y = \cos A \cos D,\\
z = \sin D.\\
\end{array}
\end{equation}
\PSfig[h]{Fig1_AD}{Local, right-handed coordinate system shows the convention used in structural geology.
Here, $A$ is the strike (measured from north over east) of the dipping plane (light green) and $D$ is its dip, measured as the angle between
the dip vector \emph{OP} and its projection \emph{ON} onto the horizontal $x-y$ plane. The red plane containing
the dip vector is orthogonal to the light green plane.}
Once we have converted our ($A, D$) data to ($x,y,z$) we can compute such quantities as mean
direction and spherical variance, which are simple extensions of the 2-D or directional analogs.
The length of the resultant vector is simply
\begin{equation}
\index{Vector!mean resultant}
\index{Mean resultant vector}
R = \sqrt{\left( \displaystyle \sum x_i \right)^2 + \left( \displaystyle \sum y_i \right)^2 +
\left( \sum z_i \right)^2},
\label{eq:Rlength}
\end{equation}
with the sum taken over all the $n$ points. The resultant vector
is usually normalized to give $\bar{R} = R / n$. The coordinates $\bar{x}, \bar{y}$ and $\bar{z}$
of the mean vector $\mathbf{m}$ are then obtained via
\begin{equation}
\bar{x} = \displaystyle \sum x_i/n,\quad
\bar{y} = \displaystyle \sum y_i/n,\quad
\bar{z} = \displaystyle \sum z_i/n,
\end{equation}
so that the mean slip-vector is given by its two components
\begin{equation} \begin{array}{ccc}
\bar{D} = \sin^{-1} \bar{z},\\
\bar{A} = \tan^{-1} (- \bar{x}/ \bar{y}).
\end{array}
\end{equation}
If all the vectors are closely clustered then the resultant $R$ will approach $n$, but if the vectors
are more randomly distributed then $R$ will approach zero. As in 2-D, we can use $R$ (or $\bar{R}$) as the basis
for the \emph{spherical variance}, $s^2_s$, give by
\begin{equation}
s^2_s = (n-R) / n = 1 - \bar{R}.
\end{equation}
\subsection{Test for a random direction}
\index{Test!Rayleigh}
\index{Test!random direction}
\PSfig[h]{Fig1_Fisher}{Well-focused Fisher distribution on a sphere, centered on a point in northern India, with $\kappa = 40$.}
We can perform simple hypothesis tests in a manner analogous to those we carried out for 2-D directional
data. Unlike the case for 2-D we now require a \emph{spherical} probability density function from which we can derive critical values for
comparison with our observed statistics. In response to the need for 3-D statistical analysis, in particular
for palaeomagnetic studies that started to explore ``continental drift'' in the 1950s, the famous British
statistician Ronald Fisher\index{Fisher, R. A.} developed a suitable theoretical distribution
for spherical data. His probability density function has since been called the \emph{Fisher} distribution on a sphere and is
given by
\index{Continental drift}
\index{Palaeomagnetism}
\begin{equation}
\index{Probability distribution!Fisher}
\index{Fisher distribution}
P(\mathbf{x}) = \frac{ \kappa}{ 4 \pi \sinh \kappa} e^{\kappa \left(\mathbf{x} \cdot \boldsymbol{\mu} \right)},
\label{eq:fisher}
\end{equation}
where the dot product between the mean direction $\boldsymbol{\mu}$ and
any other direction $\mathbf{x}$ equals the cosine of the angle $\psi$ between them, $\kappa$ is the precision parameter similar to the one we encountered
for the von Mises distribution on the circle, and $\sinh$ is the \emph{hyperbolic sine} function (which is needed to ensure the cumulative distribution
of $P(\mathbf{x})$ over the unit sphere equals one).
Fisher\index{Fisher, R. A.} showed one can estimate $\kappa$ from the sample, provided $n > 7$ and $\kappa > 3$.
The estimate is then given by
\begin{equation}
k \approx \kappa = \frac{n-1}{n-R},
\end{equation}
but more accurate tables also exist (e.g., Table~\ref{tbl:Critical_kappa3}) which solve (\ref{eq:kappa3d}).
Testing a spherical distribution for randomness follows the same approach used for directional data:
We must first evaluate $\bar{R}$, the mean resultant. Then, we state the null hypothesis to be
\begin{equation}
H_0:\kappa = 0 \quad H_1:\kappa >0.
\end{equation}
As before, this test is executed by establishing critical values of $\bar{R}$ given the prescribed level of
confidence, $\alpha$. Table~\ref{tbl:Critical_R3} shows such critical $\bar{R}$ values for selected values of $\alpha$ and $n$.
\begin{example}
Let us examine a set of palaeomagnetic measurements. We
have been given six measurements of \emph{declination} and \emph{inclination}, reported as
\index{Declination}
\index{Inclination}
$$
\begin{array}{|l||r|r|r|r|r|r|}
\hline
\mbox{Dec} & 105^{\circ} & 130^{\circ} & 115^{\circ}
& 120^{\circ} & 118^{\circ} & 145^{\circ} \\ \hline
\mbox{Inc} & 40^{\circ} & 49^{\circ} & 57^{\circ} & 32^{\circ} & 55^{\circ} & 45^{\circ}\\ \hline
\end{array}
$$
First, we note that a dip-vector specified by strike $A$ and dip $D$ is not using the same geometry as
a magnetic field vector given by its magnetic declination $D$ and inclination $I$.
Because the projection of the dip-vector onto the horizontal plane is 90\DS\ away from the strike,
we must use a slightly modified conversion suitable for magnetic vectors to obtain the Cartesian coordinates:
\begin{equation} \begin{array}{lll}
x = \cos D \cos I,\\
y = \sin D \cos I,\\
z = \sin I.\\
\end{array}
\end{equation}
We convert the $n = 6$ observed declinations and inclinations to $x,y,z$ and find
$$
\begin{array}{|l||r|r|r|r|r|r||l|r|}
\hline
x & -0.20 & -0.42 & -0.23 & -0.42 & -0.27 & -0.58 & \bar{x} & -0.354 \\ \hline
y & 0.74 & 0.50 & 0.49 & 0.73 & 0.51 & 0.41 & \bar{y} & 0.564 \\ \hline
z & 0.64 & 0.75 & 0.84 & 0.53 & 0.82 & 0.71 & \bar{z} & 0.715 \\ \hline
\end{array}
$$
Computing the mean direction and resultant gives
\begin{equation}
\bar{I}_1 = \sin ^{-1} \bar{z} = 45.7^{\circ},
\end{equation}
\begin{equation}
\bar{D}_1 = \tan^{-1} \bar{y} / \bar{x} = 122.1^{\circ},
\end{equation}
\begin{equation}
\bar{R}_1 = \frac{1}{6} \cdot 5.86 = 0.977,
\end{equation}
\begin{equation}
k_1 = \frac{6-1}{6-5.86} = 35.7.
\end{equation}
It is evident that our distribution has a clear preferred direction since $k_1$ is so large (the critical
$\bar{R}$ for $\alpha = 0.05$ is only 0.642).
\end{example}
\subsection{Test for a specific direction}
\index{Test!specific direction}
Often, we will be interested in testing whether the observed mean direction equals a
prescribed direction, given the uncertainties due to random errors. As for circular data, such tests
are best performed by constructing the $\alpha$ confidence region around the mean direction. This
statistic is based on the Fisher\index{Fisher, R. A.} distribution and gives a spherical cap radius for a \emph{cone of confidence}
around the mean direction. As usual, this radius is a function of both the confidence level $\alpha$ and $R$.
We find
\index{Cone of confidence}
\begin{equation}
\delta_{1- \alpha} = \cos^{-1} \left \{1- \frac{n-R}{R} \left[ \left (\frac{1}{ \alpha}\right )^{ \frac{1}{n-1}}-1 \right] \right \}.
\label{eq:exactconer}
\end{equation}
This fairly complicated expression simplifies considerably if we assume (or actually know) that $k > 7$ and
standardize our tests for $\alpha = 0.05$. Then,
\begin{equation}
\delta_{95\%} \approx 140^\circ /\sqrt{kn}
\label{eq:cone95}
\end{equation}
given in spherical degrees. We may now say that there is a 95\% probability that the true mean direction lies within the cone of
confidence specified by the angular radius $\delta_{95\%}$.
The application of the cone of confidence is more involved than
for circular data since we cannot directly compare the two angular measures defining the mean direction (e.g., strike, dip) to the
comparable quantities of a specific direction direction to be tested, here called $\mathbf{\hat{v}}$.
Instead, we need to utilize their Cartesian vector representations. The procedure is still relatively straightforward:
\begin{enumerate}
\item State the null hypothesis that the unit vectors are the same, i.e., $H_0: \mathbf{\hat{m}} = \mathbf{\hat{v}}$.
\item Take the dot-product of the mean unit vector with the test unit vector, i.e, $c = \mathbf{\hat{m}} \cdot \mathbf{\hat{v}}$.
\item Since a dot product gives the cosine of the spherical angle between two vectors we find this distance
to be given by $\psi = \cos^{-1} c$.
\item If $\psi$ exceeds $\delta_{95\%}$ then $\mathbf{\hat{v}}$ is outside the cone of confidence and we can reject
the null hypothesis; otherwise we must reserve judgment.
\end{enumerate}
For other levels of confidence we would substitute the general radius specified in (\ref{eq:exactconer}).
\begin{example}
Given the data we just analyzed we want to test if the mean vector we found is compatible with an hypothesis
that says the inclination should be 45\DS and the declination should be 120\DS. First, we find the
unit vector pointing in the same direction as the mean resultant, $\mathbf{m}$. This vector is
\begin{equation}
\mathbf{\hat{m}} = \frac{\mathbf{m}}{|\mathbf{m}|} = \frac{(-0.354, 0.564, 0.715)}{\sqrt{0.354^2 + 0.564^2 + 0.715}} = (-0.3621, 0.5770, 0.7321).
\end{equation}
We evaluate the test vector to be
\begin{equation}
\mathbf{\hat{v}} = (\cos 120^{\circ} \cdot \cos 45^{\circ}, \sin 120^{\circ} \cdot \cos 45^{\circ}, \sin 45^{\circ}) = (-0.3536, 0.6124, 0.7071).
\end{equation}
Hence, the angle between the mean and the test vector is given by their dot product:
\begin{equation}
\psi = \cos^{-1} \left (0.3621 \cdot 0.3536 + 0.5770 \cdot 0.6124 + 0.7321 \cdot 0.7071 \right ) = \cos^{-1} (0.9990) = 2.53^{\circ}.
\end{equation}
To determine the radius of the 95\% confidence cone we use (\ref{eq:cone95}) and find
\begin{equation}
\delta_{95\%} \approx 140^{\circ} / \sqrt{ 35.7 \cdot 6 } = 9.6^{\circ}.
\end{equation}
This means that the true population mean direction probably (i.e., with 95\% level confidence) lies within a spherical cone of
radius $9.6^{\circ}$ centered on the observed mean direction. Clearly, our test vector $\mathbf{v}$ also lies inside this cone.
Therefore, we cannot reject the null hypothesis that they point in the same direction.
\end{example}
\subsection{Test for equality of two mean directions}
\index{Test!equality of two mean directions}
This test is equivalent to the one administered for circular data, relying on the same $F$-statistics and
equations; see Section~\ref{sec:twoDdir} for details.
\begin{example}
We also obtained palaeomagnetic measurements from a nearby second site:
$$
\begin{array}{|l||r|r|r|r|r|r|} \hline
\mbox{Dec} & 65^{\circ} & 72^{\circ} & 51^{\circ} & 75^{\circ} & 50^{\circ} & 45^{\circ} \\ \hline
\mbox{Inc} & 25^{\circ} & 20^{\circ} & 30^{\circ} & 23^{\circ} & 18^{\circ} & 33^{\circ} \\ \hline
\end{array}
$$
We perform the conversion to Cartesian components and find
$$
\begin{array}{|l||r|r|r|r|r|r||l|r|}
\hline
x & 0.38 & 0.29 & 0.55 & 0.24 & 0.61 & 0.59 & \bar{x} & 0.443 \\ \hline
y & 0.82 & 0.89 & 0.67 & 0.89 & 0.73 & 0.59 & \bar{y} & 0.766 \\ \hline
z & 0.42 & 0.34 & 0.50 & 0.39 & 0.31 & 0.54 & \bar{z} & 0.418 \\ \hline
\end{array}
$$
which gives a resultant $R = 5.88$ and the mean resultant (and mean declination, inclination) as
\begin{equation}
\bar{R}_2 = 0.979, \quad \bar{I}_2 = 24.7^{\circ}, \quad \bar{D}_2 = 59.9^{\circ}, \quad n = 6.
\end{equation}
We would like to compare these two mean directions to determine if they are statistically equivalent at the 95\%
confidence level, i.e.,
\begin{equation}
H_0: \bar{I}_1 = \bar{I}_2, \quad \bar{D}_1 = \bar{D}_2.
\end{equation}
The hypothesis test for this equality is exactly the same test we used for 2-D directional data. Again, we use
an \emph{F}-test to determine if the resultant from the pooled data is significantly different from the linear
sum of the two individual resultants, i.e.,
\index{\emph{F}-test}
\index{Test!\emph{F}}
\begin{equation}
F= \frac{(n - 2) (R_1 + R_2 - R_p)}{n - R_1 - R_2},
\end{equation}
where again $n = n_1 + n_2$. We need to find the resultant for the combined data set of 12 points. It is found via (\ref{eq:Rlength}) and
gives $R_p = 10.50$.
We may then normalize by $n$ to find $\bar{R}_p = 0.875$. Our observed \emph{F}-statistic thus becomes
\begin{equation}
F = \frac{10(5.86 + 5.88 - 10.50)}{12 - 5.86 - 5.88} = 47.4.
\end{equation}
The observed \emph{F} value by far exceeds the critical $F_{0.05,1,10} = 4.96$ (i.e., Table~\ref{tbl:Critical_F95})
and we must reject the null hypothesis that the two directions are the same.
\end{example}
\index{Spherical data distributions|)}
\index{Data!spherical|)}
\clearpage
\section{Problems for Chapter \thechapter}
\begin{problem}
Analyze the current directions in surface waters off Hog Neck, MA for a two-day period, as listed in \emph{HogNeck.txt}.
How do the current directions vary during a 24-hour period? Do the current speeds show a similar trend?
\end{problem}
\begin{problem}
Paleocurrent measurements from ripples in sandstones contain information on past current directions.
\begin{enumerate}[label=\alph*)]
\item Use the data in \emph{ripple\_A.txt} from sandstone formation A to test if there is a preferred direction in the data. A
local sedimentologist bravely suggests that the ripples were formed by long-shore currents along a paleo-coastline
trending 200\DS. Do the data support this hypothesis?
\item A second sandstone formation (B) was also examined for paleocurrents (\emph{ripple\_B.txt}).
Do these data suggest there was any change in current direction between the formations
of these two layers?
\end{enumerate}
\end{problem}
\begin{problem}
A geologist surveys a large section of an exposed batholith and measures the strike
direction of numerous vertical joints. The file \emph{joints.txt} contains the azimuths (measured from north toward east)
of these orientations.
\begin{enumerate}[label=\alph*)]
\item At the 95\% level of confidence, is there a preferred orientation in the data? If so, what is the
preferred orientation?
\item Regional tectonic considerations seem to favor a general extensional stress regime in the west-northwest
-- east-southeast orientation. Is this explanation for the joints consistent with the data
at the 95\% level of confidence?
\item 50 km further north another exposure of the batholith reveals additional joints (\emph{morejoints.txt}). Are
they randomly oriented?
\item Do these new joints deviate significantly from the preferred orientation you determined for the first
site?
\end{enumerate}
\end{problem}
\begin{problem}
A structural geologist has measured the strikes and dips of faults in a Jurassic sedimentary
section. The observations (in degrees from north toward east) are recorded below.
Find the mean direction and the mean resultant length.
Is this direction significant at the 95\% level of confidence?
$$
\begin{array}{|c||c|c|c|c|c|c|c|c|c|c|c|c|} \hline
\mbox{\bf{Strike}} & 142^{\circ} & 157^{\circ} & 145^{\circ} & 150^{\circ} & 149^{\circ} & 156^{\circ} & -13^{\circ} & 139^{\circ} & 155^{\circ} & -9^{\circ} & 148^{\circ} & 142^{\circ} \\ \hline
\mbox{\bf{Dip}} & 45^{\circ} & 50^{\circ} & 52^{\circ} & 38^{\circ} & 56^{\circ} & 44^{\circ} & 8^{\circ} & 49^{\circ} & 46^{\circ} & 11^{\circ} & 59^{\circ} & 43^{\circ} \\ \hline
\end{array}
$$
\end{problem}
\begin{problem}
Principal fracturing directions for three oil fields near Odessa, TX are given in \emph{odessa\_north.txt},
\emph{odessa\_northwest.txt} and \emph{odessa\_west.txt}. Compare the mean directions of fractures from
the three fields. Are they significantly different at the 95\% level of confidence?
\end{problem}
\begin{problem}
Measurements of magnetic remanence (given as declination and inclination) from a sandstone in Western
Australia are listed in \emph{Tumblagooda.txt}.
\begin{enumerate}[label=\alph*)]
\item Compute the mean direction and the mean resultant length.
\item Estimate the angular radius of the 95\% confidence cone.
\item Is this direction significant at the 95\% level of confidence?
\end{enumerate}
\end{problem}
\begin{problem}
Scientific drilling in the Caribbean Sea on-board the \emph{Glomar Challenger} resulted in a set of paleomagnetic data (reproduced in \emph{glomar.txt}).
\begin{enumerate}[label=\alph*)]
\item Determine if these data (declination clockwise from north, inclination downward from horizontal) have a
preferred direction at the 95\% level of confidence.
\item Compare this mean paleomagnetic vector to the present field
at this site (declination = 354.3\DS and inclination = 46.4\DS).
Do these two directions differ significantly at the 95\% level of confidence?
What does your analysis suggest for the plate tectonic motions for this region?
\end{enumerate}
\end{problem}