\section{The Earthquake Catalogue}
The seismicity tools are intended for use in deriving activity rates from an observed earthquake catalogue, which may include both instrumental and historical seismicity. The tools are broken down into five separate libraries: i) Declustering, ii) Completeness, iii) Calculation of Gutenberg-Richter a- and b-value, iv) Statistical estimators of maximum magnitude from seismicity, and v) Smoothed Seismicity. In a common use case it is likely that many of the above methods, particularly recurrence and maximum magnitude estimation, may need to be applied to a selected sub-catalogue (e.g. earthquakes within a particular polygon). The toolkit allows for the creation of a source model containing one or more of the supported OpenQuake seismogenic source typologies, which can be used as a reference for selection, e.g. events within an area source (polygon), events within a distance of a fault etc. The supported input format for the catalogue is described below, and that for the source models in the subsequent chapter.
\subsection{The Catalogue Format and Class}
The input catalogue must be formatted as a comma-separated value file (.csv), with the attributes listed in Table \ref{tab: EQCatalogueFormat} in the header line (an * indicates an essential attribute), although the order of the columns need not be fixed:
\begin{table}
\begin{tabular}{|l|l|} \hline
Attribute & Description \\ \hline
eventID* & A unique identifier (integer) for each earthquake in the catalogue \\
Agency & The code (string) of the recording agency for the event solution \\
year* & Year of event (integer) in the range -10000 to present \\
& (events before common era (BCE) should have a negative value)\\
month* & Month of event (integer)\\
day* & Day of event (integer) \\
hour* & Hour of event (integer) - if unknown then set to 0 \\
minute* & Minute of event (integer) - if unknown then set to 0 \\
second* & Second of event (float) - if unknown set to 0.0 \\
timeError & Error in event time (float) \\
longitude* & Longitude of event, in decimal degrees (float) \\
latitude* & Latitude of event, in decimal degrees (float) \\
SemiMajor90 & Length (km) of the semi-major axis of the 90 \% \\
& confidence ellipsoid for location error (float) \\
SemiMinor90 & Length (km) of the semi-minor axis of the 90 \% \\
& confidence ellipsoid for location error (float) \\
ErrorStrike & Azimuth (in degrees) of the 90 \% \\
& confidence ellipsoid for location error (float) \\
depth* & Depth (km) of earthquake (float)\\
depthError & Uncertainty (as standard deviation) in earthquake depth (km) (float)\\
magnitude* & Homogenised magnitude of the event (float) - typically Mw \\
sigmaMagnitude* & Uncertainty on the homogenised magnitude (float) typically Mw \\ \hline
\end{tabular}
\caption{List of Attributes in the Earthquake Catalogue File (* Indicates Essential)}
\label{tab: EQCatalogueFormat}
\end{table}
To load the catalogue in an open IPython session, type:
%\begin{Verbatim}[frame=single, commandchars=\\\{\}, fontsize=\scriptsize, samepage=true]
\begin{python}
>> from openquake.hmtk.parsers.catalogue import CsvCatalogueParser
>> catalogue_filename = 'path/to/catalogue_file.csv'
>> parser = CsvCatalogueParser(catalogue_filename)
>> catalogue = parser.read_file()
\end{python}
\textbf{N.B. The csv file can also contain additional attributes of the catalogue and will still be parsed correctly; however, if an attribute is not one that is specifically recognised by the catalogue class then a message will be displayed indicating:}
\begin{Verbatim}[frame=single, commandchars=\\\{\}, fontsize=\scriptsize, samepage=true]
Catalogue Attribute ... is not a recognised catalogue key
\end{Verbatim}
\textbf{This is expected behaviour and simply indicates that although this data is given in the input file, it is not retained in the data dictionary.}
The variable \verb=catalogue= is an instance of the class openquake.hmtk.seismicity.catalogue.Catalogue, which now contains the catalogue itself (as \verb=catalogue.data=) and some methods that can be applied to the catalogue. The first attribute (\verb=catalogue.data=), is a dictionary where each attribute of the catalogue is either a 1-D numpy vector (for float and integer values) or a python list (for string values). For example, to return a vector containing all the magnitudes in the \verb=magnitude= column of the catalogue simply type:
\begin{python}
>> catalogue.data['magnitude']
array([ 6.5, 6.5, 6. , ..., 4.8, 5.2, 4.1])
\end{python}
The catalogue class contains several helpful methods (called via \verb=catalogue. ...=):
\begin{itemize}
\item \verb=catalogue.get_number_events()= Returns the number of events currently in the catalogue (integer)
\item \verb=catalogue.load_to_array(keys)= Returns a numpy array of floating data, with the columns ordered according to the list of keys. If the key corresponds to a string item (e.g. Agency) then an error will be raised.
\begin{python}[frame=single]
>> catalogue.load_to_array(['year', 'longitude', 'latitude',
'depth', 'magnitude'])
array([[ 1910. , 26.941 , 38.507 , 13.2 , 6.5 ],
[ 1910. , 22.190 , 37.720 , 20.4 , 6.5 ],
[ 1910. , 28.881 , 33.274 , 25.0 , 6.0 ],
...,
[ 2009. , 20.054 , 39.854 , 20.2 , 4.8 ],
[ 2009. , 23.481 , 38.050 , 15.2 , 5.2 ],
[ 2009. , 28.959 , 34.664 , 18.4 , 4.1 ]])
\end{python}
\item \verb=catalogue.load_from_array(keys, data_array)= Creates the catalogue data dictionary from an array, given a header defined as an ordered list of dictionary keys. This can be used in the case where the earthquake catalogue is available in a simple ascii format. For example, if the user wishes to load in a catalogue from the Zmap format, which gives the columns as:
\begin{verbatim}
longitude, latitude, year, month, day, magnitude, depth, hour,
minute, second
\end{verbatim}
This file type could be parsed into a catalogue without the need for a specific parser, as follows:
\begin{python}[frame=single]
>> import numpy
# Assuming no headers in the file
# (set skip_header=1 if headers are found)
>> data = numpy.genfromtxt('PATH/TO/ZMAP_FILE.txt',
skip_header=0)
>> headers = ['longitude', 'latitude', 'year', 'month',
'day', 'magnitude', 'depth', 'hour',
'minute', 'second']
# Create instance of a catalogue class
>> from openquake.hmtk.seismicity.catalogue import Catalogue
>> catalogue = Catalogue()
# Load the data array into the catalogue
>> catalogue.load_from_array(headers, data)
\end{python}
\item \verb=catalogue.get_decimal_time()=
Returns the times of the earthquakes as decimal years (a numpy array)
\item \verb=catalogue.hypocentres_as_mesh()=
Returns the hypocentres of an earthquake as an instance of the class \\ ``openquake.hazardlib.geo.mesh.Mesh'' (useful for geospatial functions)
\item \verb=catalogue.hypocentres_to_cartesian()=
Returns the hypocentres in a 3D cartesian framework
\item \verb=catalogue.purge_catalogue(flag_vector)=
Purges the catalogue of all events marked \verb=False= in the boolean vector. This is used for removing foreshocks and aftershocks from a catalogue after the application of a declustering algorithm (a short usage sketch is given after this list).
\item \verb=catalogue.sort_catalogue_chronologically()=
Sorts an input into chronological order. \\
\emph{N.B. Some methods will implicitly assume that the catalogue is in chronological order, so it is recommended to run this function if you believe that there may be events out of order}
\item \verb=catalogue.select_catalogue_events(IDX)=
Orders the catalogue according to the event order specified in IDX. Behaves the same as \verb=purge_catalogue(IDX)= if IDX is a boolean vector
\item \verb;catalogue.get_depth_distribution(depth_bins, normalisation=False,;\\
\verb; bootstrap=None);
Returns a depth histogram for the catalogue using bins specified by \verb=depth_bins=. If \verb;normalisation=True; then the function will return the histogram as a probability mass function, otherwise the original count will be returned. If uncertainties are reported on depth such that one or more values in \\ \verb=catalogue.data['depthError']= are greater than 0., the function will perform a bootstrap analysis, taking into account the depth error, with the number of bootstraps given by the keyword \verb=bootstrap=.
\begin{python}[frame=single]
# Import numpy and matplotlib
>> import numpy as np
>> import matplotlib.pyplot as plt
# Define depth bins for (e.g)
# 0. - 150 km in intervals of 10 km
>> depth_bins = np.arange(0., 160., 10.)
# Get normalised histograms (without bootstrapping)
>> depth_hist = catalogue.get_depth_distribution(
depth_bins,
normalisation=True)
\end{python}
To generate a simple histogram plot of hypocentral depth, the process below can be followed to produce a depth histogram similar to the one shown in Figure \ref{fig:simple_depth_hist}:
\begin{python}[frame=single]
>> from openquake.hmtk.plotting.seismicity.catalogue_plots import\
plot_depth_histogram
>> depth_bin = 5.0
>> plot_depth_histogram(catalogue,
depth_bin,
filename="/path/to/image.eps",
filetype="eps")
\end{python}
\begin{figure}[htb]
\centering
\includegraphics[trim=10mm 8mm 10mm 10mm, clip, width=12cm]{./figures/simple_depth_histogram.eps}
\caption{Example depth histogram}
\label{fig:simple_depth_hist}
\end{figure}
\item \verb;catalogue.get_magnitude_depth_distribution(magnitude_bins, depth_bins,;\\
\verb; normalisation=False, bootstrap=None);
Returns a two-dimensional histogram of magnitude and hypocentral depth, with the corresponding bins defined by the vectors \verb=magnitude_bins= and \verb=depth_bins=. The options \verb=normalisation= and \verb=bootstrap= are the same as for the one dimensional histogram. The usage is illustrated below:
\begin{python}[frame=single]
# Define depth bins for (e.g.)
# 0. - 150 km in intervals of 5 km
>> depth_bins = np.arange(0., 155., 5.)
# Define magnitude bins (e.g.) 2.5 - 7.6 in intervals of 0.1
>> magnitude_bins = np.arange(2.5, 7.7, 0.1)
# Get normalised histograms (without bootstrapping)
>> m_d_hist = catalogue.get_magnitude_depth_distribution(
magnitude_bins,
depth_bins,
normalisation=True,
bootstrap=None)
\end{python}
To generate a plot of magnitude-depth density, the following function can be used to produce a figure similar to that shown in Figure \ref{fig:mag_depth_density}.
\begin{python}[frame=single]
>> from openquake.hmtk.plotting.seismicity.catalogue_plots import\
plot_magnitude_depth_density
>> magnitude_bin = 0.1
>> depth_bin = 5.0
>> plot_magnitude_depth_density(
        catalogue,
        magnitude_bin,
        depth_bin,
        logscale=True,                  # Logarithmic colour scale
        filename="/path/to/image.eps",  # Optional
        filetype="eps")                 # Optional
\end{python}
\begin{figure}[htb]
\centering
\includegraphics[trim=10mm 10mm 10mm 10mm, clip, width=14cm]{./figures/magnitude_depth_density.eps}
\caption{Example magnitude-depth density plot}
\label{fig:mag_depth_density}
\end{figure}
\item \verb;catalogue.get_magnitude_time_distribution(magnitude_bins, time_bins,;\\
\verb; normalisation=False, bootstrap=None);
Returns a 2D histogram of magnitude with time. \verb=time_bins= are the bin edges for the time windows, in decimal years. To run the function simply follow:
\begin{python}[frame=single]
# Define annual time bins from 1900 CE to 2012 CE
>> time_bins = np.arange(1900., 2013., 1.)
# Define magnitude bins (e.g.) 2.5 - 7.6 in intervals of 0.1
>> magnitude_bins = np.arange(2.5, 7.7, 0.1)
# Get normalised histograms (without bootstrapping)
>> mag_time_hist = catalogue.get_magnitude_time_distribution(
magnitude_bins,
time_bins,
normalisation=True,
bootstrap=None)
\end{python}
To automatically generate a plot, similar to that shown in Figure \ref{fig:mag_time_density}, run the following:
\begin{python}[frame=single]
>> from openquake.hmtk.plotting.seismicity.catalogue_plots import\
plot_magnitude_time_density
>> magnitude_bin_width = 0.1
>> time_bin_width = 0.1
>> plot_magnitude_time_density(catalogue,
magnitude_bin_width,
time_bin_width,
filename="/path/to/image.eps",
filetype="eps")
\end{python}
\begin{figure}[htb]
\centering
\includegraphics[trim=10mm 8mm 10mm 10mm, clip, width=12cm]{./figures/magnitude_time_density.eps}
\caption{Example magnitude-time density plot}
\label{fig:mag_time_density}
\end{figure}
\end{itemize}
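The simpler utility methods listed above can be combined in a few lines. The following is a minimal sketch (assuming a loaded \verb=catalogue= and an arbitrary magnitude-based selection):
\begin{python}[frame=single]
>> import numpy as np
# Decimal (year) representation of the event times
>> decimal_times = catalogue.get_decimal_time()
# Ensure the catalogue is in chronological order
>> catalogue.sort_catalogue_chronologically()
# Keep only events of magnitude 5.0 and above
# (boolean vector: True = retain, False = remove)
>> flag = catalogue.data['magnitude'] >= 5.0
>> catalogue.purge_catalogue(flag)
# Number of events remaining
>> catalogue.get_number_events()
\end{python}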
\subsection{The ``Selector'' Class}
In the process of constructing a PSHA seismogenic source model from seismicity it is necessary to select sub-sets of the earthquake catalogue, usually for calculating earthquake recurrence statistics pertinent to a particular region or seismogenic source. As catalogue selection is such a prevalent aspect of the source modelling process, the selection is done inside the HMTK via the use of a ``Selector'' tool. This tool is a container for all methods associated with the selection of sub-catalogues from a given earthquake catalogue. It will be seen in due course that later methods relating to the selection of the catalogue for a particular source require as an input an instance of the selector class, rather than the catalogue itself.
To setup the ``Selector'' tool:
\begin{python}[frame=single]
>> from openquake.hmtk.seismicity.selector import CatalogueSelector
# Assuming that there already exists a
# catalogue named ''catalogue1''
>> selector1 = CatalogueSelector(catalogue1,
create_copy=True)
\end{python}
The optional keyword \verb=create_copy= ensures that when the events not selected are purged from the catalogue a ``deepcopy'' is taken of the original catalogue. This ensures that the original catalogue remains unmodified when a subset of events is selected.
The catalogue selector class has the following methods:
\verb;.within_polygon(polygon, distance=None);
Selects events within a polygon described by the class \verb=openquake.hazardlib.geo.=\\\verb=polygon.Polygon=. \verb=distance= is the distance (in km) to use as a buffer, if required. Optional keyword arguments \verb=upper_depth= and \verb=lower_depth= can be used to limit the depth range of the catalogue returned by the selector to only those events whose hypocentres are within the specified depth limits.
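For example, a sub-catalogue of events inside a simple rectangular region might be selected as in the following sketch (the polygon vertices here are arbitrary):
\begin{python}[frame=single]
>> from openquake.hazardlib.geo.point import Point
>> from openquake.hazardlib.geo.polygon import Polygon
# Arbitrary rectangle defined by its corner points (lon, lat)
>> limits = Polygon([Point(20.0, 36.0), Point(26.0, 36.0),
                     Point(26.0, 40.0), Point(20.0, 40.0)])
# Events inside the polygon (no distance buffer)
>> polygon_catalogue = selector1.within_polygon(limits)
\end{python}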
\verb;.circular_distance_from_point(point, distance, distance_type="epicentral");
Selects events within a given distance of a location. The location (\verb=point=) is an instance of the openquake.hazardlib.geo.point.Point class, whilst \verb=distance= is the selection distance (km) and \verb=distance_type= can be either ``epicentral'' or ``hypocentral''.
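As an illustration (with an arbitrary location and search radius):
\begin{python}[frame=single]
>> from openquake.hazardlib.geo.point import Point
# Events within 100 km (epicentral) of an arbitrary location
>> site = Point(23.0, 38.0)
>> local_catalogue = selector1.circular_distance_from_point(
        site, 100.0, distance_type="epicentral")
\end{python}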
\verb;.cartesian_square_centred_on_point(point, distance);
Selects events within a square of side length \verb=distance=, centred on a location (represented as an openquake \verb=Point= class).
\verb;.within_joyner_boore_distance(surface, distance);
Returns earthquakes within a distance (km) of the surface projection (``Joyner-Boore'' distance) of a fault surface. The fault surface must be defined as an instance of the class \\\verb=openquake.hazardlib.geo.surface.simple_fault.SimpleFaultSurface= or\\ \verb=openquake.hazardlib.geo.surface.complex_fault.ComplexFaultSurface=.
\verb;.within_rupture_distance(surface, distance);
Returns earthquakes within a distance (km) of a fault surface. The fault surface must be defined as an instance of the class \\\verb=openquake.hazardlib.geo.surface.simple_fault.SimpleFaultSurface= or\\ \verb=openquake.hazardlib.geo.surface.complex_fault.ComplexFaultSurface=.
\verb;.within_time_period(start_time=None, end_time=None);
Selects earthquakes within a time period. Times must be input as instances of a \verb=datetime= object. For example:
\begin{python}[frame=single]
>> from datetime import datetime
>> selector1 = CatalogueSelector(catalogue1, create_copy=True)
# Early time limit is 1 January 1990 00:00:00
>> early = datetime(1990, 1, 1, 0, 0, 0)
# Late time limit is 31 December 1999 23:59:59
>> late = datetime(1999, 12, 31, 23, 59, 59)
>> catalogue_nineties = selector1.within_time_period(
start_time=early,
end_time=late)
\end{python}
\verb;.within_depth_range(lower_depth=None, upper_depth=None);
Selects earthquakes whose hypocentres are within the range specified by the lower depth limit (\verb=lower_depth=) and the upper depth limit (\verb=upper_depth=), both in km.
\verb;.within_magnitude_range(lower_mag=None, upper_mag=None);
Selects earthquakes whose magnitudes are within the range specified by the lower limit (\verb=lower_mag=) and the upper limit (\verb=upper_mag=).
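A brief sketch of the depth and magnitude selectors (the values used here are arbitrary):
\begin{python}[frame=single]
# Events with hypocentres between 0 and 30 km depth
>> shallow_catalogue = selector1.within_depth_range(
        lower_depth=30.0, upper_depth=0.0)
# Events with magnitudes between 4.0 and 7.0
>> moderate_catalogue = selector1.within_magnitude_range(
        lower_mag=4.0, upper_mag=7.0)
\end{python}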
\section{Declustering}
To identify the Poissonian rate of seismicity, it is necessary to remove foreshocks, aftershocks and swarms from the catalogue. The Modeller's Toolkit contains, at present, two algorithms to undertake this task, with more under development.
\subsection{\textcite{GardnerKnopoff1974}}
The most widely applied simple windowing algorithm is that of
\textcite{GardnerKnopoff1974}. Originally conceived for Southern California,
the method simply identifies aftershocks by virtue of fixed time-distance
windows proportional to the magnitude of the main shock. Whilst this
premise is relatively simple, the manner in which the windows are
applied can be ambiguous. Four different possibilities can be
considered \parencite{LuenStark2012}:
\begin{enumerate}
\item Search events in descending order of magnitude. Remove an event if it is
      in the window of the largest event
\item Remove every event that is inside the window of a previous event,
including larger events
\item An event is in a cluster if, and only if, it is in the window of at
least one other event in the cluster. In every cluster remove all
events except the largest
\item In chronological order, if the $i^{th}$ event is in the window of a
preceding larger shock that has not already been deleted, remove it.
If a larger shock is in the window of the $i^{th}$ event, delete the
$i^{th}$ event. Otherwise retain the $i^{th}$ event.
\end{enumerate}
It is the first of the four options that is implemented in the current
toolkit, whilst others may be considered in future. The algorithm is
capable of identifying foreshocks and aftershocks, simply by applying
the windows forward and backward in time from the mainshock.
No distinction is made between primary aftershocks (those resulting
from the mainshock) and secondary or tertiary aftershocks (those
originating due to the previous aftershocks); however, it is assumed
all would occur within the window.
Several modifications to the time and distance windows have been
suggested, which are summarised in \textcite{vanStiphout2012}. The windows
originally suggested by \textcite{GardnerKnopoff1974} are approximated by:
\begin{equation}\begin{split}
\mbox{distance (km)} = &10^{0.1238 M + 0.983}\\
\mbox{time (days)} = &
\begin{cases} 10^{0.032 M + 2.7389} & \text{if $M \geq 6.5$} \\
10^{0.5409 M - 0.547} & \text{otherwise} \end{cases}\end{split}
\end{equation}
An alternative formulation is proposed by Gr\"unthal (as reported in \textcite{vanStiphout2012}):
\begin{equation}\begin{split}
\mbox{distance (km)} = & e^{1.77 + \sqrt{0.037 + 1.02 M}} \\
\mbox{time (days)} = & \begin{cases} \left| e^{-3.95 + \sqrt{0.62 + 17.32 M}}
\right| & \text{if $M < 6.5$} \\ 10^{2.8 + 0.024 M} &
\text{otherwise} \end{cases}\end{split}
\end{equation}
A further alternative is suggested by \textcite{Uhrhammer1986}
%
\begin{equation}
\mbox{distance (km)} = e^{-1.024 + 0.804 M} \quad \mbox{time (days)} =
e^{-2.87 + 1.235 M}
\end{equation}
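The window models can be evaluated directly from the formulas above. A minimal numpy sketch (distances in km, times in days) for the \textcite{GardnerKnopoff1974} and \textcite{Uhrhammer1986} windows is:
\begin{python}[frame=single]
>> import numpy as np
>> mags = np.arange(4.0, 8.1, 0.5)
# Gardner & Knopoff (1974) windows
>> gk_dist = 10.0 ** (0.1238 * mags + 0.983)
>> gk_time = np.where(mags >= 6.5,
                      10.0 ** (0.032 * mags + 2.7389),
                      10.0 ** (0.5409 * mags - 0.547))
# Uhrhammer (1986) windows
>> uh_dist = np.exp(-1.024 + 0.804 * mags)
>> uh_time = np.exp(-2.87 + 1.235 * mags)
\end{python}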
A comparison of the expected window sizes with magnitude is shown for
distance and time in Figure \ref{fig:declust_scaling}.
\begin{figure}[htb]
\centering
\begin{subcaption}
\centering
\includegraphics[width=8cm]{./figures/declustering_distance_windows.eps}
\end{subcaption}
\begin{subcaption}
\centering
\includegraphics[width=8cm]{./figures/declustering_time_windows.eps}
\end{subcaption}
\caption{Scaling of declustering time and distance windows with magnitude}
\label{fig:declust_scaling}
\end{figure}
The \textcite{GardnerKnopoff1974} algorithm and its derivatives represent
the most computationally straightforward approach to declustering. The
\verb=time_distance_window= attribute indicates the choice of the
time and distance window scaling model from the three listed. As
the current version of this algorithm considers the events in
descending order of magnitude, the parameter \verb=fs_time_prop=
defines the size of the time window used for searching for foreshocks,
as a fractional proportion of the size of the aftershock window (the
distance windows are always equal for both fore- and aftershocks).
So for an evenly sized time window for foreshocks and aftershocks, the
\verb=fs_time_prop= parameter should equal 1. For shorter or longer
foreshock time windows this parameter can be reduced or increased respectively.
To run a declustering analysis on the earthquake catalogue it is necessary to set up the configuration using a python dictionary (see Appendix \ref{sec:python_guide}). A configuration for the \textcite{GardnerKnopoff1974} algorithm, using for example the \textcite{Uhrhammer1986} time-distance windows with an equally sized time window for aftershocks and foreshocks, would be created as shown:
\begin{python}[frame=single]
>> from openquake.hmtk.seismicity.declusterer.distance_time_windows import\
UhrhammerWindow
>> declust_config = {
'time_distance_window': UhrhammerWindow(),
'fs_time_prop': 1.0}
\end{python}
To run the declustering algorithm simply import and run the algorithm as shown:
\begin{python}[frame=single]
>> from openquake.hmtk.seismicity.declusterer.dec_gardner_knopoff import\
GardnerKnopoffType1
>> declustering = GardnerKnopoffType1()
>> cluster_index, cluster_flag = declustering.decluster(
catalogue,
declust_config)
\end{python}
There are two outputs of a declustering algorithm: \verb=cluster_index= and \verb=cluster_flag=. Both are numpy vectors, of the same length as the catalogue, containing information about the clusters in the catalogue. \verb=cluster_index= indicates the cluster to which each event is assigned (0 if not assigned to a cluster). \verb=cluster_flag= indicates whether an event is considered non-Poissonian (in which case it is assigned a value of 1) or a mainshock (assigned a value of 0). This output definition is the same for all declustering algorithms.
At this point the user may wish either to retain the catalogue in its current form, adding the clustering information as further attributes of the catalogue.data dictionary, or to purge the catalogue of non-Poissonian events.
To add the clustering information to the data dictionary, simply type:
\begin{python}[frame=single]
>> catalogue.data['Cluster_Index'] = cluster_index
>> catalogue.data['Cluster_Flag'] = cluster_flag
\end{python}
Alternatively, to purge the catalogue of non-Poissonian events:
\begin{python}[frame=single]
>> mainshock_flag = cluster_flag == 0
>> catalogue.purge_catalogue(mainshock_flag)
\end{python}
\subsection{AFTERAN \parencite{Musson1999PSHABalkan}}
A particular development of the standard windowing approach is introduced in the program AFTERAN \parencite{Musson1999PSHABalkan}. This is a modification of the \textcite{GardnerKnopoff1974} algorithm, using a moving time window rather than a fixed time window. In AFTERAN, considering each earthquake in order of descending magnitude, events within a fixed distance window are identified (the distance windows being those suggested previously). These events are searched using a moving time window of T days. For a given mainshock, non-Poissonian events are identified if they occur both within the distance window and the initial time window. The time window is then moved, beginning at the last flagged event, and the process repeated. For a given mainshock, all non-Poissonian events are identified when the algorithm finds a continuous period of T days in which no aftershock or foreshock is identified.
The theory of the AFTERAN algorithm is broadly consistent with that of \textcite{GardnerKnopoff1974}. This algorithm, whilst a little more computationally complex, and therefore slower, than the \textcite{GardnerKnopoff1974} windowing approach, remains simple to implement.
As with the \textcite{GardnerKnopoff1974} function, the \verb=time_distance_window= attribute indicates the choice of the time and distance window scaling model. The parameter \verb=time_window= indicates the size (in days) of the moving time window used to identify fore- and aftershocks. The following example will show how to run the AFTERAN algorithm, using the \textcite{GardnerKnopoff1974} definition of the distance windows, and a fixed-width moving time window of 100 days.
\begin{python}[frame=single]
>> from openquake.hmtk.seismicity.declusterer.dec_afteran import\
Afteran
>> from openquake.hmtk.seismicity.declusterer.distance_time_windows import\
GardnerKnopoffWindow
>> declust_config = {
'time_distance_window': GardnerKnopoffWindow(),
'time_window': 100.0}
>> declustering = Afteran()
>> cluster_index, cluster_flag = declustering.decluster(
catalogue,
declust_config)
\end{python}
%::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
\section{Completeness}
In the earliest stages of processing an instrumental seismic catalogue to derive inputs for seismic hazard analysis, it is necessary to determine the magnitude completeness threshold of the catalogue. To outline the meaning of the term ``magnitude completeness'' and the requirements for its analysis as an input to PSHA, the terminology of \textcite{MignanWoessner2012} is adopted. This defines the magnitude of completeness as the ``lowest magnitude at which 100 \% of the events in a space-time volume are detected \parencite{RydelekSacks1989, WoessnerWiemer2005}''. Incompleteness of an earthquake catalogue will produce bias when determining models of earthquake recurrence, which may have a significant impact on the estimation of hazard at a site. Identification of the completeness magnitude of an earthquake catalogue is therefore a clear requirement for the processing of input data for seismic hazard analysis.
It should be noted that this summary of methodologies for estimating completeness is directed toward techniques that can be applied to a ``typical'' instrumental seismic catalogue. We therefore make the assumption that the input data will contain basic information for each earthquake such as time, location and magnitude. We do not make the assumption that network-specific or station-specific properties (e.g., configuration, phase picks, attenuation factors) are known a priori. This limits the selection of methodologies to those classed as estimators of ``sample completeness'', which defines completeness on the basis of the statistical properties of the earthquake catalogue, rather than ``probability-based completeness'', which defines the probability of detection given knowledge of the properties of the seismic network \parencite{SchorlemmerWoessner2008}. This therefore excludes the methodology of \textcite{SchorlemmerWoessner2008}, and similar approaches such as that of \textcite{Felzer2008}.
The current workflows assume that completeness will be applied to the whole catalogue, ideally returning a table of time-varying completeness. The option to explore spatial variation in completeness is not explicitly supported, but could be accommodated by an appropriate configuration of the toolkit.
In the current version of the Modeller's Toolkit the \textcite{Stepp1971} methodology for analysis of catalogue completeness is implemented. Further methods are in development and will be included in future releases.
%\subsection{User-defined Table}
%
%This is simply a filtering that will remove from further consideration any events outside of the completeness bounds defined by the user. The table represents the time variation in $M_C$ and can be input as a separate file (in comma-separated value format) in the following format.
%\begin{Verbatim}[frame=single, commandchars=\\\{\}, fontsize=\scriptsize]
%1990.0, 4.0\\
%1960.0, 5.0\\
%1900.0, 6.0\\
%1700.0, 7.0\\
%\end{Verbatim}
%
%The left-hand column represents the earliest year at which the earthquake is complete at the corresponding magnitude in the right-hand column. \\
%\textbf{Important: The values in the completeness file must be entered from most-recent to oldest!}
\subsection{\textcite{Stepp1971}}
This is one of the earliest analytical approaches to the estimation of completeness magnitude. It is based on estimators of the mean rate of recurrence of earthquakes within given magnitude and time ranges, identifying the completeness magnitude as the point at which the observed rate of earthquakes above $M_C$ begins to deviate from the expected rate. If the catalogue is divided into $n$ unit time intervals, with $k_i$ the number of events in the $i^{th}$ interval, and the earthquake sequence is assumed to be Poissonian, then the unbiased estimate of the mean rate of events per unit time interval of a given sample is:
\begin{equation}
\lambda = \frac{1}{n} \sum_{i = 1}^{n} k_i
\end{equation}
with variance $\sigma_{\lambda}^{2} = \lambda / n$. Taking the unit time interval to be 1 year, the standard deviation of the estimate of the mean is:
\begin{equation}
\sigma_{\lambda} = \sqrt{\lambda} / \sqrt{T}
\end{equation}
where $T$ is the sample length. As the Poisson assumption implies a stationary process, $\sigma_{\lambda}$ behaves as $1/\sqrt{T}$ in the sub-interval of the sample in which the mean rate of occurrence of a magnitude class is constant. Time variation of $M_C$ can usually be inferred graphically from the analysis, as is illustrated in Figure \ref{fig:SteppFigExample1}. In this example, the deviation from the $1/\sqrt{T}$ line for each magnitude class occurs at around 40 years for $4.5 < M < 5$, 100 years for $5.0 < M < 6.0$, approximately 150 years for $6.0 < M < 6.5$ and 300 years for $M > 6.5$. Knowledge of the sources of earthquake information for a given catalogue may usually be reconciled with the completeness time intervals.
\begin{figure}[htb]
\centering
\includegraphics[height=10cm, keepaspectratio=true]{./figures/C2Fig1SteppFig1.eps}
\caption{Example of Completeness Estimation by the \textcite{Stepp1971} methodology}
\label{fig:SteppFigExample1}
\end{figure}
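The statistic underpinning this kind of plot can be reproduced directly from the catalogue. The following is a rough numpy sketch (not the toolkit's implementation), counting events above a trial completeness magnitude in windows of increasing duration measured back from the end of the catalogue:
\begin{python}[frame=single]
>> import numpy as np
>> years = catalogue.data['year']
>> mags = catalogue.data['magnitude']
>> end_year = years.max()
# Rate and its standard deviation for events with M >= 5.0
# in windows of increasing duration (years)
>> durations = np.arange(10.0, 110.0, 10.0)
>> for T in durations:
       idx = (years > (end_year - T)) & (mags >= 5.0)
       lam = np.sum(idx) / T  # Mean annual rate above M 5.0
       sigma_lam = np.sqrt(lam) / np.sqrt(T)
       print(T, lam, sigma_lam)
\end{python}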
The analysis of \textcite{Stepp1971} is a coarse, but relatively robust, approach to estimating the temporal variation in completeness of a catalogue. It has been widely applied since its development. The accuracy of the completeness magnitude depends on the magnitude and time intervals considered, and a degree of judgement is often needed to determine the time at which the rate deviates from the expected values. It has tended to be applied to catalogues on a large scale, and for relatively higher completeness magnitudes.
To translate the methodology from a largely graphical method into a computational one, the completeness period needs to be identified automatically as the point at which the gradient of the observed values decreases with respect to that expected from a Poisson process (see Figure \ref{fig:SteppFigExample1}). In the implementation found within the current toolkit, the divergence point is identified by fitting a two-segment piecewise linear function to the observed data. Although a two-segment piecewise linear function is normally fit with four parameters (intercept, $slope_1$, $slope_2$ and crossover point), the assumption that the rate is stationary for the complete part of the catalogue, such that $\sigma_{\lambda}$ behaves as $\frac{1}{\sqrt{T}}$, allows the slope of the first segment to be fixed at $-0.5$; the second slope is constrained such that $slope_2 \leq -0.5$, whilst the crossover point ($x_c$) is subject to the constraint $x_c \geq 0.0$. Thus it is possible to fit the two-segment linear function using constrained optimisation with only three free parameters. For this purpose the toolkit minimises the residual sum-of-squares of the model fit using numerical optimisation.
To run the \textcite{Stepp1971} algorithm the configuration parameters should be entered in the form of a dictionary, such as the example shown below:
\begin{python}[frame=single]
comp_config = {'magnitude_bin': 0.5,
'time_bin': 5.,
'increment_lock': True}
\end{python}
The algorithm has three configurable options. The \verb=time_bin= parameter describes the size of the time window in years and the \verb=magnitude_bin= parameter describes the size of the magnitude bin; the sensitivity of the results to these bin sizes is as described previously. The final option (\verb=increment_lock=) is used to ensure consistency in the results, avoiding the completeness magnitude increasing for the latest intervals in the catalogue simply due to the variability associated with their short duration. If \verb=increment_lock= is set to \verb=True=, the program will ensure that the completeness magnitude for shorter, more recent windows is less than or equal to that of older, longer windows. This is often a condition for some recurrence analysis tools, so it may be advisable to set this option to \verb=True= in certain workflows. Otherwise it should be set to \verb=False= to show the apparent variability. Some degree of judgement is necessary here. In particular, the user may be aware of circumstances particular to their catalogue for which a recent increase in completeness magnitude is expected (for example, a certain recording network no longer being operational).
The process of running the algorithm is shown below:
\begin{python}[frame=single]
>> from openquake.hmtk.seismicity.completeness.comp_stepp_1971 import\
Stepp1971
>> completeness_algorithm = Stepp1971()
>> completeness_table = completeness_algorithm.completeness(
catalogue,
comp_config)
>> completeness_table
array([[ 1990. , 4.25],
[ 1962. , 4.75],
[ 1959. , 5.25],
[ 1906. , 5.75],
[ 1906. , 6.25],
[ 1904. , 6.75],
[ 1904. , 7.25]])
\end{python}
As shown in the resulting \verb=completeness_table=, the completeness algorithm will output the time variation in completeness (in this example with the \verb=increment_lock= set) in the form of a two-column table with column 1 indicating the completeness year for the magnitude bin centred on the magnitude value found in column 2.
It may be the case that the user wishes to enter a time-varying completeness result for use in subsequent functions, based on alternative methods or on judgement. This can be entered via the \verb=completeness_table= setting, as in the example shown here (take note of the nested square brackets):
\begin{python}[frame=single]
completeness_table = [[1990., 4.0],
                      [1960., 5.0],
                      [1930., 6.0],
                      [1900., 6.5]]
\end{python}
If a \verb=completeness_table= is input then this will override the selection of the completeness algorithm, and the calculation will take the values in \verb=completeness_table= directly.
%::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
\section{Recurrence Models}
The current set of tools is intended to determine the parameters of the \textcite{GutenbergRichter1944} recurrence model, namely the a- and b-value. It is expected that in the most common use case the catalogue that is input to these algorithms will be declustered, with a time-varying completeness defined according to a \verb=completeness_table= of the kind shown previously. If no \verb=completeness_table= is input the algorithm will assume the input catalogue is complete above the minimum magnitude for its full duration.
\subsection{\textcite{Aki1965}}
The classical maximum likelihood estimator for a simple unbounded \textcite{GutenbergRichter1944} model is that of \textcite{Aki1965}, adapted for binned magnitude data by \textcite{Bender1983}. It assumes a fixed completeness magnitude ($M_C$) for the catalogue, and a simple power law recurrence model. It does not explicitly take into account magnitude uncertainty.
\begin{equation}
b = \frac{ \log_{10} \left( e \right)}{ \bar{m} - m_0 + \left( {\frac{\Delta M}{2}} \right)}
\end{equation}
\noindent where $\bar{m}$ is the mean magnitude, $m_0$ the minimum magnitude and $\Delta M$ the discretisation interval of magnitude within a given sample.
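To make the estimator concrete, the equation can be evaluated directly from the magnitudes in the catalogue. A minimal sketch (assuming, for illustration, a completeness magnitude of 4.5 and a 0.1 magnitude discretisation) is:
\begin{python}[frame=single]
>> import numpy as np
>> m0 = 4.5   # Assumed completeness magnitude
>> dm = 0.1   # Magnitude discretisation interval
>> mags = catalogue.data['magnitude']
>> mags = mags[mags >= m0]
# Aki (1965) / Bender (1983) maximum likelihood b-value
>> bval = np.log10(np.e) / (np.mean(mags) - m0 + dm / 2.0)
\end{python}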
\subsection{Maximum Likelihood}
This method adjusts the \textcite{Aki1965} and \textcite{Bender1983} method to incorporate time variation in completeness. The catalogue is divided
into S sub-catalogues, where each sub-catalogue corresponds to a period
with a corresponding $M_C$. An average a- and b-value (with uncertainty) is returned by taking
the mean of the a- and b-values of each sub-catalogue, weighted by
the number of events in each sub-catalogue.
\begin{equation}
\hat{b} = \frac{1}{S} \sum_{i = 1}^{S} w_i b_i
\end{equation}
\begin{python}[frame=single]
>> mle_config = {'magnitude_interval': 0.1,
'Average Type': 'Weighted',
'reference_magnitude': None}
>> from openquake.hmtk.seismicity.occurrence.b_maximum_likelihood import\
BMaxLikelihood
>> recurrence = BMaxLikelihood()
>> bval, sigmab, aval, sigmaa = recurrence.calculate(
catalogue,
mle_config,
completeness=completeness_table)
\end{python}
\noindent where \verb=magnitude_interval= indicates the size of the magnitude bin, \verb='Average Type'= the manner in which the sub-catalogue values are combined (here ``Weighted'' by the number of events), and \verb=reference_magnitude= the magnitude for which the output rate (of events greater than or equal to that magnitude) is calculated (set to \verb=0.0= to return $10^{a}$).
\subsection{\textcite{KijkoSmit2012}}
A recent adaption of the \textcite{Aki1965} estimator of b-value for a catalogue containing different completeness periods has been proposed by \textcite{KijkoSmit2012}. Dividing the earthquake catalogue into $s$ subcatalogues of $n_i$ events with corresponding completeness magnitudes $m_{c_i}$ for $i = 1, 2, ..., s$, the likelihood function of $\beta$ where $\beta = b \ln \left( {10.0} \right)$ is given as:
\begin{equation}
\mathbf{L} = \prod_{i = 1}^{s} \prod_{j = 1}^{n_i} \beta \exp \left[ {-\beta \left( {m_j^i - m_{min}^i } \right) } \right]
\end{equation}
\noindent which gives a maximum likelihood estimator of $\beta$:
\begin{equation}
\beta = \left( {\frac{r_1}{\beta_1} + \frac{r_2}{\beta_2} + \dots + \frac{r_s}{\beta_s}} \right)^{-1}
\end{equation}
\noindent where $r_i = n_i / n$ and $n = \sum_{i = 1}^{s} n_i$, with $\beta_i$ the estimator of $\beta$ obtained from the $i^{th}$ sub-catalogue above its level of completeness $m_{c_i}$.
\begin{python}[frame=single]
>> kijko_smit_config = {'magnitude_interval': 0.1,
                        'reference_magnitude': None}
\end{python}
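A sketch of running the estimator follows; the module path (\verb=openquake.hmtk.seismicity.occurrence.kijko_smit=) and the \verb=calculate= signature are assumed here to mirror the \verb=BMaxLikelihood= example above:
\begin{python}[frame=single]
# Assumed module path and calculate signature, mirroring
# the BMaxLikelihood example above
>> from openquake.hmtk.seismicity.occurrence.kijko_smit import\
        KijkoSmit
>> recurrence = KijkoSmit()
>> bval, sigmab, aval, sigmaa = recurrence.calculate(
        catalogue,
        kijko_smit_config,
        completeness=completeness_table)
\end{python}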
\subsection{\textcite{Weichert1980}}
Recognising the typical conditions of an earthquake catalogue, \textcite{Weichert1980} developed a maximum likelihood estimator of $b$ for grouped magnitudes and unequal periods of observation. The likelihood formulation for this approach is:
\begin{equation}
\mathbf{L} \left( {\beta | n_i, m_i, t_i} \right) = \frac{ N!}{\prod_i n_i!} \prod_i p_{i}^{n_i}
\end{equation}
where $\mathbf{L}$ is the likelihood of $\beta$, $n_i$ the number of earthquakes in magnitude bin $m_i$ with observation period $t_i$, and $N = \sum_i n_i$. The parameter $p_i$ is defined as:
\begin{equation}
p_i = \frac{t_i \exp \left( {-\beta m_i} \right) }{\sum_j t_j \exp \left( {-\beta m_j} \right)}
\end{equation}
The extremum of $\ln \left( {\mathbf{L}}\right)$ is found at:
\begin{equation}
\frac{\sum_i t_i m_i \exp \left( {-\beta m_i} \right)}{\sum_j t_j \exp \left( {-\beta m_j} \right)} = \frac{\sum_i n_i m_i}{N}
\end{equation}
The computational implementation of this method is given as an appendix to \textcite{Weichert1980}. This formulation of the maximum likelihood estimator for b-value, and consequently seismicity rate, is in widespread use, with applications in many national seismic hazard analyses \parencite[e.g.][]{usgsNSHM1996,usgsNSHM2002}. The algorithm has been demonstrated to be efficient and unbiased for most applications. It is recognised by \textcite{Felzer2008} that an implicit assumption is made regarding the stationarity of the seismicity for all the time periods.
To implement the \textcite{Weichert1980} recurrence estimator, the configuration properties are defined as:
\begin{python}[frame=single]
>> weichert_config = {'magnitude_interval': 0.1,
                      'reference_magnitude': None,
                      # The remaining parameters are optional
                      'bvalue': 1.0,
                      'itstab': 1E-5,
                      'maxiter': 1000}
\end{python}
As the \textcite{Weichert1980} algorithm reaches the maximum likelihood estimate by iteration, three additional optional parameters control the iteration process: \verb=bvalue= is the initial guess for the b-value, \verb=itstab= the difference in b-value below which convergence is assumed, and \verb=maxiter= the maximum number of iterations.\footnote{The iterative nature of the \textcite{Weichert1980} algorithm can result in very slow convergence and unstable behaviour when the data imply b-values that are very small, or even negative. This can occur when very few events remain in the catalogue, or when the magnitudes are confined to a narrow range.}
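A sketch of the execution follows, assuming the \textcite{Weichert1980} estimator (here taken to be a \verb=Weichert= class in \verb=openquake.hmtk.seismicity.occurrence.weichert=) follows the same \verb=calculate= interface as \verb=BMaxLikelihood= above:
\begin{python}[frame=single]
# Assumed module path and calculate signature, mirroring
# the BMaxLikelihood example above
>> from openquake.hmtk.seismicity.occurrence.weichert import\
        Weichert
>> recurrence = Weichert()
>> bval, sigmab, aval, sigmaa = recurrence.calculate(
        catalogue,
        weichert_config,
        completeness=completeness_table)
\end{python}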
%::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
\section{Maximum Magnitude}
The estimation of the maximum magnitude for use in seismic hazard analysis is a complex, and often controversial, process that should be guided by information from geology and the seismotectonics of a seismic source. Estimation of maximum magnitude from the observed (instrumental and historical) seismicity can be undertaken using methods assuming a truncated \textcite{GutenbergRichter1944} model, or via non-parametric methods that are independent of any assumed functional form.
\subsection{\textcite{Kijko2004}}
Three different estimators of maximum magnitude are given by \textcite{Kijko2004}, each depending on a different set of assumptions:
\begin{enumerate}
\item ``Fixed b-value'': Assumes a single b-value with no uncertainty
\item ``Uncertain b-value'': Assumes an uncertain b-value defined by an expected value and a standard deviation
\item ``Non-Parametric Gaussian'': Assumes no functional form (can be applied to seismicity observed to follow a more characteristic distribution)
\end{enumerate}
Each of these estimators assumes the general form:
\begin{equation}
m_{max} = m_{max}^{obs} + \Delta
\end{equation}
where $\Delta$ is an increment that is dependent on the estimator used.
The uncertainty on $m_{max}$ is also defined according to:
\begin{equation}
\sigma_{m_{max}} = \sqrt{\sigma_{m_{max}^{obs}}^2 + \Delta^{2}}
\end{equation}
In the three estimators some lower bound magnitude constraint must be defined. For those estimators that assume an exponential recurrence model the lower bound magnitude must be specified by the user. For the Non-Parametric Gaussian method an explicit lower bound magnitude does not have to be specified; however, the estimation is conditioned upon the largest N magnitudes, where N must be specified by the user.
If the user wishes to input a maximum magnitude that is larger than that observed in the catalogue (e.g. a known historical magnitude), this can be specified in the config file using \verb=input_mmax= with the corresponding uncertainty defined by \\ \verb=input_mmax_uncertainty=. If these are not defined (i.e. set to \verb=None=) then the maximum magnitude will be taken from the catalogue.
All three estimators require an iterative solution; therefore additional parameters can be specified in the configuration that control the iteration process: \verb=tolerance=, the difference in the $m_{max}$ estimate below which the algorithm is considered to have converged, and \\ \verb=maximum_iterations=, the maximum number of iterations permitted.
\subsubsection{``Fixed b-value''}
For a catalogue of $n$ earthquakes, whose magnitudes are distributed according to a \textcite{GutenbergRichter1944} distribution with a fixed ``b'' value, the increment of maximum magnitude is determined via:
\begin{equation}
\Delta = \int\limits_{m_{min}} ^{m_{max}} \left[ {\frac{1 - \exp \left[ {-\beta \left( {m - m_{min}} \right)} \right]}{1 - \exp \left[ {-\beta \left( {m_{max}^{obs} - m_{min}} \right) } \right]}} \right] ^n dm
\end{equation}
The execution of the \textcite{Kijko2004} ``fixed-b'' algorithm is as follows:
\begin{python}[frame=single]
>> mmax_config = {'input_mmax': 7.6,
                  'input_mmax_uncertainty': 0.22,
                  'b-value': 1.0,
                  'input_mmin': 5.0,
                  'tolerance': 1.0E-5,          # Default
                  'maximum_iterations': 1000}   # Default
>> from openquake.hmtk.seismicity.max_magnitude.kijko_sellevol_fixed_b\
import KijkoSellevolFixedb
>> mmax_estimator = KijkoSellevolFixedb()
>> mmax, mmax_uncertainty = mmax_estimator.get_mmax(catalogue,
mmax_config)
\end{python}
\subsubsection{``Uncertain b-value''}
For a catalogue of $n$ earthquakes, whose magnitudes are distributed according to a \textcite{GutenbergRichter1944} distribution with an uncertain ``b'' value, characterised by an expected value ($b$) and a corresponding uncertainty ($\sigma_b$), the increment of maximum magnitude is determined via:
\begin{equation}
\Delta = \left( {C_{\beta}} \right)^n \int\limits_{m_{min}}^{m_{max}} \left[ {1 - \left( {\frac{p}{p + m - m_{min}}} \right) ^q} \right]^n dm
\end{equation}
where $\beta = b \ln \left( {10.0} \right)$, $p = \beta / \left( {\sigma_{\beta}} \right) ^ 2$, $q = \left( {\beta / \sigma_{\beta}} \right) ^ 2$ and $C_{\beta}$ is a normalising coefficient determined via:
\begin{equation}
C_{\beta} = \frac{1}{1 - \left[ {p / \left( {p + m_{max} - m_{min}} \right) } \right]^q}
\end{equation}
In both the fixed and uncertain ``b'' cases a minimum magnitude will need to be input into the calculation. If this value is lower than the minimum magnitude observed in the catalogue the iterator may not stabilise to a satisfactory value, so it is recommended to use a minimum magnitude that is greater than the minimum found in the observed catalogue.
The execution of the ``uncertain b-value'' estimator is undertaken in a very similar manner to that of the fixed b-value case, the only additional parameter being the \verb=sigma-b= term:
\begin{python}[frame=single]
>> mmax_config = {'input_mmax': 7.6,
'input_mmax_uncertainty': 0.22,
'b-value': 1.0,
'sigma-b': 0.15,
'input_mmin': 5.0,
'tolerance': 1.0E-5,
'maximum_iterations': 1000}
>> from openquake.hmtk.seismicity.max_magnitude.kijko_sellevol_bayes\
import KijkoSellevolBayes
>> mmax_estimator = KijkoSellevolBayes()
>> mmax, mmax_uncertainty = mmax_estimator.get_mmax(catalogue,
mmax_config)
\end{python}
\subsubsection{Non-Parametric Gaussian}
The non-parametric Gaussian estimator for maximum magnitude $m_{max}$ is defined as:
\begin{equation}
\Delta = \int\limits_{m_{min}}^{m_{max}} \left[ {\frac{\sum_{i = 1}^{n} \left[ {\Phi \left( {\frac{m - m_i}{h}} \right) - \Phi \left( {\frac{m_{min} - m_i}{h}} \right)} \right]}{\sum_{i = 1}^{n} \left[ {\Phi \left( {\frac{m_{max} - m_i}{h}} \right) - \Phi \left( {\frac{m_{min} - m_i}{h}} \right)} \right]}} \right]^n dm
\end{equation}
where $m_{min}$ and $m_{max}$ are the minimum and maximum magnitudes from a set of $n$ events, $\Phi$ is the standard normal cumulative distribution function and $h$ is a kernel smoothing factor:
\begin{equation}
h = 0.9 \times min\left( {\sigma, IQR / 1.34} \right) \times n^{-1 / 5}
\end{equation}
with $\sigma$ the standard deviation of the magnitudes $m_i$ ($i = 1, 2, \ldots, n$) of the set of $n$ earthquakes, and $IQR$ their inter-quartile range.
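The smoothing factor can be computed directly from the observed magnitudes; a brief sketch:
\begin{python}[frame=single]
>> import numpy as np
>> mags = catalogue.data['magnitude']
>> n = len(mags)
# Inter-quartile range of the magnitudes
>> iqr = np.percentile(mags, 75.) - np.percentile(mags, 25.)
# Kernel smoothing factor
>> h = 0.9 * min(np.std(mags), iqr / 1.34) * n ** (-0.2)
\end{python}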
Therefore the uncertainty on $m_{max}$ is conditioned primarily on the uncertainty of the largest observed magnitude. As in many catalogues the largest observed magnitude may be an earlier historical event, which will be associated with a large uncertainty, this estimator tends towards large uncertainties on $m_{max}$.
Due to the need to define some additional parameters the configuration file is slightly different. No b-value or minimum magnitude needs to be specified; however, the algorithm will consider only the largest \verb=number_earthquakes= magnitudes (or all magnitudes if the number of observations is smaller). The algorithm also numerically approximates the integral of the Gaussian pdf, so \verb=number_samples= is the number of samples of the distribution. The rest of the execution remains the same as for the exponential recurrence estimators of $M_{max}$:
\begin{python}[frame=single]
>> mmax_config = {'input_mmax': 7.6,
'input_mmax_uncertainty': 0.22,
'number_samples': 51,       # Default
'number_earthquakes': 100,  # Default
'tolerance': 1.0E-5,
'maximum_iterations': 1000}
>> from openquake.hmtk.seismicity.max_magnitude.kijko_nonparametric_gaussian\
import KijkoNonParametricGaussian
>> mmax_estimator = KijkoNonParametricGaussian()
>> mmax, mmax_uncertainty = mmax_estimator.get_mmax(catalogue,
mmax_config)
\end{python}
\subsection{Cumulative Moment \parencite{MakropoulosBurton1983}}
The cumulative moment release method is an adaptation of the cumulative strain energy release method for estimating $m_{max}$ originally proposed by \textcite{MakropoulosBurton1983}. It is a pseudo-graphical formulation in which an estimator of maximum magnitude is derived from a plot of cumulative seismic moment release with time. The average slope of this plot indicates the mean moment release for the input catalogue in question. Two further straight lines are defined with gradients equal to the slope of mean cumulative moment release, both enveloping the cumulative plot. The vertical distance between these two lines indicates the total amount of moment that may be released in the region if no earthquakes were to occur in the corresponding time (i.e. the distance between the upper and lower bounding lines on the time axis). This concept is illustrated in Figure \ref{fig:Cumulative_Moment}.
\begin{figure}[htb]
\centering
\includegraphics[height=6cm, keepaspectratio=true]{./figures/Cumulative_Moment.eps}
\caption{Illustration of the Cumulative Moment Release Concept}
\label{fig:Cumulative_Moment}
\end{figure}
The cumulative moment estimator of $m_{max}$, whilst simple in concept, has several key advantages. As a non-parametric method it is independent of any assumed probability distribution and cannot estimate $m_{max}$ lower than the observed $m_{max}$. It is also principally controlled by the largest events in the catalogue, thus making it relatively insensitive to uncertainties in completeness or the lower bound threshold. In practice, this estimator, and to some extent that of \textcite{Kijko2004}, are dependent on having a sufficiently long record of events relative to the strain cycle for the region in question, such that the estimate of average moment release is stable. This will obviously depend on the length of the catalogue, and for some regions, particularly those in low strain intraplate environments, it is often the case that $m_{max}$ will be close to the observed $m_{max}$. Therefore it may be most appropriate to use these techniques on a larger scale, either considering multiple sources or an appropriate tectonic proxy.
For the cumulative moment estimator it is possible to take into account the uncertainty on $m_{max}$ by applying bootstrap sampling to the observed magnitudes and their respective uncertainties. This has the advantage that $\sigma_{m_{max}}$ is not controlled by the uncertainty on the observed $m_{max}$, as it is for the \textcite{Kijko2004} algorithm. Instead it takes into account the uncertainty on all the magnitudes in the catalogue. The cost of this, however, is that this method is more computationally intensive, and therefore slower, than \textcite{Kijko2004}, depending on the number of bootstrap samples the user chooses.
The algorithm is slightly simpler to run than the \textcite{Kijko2004} methods; however, due to the bootstrapping process it is slightly slower. It is run as per the following example:
\begin{python}[frame=single]
>> mmax_config = {'number_bootstraps': 1000}
>> from openquake.hmtk.seismicity.max_magnitude.cumulative_moment_release\
import CumulativeMoment
>> mmax_estimator = CumulativeMoment()
>> mmax, mmax_uncertainty = mmax_estimator.get_mmax(catalogue,
mmax_config)
\end{python}
For the cumulative moment algorithm the only user configurable parameter is the \\ \verb=number_bootstraps=, which is the number of samples used during the bootstrapping process.
\section{Smoothed Seismicity}
The use of smoothed seismicity in seismic hazard analysis has become a common way of characterising distributed seismicity, for which the seismogenic sources are defined exclusively from the uncertain locations of observed seismicity. There are many different methods for smoothing the catalogue, adopting different smoothing kernels or making different correction factors to compensate for spatial and/or temporal completeness.
\subsection{\textcite{frankel1995}}
A smoothed seismicity method that has one of the clearest precedents for use in seismic hazard analysis is that of \textcite{frankel1995}, originally derived to characterise the seismicity of the Central and Eastern United States as part of the 1996 National Seismic Hazard Maps of the United States. The method applies a simple isotropic Gaussian smoothing kernel to derive the expected rate of events at each cell $\tilde{n}_i$ from the observed rate $n_j$ of seismicity in a grid of $j$ cells. This kernel takes the form:
\begin{equation}
\tilde{n}_i = \frac{\sum_j n_j e^{-d_{ij}^2 / c^2}}{\sum_j e^{-d_{ij}^2 / c^2}}
\end{equation}
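In the above, $d_{ij}$ is the distance between cells $i$ and $j$ and $c$ is the correlation (smoothing) distance of the kernel. A minimal sketch of the smoothing operation on a one-dimensional line of cell counts (purely illustrative; the toolkit operates on a geographic grid) is:
\begin{python}[frame=single]
>> import numpy as np
# Observed counts n_j in a line of cells 10 km apart,
# smoothed with a 50 km correlation distance (arbitrary values)
>> n_obs = np.array([0., 2., 5., 1., 0., 0., 3., 0.])
>> x = 10.0 * np.arange(len(n_obs))
>> c = 50.0
# Pairwise distances d_ij and Gaussian weights
>> d = np.abs(x[:, None] - x[None, :])
>> w = np.exp(-d ** 2 / c ** 2)
# Smoothed rate in each cell
>> n_smooth = (w * n_obs[None, :]).sum(axis=1) / w.sum(axis=1)
\end{python}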
In the implementation of the algorithm, two steps are taken that we prefer to make configurable options here. The first step is that the time-varying completeness is accounted for using a correction factor ($t_f$) based on the \textcite{Weichert1980} method:
\begin{equation}
t_f = \frac{\sum_i e^{-\beta m_{c_i}}}{\sum_i T_i e^{-\beta m_{c_i}}}
\end{equation}
where $m_{c_i}$ is the completeness magnitude corresponding to the mid-point of each completeness interval, and $T_i$ the duration of that interval. The completeness magnitude bins must be evenly spaced; hence, within the application of the process a function is executed to render the input completeness table into one in which the magnitudes are evenly spaced with a width of 0.1 magnitude units.
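As an illustration, the correction factor can be computed from a completeness table of the kind output by the completeness algorithms. A sketch, assuming a b-value of 1.0 and a catalogue ending in 2009 (and using the table magnitudes directly rather than interval mid-points, for brevity):
\begin{python}[frame=single]
>> import numpy as np
# Completeness table: [start year, completeness magnitude]
>> comp_table = np.array([[1990., 4.25],
                          [1960., 5.25],
                          [1900., 6.25]])
>> end_year = 2009.
>> beta = 1.0 * np.log(10.)   # b = 1.0
>> t_i = end_year - comp_table[:, 0]
>> m_ci = comp_table[:, 1]
# Weichert-type completeness correction factor
>> t_f = np.sum(np.exp(-beta * m_ci)) /\
         np.sum(t_i * np.exp(-beta * m_ci))
\end{python}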
\subsection{Implementing the Smoothed Seismicity Analysis}
The smoothed seismicity tool separates the core implementation (i.e. the gridding, counting and execution of the code) from the choice of kernel. An example of the execution process is as follows.
The first stage is to load the catalogue into an instance of the catalogue class:
\begin{python}[frame=single]
>> input_file = 'path/to/input_file.csv'
>> from openquake.hmtk.parsers.catalogue.csv_catalogue_parser import\
CsvCatalogueParser
>> parser = CsvCatalogueParser(input_file)
>> catalogue = parser.read_file()
\end{python}
Next, set up the smoothing algorithm and the corresponding kernel:
\begin{python}[frame=single]
# Imports the smoothed seismicity algorithm
>> from openquake.hmtk.seismicity.smoothing.smoothed_seismicity import\
SmoothedSeismicity
# Imports the Kernel function
>> from openquake.hmtk.seismicity.smoothing.kernels.isotropic_gaussian\
import IsotropicGaussian
# Grid limits should be set up as
# [min_long, max_long, spc_long,
#  min_lat, max_lat, spc_lat,
# min_depth, max_depth, spc_depth]
>> grid_limits = [0., 10., 0.1, 0., 10., 0.1, 0., 60., 30.]
# Assuming a b-value of 1.0
>> smooth_seis = SmoothedSeismicity(grid_limits,
use_3d=True,
bvalue=1.0)
\end{python}
The smoothed seismicity function needs to be set up with three variables: i) the extent (and spacing) of the grid, ii) the choice to use 3D smoothing (i.e. distances are taken as hypocentral rather than epicentral) and iii) the input b-value. The extent of the grid can also be defined from the catalogue. If preferred, the user need only specify the spacing of the longitude-latitude grid (as a single floating point value); the grid will then be defined by taking the bounding box of the earthquake catalogue and extending it by the total smoothing length (i.e. the bandwidth (in km) multiplied by the maximum number of bandwidths).
To run the smoothed seismicity analysis, the configurable parameters are: \verb=BandWidth= the bandwidth of the Gaussian kernel (in km), \verb=Length_Limit= the number of bandwidths considered as a maximum smoothing length, and \verb=increment= chooses whether to output the incremental a-value (for consistency with the original \textcite{frankel1995} methodology) or the cumulative a-value (corresponding to the a-value of the Gutenberg-Richter model).
The algorithm requires two essential inputs (the earthquake catalogue and the config file), and three optional inputs:
\begin{itemize}
\item \verb=completeness_table= A table of completeness magnitudes and their corresponding completeness years (as output from the completeness algorithms)
\item \verb=smoothing_kernel= An instance of the required smoothing
kernel class (currently only Isotropic Gaussian is supported - and will be used if not specified)
\item \verb=end_year= The final year of the catalogue. This will be taken as the last year found in the catalogue, if not specified by the user
\end{itemize}
The analysis is then run via:
\begin{python}[frame=single]
# Set up config (e.g. 50 km band width, up to 3 bandwidths)
>> config = {'Length_Limit': 3.,
             'BandWidth': 50.,
'increment': True}
# Run the analysis!
>> output_data = smooth_seis.run_analysis(
catalogue,
config,
completeness_table,
smoothing_kernel=IsotropicGaussian(),
end_year=None)
# To write the resulting data to a csv file
>> smooth_seis.write_to_csv('path/to/output_file.csv')
\end{python}
The resulting output will be a csv file with the following columns:
\begin{Verbatim}[frame=single, commandchars=\\\{\}, fontsize=\scriptsize]
Longitude, Latitude, Depth, Observed Count, Smoothed Rate, b-value
\end{Verbatim}
\noindent where \verb=Observed Count= is the observed number of earthquakes in each cell, and \\
\verb=Smoothed Rate= is the smoothed seismicity rate.