-
Notifications
You must be signed in to change notification settings - Fork 1
/
Chapter_2_More_Practice_with_RInstat.qmd
574 lines (442 loc) · 30.9 KB
/
Chapter_2_More_Practice_with_RInstat.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
# More Practice with R-Instat
## Introduction
To use climatic data fully it is important to be able to deliver
products. The two examples in this chapter describe the steps and the
endpoint in this process. Data are supplied in the right form for the
analysis. The objectives are specified, and your task is to prepare the
tables and graphs for a report and a presentation.
Some familiarity with R-Instat is assumed. There are two initial
tutorials and following those is enough preparation. If you have already
used a statistics package before, then the examples below may be
sufficient for you, even without the tutorials. This chapter is also
designed to provide practice with R-Instat.
The first problem builds on a study in Southern Zambia. This is the most
drought-prone area of the country. Everyone knew that there is \'climate
change\'! Some farmers were emigrating North, citing climate change as
their reason. However, a local non-governmental organization (NGO)
called the Conservation Farming Unit, questioned this reasoning for the
rainfall data. They are not convinced that any climate change has
necessarily affected the farming practices. They, therefore,
commissioned a study that used daily climatic data from several stations
in Southern Zambia. The results were supplied as a report, and
presentations of the results were also made to the NGO and to the local
FAO Officers. The results confirmed evidence of climate change in the
temperature data, but not in the rainfall. The key conclusions were
later made into short plays that were broadcast on local radio and
played at village meetings.
Here we use data from Moorings, a site in Southern Zambia. The daily
data, on rainfall, are from 1922 to 2009. Here, partly for simplicity,
we largely use the monthly summaries.
For the work, we draw an analogy with the preparation of a meal. The
first key requirement is that you have the food, which here is the
climatic data. In a real meal, the food may be supplied in a form that
is ready for cooking, or it may need preparation prior to cooking. Here
the data are in pre-packed form, so the analysis can proceed quickly.
You also need the right tools. In a kitchen, they are the saucepans,
etc, while here they are just the computer, together with the required
software.
You need some general cooking skills. These are the basic computing
skills, plus initial skills of R-Instat, at least from the tutorial.
Finally, your objectives must be clear. This corresponds to having a
specific meal in mind so that a recipe can be used. Of course, you may
have to adapt slightly as you go along. You might find some oddities in
the data, just as cooks must improvise if they suddenly find that one of
the ingredients is not available.
If everything is well organized, the cook can prepare the meal very
quickly. This is just what is done in the products in this chapter. This
leaves time to make sure the dishes, for us the results, are presented
attractively. Then users will enjoy consuming what is presented.
Section 2.2 describes the data for this first task. Trends in the
rainfall are examined in Section 2.3. A second problem, in Section 2.4,
examines whether satellite data on sunshine hours resembles
corresponding station data. Daily data from Dodoma, Tanzania, are used.
The data for each of these case studies are in the R-Instat library. The
presentation is designed so users can repeat the analyses on their
laptops.
Graphs are produced in each of these sections and the general methods
for graphics in R-Instat is outlined in Section 2.5. Section 3.5 then
adds a warning. R-Instat provides an easy-to-use click and point way of
using the R programming language. It should help users to solve may
problems. But a click-and-point system is not the right tool for all
problems. We describe a problem that may require more programming
skills, at least if you wish to prevent your computer from laughing at
you!
This chapter demonstrates R-Instat as a simple general statistics
package and the File, Prepare and Describe menus are used. It
illustrates that a general statistics package is an appropriate tool for
many climatic problems. It is also designed to consolidate your
experience in using R-Instat. The special climatic menu is introduced in
chapter 4.
## The Moorings data
Monthly data are used in this part of the chapter. Daily data are the
starting point in most of this guide because many of the objectives
require daily data. But here the emphasis is on objectives for which the
monthly data are suitable.
The data are already in an R-Instat file. Hence, they can be opened from
the library in R-Instat library.
From the opening screen in R-Instat, select ***File \> Open From
Library*** as shown in Fig. 2.2a. Choose ***Load From Instat
Collection***, Then ***Browse*** to the ***Climatic*** directory then to
***Zambia***. ***Select*** the file called ***Moorings_July.rds*** to
give the screen shown in Fig. 2.2b. Press ***Ok***.
| ***Fig. 2.2a File \> Open from Library*** | ***Fig. 2.2b Ready to import Moorings.RDS*** |
|:----------------------------------------:|:-------------------------------------------:|
| ***Climatic \> Zambia \> Moorings.RDS*** | |
| ![](figures/Fig2.2a.png){width="3.36in" height="3.06in"} | ![](figures/Fig2.2b.png){width="2.43in" height="2.75in"} |
The resulting data are shown in Fig. 2.2c. There are 2 data frames. The
one called Moorings has daily data.
***Move to the second data frame*** as shown in Fig. 2.2c which shows
the monthly totals. They are the total rainfall in mm and the total
number of rain days. A rain day was defined as a day with more than
0.85mm[^1].
| ***Fig. 2.2c The Moorings monthly data*** | ***Fig. 2.2d Boxplot dialogue on the Describe menu*** |
|:----------------------------------------:|:---------------------------------------------------:|
| | ***Describe \> Specific \> Boxplot*** |
| ![](figures/Fig2.2c.png){width="1.844in" height="3.474in"} | ![](figures/Fig2.2d.png){width="3.096in" height="2.329in"} |
Rainfall in Southern Zambia is from November to April. Hence, we analyze
the data by season, rather than by year. There are 88 seasons from 1922
to 2009 and 1056 monthly values, as indicated in Fig. 2.2c.
The task is to write a short report that describes the patterns of
rainfall. One aim is to assess whether there is obvious evidence of
change in the pattern of rainfall. This evidence might justify
requesting the data from multiple stations, to undertake a more detailed
study. The first step is to explore the data, and then consider how
appropriate results could be presented. To explore we start with a
boxplot to show the seasonal pattern of the rainfall totals.
Choose the Boxplot dialogue from the Describe menu, with ***Describe \>
Specific \> Boxplot***, as shown in Fog. 2.2d. Complete the dialogue as
shown in Fig. 2.2e. The resulting graph is shown in Fig. 2.2f[^2]. This
shows the total rainfall was typically 200mm in each of December to
February. There was always some rain in each of these months, and the
records were over 500mm.
| ***Fig. 2.2e Completed boxplot dialogue*** | ***Fig. 2.2f Boxplot of monthly rainfall totals*** |
|:----------------------------------------:|:---------------------------------------------------:|
| ***Describe \> Specific \> Boxplot*** | |
| ![](figures/Fig2.2e.png){width="3.101in" height="3.540in"} | ![](figures/Fig2.2f.png){width="2.966in" height="3.095in"} |
Change the variable from rain to ***raindays*** in Fig. 2.2e to give the
corresponding boxplots for the number of raindays in the month, Fig.
2.2g. This shows that typically one day in two are rainy in December to
February. Occasionally most of the days are rainy.
Boxplots are essentially a 5-number summary of the data, (with potential
outliers also shown). The ***Prepare \> Column: Reshape \> Column
Summaries, Fig. 2.2h,*** dialogue can provide the same summaries
numerically.
| ***Fig. 2.2g The number of rain days*** | ***Fig. 2.2h Summary dialogue on the Prepare menu*** |
|:----------------------------------------:|:---------------------------------------------------:|
| ![](figures/Fig2.2g.png){width="3.281in" height="3.440in"} | ![](figures/Fig2.2h.png){width="2.724in" height="2.691in"}|
Summarise both the monthly totals and the number of raindays, with the
month as the factor, as shown in Fig. 2.2i. Then choose the Summaries
button and complete the sub-dialogue as shown in Fig. 2.2j.
| ***Fig. 2.2i Summary dialogue*** | ***Fig. 2.2j Summaries sub-dialogue*** |
|:----------------------------------------:|:---------------------------------------------------:|
| ***Prepare \> Column: Reshape \> Column Summaries*** | |
| ![](figures/Fig2.2i.png){width="3.206in" height="3.343in"} | ![](figures/Fig2.2j.png){width="2.734in" height="3.674in"} |
The results are in a third data frame. It just has 12 rows as shown in
Fig. 2.2k. The summaries are clearer if they are in order (which we did
already for Fig. 2.2k).
***Right-click in the name field*** of this data frame and choose the
option to ***Reorder columns***, Fig. 2.2l.
| ***Fig. 2.2k Resulting summary data*** | ***Fig. 2.2l Right-click menu to reorder columns*** |
|:----------------------------------------:|:---------------------------------------------------:|
| ![](figures/Fig2.2k.png){width="3.517in" height="2.443in"} | ![](figures/Fig2.2l.png){width="2.419in" height="2.464in"} |
In the Reorder dialogue, use the ***arrow keys*** to change the position
of the columns in the data frame.
With the summaries in a sensible order, they are now transferred to the
results (output) window.
| ***Fig. 2.2m Reorder the resulting columns*** | ***Fig. 2.2n Simplify column names*** |
|:----------------------------------------:|:---------------------------------------------------:|
| ***Right-click \> Reorder column(s)*** | ***Right-click \> Rename column(s)*** |
| ![](figures/Fig2.2m.png){width="3.098in" height="2.219in"} | ![](figures/Fig2.2n.png){width="3.026in" height="2.083in"} |
Before this, we renamed some of the columns to give shorter names. This
again used the ***right-click*** menu, Fig 2.2l. The rename dialogue is
shown in Fig. 2.2n.
| ***Fig. 2.2o View Data dialogue*** | ***Fig. 2.2p The Monthly number of rain days*** |
|:----------------------------------------:|:---------------------------------------------------:|
| ***Prepare \> Data Frame \> View Data*** | |
| ![](figures/Fig2.2o.png){width="2.712in" height="2.720in"} | ![](figures/Fig2.2p.png){width="3.378in" height="2.302in"} |
Now use the ***Prepare \> Data Frame \> View Data*** dialogue, Fig.
2.2o, to transfer the rainfall totals and then the number of rain days
to the results window. The results for the number of rain days are shown
in Fig. 2.2p.
## The objectives
Section 2.2 explored the data and examined the seasonal pattern of the
rainfall at Moorings. It also made use of the three menus, File, Prepare
and Describe and well as the right-click menu. The main objective,
however, was to see if there is evidence of rainfall change rather than
to investigate the seasonal pattern.
We first examine the annual totals and the total number of rain days.
These are the totals from July to June, so they cover each season.
Some "housekeeping" is a preliminary. The 3^rd^ data-frame is no longer
needed. Right-click on the bottom tab a and choose the option to delete,
Fig. 2.3a. The dialogue shown in Fig. 2.3b opens. Just press ok.
| ***Fig. 2.3a Right-click on the bottom tab*** | ***Fig. 2.3b Delete a data frame*** |
|:----------------------------------------:|:---------------------------------------------------:|
| ![](figures/Fig2.3a.png){width="2.15in" height="1.688in"} | ![](figures/Fig2.3b.png){width="3.808in" height="2.323in"} |
Use ***Prepare \> Column: Reshape \> Column Summaries*** and complete
the dialogue and sub-dialogue as shown in Fig. 2.3c and Fig. 2.3d to
produce the seasonal totals.
| ***Fig. 2.3c Produce the annual totals*** | ***Fig. 2.3d The Summaries sub-dialogue*** |
|:----------------------------------------:|:---------------------------------------------------:|
| ***Prepare \> Column: Reshape \> Column Summaries*** | |
| ![](figures/Fig2.3c.png){width="3.194in" height="3.340in"} | ![](figures/Fig2.3d.png){width="2.831in" height="3.806in"} |
The results are shown in Fig. 2.3 e after the steps explained below.
First, notice in Fig. 2.3e that there were only 4 months in the first
season, and the annual summary was therefore set to missing[^3].
| ***Fig. 2.3e Resulting annual data*** | ***Fig. 2.3f Menu for a text substring*** |
|:----------------------------------------:|:---------------------------------------------------:|
| ![](figures/Fig2.3e.png){width="2.941in" height="3.522in"} | ![](figures/Fig2.3f.png){width="3.028in" height="2.812in"} |
A numeric column for the year (season) is needed for the time series
graphs. Hence, as shown below, we produce the second column, called
s_yr, also shown in Fig. 2.3e.
Use ***Prepare \> Column: Text \> Transform***, Fig. 2.3f. Complete the
resulting dialogue, as shown in Fig. 2.3g, to give just the starting
year of the season. The resulting variable is shown in Fig. 2.3e.
| ***Fig. 2.3g The Substring Option*** | ***Fig. 2.3h Convert Column to Numeric*** |
|:----------------------------------------:|:---------------------------------------------------:|
| ***Prepare \> Column: Text \> Transform*** | |
| ![](figures/Fig2.3c.png){width="3.460in" height="3.001in"} | ![](figures/Fig2.3d.png){width="2.004in" height="2.891in"} |
Use the ***right-click*** menu, Fig. 2.3h to convert the resulting s_yr
column to numeric.
After a little further housekeeping from the right-click menu, to
***rename***, ***re-orde***r and ***delete*** columns, the annual data
are as shown in Fig. 2.3e above.
Now for the time-series graphs. They can be produced using the
***Describe \> Specific \> Line Plot*** dialogue, but this type of graph
is just what is needed for the PICSA-style rainfall graphs, so we use
the special climatic menu for the first time.
Use ***Climatic \> PICSA \> Rainfall Graph***. Complete as shown in Fig.
2.3i. Press the **PICSA Options** button and complete the Lines ab as
shown in Fig. 2.3j to add (and label) a horizontal line for the mean.
| ***Fig. 2.3i PICSA Rainfall graph dialogue*** | ***Fig. 2.3j Add a line showing the mean*** |
|:----------------------------------------:|:---------------------------------------------------:|
| ***CLimatic \> PICSA \> Rainfall Graph*** | |
| ![](figures/Fig2.3c.png){width="2.796in" height="3.340in"} | ![](figures/Fig2.3d.png){width="3.264in" height="2.318in"} |
The resulting graph is shown in Fig 2.3k[^4]. ***Return to the
dialogue*** and put ***raindays*** as the y-variable to give the results
in Fig. 2.3l.
| ***Fig. 2.3k Seasonal rainfall totals*** | ***Fig. 2.3l Number of rain days*** |
|:----------------------------------------:|:---------------------------------------------------:|
| ![](figures/Fig2.3k.png){width="3.017in" height="1.502in"} | ![](figures/Fig2.3l.png){width="2.872in" height="1.429in"} |
These graphs indicate large inter-annual variability, but they don't
seem to show a trend. That is important because, if you can attribute
your farming problems to climate change, then there may be nothing you
can do. But coping with the variability is what farmers have always had
to do.
With results such as shown in Fig. 2.3k and 2.3l you can start comparing
risks for different options in your farming and in other enterprises.
That sort of idea is discussed in PICSA workshops.
Some may find the graph shown above to be convincing evidence that, with
rainfall, the pressing problem is variability, rather than change. We
stress that there IS climate change, and similar graphs with temperature
data show a trend. If the temperatures have changed, then the "system"
has changed, and it follows that other elements including rainfall will
be affected. Currently, however, with this sort of analysis, it is
usually not yet possible to determine which way the pattern of rainfall
may change. It is difficult to detect a small change when the
inter-annual variability is so large. And, even if a change is detected,
coping as well as possible with the variability must be a good thing to
do.
Some people are not convinced by graphs such as are shown above. A
common statement is that the annual totals that might still be similar,
but the season is shorter, because planting is delayed, etc. We examine
this in more detail in Chapter 7. There the daily data are used to
define the start, end and length of the season as well as to examine dry
spells and extremes during the season. With the monthly total, the
examination can start by repeating the analysis above, but just for
November and December, when the season starts.
***Return to the monthly data frame*** and ***filter*** to examine just
those months. So, make sure you are on the monthly data. ***Right
click*** as usual and choose ***Filter***, Fig. 2.3m
| ***Fig. 2.3m Right-click for Filter*** | ***Fig. 2.3n The filter dialogue*** |
|:----------------------------------------:|:---------------------------------------------------:|
| ![](figures/Fig2.3c.png){width="2.886in" height="6.250in"} | ![](figures/Fig2.3l.png){width="6.944in" height="3.659in"} |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
***Fig. 2.3m Right-click for Filter*** ***Fig. 2.3n The filter dialogue***
------------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------------------------
![](media/image1104.png){width="2.8858759842519683in" ![](media/image1106.png){width="6.944444444444444e-3in" height="6.250328083989501e-2in"}![](media/image1106.png){width="6.944444444444444e-3in"
height="3.6586843832021in"} height="6.250328083989501e-2in"}![C:\\Users\\ROGERS\~1\\AppData\\Local\\Temp\\SNAGHTML1993cb5.PNG](media/image1107.png){width="3.2437456255468065in"
height="3.2540988626421696in"}
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
In Fig. 2.3n, click to ***Define new Filter***. Complete the
sub-dialogue as shown in Fig. 2.3o. The steps are as follows:
1) ***Choose the month*** column
2) ***Select Nov and Dec*** as shown in Fig. 2.3p
3) Click to ***Add Condition***
4) Press ***Return***
| ***Fig. 2.3o Define the filter*** | ***Fig. 2.3p The filtered data*** |
|:----------------------------------------:|:---------------------------------------------------:|
| ***Prepare \> Column: Reshape \> Column Summaries*** | |
| ![](figures/Fig2.3o.png){width="3.992in" height="2.858in"} | ![](figures/Fig2.3p.png){width="2.056in" height="2.975in"} |
Back on the main filter dialogue, just ***press Ok***. The data are now
as shown in Fig. 2.3p. The first column is in red and this shows a
filter is in operation. Also, at the bottom of the data frame, you see
there are now 176 rows (months) of data to analyse, out of the original
1062 rows.
The other data have not gone away. If ever you wish to return, then just
press right-click as before, Fig. 2.3m and choose the last option to
***Remove Current Filter***.
Now it is quick to repeat the steps above for this analysis. It is
simpler to recall the last dialogues as shown in Fig. 2.3q.
| ***Fig. 2.3q Recall the last dialogues*** | ***Fig. 2.3r The Column Statistics dialogue*** |
|:----------------------------------------:|:---------------------------------------------------:|
|![](figures/Fig2.3q.png){width="2.604in" height="2.746in"} | ![](figures/Fig2.3r.png){width="3.315in" height="3.458in"} |
The Column Statistics dialogue and sub-dialogue remain completed from
before, Fig. 2.3r. So just ***press Ok***.
The new columns have added to the existing annual sheet. So, go straight
to the PICSA Rainfall Graphs dialogue again. ***Choose the new
variable*** for the November-December totals and press OK. The mean is
now 286mm for the 2 months. Repeat for the number of rain days to give
the graphs for the filtered data, see Fig. 2.3s and 2.3t.
| ***Fig. 2.3s Rainfall totals Nov-Dec*** | ***Fig. 2.3t Number of rain days Nov-Dec*** |
|:----------------------------------------:|:---------------------------------------------------:|
|![](figures/Fig2.3s.png){width="2.967in" height="1.476in"} | ![](figures/Fig2.3t.png){width="3.032in" height="1.509in"} |
One feature of the data in Fig. 2.3s is that had the record started in
1981, then it might have given an impression of an upward trend in the
rainfall total. The longer record shows that this sort of conclusion
should be treated with considerable caution!
These graphs start to satisfy the objective of examining the rainfall
data in Southern Zambia for trends. The results should be considered as
provisional if only because they are from just a single station and only
use the monthly data. We suggest they make a case for a more complete
analysis with multiple stations.
## Comparing satellite and station data
Our proposed objective is to report on the feasibility of using
satellite estimates of sunshine hours to supplement the information from
station data. The station network for sunshine or radiation is sparse in
many countries. Where it exists the records often have many missing
values.
Estimates of daily hours of sunshine are also available from the
EUMETSAT CMSAF (Climate Monitoring Satellite Application Facility) for
about 5km square pixels. The data are available from 1983 and may be
downloaded free of charge. These data are in NetCDF files and examples
from a few locations have been downloaded and are the R-Instat library.
Data from Dodoma, Tanzania was analysed in the second tutorial and the
same dataset is used here. In this exercise, these data are merged with
the corresponding satellite data and the two variables are then
compared.
As in the tutorial use ***File \> Open From Library***. Choose the
***Instat collection***. Browse to the ***Climatic directory*** and
choose the ***Original Climatic Guide datasets***. Choose just the
***Dodoma*** sheet, Fig. 2.4a.
| ***Fig. 2.4a Importing the Dodoma data*** | ***Fig. 2.4b The Summaries sub-dialogue*** |
|:----------------------------------------:|:---------------------------------------------------:|
| ![](figures/Fig2.4a.png){width="3.179in" height="2.681in"} | ![](figures/Fig2.4b.png){width="2.830in" height="2.695in"} |
Once imported use ***Prepare \> Column: Date \> Make Date***, Fig. 2.4b
to construct a date column from the Year, Month, Day columns. Name the
resulting column as ***Date***, see Fig. 2.4b.
Now use ***File \> Import and Tidy NetCDF File***, Fig. 2.4c. Choose the
option ***From Library*** and the file that starts ***CMSAF_SDU*** (for
sunshine duration). This file contains just the data from the nearest
pixel to the Dodoma station data.
| ***Fig. 2.4c Import the CMSAF Satellite data*** | ***Fig. 2.4d Change the name to Date*** |
|:----------------------------------------:|:---------------------------------------------------:|
|![](figures/Fig2.4c.png){width="2.988in" height="1.781in"} | ![](figures/Fig2.4d.png){width="3.040in" height="1.774in"} |
Once imported, ***right-click*** to change the name of the last column
from ***time_date*** to ***Date***, i.e. to the same name as in the
station data, Fig. 2.4d[^5].
| ***Fig. 2.4e The Merge dialogue*** | ***Fig. 2.4f Sub-dialogue to add just SDU*** |
|:----------------------------------------:|:---------------------------------------------------:|
| ![](figures/Fig2.4e.png){width="3.111in" height="2.508in"} | ![](figures/Fig2.4f.png){width="3.245in" height="2.277in"} |
Use ***Prepare \> Column: Reshape \> Merge*** and complete the dialogue
as shown in Fig. 2.4e. It chooses to match on the Date columns, which is
what we want. That is why we gave them the same name in the two data
frames[^6].
The SDU column is all we need from the satellite data. So, press the
***Merge Options*** button in Fig. 2.4e. Use the ***Columns to
Include*** tab and complete as shown in Fig. 2.4f. Press ***Return***
and then Ok.
Now check, using ***Describe \> One Variable \> Summarise*** on the
merged data. Choose all the columns and press Ok. The results are in
Fig. 2.4g.
| ***Fig. 2.4g Results from One Variable Summarise*** | ***Fig. 2.4h Generate columns from the Date*** |
|:----------------------------------------:|:---------------------------------------------------:|
| ***Describe \> One Variable \> Summarise*** | ***Prepare \> Column: Date \> Make Date***
| ![](figures/Fig2.4g.png){width="3.249in" height="2.516in"} | ![](figures/Fig2.4h.png){width="2.799in" height="3.594in"} |
An encouraging sign in Fig. 2.4g is that summary statistics for the Sunh
(from the station) and SDU (from the satellite) are almost identical.
The maximum of 16 hours for SDU is a little concerning, because that is
probably longer than the maximum day length at Dodoma.
Some "housekeeping" is useful, because the results in Fig. 2.4g also
show there are some missing values in the year and day of month
column[^7].
***Right-click*** and ***delete the first 3 columns***. Then generate
them again (without missing values) using ***Prepare \> Column: Date \>
Use Date*** dialogue, Fig. 2.4h. Then use Describe \> One Variable \>
Summarise again to confirm the new columns do not have missing values.
| ***Fig. 2.4i Resulting merged data*** | ***Fig. 2.4j Correlations dialogue*** |
|:----------------------------------------:|:---------------------------------------------------:|
| ![](figures/Fig2.4g.png){width="3.285in" height="2.760in"} | ![](figures/Fig2.4j.png){width="2.792in" height="2.812in"} |
***Right-Click*** and choose ***Reorder Column(s)***. The resulting data
should be like that shown in Fig. 2.4i.
We are now ready to compare the satellite data (SDU) with the station
values (sunh).
| ***Fig. 2.4k Correlations sub-dialogue*** | ***Fig. 2.4l Results*** |
|:----------------------------------------:|:---------------------------------------------------:|
| ![](figures/Fig2.4k.png){width="2.906in" height="3.124in"} | ![](figures/Fig2.4l.png){width="3.094in" height="3.094in"} |
Use ***Describe \> Multivariate \> Correlations***. and enter SDU and
sunh, Fig. 2.4j. Click on ***Options*** and choose ***Scatter Matrix***.
Fig. 2.4k. The results are in Fig. 2.4l.
The results in Fig. 2.4l look promising. The shape of the satellite
(bottom right) and station data (top left) look similar and the
correlation is a reasonably satisfactory 0.87.
What is next? These data (both sunh and SDU) are time series. Time
series have seasonality, and this should usually be reflected in the
analysis.
So, return to the ***correlations dialogue*** and sub-dialogue, Fig.
2.4k and ***add the month factor***.
| ***Fig. 2.4m Including months in the analysis*** | ***Fig. 2.4n The Histogram dialogue*** |
|:----------------------------------------:|:---------------------------------------------------:|
|![](figures/Fig2.4m.png){width="2.931in" height="2.931in"} | ![](figures/Fig2.4n.png){width="3.064in" height="2.494in"} |
The results in Fig. 2.4m show that the shape of both variables depends
on the month. In particular (as expected) there is often less sun in the
rainy season (November to April) and the correlations are then higher.
The display, in Fig. 2.4m, is also confusing as there are now too many
groups to see clearly what is happening.
It is time to split up the components of the results in Fig. 2.4m to
compare the satellite and station data in more detail.
Use ***Describe \> Specific \> Histogram***, Fig. 2.4n.
| ***Fig. 2.4o A set of density graphs*** | ***Fig. 2.4p Include facets in the graph*** |
|:----------------------------------------:|:---------------------------------------------------:|
| ![](figures/Fig2.4o.png){width="2.856in" height="2.822in"} | ![](figures/Fig2.4p.png){width="3.222in" height="2.833in"} |
Change the button at the top of Fig. 2.4o to Density and click on Plot
Options.
In the sub-dialogue, Fig. 2.4p tick the checkbox to include facets and
include the month factor.
| ***Fig. 2.4q Multiple variables*** | ***Fig. 2.4r Resulting graphs*** |
|:----------------------------------------:|:---------------------------------------------------:|
| ***Describe \> One Variable \> Summarise*** | ***Prepare \> Column: Date \> Make Date***
| ![](figures/Fig2.4q.png){width="1.838in" height="1.876in"} | ![](figures/Fig2.4r.png){width="4.200in" height="2.090in"} |
Return to the main dialogue, and click to include multiple variables,
Fig. 2.4q. Include Sunh and SDU and press Ok. The graphs from Fig. 2.4m
are now overlaid, so they can easily be compared, and displayed
separately for each month. Fig. 2.4r shows the pattern is similar in
each month. We also see the sharp peaks in the dry months, particularly
from June to October, when most days have about 10 hours of sunshine per
day.
Fig. 2.4r shows the pattern of sunshine is similar from the satellite
and station data. It does not, however, show whether a day in any month
with more sunshine at the station (sunh), also had more sunshine from
the satellite data (SDU). For this, we look at the scatterplot from Fig.
2.4m again broken into the monthly facets.
Use ***Describe \> Specific \> Scatterplot*** and complete the dialogue
as shown in Fig. 2.4s. Press on ***Plot Options*** and include the
months as facets, Fig. 2.4t, just as earlier in Fig. 2.4p.
| ***Fig. 2.4s Scatterplot dialogue*** | ***Fig. 2.4t Plotting sub-dialogue*** |
|:----------------------------------------:|:---------------------------------------------------:|
| ![](figures/Fig2.4g.png){width="3.050in" height="2.945in"} | ![](figures/Fig2.4h.png){width="2.893in" height="2.466in"} |
The resulting set of graphs is shown in Fig. 2.4u.
-----------------------------------------------------------------------
***Fig. 2.4u Scatterplots for each month***
-----------------------------------------------------------------------
![](figures/Fig2.4u.png){width="6.131447944006999in"
height="3.0514577865266843in"}
-----------------------------------------------------------------------
Our initial objective was to examine whether the satellite estimates may
be useful in Tanzania to supplement the station data. The results are
promising, but this just the start. There are many possible next steps,
including an examination of the occasions when the two variables differ
substantially. The analysis should also be extended to multiple
stations. We also need more numerical summaries to measure how close the
two variables are. Chapter 10 considers this subject in more detail.
## In conclusion
[To be added]{.mark}