# Programme And Abstracts For Wednesday 13^th^ Of December {#Wednesday .unnumbered}
<div id = "talk_009"><p class="contribBanner">Wednesday 13<sup>th</sup> 10:30 098 Lecture Theatre (260-098)</p></div>
## Promoting Your R Package {.unnumbered}
<p style="text-align:center">
Hadley Wickham<br />
RStudio<br />
</p>
<span>**Abstract:**</span> Your new statistical or data science tool is much more likely to be used
if you provide it in a convenient form, like an R package. But how do
people find out that your R package exists? I’ll provide a comprehensive
overview of the options, including creating excellent documentation
(with roxygen2) and vignettes (with rmarkdown), creating a package
website (with pkgdown), and promoting your work on social media.
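A minimal sketch of the documentation-and-website workflow mentioned above, assuming a hypothetical package directory `mypkg` that already contains roxygen2 comments:

```r
# Hypothetical package directory "mypkg"; both calls are standard devtools/pkgdown entry points.
# install.packages(c("devtools", "pkgdown"))
devtools::document("mypkg")    # generate man/ pages and NAMESPACE from roxygen2 comments
pkgdown::build_site("mypkg")   # render a documentation website into mypkg/docs
```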
<span>**Keywords:**</span> R packages, websites
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Wednesday-tbl">Return to Programme</a><br/><br/></p>
<p class="pagebreak"></p>
<div id = "talk_158"><p class="contribBanner">Wednesday 13<sup>th</sup> 10:30 OGGB4 (260-073)</p></div>
## A Smoothing Filter Modelling Approach For Time Series {.unnumbered}
<p style="text-align:center">
Marco Reale^1^, Granville Tunnicliffe Wilson^2^, and John Haywood^3^<br />
^1^University of Canterbury<br />
^2^Lancaster University<br />
^3^Victoria University of Wellington<br />
</p>
<span>**Abstract:**</span> We introduce different representations of a
new model for time series based on repeated application of a filter to
the original data. They can represent correlation structure to higher
lags with fewer coefficients and they can provide a robust prediction at
higher lead times.
<span>**Keywords:**</span> Time series, smoothing, parsimonious models
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Wednesday-tbl">Return to Programme</a><br/><br/></p>
<p class="pagebreak"></p>
<div id = "talk_003"><p class="contribBanner">Wednesday 13<sup>th</sup> 10:30 OGGB5 (260-051)</p></div>
## Online Learning For Bayesian Nonparametrics: Weakly Conjugate Approximation {.unnumbered}
<p style="text-align:center">
Yongdai Kim^1^, Kuhwan Jeong^1^, Byungyup Kang^2^, and Hyoju Chung^2^<br />
^1^Seoul National University<br />
^2^NAVER Corp.<br />
</p>
<span>**Abstract:**</span> We propose a new online learning method for
Bayesian nonparametric (BNP) models, the so-called <span>*weakly conjugate
approximation*</span> (WCA). We consider classes of BNP priors which are
weakly conjugate. Here, ‘weakly conjugate prior’ means that the
resulting posterior can be easily approximated by an efficient MCMC
algorithm.
Suppose the whole data set is divided into two groups, say
${{\bf x}}=({{\bf x}}^{old},{{\bf x}}^{new}).$ Then, the Bayes rule
implies
$p(\theta|{{\bf x}}) \propto p({{\bf x}}^{new}|\theta) p(\theta|{{\bf x}}^{old}),$
where $\theta$ is the parameter. WCA replaces
$p(\theta|{{\bf x}}^{old})$ with $p^{wk}(\theta|\eta)$ where the proxy
parameter $\eta$ is estimated by minimizing the Kullback-Leibler (KL)
divergence
${\mathbb{E}}_{p(\theta|{{\bf x}}^{old})}\left\{ \log p(\theta|{{\bf x}}^{old}) - \log p^{wk}(\theta|\eta)\right\}.$
It can be easily approximated when we can generate samples from
$p(\theta|{{\bf x}}^{old}).$ To be more specific, suppose
$\theta_1,\ldots,\theta_M$ are samples generated from
$p(\theta|{{\bf x}}^{old}).$ Then, we can estimate $\eta$ by minimizing
$\sum_{j=1}^M\left\{ \log p(\theta_j|{{\bf x}}^{old}) - \log p^{wk}(\theta_j|\eta)\right\}/M.$
To apply WCA for online learning with multiple batches, suppose the
whole data ${{\bf x}}$ are divided into multiple small batches as
${{\bf x}}=({{\bf x}}^{[1]},\ldots,{{\bf x}}^{[S]}).$ A WCA algorithm
sequentially approximates
$p(\theta|{{\bf x}}^{[1]},\ldots,{{\bf x}}^{[s]})$ by
$p^{wk}(\theta|\eta_s),$ where $\eta_s$ is the proxy parameter
minimizing the approximated KL divergence. Since $p^{wk}(\theta|\eta)$
is weakly conjugate, we can easily generate samples from
$p({{\bf x}}^{[s]}|\theta)p^{wk}(\theta|\eta_{s-1}),$ and hence easily
update $\eta_s.$
We compare several online learning algorithms by analyzing
simulated/real data sets in Dirichlet process mixture models and
hierarchical Dirichlet processes topic models. The proposed method shows
better accuracy in our experiments.
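A minimal numerical sketch of the KL-minimisation step, assuming a Gaussian proxy family $p^{wk}(\theta|\eta)$ and simulated draws standing in for samples from $p(\theta|{{\bf x}}^{old})$; this is an illustration, not the authors' implementation:

```r
# Approximate a posterior by a Gaussian proxy p^wk(theta | eta), eta = (mean, log sd),
# by minimising the Monte Carlo KL criterion from the abstract.
set.seed(1)
theta_samples <- rnorm(1000, mean = 2, sd = 0.5)   # stand-ins for MCMC draws from p(theta | x^old)

neg_avg_log_pwk <- function(eta) {
  # Only the second term of the KL criterion depends on eta, so minimising the
  # average of -log p^wk(theta_j | eta) over the samples is equivalent.
  -mean(dnorm(theta_samples, mean = eta[1], sd = exp(eta[2]), log = TRUE))
}
eta_hat <- optim(c(0, 0), neg_avg_log_pwk)$par
c(mean = eta_hat[1], sd = exp(eta_hat[2]))          # proxy parameters close to (2, 0.5)
```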
<span>**Keywords:**</span> online learning, weakly conjugate
approximation, Dirichlet process mixture model, hierarchical Dirichlet
processes
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Wednesday-tbl">Return to Programme</a><br/><br/></p>
<p class="pagebreak"></p>
<div id = "talk_050"><p class="contribBanner">Wednesday 13<sup>th</sup> 10:30 Case Room 2 (260-057)</p></div>
## Improving The Production Cycle At Stats NZ With RStudio {.unnumbered}
<p style="text-align:center">
Gareth Minshall and Chris Hansen<br />
Stats NZ<br />
</p>
<span>**Abstract:**</span> Stats NZ are looking to move away from the
collection and publication of stand-alone surveys to making use of a
wide range of data sources and estimation strategies. A key component to
enabling this change is to develop the infrastructure which allows
analysts to explore, test and use a range of tools which are not
traditionally heavily used within National Statistics Offices. One of
the tools Stats NZ is looking to make heavier use of is R. This talk
will outline the development of internal RStudio and Shiny servers at
Stats NZ, and give examples demonstrating the types of innovation
RStudio has enabled at Stats NZ to improve the way we produce and
disseminate statistics.
<span>**Keywords:**</span> Shiny, R Markdown, Official Statistics
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Wednesday-tbl">Return to Programme</a><br/><br/></p>
<p class="pagebreak"></p>
<div id = "talk_063"><p class="contribBanner">Wednesday 13<sup>th</sup> 10:30 Case Room 3 (260-055)</p></div>
## A Max-Type Multivariate Two-Sample Baumgartner Statistic {.unnumbered}
<p style="text-align:center">
Hidetoshi Murakami<br />
Tokyo University of Science<br />
</p>
<span>**Abstract:**</span> A multivariate two-sample testing problem is
one of the most important topics in nonparametric statistics. Further, a
max-type Baumgartner statistic based on the modified Baumgartner
statistic (Murakami, 2006) was proposed by Murakami (2012) for testing
the equality of two continuous distribution functions. In this paper, a
max-type multivariate two-sample Baumgartner statistic is suggested
based on Jurečková and Kalina’s ranks of distances (Jurečková and
Kalina, 2012). Simulations are used to investigate the power of the
suggested statistic for various population distributions. The results
indicate that the proposed test statistic is more suitable than various
existing statistics for testing a shift in the location, scale and
location-scale parameters.
<span>**Keywords:**</span> Baumgartner statistic, Jurečková & Kalina’s
ranks of distances, Multivariate two-sample rank test, Power comparison
<span>**References:**</span>
Jurečková, J. and Kalina, J. (2012). Nonparametric multivariate rank
tests and their unbiasedness. <span>*Bernoulli*</span>, **18**, 229–251.
Murakami, H. (2006). A $k$-sample rank test based on the modified
Baumgartner statistic and its power comparison. <span>*Journal of the
Japanese Society of Computational Statistics*</span>,
<span>**19**</span>, 1–13.
Murakami, H. (2012). A max-type Baumgartner statistic for the two-sample
problem and its power comparison. <span>*Journal of the Japanese Society
of Computational Statistics*</span>, **25**, 39–49.
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Wednesday-tbl">Return to Programme</a><br/><br/></p>
<p class="pagebreak"></p>
<div id = "talk_062"><p class="contribBanner">Wednesday 13<sup>th</sup> 10:30 Case Room 4 (260-009)</p></div>
## Random Search Global Optimization Using Random Forests {.unnumbered}
<p style="text-align:center">
Blair Robertson, Chris Price, and Marco Reale<br />
University of Canterbury<br />
</p>
<span>**Abstract:**</span> The purpose of a global optimization
algorithm is to efficiently find an objective function’s global minimum.
In this talk we consider bound constrained global optimization, where
the search is performed in a box, denoted $\Omega$. The global
optimization problem is deceptively simple and it is usually difficult
to find the global minimum. One of the difficulties is that there is
often no way to verify that a local minimum is indeed the global
minimum. If the objective function is convex, the local minimum is also
the global minimum. However, many optimization problems are not convex.
Of particular interest in this talk are objective functions that lack
any special properties such as continuity, smoothness, or a Lipschitz
constant.
A random search algorithm for bound constrained global optimization is
presented. This algorithm alternates between partition and sampling
phases. At each iteration, points sampled from $\Omega$ are classified
as low or high based on their objective function values. These classified
points define training data that is used to partition $\Omega$ into low
and high regions using a random forest. The objective function is then
evaluated at a number of points drawn from the low region and from
$\Omega$ itself. Drawing points from the low region focuses the search
in areas where the objective function is known to be low. Sampling
$\Omega$ reduces the risk of missing the global minimum and is necessary
to establish convergence. The new points are then added to the existing
training data and the method repeats.
A preliminary simulation study showed that alternating between random
forest partition and sampling phases was an effective strategy for
solving a variety of global optimization test problems. The authors are
currently refining the method and extending the set of test problems.
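A rough sketch of one partition-and-sample iteration, assuming the `randomForest` package and a made-up two-dimensional test problem; the authors' thresholds and sampling rules are not reproduced here:

```r
library(randomForest)
set.seed(1)
f <- function(x) sum(x^2)                       # toy objective on the box Omega = [-5, 5]^2
X <- matrix(runif(200, -5, 5), ncol = 2, dimnames = list(NULL, c("x1", "x2")))
y <- apply(X, 1, f)
cls <- factor(ifelse(y <= quantile(y, 0.3), "low", "high"))  # classify points by objective value

rf <- randomForest(X, cls)                      # partition Omega into low and high regions

# Draw candidates from Omega, keep those the forest predicts to be "low", and add a
# few points from Omega itself to retain global coverage.
cand  <- matrix(runif(2000, -5, 5), ncol = 2, dimnames = list(NULL, c("x1", "x2")))
low   <- cand[predict(rf, cand) == "low", , drop = FALSE]
new_X <- rbind(head(low, 20),
               matrix(runif(10, -5, 5), ncol = 2, dimnames = list(NULL, c("x1", "x2"))))
# Evaluate f at new_X, append to (X, y) and repeat.
```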
<span>**Keywords:**</span> Bound constrained optimization,
classification and regression trees (CART), stochastic optimization
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Wednesday-tbl">Return to Programme</a><br/><br/></p>
<p class="pagebreak"></p>
<div id = "talk_032"><p class="contribBanner">Wednesday 13<sup>th</sup> 10:50 098 Lecture Theatre (260-098)</p></div>
## gridSVG: Then And Now {.unnumbered}
<p style="text-align:center">
Paul Murrell<br />
University of Auckland<br />
</p>
<span>**Abstract:**</span> The <span><span>**gridSVG**</span></span>
package [@RJ-2014-013] was first developed in 2003 to experiment with
features of the SVG format that were not available through a normal R
graphics device [@R], such as hyperlinks and animation. A number of
different R packages [@rsvgtipsdevice; @cairo; @svglite; @svgannotation]
have been developed since then to allow the generation of SVG output
from R, but <span><span>**gridSVG**</span></span> has remained unique in
its focus on generating structured and labelled SVG output. The reason
for that was to maximise support for customisation and reuse,
particularly unforeseen reuse, of the SVG output. Unfortunately, there
were two major problems: killer examples of customisation and reuse
failed to materialise; and the production of SVG with
<span><span>**gridSVG**</span></span> was painfully slow. In brief,
<span><span>**gridSVG**</span></span> was a (sluggish) solution waiting
for a problem. This talk charts some of the developments over time that
have seen <span><span>**gridSVG**</span></span>’s patient wait for
relevance ultimately rewarded and its desperate need for speed finally
satisfied.
<span>**Keywords:**</span> R, statistical graphics, SVG, accessibility
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Wednesday-tbl">Return to Programme</a><br/><br/></p>
<p class="pagebreak"></p>
<div id = "talk_188"><p class="contribBanner">Wednesday 13<sup>th</sup> 10:50 OGGB4 (260-073)</p></div>
## Probabilistic Outlier Detection And Visualization Of Smart Meter Data {.unnumbered}
<p style="text-align:center">
Rob Hyndman<br />
Monash University<br />
</p>
<span>**Abstract:**</span> It is always a good idea to plot your data before fitting any models,
making any predictions, or drawing any conclusions. But how do you
actually plot data on thousands of smart meters, each comprising
thousands of observations over time? We cannot simply produce time plots
of the demand recorded at each meter, due to the sheer volume of data
involved.
I will propose an approach in which each long series of demand data is
converted to a single two-dimensional point that can be plotted in a
simple scatterplot. In that way, all the meters can be seen in the
scatterplot; so outliers can be detected, clustering can be observed,
and any other interesting structure can be examined. To illustrate, I
will use data collected during a smart metering trial conducted by the
Commission for Energy Regulation (CER) in Ireland.
First we estimate the demand percentiles for each half hour of the week,
giving us 336 probability distributions per household. Then, we compute
the distances between pairs of households using the sum of
Jensen–Shannon distances.
From these pairwise distances, we can compute a measure of the
“typicality” of a specific household, by seeing how many similar houses
are nearby. If there are many households with similar probability
distributions, the typicality measure will be high. But if there are few
similar households, the typicality measure will be low. This gives us a
way of finding anomalies in the data set — they are the smart meters
corresponding to the least typical households.
The pairwise distances between households can also be used to create a
plot of all households together. Each of the household distributions can
be thought of as a vector in $K$-dimensional space where
$K=7\times48\times99 = 33,264$. To easily visualize these, we need to
project them onto a two-dimensional space. I propose using Laplacian
eigenmaps which attempt to preserve the smallest distances — so the most
similar points in $K$-dimensional space are as close as possible in the
two-dimensional space.
This way of plotting the data easily allows us to see the anomalies, to
identify any clusters of observations in the data, and to examine any
other structure that might exist.
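A minimal sketch of the Jensen–Shannon distance between two discretised demand distributions, with hypothetical inputs standing in for the CER percentile estimates:

```r
# Jensen-Shannon distance between two discretised distributions p and q.
js_distance <- function(p, q) {
  p <- p / sum(p); q <- q / sum(q)
  m <- (p + q) / 2
  kl <- function(a, b) sum(ifelse(a > 0, a * log(a / b), 0))   # Kullback-Leibler divergence
  sqrt(0.5 * kl(p, m) + 0.5 * kl(q, m))
}
# Hypothetical demand distributions over 99 percentile bins for one half hour:
p <- dnorm(1:99, mean = 50, sd = 10)
q <- dnorm(1:99, mean = 60, sd = 15)
js_distance(p, q)
```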
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Wednesday-tbl">Return to Programme</a><br/><br/></p>
<p class="pagebreak"></p>
<div id = "talk_004"><p class="contribBanner">Wednesday 13<sup>th</sup> 10:50 OGGB5 (260-051)</p></div>
## The Joint Models For Nonlinear Longitudinal And Time-To-Event Data Using Penalized Splines: A Bayesian Approach {.unnumbered}
<p style="text-align:center">
Thi Thu Huong Pham, Darfiana Nur, and Alan Branford<br />
Flinders University<br />
</p>
<span>**Abstract:**</span> Joint models for longitudinal data and
time-to-event data have been introduced to measure the association
between longitudinal data and survival time in clinical, epidemiological
and educational studies. The main aim of this talk is to estimate the
parameters in joint models for nonlinear longitudinal data and
time-to-event data using penalized splines within a Bayesian approach. To
perform this analysis, the joint posterior distribution of the baseline
hazard rate, the survival and longitudinal coefficients, and the random-effects
parameters is first introduced, followed by derivation of the
conditional posterior distribution for each parameter. Based on
these target posterior distributions, the parameters are
simulated using Metropolis, Metropolis-Hastings and Gibbs sampler
algorithms. An R program is written to implement the analysis. Finally,
a prior sensitivity analysis for the baseline hazard rate and
association parameters is performed, followed by simulation studies and
a case study.
<span>**Keywords:**</span> Bayesian analysis, Joint models, Longitudinal
data, MCMC algorithms, Prior sensitivity analysis, Survival data
<span>**References:**</span>
Rizopoulos, D. (2014). The R package JMbayes for fitting joint models
for longitudinal and time-to-event data using MCMC. *Journal of
Statistical Software*, 72(7), 1–45.
Brown, E. R., Ibrahim, J. G., and DeGruttola, V. (2005). A flexible
B-spline model for multiple longitudinal biomarkers and
survival. *Biometrics*, 61(1), 64–73.
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Wednesday-tbl">Return to Programme</a><br/><br/></p>
<p class="pagebreak"></p>
<div id = "talk_053"><p class="contribBanner">Wednesday 13<sup>th</sup> 10:50 Case Room 2 (260-057)</p></div>
## R – A Powerful Analysis Tool To Improve Official Statistics In Romania {.unnumbered}
<p style="text-align:center">
Nicoleta Caragea^1,2^ and Antoniade Ciprian Alexandru^1,2^<br />
^1^National Institute of Statistics<br />
^2^Ecological University of Bucharest<br />
</p>
<span>**Abstract:**</span> This presentation focuses on how R is used
in Romanian official statistics to improve the quality of results
provided by different statistical data sources on the basis of
administrative data. Some benefits for statistical analysis arise when it
is possible to link administrative records from different registers
together, or when they can be linked with censuses or sample surveys.
Many of these record linkage or matching methods must be carried out under
statistical conditions, and R is one of the most powerful
analysis tools for this purpose. In Romania, there has been increasing attention in recent
years to the use of R in official statistics, through specialized R courses for
statisticians and on-the-job training sessions. An international
conference on R (uRos) is organized yearly to provide a public forum for
researchers from academia and institutes of statistics. There is also
continuous work to develop statistics based on Big Data, Romania being
part of the ESSnet Big Data project.
<span>**Keywords:**</span> R package, data sources, statistics, matching
method, linkage method
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Wednesday-tbl">Return to Programme</a><br/><br/></p>
<p class="pagebreak"></p>
<div id = "talk_071"><p class="contribBanner">Wednesday 13<sup>th</sup> 10:50 Case Room 3 (260-055)</p></div>
## Simultaneous Test For Mean Vectors And Covariance Matrices In High-Dimensional Settings {.unnumbered}
<p style="text-align:center">
Takahiro Nishiyama^1^ and Masashi Hyodo^2^<br />
^1^Senshu University<br />
^2^Osaka Prefecture University<br />
</p>
<span>**Abstract:**</span> Let $\mathbf{X}_{g1}, \mathbf{X}_{g2}, \ldots, \mathbf{X}_{gn_g}$ be i.i.d. random samples of
size $n_g$ from a $p$-dimensional population $\Pi_g$
($g \in \{1, 2\}$) with
$\mathrm{E}(\mathbf{X}_{gi})={\boldsymbol\mu}_g$
and $\mathrm{var}(\mathbf{X}_{gi})=\Sigma_g$
($i \in \{1, \ldots ,n_g\}$). In this talk, our primary interest is to
test the following hypothesis when $p > \min\{n_1-1, n_2-1 \}$:
$$\begin{aligned}
H_0 : {\boldsymbol\mu}_1 = {\boldsymbol\mu}_2,~ \Sigma_1 = \Sigma_2 \quad
\mbox{vs.} \quad H_1 : \mbox{not}~ H_0.
\end{aligned}$$
For this problem, we discuss an $L^2$-norm-based test for simultaneous
testing of mean vectors and covariance matrices among two non-normal
populations. To construct a test procedure, we propose a test statistic
based on both unbiased estimator of differences mean vectors proposed by
Chen and Qin (2010) and covariance matrices proposed by Li and Chen
(2012). Also, we derive an asymptotic distribution of this test
statistic and investigate the asymptotic sizes and powers of the
proposed test. Finally, we study the finite sample and dimension
performance of this test via Monte Carlo simulations.
<span>**Keywords:**</span> Asymptotic distribution, High-dimensional
data analysis, Testing hypothesis
<span>**References:**</span>
Chen, S.X. and Qin, Y.L. (2010). A two-sample test for high dimensional
data with applications to gene-set testing. *Ann. Statist.*, **38**,
808–835.
Li, J. and Chen, S.X. (2012). Two sample tests for high-dimensional
covariance matrices. *Ann. Statist.*, **40**, 908–940.
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Wednesday-tbl">Return to Programme</a><br/><br/></p>
<p class="pagebreak"></p>
<div id = "talk_075"><p class="contribBanner">Wednesday 13<sup>th</sup> 10:50 Case Room 4 (260-009)</p></div>
## Dimension Reduction For Classification Of High-Dimensional Data By Stepwise SVM {.unnumbered}
<p style="text-align:center">
Elizabeth Chou and Tzu-Wei Ko<br />
National Chengchi University<br />
</p>
<span>**Abstract:**</span> The purpose of this study is to build a
simple and intuitive wrapper method, stepwise SVM, for reducing
dimension and classification of large p small n datasets. The method
employs a suboptimum search procedure to determine the best subset of
variables for classification. The proposed method is compared with other
dimension reduction methods, such as Pearson product moment correlation
coefficient (PCCs), Recursive Feature Elimination based on Random Forest
(RF-RFE), and Principal Component Analysis (PCA) by using five gene
expression datasets. In this study, we show that stepwise SVM can
effectively select the important variables and perform well in
prediction. Moreover, the predictions on the reduced datasets from stepwise
SVM are better than those on the unreduced datasets. Compared with other
methods, the performance of stepwise SVM is more stable than PCA and
RF-RFE but it is difficult to tell the difference in performance from
PCCs. In conclusion, stepwise SVM can effectively eliminate the noise in
data and improve the prediction accuracy.
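A generic sketch of a forward stepwise wrapper around an SVM, assuming the `e1071` package and simulated large-$p$-small-$n$ data; the authors' exact search rule is not reproduced here:

```r
library(e1071)
set.seed(1)
n <- 60; p <- 100
X <- matrix(rnorm(n * p), n, p)                  # toy "large p, small n" data
y <- factor(ifelse(X[, 1] + X[, 2] + rnorm(n) > 0, "A", "B"))

cv_acc <- function(vars) {                       # 5-fold CV accuracy for a variable subset
  folds <- sample(rep(1:5, length.out = n))
  mean(sapply(1:5, function(k) {
    fit <- svm(X[folds != k, vars, drop = FALSE], y[folds != k])
    mean(predict(fit, X[folds == k, vars, drop = FALSE]) == y[folds == k])
  }))
}

selected <- integer(0)
for (step in 1:5) {                              # greedily add up to 5 variables
  cand   <- setdiff(1:p, selected)
  scores <- sapply(cand, function(v) cv_acc(c(selected, v)))
  selected <- c(selected, cand[which.max(scores)])
}
selected
```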
<span>**Keywords:**</span> Stepwise SVM, Dimension reduction, Feature
selection, High-dimension
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Wednesday-tbl">Return to Programme</a><br/><br/></p>
<p class="pagebreak"></p>
<div id = "talk_066"><p class="contribBanner">Wednesday 13<sup>th</sup> 11:10 098 Lecture Theatre (260-098)</p></div>
## Bringing Multimix From Fortran To R {.unnumbered}
<p style="text-align:center">
Murray Jorgensen<br />
Auckland University of Technology<br />
</p>
<span>**Abstract:**</span> Multimix is the name for a class of multivariate finite mixture models designed with clustering (<span>*unsupervised learning*</span>) in mind. It is also a name for a program to fit these models, written in Fortran77 by Lyn Hunt as part of her Waikato PhD thesis.
**Why convert to R?** Although written in the 1990s, Multimix is easy to convert to modern GNU Fortran (gfortran), but there are advantages to having an R version available. For users this means a simpler way of reading in the data and describing the form of the model. It also helps the ongoing development, improvement and modification of the Multimix models, and R's interactive environment provides a more comfortable place for experimentation.
**Designing the new program.** Rather than attempt any sort of translation of the old code, the new R version of Multimix is designed from the beginning as an R program. In my talk I will describe some of the design decisions made and the reasons for them. A particular concern was that the R version be as fast as possible.
**How to package up the new program?** Two versions of Multimix in R have been developed, a <span>*global*</span> version with many global variables employed, and a <span>*nested*</span> version restricting the scope of variables to the surrounding function. The pluses and minuses of each approach will be described. I am conscious that I may not always have made the best design decisions and comments from others will be welcomed.
<span>**Keywords:**</span> multivariate finite mixture models, clustering, package, global, local
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Wednesday-tbl">Return to Programme</a><br/><br/></p>
<p class="pagebreak"></p>
<div id = "talk_007"><p class="contribBanner">Wednesday 13<sup>th</sup> 11:10 OGGB4 (260-073)</p></div>
## Specification Of GARCH Model Under Asymmetric Error Innovations {.unnumbered}
<p style="text-align:center">
Oyebimpe Adeniji, Olarenwaju Shittu, and Kazeeem Adepoju<br />
University of Ibadan<br />
</p>
<span>**Abstract:**</span> An empirical analysis of the mean return and conditional variance of
the Nigeria Stock Exchange (NSE) index is performed using various error
innovations in GARCH models. The conventional GARCH model, which assumes a
normal error term, fails to capture volatility clustering, leptokurtosis
and the leverage effect, owing to the zero skewness and excess kurtosis of
the normal distribution. We re-specify the error distributions of the GARCH (p,q) model
using some thick-tailed distributions. Quasi-maximum
likelihood estimation was used for parameter estimation. The robust
model that best explains the NSE index is determined by log-likelihood and
model selection criteria. Our results show that GARCH models with
fat-tailed densities improve overall estimation of the
conditional variance. The GARCH model using the Beta-Skewed-t distribution
is the most successful for forecasting the NSE index.
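A hedged illustration of this type of specification, using the `rugarch` package's skewed Student-t innovations as a stand-in for the heavy-tailed densities discussed (the Beta-Skewed-t itself is not assumed to be available there), with simulated returns in place of the NSE index:

```r
library(rugarch)
set.seed(1)
returns <- diff(log(100 * cumprod(1 + rnorm(1000, 0.0005, 0.01))))  # simulated stand-in series
spec <- ugarchspec(variance.model = list(model = "sGARCH", garchOrder = c(1, 1)),
                   mean.model     = list(armaOrder = c(1, 0)),
                   distribution.model = "sstd")   # skewed Student-t error innovations
fit <- ugarchfit(spec, data = returns)
likelihood(fit)      # log-likelihood for comparing error distributions
infocriteria(fit)    # information criteria for model selection
```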
<span>**Keywords:**</span> GARCH, Nigeria stock index, Maximum Likelihood Estimation
(MLE), Beta-Skewed-t distributions
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Wednesday-tbl">Return to Programme</a><br/><br/></p>
<p class="pagebreak"></p>
<div id = "talk_064"><p class="contribBanner">Wednesday 13<sup>th</sup> 11:10 OGGB5 (260-051)</p></div>
## Performance Of Bayesian Credible Interval For Binomial Proportion Using Logit Transformation {.unnumbered}
<p style="text-align:center">
Toru Ogura^1^ and Takemi Yanagimoto^2^<br />
^1^Mie University Hospital<br />
^2^Institute of Statistical Mathematics<br />
</p>
<span>**Abstract:**</span> The confidence or credible interval for
the binomial proportion $p$ is one of the most widely employed statistical
analysis methods, and a variety of methods have been proposed. The
Bayesian credible interval has attracted recent researchers’ attention. One
of the promising methods is the highest posterior density (HPD)
interval, which is the shortest possible interval enclosing
$100(1-\alpha)$% of the posterior probability. The HPD interval
is often used because it is narrow compared to other credible intervals.
However, the HPD interval has some drawbacks when the binomial
proportion is small. To resolve them, we first calculate a credible
interval as the HPD interval of the logit-transformed parameter,
$\theta=\log\{p/(1-p)\}$, instead of $p$. Note that $\theta$ and $p$ are
the canonical and the mean parameters of the binomial distribution in
the exponential family, respectively. Writing the HPD interval of
$\theta$ as $(\theta_{l}, \theta_{u})$, we define the proposed credible
interval of $p$ as
$(p_{l}, p_{u})= \big( e^{\theta_{l}} / ( 1+e^{\theta_{l}} ), \, e^{\theta_{u}}/(1+e^{\theta_{u}}) \big)$.
It is explored in depth, and numerical comparison studies are conducted
to confirm its favorable performance, especially when the observed
number is small, such as 0 or 1. Practical datasets are analyzed to
examine the potential usefulness for applications in medical fields.
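A minimal sketch of the proposed construction, assuming a conjugate Beta(0.5, 0.5) prior and a hypothetical observation of 1 success in 20 trials:

```r
# HPD interval for the logit of a binomial proportion, back-transformed to the p scale.
hpd_from_samples <- function(s, level = 0.95) {
  s <- sort(s); n <- length(s); k <- floor(level * n)
  widths <- s[(k + 1):n] - s[1:(n - k)]           # widths of all intervals holding ~level of draws
  i <- which.min(widths)
  c(lower = s[i], upper = s[i + k])
}
set.seed(1)
x <- 1; n <- 20                                   # hypothetical data: 1 success in 20 trials
p_draws     <- rbeta(1e5, x + 0.5, n - x + 0.5)   # posterior draws of p under a Beta(0.5, 0.5) prior
theta_draws <- log(p_draws / (1 - p_draws))       # logit transformation
ci_theta <- hpd_from_samples(theta_draws)         # HPD interval (theta_l, theta_u)
plogis(ci_theta)                                  # back-transform to (p_l, p_u)
```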
<span>**Keywords:**</span> Bayesian credible interval, binomial
proportion, highest posterior density interval, logit transformation,
zero count
<span>**References:**</span>
Newcombe, R.G. (2012). *Confidence Intervals for Proportions and Related
Measures of Effect Size*. Florida: Chapman and Hall/CRC.
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Wednesday-tbl">Return to Programme</a><br/><br/></p>
<p class="pagebreak"></p>
<div id = "talk_054"><p class="contribBanner">Wednesday 13<sup>th</sup> 11:10 Case Room 2 (260-057)</p></div>
## Statistical Disclosure Control With R: Traditional Methods And Synthetic Data {.unnumbered}
<p style="text-align:center">
Matthias Templ<br />
Zurich University of Applied Sciences<br />
</p>
<span>**Abstract:**</span> The demand for and volume of data from
surveys, registers or other sources containing sensitive information on
persons or enterprises have increased significantly over the last
several years. At the same time, privacy protection principles and
regulations have imposed restrictions on the access and use of
individual data. Proper and secure microdata dissemination calls for the
application of statistical disclosure control methods to the data before
release. Traditional approaches to (micro)data anonymization, including
data perturbation methods, disclosure risk methods, data utility and
methods for simulating synthetic data have been made available in R.
After introducing the audience to the R packages sdcMicro and simPop,
the presentation will focus on new developments and research for
generating close-to-reality synthetic data sets using specific
model-based approaches. The resulting data can work as a proxy of
real-world data and they are useful for training purposes, agent-based
and/or microsimulation experiments and remote execution, and they can also
be provided as public-use files. The strengths and weaknesses of the
methods are highlighted, and a brief application to the European
Statistics on Income and Living Conditions survey is given.
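A brief hedged example of setting up an anonymisation problem, assuming `sdcMicro`'s bundled `testdata` and its `createSdcObj`/`kAnon` interface:

```r
library(sdcMicro)
data("testdata", package = "sdcMicro")
sdc <- createSdcObj(testdata,
                    keyVars = c("urbrur", "roof", "walls", "sex"),  # categorical quasi-identifiers
                    numVars = c("expend", "income"),                # continuous variables
                    weightVar = "sampling_weight")
sdc <- kAnon(sdc, k = 3)   # local suppression until 3-anonymity is reached
print(sdc)                 # summary of disclosure risk and information loss
```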
<span>**Keywords:**</span> Statistical Disclosure Control,
Anonymization, Disclosure Risk, Synthetic Data
<span>**References:**</span>
Templ, M. (2017). *Statistical Disclosure Control for Microdata. Methods
and Applications in R*, Springer International Publishing.
doi:10.1007/978-3-319-50272-4
Templ, M., Kowarik, A., Meindl, B. (2015). Statistical Disclosure
Control for Micro-Data Using the R Package sdcMicro. *Journal of
Statistical Software*, 67(4), 1-36. doi:10.18637/jss.v067.i04
Templ, M., Kowarik, A., Meindl, B., Dupriez, O. (2017). Simulation of
Synthetic Complex Data: The R Package simPop. *Journal of Statistical
Software*, 79(10), 1-38. doi:10.18637/jss.v079.i10
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Wednesday-tbl">Return to Programme</a><br/><br/></p>
<p class="pagebreak"></p>
<div id = "talk_108"><p class="contribBanner">Wednesday 13<sup>th</sup> 11:10 Case Room 3 (260-055)</p></div>
## High Dimensional Asymptotics For The Naive Canonical Correlation Coefficient {.unnumbered}
<p style="text-align:center">
Mitsuru Tamatani^1^ and Kanta Naito^2^<br />
^1^Doshisha University<br />
^2^Shimane University<br />
</p>
<span>**Abstract:**</span> In this talk we investigate the asymptotic
behavior of the estimated naive canonical correlation coefficient under
the normality assumption and High Dimension Low Sample Size (HDLSS)
settings. In general, canonical correlation matrix is associated with
canonical correlation analysis which is useful in studying the
relationship between two sets of variables. However, in HDLSS settings,
the within-class sample covariance matrix $\hat{\Sigma}$ is singular,
because the rank of $\hat{\Sigma}$ is much less than the
dimension. To avoid the singularity of $\hat{\Sigma}$ in HDLSS settings,
we utilize the naive canonical correlation matrix, replacing the sample
covariance matrix by its diagonal part only. We derive the asymptotic
normality of the estimated naive canonical correlation coefficient, and
compare the results of our numerical studies to the theoretical
asymptotic results.
<span>**Keywords:**</span> High dimension low sample size, Naive
canonical correlation coefficient, Asymptotic normality
<span>**References:**</span>
Tamatani, M., Koch, I. and Naito, K. (2012). *Journal of Multivariate
Analysis*, **111**, 350–367.
Srivastava, M. S. (2011). *Journal of Multivariate Analysis*, **102**,
1190–1103.
Fan, J. and Fan, Y. (2008). *The Annals of Statistics*, **36**,
2605–2637.
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Wednesday-tbl">Return to Programme</a><br/><br/></p>
<p class="pagebreak"></p>
<div id = "talk_096"><p class="contribBanner">Wednesday 13<sup>th</sup> 11:10 Case Room 4 (260-009)</p></div>
## Deep Learning High-Dimensional Covariance Matrices {.unnumbered}
<p style="text-align:center">
Philip Yu and Yaohua Tang<br />
University of Hong Kong<br />
</p>
<span>**Abstract:**</span> Modeling and forecasting covariance matrices
of asset returns play a crucial role in finance. The availability of
high frequency intraday data enables the modeling of the realized
covariance matrix directly. However, most models in the literature
depend on strong structural assumptions and they also suffer from the
curse of dimensionality. To solve the problem, we propose a deep
learning model which treats each realized covariance matrix as an image.
The network structure is designed with simplicity in mind, and yet
provides superior accuracy compared with several advanced statistical
methods. The model could handle both low-dimensional and
high-dimensional realized covariance matrices.
<span>**Keywords:**</span> Deep learning, Realized covariance matrix,
Convolutional neural network
<span>**References:**</span>
LeCun, Y., Bottou, L., Bengio, Y. and Haffner, P. (1998). Gradient-based
learning applied to document recognition. In *Proceedings of the IEEE*,
86, 2278–2324.
Shen, K., Yao, J. and Li, W. K.(2015). Forecasting High-Dimensional
Realized Volatility Matrices Using A Factor Model. *ArXiv e-prints*.
Tao, M., Wang, Y., Yao, Q. and Zou, J. (2011). Large volatility matrix
inference via combining low-frequency and high-frequency approaches.
*Journal of the American Statistical Association*, 106, 1025–1040.
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Wednesday-tbl">Return to Programme</a><br/><br/></p>
<p class="pagebreak"></p>
<div id = "talk_141"><p class="contribBanner">Wednesday 13<sup>th</sup> 11:30 098 Lecture Theatre (260-098)</p></div>
## R In Industry – Application On Pipe Renewal Planning {.unnumbered}
<p style="text-align:center">
Glenn Thomas<br />
Harmonic Analytics<br />
</p>
<span>**Abstract:**</span> R has become an increasingly used tool in industry to practically help
councils and organisations with their asset management challenges. We
will demonstrate some of the practical tools Harmonic Analytics has
developed using R to assist in asset management.
One specific example demonstrated will be recent work for a New Zealand
council that was experiencing challenges in long term planning around
its three waters infrastructure. In particular, challenges stem from the
limited information about pipe condition. Using past work order history
as a proxy for pipe failures, we present a tool that uses a pipe break
model to inform replacement strategies. The developed tool allows users
to generate and compare both data driven and engineering based scenarios
through a variety of lenses, ranging from annual replacement length to
service level outcomes. A number of visualisations are available to
support comparisons. Data driven scenarios are driven from a variety of
perspectives, such as traditional age based replacement, probability of
failure and minimising the expected number of pipe breaks across the
network.
This kind of work is an exciting step forward, as councils show interest
in collaboration and pooling data to improve accuracy.
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Wednesday-tbl">Return to Programme</a><br/><br/></p>
<p class="pagebreak"></p>
<div id = "talk_059"><p class="contribBanner">Wednesday 13<sup>th</sup> 11:30 OGGB4 (260-073)</p></div>
## Empirical Comparison Of Some Algorithms For Automatic Univariate ARMA Modeling Using RcmdrPlugin.SPSS {.unnumbered}
<p style="text-align:center">
Dedi Rosadi<br />
Universitas Gadjah Mada<br />
</p>
<span>**Abstract:**</span> In some applications of time series modeling,
it is necessary to obtain forecasts of various types of data
automatically and, possibly, in real time: for instance, to forecast a
large number of univariate series every day, or to process
satellite data in real time. Various automatic algorithms for
fitting ARMA models are available in the literature; here we
discuss three methods in particular. The first method combines the best
exponential smoothing model, used to obtain the
forecast, with a state-space approach to the underlying model, used to
obtain the prediction interval (see Hyndman, 2007). The second, more
advanced, method is based on X-13-ARIMA-SEATS, the
seasonal adjustment software of the US Census Bureau (see Sax, 2015).
In our previous study, Rosadi (2016), we found that these methods
perform relatively well for SARIMA data. Unfortunately, these
approaches do not work well for many ARMA series. Therefore, in this paper
we extend the study by considering an automatic modeling method based on a
genetic algorithm approach (see Abo-Hammour et al., 2012). These
approaches are implemented in our R-GUI package RcmdrPlugin.Econometrics,
which is now integrated into our new and more comprehensive R-GUI
package, RcmdrPlugin.SPSS. We provide applications of the methods
and the tool. From some empirical studies, we found that for ARMA data
the method based on the genetic algorithm performs better than the other
approaches.
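For reference, a minimal example of the first approach using the `forecast` package (not the RcmdrPlugin.SPSS interface itself), with a built-in series as placeholder data:

```r
library(forecast)
fit_ets   <- ets(AirPassengers)          # best exponential smoothing (state-space) model
fit_arima <- auto.arima(AirPassengers)   # automatic (S)ARIMA selection, for comparison
forecast(fit_ets, h = 12)                # point forecasts with prediction intervals
```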
<span>**Keywords:**</span> Automatic ARMA modeling, genetic algorithm,
exponential smoothing, X-13-ARIMA, R-GUI
<span>**References:**</span>
Abo-Hammour, Z. E. S., Alsmadi, O. M., Al-Smadi, A. M., Zaqout, M. I., &
Saraireh, M. S. (2012). ARMA model order and parameter estimation using
genetic algorithms. *Mathematical and Computer Modelling of Dynamical
Systems*, **18(2)**, 201–221.
Hyndman, R. J. (2007). forecast: Forecasting functions for time series,
R package version 1.05.
`URL: http://www.robhyndman.info/Rlibrary/forecast/`.
Sax, C. (2015). Introduction to seasonal: R interface to
X-13ARIMA-SEATS,\
`https://cran.r-project.org/web/packages/seasonal/vignettes/seas.pdf`.
Rosadi, D. (2016). Automatic ARIMA Modeling using RcmdrPlugin.SPSS,
Presented in *COMPSTAT 2016*, Oviedo, Spain, 23-26 August 2016.
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Wednesday-tbl">Return to Programme</a><br/><br/></p>
<p class="pagebreak"></p>
<div id = "talk_014"><p class="contribBanner">Wednesday 13<sup>th</sup> 11:30 OGGB5 (260-051)</p></div>
## Bayesian Optimum Warranty Length Under Type-II Unified Hybrid Censoring Scheme {.unnumbered}
<p style="text-align:center">
Tanmay Sen^1^, Biswabrata Pradhan^2^, Yogesh Mani Tripathi^1^, and Ritwik Bhattacharya^3^<br />
^1^Indian Institute of Technology Patna<br />
^2^Indian Statistical Institute Kolkata<br />
^3^Centro de Investigacionen Matematicas<br />
</p>
<span>**Abstract:**</span> This work considers determination of the optimum
warranty length under a Type-II unified hybrid censoring scheme. Consumers
are willing to purchase a highly reliable product subject to a certain cost
constraint. To assure product reliability and also to remain
profitable, the manufacturer provides warranties on product lifetime.
Moreover, censored lifetime data are available in practice to assess the
reliability of the product. Therefore, determination of an appropriate
warranty length based on censored lifetime data is an important issue for
the manufacturer. It is assumed that the lifetime follows a lognormal
distribution. We consider a combined free replacement and pro-rata
warranty policy (FRW/PRW). The life test is conducted under a Type-II
unified hybrid censoring scheme. The warranty length is obtained by
maximizing an expected utility function. The expectation is taken with
respect to the posterior predictive model for time to failure given the
available data obtained under the Type-II unified hybrid censoring scheme. A
real data set is analyzed to illustrate the proposed methodology. We
propose a non-linear pro-rata warranty policy and compare it with a
linear warranty policy. It is observed that the non-linear pro-rata warranty
policy gives a larger warranty length with maximum profit.
<span>**Keywords:**</span> Lognormal distribution, FRW/PRW policies,
Optimum warranty length, MH algorithm
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Wednesday-tbl">Return to Programme</a><br/><br/></p>
<p class="pagebreak"></p>
<div id = "talk_098"><p class="contribBanner">Wednesday 13<sup>th</sup> 11:30 Case Room 2 (260-057)</p></div>
## Imputation Of The 2016 Economic Census For Business Activity In Japan {.unnumbered}
<p style="text-align:center">
Kazumi Wada^1^, Hiroe Tsubaki^2^, Yukako Toko^1^, and Hidemine Sekino^3^<br />
^1^National Statistics Center<br />
^2^Institute of Statistical Mathematics<br />
^3^The Statistics Bureau<br />
</p>
<span>**Abstract:**</span>
R has been used in the field of official statistics in Japan for over
ten years. This presentation takes up the case of the 2016 Economic
Census for Business Activity. The Census aims to identify the structure
of establishments and enterprises in all industries on a national and
regional level, and to obtain basic information to conduct various
statistical surveys by investigating the economic activity of these
establishments and enterprises. The major corporate accounting items,
such as sales, expenses and salaries, surveyed by the census require
imputation to avoid bias. Although ratio imputation is a leading
candidate, it is well known that the ratio estimator is very sensitive
to outliers; therefore, we need to take appropriate measures for this
problem.
Ratio imputation is a special case of regression imputation; however,
the conventional ratio estimator has a heteroscedastic error term, which
is an obstacle to robustification by means of M-estimation. New robust
ratio estimators are developed by segregating the homoscedastic error
term, with no relation to the auxiliary variable, from the original error.
The estimators are computed by modifying the iteratively
reweighted least squares (IRLS) algorithm, since it is easy to calculate
and fast to converge. The proposed robustified ratio estimator broadens
the conventional definition of the ratio estimator with regard to the
variance of the error term, in addition to effectively alleviating the
influence of outliers. The application of the robust estimator is
expected to contribute to the accuracy of the Census results.
A random number simulation to confirm the characteristics of these
estimators, the choice of imputation domains by CART (classification and
regression trees), model selection, and the preparation of the necessary rates by
domain for the census data processing are all conducted within the R
programming environment.
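A generic sketch of ratio estimation by IRLS with Huber weights, illustrating the type of computation described rather than the authors' specific robust ratio estimators:

```r
# M-estimation of the ratio R in y ~ R * x by iteratively reweighted least squares.
irls_ratio <- function(x, y, k = 1.345, tol = 1e-8, maxit = 50) {
  R <- sum(y) / sum(x)                        # conventional ratio estimate as the start value
  for (i in seq_len(maxit)) {
    r <- y - R * x                            # residuals
    s <- mad(r)                               # robust scale
    w <- pmin(1, k * s / abs(r))              # Huber weights downweight outlying residuals
    R_new <- sum(w * x * y) / sum(w * x^2)    # weighted least-squares update
    if (abs(R_new - R) < tol) break
    R <- R_new
  }
  R
}
set.seed(1)
x <- rlnorm(200)
y <- 2 * x + rnorm(200, sd = 0.3)
y[1:5] <- 10 * y[1:5]                         # contaminate with outliers
c(conventional = sum(y) / sum(x), robust = irls_ratio(x, y))
```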
<span>**Keywords:**</span> GNU R, Outlier, Iteratively reweighted least
squares, Ratio estimator, Official statistics
<span>**Acknowledgement:**</span>
This work was supported by JSPS KAKENHI Grant Number JP16H02013.
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Wednesday-tbl">Return to Programme</a><br/><br/></p>
<p class="pagebreak"></p>
<div id = "talk_105"><p class="contribBanner">Wednesday 13<sup>th</sup> 11:30 Case Room 4 (260-009)</p></div>
## Applying Active Learning Procedure To Drug Consumption Data {.unnumbered}
<p style="text-align:center">
Yuan-Chin Chang<br />
Academia Sinica<br />
</p>
<span>**Abstract:**</span> We apply the method of active learning to
build a binary classification model for drug consumption data. Due to
the nature of active learning, subject selection is a major issue in
its learning process. There are many kinds of subject selection schemes
proposed in the literature. The subject recruiting procedure may also
depend on its learning target criterion, such as accuracy, area under the ROC
curve and so on. Moreover, in practical active learning scenarios, the
label information of samples can only be revealed as they are recruited
into the training data set, and we must pay domain experts to label
the selected samples. Therefore, given the labelling cost,
how and when to stop an active learning procedure is always an important and
challenging problem in active learning. In this talk, we propose an
active learning procedure targeting the area under an ROC curve and,
based on the idea of robustness, we use a modified influential
index to locate the most informative samples sequentially, such that
the learning procedure can achieve the target efficiently. We then apply
our procedure to drug consumption data sets.
<span>**Keywords:**</span> ROC curve, area under curve, active learning,
influential index
<span>**References:**</span>
Calders, T. and Jaroszewicz, S. (2007).
Efficient auc optimization for classification. In <span>*Knowledge
Discovery in Databases: PKDD 2007*</span>, pages 42–53. Springer.
Hampel, F. R. (1974). The influence curve and its role in robust estimation. *Journal of the American Statistical Association*, **69**(346), 383–393.
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Wednesday-tbl">Return to Programme</a><br/><br/></p>
<p class="pagebreak"></p>
<div id = "talk_194"><p class="contribBanner">Wednesday 13<sup>th</sup> 11:50 098 Lecture Theatre (260-098)</p></div>
## R For Everything {.unnumbered}
<p style="text-align:center">
Jared Lander<br />
Lander Analytics<br />
</p>
<span>**Abstract:**</span> Everyone knows I love R. So much that I never
want to leave the friendly environs of R and RStudio. Want to download a
file? Use `download.file`. Want to create a directory? Use `dir.create`.
Sending an email? `gmailr`. Using Git? `git2r`. Building this slideshow?
`rmarkdown`. Writing a book? `knitr`. Let’s take a look at everyday
activities that can be done in R.
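A tiny illustration of the base R calls mentioned above; the URL and paths are placeholders to be replaced before running:

```r
# Create a directory and download a file without leaving R.
dir.create("data", showWarnings = FALSE)
download.file("https://example.com/file.csv",              # placeholder URL
              destfile = file.path("data", "file.csv"))
```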
<span>**Keywords:**</span> R, RMarkdown, knitr, email, football, git,
download, data, plotting, modeling, logistic regression
<span>**References:**</span>
Lander, J. (2017). *R for Everyone, Second Edition.* New York:
Addison-Wesley.
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Wednesday-tbl">Return to Programme</a><br/><br/></p>
<p class="pagebreak"></p>
<div id = "talk_143"><p class="contribBanner">Wednesday 13<sup>th</sup> 11:50 OGGB4 (260-073)</p></div>
## R Package For New Two-Stage Methods In Forecasting Time Series With Multiple Seasonality {.unnumbered}
<p style="text-align:center">
Anupama Lakshmanan and Shubhabrata Das<br />
Indian Institute of Management Bangalore<br />
</p>
<span>**Abstract:**</span> Complex multiple seasonality is an important emerging challenge in time
series forecasting. We propose a framework that segregates the task into
two stages. In the first stage, the time series is aggregated at the low
frequency level (such as daily or weekly) and suitable methods such as
regression, ARIMA or TBATS, are used to fit this lower frequency data.
In the second stage, additive or multiplicative seasonality at the
higher frequency levels may be estimated using classical, or
function-based methods. Finally, the estimates from the two stages are
combined.
In this work, we build an R package implementing the above two-stage
framework for modeling time series with multiple levels of seasonality.
This would make it convenient to execute and possibly lead to
more practitioners and academicians adopting it. The package would allow
the user to decide the specific methods to be used in the two stages and
also the separation between high and low frequency. Errors are
calculated for both the model and the validation period, which may be selected
by the user, and model selection based on different criteria
will be facilitated. Forecast combination may also be integrated with
the developed routine. The schematics will be presented along with a
demonstration of the package on several real data sets.
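A hedged sketch of the two-stage idea using the `forecast` package's half-hourly `taylor` series (the package described in the abstract is still in development and is not assumed here):

```r
library(forecast)
y <- taylor                                           # half-hourly electricity demand
m <- 48                                               # high-frequency periods per day
daily <- colSums(matrix(as.numeric(y), nrow = m))     # stage 1 data: daily totals
fit_daily <- auto.arima(ts(daily, frequency = 7))     # low-frequency model with weekly seasonality
fc_daily  <- forecast(fit_daily, h = 7)$mean          # daily forecasts, one week ahead

profile <- rowMeans(matrix(as.numeric(y), nrow = m))  # average intra-day pattern
profile <- profile / sum(profile)                     # stage 2: multiplicative shares per half hour
fc_halfhourly <- as.vector(outer(profile, fc_daily))  # combine the two stages
```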
<span>**Keywords:**</span> Additive seasonality, ARIMA, forecast combination, high
frequency, low frequency, multiplicative seasonality, polynomial
seasonality, regression, TBATS, trigonometric seasonality
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Wednesday-tbl">Return to Programme</a><br/><br/></p>
<p class="pagebreak"></p>
<div id = "talk_112"><p class="contribBanner">Wednesday 13<sup>th</sup> 11:50 Case Room 2 (260-057)</p></div>
## Analysis Of Official Microdata Using Secure Statistical Computation System {.unnumbered}
<p style="text-align:center">
Kiyomi Shirakawa^1,3^, Koji Chida^2^, Satoshi Takahashi^2^, Satoshi Tanaka^2^, Ryo Kikuchi^2^, and Dai Ikarashi^2^<br />
^1^National Statistics Center<br />
^2^NTT<br />
^3^Hitotsubashi University<br />
</p>
<span>**Abstract:**</span> We introduce some important functions on a
secure computation system and empirically evaluate them using the
statistical computing software R. Secure computation is a
cryptographic technology that enables us to operate on data while keeping
the data encrypted. Owing to this remarkable property, we can construct a
secure on-line analytical system protected against unauthorized access,
computer viruses and internal fraud. Moreover, secure
computation also benefits privacy.
So far, we have developed a secure computation system that runs R as a
front-end application. In this research, we focus on the analysis of
official microdata using our secure computation system. By employing the
R script language to secure computation, we can potentially make new
functions for the analysis of official microdata on our secure
computation system. We show some examples of functions on the system
using the R script language. A demonstration experiment to verify the
practicality and scalability of the system in the field of official
statistics is also in our scope.
<span>**Keywords:**</span> Secure Computation, Security, Privacy, Big
Data, Official Statistics, R
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Wednesday-tbl">Return to Programme</a><br/><br/></p>
<p class="pagebreak"></p>
<div id = "talk_011"><p class="contribBanner">Wednesday 13<sup>th</sup> 11:50 Case Room 4 (260-009)</p></div>
## Presenting Flexi, A Statistical Program For Fitting Variance Models {.unnumbered}
<p style="text-align:center">
Martin Upsdell<br />
AgResearch<br />
</p>
<span>**Abstract:**</span> Flexi is a statistical program designed to fit variance
based models. In this talk I will explore the advantages and
disadvantages of the variance based model compared to the more commonly
adopted mean based approach. Several examples will be given where the
properties of variance based models provide a clearer understanding of
the data. To illustrate the differences in the approach to the data I
will compare Television and Progressive Graphics File methods of
transferring a picture. The Television builds up the global picture from
individual pixels describing a local area of the picture, whereas the
Progressive Graphics File proceeds from the global value of the median
colour of the whole picture to the local value of each individual pixel
by successive refinements. This gives a coarse blocky picture at the
start which refines into a detailed picture at the end. Mean based
models are like television pictures whereas variance based models are
like Progressive Graphics File pictures. The advantages and
disadvantages of the two methods will be discussed.
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Wednesday-tbl">Return to Programme</a><br/><br/></p>
<p class="pagebreak"></p>
<div id = "talk_197"><p class="keynoteBanner">Keynote: Wednesday 13<sup>th</sup> 13:20 098 Lecture Theatre (260-098)</p></div>
## Space And Circular Time Log Gaussian Cox Processes With Application To Crime Event Data {.unnumbered}
<p style="text-align:center">
Alan Gelfand<br />
Duke University<br />
</p>
<span>**Abstract:**</span> We view the locations and times of a collection of crime events as a space-time point pattern modeled as either a nonhomogeneous Poisson process or a more general log Gaussian Cox process. We need to specify a space-time intensity. Viewing time as circular necessitates valid separable and nonseparable covariance functions over a bounded spatial region crossed with circular time. Additionally, crimes are classified by crime type, and each crime event is marked by day of the year, which we convert to day of the week.
We present marked point pattern models to accommodate such data. Our specifications
take the form of hierarchical models which we fit within a Bayesian framework. We consider
model comparison between the nonhomogeneous Poisson process and the log Gaussian Cox
process as well as separable vs. nonseparable covariance specifications. Our motivating
dataset is a collection of crime events for the city of San Francisco during the year 2012.
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Wednesday-tbl">Return to Programme</a><br/><br/></p>
<p class="pagebreak"></p>
<div id = "talk_056"><p class="contribBanner">Wednesday 13<sup>th</sup> 14:10 098 Lecture Theatre (260-098)</p></div>
## Cluster-Wise Regression Models Combined By A Quasi-Linear Function {.unnumbered}
<p style="text-align:center">
Kenichi Hayashi^1^, Katsuhiro Omae^2^, and Shinto Eguchi^3^<br />
^1^Keio University<br />