# Programme And Abstracts For Monday 11^th^ Of December {-}
<p style="background-color:#ccccff;text-align:center">Monday 11<sup>th</sup> 9:10 098 Lecture Theatre (260-098)</p>
## TBA-LT {-}
<p style="text-align:center">
Luke Tierney<br />
University of Iowa<br />
</p>
<span>**Abstract:**</span> TBA
<p class="pagebreak"></p>
<p style="background-color:#ccccff;text-align:center">Monday 11<sup>th</sup> 10:30 OGGB4 (260-073)</p>
## Effect Of Area Level Deprivation On Body Mass Index: Analysis Of NZ Health Surveys {-}
<p style="text-align:center">
Andrew Adiguna Halim, Arindam Basu, and Raymond Kirk<br />
University of Canterbury<br />
</p>
<span>**Abstract:**</span> Obesity is a growing public health problem in New Zealand but
the trends of its determinants are unclear. We obtained the
confidentialised unit record files (CURF) of the New Zealand Health
Surveys (NZHS) from Statistics New Zealand, containing multiple sets
of anonymised individual-level data from 2002/03 to 2014/15. We assessed
the association between deprivation quintile and compliance with the
dietary guideline, and the prevalence of overweight/obesity. For adults,
we converted the Body Mass Index (BMI) variable into tertiles. Then we
regressed the BMI tertiles on deprivation level, ethnicity, age, sex,
physical activity, education, smoking status, fruit guideline, vegetable
guideline, and household income variables using stepwise ordinal
logistic regression with complex survey design. We regressed the BMI
categories on deprivation level, ethnicity, age, sex, household income,
education, fruit guideline, vegetable guideline, soft drink consumption,
and fast food consumption in the child data. We found that people living
in the highest deprivation quintile were more likely to be in the higher
BMI tertile in adults and BMI category in children compared with those
living in the lowest deprivation quintile after adjusting for other
confounding variables. For adults and children the ORs (95
<span>**Keywords:**</span> obesity, BMI, dietary guideline, deprivation, R statistics,
proportional odds regression, complex survey design.
<p class="pagebreak"></p>
<p style="background-color:#ccccff;text-align:center">Monday 11<sup>th</sup> 10:30 OGGB5 (260-051)</p>
## Calendar-Based Graphics For Visualising People's Daily Schedules {-}
<p style="text-align:center">
Earo Wang, Dianne Cook, and Rob Hyndman<br />
Monash University<br />
</p>
<span>**Calendar-based graphics for visualising people’s daily
schedules**</span>
Earo Wang$^1$, Dianne Cook$^1$ and Rob J Hyndman$^1$
$^1 \;$ Department of Econometrics and Business Statistics, Monash
University, VIC 3800, Australia
<span>**Abstract**</span>. This paper describes a `frame_calendar`
function that organises and displays temporal data, collected at
sub-daily resolution, in a calendar layout. Calendars are broadly used
in society to display temporal information and events. The
`frame_calendar` function uses linear algebra on the date variable to
create the layout. It utilises the grammar of graphics to create the
plots inside each cell, and thus synchronises neatly with
<span>**ggplot2**</span> graphics. The motivating application is
studying pedestrian behaviour in Melbourne, Australia, based on counts
which are captured at hourly intervals by sensors scattered around the
city. Faceting by the usual features, such as day and month, was
insufficient to examine the behaviour. Displaying the data in a monthly
calendar format helps to understand pedestrian patterns relative to
events such as work days, weekends, holidays, and special events. The
layout algorithm has several format options and variations. It is
implemented in the R package <span>**sugrrants**</span>.
<span>**Keywords**</span>. data visualisation, statistical graphics,
time series, R package, grammar of graphics
References {#references .unnumbered}
----------
Van Wijk JJ, Van Selow ER (1999). Cluster and Calendar Based
Visualization of Time Series Data. In *Information Visualization,
1999 (InfoVis ’99) Proceedings*, 4–9.
Wickham H (2009). *ggplot2: Elegant Graphics for Data Analysis.*
Springer-Verlag New York, New York, NY.
Wickham H, Hofmann H, Wickham C, Cook D (2012). Glyph-maps for Visually
Exploring Temporal Patterns in Climate Data and Models.
*Environmetrics*, **23**(5), 382–393.
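
The calendar layout idea in the abstract above can be illustrated with a small sketch. The Python code below is not the `frame_calendar` implementation from **sugrrants**; it only assumes a Monday-first monthly grid and shows how simple arithmetic on a date gives a cell position, and how an hourly value could be rescaled into that cell. The function names `calendar_cell` and `place_in_cell`, and the unit-square scaling, are hypothetical choices made for illustration.

```python
from datetime import date
from typing import Tuple


def calendar_cell(d: date, week_start: int = 0) -> Tuple[int, int]:
    """Return (row, col) of a date in a monthly calendar grid.

    week_start follows date.weekday(): 0 = Monday, ..., 6 = Sunday.
    Illustrative assumption only, not the sugrrants layout code.
    """
    first = d.replace(day=1)
    # Number of empty cells before the first day of the month.
    offset = (first.weekday() - week_start) % 7
    index = offset + d.day - 1
    return index // 7, index % 7


def place_in_cell(d: date, hour: int, value: float, vmax: float) -> Tuple[float, float]:
    """Map an hourly value to (x, y) plot coordinates inside its calendar cell.

    Each cell is treated as a unit square; rows grow downwards, so y is negated.
    """
    row, col = calendar_cell(d)
    x = col + hour / 24             # horizontal position within the day
    y = -(row + 1) + value / vmax   # vertical position, scaled into the cell
    return x, y


if __name__ == "__main__":
    print(calendar_cell(date(2017, 12, 11)))
    print(place_in_cell(date(2017, 12, 11), 12, 50.0, 100.0))
```

Plotting the resulting `(x, y)` pairs as lines, one per day, would give one small time series glyph per calendar cell.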
<p class="pagebreak"></p>
<p style="background-color:#ccccff;text-align:center">Monday 11<sup>th</sup> 10:30 Case Room 2 (260-057)</p>
## Nonparametric Test For Volatility In Clustered Multiple Time Series {-}
<p style="text-align:center">
Paolo Victor Redondo and Erniel Barrios<br />
University of the Philippines Diliman<br />
</p>
Testing Volatility in Clustered Multiple Time Series: A Nonparametric
Approach

Erniel B. Barrios and Paolo Victor T. Redondo, School of Statistics,
University of the Philippines Diliman

<span>**Abstract**</span>. We propose a test for volatility in clustered
multiple time series based on the sieve bootstrap. Clustering of
observations is intended to capture the contagion effect in multiple
time series data, which is assumed to be present in the data-generating
process on which the test is based. We designed a simulation study to
evaluate the test procedure. The method is further illustrated using
data on global stock prices and rice production among Asian countries.
The test is potentially robust to some distributional assumptions but is
possibly affected by the nature of the volatility.

<span>**Keywords**</span>. multiple time series, volatility,
nonparametric test, sieve bootstrap
<p class="pagebreak"></p>
<p style="background-color:#ccccff;text-align:center">Monday 11<sup>th</sup> 10:30 Case Room 3 (260-055)</p>
## IGESS: A Statistical Approach To Integrating Individual Level Genotype Data And Summary Statistics In Genome Wide Association Studies {-}
<p style="text-align:center">
Mingwei Dai^1^, Jingsi Ming^2^, Mingxuan Cai^2^, Jin Liu^3^, Can Yang^4^, Xiang Wan^2^, and Zongben Xu^1^<br />
^1^Xi'an Jiaotong University<br />
^2^Hong Kong Baptist University<br />
^3^Duke-NUS Medical School<br />
^4^Hong Kong University of Science and Technology<br />
</p>
<span>**Abstract**</span>. Recent genome-wide association studies (GWAS) suggest that a complex phenotype is often affected by many variants with small effects, known as "polygenicity". Tens of thousands of samples are often required to ensure sufficient statistical power to identify these variants with small effects. In this study, we propose a statistical approach, IGESS, to increase the statistical power of identifying risk variants and improve the accuracy of risk prediction by integrating individual-level genotype data and summary statistics. An efficient algorithm based on variational inference is developed to handle genome-wide-scale analysis. Through comprehensive simulation studies, we demonstrated the advantages of IGESS over methods which take either individual-level data or summary statistics data as input. We applied IGESS to perform an integrative analysis of Crohn's disease from WTCCC and summary statistics from other studies. IGESS was able to significantly increase the statistical power of identifying risk variants and improve risk prediction accuracy.
<span>**Keywords**</span>. GWAS,
functional annotations, variational inference
<p class="pagebreak"></p>
<p style="background-color:#ccccff;text-align:center">Monday 11<sup>th</sup> 10:30 Case Room 4 (260-009)</p>
## Author Name Identification For Evaluating Research Performance Of Institutes {-}
<p style="text-align:center">
Tomokazu Fujino^1^, Keisuke Honda^2^, and Hiroka Hamada^2^<br />
^1^Fukuoka Women's University<br />
^2^Institute of Statistical Mathematics<br />
</p>
<span>**Author Name Identification for Evaluating Research Performance
of Institutes**</span>
Tomokazu Fujino$^1$, Keisuke Honda$^2$ and Hiroka Hamada$^2$
$^1 \;$ Department of Environmental Science, Fukuoka Women’s University,
Kasumigaoka, Fukuoka 813-8529, Japan
$^2 \;$ Institute of Statistical Mathematics, Tachikawa, Tokyo 190-8562,
Japan
<span>**Abstract**</span>. In this paper, we propose a new framework to
extract a complete list of the articles written by researchers who
belong to a specific research or educational institute from an academic
document database such as Web of Science or Scopus. In this framework,
it is necessary to perform author name identification because the
database is queried by author name in order to extract documents
written before the author came to the current institute. The
framework is based on latent Dirichlet allocation (LDA), which is a
kind of topic modeling, and some techniques and indices such as synonym
retrieval and inverse document frequency (IDF) are used to enhance
the framework.
<span>**Keywords**</span>. Institutional Research, Topic Modeling,
Latent Dirichlet Allocation
References {#references .unnumbered}
----------
Tang, L. and Walsh, J.P. (2010). Bibliometric fingerprints: name
disambiguation based on approximate structure equivalence of cognitive
maps. *Scientometrics*, 84(3), 763–784.
Strotmann, A., Zhao, D. and Bubela, T. (2009). Author name disambiguation
for collaboration network analysis and visualization. *Proc. American
Society for Information Science and Technology*, 46(1), 1–20.
Soler, J.M. (2007). Separating the articles of authors with the same
name. *Scientometrics*, 72(2), 281–290.
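
Inverse document frequency (IDF), mentioned in the abstract above, down-weights terms that occur in many documents. The following is a minimal Python sketch, assuming the common definition idf(t) = log(N / df(t)) and hypothetical tokenised records; the abstract does not specify how the authors combine IDF with LDA, so this shows the index only. The function name `inverse_document_frequency` and the example tokens are made up for illustration.

```python
import math
from collections import Counter
from typing import Dict, List


def inverse_document_frequency(docs: List[List[str]]) -> Dict[str, float]:
    """Compute idf(t) = log(N / df(t)) for every term in a tokenised corpus.

    N is the number of documents and df(t) is the number of documents
    that contain term t at least once.
    """
    n_docs = len(docs)
    # Count each term once per document, regardless of repetitions.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n_docs / count) for term, count in doc_freq.items()}


if __name__ == "__main__":
    # Hypothetical records: tokens could be keywords or co-author names.
    records = [
        ["statistics", "kernel"],
        ["statistics", "gwas"],
        ["statistics", "kernel", "topic"],
    ]
    for term, weight in sorted(inverse_document_frequency(records).items()):
        print(term, round(weight, 3))
```

A term shared by every record receives weight zero, so rare terms carry more weight when records are compared.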
<p class="pagebreak"></p>
<p style="background-color:#ccccff;text-align:center">Monday 11<sup>th</sup> 10:50 OGGB4 (260-073)</p>
## Clustering Using Nonparametric Mixtures And Mode Identification {-}
<p style="text-align:center">
Shengwei Hu and Yong Wang<br />
University of Auckland<br />
</p>
<span>**Clustering using Nonparametric Mixtures and Mode
Identification**</span>
Shengwei Hu$^1$ and Yong Wang$^2$
$^1 \;$ Department of Statistics, the University of Auckland, New
Zealand
$^2 \;$ Department of Statistics, the University of Auckland, New
Zealand
<span>**Abstract**</span>. Clustering aims to partition a set of
observations into a proper number of clusters with similar objects
allocated to the same group. Current partitioning methods mainly include
those based on some measure of distance or probability distribution.
Here we propose a mode-based clustering methodology motivated by
density estimation and mode identification procedures. The idea is to
estimate the data-generating probability distribution using a
nonparametric density estimator and then locate the modes of the density
obtained. In the nonparametric mixture models, each mode and the
observations that ascend to it correspond to a single cluster. Thus, the
problem of determining the number of clusters can be recast as a mode
merging problem. A criterion for measuring the separability between modes
is also addressed in this work. The most similar modes are merged
sequentially until the optimal number of clusters is reached. The
performance of the proposed method is investigated on both simulated and
real datasets.
<span>**Keywords**</span>. Clustering, Nonparametric mixtures, Mode
identification
References {#references .unnumbered}
----------
Wang, X. and Wang, Y.: *Nonparametric multivariate density estimation
using mixtures*. Stat. Comput. **25**, 349–364 (2015).
Li, J., Ray S. and Lindsay B.G.: *A nonparametric statistical approach
to clustering via mode identification*. Journal of Machine Learning
Research. **8**, 1687–1723 (2007).
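
The link between modes and clusters described in the abstract above can be sketched in a few lines of Python. This is not the authors' method: it swaps in a one-dimensional Gaussian kernel density estimate with a fixed bandwidth `h` in place of the nonparametric mixture estimator, and uses a plain mean-shift ascent to find the mode each observation climbs to. Modes are merged only when they coincide numerically; the separability criterion from the abstract is not implemented. All function names are hypothetical.

```python
import math
from typing import List, Tuple


def ascend_to_mode(x0: float, data: List[float], h: float,
                   tol: float = 1e-8, max_iter: int = 1000) -> float:
    """Mean-shift ascent: move x0 uphill on a Gaussian kernel density estimate."""
    x = x0
    for _ in range(max_iter):
        weights = [math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data]
        x_new = sum(w * xi for w, xi in zip(weights, data)) / sum(weights)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x


def mode_clusters(data: List[float], h: float,
                  merge_tol: float = 1e-3) -> Tuple[List[float], List[int]]:
    """Assign each observation to the mode it ascends to.

    Returns the list of distinct modes and one cluster label per observation.
    """
    modes: List[float] = []
    labels: List[int] = []
    for xi in data:
        m = ascend_to_mode(xi, data, h)
        for k, mk in enumerate(modes):
            if abs(m - mk) < merge_tol:   # same mode up to numerical error
                labels.append(k)
                break
        else:
            modes.append(m)               # a new mode starts a new cluster
            labels.append(len(modes) - 1)
    return modes, labels


if __name__ == "__main__":
    sample = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
    print(mode_clusters(sample, h=0.5))
```

The number of clusters found depends on the bandwidth: a larger `h` smooths the density and leaves fewer modes, which is one reason a principled merging criterion is needed.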
<p class="pagebreak"></p>
<p style="background-color:#ccccff;text-align:center">Monday 11<sup>th</sup> 10:50 OGGB5 (260-051)</p>
## Bayesian Curve Fitting For Discontinuous Function Using Overcomplete Representation With Multiple Kernels {-}
<p style="text-align:center">
Youngseon Lee^1^, Shuhei Mano^2^, and Jaeyong Lee^1^<br />
^1^Seoul National University<br />
^2^Institute of Statistical Mathematics<br />
</p>
<span>**Bayesian curve fitting for discontinuous function using
overcomplete representation with multiple kernels**</span>
Youngseon Lee$^1$, Shuhei Mano$^2$ and Jaeyong Lee$^1$
$^1 \;$ Department of Statistics, College of Natural Science, Seoul
National University, 56-1 Mountain, Sillim-dong, Gwanak-gu, Seoul, Korea
$^2 \;$ The Institute of Statistical Mathematics, 10-3 Midori-cho,
Tachikawa, Tokyo 190-8562, Japan
<span>**Abstract**</span>. We propose a new Bayesian methodology for
estimating discontinuous functions. In this model, the estimated
function is expressed by an overcomplete representation with multiple
kernels. Therefore, complex function shapes can be expressed with
a much smaller number of parameters owing to sparseness.
The model does not need any assumptions about the location of
discontinuities, the smoothness of the function, or the number of features.
The form of the function, taking all of these into account, is determined
naturally by the random Lévy measure. Analyses of simulated and real data
show that this model is suitable for fitting discontinuous
functions. We also prove theoretical properties of the support of
the function space having jumps in this paper.
<span>**Keywords**</span>. Bayesian, nonparametric regression,
discontinuous curve fitting, overcomplete, multiple kernel, Lévy random
field
References {#references .unnumbered}
----------
Chu, J. H., Clyde, M. A., and Liang, F. (2009). Bayesian function
estimation using continuous wavelet dictionaries, *Statistica Sinica*,
1419–1438.
Clyde, M. A., and Wolpert, R. L. (2007). Nonparametric function
estimation using overcomplete dictionaries, *Bayesian Statistics*,
**8**, 91–114.
Green, Peter J. (1995). Reversible jump Markov chain Monte Carlo
computation and Bayesian model determination, *Biometrika*, **82**(4),
711–732.
Khinchine, Alexander Ya and Lévy, Paul (1936). Sur les lois
stables, *CR Acad. Sci. Paris*, **202**, 374–376.
Müller, P., and Quintana, F. A. (2004). Nonparametric
Bayesian data analysis, *Statistical Science*, 95–110.
Pillai, N. S., Wu, Q., Liang, F., Mukherjee, S., and Wolpert, R. L.
(2007). Characterizing the function space for Bayesian kernel models,
*Journal of Machine Learning Research*, **8**, 1769–1797.
Qiu, Peihua (2011). *Jump Regression Analysis*. Springer.
Wolpert, R. L., Clyde, M. A., and Tu, C. (2011). Stochastic expansions
using continuous dictionaries: Lévy adaptive regression
kernels, *The Annals of Statistics*, 1916–1962.
<p class="pagebreak"></p>
<p style="background-color:#ccccff;text-align:center">Monday 11<sup>th</sup> 10:50 Case Room 2 (260-057)</p>
## Estimation Of A Semiparametric Spatiotemporal Models With Mixed Frequency {-}
<p style="text-align:center">
Vladimir Malabanan, Erniel Barrios, and Joseph Ryan Lansangan<br />
University of the Philippines Diliman<br />
</p>
Estimation of a Semiparametric Spatiotemporal Models with Mixed
Frequency

Vladimir A. Malabanan, Erniel B. Barrios, Joseph Ryan G. Lansangan,
School of Statistics, University of the Philippines Diliman

<span>**Abstract**</span>. A semiparametric spatiotemporal model is
postulated with data measured at varying frequencies. The model
optimizes the utilization of information from variables measured at
higher frequency by estimating their nonparametric effect on the
response through the backfitting algorithm. Simulation studies support
the optimality of the model over a simple generalized additive model
with aggregation of high-frequency data. The method is then used in
analyzing the spatiotemporal dynamics of corn yield based on some
remotely sensed data as covariates.

<span>**Keywords**</span>. spatiotemporal model, semiparametric model,
backfitting, mixed frequency
<p class="pagebreak"></p>
<p style="background-color:#ccccff;text-align:center">Monday 11<sup>th</sup> 10:50 Case Room 3 (260-055)</p>
## LSMM: A Statistical Approach To Integrating Functional Annotations With Genome-Wide Association Studies {-}
<p style="text-align:center">
Jingsi Ming^1^, Mingwei Dai^2^, Mingxuan Cai^1^, Xiang Wan^1^, Jin Liu^3^, and Can Yang^4^<br />
^1^Hong Kong Baptist University<br />
^2^Xi'an Jiaotong University<br />
^3^Duke-NUS Medical School<br />
^4^Hong Kong University of Science and Technology<br />
</p>
<span>**LSMM: A statistical approach to integrating functional
annotations with genome-wide association studies**</span>
Jingsi Ming$^1$, Mingwei Dai$^{2,5}$, Mingxuan Cai$^1$, Xiang Wan$^3$,
Jin Liu$^4$ and Can Yang$^5$
$^1 \;$ Department of Mathematics, Hong Kong Baptist University, Hong
Kong
$^2 \;$ School of Mathematics and Statistics, Xi’an Jiaotong University,
Xi’an, China
$^3 \;$ Department of Computer Science, Hong Kong Baptist University,
Hong Kong
$^4 \;$ Centre for Quantitative Medicine, Duke-NUS Medical School,
Singapore
$^5 \;$ Department of Mathematics, The Hong Kong University of Science
and Technology, Hong Kong
<span>**Abstract**</span>. Thousands of risk variants underlying complex
phenotypes have been identified in genome-wide association studies
(GWAS). However, there are two major challenges towards fully
characterizing the biological basis of complex diseases. First, many
complex traits are suggested to be highly polygenic, whereas a large
proportion of risk variants with small effects remains unknown. Second,
the functional roles of the majority of GWAS hits in the non-coding
region are largely unclear. In this paper, we propose a latent sparse
mixed model (LSMM) to address these challenges by integrating functional
annotations with summary statistics from GWAS. An efficient variational
expectation-maximization (EM) algorithm is developed. We conducted
comprehensive simulation studies and then applied LSMM to 30 GWAS of
complex phenotypes, integrating 9 genic annotation categories and 127
tissue-specific functional annotations from the Roadmap project. The
results demonstrate that LSMM is not only able to increase the
statistical power to identify risk variants, but also provides a deeper
understanding of the genetic architecture of complex traits by detecting
relevant functional annotations.

<span>**Keywords**</span>. GWAS,
functional annotations, variational inference
<p class="pagebreak"></p>
<p style="background-color:#ccccff;text-align:center">Monday 11<sup>th</sup> 10:50 Case Room 4 (260-009)</p>
## A Study Of The Influence Of Articles In The Large-Scale Citation Network {-}
<p style="text-align:center">
Frederick Kin Hing Phoa^1^ and Livia Lin Hsuan Chang^2^<br />
^1^Academia Sinica<br />
^2^Institute of Statistical Mathematics<br />
</p>
<span>**A Study of the Influence of Articles in the Large-Scale Citation
Network**</span>
Frederick K. H. Phoa and Livia Lin Hsuan Chang\
<span>*Institute of Statistical Science, Academia Sinica, Taipei 115,
Taiwan.*</span>\
> <span>**Abstract:**</span> Nowadays there are many research metrics at
> the author, article, and journal levels, such as impact factors and
> many others. However, none of them possesses a universally meaningful
> interpretation of research influence at all levels, not to mention
> that many are subject-biased and consider neighboring relations only.
> In this work, we introduce a new network-based research metric called
> the network influence. It utilizes all information in the whole
> network and is applicable at any level. Due to its statistical
> origin, this metric is computationally efficient and statistically
> interpretable even if one applies it to a large-scale network. This
> work demonstrates the analysis of networks via network influence using
> a large-scale citation database called the Web of Science. Considering
> only the articles within the statistics community in 2005-2014, the
> network influence of all articles is calculated and compared,
> resulting in a list of the ten most important articles that differs
> slightly from the list obtained via impact factors. This metric can be
> easily extended to author citation networks and many similar networks
> embedded in the Web of Science.
<p class="pagebreak"></p>
<p style="background-color:#ccccff;text-align:center">Monday 11<sup>th</sup> 11:10 OGGB4 (260-073)</p>
## Estimation Of A High-Dimensional Covariance Matrix {-}
<p style="text-align:center">
Xiangjie Xue and Yong Wang<br />
University of Auckland<br />
</p>
<span>**Estimation of a High-Dimensional Covariance Matrix**</span>
Xiangjie Xue$^*$ and Yong Wang$^*$
$^* \;$ Department of Statistics, The University of Auckland, New
Zealand.
<span>**Abstract**</span>. The estimation of covariance or precision
(inverse covariance) matrices plays a prominent role in multivariate
analysis. The usual estimator, the sample covariance matrix, is known to
be unstable and ill-conditioned in high-dimensional settings. In the past
two decades, various methods have been developed to give a stable and
well-conditioned estimator, each with its own advantages and
disadvantages. We will review some of the most popular methods and
describe a new method to estimate the correlation matrix, and hence the
covariance matrix, using the empirical Bayes method. Similar to many
element-wise methods in the literature, we also assume that the elements
in a correlation matrix are independent of each other. We use the fact
that the elements in a sample correlation matrix can be approximated by
the same one-parameter normal distribution with unknown means, along
with non-parametric maximum likelihood estimation, to give a new
estimator of the correlation matrix. Preliminary simulation results show
that the new estimator has some advantages over various thresholding
methods in estimating sparse covariance matrices.
<span>**Keywords**</span>. Big Data, Multivariate Analysis, Statistical
Inference
References {#references .unnumbered}
----------
Efron, B., 2010. *Correlated $z$-values and the accuracy of large-scale
statistical estimates*. J Am Stat Assoc **105**, 1042–1055.
Fan, J., Liao, Y., Liu, H., 2016. *An overview of the estimation of
large covariance and precision matrices*. Econometrics Journal **19**,
C1–C32.
Wang, Y., 2007. *On fast computation of the non-parametric maximum
likelihood estimate of a mixing distribution*. Journal of the Royal
Statistical Society: Series B **69**, 185–198.
<p class="pagebreak"></p>
<p style="background-color:#ccccff;text-align:center">Monday 11<sup>th</sup> 11:10 OGGB5 (260-051)</p>
## Innovative Bayesian Estimation In The von Mises Distribution {-}
<p style="text-align:center">
Yuta Kamiya^1^, Toshinari Kamakura^1^, and Takemi Yanagimoto^2^<br />
^1^Chuo University<br />
^2^Institute of Statistical Mathematics<br />
</p>
<span>**Innovative Bayesian Estimation in the von Mises
Distribution**</span>
Yuta Kamiya$^1$, Toshinari Kamakura$^2$ and Takemi Yanagimoto$^3$
$^1 \;$ Graduate School of Industrial and Systems Engineering, Chuo
University, Japan
$^2 \;$ Department of Industrial and Systems Engineering, Chuo
University, Japan
$^3 \;$ Institute of Statistical Mathematics, Japan
<span>**Abstract**</span>. In spite of recent growing interest in
applying the von Mises distribution to circular data in various
scientific fields, research on parameter estimation is
surprisingly sparse. The standard estimators are the MLE and the maximum
marginal likelihood estimator (Schou 1978). Although Bayesian estimators
are promising, they appear not to have been fully developed. We
propose the posterior mean of the canonical parameter, instead of the
mean parameter, under the reference prior. This estimator satisfies an
optimality property, and performs favorably for wide ranges of true
parameters. Extensive simulation studies show that the risks of the
proposed estimator are significantly smaller than those of the existing
estimators. An interesting finding is that the estimating function for
the dispersion parameter behaves reasonably. Notable advantages of the
present approach are its straightforward extensions to various
procedures, including a Bayesian estimator under an informative prior
based on the reference prior. The proposed estimator is examined by
applying it to practical datasets.
<span>**Keywords**</span>. von Mises distribution, Bayesian estimation,
canonical parameter
References {#references .unnumbered}
----------
Fisher, Nicholas I. <span>*Statistical analysis of circular
data.*</span> Cambridge University Press, 1995.
Schou, Geert. “Estimation of the concentration parameter in von
Mises–Fisher distributions.” Biometrika 65.2 (1978): 369-377.
<p class="pagebreak"></p>
<p style="background-color:#ccccff;text-align:center">Monday 11<sup>th</sup> 11:10 Case Room 2 (260-057)</p>
## Evidence Of Climate Change From Nonparametric Change-Point Analysis {-}
<p style="text-align:center">
Angela Nalica, Paolo Redondo, Erniel Barrios, and Stephen Villejo<br />
University of the Philippines Diliman<br />
</p>
Evidence of Climate Change from Nonparametric Change-point Analysis

Stephen Jun Villejo, Paolo Victor Redondo, Angela Nalica, Erniel Barrios,
University of the Philippines Diliman

<span>**Abstract**</span>. Suppose that time series data are
sufficiently explained by a model, e.g., an autoregressive model or a
transfer function model. A change-point is considered to exist if any of
the model parameters is substantially different in two or more regimes.
We propose a test for the existence of a change-point (assuming that the
location of the change is known) based on the nonparametric bootstrap.
The method is used to verify whether the southern oscillation index
exhibits a change-point, which is taken as evidence of climate change.
There is indeed evidence of climate change in the period.

<span>**Keywords**</span>. change-point analysis, block bootstrap,
southern oscillation index (SOI)
<p class="pagebreak"></p>
<p style="background-color:#ccccff;text-align:center">Monday 11<sup>th</sup> 11:10 Case Room 3 (260-055)</p>
## Joint Analysis Of Individual Level Genotype Data And Summary Statistics By Leveraging Pleiotropy {-}
<p style="text-align:center">
Mingwei Dai^1^, Jin Liu^2^, and Can Yang^3^<br />
^1^Xi'an Jiaotong University<br />
^2^Duke-NUS Medical School<br />
^3^Hong Kong University of Science and Technology<br />
</p>
<span>**Joint Analysis of Individual Level Genotype Data and Summary
Statistics by Leveraging Pleiotropy**</span>
Mingwei Dai$^1$, Can Yang$^2$ and Jin Liu$^3$
$^1 \;$ School of Mathematics and Statistics, Xi’an Jiaotong University,
Xi’an, China
$^2 \;$ Department of Mathematics, Hong Kong University of Science and
Technology, Hong Kong
$^3 \;$ Centre of Quantitative Medicine, Duke-NUS Medical School,
Singapore
<span>**Abstract**</span>. Results from genome-wide association studies
(GWAS) suggest that a complex phenotype is often affected by many
variants with small effects, known as “polygenicity”. Tens of thousands
of samples are often required to ensure sufficient statistical power to
identify these variants with small effects. However, it is often the case that a
research group can only get approval for access to individual-level
genotype data with a limited sample size (e.g., a few hundred or
thousand). Meanwhile, pleiotropy is a pervasive phenomenon in genetics
whereby a DNA variant influences multiple traits, and summary statistics
for genetically related traits (e.g., autoimmune diseases or psychiatric
disorders) are becoming publicly available. The sample sizes associated
with the summary statistics data sets are usually quite large. How to
make the most efficient use of existing abundant data resources largely
remains an open problem.
In this study, we propose a statistical approach, LEP, to increase the
statistical power of identifying risk variants and improve the accuracy of
risk prediction by integrating individual-level genotype data and
summary statistics by leveraging pleiotropy. An efficient algorithm based
on variational inference is developed to handle the genome-wide
analysis. Through comprehensive simulation studies, we demonstrated the
advantages of LEP over methods which take either individual-level
data or summary statistics data as input. We applied LEP to perform an
integrative analysis of several autoimmune diseases from WTCCC and
summary statistics from other studies. LEP was able to significantly
increase the statistical power of identifying risk variants and improve
the risk prediction accuracy by jointly analyzing autoimmune diseases.
<span>**Keywords**</span>. GWAS, pleiotropy, polygenicity, summary
statistics, variational inference
References {#references .unnumbered}
----------
Solovieff N, Cotsapas C, Lee P H, et al. (2013). Pleiotropy in complex
traits: challenges and strategies. In: *Nature Reviews Genetics* 14(7):
483.
Carbonetto P, Stephens M. (2012). Scalable variational inference for
Bayesian variable selection in regression, and its accuracy in genetic
association studies. In: *Bayesian Analysis* 7(1): 73-108.
Chung D, Yang C, Li C, et al. (2014). GPA: a statistical approach to
prioritizing GWAS results by integrating pleiotropy and annotation. In:
*PLoS Genetics*.
Dai M, Ming J, Cai M, et al. (2017). IGESS: a statistical approach to
integrating individual-level genotype data and summary statistics in
genome-wide association studies. In: *Bioinformatics*.
<p class="pagebreak"></p>
<p style="background-color:#ccccff;text-align:center">Monday 11<sup>th</sup> 11:10 Case Room 4 (260-009)</p>
## An Advanced Approach For Time Series Forecasting Using Deep Learning {-}
<p style="text-align:center">
Balaram Panda<br />
Inland Revenue Department<br />
</p>
<span>**An Advanced Approach for Time Series Forecasting using Deep
Learning**</span>
Balaram Panda$^1$
$^1 \;$ Data Scientist, Inland Revenue Department, New Zealand
<span>**Abstract**</span>. Time series forecasting is a long-standing
research area that continues to evolve. Owing to recent advances in
deep learning techniques, many complex problems have been solved using
deep learning. Deep learning techniques have shown markedly better
performance in supervised learning problems. One of the reasons for this
success is the ability of deep feedforward networks to learn multiple
feature interactions for a single instance. However, the time-dependent
nature of the data was not captured by deep feedforward networks until
the development of the RNN (recurrent neural network) and LSTM (long
short-term memory) network architectures. This paper demonstrates the
success of LSTM time series models in comparison with ARIMA and other
standard approaches to time series modelling. A sensitivity analysis is
also conducted to explore the effect of hyperparameter tuning of the
LSTM model on the time series forecasting error. We also derive
practical advice from our empirical results for those interested in
getting the most out of LSTM models for modern time series forecasting.
<span>**Keywords**</span>. Deep Learning, Time Series, LSTM, RNN
References {#references .unnumbered}
----------
Längkvist, Martin, Lars Karlsson, and Amy Loutfi. “A review of
unsupervised feature learning and deep learning for time-series
modeling.” Pattern Recognition Letters 42 (2014): 11-24.
Zheng, Yi, et al. “Time series classification using multi-channels deep
convolutional neural networks.” International Conference on Web-Age
Information Management. Springer, Cham, 2014.
<p class="pagebreak"></p>
<p style="background-color:#ccccff;text-align:center">Monday 11<sup>th</sup> 11:30 OGGB4 (260-073)</p>
## Genetic Map Estimation Using Hidden Markov Models In The Presence Of Partially Observed Information {-}
<p style="text-align:center">
Timothy Bilton^1,2^, Matthew Schofield^1^, Ken Dodds^2^, and Michael Black^1^<br />
^1^University of Otago<br />
^2^AgResearch<br />
</p>
<span>**Genetic map estimation using hidden Markov models in the
presence of partially observed information**</span>
Timothy P. Bilton^1,2^, Matthew R. Schofield^1^, Ken G. Dodds^2^ and
Michael A. Black^3^\
$^1 \;$ Department of Mathematics and Statistics, University of Otago,
Dunedin, New Zealand
$^2 \;$ Invermay Agricultural Centre, AgResearch, Mosgiel, New Zealand
$^3 \;$ Department of Biochemistry, University of Otago, Dunedin, New
Zealand
<span>**Abstract**</span>. A genetic linkage map shows the relative
position of and genetic distance between markers (positions on the
genome that exhibit variation) and underpins the study of species’
genomes in a number of scientific applications. Genetic maps are
constructed by tracking the transmission of genetic information from
individuals to their offspring, which is frequently modelled using a
hidden Markov model (HMM) since only the expression and not the
transmission of genetic information is observed. However, constructing
genetic maps with data generated using the latest sequencing technology
is complicated by the fact that some data are only partially
observed, which, if unaccounted for, typically results in inflated
estimates. We extend the HMM to model partially observed information by
including an additional layer of latent variables. In addition, we
investigate several different approaches for computing confidence
intervals of the genetic map estimates obtained from the extended HMM.
Results show that our model is able to produce accurate genetic map
estimates, even in situations where a large proportion of the data is
only partially observed. Our methodology has been implemented in the R
package GusMap.
<span>**Keywords**</span>. hidden Markov models, linkage mapping,
partially observed data, confidence intervals
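
As a rough illustration of the HMM machinery described above, the following Python sketch computes the likelihood of a marker sequence with the forward algorithm. It is not the extended model of the talk or the GusMap implementation: the two-state chain, the recombination fraction `r`, the error rate `e` and the treatment of a missing observation (`None`) as a factor of one are all assumptions made for illustration.

```python
def forward_likelihood(init, trans, emit, observations):
    """Likelihood of a non-empty observation sequence under a discrete HMM.

    init[i]      : P(first hidden state = i)
    trans[i][j]  : P(next state = j | current state = i)
    emit[i][o]   : P(observation = o | state = i)
    observations : observed symbols; None marks a missing observation,
                   which is marginalised out (contributes a factor of 1).
    """
    n_states = len(init)

    def emission(state, obs):
        return 1.0 if obs is None else emit[state][obs]

    # alpha[i] = P(observations so far, current state = i)
    alpha = [init[i] * emission(i, observations[0]) for i in range(n_states)]
    for obs in observations[1:]:
        alpha = [
            emission(j, obs) * sum(alpha[i] * trans[i][j] for i in range(n_states))
            for j in range(n_states)
        ]
    return sum(alpha)


# Hypothetical two-state example: adjacent markers recombine with
# probability r, and each observation is misread with probability e.
r, e = 0.1, 0.05
init = [0.5, 0.5]
trans = [[1 - r, r], [r, 1 - r]]
emit = [[1 - e, e], [e, 1 - e]]
print(forward_likelihood(init, trans, emit, [0, None, 0, 1]))
```

In a linkage-mapping setting the transition probabilities would be functions of the unknown recombination fractions, and a likelihood of this kind would be maximised over them.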
<p class="pagebreak"></p>
<p style="background-color:#ccccff;text-align:center">Monday 11<sup>th</sup> 11:30 OGGB5 (260-051)</p>
## A Simple Method For Grouping Patients Based On Historical Doses {-}
<p style="text-align:center">
Shengli Tzeng<br />
China Medical University<br />
</p>
<span>**A simple method for grouping patients based on historical
doses**</span>
ShengLi Tzeng$^1$
$^1 \;$ Department of Public Health, China Medical University, Taichung,
40402, Taiwan
<span>**Abstract**</span>
Monitoring dose patterns over time helps physicians and patients learn
more about metabolic change, disease evolution, etc. One way to turn
such longitudinal data into clinically useful information is through
cluster analysis, which aims to separate the “profiles of doses” among
patients into homogeneous subgroups. Different dose patterns reflect
heterogeneity in patients’ characteristics and effectiveness of therapy.
However, not all patients receive prescriptions at regular time points, and
missing values seem ubiquitous if one aligns records at distinct time
points. Moreover, a few outliers may heavily influence the estimation
of within- and/or between-cluster variation, blurring the distinction
among clusters. In this study, a simple method based on a novel
pairwise dissimilarity is proposed, which also serves as a screening tool
to detect potential outliers. We use smoothing splines, handling data
observed either at regular or irregular time points, and measure the
dissimilarity between patients based on pairwise varying curve estimates
with commutation of smoothing parameters. It takes into account the
estimation uncertainty and is not strongly affected by outliers. The
effectiveness of our proposal is shown by simulations comparing it to
other dissimilarity measures and by a real application to methadone
dosage maintenance levels.
<span>**Keywords**</span>. Clustering, longitudinal data, smoothing
splines, outliers
References {#references .unnumbered}
----------
Lin, Chien-Ju, Christian Hennig, and Chieh-Liang Huang. (2016).
Clustering and a dissimilarity measure for methadone dosage time series.
In *Analysis of Large and Complex Data*, 31-41. Springer, Switzerland.
<p class="pagebreak"></p>
<p style="background-color:#ccccff;text-align:center">Monday 11<sup>th</sup> 11:30 Case Room 2 (260-057)</p>
## Semiparametric Mixed Analysis Of Covariance Model {-}
<p style="text-align:center">
Virgelio Alao, Erniel Barrios, and Joseph Ryan Lansangan<br />
University of the Philippines Diliman<br />
</p>
<span>**Semiparametric Mixed Analysis of Covariance Model**</span>
Virgelio M. Alao$^1$, Erniel B. Barrios$^2$ and Joseph Ryan G. Lansangan$^2$
$^1 \;$ Visayas State University
$^2 \;$ University of the Philippines Diliman
<span>**Abstract**</span>
A semiparametric mixed analysis of covariance model is postulated and
estimated using two procedures: one embeds restricted maximum
likelihood (REML) and nonparametric regression (smoothing splines)
estimation in the backfitting framework (ARMS); the other infuses the
bootstrap into ARMS (B-ARMS). The heterogeneous effect of covariates
across the groups is postulated to affect the response through a
nonparametric function to mitigate overparameterization. Using
simulation studies, we demonstrate the capability of the postulated model
(and estimation procedures) to increase predictive ability and
stabilize variance component estimates, even for small sample sizes and
minimal covariate effects, and regardless of whether the model is
correctly specified or misspecified.
<span>**Keywords**</span>. mixed ANCOVA model, nonparametric regression, backfitting,
bootstrap, random effects, variance components
<p class="pagebreak"></p>
<p style="background-color:#ccccff;text-align:center">Monday 11<sup>th</sup> 11:30 Case Room 3 (260-055)</p>
## Adaptive False Discovery Rate Regression With Application In Integrative Analysis Of Large-Scale Genomic Data {-}
<p style="text-align:center">
Can Yang<br />
Hong Kong University of Science and Technology<br />
</p>
<span>**Adaptive False Discovery Rate regression with application in
integrative analysis of large-scale genomic data**</span>
Can YANG$^1$
$^1 \;$ Department of Mathematics, The Hong Kong University of Science
and Technology, Clear Water Bay, Hong Kong.
<span>**Abstract**</span>. Recent international projects, such as the
Encyclopedia of DNA Elements (ENCODE) project, the Roadmap project and
the Genotype-Tissue Expression (GTEx) project, have generated vast
amounts of genomic annotation data, e.g., epigenome and transcriptome.
There is great demand for effective statistical approaches to
integrate genomic annotations with the results from genome-wide
association studies. In this talk, we introduce a statistical framework,
named AdaFDR, for integrating multiple annotations to characterize
functional roles of genetic variants that underlie human complex
phenotypes. For a given phenotype, AdaFDR can adaptively incorporate
relevant annotations for prioritization of genetic risk variants,
allowing nonlinear effects among these annotations, such as interaction
effects between genomic features. Specifically, we assume that the prior
probability of a variant being associated with the phenotype is a function of
its annotations $F(X)$, where $X$ is the collection of the annotation
status and $F(X)$ is an ensemble of decision trees, i.e.,
$F(X) = \sum_k f_k(X)$ and $f_k(X)$ is a shallow decision tree. We have
developed an efficient EM-Boosting algorithm for model fitting, where a
shallow decision tree grows in a gradient-Boosting manner (Friedman
2001) at each EM-iteration. Our framework inherits the nice properties of
gradient boosted trees: (1) The gradient ascent property of the Boosting
algorithm naturally guarantees the convergence of our EM-Boosting
algorithm. (2) Based on the fitted ensemble $\hat{F}(X)$, we are able to
rank the importance of annotations, measure the interaction among
annotations and visualize the model via partial plots (Friedman and
Popescu 2008). Using AdaFDR, we performed integrative analysis of genome-wide
association studies on human complex phenotypes and genome-wide
annotation resources, e.g., Roadmap epigenome. The analysis results
revealed interesting regulatory patterns of risk variants. These
findings deepen our understanding of genetic architectures of complex
phenotypes. The statistical framework developed here is also broadly
applicable to many other areas for integrative analysis of rich data
sets.
<span>**Keywords**</span>. False Discovery Rate, integrative analysis,
functional annotation, genomic data
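
A minimal Python sketch of the two ingredients named in the abstract, an additive ensemble $F(X) = \sum_k f_k(X)$ and an EM-style posterior update, is given below. The depth-one trees on binary annotations, the logistic link from $F(X)$ to the prior probability, and the uniform/Beta two-groups model for p-values are assumptions chosen for illustration; the abstract does not specify them, and the boosting step that grows a new tree at each EM iteration is omitted.

```python
import math


def ensemble_score(stumps, x):
    """F(x) = sum_k f_k(x), where each f_k is a depth-one tree (stump).

    Each stump is a tuple (j, left, right): it returns `right` when binary
    annotation x[j] is present and `left` otherwise.
    """
    return sum(right if x[j] else left for j, left, right in stumps)


def prior_probability(stumps, x):
    """Assumed logistic link: pi(x) = 1 / (1 + exp(-F(x)))."""
    return 1.0 / (1.0 + math.exp(-ensemble_score(stumps, x)))


def posterior_association(p_value, prior, alpha=0.3):
    """E-step under an assumed two-groups model for GWAS p-values.

    Null variants:     p ~ Uniform(0, 1), density 1.
    Non-null variants: p ~ Beta(alpha, 1), density alpha * p**(alpha - 1).
    """
    f1 = alpha * p_value ** (alpha - 1.0)
    return prior * f1 / (prior * f1 + (1.0 - prior))


# Hypothetical ensemble over two binary annotations.
stumps = [(0, -2.0, 0.5), (1, -0.5, 1.0)]
for x in [(0, 0), (1, 0), (1, 1)]:
    pi_x = prior_probability(stumps, x)
    print(x, pi_x, posterior_association(1e-4, pi_x))
```

In a full EM-Boosting fit, the posterior probabilities from a step like this would serve as the working targets when the next shallow tree is added to the ensemble.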
References {#references .unnumbered}
----------
Friedman, Jerome H (2001). Greedy function approximation: a gradient
boosting machine, *Annals of statistics*, **29:5**,1189–1232.
Jerome H. Friedman and Bogdan E. Popescu (2008) Predictive Learning via
Rule Ensembles *The Annals of Applied Statistics*, **2:3**, 916–954
<p class="pagebreak"></p>
<p style="background-color:#ccccff;text-align:center">Monday 11<sup>th</sup> 11:30 Case Room 4 (260-009)</p>
## Structure Of Members In The Organization To Induce Innovation: Quantitatively Analyze The Capability Of The Organization {-}
<p style="text-align:center">
Yuji Mizukami^1^ and Junji Nakano^2^<br />
^1^Nihon University<br />
^2^Institute of Statistical Mathematics<br />
</p>
<span>**Structure of Members in the Organization to Induce Innovation:
Quantitatively Analyze the Capability of the Organization**</span>
Yuji Mizukami$^1$ and Junji Nakano$^2$
$^1 \;$ Nihon University, 1-2-1 Izumicho, Narashino, Chiba 275-8575,
Japan
$^2 \;$ Institute of Statistical Mathematics, 10-3 Midori-cho,
Tachikawa, Tokyo 190-8562, Japan
<span>**Abstract**</span>. Innovation is the act of creating new value
through a “new connection”, a “new point of view”, a “new way of
thinking” or a “new usage method” (Schumpeter 1912). In recent years,
the promotion of innovation has been strongly encouraged. In research,
too, attempts are being made to create new value through connections
between fields. Moreover, along with the move to promote integration
among research fields, research is being conducted to measure and
promote the degree of that integration. In this research, to provide
indices for measuring the degree of integration, we present indices
that quantitatively indicate the degree of fusion between different
fields and the distance between fields. We also try to present indices
for grasping the overall picture based on random graphs.
<span>**Keywords**</span>. Research Metrics, Institute Research,
Co-author analysis
References {#references .unnumbered}
----------
Wagner, C. S., Roessner, J. D., Bobb, K., Klein, J. T., Boyack, K. W.,
Keyton, J. and Börner, K. (2011). *Approaches to understanding and
measuring interdisciplinary scientific research: A review of the
literature, Journal of Informetrics*. Vol. 5, No. 1, pp. 14-26.
Mizukami, Y., Mizutani, Y., Honda, K., Suzuki, S., Nakano, J. (2017).
*An International Research Comparative Study of the Degree of
Cooperation between disciplines within mathematics and mathematical
sciences, Behaviormetrika*, **1**, 19 pages, On-line.
<p class="pagebreak"></p>
<p style="background-color:#ccccff;text-align:center">Monday 11<sup>th</sup> 11:50 OGGB4 (260-073)</p>
## Vector Generalized Linear Time Series Models {-}
<p style="text-align:center">
Victor Miranda and Thomas Yee<br />
University of Auckland<br />
</p>
<span>**Vector Generalized Linear Time Series Models**</span>
Victor Miranda$^1$ and Thomas Yee$^1$
$^1 \;$ Department of Statistics, University of Auckland, Auckland, NZ.
<span>**Abstract**</span>. Since the introduction of the ARMA class in
the early 1970s many time series (TS) extensions have been proposed,
e.g., vector ARMA and GARCH-type models for heteroscedasticity. The
result has been a plethora of models having pockets of substructure but
little overarching framework. In this talk we propose a class of TS
models called Vector Generalized Linear Time Series Models (VGLTSM),
which can be thought of as multivariate generalized linear models
directed towards time series data. The crucial VGLM ideas are constraint
matrices, vector responses and covariate-specific linear predictors, and
estimation by iteratively reweighted least squares and Fisher scoring.
The only addition to the VGLM framework is a log-likelihood that depends
on past values. We show how several popular sub-classes of TS models are
accommodated as special cases of VGLMs, as well as new work that
broadens TS modelling even more. Algorithmic details of its
implementation in R, and properties such as stationarity, parameters
depending on covariates, expected information matrices and cointegrated
TS are surveyed.
<span>**Keywords**</span>. VGLM, time series, Fisher scoring.
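
The estimation step mentioned above can be sketched for the simplest special case. The Python code below fits a Poisson model whose log-mean depends on the previous count, using iteratively reweighted least squares, which for this canonical link coincides with Fisher scoring. The model form, the function name `fit_poisson_ar1` and the toy counts are assumptions for illustration; this is a GLM with a single linear predictor, not the VGLTSM class or its R implementation.

```python
import math


def fit_poisson_ar1(y, n_iter=25, tol=1e-10):
    """Fit log(mu_t) = b0 + b1 * log(1 + y_{t-1}) by IRLS.

    y is a list of counts; the first value is used only as a lagged covariate.
    """
    x = [math.log(1.0 + v) for v in y[:-1]]  # lagged covariate
    resp = y[1:]                             # responses y_2, ..., y_n
    b0, b1 = math.log(sum(resp) / len(resp)), 0.0  # start from the null model

    for _ in range(n_iter):
        eta = [b0 + b1 * xi for xi in x]
        mu = [math.exp(e) for e in eta]
        # Working response and working weights for the log link.
        z = [e + (yi - m) / m for e, yi, m in zip(eta, resp, mu)]
        w = mu

        # Weighted least squares for two coefficients (2x2 normal equations).
        sw = sum(w)
        swx = sum(wi * xi for wi, xi in zip(w, x))
        swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        swz = sum(wi * zi for wi, zi in zip(w, z))
        swxz = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
        det = sw * swxx - swx * swx
        new_b0 = (swxx * swz - swx * swxz) / det
        new_b1 = (sw * swxz - swx * swz) / det

        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1


# Toy count series (made up for illustration).
counts = [3, 5, 4, 6, 8, 7, 5, 4, 6, 9, 10, 7, 6, 5, 4, 3, 5, 6, 8, 7]
print(fit_poisson_ar1(counts))
```

The only time series ingredient here is that the covariate is built from past values of the response, which mirrors the remark that the log-likelihood depends on past values.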
References {#references .unnumbered}
----------
Yee, T. W. (2015) *Vector Generalized Linear and Additive Models: With
an Implementation in R.* New York, USA: Springer.
<p class="pagebreak"></p>
<p style="background-color:#ccccff;text-align:center">Monday 11<sup>th</sup> 11:50 OGGB5 (260-051)</p>
## Local Canonical Correlation Analysis For Multimodal Labeled Data {-}
<p style="text-align:center">
Seigo Mizutani and Hiroshi Yadohisa<br />
Doshisha University<br />
</p>
<span>**Local Canonical Correlation Analysis for Multimodal Labeled
Data**</span>
Seigo Mizutani$^1$ and Hiroshi Yadohisa$^2$
$^1 \;$ Graduate School of Culture and Information Science, Doshisha
University, Kyoto, JAPAN
$^2 \;$ Faculty of Culture and Information Science, Doshisha University,
Kyoto, JAPAN
<span>**Abstract**</span>
In supervised learning, canonical correlation analysis (CCA) is widely
used for dimension reduction problems. When using dimension reduction
methods, researchers should always aim to preserve the data structure in
a low dimensional space. However, if the obtained data are assumed to be
multimodal labeled data, that is, each cluster can be subdivided into
several latent clusters, CCA is rarely able to preserve the data
structure in a low dimensional space.
In this study, we propose local CCA (LCCA) for multimodal labeled data.
This method is based on local Fisher discriminant analysis (LFDA)
(Sugiyama, 2007). We do not employ the same local covariance matrix of
the explanatory variables as under LFDA, which uses a local
between-group variance matrix and a local within-group variance matrix.
Instead, in our proposed method, we use a covariance matrix of the
explanatory variables as well as a weighted affinity matrix. The
usefulness of LCCA in data visualization and clustering is then
demonstrated by simulation studies.
<span>**Keywords**</span>. Supervised learning, Dimension reduction,
Local Fisher discriminant analysis (LFDA), Weighted affinity matrix
References {#references .unnumbered}
----------
Sugiyama, M. (2007). Dimensionality reduction of multimodal labeled data
by local Fisher discriminant analysis. *Journal of Machine Learning
Research*, **8**, 1027-1061.
Hastie, T., Buja, A. and Tibshirani, R. (1995). Penalized discriminant
analysis. *The Annals of Statistics*, **23**, 73-102.
Hotelling, H. (1936). Relations between two sets of variates.
*Biometrika*, **28**, 321-377.
<p class="pagebreak"></p>
<p style="background-color:#ccccff;text-align:center">Monday 11<sup>th</sup> 11:50 Case Room 2 (260-057)</p>
## A Practitioners Guide To Deep Learning For Predictive Analytics On Structured Data {-}
<p style="text-align:center">
Balaram Panda and Habib Baluwala<br />
Inland Revenue Department<br />
</p>
<span>**Abstract**</span>. Recently, deep learning techniques have shown
remarkably strong performance in problems involving unstructured data
(e.g., text, image, and video). One of the reasons for this success is the
ability of deep learning methods to learn multiple levels of abstraction
and feature interaction. However, the advantages of using deep learning
techniques for structured, event, or transactional data have not been studied
in detail. The purpose of this paper is to review the advantages and
limitations of using deep feed forward networks on structured data. This
is achieved by comparing the performance of deep feed forward networks
with conventional machine learning techniques applied to a large
structured dataset for a classification problem. The paper also describes
methodologies for optimizing deep feed forward networks to achieve
better accuracy and different approaches to reducing overfitting in deep
feed forward networks. A sensitivity analysis is conducted to explore the
effect of hyperparameter tuning on model performance. We also derive
practical advice from our extensive empirical results for those
interested in getting the most out of deep feed forward networks in
real-world settings.
<span>**Keywords**</span>. Deep Learning, deep feed forward networks,
machine learning, R, Tensorflow, Python
<span>**References**</span>
Bengio, Yoshua. “Learning deep architectures for AI.” Foundations and
trends® in Machine Learning 2.1 (2009): 1-127.
Goodfellow, Ian J., et al. “Maxout networks.” arXiv preprint
arXiv:1302.4389 (2013).
<p class="pagebreak"></p>
<p style="background-color:#ccccff;text-align:center">Monday 11<sup>th</sup> 11:50 Case Room 4 (260-009)</p>
## Clustering Of Research Subject Based On Stochastic Block Model {-}
<p style="text-align:center">
Hiroka Hamada^1^, Keisuke Honda^1^, Frederick Kin Hing Phoa^2^, and Junji Nakano^1^<br />
^1^Institute of Statistical Mathematics<br />
^2^Academia Sinica<br />
</p>
<span>**Clustering of research subject based on stochastic block
model**</span>
Hiroka Hamada$^1$, Keisuke Honda$^1$, Frederick Kin Hing Phoa$^2$ and
Junji Nakano$^1$
$^1 \;$ Institute of Statistical Mathematics, Tachikawa, Tokyo 190-8562,
Japan
$^2 \;$ Institute of Statistical Science, Academia Sinica, Nankang,
Taipei 11529, Taiwan