%% bare_conf.tex
%% V1.3
%% 2007/01/11
%% by Michael Shell
%% See:
%% http://www.michaelshell.org/
%% for current contact information.
%%
%% This is a skeleton file demonstrating the use of IEEEtran.cls
%% (requires IEEEtran.cls version 1.7 or later) with an IEEE conference paper.
%%
%% Support sites:
%% http://www.michaelshell.org/tex/ieeetran/
%% http://www.ctan.org/tex-archive/macros/latex/contrib/IEEEtran/
%% and
%% http://www.ieee.org/
%%*************************************************************************
%% Legal Notice:
%% This code is offered as-is without any warranty either expressed or
%% implied; without even the implied warranty of MERCHANTABILITY or
%% FITNESS FOR A PARTICULAR PURPOSE!
%% User assumes all risk.
%% In no event shall IEEE or any contributor to this code be liable for
%% any damages or losses, including, but not limited to, incidental,
%% consequential, or any other damages, resulting from the use or misuse
%% of any information contained here.
%%
%% All comments are the opinions of their respective authors and are not
%% necessarily endorsed by the IEEE.
%%
%% This work is distributed under the LaTeX Project Public License (LPPL)
%% ( http://www.latex-project.org/ ) version 1.3, and may be freely used,
%% distributed and modified. A copy of the LPPL, version 1.3, is included
%% in the base LaTeX documentation of all distributions of LaTeX released
%% 2003/12/01 or later.
%% Retain all contribution notices and credits.
%% ** Modified files should be clearly indicated as such, including **
%% ** renaming them and changing author support contact information. **
%%
%% File list of work: IEEEtran.cls, IEEEtran_HOWTO.pdf, bare_adv.tex,
%% bare_conf.tex, bare_jrnl.tex, bare_jrnl_compsoc.tex
%%*************************************************************************
% *** Authors should verify (and, if needed, correct) their LaTeX system ***
% *** with the testflow diagnostic prior to trusting their LaTeX platform ***
% *** with production work. IEEE's font choices can trigger bugs that do ***
% *** not appear when using other class files. ***
% The testflow support page is at:
% http://www.michaelshell.org/tex/testflow/
% Note that the a4paper option is mainly intended so that authors in
% countries using A4 can easily print to A4 and see how their papers will
% look in print - the typesetting of the document will not typically be
% affected with changes in paper size (but the bottom and side margins will).
% Use the testflow package mentioned above to verify correct handling of
% both paper sizes by the user's LaTeX system.
%
% Also note that the "draftcls" or "draftclsnofoot", not "draft", option
% should be used if it is desired that the figures are to be displayed in
% draft mode.
%
\documentclass[conference]{IEEEtran}
% Add the compsoc option for Computer Society conferences.
%
% If IEEEtran.cls has not been installed into the LaTeX system files,
% manually specify the path to it like:
% \documentclass[conference]{../sty/IEEEtran}
\usepackage[latin1]{inputenc}
\usepackage[T1]{fontenc}
% Some very useful LaTeX packages include:
% (uncomment the ones you want to load)
% *** MISC UTILITY PACKAGES ***
%
%\usepackage{ifpdf}
% Heiko Oberdiek's ifpdf.sty is very useful if you need conditional
% compilation based on whether the output is pdf or dvi.
% usage:
% \ifpdf
% % pdf code
% \else
% % dvi code
% \fi
% The latest version of ifpdf.sty can be obtained from:
% http://www.ctan.org/tex-archive/macros/latex/contrib/oberdiek/
% Also, note that IEEEtran.cls V1.7 and later provides a builtin
% \ifCLASSINFOpdf conditional that works the same way.
% When switching from latex to pdflatex and vice-versa, the compiler may
% have to be run twice to clear warning/error messages.
\usepackage[dvips]{graphicx}
\graphicspath{{figs-copie/}}
\DeclareGraphicsExtensions{.eps}
\usepackage{amssymb,amsmath,array}
\usepackage{algorithm,algorithmic}
% *** CITATION PACKAGES ***
%
%\usepackage{cite}
% cite.sty was written by Donald Arseneau
% V1.6 and later of IEEEtran pre-defines the format of the cite.sty package
% \cite{} output to follow that of IEEE. Loading the cite package will
% result in citation numbers being automatically sorted and properly
% "compressed/ranged". e.g., [1], [9], [2], [7], [5], [6] without using
% cite.sty will become [1], [2], [5]--[7], [9] using cite.sty. cite.sty's
% \cite will automatically add leading space, if needed. Use cite.sty's
% noadjust option (cite.sty V3.8 and later) if you want to turn this off.
% cite.sty is already installed on most LaTeX systems. Be sure and use
% version 4.0 (2003-05-27) and later if using hyperref.sty. cite.sty does
% not currently provide for hyperlinked citations.
% The latest version can be obtained at:
% http://www.ctan.org/tex-archive/macros/latex/contrib/cite/
% The documentation is contained in the cite.sty file itself.
% *** GRAPHICS RELATED PACKAGES ***
%
\ifCLASSINFOpdf
% \usepackage[pdftex]{graphicx}
% declare the path(s) where your graphic files are
% \graphicspath{{../pdf/}{../jpeg/}}
% and their extensions so you won't have to specify these with
% every instance of \includegraphics
% \DeclareGraphicsExtensions{.pdf,.jpeg,.png}
\else
% or other class option (dvipsone, dvipdf, if not using dvips). graphicx
% will default to the driver specified in the system graphics.cfg if no
% driver is specified.
% \usepackage[dvips]{graphicx}
% declare the path(s) where your graphic files are
% \graphicspath{{../eps/}}
% and their extensions so you won't have to specify these with
% every instance of \includegraphics
% \DeclareGraphicsExtensions{.eps}
\fi
% graphicx was written by David Carlisle and Sebastian Rahtz. It is
% required if you want graphics, photos, etc. graphicx.sty is already
% installed on most LaTeX systems. The latest version and documentation can
% be obtained at:
% http://www.ctan.org/tex-archive/macros/latex/required/graphics/
% Another good source of documentation is "Using Imported Graphics in
% LaTeX2e" by Keith Reckdahl which can be found as epslatex.ps or
% epslatex.pdf at: http://www.ctan.org/tex-archive/info/
%
% latex, and pdflatex in dvi mode, support graphics in encapsulated
% postscript (.eps) format. pdflatex in pdf mode supports graphics
% in .pdf, .jpeg, .png and .mps (metapost) formats. Users should ensure
% that all non-photo figures use a vector format (.eps, .pdf, .mps) and
% not bitmapped formats (.jpeg, .png). IEEE frowns on bitmapped formats
% which can result in "jaggedy"/blurry rendering of lines and letters as
% well as large increases in file sizes.
%
% You can find documentation about the pdfTeX application at:
% http://www.tug.org/applications/pdftex
% *** MATH PACKAGES ***
%
%\usepackage[cmex10]{amsmath}
% A popular package from the American Mathematical Society that provides
% many useful and powerful commands for dealing with mathematics. If using
% it, be sure to load this package with the cmex10 option to ensure that
% only type 1 fonts will be utilized at all point sizes. Without this option,
% it is possible that some math symbols, particularly those within
% footnotes, will be rendered in bitmap form which will result in a
% document that can not be IEEE Xplore compliant!
%
% Also, note that the amsmath package sets \interdisplaylinepenalty to 10000
% thus preventing page breaks from occurring within multiline equations. Use:
%\interdisplaylinepenalty=2500
% after loading amsmath to restore such page breaks as IEEEtran.cls normally
% does. amsmath.sty is already installed on most LaTeX systems. The latest
% version and documentation can be obtained at:
% http://www.ctan.org/tex-archive/macros/latex/required/amslatex/math/
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% SMALL EQUATIONS
\usepackage{exscale}
\makeatletter
\newenvironment{equationsize}[1]{%
\skip@=\baselineskip
#1%
\baselineskip=\skip@
\equation
}{\endequation \ignorespacesafterend}
\makeatother
\makeatletter
\newenvironment{equationsize*}[1]{%
\skip@=\baselineskip
#1%
\baselineskip=\skip@
\equation
}{\nonumber\endequation \ignorespacesafterend}
\makeatother
\makeatletter
\newenvironment{alignsize}[1]{%
\skip@=\baselineskip
#1%
\baselineskip=\skip@
\align
}{\endalign \ignorespacesafterend}
\makeatother
\makeatletter
\newenvironment{alignsize*}[1]{%
\skip@=\baselineskip
#1%
\baselineskip=\skip@
\start@align\@ne\st@rredtrue\m@ne
}{\endalign\ignorespacesafterend}
\makeatother
%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% *** SPECIALIZED LIST PACKAGES ***
%
%\usepackage{algorithmic}
% algorithmic.sty was written by Peter Williams and Rogerio Brito.
% This package provides an algorithmic environment for describing algorithms.
% You can use the algorithmic environment in-text or within a figure
% environment to provide for a floating algorithm. Do NOT use the algorithm
% floating environment provided by algorithm.sty (by the same authors) or
% algorithm2e.sty (by Christophe Fiorio) as IEEE does not use dedicated
% algorithm float types and packages that provide these will not provide
% correct IEEE style captions. The latest version and documentation of
% algorithmic.sty can be obtained at:
% http://www.ctan.org/tex-archive/macros/latex/contrib/algorithms/
% There is also a support site at:
% http://algorithms.berlios.de/index.html
% Also of interest may be the (relatively newer and more customizable)
% algorithmicx.sty package by Szasz Janos:
% http://www.ctan.org/tex-archive/macros/latex/contrib/algorithmicx/
% *** ALIGNMENT PACKAGES ***
%
%\usepackage{array}
% Frank Mittelbach's and David Carlisle's array.sty patches and improves
% the standard LaTeX2e array and tabular environments to provide better
% appearance and additional user controls. As the default LaTeX2e table
% generation code is lacking to the point of almost being broken with
% respect to the quality of the end results, all users are strongly
% advised to use an enhanced (at the very least that provided by array.sty)
% set of table tools. array.sty is already installed on most systems. The
% latest version and documentation can be obtained at:
% http://www.ctan.org/tex-archive/macros/latex/required/tools/
%\usepackage{mdwmath}
%\usepackage{mdwtab}
% Also highly recommended is Mark Wooding's extremely powerful MDW tools,
% especially mdwmath.sty and mdwtab.sty which are used to format equations
% and tables, respectively. The MDWtools set is already installed on most
% LaTeX systems. The latest version and documentation are available at:
% http://www.ctan.org/tex-archive/macros/latex/contrib/mdwtools/
% IEEEtran contains the IEEEeqnarray family of commands that can be used to
% generate multiline equations as well as matrices, tables, etc., of high
% quality.
%\usepackage{eqparbox}
% Also of notable interest is Scott Pakin's eqparbox package for creating
% (automatically sized) equal width boxes - aka "natural width parboxes".
% Available at:
% http://www.ctan.org/tex-archive/macros/latex/contrib/eqparbox/
% *** SUBFIGURE PACKAGES ***
%\usepackage[tight,footnotesize]{subfigure}
% subfigure.sty was written by Steven Douglas Cochran. This package makes it
% easy to put subfigures in your figures. e.g., "Figure 1a and 1b". For IEEE
% work, it is a good idea to load it with the tight package option to reduce
% the amount of white space around the subfigures. subfigure.sty is already
% installed on most LaTeX systems. The latest version and documentation can
% be obtained at:
% http://www.ctan.org/tex-archive/obsolete/macros/latex/contrib/subfigure/
% subfigure.sty has been superseded by subfig.sty.
%\usepackage[caption=false]{caption}
%\usepackage[font=footnotesize]{subfig}
% subfig.sty, also written by Steven Douglas Cochran, is the modern
% replacement for subfigure.sty. However, subfig.sty requires and
% automatically loads Axel Sommerfeldt's caption.sty which will override
% IEEEtran.cls handling of captions and this will result in nonIEEE style
% figure/table captions. To prevent this problem, be sure and preload
% caption.sty with its "caption=false" package option. This will preserve
% IEEEtran.cls handling of captions. Version 1.3 (2005/06/28) and later
% (recommended due to many improvements over 1.2) of subfig.sty supports
% the caption=false option directly:
%\usepackage[caption=false,font=footnotesize]{subfig}
%
% The latest version and documentation can be obtained at:
% http://www.ctan.org/tex-archive/macros/latex/contrib/subfig/
% The latest version and documentation of caption.sty can be obtained at:
% http://www.ctan.org/tex-archive/macros/latex/contrib/caption/
% *** FLOAT PACKAGES ***
%
%\usepackage{fixltx2e}
% fixltx2e, the successor to the earlier fix2col.sty, was written by
% Frank Mittelbach and David Carlisle. This package corrects a few problems
% in the LaTeX2e kernel, the most notable of which is that in current
% LaTeX2e releases, the ordering of single and double column floats is not
% guaranteed to be preserved. Thus, an unpatched LaTeX2e can allow a
% single column figure to be placed prior to an earlier double column
% figure. The latest version and documentation can be found at:
% http://www.ctan.org/tex-archive/macros/latex/base/
%\usepackage{stfloats}
% stfloats.sty was written by Sigitas Tolusis. This package gives LaTeX2e
% the ability to do double column floats at the bottom of the page as well
% as the top. (e.g., "\begin{figure*}[!b]" is not normally possible in
% LaTeX2e). It also provides a command:
%\fnbelowfloat
% to enable the placement of footnotes below bottom floats (the standard
% LaTeX2e kernel puts them above bottom floats). This is an invasive package
% which rewrites many portions of the LaTeX2e float routines. It may not work
% with other packages that modify the LaTeX2e float routines. The latest
% version and documentation can be obtained at:
% http://www.ctan.org/tex-archive/macros/latex/contrib/sttools/
% Documentation is contained in the stfloats.sty comments as well as in the
% presfull.pdf file. Do not use the stfloats baselinefloat ability as IEEE
% does not allow \baselineskip to stretch. Authors submitting work to the
% IEEE should note that IEEE rarely uses double column equations and
% that authors should try to avoid such use. Do not be tempted to use the
% cuted.sty or midfloat.sty packages (also by Sigitas Tolusis) as IEEE does
% not format its papers in such ways.
% *** PDF, URL AND HYPERLINK PACKAGES ***
%
%\usepackage{url}
% url.sty was written by Donald Arseneau. It provides better support for
% handling and breaking URLs. url.sty is already installed on most LaTeX
% systems. The latest version can be obtained at:
% http://www.ctan.org/tex-archive/macros/latex/contrib/misc/
% Read the url.sty source comments for usage information. Basically,
% \url{my_url_here}.
% *** Do not adjust lengths that control margins, column widths, etc. ***
% *** Do not use packages that alter fonts (such as pslatex). ***
% There should be no need to do such things with IEEEtran.cls V1.6 and later.
% (Unless specifically asked to do so by the journal or conference you plan
% to submit to, of course. )
% correct bad hyphenation here
\hyphenation{op-tical net-works semi-conduc-tor}
\begin{document}
%
% paper title
% can use linebreaks \\ within to get better formatting as desired
\title{Reward-based online learning in non-stationary environments:
Adapting a P300-speller with a ``Backspace'' key}
% author names and affiliations
% use a multiple column layout for up to three different
% affiliations
\author{\IEEEauthorblockN{Emmanuel~Dauc\'e}
\IEEEauthorblockA{Ecole Centrale Marseille\\
INSERM UMR\_S 1106\\
Marseille, France\\
Email: [email protected]}
\and
\IEEEauthorblockN{Timoth\'ee~Proix}
\IEEEauthorblockA{Aix-Marseille Universit\'e\\
INSERM UMR\_S 1106\\
Marseille, France}
\and
\IEEEauthorblockN{Liva~Ralaivola}
\IEEEauthorblockA{Aix-Marseille Universit\'e\\
CNRS UMR 7279 \\
Marseille, France}}
% conference papers do not typically use \thanks and this command
% is locked out in conference mode. If really needed, such as for
% the acknowledgment of grants, issue a \IEEEoverridecommandlockouts
% after \documentclass
% for over three affiliations, or if they all won't fit within the width
% of the page, use this alternative format:
%
%\author{\IEEEauthorblockN{Michael Shell\IEEEauthorrefmark{1},
%Homer Simpson\IEEEauthorrefmark{2},
%James Kirk\IEEEauthorrefmark{3},
%Montgomery Scott\IEEEauthorrefmark{3} and
%Eldon Tyrell\IEEEauthorrefmark{4}}
%\IEEEauthorblockA{\IEEEauthorrefmark{1}School of Electrical and Computer Engineering\\
%Georgia Institute of Technology,
%Atlanta, Georgia 30332--0250\\ Email: see http://www.michaelshell.org/contact.html}
%\IEEEauthorblockA{\IEEEauthorrefmark{2}Twentieth Century Fox, Springfield, USA\\
%Email: [email protected]}
%\IEEEauthorblockA{\IEEEauthorrefmark{3}Starfleet Academy, San Francisco, California 96678-2391\\
%Telephone: (800) 555--1212, Fax: (888) 555--1212}
%\IEEEauthorblockA{\IEEEauthorrefmark{4}Tyrell Inc., 123 Replicant Street, Los Angeles, California 90210--4321}}
% use for special paper notices
%\IEEEspecialpapernotice{(Invited Paper)}
% make the title area
\maketitle
\begin{abstract}
%% Text of abstract
We adapt a policy gradient approach to the problem of reward-based online learning
of a non-invasive EEG-based ``P300''-speller.
We first clarify the nature of the P300-speller classification problem
and present a general regularized gradient ascent formula.
We then show that when the reward is immediate and binary (namely ``bad response'' or ``good response''),
each update is expected to improve the classifier accuracy, whether the actual response is correct or not.
We also estimate the robustness of the method to occasional mistaken rewards, i.e.
we show that the learning efficacy decreases only linearly
with the rate of invalid rewards.
The effectiveness of our approach is tested in a series of simulations reproducing
the conditions of real experiments.
We show in a first experiment that a systematic improvement of the spelling rate
is obtained for all subjects in the absence of initial calibration.
In a second experiment, we consider the case of online recovery after electrode failure.
Combined with a specific failure detection algorithm, the spelling error information
(typically conveyed by a ``backspace'' hit)
is shown to allow the policy gradient to
adapt the P300 classifier to the new situation, provided
the feedback is reliable enough (namely a reliability greater than 70\%).
\end{abstract}
% IEEEtran.cls defaults to using nonbold math in the Abstract.
% This preserves the distinction between vectors and scalars. However,
% if the conference you are submitting to favors bold math in the abstract,
% then you can use LaTeX's standard command \boldmath at the very start
% of the abstract to achieve this. Many IEEE journals/conferences frown on
% math in the abstract anyway.
\begin{IEEEkeywords}
Online learning, Reinforcement learning, Policy gradient,
Brain-Computer Interfaces, P300 speller
\end{IEEEkeywords}
% For peer review papers, you can put extra information on the cover
% page as needed:
% \ifCLASSOPTIONpeerreview
% \begin{center} \bfseries EDICS Category: 3-BBND \end{center}
% \fi
%
% For peerreview papers, this IEEEtran command inserts a page break and
% creates the second title. It will be ignored for other modes.
%\IEEEpeerreviewmaketitle
\section{Introduction}
We consider the case of embedded classifiers that have to interact in real time
with their environment. Embedded classifiers (in vehicles, planes, robots...) have to deal with large
amounts of data vectors sampled from the environment, and take their decisions from these samples.
Such classifiers are generally built during a training session that takes place prior to actual use.
This initial training produces a set of parameters that are then considered fixed
for the remaining time.
Brain Computer Interfaces are an example of such embedded classifiers.
The objective of brain-computer interfaces (BCI's) is to analyze in real-time electro-encephalographic (EEG) signals recorded at the surface of the scalp in order to control a device
(mouse, keyboard, wheelchair,...). This problem has straightforward applications, first in helping disabled people
to communicate \cite{Farwell88} and move \cite{Vanacker2007}, but also more generally in game remote control,
assisted driving, neurofeedback, etc.
From a general standpoint, the problem consists in collecting samples of EEG activity
and trying to \emph{classify} them in different categories reflecting the ``state of mind'' of the subject
at the moment of the observation. The better the classification, the more effective the
interface.
The generic name ``BCI'' encompasses different
EEG-based communication protocols like controlling a pointer on a screen
using motor imagery (the ``Graz'' task \cite{Pfurtscheller1997}), text typing
using event-related potentials (``P300'' speller) \cite{Farwell88}, ``brain switch'' \cite{Mason00}, etc.,
%Since then, significant improvements have been introduced, like enhanced and more ``user-friendly''
%interfaces \cite{Townsend10,Congedo11},
with corresponding dedicated pre-processing \cite{Blankertz08,Rivet09,Ang12} and
classification and/or regression techniques \cite{Pfurtscheller1997,Krusienski08,Hoffmann08}.
We consider in this paper the case of
grid-based P300 spellers, where the subject faces a screen with a $6 \times 6$ grid
of letters and numbers
and is asked to focus his attention on the symbol he wants to spell out.
After a classifier has been trained, the subject is expected to handle the interface and be capable
of spelling words autonomously.
However, many changes are expected to take place during subsequent use, and
the quality of the signal (and
thus the accuracy of the interface) is known to progressively deteriorate over time \cite{Shenoy2006}:
some electrodes may accidentally be displaced or detach from the scalp, the conductivity of the scalp itself
may change during the experiment,
the level of attention of the subject may evolve, etc. Moreover, experimental
conditions are difficult to reproduce
and day-to-day recalibration is often necessary.
This problem of adapting the parameters while the device is used
is known as the ``online learning'' problem, relevant when the statistics of the input change over time, i.e. when
the input is \emph{non-stationary} \cite{Kivinen08}, or
%Online learning is moreover known to have good performance
when the input data is abundant and high-dimensional \cite{Bottou03}.
%and this at a significantly lower computational cost than traditional batch methods.
%General online adaptation methods rely on a feedback signal indicating how good the classifier is at the task (the ``error signal'').
%The nature of this feedback defines the class of learning algorithms to be implemented.
%In the supervised case, for instance, the expected label is given to the classifier after each response and can be compared to the
%label obtained from the classifier.
%This approach does not apply here because the expected output (the letter to be spelled for instance)
%is not available in free use.
The problem of BCI non-stationarity is generally addressed using an unsupervised expectation-maximi\-zation (EM)
procedure \cite{Li06,Lu09,Kindermans12}, which is not proven to always converge to a consistent classifier.
We develop in this paper a first attempt to adapt the
reinforcement learning methodology \cite{Sutton98}
to the context of an adaptive BCI P300 speller,
where the ``reward'' may take the form of a scalar
representing the agreement of the subject
regarding the response of the device.
%For instance, the reward obtained after
%the response of the classifier may be either 1 (correct classification) or
%-1 (incorrect classification).
A reward is of course less informative than the true response but,
in return, it is generally cheaper to obtain.
In a P300-speller setup, two solutions can be considered for that purpose: (i)
use the ``error negativity'' event-related potential (ERP) that follows a wrong response of the classifier, for which detection
rates from 60 to 90\% are reported in the literature \cite{Buttfield06,dalSeno10,Schmidt12}.
In that case, a specific classifier is needed for detecting the error negativities in the EEG,
and as such a specific training session should
take place prior to the spelling session; (ii) dedicate a
specific symbol to signalling that the previous character
was incorrectly spelled, so as to
detect the subject's disagreement with the previous response \cite{Dauce13}.
The ``Backspace'' key of standard computer keyboards is interesting to consider
from this perspective.
It may indeed provide information (or a guess) about the correctness of the previous spelling,
and thus allow one to distinguish valid letters from invalid letters in the current series of spelled characters.
A minimal spelling accuracy
is then needed for this approach to be effective, which means that a training
session may also take place prior to the spelling session.
In this paper, we consider the second case and
look in section \ref{sec:principles} at the principles underlying reward-based learning in the P300-speller case,
adapting the classical policy gradient approach \cite{Wil92} to that context.
In section \ref{sec:analysis},
we look at the gradient estimator in the particular case where the rewards are binary,
and we prove (i) that the estimator systematically heads toward response improvement when correct responses are positively
marked and incorrect responses negatively marked, and (ii) that it is
robust to occasional misleading rewards, which is typically the case when the rewards are driven by a ``backspace'' key.
Then, in order to validate our approach,
we simulate in section \ref{sec:P300} two different adaptive P300-speller setups
from a dataset coming from real P300-speller experiments.
In the first numerical experiment, the learning is made ``from scratch''
(i.e. without initial training or calibration), while in a second numerical experiment,
an initial calibration is done and the policy
gradient is combined with a specific failure detection algorithm
in order to adapt the classifier to an unexpected signal breakdown.
\section{Toward reward-based P300-spellers}\label{sec:principles}
\subsection{Related work}
Online learning with partial feedback is often referred to as the ``contextual bandit'' problem, or
``bandit with side information'', which aims at finding online strategies that minimize the ``regret''
when one has to find the best choice out of $K$ possibilities by trial and error. Popular solutions use
parametrized random policies where every possible choice is described by its expected gain and the number of visits
\cite{Auer02}. %In the case of the contextual bandit, a first choice is made within a finite set of possible policies.
%This approach is however known to be challenged in a continuous context.
Although several variants have been adapted to the continuous context case \cite{Kakade08,Hazan11,Crammer11}, they
do not fit the problem we consider here.
Moreover, despite having mathematically exact upper regret bounds, they are unlikely to show convergence rates fast enough in practice
%or (***FAUX***) regularization capabilities
to apply to real-time online learning in a dynamic context.
%where, under a
%linear separability assumption, a gradient descent is shown to reach the perceptron hyperplane in
%the long run.
%On one hand, the perceptron-inspired method presented in \cite{Kakade08} has a way too high regret bound
%On the other hand, more elaborate contextual bandit variants proposed in \cite{Hazan11,Crammer11}
%while having more competing regret bounds, do not address the problem of non-stationary context under consideration here.
\subsection{The ``P300-speller'' classification problem}
The so-called ``oddball paradigm'' is a well-established protocol developed in electrophysiology
in order to identify the subject's reaction to an unexpected stimulus taking place
in a sequence of monotonous stimuli \cite{Farwell88}.
A specific Event-Related Potential (ERP) can be measured in the EEG around 300 ms after the stimulus onset.
This response (called the ``P300'') is generally considered as the signature of the subject's surprise at the occurrence of
the expected stimulus.
%As it can be seen from a visual inspection of the EEG signal (see figure \ref{fig:EEG}), identifying the P300 response from
%other non-specific responses is not trivial.
%\begin{figure}
%\centerline{
% \includegraphics[width=0.35\linewidth]{figs/P300_grid}
% \includegraphics[width=0.65\linewidth]{figs/fig_EEG}
%}
%\caption{(left) The P300 speller grid, using the OpenVibe platform (here the first
%%column is flashed with some magnification) \cite{Renard10}. (right) 32-channels EEG signal excerpt.
%Row or column tags are shown on top and stars denotes target row or column (here target row = 6, target column = 8).
%he dashed rectangle indicates an example 600 ms EEG sample taken after flashing the grid at position 8 (2nd column).}
%\label{fig:EEG}
%\end{figure}
%In the absence of labels, automatic signal classification must be used
%in order to decipher the P300 (``target'') responses from the non-P300 (``non-target'') responses.
The problem is then to identify this particular
ERP (the ``oddball'' ERP) in a set of $K$ observations.
If we denote by $\underline{\mathbf{x}} = (\boldsymbol{x}_1,...,\boldsymbol{x}_K) \in \mathcal{X}^K$ the set of $K$ observations,
where $\mathcal{X}$ is the feature vector space,
the problem is to identify the ``target''
within a set in which multiple inputs belong to the ``non-target'' category
and only one input belongs to the ``target'' category.
This problem can be seen as a one-\emph{among}-all classification problem\footnote{not to be confused with the ``one-vs-all'' classification setup.}.
Denoting by $P^+$ the target feature distribution and
by $P^-$ the non-target feature distribution, we propose the following generative model:
each observation $\underline{\mathbf{x}}$ can be seen as the result of a random draw from a uniform
mixture of $K$ distributions $P_1, ..., P_K$,
with $P_1 = (P^+,P^-,...,P^-)$, $P_2=(P^-,P^+,P^-,...,P^-)$, ..., $P_K=(P^-,...,P^-,P^+)$,
each $P_k$ standing for a sequence of $K$ independently drawn feature vectors having the $k^\text{th}$ vector as target.
The uniform prior reflects the fact that the target is equally likely to sit at any location in the sequence.
%\begin{figure}
% \centerline{
% \includegraphics[width=0.5\linewidth]{figs/oddball}
% }
% \caption{The ``\emph{P300-speller}'' classification problem. The input is a set of $K$ observations. From this set, it
% is known that $K-1$ observations are drawn according to $P^-$, while only one is drawn according to $P^+$. The position of
% this ``oddball'' is unknown. The classification problem consists in identifying the position of the oddball (from its own features and/or by
% comparison with the others).}
% \label{fig:oddball}
%\end{figure}
With such uniform priors, the posterior probability of location $k$ being the target,
given the observation $(\boldsymbol{x}_1,...,\boldsymbol{x}_K)$, is
easily shown from Bayes' formula to be:
\begin{equation}
P(k|\boldsymbol{x}_1,...,\boldsymbol{x}_K) = \frac{\frac{P^+(\boldsymbol{x}_k)}{P^-(\boldsymbol{x}_k)}}
{\sum_l \frac{P^+(\boldsymbol{x}_l)}{P^-(\boldsymbol{x}_l)}}
\end{equation}
In the linear-Gaussian case, where $P^+$ and $P^-$ are multivariate Gaussian distributions of respective mean
$\boldsymbol{\mu}^+$ and $\boldsymbol{\mu}^-$, with shared covariance $\boldsymbol{\Sigma}$,
the previous formula simplifies to:
\begin{equation}\label{eq:model-free}
P(k|\boldsymbol{x}_1,...,\boldsymbol{x}_K) = \frac{\exp(\boldsymbol{x}_k \boldsymbol{w}^T)}
{\sum_l \exp(\boldsymbol{x}_l \boldsymbol{w}^T)}
\end{equation}
where $\boldsymbol{w}$ can be shown to be equal to $(\boldsymbol{\mu}^+ - \boldsymbol{\mu}^-)\boldsymbol{\Sigma}^{-1}$.
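For completeness, this expression follows from expanding the log-likelihood ratio (a short check with the notations above):
\begin{align*}
\ln \frac{P^+(\boldsymbol{x})}{P^-(\boldsymbol{x})}
&= \tfrac{1}{2}(\boldsymbol{x}-\boldsymbol{\mu}^-)\boldsymbol{\Sigma}^{-1}(\boldsymbol{x}-\boldsymbol{\mu}^-)^T
- \tfrac{1}{2}(\boldsymbol{x}-\boldsymbol{\mu}^+)\boldsymbol{\Sigma}^{-1}(\boldsymbol{x}-\boldsymbol{\mu}^+)^T\\
&= \boldsymbol{x}\boldsymbol{\Sigma}^{-1}(\boldsymbol{\mu}^+-\boldsymbol{\mu}^-)^T + b
= \boldsymbol{x}\boldsymbol{w}^T + b,
\end{align*}
where $b$ does not depend on $\boldsymbol{x}$ and therefore cancels between the numerator and the denominator of the posterior, which yields eq.~(\ref{eq:model-free}).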
The discriminating manifold $\boldsymbol{x}\boldsymbol{w}^T=0$ is similar to the
manifold obtained in the binary classification problem, the only difference being the number of samples considered (one sample in the binary
classification case, multiple samples in the ``P300-speller'' classification case).
Despite the categorical nature of the response, the P300-speller classifier is structurally closer to the binary case
than to the multiclass case.
%\begin{table}
% %\begin{footnotesize}
% \begin{center}
% \begin{tabular}{|c||c|c|}
% \hline
% & {\bf Model-based} & {\bf Model-free (Softmax-linear)} \\
% \hline\hline
% Binary &
% $P(1|\boldsymbol{x}) = \frac{1}{1 + \frac{P^-(\boldsymbol{x})}{P^+(\boldsymbol{x})}}$ &
% $\pi(\boldsymbol{x};\boldsymbol{w}, b) = \frac{1}{1 + exp(- \boldsymbol{x} \boldsymbol{w}^T + b)}$\\
% \hline
% ``P300-speller'' &
% $P(k|\boldsymbol{x}_1,...,\boldsymbol{x}_K) = \frac{\frac{P^+(\boldsymbol{x}_k)}{P^-(\boldsymbol{x}_k)}}
% {\sum_{\ell} \frac{P^+(\boldsymbol{x}_\ell)}{P^-(\boldsymbol{x}_\ell)}}$ &
%
% $\pi_k(\boldsymbol{x}_1,...,\boldsymbol{x}_K;\boldsymbol{w}) = \frac{\exp(\boldsymbol{x}_k \boldsymbol{w}^T)}
% {\sum_\ell \exp(\boldsymbol{x}_\ell \boldsymbol{w}^T)}$\\
% \hline
% Multiclass &
% $P(k|\boldsymbol{x}) = \frac{P(\boldsymbol{x}|k)} {\sum_\ell P(\boldsymbol{x}|\ell)}$ &
% $\pi_k(\boldsymbol{x}; \boldsymbol{w}_1, ..., \boldsymbol{w}_K) = \frac{\exp(\boldsymbol{x} \boldsymbol{w}_k^T)}{\sum_\ell \exp(\boldsymbol{x} \boldsymbol{w}_\ell^T)}$\\
% \hline
% \end{tabular}
% \end{center}
% \caption{Formal comparison of the ``P300-speller'' posterior with binary and multiclass posteriors, in the model-based and in the softmax-linear model-free cases.}
% \label{tab:oddball}
% %\end{footnotesize}
%\end{table}
%
\subsection{Stochastic classifier}
%In model-free approaches, no assumption is made on the underlying distribution
%governing the input data. Only a parametrized decision rule $\pi$ is considered, governing the response of the device.
%In the ``oddball'' classification case, a simple method is to choose the index that maximizes $\boldsymbol{x}_k \boldsymbol{w}^T$,
%where $\boldsymbol{w}$ is the discriminating manifold, obtained for instance by a linear
%discriminant analysis \cite{Pfurtscheller1997,Krusienski08,Hoffmann08}.
We consider the reinforcement learning framework, where different responses must be explored at random before the classifier can estimate which of
them is expected to bring the highest reward.
The response being obtained from a random draw, the rule that determines the probabilities associated with each possible response is
called the policy.
In our case, consistently with eq. (\ref{eq:model-free}), the policy relies on a vector of parameters $\boldsymbol{w}$
that is compared with every observation vector from the set $\underline{\mathbf{x}}$
through the scalar products $\langle\boldsymbol{w},\boldsymbol{x}_k\rangle$,
which are expected to be higher when $\boldsymbol{x}_k$ is a target than when it is not.
The actual response $y \in \{1,...,K\}$ is
drawn from a multinomial distribution relying on the ``$\pi$-scores'' (softmax choice):
\begin{equation}
\label{eq:softmax_multi}
\forall k \in \{1,...,K\}, \pi(\underline{\mathbf{x}},k;\boldsymbol{w}) = \frac {\exp \langle\boldsymbol{w},\boldsymbol{x}_k\rangle} {\sum_l \exp \langle\boldsymbol{w},\boldsymbol{x}_l\rangle}
\end{equation}
so that $\pi(\underline{\mathbf{x}},k;\boldsymbol{w})$ is the probability of having $y=k$ given $\underline{\mathbf{x}}$.
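As an illustration, the softmax draw of eq.~(\ref{eq:softmax_multi}) can be sketched in a few lines of Python (a minimal sketch with assumed array shapes, not the implementation used in the experiments):
\begin{verbatim}
import numpy as np

def softmax_draw(X, w, rng):
    # X: (K, d) observations, w: (d,) weights
    # minimal illustrative sketch
    s = X @ w
    s = s - s.max()        # numerical stability
    pi = np.exp(s) / np.exp(s).sum()
    y = rng.choice(len(pi), p=pi)  # multinomial draw
    return y, pi
\end{verbatim}
A call such as \texttt{softmax\_draw(X, w, np.random.default\_rng())} returns both the sampled response index and the $\pi$-scores of eq.~(\ref{eq:softmax_multi}).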
\subsection{Learning problem}
After the response $y$ is drawn out, a scalar $r(\underline{\mathbf{x}},y)$ (the ``reward'')
is read from the environment. In essence, the reward quantifies the achievement of the current trial.
The reward expectation $E(r)$ represents the global
achievement of the policy, given by
the integral of the rewards obtained for every observation set $\underline{\mathbf{x}}$
and every choice $k$, given their probability of appearance:
%\begin{equation}\label{eq:objective_pg}
$$E(r) = \int_{\mathcal{X}^K}\Big( \sum_{k=1}^{K} r(\underline{\mathbf{x}},k) \pi(\underline{\mathbf{x}},k;\boldsymbol{w}) \Big)p(\underline{\mathbf{x}}) d\underline{\mathbf{x}}$$
%\end{equation}
where the distribution $p(\underline{\mathbf{x}})$ is not explicitly given.
The objective is to find the best parameters $\boldsymbol{w}$ for the policy
to maximize the reward expectation, i.e.:
%\begin{equation}\label{eq:optim_pg}
$\max_{\boldsymbol{w}} E(r)$.
%\end{equation}
We additionally consider a regularization term that
is expected to promote a small norm for the model and to prioritize
the most recently seen examples in an online setup \cite{Kivinen08}. % (as proposed by ).
The optimization problem becomes:
\begin{equation}\label{eq:optim_regularized_pg}
\max_{\boldsymbol{w}} \mathcal{G} = \max_{\boldsymbol{w}} E(r)-{\lambda \over 2} \Vert \boldsymbol{w} \Vert^2
\end{equation}
where $\mathcal{G}$ is the objective function and $\lambda$ the regularization parameter.
%\subsection{Online update}\label{sec:gradient}
In the absence of a model, the solution of eq.~(\ref{eq:optim_regularized_pg}) can be approached
by trying to cancel the gradient of $\mathcal{G}$
through a stochastic gradient \emph{ascent}.
The policy gradient is a general-purpose reward-based algorithm that we adapt here to the online
``P300-speller'' classification case.
Following \cite{Wil92}, the regularized policy gradient can be shown to obey:
%\begin{equation}\label{eq:simple_pg}
%\nabla_f E(r) = E(r \nabla_f \ln(\pi))
%\end{equation}
%and the regularized policy gradient can be shown to be:
\begin{equation}\label{eq:regularized_pg}
\nabla_{\boldsymbol{w}} \mathcal{G} = E(r \nabla_{\boldsymbol{w}} \ln(\pi)) - \lambda \boldsymbol{w}
\end{equation}
Starting from scratch, the update procedure is expected to refine the model trial after trial
using the local estimator $\boldsymbol{g}(\underline{\mathbf{x}}, y) - \lambda \boldsymbol{w}$,
with
$\boldsymbol{g}(\underline{\mathbf{x}}, y) = r(\underline{\mathbf{x}},y)\nabla_{\boldsymbol{w}}\ln \pi(\underline{\mathbf{x}},y;\boldsymbol{w})$,
so that the rewards should be maximized in the long run.
Differentiating $\ln \pi$ with respect to $\boldsymbol{w}$, we obtain the following expression:
\begin{equation}\label{eq:pol_grad_multi}
\boldsymbol{g}(\underline{\mathbf{x}}, y)
= r(\underline{\mathbf{x}},y) \left(\boldsymbol{x}_y - \sum_k \pi(\underline{\mathbf{x}},k;\boldsymbol{w})\boldsymbol{x}_k\right)
\end{equation}
The regularized online policy gradient ascent update
can be defined in the following way:
at every time $t$, after reading $\underline{\mathbf{x}}_t$, $y_t$ and $r_t$, increment $\boldsymbol{w}$
with $\eta\left(\boldsymbol{g}(\underline{\mathbf{x}}_t, y_t) - \lambda \boldsymbol{w}\right)$, where $\eta$ is the \emph{learning rate}, i.e.:
\begin{equationsize}{\small}\label{eq:online_update}
%\begin{equation}
\boldsymbol{w}_t = (1-\eta\lambda) \boldsymbol{w}_{t-1} + \eta r_t \left( \boldsymbol{x}_{y_t,t}
- \sum_{k=1}^K \pi(\underline{\mathbf{x}}_t,k;\boldsymbol{w}_{t-1})\boldsymbol{x}_{k,t} \right)
%\end{equation}
\end{equationsize}
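In code, one such update step can be sketched as follows (again a minimal Python sketch with illustrative names, not the experimental implementation):
\begin{verbatim}
import numpy as np

def policy_gradient_step(w, X, y, r, eta, lam):
    # one regularized update; X: (K, d) set,
    # y: chosen index, r: scalar reward
    # (illustrative sketch)
    s = X @ w
    s = s - s.max()
    pi = np.exp(s) / np.exp(s).sum()
    g = r * (X[y] - pi @ X)  # gradient estimator
    return (1.0 - eta * lam) * w + eta * g
\end{verbatim}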
%The sketch of our online learning setup is given in algorithm \ref{algo:sketch}.
%\begin{algorithm}
%\caption{Reward-based online learning}
%\label{algo:sketch}
%\begin{algorithmic}[1]
%\STATE $ \boldsymbol{w} \leftarrow$ initialize\_mapping()
%\LOOP
% \STATE $\underline{\mathbf{x}} \leftarrow$ measure\_state()
% \STATE $y \leftarrow$ choice $(\boldsymbol{w},\underline{\mathbf{x}})$ \hspace{3cm} \emph{see eq. (\ref{eq:softmax_multi})}
% \STATE $r \leftarrow$ measure\_reward($\underline{\mathbf{x}},y$)
%% \STATE $\boldsymbol{w} \leftarrow$ update\_mapping($\boldsymbol{w},\underline{\mathbf{x}},y,r$) \hspace{1cm} \emph{see eq. (\ref{eq:online_update})}
%\ENDLOOP
%\end{algorithmic}
%\end{algorithm}
%Every example being presented once and for all, the norm limitation constraint imposes to favour the newest examples
%against the oldest ones in the update of the parameters \cite{Kivinen08}, so that every shift from the current distribution
%should be tracked progressively, according to the value of $\eta$ (that can be interpreted as a ``renewal'' parameter).
%In particular, the product $\eta\lambda$ gives an indication on the number of examples that significantly
%take part in the classifier build-up (as old examples are progressively erased by more recently
%seen examples). This ``memory span'', that drives the complexity of the classifier, can be
%approximated to $\frac{1}{\eta \lambda}$, so that
%%between target and non-target, no matter how high or low the scores are,
%%allowing higher values for $\lambda$ (implying a smaller norm to $\boldsymbol{f}$).
%a smaller $\lambda$ implies a higher ``memory span'' (irrespectively of $\eta$).
\section{Analysis}\label{sec:analysis}
\subsection{Gradient estimator in the binary reward case}\label{sec:analysis-1}
The policy gradient estimator (\ref{eq:pol_grad_multi}) can be analyzed in the binary reward case,
i.e. when $r(\underline{\mathbf{x}},y)$ takes only two values: $r^+$ when the response is correct and $r^-$ when the response is
incorrect. Using a symmetry argument, we only consider the case $\underline{\mathbf{x}} \in \mathcal{C}_1$, i.e.
$\boldsymbol{x}_1 \sim P^+$ and $\forall k>1, \boldsymbol{x}_k \sim P^-$, for the analysis.
Considering that $r(\underline{\mathbf{x}},1) = r^+$ and $r(\underline{\mathbf{x}},k)=r^-$ for $k>1$,
we have:
$$\boldsymbol{g}(\underline{\mathbf{x}}, y) = (\mathbf{1}_{\{y=1\}} r^+ + \mathbf{1}_{\{y\neq 1\}}r^-) \left(\boldsymbol{x}_y - \sum_k \pi(\underline{\mathbf{x}},k)\boldsymbol{x}_k\right) $$
Then, noting that:
$$E_{\underline{X},Y}(\boldsymbol{g}) = E_{\underline{X}}\left[E_{Y|\underline{X}}(\boldsymbol{g})\right]$$
we look at the conditional expectation of the gradient
$E_{Y|\underline{X}}(\boldsymbol{g})$,
i.e. we try to estimate the general direction followed when
the set of observations $\underline{\mathbf{x}}$ is given.
Let us introduce a few additional notations in order to simplify the writing:
\begin{itemize}
\item Target:
$\boldsymbol{x}^+(\underline{\mathbf{x}}) = \boldsymbol{x}_1 $
\item Non-target weighted average:
$\boldsymbol{x}^-(\underline{\mathbf{x}}) = \sum_{k>1} \tilde{\pi}(\underline{\mathbf{x}},k) \boldsymbol{x}_k
\mbox{, with }
\tilde{\pi}(\underline{\mathbf{x}},k)=\frac{\pi(\underline{\mathbf{x}},k)}{1 - \pi(\underline{\mathbf{x}},1)}$
\item Difference (local discriminant vector):
$\boldsymbol{\Delta}(\underline{\mathbf{x}}) = \boldsymbol{x}^+(\underline{\mathbf{x}}) - \boldsymbol{x}^-(\underline{\mathbf{x}})$.
\end{itemize}
With some reordering, it can be shown that: %we have shown in section \ref{sec:analysis} that :
\begin{equation}\label{eq:exp_up}
E_{Y|\underline{X}}(\boldsymbol{g}(\underline{\mathbf{x}},y)) = (r^+-r^-) \pi(\underline{\mathbf{x}},1) (1 - \pi(\underline{\mathbf{x}},1)) \boldsymbol{\Delta}(\underline{\mathbf{x}})
\end{equation}
%so that the joint expectation is:
%\begin{equation}
% E_{X,Y}(\boldsymbol{g}(\underline{\mathbf{x}},y)) = (r^+-r^-){E}_X\left[\pi(\underline{\mathbf{x}},1) (1 - \pi(\underline{\mathbf{x}},1)) \boldsymbol{\Delta}(\underline{\mathbf{x}})\right]
%\end{equation}
%which can be compared with (\ref{eq:exp_multi}).
i.e. the local \emph{direction} of the gradient
is the local discriminant vector $\boldsymbol{\Delta}(\underline{\mathbf{x}})$.
By construction, as soon as $r^+>r^-$, the weight update
is expected to promote the correct responses (i.e. those associated with $r^+$).
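For completeness, the reordering leading to eq.~(\ref{eq:exp_up}) is short: writing $\bar{\boldsymbol{x}} = \sum_k \pi(\underline{\mathbf{x}},k)\boldsymbol{x}_k$, we have
\begin{align*}
E_{Y|\underline{X}}(\boldsymbol{g}(\underline{\mathbf{x}},y)) &= r^+ \pi(\underline{\mathbf{x}},1)\left(\boldsymbol{x}_1 - \bar{\boldsymbol{x}}\right)
+ r^- \sum_{y>1}\pi(\underline{\mathbf{x}},y)\left(\boldsymbol{x}_y - \bar{\boldsymbol{x}}\right)\\
&= (r^+ - r^-)\, \pi(\underline{\mathbf{x}},1)\left(\boldsymbol{x}_1 - \bar{\boldsymbol{x}}\right)
\end{align*}
since $\sum_{y}\pi(\underline{\mathbf{x}},y)\left(\boldsymbol{x}_y - \bar{\boldsymbol{x}}\right) = \boldsymbol{0}$, and
$\boldsymbol{x}_1 - \bar{\boldsymbol{x}} = (1-\pi(\underline{\mathbf{x}},1))\,\boldsymbol{\Delta}(\underline{\mathbf{x}})$ by the definitions above.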
Then, remarking that:
$\mathbb{P}(r = r^+|\underline{\mathbf{x}}) = \pi(\underline{\mathbf{x}},1)$,
we write:
\begin{align*}
E_{Y|\underline{X}}(\boldsymbol{g}(\underline{\mathbf{x}},y)) &= \pi(\underline{\mathbf{x}},1) E_{Y|\underline{X}}(\boldsymbol{g}(\underline{\mathbf{x}},y)|r = r^+)\\
&+ (1 - \pi(\underline{\mathbf{x}},1)) E_{Y|\underline{X}}(\boldsymbol{g}(\underline{\mathbf{x}},y)|r = r^-)
\end{align*}
and identify:
\begin{itemize}
\item[] $E_{Y|\underline{X}}(\boldsymbol{g}(\underline{\mathbf{x}},y)|r = r^+) = r^+ (1 - \pi(\underline{\mathbf{x}},1)) \boldsymbol{\Delta}(\underline{\mathbf{x}})$
\item[] $E_{Y|\underline{X}}(\boldsymbol{g}(\underline{\mathbf{x}},y)|r = r^-) = - r^- \pi(\underline{\mathbf{x}},1) \boldsymbol{\Delta}(\underline{\mathbf{x}})$
\end{itemize}
If the response is correct \emph{and} $r^+ > 0$, the gradient direction $\boldsymbol{\Delta}(\underline{\mathbf{x}})$
is followed, giving more chance to the correct choice in the next trials.
Interestingly, if the response is incorrect \emph{and} $r^- < 0$ (i.e.
a negative mark is given to wrong choices), the same direction is followed,
so that the chance of making correct choices in the future is also enhanced.
%The \emph{norm} of the gradient is different for correct or incorrect choices,
%i.e. scaled by $r^+(1-\pi)$ for a correct choice and by $-r^-\pi$
%for an incorrect choice.
%(i.e. irrespectively of $r^+$ and $r^-$, the more unlikely the response, the higher the norm of the local gradient)
%We have shown that in the binary reward case with $r+ > 0$ and $r^-<0$,
Each update is thus expected to improve
the rate of correct responses, whether the actual response is correct or not.
%, which provides
%a stronger convergence garanty in comparison with EM-based unsupervised approaches \cite{Li06,Lu09,Kindermans12}.
\subsection{Noise-tolerant learning}\label{sec:non-reliable}
We now consider the case where the reward value is corrupted by uniform noise:
at each trial, the correct reward is sent with probability $p_\text{valid}$ and the opposite reward with probability $1-p_\text{valid}$.
Then, the gradient expectation becomes:
\begin{alignsize*}{\footnotesize}
%\begin{align*}
E\left(\boldsymbol{g}(\underline{\mathbf{x}},y)\right) =&
p_\text{valid} (r^+-r^-) E \left[
\pi(\underline{\mathbf{x}},1)(1-\pi(\underline{\mathbf{x}},1))\boldsymbol{\Delta}(\underline{\mathbf{x}}) \right]\\
&+ (1-p_\text{valid}) (r^--r^+) E\left[
\pi(\underline{\mathbf{x}},1)(1-\pi(\underline{\mathbf{x}},1))\boldsymbol{\Delta}(\underline{\mathbf{x}}) \right]\\
=&
(2 p_\text{valid} -1) (r^+-r^-) E\left[
\pi(\underline{\mathbf{x}},1)(1-\pi(\underline{\mathbf{x}},1))\boldsymbol{\Delta}(\underline{\mathbf{x}}) \right]
%\end{align*}
\end{alignsize*}
In this equation, we recognize the typical $(2p_\text{valid}-1)$ term that appears when analyzing algorithms learning from
noisy labels (see, e.g. \cite{Kearns:1998,Denis:2006,Ralaivola:2006}); it
sets the limiting regime, i.e. $p_\text{valid}>0.5$, in which learning can take place. %In the present work, knowing that the incorrect labelling rate
%is lower than 30%, meaning that $p_{valid}>0.5$, we may safely learn from the data we have access to.
%We can then predict the update procedure will fail to improve the classifier as soon as $p_\text{valid} < 0.5$,
%i.e. when the number of misleading rewards
%overtakes the number of valid rewards.
Importantly, the norm of the estimator (and thus the
speed of the learning process) decreases only linearly with the rate of misleading rewards, which ensures
correct but slowed-down convergence even with a significant proportion of reward errors.
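For instance, at the $70\%$ reliability level mentioned in the abstract, the expected update is scaled by $2 \times 0.7 - 1 = 0.4$: learning proceeds, on average, at only $40\%$ of its noise-free speed, but still in the right direction.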
\section{Numerical experiments}\label{sec:P300}
\subsection{Dataset and preprocessing}\label{sec:preproc}
\begin{figure}
\centerline{
\includegraphics[width=\linewidth]{figs-copie/fig_EEG}
}
\caption{32-channel EEG signal excerpt.
Row or column tags are shown on top and stars denote the target row or column (here target row = 6, target column = 8).
The dashed square indicates an example 600 ms EEG sample taken after flashing the grid at position 8 (2nd column).}
\label{fig:EEG}
\end{figure}
The EEG dataset we use comes from a P300 experiment reported in \cite{Maby10}.
The data consists of 20 files, one file per subject,
recording the brain activity during a P300-speller
experiment in which
each subject had to mentally spell out 220 letters.
%For trial letter was expected (the ``target symbol'').
For a given letter, rows and columns are flashed in random order
in order to enhance the ``surprise'' when the target row or column
is illuminated.
Each row and each column of the grid
is flashed several times before a decision is taken.
In the considered dataset, each row and column was flashed 5 times per trial.
The stimulus duration was 100 ms and the inter-stimulus interval was also 100 ms,
so that the total SOA\footnote{Stimulus Onset Asynchrony.} was 200 ms.
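Each letter trial thus comprises $12 \times 5 = 60$ flashes, i.e. about $60 \times 200~\text{ms} = 12$~s of stimulation per spelled symbol (not counting pauses between trials).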
%, where the flash order
A 32-channel EEG signal, sampled at 100 Hz, was recorded during the whole experiment.
% For simplicity as well as for clarity, we made the choice of having as few preprocessing as possible.
% On the contrary to most of the approaches seen in the litterature, we make no feature extraction and directly classify
% our vectors after the following light preprocessing step.
% Each trial is composed of many flashes.
The whole experiment is divided into sequences of $12 \times 5$ flashes, each corresponding to one letter spelling trial.
For each series of 60 flashes, a 1-20 Hz bandpass filter is applied.
Then, for each flash time $t$, a 600 ms subsample $\mathbf{s}_{t:t+600 \text{ms}}$ is taken,
a common average reference subtraction is applied, followed by a channel-by-channel normalization.
Such a sample excerpt is presented in figure \ref{fig:EEG}.
The sample is then vectorized and assigned to its category $\in \{1,...,12\}$ (row or column number).
With a 100 Hz sampling rate, the dimension of each data vector is $32 \times 60 = 1920$.
Then a set of multiple ERP observations is constructed in the following way:
for $k \in \{1,...,12\}$, calculate the class average $\boldsymbol{x}_k$ and normalize it.
Finally, construct two multi-ERP sets, i.e. $\underline{\mathbf{x}}^\text{row} = (\boldsymbol{x}_1,...,\boldsymbol{x}_6)$
and $\underline{\mathbf{x}}^\text{column} = (\boldsymbol{x}_7,...,\boldsymbol{x}_{12})$.
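As a reference, the whole preprocessing chain can be sketched in Python as follows (a minimal sketch: the filter order, the normalization constant and the variable names are illustrative assumptions, not taken from the original implementation):
\begin{verbatim}
import numpy as np
from scipy.signal import butter, filtfilt

FS, WIN = 100, 60   # 100 Hz sampling, 600 ms window

def preprocess_trial(eeg, flash_times, flash_tags):
    # eeg: (32, n_samples); flash_tags: 1..12
    # illustrative sketch (filter order assumed)
    b, a = butter(4, [1.0/(FS/2), 20.0/(FS/2)],
                  btype="band")
    eeg = filtfilt(b, a, eeg, axis=1)  # 1-20 Hz
    epochs = {k: [] for k in range(1, 13)}
    for t, k in zip(flash_times, flash_tags):
        s = eeg[:, t:t + WIN]          # 600 ms
        s = s - s.mean(axis=0)         # common avg ref
        s = s / (s.std(axis=1, keepdims=True) + 1e-12)
        epochs[k].append(s.ravel())    # 32*60 = 1920
    x = [np.mean(epochs[k], axis=0)
         for k in range(1, 13)]
    x = [v / np.linalg.norm(v) for v in x]
    return np.array(x[:6]), np.array(x[6:])
\end{verbatim}
The function returns the row set and the column set of class-averaged, normalized ERP vectors, as described above.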
\subsection{Batch training}\label{sec:batch}
In the standard approach, EEG signals are recorded during a training session and analyzed prior to free spelling
%Specific features are extracted from the EEG channels, that map to the ERPs (event related potentials)
%that are expected to take place
% within specific temporal intervals after stimulation.