\newif\ifdblatexpdf \ifx\pdfoutput\undefined \else \ifx\pdfoutput\relax \else\ifnum\pdfoutput>0 \dblatexpdftrue \fi \fi \fi
% ------------------------------------------------------------
% Autogenerated LaTeX file for books
% db2latex RELEASE: 0.8pre1
% db2latex VERSION: $Id: VERSION.xml,v 1.6 2004/01/31 12:47:11 j-devenish Exp $
% fithesis VERSION: 1.40
% ------------------------------------------------------------
\def\clsclass{rapport3}
\documentclass[draft]{fithesis}
% --------------------------------------------
% MetaFont and MetaPost logo support
% --------------------------------------------
\usepackage{mflogo}
% --------------------------------------------
% Load fithesis param
% --------------------------------------------
\thesistitle{Real-time Communication \\* in Web Browser}
\thesissubtitle{Master's thesis}
\thesisstudent{Pavel Smolka}
\thesiswoman{false}
\thesislang{en}
\thesisyear{2013}
\thesisfaculty{fi}
\thesisadvisor{doc.
RNDr. Tomá\v{s} Pitner, Ph.D.}
% --------------------------------------------
\label{idp229216}% --------------------------------------------
% Load graphicx package with pdf if needed
% --------------------------------------------
\ifdblatexpdf
\usepackage[pdftex]{graphicx}
\pdfcompresslevel=9
\else
\usepackage{graphicx}
\fi
\usepackage{anysize}
\marginsize{3cm}{2.5cm}{3.5cm}{3.5cm}
\makeatletter
% redefine the listoffigures and listoftables so that the name of the chapter
% is printed whenever there are figures or tables from that chapter. encourage
% pagebreak prior to the name of the chapter (discourage orphans).
\let\save@@chapter\@chapter
\let\save@@l@figure\l@figure
\let\the@l@figure@leader\relax
\def\@chapter[#1]#2{\save@@chapter[{#1}]{#2}%
\addtocontents{lof}{\protect\def\the@l@figure@leader{\protect\pagebreak[0]\protect\contentsline{chapter}{\protect\numberline{\thechapter}#1}{}{\thepage}}}%
\addtocontents{lot}{\protect\def\the@l@figure@leader{\protect\pagebreak[0]\protect\contentsline{chapter}{\protect\numberline{\thechapter}#1}{}{\thepage}}}%
}
\renewcommand*\l@figure{\the@l@figure@leader\let\the@l@figure@leader\relax\save@@l@figure}
\let\l@table\l@figure
\makeatother
\usepackage{fancyhdr}
\renewcommand{\headrulewidth}{0.4pt}
\renewcommand{\footrulewidth}{0.4pt}
% Safeguard against long headers.
\IfFileExists{truncate.sty}{
\usepackage{truncate}
% Use an ellipsis when text would be larger than x% of the text width.
% Preserve left/right text alignment using \hfill (works for English).
\fancyhead[ol]{\truncate{0.49\textwidth}{\sl\leftmark}}
\fancyhead[er]{\truncate{0.49\textwidth}{\hfill\sl\rightmark}}
\fancyhead[el]{\truncate{0.49\textwidth}{\sl\leftmark}}
\fancyhead[or]{\truncate{0.49\textwidth}{\hfill\sl\rightmark}}
}{\typeout{WARNING: truncate.sty wasn't available and functionality was skipped.}}
\pagestyle{fancy}
% ----------------------
% Most Common Packages
% ----------------------
\usepackage{latexsym}
\usepackage{enumerate}
\usepackage{fancybox}
\usepackage{float}
\usepackage{ragged2e}
\usepackage{fancyvrb}
\makeatletter\@namedef{FV@fontfamily@default}{\def\FV@FontScanPrep{}\def\FV@FontFamily{}}\makeatother
\fvset{obeytabs=true,tabsize=3}
\makeatletter
\let\dblatex@center\center\let\dblatex@endcenter\endcenter
\def\dblatex@nolistI{\leftmargin\leftmargini\topsep\z@ \parsep\parskip \itemsep\z@}
\def\center{\let\@listi\dblatex@nolistI\@listi\dblatex@center\let\@listi\@listI\@listi}
\def\endcenter{\dblatex@endcenter}
\makeatother
\usepackage{rotating}
\usepackage{subfigure}
\usepackage{tabularx}
\usepackage{url}
% --------------------------------------------
% Load hyperref package with pdf if needed
% --------------------------------------------
\ifdblatexpdf
\usepackage[pdftex,bookmarksnumbered,colorlinks,backref,bookmarks,breaklinks,linktocpage,plainpages=false, pdfstartview=FitH, plainpages=false, pdfpagelabels, unicode]{hyperref}
\else
\usepackage[bookmarksnumbered,colorlinks,backref,bookmarks,breaklinks,linktocpage,plainpages=false, plainpages=false, pdfpagelabels]{hyperref}
\fi
% --------------------------------------------
% ----------------------------------------------
% Define a new LaTeX environment (admminipage)
% ----------------------------------------------
\newenvironment{admminipage}%
{ % this code corresponds to the \begin{admminipage} command
\begin{Sbox}%
\begin{minipage}%
} %done
{ % this code corresponds to the \end{admminipage} command
\end{minipage}
\end{Sbox}
\fbox{\TheSbox}
} %done
% ----------------------------------------------
% Define a new LaTeX length (admlength)
% ----------------------------------------------
\newlength{\admlength}
% ----------------------------------------------
% Define a new LaTeX environment (admonition)
% With 2 parameters:
% #1 The file (e.g. note.pdf)
% #2 The caption
% ----------------------------------------------
\newenvironment{admonition}[2]
{ % this code corresponds to the \begin{admonition} command
\hspace{0mm}\newline\hspace*\fill\newline
\noindent
\setlength{\fboxsep}{5pt}
\setlength{\admlength}{\linewidth}
\addtolength{\admlength}{-10\fboxsep}
\addtolength{\admlength}{-10\fboxrule}
\admminipage{\admlength}
{\bfseries \sc\large{#2}} \newline
\\[1mm]
\sffamily
\includegraphics[width=1cm]{#1}
\addtolength{\admlength}{-1cm}
\addtolength{\admlength}{-20pt}
\begin{minipage}[lt]{\admlength}
\parskip=0.5\baselineskip \advance\parskip by 0pt plus 2pt
} %done
{ % this code corresponds to the \end{admonition} command
\vspace{5mm}
\end{minipage}
\endadmminipage
\vspace{.5em}
\par
}
% --------------------------------------------
% Commands to manage/style/create floats
% figures, tables, algorithms, examples, eqn
% --------------------------------------------
\floatstyle{plain}
\restylefloat{figure}
\floatstyle{plain}
\restylefloat{table}
\floatstyle{plain}
\newfloat{program}{ht}{lop}[section]
\floatstyle{plain}
\newfloat{example}{ht}{loe}[section]
\floatname{example}{Example}
\floatstyle{plain}
\newfloat{dbequation}{ht}{loe}[section]
\floatname{dbequation}{Equation}
\floatstyle{boxed}
\newfloat{algorithm}{ht}{loa}[section]
\floatname{algorithm}{Algorithm}
\ifdblatexpdf
\DeclareGraphicsExtensions{.pdf,.png,.jpg}
\else
\DeclareGraphicsExtensions{.eps}
\fi
% --------------------------------------------
% $latex.caption.swapskip enabled for $formal.title.placement support
\newlength{\docbooktolatextempskip}
\newcommand{\captionswapskip}{\setlength{\docbooktolatextempskip}{\abovecaptionskip}\setlength{\abovecaptionskip}{\belowcaptionskip}\setlength{\belowcaptionskip}{\docbooktolatextempskip}}
% Guard against a problem with old package versions.
\makeatletter
\AtBeginDocument{
\DeclareRobustCommand\ref{\@refstar}
\DeclareRobustCommand\pageref{\@pagerefstar}
}
\makeatother
% --------------------------------------------
\makeatletter
\newcommand{\dbz}{\penalty \z@}
\newcommand{\docbooktolatexpipe}{\ensuremath{|}\dbz}
\newskip\docbooktolatexoldparskip
\newcommand{\docbooktolatexnoparskip}{\docbooktolatexoldparskip=\parskip\parskip=0pt plus 1pt}
\newcommand{\docbooktolatexrestoreparskip}{\parskip=\docbooktolatexoldparskip}
\def\cleardoublepage{\clearpage\if@twoside \ifodd\c@page\else\hbox{}\thispagestyle{empty}\newpage\if@twocolumn\hbox{}\newpage\fi\fi\fi}
\usepackage[latin2]{inputenc}
\usepackage[T1]{fontenc}
\ifx\dblatex@chaptersmark\@undefined\def\dblatex@chaptersmark#1{\markboth{\MakeUppercase{#1}}{}}\fi
\let\save@makeschapterhead\@makeschapterhead
\def\dblatex@makeschapterhead#1{\vspace*{-80pt}\save@makeschapterhead{#1}}
\def\@makeschapterhead#1{\dblatex@makeschapterhead{#1}\dblatex@chaptersmark{#1}}
\AtBeginDocument{\ifx\refname\@undefined\let\docbooktolatexbibname\bibname\def\docbooktolatexbibnamex{\bibname}\else\let\docbooktolatexbibname\refname\def\docbooktolatexbibnamex{\refname}\fi}
% Facilitate use of \cite with \label
\newcommand{\docbooktolatexbibaux}[2]{%
\protected@write\@auxout{}{\string\global\string\@namedef{docbooktolatexcite@#1}{#2}}
}
% Provide support for bibliography `subsection' environments with titles
\newenvironment{docbooktolatexbibliography}[3]{
\begingroup
\let\save@@chapter\chapter
\let\save@@section\section
\let\save@@@mkboth\@mkboth
\let\save@@bibname\bibname
\let\save@@refname\refname
\let\@mkboth\@gobbletwo
\def\@tempa{#3}
\def\@tempb{}
\ifx\@tempa\@tempb
\let\chapter\@gobbletwo
\let\section\@gobbletwo
\let\bibname\relax
\else
\let\chapter#2
\let\section#2
\let\bibname\@tempa
\fi
\let\refname\bibname
\begin{thebibliography}{#1}
}{
\end{thebibliography}
\let\chapter\save@@chapter
\let\section\save@@section
\let\@mkboth\save@@@mkboth
\let\bibname\save@@bibname
\let\refname\save@@refname
\endgroup
}
%\usepackage{cite}
%\renewcommand\citeleft{(} % parentheses around list
%\renewcommand\citeright{)} % parentheses around list
\newcommand{\docbooktolatexcite}[2]{%
\@ifundefined{docbooktolatexcite@#1}%
{\cite{#1}}%
{\def\@docbooktolatextemp{#2}\ifx\@docbooktolatextemp\@empty%
\cite{\@nameuse{docbooktolatexcite@#1}}%
\else\cite[#2]{\@nameuse{docbooktolatexcite@#1}}%
\fi%
}%
}
\newcommand{\docbooktolatexbackcite}[1]{%
\ifx\Hy@backout\@undefined\else%
\@ifundefined{docbooktolatexcite@#1}{%
% emit warning?
}{%
\ifBR@verbose%
\PackageInfo{backref}{back cite \string`#1\string' as \string`\@nameuse{docbooktolatexcite@#1}\string'}%
\fi%
\Hy@backout{\@nameuse{docbooktolatexcite@#1}}%
}%
\fi%
}
% --------------------------------------------
% A way to honour <footnoteref>s
% Blame j-devenish (at) users.sourceforge.net
% In any other LaTeX context, this would probably go into a style file.
\newcommand{\docbooktolatexusefootnoteref}[1]{\@ifundefined{@fn@label@#1}%
{\hbox{\@textsuperscript{\normalfont ?}}%
\@latex@warning{Footnote label `#1' was not defined}}%
{\@nameuse{@fn@label@#1}}}
\newcommand{\docbooktolatexmakefootnoteref}[1]{%
\protected@write\@auxout{}%
{\global\string\@namedef{@fn@label@#1}{\@makefnmark}}%
\@namedef{@fn@label@#1}{\hbox{\@textsuperscript{\normalfont ?}}}%
}
\makeindex
% index labeling helper
\newif\ifdocbooktolatexprintindex\docbooktolatexprintindextrue
\let\dbtolatex@@theindex\theindex
\let\dbtolatex@@endtheindex\endtheindex
\@ifundefined{@openrighttrue}{\newif\if@openright}{}
\def\theindex{\relax}
\def\endtheindex{\relax}
\newenvironment{dbtolatexindex}[2]
{
\if@openright\cleardoublepage\else\clearpage\fi
\let\dbtolatex@@indexname\indexname
\def\dbtolatex@current@indexname{#2}
\ifx\dbtolatex@current@indexname\@empty \def\dbtolatex@current@indexname{\dbtolatex@@indexname}
\fi
\def\dbtolatex@indexlabel{%
\ifnum \c@secnumdepth >\m@ne \ifx\c@chapter\undefined\refstepcounter{section}\else\refstepcounter{chapter}\fi\fi%
\label{#1}\hypertarget{#1}{\dbtolatex@current@indexname}%
\global\docbooktolatexprintindexfalse}
\def\indexname{\ifdocbooktolatexprintindex\dbtolatex@indexlabel\else\dbtolatex@current@indexname\fi}
\dbtolatex@@theindex
}
{
\dbtolatex@@endtheindex\let\indexname\dbtolatex@@indexname
}
\newlength\saveparskip \newlength\saveparindent
\newlength\tempparskip \newlength\tempparindent
\def\docbooktolatexgobble{\expandafter\@gobble}
% Prevent multiple openings of the same aux file
% (happens when backref is used with multiple bibliography environments)
\ifx\AfterBeginDocument\undefined\let\AfterBeginDocument\AtBeginDocument\fi
\AfterBeginDocument{
\let\latex@@starttoc\@starttoc
\def\@starttoc#1{%
\@ifundefined{docbooktolatex@aux#1}{%
\global\@namedef{docbooktolatex@aux#1}{}%
\latex@@starttoc{#1}%
}{}
}
}
% --------------------------------------------
% Hacks for honouring row/entry/@align
% (\hspace not effective when in paragraph mode)
% Naming convention for these macros is:
% 'docbooktolatex' 'align' {alignment-type} {position-within-entry}
% where r = right, l = left, c = centre
\newcommand{\docbooktolatex@align}[2]{\protect\ifvmode#1\else\ifx\LT@@tabarray\@undefined#2\else#1\fi\fi}
\newcommand{\docbooktolatexalignll}{\docbooktolatex@align{\raggedright}{}}
\newcommand{\docbooktolatexalignlr}{\docbooktolatex@align{}{\hspace*\fill}}
\newcommand{\docbooktolatexaligncl}{\docbooktolatex@align{\centering}{\hfill}}
\newcommand{\docbooktolatexaligncr}{\docbooktolatex@align{}{\hspace*\fill}}
\newcommand{\docbooktolatexalignrl}{\protect\ifvmode\raggedleft\else\hfill\fi}
\newcommand{\docbooktolatexalignrr}{}
\ifx\captionswapskip\@undefined\newcommand{\captionswapskip}{}\fi
\makeatother
\title{\bfseries Real-time Communication in Web Browser\\[12pt]\normalsize Master's thesis}
\author{Pavel Smolka}
% --------------------------------------------
\makeglossary
% --------------------------------------------
\setcounter{tocdepth}{4}
\setcounter{secnumdepth}{4}
\begin{document}
% --------------------------------------------
% Using fithesis
% --------------------------------------------
\FrontMatter
\ThesisTitlePage
\begin{ThesisDeclaration}
\DeclarationText
\AdvisorName
\end{ThesisDeclaration}
% --------------------------------------------
% Thanks
% --------------------------------------------
\begin{ThesisThanks}
Above all, I~would like to thank my colleagues from Celebrio, with whom I~have been working for a long time and who inspired me to create this thesis; especially Petr Kunc, always challenging my ideas but always being helpful. Of course, I~would also like to thank my advisor, doc. Tomá\v{s} Pitner, for not only providing me with valuable advice and help during the work on my thesis, but also for guiding me through the studies at the faculty.
I~am very thankful to my parents for bringing me up to this point in the best way parents can, and still splendidly supporting and encouraging me to do my best. Also, I~thank my nearest for having patience with me working and studying and hardly finding any time for them.
I~must not forget my Lasaris laboratory fellow students, revealing the world of web development to me. I~would also like to thank my English teacher, Petra Wachsmuthová, for helping me with the language part. And of course my classmates, colleagues and friends, not only for reviewing the text part of this thesis but also for discussing the technologies and the programming part as well.
After all, I~am grateful to the Internet community, StackOverflow members, various IRC attendants, people contributing and commenting at GitHub, Twitter and other media sources that helped me and inspired me all the time. Thank you, world!
\end{ThesisThanks}
% --------------------------------------------
% Abstract
% --------------------------------------------
\begin{ThesisAbstract}
The thesis covers the topic of real-time communication in a web browser. Most of the available solutions for building real-time web applications are described and compared with regard to security, browser support and ease of use. Based on the theoretical results, a real-time application called Talker is designed and developed, serving as a text and video instant messaging client for Celebrio -- a simple web-based application environment for the elderly. The XMPP protocol, used in Talker so that it can interoperate with other instant messaging clients, is also discussed, especially with regard to running an XMPP client in a web browser.
\end{ThesisAbstract}
% --------------------------------------------
% KeyWords
% --------------------------------------------
\begin{ThesisKeyWords}
XMPP, real-time communication, RTC, Celebrio, web browser, HTTP, Comet, JavaScript, WebSockets, BOSH, Ember, Strophe, OpenTok, Talker, server push\end{ThesisKeyWords}
\makeatletter
\def\dbtolatex@contentsid{idp195360}
\def\dbtolatex@@contentsname{\latex@@contentsname}
\let\latex@@contentsname\contentsname
\newif\ifdocbooktolatexcontentsname\docbooktolatexcontentsnametrue
\def\dbtolatex@contentslabel{%
\label{\dbtolatex@contentsid}\hypertarget{\dbtolatex@contentsid}{\dbtolatex@@contentsname}%
\global\docbooktolatexcontentsnamefalse}
\def\contentsname{\ifdocbooktolatexcontentsname\dbtolatex@contentslabel\else\dbtolatex@@contentsname\fi}
\let\save@@@mkboth\@mkboth
\let\@mkboth\@gobbletwo
\tableofcontents
\let\@mkboth\save@@@mkboth
\let\contentsname\latex@@contentsname
\Hy@writebookmark{}{\dbtolatex@@contentsname}{\dbtolatex@contentsid}{0}{toc}%
\makeatother
\MainMatter
% -------------------------------------------------------------
% Chapter Introduction
% -------------------------------------------------------------
\chapter{Introduction}
\label{uvod}\hypertarget{uvod}{}%
Millions, billions, trillions. So many and even more messages are exchanged every day between people around the world. The Internet created a brand new way to communicate and collaborate, even between people on opposite sides of the world. Since the times of Alexander Graham Bell, communication devices have become vastly more accessible and simpler to use. Nowadays, almost 2.5 billion people in the world have access to the Internet and are therefore able to use the almost limitless communication possibilities it provides. \docbooktolatexcite{internet-usage}{}
However, the manner of Internet usage changed fundamentally during the first decade of the 21$^\textrm{\tiny st}$ century. Using the Internet and using the web browser became almost synonymous. People use the web browser as the primary platform for every single task on the Internet. Sometimes it is not even possible to use other Internet services without first visiting a certain web page in the web browser and authenticating there.\label{idp4484704}\begingroup\catcode`\#=12\footnote{
Two examples of such behavior: the Wi-Fi network in Student Agency coaches forces the user to visit an entry page in the web browser. The second example, very well known to the students of the Faculty of Informatics at Masaryk University, is the faculty wireless network called wlan\_fi. Every user has to open the web browser and log in with her credentials. It is not possible just to open the terminal or e-mail client and start working online.
}\endgroup\docbooktolatexmakefootnoteref{idp4484704} Consequently, web browsers have also become the basic platform for communication tools. Even though the original purpose of the World Wide Web and the HTTP protocol was completely different (displaying single documents connected via hypertext links), a need arose for rich applications running within a web browser, and the rich Internet application (RIA) emerged. \docbooktolatexcite{ria}{} Popular social networks are built on top of the web browser platform and are used by more than a billion people in the world. \docbooktolatexcite{facebook-usage}{} The main reason why social networks are so popular is the real-time stream of news and messages from other people. At the beginning of 2013, I~would say that the static web is dead -- users prefer interactivity.
As mentioned above, the web browser has become one of the most popular platforms. Celebrio, simple software for the elderly that simulates an operating system interface, is a typical example of a rich Internet application.\label{idp3871856}\begingroup\catcode`\#=12\footnote{
{\textless}\url{http://www.celebriosoftware.com/celebrio-system}{\textgreater}
}\endgroup\docbooktolatexmakefootnoteref{idp3871856} All the topics mentioned in the previous paragraph turned out to be very important in the system. Interviews with elderly people in the Czech Republic showed that almost 90 \% of elderly computer users use real-time communication (RTC) applications, mostly Skype. \docbooktolatexcite{elderly-questionnaires}{} Interaction with their loved ones is the benefit they desire most from the computer. Therefore, creating a real-time application, a text messenger supporting video calling, has become not only a programming challenge but also a business goal.
With regard to the {``}real-time tendency{''}, this thesis addresses the topic of real-time applications in web browsers, especially text communication tools and the technologies used to develop them. Among the available solutions, the XMPP protocol and the OpenTok library (built on WebRTC) have been chosen, and a real-time communication application has been designed and developed as a part of this thesis. Both text and multimedia streams are covered, including the transfer of audio and video content.
The fact that people prefer real-time communication while using a web browser (not only direct messaging but also real-time cooperation, simultaneous document editing or playing multiplayer games) raises the question of which solutions are currently available. There are {``}big players{''} providing their own closed-source services, which cannot be used by third-party developers. The most important ones are the Google Talk web browser client and Facebook chat, both using the XMPP\index{XMPP} protocol. \docbooktolatexcite{gtalk}{}\docbooktolatexcite{fb-chat}{} Even though the Facebook\index{Facebook} chat service is not a pure XMPP server implementation (the message and presence engine is a proprietary system of Facebook, implemented mostly in C++ and Erlang), it is possible to connect to the {``}world of Facebook Chat{''} via XMPP as a proxy. \docbooktolatexcite{fb-erlang}{} The combination of two facts, that XMPP is an open technology with open-source client and server implementations \docbooktolatexcite{xmpp-history}{} and that the big Internet companies also use it, persuaded us to use it in our communication application, too. XMPP itself and its usage in web applications are described in \hyperlink{chap-xmpp}{Chapter {\ref{chap-xmpp}}, {``}Extensible Messaging and Presence Protocol{''}}.
Nevertheless, there are also other RTC solutions not directly based in a web browser. The very popular communication platforms XMPP (Jabber), ICQ and Windows Live Messenger have to be mentioned, all intended to run in dedicated client applications (Pidgin, ICQ, ...). All of them have been ported to a web browser in some way, in the form of applications such as Meebo (supporting ICQ and XMPP, although it has since been shut down) or Google Talk (XMPP). There are also several voice over IP (VoIP) tools providing a video call platform, such as Skype\index{Skype}. For completeness, the Unix {``}talk{''} chat program for sending text messages has to be mentioned. However, it has been superseded by the modern systems mentioned above.
Since the web browser was designed to perform simple request/response interaction, it is not a typical platform for building real-time applications. Thus, there is a need for an extra layer enhancing or even completely replacing the common way HTTP communicates. Within the scope of this thesis, primarily the WebSockets and HTTP long polling approaches are used. Both of them, along with basic information about several others, are covered in \hyperlink{chap-rtc}{Chapter {\ref{chap-rtc}}, {``}Bidirectional communication between a web browser and a server{''}}.
There are many existing real-time chat-based applications on the Internet that could have been used. However, none of them suited our needs perfectly. Celebrio has a very specific graphical user interface (GUI) and there is a need to integrate both text-based chat and video calling. Examples include the commercial chat module Cometchat\label{idp2831776}\begingroup\catcode`\#=12\footnote{
http://www.cometchat.com/
}\endgroup\docbooktolatexmakefootnoteref{idp2831776} or even the open project Jappix.\label{idp205056}\begingroup\catcode`\#=12\footnote{
https://project.jappix.com/
}\endgroup\docbooktolatexmakefootnoteref{idp205056} Video calling web browser applications are provided for example by TokBox Inc.\label{idp226160}\begingroup\catcode`\#=12\footnote{
http://www.tokbox.com/
}\endgroup\docbooktolatexmakefootnoteref{idp226160} Nevertheless, following the rule that {``}If you have to customize 1/5 of a reusable component, its likely better to write it from scratch{''}, \docbooktolatexcite{brian-stats-tweet}{} only very generic existing libraries (Strophe.js) and APIs (OpenTok) were used for implementing a brand new application called {\em{Celebrio Talker}}. General approaches to building a web browser based chat application are discussed in \hyperlink{chap-xmpp-in-javascript}{Chapter {\ref{chap-xmpp-in-javascript}}, {``}JavaScript XMPP client{''}}. Within the programming part of the thesis, the real-time text chat application and the video calling application for Celebrio have been implemented. The Celebrio Talker application itself, its architecture and the specific procedures used to create it are described in \hyperlink{chap-talker}{Chapter {\ref{chap-talker}}, {``}Talker -- IM client in a web browser{''}}.
It has been said that Skype\index{Skype}\label{idp159168}\begingroup\catcode`\#=12\footnote{
http://www.skype.com/
}\endgroup\docbooktolatexmakefootnoteref{idp159168} is the most popular communication tool among the target audience. If Skype were integrated, its existing user base could be reached and converted to our messaging application. However, there is one big pitfall in this approach. The Skype license strictly prohibits incorporating their software into mobile devices in third party applications. \docbooktolatexcite{skype-license}{} They only support prompting the official Skype client to open via a Skype URI, which is insufficient for Celebrio since the messaging client has to be built into the system, with the corresponding user interface. \docbooktolatexcite{skype-uri}{}
% -------------------------------------------------------------
% Chapter Bidirectional communication between a web browser and a server
% -------------------------------------------------------------
\chapter{Bidirectional communication between a web browser and a server}
\label{chap-rtc}\hypertarget{chap-rtc}{}%
The very essence of every instant messaging system is a bidirectional stream where both sides can immediately {\em{push}} new data and the other side (or several other sides) is promptly notified without the need to perform any manual {\em{pull}} (update) action\index{pull \& push communication}.\label{idp5459824}\begingroup\catcode`\#=12\footnote{
In this thesis, this behaviour is commonly referred to as RTC. The {``}real-time part{''} relates mostly to the server side, since the application running in the web browser can perform an AJAX request in the background at any time and the server receives the request instantly.
}\endgroup\docbooktolatexmakefootnoteref{idp5459824} Such a use case requires an appropriate transport layer on top of which the application can send messages via another protocol. When using HTTP, there is a TCP connection opened by the client (web browser) through which the data is sent. However, according to the HTTP protocol\index{HTTP}, the communication is strictly initiated by the client -- HTTP is a request/response protocol. \docbooktolatexcite{rfc-http}{} When the client continuously needs up-to-date information, it must poll the server as frequently as possible. Such an approach consumes a considerable amount of bandwidth and generates needless overhead on the server. So, to avoid those drawbacks and still make the web browser application communicate in both directions, either the HTTP protocol must be hacked somehow or another communication channel must be used. This chapter covers both -- reshaping HTTP in \hyperlink{chap-http-requests}{Section {\ref{chap-http-requests}}} and a brand new approach in \hyperlink{chap-ws}{Section {\ref{chap-ws}}}, bypassing HTTP completely. Unfortunately, every approach also brings some disadvantages. Finally, an overview of several higher-level solutions, most of which are based on HTTP requests or WebSockets, can be found in \hyperlink{chap-high-level-rtc}{Section {\ref{chap-high-level-rtc}}}.
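To put the cost of frequent polling into numbers, consider a rough back-of-the-envelope estimate. All values are illustrative assumptions rather than measurements: $n = 1000$ connected clients, a polling interval of $T = 1$~s and about $h = 800$~bytes of HTTP headers per request/response pair, transferred even when the server has nothing new to report. The bandwidth $B$ spent on polling alone is then
\[
B = \frac{n \cdot h}{T} = \frac{1000 \cdot 800\ \mathrm{B}}{1\ \mathrm{s}} = 800\,000\ \mathrm{B/s} = 6.4\ \mathrm{Mbit/s}.
\]
Lengthening the interval reduces this cost proportionally, but it also delays message delivery by up to $T$, which is exactly the trade-off the techniques in this chapter try to avoid.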
% ------------------------
% Section
\section{Using HTTP requests}
\label{chap-http-requests}\hypertarget{chap-http-requests}{}%
The Hypertext Transfer Protocol (HTTP) is the most widely used protocol in web applications. It is a typical example of an application layer protocol (layer 7) according to the OSI Model\index{OSI Model} (ISO/IEC 7498-1). HTTP powers the Web. Along with other application layer protocols -- Simple Mail Transfer Protocol (SMTP)\index{SMTP}, File Transfer Protocol (FTP)\index{FTP} and the Domain Name System (DNS) protocols -- HTTP underpins the Internet as we know it. HTTP works as a {``}request-response{''} protocol, presuming an underlying transport layer protocol on top of which it works. The underlying transport protocol is almost always the Transmission Control Protocol (TCP)\index{TCP}; however, UDP can be used as well, for example in the case of the Simple Service Discovery Protocol (SSDP). \docbooktolatexcite{ssdp-udp}{} Nowadays, HTTP is used not only by web browser clients but also, thanks to its simplicity, by various mobile applications and Internet services requesting new data or updating information on a server.
HTTP is a very lightweight client-server protocol, where a client carries out a request and a server responds. The request specifies the action and a resource the action relates to, along with the protocol version. For example, the following request line represents a request fetching index.html file from the server:
\begin{Verbatim}[fontsize=\small]
GET /index.html HTTP/1.1
\end{Verbatim}
Further request data is optional. After the initial line, custom request headers can be specified, along with an optional request body, which usually contains the request data. \docbooktolatexcite{rfc-http}{} For example, in the case of a POST request updating a resource, the new resource data is contained in the request body.
The structure of an HTTP response is similar to that of a request. The initial line contains the response status code and a {``}reason phrase{''}, which identify how the request was handled. \docbooktolatexcite{rfc-http}{} Then, optional headers and a message body can follow. The following piece of code displays the response to the previous example request, which succeeded and transfers simple HTML code to the client:
\begin{Verbatim}[fontsize=\small]
HTTP/1.1 200 OK
Content-Type: text/html; charset=UTF-8

<html><body><p>
This is the HTML content in response body,
separated by one blank line from the headers.
</p></body></html>
\end{Verbatim}
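The structure described above (status line, header fields, a blank line and the body) can be illustrated by a small parser. The following is only a minimal sketch, not a complete HTTP implementation: the function name {\texttt{{parseHttpResponse}}} is hypothetical, and it assumes that the whole response is already available as a single string with CRLF line endings.

```javascript
// Minimal sketch: split a raw HTTP response into its parts.
// parseHttpResponse is a hypothetical helper, not part of any library.
function parseHttpResponse(raw) {
    // The first blank line (CRLF CRLF) separates the head from the body.
    var split = raw.indexOf('\r\n\r\n');
    var head = split === -1 ? raw : raw.slice(0, split);
    var body = split === -1 ? '' : raw.slice(split + 4);
    var lines = head.split('\r\n');

    // Status line: protocol version, status code, reason phrase.
    var statusLine = /^(HTTP\/\d\.\d) (\d{3}) (.*)$/.exec(lines[0]);
    if (statusLine === null) {
        throw new Error('Malformed status line: ' + lines[0]);
    }

    // The remaining head lines are "Name: value" header fields.
    var headers = {};
    for (var i = 1; i < lines.length; i++) {
        var colon = lines[i].indexOf(':');
        if (colon === -1) {
            continue; // skip lines that are not header fields
        }
        var name = lines[i].slice(0, colon).toLowerCase();
        headers[name] = lines[i].slice(colon + 1).trim();
    }

    return {
        version: statusLine[1],
        statusCode: Number(statusLine[2]),
        reasonPhrase: statusLine[3],
        headers: headers,
        body: body
    };
}

var response = parseHttpResponse(
    'HTTP/1.1 200 OK\r\n' +
    'Content-Type: text/html; charset=UTF-8\r\n' +
    '\r\n' +
    '<html><body><p>Hello</p></body></html>'
);
console.log(response.statusCode, response.reasonPhrase);
```

Header names are lower-cased in the sketch because HTTP header field names are case-insensitive.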
The very first approach to achieving RTC in a web browser, and for a very long time the only one, is hacking HTTP. The idea is very simple. Generally, the client sends an extra request without expecting an immediate response. Instead, the server holds the request for some time and sends the data in the response as soon as it becomes available. There are several techniques to achieve such behaviour, collectively called {\em{Comet}}\index{Comet}. The naive approach, consisting of ordinary HTTP requests, along with the more advanced techniques of HTTP long polling and HTTP streaming, is described in the following sections.
\subsection{Naive approach to real-time communication with HTTP}
\label{idp5479936}\hypertarget{idp5479936}{}%
It would be useful to keep an HTTP request open a bit longer in order to perform real-time communication within an open stream. When plain HTTP is used, however, each request corresponds to exactly one resource retrieved from a server. There is one common misunderstanding about long-lived HTTP requests. Since HTTP 1.1 (actually implemented even before, but not covered in the RFC specification), the client can ask for a persistent TCP\index{TCP} connection to the server by declaring {\texttt{{Connection: Keep-Alive}}} in the request header.\index{HTTP!Keep-Alive} In fact, all HTTP 1.1 connections are considered persistent unless declared otherwise. \docbooktolatexcite{rfc-http}{} Even though the default timeout after which the server closes the connection is only several seconds, \docbooktolatexcite{apache-core-features}{} the persistent TCP connection is very useful for delivering various resources (style sheets, scripts, images, etc.) to the client without the unnecessary overhead of opening multiple connections. However, every single transmission within the TCP connection has to take the form of a separate HTTP request/response, always initiated by the client. On no account is the server allowed to push any data without a respective request from the client. Therefore, such a TCP connection is of no use to RTC.
The situation is depicted in \hyperlink{fig-http-polling}{Figure {\ref{fig-http-polling}}} and \hyperlink{fig-http-polling-wrong}{Figure {\ref{fig-http-polling-wrong}}}. In the former case, a valid sequence of HTTP requests and responses is shown. Nevertheless, there is a delay between the moment the server gets (either generates or receives from a third party) the data (2) and the following request (3). Although it is possible to reduce the latency by shortening the polling interval (the time between Response (1.1) and Request (3)), there is still a trade-off between the delay and the overhead caused by frequent empty request/response pairs.
% figure ------------------------------------------------------
\begin{figure}[hbt]
\hypertarget{fig-http-polling}{}%
\begin{center}
{{\includegraphics[]{vp/http-polling}}\hypertarget{idp5498784}{}%
\label{idp5498784}
}
{{\caption[{Correct HTTP polling with delay}]{{{Correct HTTP polling with delay}}}\label{fig-http-polling}}}
\end{center}
\end{figure}
% figure ------------------------------------------------------
\begin{figure}[hbt]
\hypertarget{fig-http-polling-wrong}{}%
\begin{center}
{{\includegraphics[]{vp/http-polling-wrong}}\hypertarget{idp5501296}{}%
\label{idp5501296}
}
{{\caption[{Forbidden HTTP response without respective request}]{{{Forbidden HTTP response without respective request}}}\label{fig-http-polling-wrong}}}
\end{center}
\end{figure}
On the other hand, \hyperlink{fig-http-polling-wrong}{Figure {\ref{fig-http-polling-wrong}}} depicts the forbidden situation of generating an HTTP response without a respective previous request. When the server gets the data (2), it is not allowed to initiate the communication and send an HTTP response (2.1) without an appropriate preceding request. Even though the delay mentioned in the previous paragraph would be minimized in this situation, HTTP servers cannot use such a technique. To sum up, the response (2.1) is forbidden by the HTTP protocol and, therefore, this solution is not valid.
\subsection{HTTP long polling\index{HTTP!long polling}}
\label{chap-http-long-polling}\hypertarget{chap-http-long-polling}{}%
The essence of HTTP long polling springs from the idea of prolonging the time span between two poll requests. In traditional {``}short polling{''}, a client sends regular requests to the server and each request attempts to {``}pull{''} the available data. If no data is available, an empty response is sent. \docbooktolatexcite{rfc-bidirectional-http}{} That generates unnecessary overhead for both the client and the server.
In contrast, long polling tries to reduce this load. After receiving the request, the server {\em{does not}} answer immediately and holds the connection open. When the server receives (or generates by itself) new data, it sends the response with the respective content, as depicted in \hyperlink{fig-http-long-polling}{Figure {\ref{fig-http-long-polling}}}.
% figure ------------------------------------------------------
\begin{figure}[hbt]
\hypertarget{fig-http-long-polling}{}%
\begin{center}
{{\includegraphics[]{vp/http-long-polling}}\hypertarget{idp5511168}{}%
\label{idp5511168}
}
{{\caption[{HTTP long polling}]{{{HTTP long polling}}}\label{fig-http-long-polling}}}
\end{center}
\end{figure}
As soon as the client obtains the response, it usually issues a new request immediately, so the process can repeat endlessly. If no data appears on the server for a certain amount of time, it usually responds with an empty data field just to renew the connection.
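The request cycle described above can be sketched as a simple loop. This is a minimal illustration under stated assumptions: {\texttt{{sendRequest}}} is a hypothetical transport function (in a browser it would typically wrap an asynchronous HTTP request) that calls its callback once the server responds, and an empty response is represented by {\texttt{{null}}}. The simulated server exists only to make the example self-contained.

```javascript
// Long-polling loop: every response immediately triggers the next request.
// sendRequest(callback) is an assumed transport function (hypothetical).
function startLongPolling(sendRequest, onData) {
    var active = true;

    function poll() {
        if (!active) {
            return;
        }
        sendRequest(function (data) {
            if (data !== null) {
                onData(data); // null stands for an empty "renewal" response
            }
            poll();           // re-issue the request right away
        });
    }

    poll();
    return function stopPolling() {
        active = false;
    };
}

// Simulated server for illustration: answers from a queue and leaves the
// request pending (never calls back) when there is nothing left to send.
var queuedResponses = ['first message', null, 'second message'];
var received = [];
var requestCount = 0;

function fakeSendRequest(callback) {
    requestCount++;
    if (queuedResponses.length > 0) {
        callback(queuedResponses.shift());
    }
}

var stopPolling = startLongPolling(fakeSendRequest, function (data) {
    received.push(data);
});
stopPolling();
console.log(received, requestCount);
```

In the simulation, three responses cause four requests: the last one stays pending on the server, which is exactly the state a long-polling client spends most of its time in.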
One of the main drawbacks of long polling is the header overhead. A chunk of data in RTC applications is usually very short, for example a text message of minimal length. However, each update is served by a full HTTP request/response with headers easily reaching 800 characters. \docbooktolatexcite{pro-html5-programming}{} If the payload is a message 20 characters long, the headers constitute 4000\% overhead!\index{HTTP!header overhead} This drawback has an even bigger impact as the number of clients increases. \hyperlink{fig-http-overhead}{Figure {\ref{fig-http-overhead}}} compares 1000 (A), 10000 (B) and 100000 (C) clients polling the server every second with a message 20 characters long, using both classic HTTP requests and the WebSockets technology (described in \hyperlink{chap-ws}{Section {\ref{chap-ws}}}). \docbooktolatexcite{pro-html5-programming}{} It is obvious that HTTP polling causes huge unnecessary network overhead compared to WebSockets.
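The arithmetic behind these numbers can be reproduced directly. The sketch below takes the 800 characters of headers and the 20-character payload from the text and additionally assumes that each character occupies one byte; the function names are illustrative only. The exact values shown in the cited figure may differ, since they depend on the header size measured in the cited source.

```javascript
// Header overhead relative to the payload, in percent.
function headerOverheadPercent(headerBytes, payloadBytes) {
    return (headerBytes / payloadBytes) * 100;
}

// Header traffic in bits per second generated by polling clients.
function headerTrafficBitsPerSecond(clients, headerBytes, pollsPerSecond) {
    return clients * pollsPerSecond * headerBytes * 8;
}

console.log(headerOverheadPercent(800, 20)); // 4000

// The three scenarios from the text, each client polling once per second.
var scenarios = { A: 1000, B: 10000, C: 100000 };
var traffic = {};
for (var label in scenarios) {
    traffic[label] = headerTrafficBitsPerSecond(scenarios[label], 800, 1);
    console.log(label, traffic[label]);
}
```

Under these assumptions, the header traffic alone grows from 6.4 Mbit/s for scenario A to 640 Mbit/s for scenario C.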
% figure ------------------------------------------------------
\begin{figure}[hbt]
\hypertarget{fig-http-overhead}{}%
\begin{center}
{{\includegraphics[width=420pt]{img/http-overhead}}\hypertarget{idp5518848}{}%
\label{idp5518848}
}
{{\caption[{Comparison of network overhead (HTTP and WebSockets)}]{{{Comparison of network overhead (HTTP and WebSockets)}}}\label{fig-http-overhead}}}
\end{center}
\end{figure}
Furthermore, right after the server has received the data and sent the response to the client, there is a {``}blind window{''} during which the server cannot notify the client. The whole push system is blocked until the response is received by the client, processed, and a new request is delivered back to the server. Considering also possible packet loss and the retransmission required by the TCP protocol, the delay can be even longer than twice the network latency. \docbooktolatexcite{rfc-bidirectional-http}{}
\subsection{HTTP streaming\index{HTTP!streaming}}
\label{chap-http-streaming}\hypertarget{chap-http-streaming}{}%
HTTP streaming is a slightly different technique from long polling, although the two are very often confused. What both approaches have in common is that the client initiates the communication with an HTTP request and the server sends the update as part of the HTTP response. The main difference is that once the server starts the response and sends the data, it does not terminate the response and keeps the HTTP connection open. Meanwhile, the client listens to the response stream and reads the data pushed from the server. When any new data appears on the server side, it is appended to the one existing response stream. \docbooktolatexcite{rfc-bidirectional-http}{} See the scheme in \hyperlink{fig-http-streaming}{Figure {\ref{fig-http-streaming}}}.
% figure ------------------------------------------------------
\begin{figure}[hbt]
\hypertarget{fig-http-streaming}{}%
\begin{center}
{{\includegraphics[]{vp/http-streaming}}\hypertarget{idp5527440}{}%
\label{idp5527440}
}
{{\caption[{HTTP streaming}]{{{HTTP streaming}}}\label{fig-http-streaming}}}
\end{center}
\end{figure}
It is very important not to confuse HTTP streaming with {``}persistent{''} HTTP connections. As said at the beginning of this chapter, declaring {\texttt{{Connection: Keep-Alive}}} does not allow the server to issue multiple responses to a single request. Such behaviour would be a serious violation of the HTTP protocol. Instead, the server can declare {\texttt{{Transfer-Encoding: chunked}}}\index{HTTP!chunked response} in the response header and send the response split into separate pieces, as shown below (a chunk of zero length marks the end of the response): \docbooktolatexcite{rfc-bidirectional-http}{}
\begin{Verbatim}[fontsize=\small]
HTTP/1.1 200 OK
Content-Type: text/plain
Transfer-Encoding: chunked

25
This is the data in the first chunk

1C
and this is the second one

0
\end{Verbatim}
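How the chunk sizes in the example arise can be shown with a short sketch. It assumes that the data of each chunk ends with a line break (CRLF) which is counted in the chunk size, which is why the first chunk of 35 visible characters has the size 25 in hexadecimal (37 bytes). The helper name {\texttt{{encodeChunk}}} is hypothetical.

```javascript
// Encode one piece of data as an HTTP/1.1 chunk:
// size in hexadecimal, CRLF, the data itself, CRLF.
function encodeChunk(data) {
    // The size counts bytes, not characters, so encode as UTF-8 first.
    var byteLength = new TextEncoder().encode(data).length;
    return byteLength.toString(16).toUpperCase() + '\r\n' + data + '\r\n';
}

var firstChunk = encodeChunk('This is the data in the first chunk\r\n');
var secondChunk = encodeChunk('and this is the second one\r\n');
var lastChunk = encodeChunk(''); // a zero-length chunk ends the response

console.log(JSON.stringify(firstChunk + secondChunk + lastChunk));
```

In a streaming scenario the server would write each encoded chunk to the open response as soon as new data appears, and send the zero-length chunk only when it decides to finish the response.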
The main drawback of HTTP streaming can be summed up as buffering. Neither the client nor an intermediary (proxy, gateway, etc.) is required to handle the incoming data before the whole response has been sent. Therefore, a proxy may hold all parts of the response, so the messages (single HTTP response chunks) are not delivered to the client until the response is complete. Similarly, when the response consists of JavaScript statements, the browser does not have to execute them before the whole response has been received (although most browsers execute them immediately). In such cases, HTTP streaming will not work. \docbooktolatexcite{rfc-bidirectional-http}{}
% ------------------------
% Section
\section{Permanent TCP streams with WebSockets\index{WebSockets}}
\label{chap-ws}\hypertarget{chap-ws}{}%
Although the World Wide Web with its HTTP request/response scheme was never intended to serve as an RTC platform, contemporary applications require such functionality, so developers started to bend the protocol in undesirable ways. Most of the patterns described in \hyperlink{chap-http-streaming}{Section {\ref{chap-http-streaming}}} do their job and allow sufficient two-way communication. Yet, there are certain performance issues and drawbacks which make them difficult to use. At the very least, those techniques carry HTTP header overhead, which is unnecessary for standard bidirectional streams. Therefore, a brand new standard for creating full-duplex communication channels between a web browser and a server has been created. The technology is called {\em{WebSockets}}\index{WebSocket} (sometimes shortened to WS) and comprises a communication protocol layered over TCP along with a browser API for web developers. However, not even WebSockets are allowed to access the wider network -- they can connect only to dedicated WS servers (usually HTTP servers with an additional module for WS support). \docbooktolatexcite{js-definitive-guide}{}
As with HTTP, there is an unencrypted version of WebSockets working directly on top of a TCP connection. The simplest way to recognize such a connection is its WebSocket URI\index{WebSocket!URI}, beginning with {\texttt{{ws://}}}. It should not be used for two reasons. The first one, rather obvious, is security\index{security} -- the communication can be captured during transmission. Transparent proxy servers are the second reason. If an unencrypted WebSocket connection is used, the browser is unaware of the transparent proxy and, as a result, the WebSocket connection is likely to fail. \docbooktolatexcite{ws-proxy}{}\docbooktolatexcite{definitive-guide-to-ws}{} In contrast, there is a secure way to use WebSockets. The WebSocket Secure (WSS) protocol is standard WS wrapped in a TLS tunnel, just as HTTP can be transmitted over a TLS layer. When using WSS, the URI begins with {\texttt{{wss://}}} and port 443 is used by default. \docbooktolatexcite{definitive-guide-to-ws}{}
\subsection{WebSocket handshake\index{WebSocket!handshake}}
\label{idp5546208}\hypertarget{idp5546208}{}%
WebSockets, as any other multilateral protocol, need to perform a handshake before an actual transmission can occur. During the handshake, a connection is established and both peers acknowledge the properties of the communication.
Since WebSockets emerged as an HTTP supplement, the handshake is started by an HTTP request\label{idp5549136}\begingroup\catcode`\#=12\footnote{
According to RFC 6455, the protocol is designed to work over HTTP ports 80 and 443 and to support HTTP proxies. However, the design is not limited to HTTP and future implementations can use a simpler handshake over a dedicated port. \docbooktolatexcite{rfc-ws}{}
}\endgroup\docbooktolatexmakefootnoteref{idp5549136} issued by a client. The client sends the request as follows: \docbooktolatexcite{rfc-ws}{}
\begin{Verbatim}[fontsize=\small]
GET /chat HTTP/1.1
Host: server.example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Origin: http://example.com
Sec-WebSocket-Protocol: chat, superchat
Sec-WebSocket-Version: 13
\end{Verbatim}
Let us have a look at what each of the lines means. The first two lines are straightforward: they represent a typical HTTP GET request. Specifying {\texttt{{Host}}} is important for the server to be able to handle multiple virtual hosts on a single IP address. The following two lines, {\texttt{{Upgrade: websocket}}} and {\texttt{{Connection: Upgrade}}}, are the most important: the client informs the server of its wish to use WebSockets. The rest of the request carries additional information the server needs to respond correctly. RFC 6455 describes the details. \docbooktolatexcite{rfc-ws}{}
The server should send an HTTP response looking similar to the following example: \docbooktolatexcite{rfc-ws}{}
\begin{Verbatim}[fontsize=\small]
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
\end{Verbatim}
The number 101 on the first line of the response stands for the HTTP status {\texttt{{Switching Protocols}}}, \docbooktolatexcite{rfc-http}{} which means the server supports WebSockets and the connection can be established. Any status code other than 101 indicates that the WebSocket handshake has not completed and the semantics of the HTTP still apply. \docbooktolatexcite{rfc-ws}{}
{\texttt{{Sec-WebSocket-Key}}} is a random value generated by the client, base64-encoded and added to the initial protocol-switching request. The server is supposed to concatenate the value (as received, still base64-encoded) with the Globally Unique Identifier (GUID)\index{GUID} {``}258EAFA5-E914-47DA-95CA-C5AB0DC85B11{''} and to hash the result with the SHA-1 algorithm. The result is returned, base64-encoded, in the {\texttt{{Sec-WebSocket-Accept}}} header field. \docbooktolatexcite{rfc-ws}{} However, this process appears to have a security weakness. If the initial request is not sent over an encrypted HTTP connection (HTTPS), it can be intercepted by a third party. Since the server does not authenticate itself in any way and the algorithm does not involve any server secret, a third-party attacker could fake the response and pretend to be the server.
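The computation of the {\texttt{{Sec-WebSocket-Accept}}} value can be sketched as follows. The example assumes a Node.js environment with its built-in {\texttt{{crypto}}} module (an assumption; the text does not prescribe any server platform) and uses the sample key given in RFC 6455. The function name is illustrative.

```javascript
// Assumes Node.js; 'crypto' is its built-in cryptography module.
var nodeCrypto = require('crypto');

// Fixed GUID defined by RFC 6455 for the opening handshake.
var WS_GUID = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11';

// Concatenate the key exactly as received (still base64-encoded) with the
// GUID, hash the result with SHA-1 and base64-encode the digest.
function computeAcceptValue(secWebSocketKey) {
    return nodeCrypto
        .createHash('sha1')
        .update(secWebSocketKey + WS_GUID)
        .digest('base64');
}

// Sample key from RFC 6455.
var acceptValue = computeAcceptValue('dGhlIHNhbXBsZSBub25jZQ==');
console.log(acceptValue); // s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

Since both inputs are public, anybody who observes the request can compute the same value, which illustrates why the header proves only that the peer understands the WebSocket handshake, not its identity.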
\subsection{Frames and masking}
\label{idp5562272}\hypertarget{idp5562272}{}%
All data sent via the WebSocket protocol is split into frames\index{frames}, which work similarly to TCP segments. All transmission details are handled by the WebSocket API and the transmission is transparent to the application layer above (for example the JavaScript API), so every message arrives in the same state as it was sent. This means that a message, a single unit of WS communication, can be fragmented during transmission.
WS frames have several special features; one of the most interesting is masking\index{masking}. The payload data of every frame sent from a client is XORed with a 32-bit masking key. The apparent purpose of masking is to prevent any third party from picking out any part of the payload and reading it. Another goal might be to distinguish a server stream from a client stream instantly, since client-to-server frames {\em{must}} always be masked and server-to-client frames {\em{must not}} be masked under any circumstances. In addition, WS peers have to use masking even if the communication runs on top of a TLS layer, where the {``}encryption{''} function is pointless. \docbooktolatexcite{rfc-ws}{} The security function of masking is also questionable because the masking key is included in the frame header. The only real reason is to prevent random cross-protocol attacks. \docbooktolatexcite{pro-html5-programming}{}
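The masking operation itself is a plain byte-wise XOR in which the four bytes of the key are used cyclically, so applying it twice restores the original payload. A minimal sketch follows; the function name {\texttt{{applyMask}}} is an illustrative assumption, and the key bytes are taken from the masked {``}Hello{''} example in RFC 6455.

```javascript
// XOR every payload byte with the masking key, cycling through its 4 bytes.
// The same function both masks and unmasks, because XOR is its own inverse.
function applyMask(payload, maskingKey) {
    var result = new Uint8Array(payload.length);
    for (var i = 0; i < payload.length; i++) {
        result[i] = payload[i] ^ maskingKey[i % 4];
    }
    return result;
}

var maskingKey = new Uint8Array([0x37, 0xfa, 0x21, 0x3d]);
var payload = new TextEncoder().encode('Hello');

var masked = applyMask(payload, maskingKey);   // what travels on the wire
var unmasked = applyMask(masked, maskingKey);  // what the server recovers

console.log(Array.from(masked));
console.log(new TextDecoder().decode(unmasked)); // Hello
```

Because the key travels in the same frame header, anyone who can read the frame can undo the masking in exactly this way.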
\subsection{WS JavaScript API\index{WebSocket!API}}
\label{idp5569024}\hypertarget{idp5569024}{}%
Since the WebSocket technology is intended to be used particularly from browser applications, there is a need for an API web developers can use. JavaScript is the most widespread programming language of web browser client applications, so the WebSocket API has been created for it. The API consists of one relatively simple JavaScript interface called {\texttt{{WebSocket}}},\label{idp5572208}\begingroup\catcode`\#=12\footnote{
In older versions of some browsers, the interface had a different name due to the immaturity of the technology. For instance, Firefox versions 6 to 10 support WebSockets only as {\texttt{{MozWebSocket}}}. Interestingly, Firefox 4 and 5 provide the {\texttt{{WebSocket}}} interface under its plain name, just implementing a different WebSocket protocol version. Since Firefox 11.0, the current (RFC 6455) WS protocol version is accessible via the {\texttt{{WebSocket}}} interface. \docbooktolatexcite{mozilla-ws}{}
}\endgroup\docbooktolatexmakefootnoteref{idp5572208} \docbooktolatexcite{ws-api}{} placed as a property of the {\texttt{{window}}} object.\label{idp5576656}\begingroup\catcode`\#=12\footnote{
Properties of {\texttt{{window}}} are accessible in JavaScript directly. The simple test {\texttt{{window.WebSocket === WebSocket}}} returning {\texttt{{true}}} proves it.
}\endgroup\docbooktolatexmakefootnoteref{idp5576656} It wraps the WebSocket client functionality performed by a user agent (i.e. a web browser). Using the API is very simple. The object which handles all the WS functionality is created by calling the {\texttt{{WebSocket}}} constructor: \docbooktolatexcite{ws-html5rocks}{}
\begin{Verbatim}[fontsize=\small]
var connection = new WebSocket(
    'ws://html5rocks.websocket.org/echo',
    ['soap', 'xmpp']
);
\end{Verbatim}
The first (mandatory) argument is the WebSocket URI the client attempts to connect to. It begins with either a {\texttt{{ws://}}} or a {\texttt{{wss://}}} prefix, depending on whether the TLS layer is to be used or not. The second parameter is optional -- specific WS subprotocols can be requested there. Since there are only a few subprotocols recorded in the IANA registry, it has been of little use so far. \docbooktolatexcite{ws-iana}{}
The {\texttt{{WebSocket}}} interface provides at least four event handlers, to each of which a custom callback can be attached. \docbooktolatexcite{ws-api}{} Those are {\texttt{{onopen}}}, {\texttt{{onmessage}}}, {\texttt{{onerror}}} and {\texttt{{onclose}}}. The names are rather self-explanatory: they serve as event listeners watching for incoming activity -- any time the WebSocket receives a message, its status changes or an error occurs, the respective callback is fired. The callback registration can look as follows:
\begin{Verbatim}[fontsize=\small]
connection.onmessage = function (message) {
    console.log('We got a message: ' + message.data);
};
\end{Verbatim}
In addition, there is a property {\texttt{{readyState}}} (it would be {\texttt{{connection.readyState}}} in the previous example), which always holds the current WebSocket status. The status can be determined by comparing the property with one of the {\texttt{{WebSocket}}} constants {\texttt{{CONNECTING}}}, {\texttt{{OPEN}}}, {\texttt{{CLOSING}}} or {\texttt{{CLOSED}}}.
Sending data to a server is also rather straightforward. An {\texttt{{ArrayBufferView}}}, {\texttt{{DOMString}}}, {\texttt{{ArrayBuffer}}} or {\texttt{{Blob}}} can be sent via the {\texttt{{send}}} method. See the examples below: \docbooktolatexcite{ws-html5rocks}{}
\begin{Verbatim}[fontsize=\small]
// Sending a String
connection.send('string message');
// Sending the canvas ImageData as an ArrayBuffer
var img = canvas_context.getImageData(0, 0, 400, 320);
var binary = new Uint8Array(img.data.length);
for (var i = 0; i < img.data.length; i++) {
    binary[i] = img.data[i];
}
connection.send(binary.buffer);
// Sending a file as a Blob
var file = document.querySelector('input[type="file"]').files[0];
connection.send(file);
\end{Verbatim}
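Before calling {\texttt{{send}}}, it is reasonable to check the {\texttt{{readyState}}} property so that data is sent only over an open connection. The helper below is a hypothetical sketch; it compares against the numeric value 1, which the WebSocket API defines for the {\texttt{{OPEN}}} constant, and the stand-in connection object exists only to make the example runnable outside a browser.

```javascript
var WS_OPEN = 1; // numeric value of WebSocket.OPEN in the WebSocket API

// Send data only if the connection is open; report whether it was sent.
function sendIfOpen(connection, data) {
    if (connection.readyState !== WS_OPEN) {
        return false;
    }
    connection.send(data);
    return true;
}

// Stand-in object for demonstration; in a browser this would be a WebSocket.
var sentMessages = [];
var fakeConnection = {
    readyState: 0, // CONNECTING
    send: function (data) {
        sentMessages.push(data);
    }
};

var firstAttempt = sendIfOpen(fakeConnection, 'too early');
fakeConnection.readyState = WS_OPEN; // the connection has been established
var secondAttempt = sendIfOpen(fakeConnection, 'string message');

console.log(firstAttempt, secondAttempt, sentMessages);
```

A real application would more likely queue the rejected message and flush the queue from the {\texttt{{onopen}}} callback; the sketch only shows the check itself.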
To sum up, WebSockets have become a very simple and elegant way to provide a real-time communication channel between a web browser and a WS server. The main drawback of WS is the lack of support, not only in older versions of web browsers but also in mobile platform browsers. Currently, less than 60 \% of users can make use of full WebSocket support. \docbooktolatexcite{ws-caniuse}{} In particular, all versions of Internet Explorer below 10 (which means more than 98 \% of IE users in November 2012) \docbooktolatexcite{ie-statistics}{} do not implement the WebSocket JavaScript API. Two trends speak in favour of WebSockets. Firstly, more and more web browsers add the JavaScript API to support WebSockets. Secondly, the share of clients using an old version of a web browser without WS support tends to diminish. Nevertheless, if the real-time functionality constitutes the core of the application, there is a strong need to offer a fallback technology that every browser supports -- usually the HTTP long polling or streaming mechanism described in \hyperlink{chap-http-requests}{Section {\ref{chap-http-requests}}}.
\subsection{WebSocket API wrappers}
\label{idp5601536}\hypertarget{idp5601536}{}%
WebSockets is a powerful technology, yet there are many browsers which do not support it. In that case, when real-time communication constitutes the core functionality, a fallback (i.e. an alternative technology used when the original one is missing) must be defined to substitute for WebSockets. It might be Adobe Flash or HTTP polling. It would be great not to have to define a fallback in every project again and again. Luckily, there are several API wrappers which do this part of the job for the developer automatically.
The basic use case is obvious. Using the wrapper instead of the WS API itself guarantees the developer that a fallback is used when the application runs in an environment without WebSockets. The whole process of choosing the transport technology is transparent and does not have to be specified. As examples, although not used in the Talker application, the projects Socket.IO\label{idp5603856}\begingroup\catcode`\#=12\footnote{
http://socket.io/
}\endgroup\docbooktolatexmakefootnoteref{idp5603856} and SockJS should be mentioned.\label{idp5604624}\begingroup\catcode`\#=12\footnote{
https://github.com/sockjs/sockjs-client
}\endgroup\docbooktolatexmakefootnoteref{idp5604624}
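The core of what such a wrapper does can be sketched as simple feature detection. The sketch below is not how Socket.IO or SockJS are actually implemented; the function name, the returned labels and the order of preference are assumptions made purely for illustration.

```javascript
// Pick the best transport the given environment offers.
// `env` is the global object to inspect (window in a browser).
function chooseTransport(env) {
    if ('WebSocket' in env || 'MozWebSocket' in env) {
        return 'websocket';
    }
    if ('XMLHttpRequest' in env) {
        return 'long-polling'; // fallback based on plain HTTP requests
    }
    return 'none';
}

// Stand-in environments for demonstration.
var modernBrowser = { WebSocket: function () {}, XMLHttpRequest: function () {} };
var legacyBrowser = { XMLHttpRequest: function () {} };

console.log(chooseTransport(modernBrowser)); // websocket
console.log(chooseTransport(legacyBrowser)); // long-polling
```

The application code then talks only to the wrapper, which keeps the chosen transport hidden behind a single interface.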
% ------------------------
% Section
\section{Server-sent events\index{Server-sent events}}
\label{idp5605648}\hypertarget{idp5605648}{}%
Server-sent events (also known as EventSource, hereafter referred to as SSE) should, strictly speaking, be listed not here but in the next section. The technology is based on HTTP streaming\index{HTTP!streaming}, described in \hyperlink{chap-http-streaming}{Section {\ref{chap-http-streaming}}}, so it is not at the basic {``}zero{''} level. However, SSE is often compared to WebSockets, so the topic is introduced here. SSE has been standardized as part of the HTML5 standard. \docbooktolatexcite{sse-api}{} This section gives a very brief summary of the SSE API and its usage.
As with any other web technology, an SSE connection must be initiated by a client. There is a JavaScript API providing event handlers, very similar to the WS API. An event stream is opened with a constructor pointing to a resource on a server:
\begin{Verbatim}[fontsize=\small]
var eventSource = new EventSource("sse-example.php");
\end{Verbatim}
A~script on the server, {\texttt{{sse-example.php}}} in our case, pushes the data to an open HTTP response stream. For SSE to work, the {\texttt{{Content-Type}}} header must be set to {\texttt{{text/event-stream}}}. The data has to be organized into {``}paragraphs{''} separated by a blank line, where every paragraph stands for one message. Have a look at the example below:
\begin{Verbatim}[fontsize=\small]
data: This is one-line message

id: 123
event: myevent
data: Message of type "myevent" which consists of several lines
data: Another line of the event message
\end{Verbatim}
As shown above, every line in the message consists of a key and a value separated by a colon, loosely resembling the key/value pairs of the JSON format. When we need to transfer a multi-line message, we can repeat the {\texttt{{data}}} key several times. \docbooktolatexcite{sse-multiline}{} Depending on the {\texttt{{event}}} entry ({\texttt{{myevent}}} in our example), the respective event handler is triggered in the JavaScript API. In this case, it would be the following event listener (if it has been attached in JavaScript before), logging the event to the console:
\begin{Verbatim}[fontsize=\small]
eventSource.addEventListener("myevent", function(e) {
    // process the event
    console.log(e);
}, false);
\end{Verbatim}
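On the server side, producing the message format described above amounts to simple string building. The following sketch is illustrative only: {\texttt{{formatSseMessage}}} is a hypothetical helper, and a real server script would write its result to the open response stream.

```javascript
// Build one SSE message ("paragraph") from its parts.
// message.data is required; message.id and message.event are optional.
function formatSseMessage(message) {
    var lines = [];
    if (message.id !== undefined) {
        lines.push('id: ' + message.id);
    }
    if (message.event !== undefined) {
        lines.push('event: ' + message.event);
    }
    // A multi-line payload is sent as several consecutive "data:" lines.
    String(message.data).split('\n').forEach(function (line) {
        lines.push('data: ' + line);
    });
    // The trailing blank line terminates the message.
    return lines.join('\n') + '\n\n';
}

var simpleMessage = formatSseMessage({ data: 'This is one-line message' });
var namedMessage = formatSseMessage({
    id: 123,
    event: 'myevent',
    data: 'First line\nSecond line'
});

console.log(simpleMessage + namedMessage);
```

On the client, the two {\texttt{{data}}} lines of the second message arrive joined by a newline in the {\texttt{{data}}} property of the event object.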
A~connection is closed either by the client, by calling the {\texttt{{close}}} method on the {\texttt{{EventSource}}} object, or by the server (when all data has been sent). However, if the server closes the connection, the client attempts to reconnect to the same resource. So, a server cannot close the connection permanently.
Server-sent events are often compared to WebSockets, though they are much less known. Support for both in current web browsers is very similar. The main difference is the lack of SSE support in Internet Explorer 10, which finally provides the WebSocket API. Another difference is that SSE, unlike WS, does not provide a truly bidirectional stream. Only the server can publish new messages through the open HTTP connection; a client has to push data to the server via other (standard) HTTP requests. Finally, there is a rather big limitation in the message format. While WebSockets provide a real TCP connection through which any data can be transferred (including binary streams), SSE is restricted to textual data in the form of key/value pairs. So, the only compelling reason to use SSE when WebSockets exist is the fact that WS communication is not yet as mature. There are proxies and NATs (network address translation points) which do not respect the long-lived nature of WebSockets and close them after a relatively short period of time (1 minute, for example). Therefore, it comes in handy to know about SSE when working with real-time applications in a web browser.
% ------------------------
% Section
\section{Media streaming with a WebRTC technology\index{WebRTC}}
\label{idp5622912}\hypertarget{idp5622912}{}%
Up to here, none of the technologies mentioned is fully suited as a complete solution for web-browser-based media communication, such as video calling. Although WebSockets are the most advanced approach, they are only a low-level API providing a TCP stream. On that account, web browser developers, with Chromium developers in the vanguard, invented the WebRTC technology. It is an API linking a user media API (webcam, microphone) with a streaming API for sending multimedia from a browser to the other communication node.\label{idp5625280}\begingroup\catcode`\#=12\footnote{
Apart from video calling, WebRTC provides an API for sending files from one peer to another.
}\endgroup\docbooktolatexmakefootnoteref{idp5625280}
While WebSockets serve as an interconnection between a client (a web browser) and a dedicated server, which makes the technology suitable for the {``}server-based{''} protocols such as XMPP, WebRTC provides a real peer-to-peer connection\index{peer-to-peer connection}, directly between two web browsers. That makes WebRTC a perfect tool for implementing the direct media communication, such as the video calls.
\subsection{Signaling\index{WebRTC!Signaling}}
\label{idp5628400}\hypertarget{idp5628400}{}%
In fact, a server still has to mediate the {``}metadata{''} in WebRTC, such as initializing a connection or negotiating the available media capabilities (such as codecs). This level of communication, exchanging the information about the connection itself, is called {\em{signaling}}\index{signaling}. WebRTC prescribes neither the layer at which the signaling data is transferred nor the signaling protocol itself. It can be SIP, XMPP or any other protocol, transferred via XMLHttpRequest or WebSockets. Importantly, signaling is not part of the WebRTC API. The WebRTC connection itself then concerns only the peers, as shown in \hyperlink{fig-webrtc}{Figure {\ref{fig-webrtc}}}. \docbooktolatexcite{webrtc-signaling}{}
% figure ------------------------------------------------------
\begin{figure}[hbt]
\hypertarget{fig-webrtc}{}%
\begin{center}
{{\includegraphics[width=420pt]{img/webrtc}}\hypertarget{idp5635104}{}%
\label{idp5635104}
}
{{\caption[{WebRTC communication schema}]{{{WebRTC communication schema}}}\label{fig-webrtc}}}
\end{center}
\end{figure}
\subsection{WebRTC API}
\label{idp5636608}\hypertarget{idp5636608}{}%
WebRTC provides a very high-level API that hides the media device access, the network connection and the process of encoding/decoding the media streams from the programmer. Unfortunately, the WebRTC API is still a draft and has not been standardized yet. \docbooktolatexcite{webrtc-rfc}{} This means that JavaScript objects carry vendor prefixes, so there is {\texttt{{webkitRTCPeerConnection}}} instead of {\texttt{{RTCPeerConnection}}} in Chromium.
The core API object is a JavaScript object {\texttt{{RTCPeerConnection}}}. Creating one may look as follows (with a respective prefix):
\begin{Verbatim}[fontsize=\small]
var config = {"iceServers": [{"url": "stun:stun.l.google.com:19302"}]};
var pc = new RTCPeerConnection(config); // webkitRTCPeerConnection
\end{Verbatim}
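Because of the prefixes, code that is supposed to run in several browsers usually has to look the constructor up first. The following lines are only a minimal sketch of that idea and not a part of the WebRTC draft: the helper name {\texttt{{resolvePeerConnection}}} is hypothetical, and {\texttt{{mozRTCPeerConnection}}} is the prefixed name assumed here for the Firefox builds.
\begin{Verbatim}[fontsize=\small]
// Hypothetical helper: return the first RTCPeerConnection constructor
// exposed by the given global object, or null if there is none.
function resolvePeerConnection(scope) {
    return scope.RTCPeerConnection ||     // unprefixed (standardized) name
        scope.webkitRTCPeerConnection ||  // Chromium-based browsers
        scope.mozRTCPeerConnection ||     // Firefox (assumed prefix)
        null;
}

// In a browser, the global object is window:
//   var PeerConnection = resolvePeerConnection(window);
//   var pc = PeerConnection ? new PeerConnection(config) : null;
\end{Verbatim}
Passing the global object as an argument keeps the helper free of direct references to {\texttt{{window}}}, so it can also be exercised outside a browser.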
Then, we can send all available ICE\index{ICE}\label{idp5642624}\begingroup\catcode`\#=12\footnote{
ICE stands for Interactive Connectivity Establishment
}\endgroup\docbooktolatexmakefootnoteref{idp5642624} candidates, discovered with the help of the specified STUN\index{STUN} server, to the other peer. An ICE candidate\index{ICE candidate} is basically a possible transport address for the media stream, later validated for peer-to-peer connectivity. \docbooktolatexcite{ice}{} Each possible connection address is reported by the previously created {\texttt{{pc}}} object. Since the connection is still being negotiated at this point, delivering the candidates is up to the signaling service:
\begin{Verbatim}[fontsize=\small]
pc.onicecandidate = function (event) {
// use existing signaling channel to send the candidate
signalingChannel.send(JSON.stringify({ "candidate": event.candidate }));
};
\end{Verbatim}
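On the receiving side, the signaling message has to be parsed and the candidate handed over to the local connection object. The sketch below assumes the JSON envelope used in the previous example. The function name {\texttt{{handleSignalingMessage}}} and the injected {\texttt{{IceCandidate}}} constructor (in a browser it would be {\texttt{{RTCIceCandidate}}}, possibly prefixed) are illustrative assumptions.
\begin{Verbatim}[fontsize=\small]
// Hypothetical receiver for messages produced by the onicecandidate hook.
//   pc           - an RTCPeerConnection-like object
//   rawMessage   - JSON text received from the signaling channel
//   IceCandidate - candidate constructor (RTCIceCandidate in a browser)
function handleSignalingMessage(pc, rawMessage, IceCandidate) {
    var message = JSON.parse(rawMessage);
    if (!message.candidate) {
        // either a different kind of signaling message, or the empty
        // candidate reported once the gathering has finished
        return false;
    }
    pc.addIceCandidate(new IceCandidate(message.candidate));
    return true;
}
\end{Verbatim}
The return value only tells the caller whether the message carried a candidate, so that other signaling messages can be dispatched elsewhere.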
To handle the case when the other side publishes its video stream, we set up a hook which shows the stream in a remote video element, stored in the {\texttt{{remoteView}}} JavaScript variable. \docbooktolatexcite{webrtc-rfc}{} In other words, a URL representing the incoming stream is assigned to the {\texttt{{video}}} element as its source ({\texttt{{src}}}) attribute. The {\texttt{{src}}} attribute determines the media source bound to the element. \docbooktolatexcite{definitive-guide-to-html5}{}
\begin{Verbatim}[fontsize=\small]
pc.onaddstream = function (event) {
remoteView.src = URL.createObjectURL(event.stream);
};
\end{Verbatim}
Sending a media stream from the local browser to the other peer is similar. First, we capture the audio and video stream from the local multimedia devices. The resulting multimedia stream is passed as an argument to a callback function, which adds it to the {\texttt{{RTCPeerConnection}}} object {\texttt{{pc}}}. It is also common to show the user's own video on the page. This is handled by the {\texttt{{selfView}}} variable, which holds a reference to another video element on the page. The example below shows the described situation:
\begin{Verbatim}[fontsize=\small]
navigator.getUserMedia({"audio": true, "video": true}, function(stream) {
selfView.src = URL.createObjectURL(stream);
pc.addStream(stream);
});
\end{Verbatim}
Closing the connection is straightforward: it is done by invoking the {\texttt{{close}}} method on the {\texttt{{RTCPeerConnection}}} object, which has been instantiated before as {\texttt{{pc}}}.
\subsection{WebRTC in various environments}
\label{idp5658368}\hypertarget{idp5658368}{}%
WebRTC is a genuinely new technology, not yet adopted by many web browsers. Even among those which support it, the implementations may differ since the standard has not been finalized yet. The developers of the Chromium-based browsers (i.e. Chrome and Chromium) were the first to add the WebRTC API to their products, in 2012. At the beginning of 2013, the WebRTC API was also added to the newest Firefox builds so that Chrome and Firefox can {``}talk{''} to each other.\label{idp5660144}\begingroup\catcode`\#=12\footnote{
Note that in the current Firefox version (22, a nightly build at the time this part of the thesis was written), the user has to enable the {\texttt{{media.peerconnection.enabled}}} field in {\texttt{{about:config}}} to get WebRTC running.
}\endgroup\docbooktolatexmakefootnoteref{idp5660144} \docbooktolatexcite{webrtc-interop}{} The Opera browser also takes part in this initiative, yet no official statement of support has been released. All the mentioned browsers use the same back-end implementation, hosted at {\textless}\url{http://www.webrtc.org}{\textgreater}. In theory, this back-end implementation (written in C++) can be built into any application to support WebRTC, not only a web browser.
For the web browsers which do not support WebRTC yet, several attempts have been made to add the functionality via browser plugins. A plugin can be useful as a temporary improvement for experienced users, but one can never rely on the user having it installed. The library providing WebRTC functionality for Safari, Opera, IE and older versions of Firefox is called webrtc4all. \docbooktolatexcite{webrtc4all}{} However, it is still a proprietary solution, adding another prefix ({\texttt{{w4a}}}) to the world of WebRTC JavaScript APIs.
There are also other parties which implement WebRTC in their own way, keeping more or less to the API. One such initiative is the Ericsson Browser (called just Browser), which claimed to be the first browser to implement WebRTC. It uses a different back-end implementation but tries to conform to the official API. \docbooktolatexcite{webrtc-ericsson}{}
Microsoft came up with a different approach. Even though Internet Explorer does not support WebRTC, Microsoft invented their own API standard proposal. \docbooktolatexcite{webrtc-ms}{} It differs from the {``}official{''} API mainly in the extended possibilities to control more aspects of WebRTC communication, including low level {``}transport{''} layers. Since none of the standards has been finished yet, it is possible (and probable) that the final API definition will end up somewhere in between.
One logical reason why Microsoft wants to influence the definition of the WebRTC API as much as possible is Skype\index{Skype}. Microsoft bought Skype some time ago and, like everything nowadays, it is planned to be available in the web browser. Microsoft seems to bet on the WebRTC technology. \docbooktolatexcite{webrtc-skype}{} As a pleasant side effect of Skype working on top of WebRTC, it would finally be possible for other technologies to interoperate with Skype. Of course, the technology barrier is the smaller one compared to licences and legal regulations, but that is another story.
% ------------------------
% Section
\section{Other RTC solutions}
\label{chap-high-level-rtc}\hypertarget{chap-high-level-rtc}{}%
There are several other, mostly higher-level solutions for achieving bidirectional (and thus real-time) communication in a web browser. Some of them use the HTTP requests described above (such as Bayeux), some are based on WebSockets (SignalR) or even WebRTC (OpenTok), and several are built completely independently, installed as web browser plugins and thus behaving as separate runtime platforms (Adobe Flash, Google Talk). To be precise, some of the frameworks are built on top of the others; for example, OpenTok uses Adobe Flash in some cases. The sections below describe each of the technologies.
\subsection{Adobe Flash\index{Flash} and Microsoft Silverlight}
\label{idp5673952}\hypertarget{idp5673952}{}%
Of all of them, one of the most widespread technologies is Adobe Flash.\label{idp5675952}\begingroup\catcode`\#=12\footnote{
http://www.adobe.com/software/flash/about/
}\endgroup\docbooktolatexmakefootnoteref{idp5675952} Apart from the possibility to establish a bidirectional persistent TCP connection, Flash allows the developer to create almost any graphics, animations and user interface with nearly no limits.
Nevertheless, the disadvantages of Flash are significant. First, Flash is not a native part of any web browser. Until recently, it had to be installed manually as a plugin. Now, it is bundled and shipped with the Chromium-based browsers (Chrome, Chromium), but it is still an external plugin. \docbooktolatexcite{flash-bundled-with-chrome}{} Another drawback of Flash is the lack of support on mobile devices. Apple has been clear about it: iPhones and iPads have never supported the Flash technology and this is not likely to change in the future. \docbooktolatexcite{iphone-flash-support}{} Android devices supported Flash at first, but Adobe later withdrew it from Google Play. \docbooktolatexcite{android-flash-support}{}
There are many other technologies similar to Adobe Flash, but they all suffer from the same problem. Because they are installed as proprietary plugins, a developer can never be sure the application will run in every environment. This applies to technologies such as Microsoft Silverlight\index{Silverlight}\label{idp5680656}\begingroup\catcode`\#=12\footnote{
http://www.microsoft.com/silverlight/
}\endgroup\docbooktolatexmakefootnoteref{idp5680656} or Adobe AIR\index{AIR}\label{idp5682304}\begingroup\catcode`\#=12\footnote{
http://www.adobe.com/products/air.html
}\endgroup\docbooktolatexmakefootnoteref{idp5682304} (even though AIR was intended to be a browser-independent platform).
To sum up, Adobe Flash is often used as a fallback for older browsers running on non-mobile devices which do not support WebRTC yet. It has been used as a fallback for the video calls in the Talker application. However, creating a new application based on Flash (as a {\em{primary}} technology) in 2013 is not a good idea.
\subsection{OpenTok video call library}
\label{chap-opentok}\hypertarget{chap-opentok}{}%
OpenTok is a high-level video call platform for building real-time communication applications in a web browser. It allows the developer to easily set up a video call session, using an OpenTok server as a mediator. OpenTok is available as a free library for one-to-one calls. If more users participate in one call, a paid subscription has to be bought. \docbooktolatexcite{opentok-pricing}{}
From the technical point of view, there are two versions of the client-side OpenTok library. The first one is built on top of Adobe Flash, using it for establishing a connection with the OpenTok server and handling the media stream. The other one uses WebRTC (and it is therefore limited to the browsers which support it). Unfortunately, the two versions are not mutually compatible so it is impossible for one user to have a Flash version and for the other to use WebRTC when communicating together.
Each OpenTok session is identified by a session ID and every user who wants to join the session must hold a token generated for the session. Therefore, OpenTok contains a server-side library (SDK) for issuing and managing the session IDs and tokens, available for the most commonly used server-side languages. \docbooktolatexcite{opentok-server-side-lib}{}
The OpenTok library is the tool that has been used for implementing video calls in the Talker application, hence the usage of the library API is described separately in \hyperlink{chap-video-calling}{Section {\ref{chap-video-calling}}}. The main advantage of OpenTok is a low entry barrier and a simple API that can be used out of the box. On the other hand, it is still a proprietary tool and every application built with OpenTok relies heavily on the third-party service. Nevertheless, we have not experienced any serious problems during almost one year of running OpenTok in Celebrio.
\subsection{Bayeux protocol\index{Bayeux}}
\label{idp5690048}\hypertarget{idp5690048}{}%
Bayeux is one of the higher-level protocols designed specifically for bidirectional communication between a web browser and a server, primarily intended to work on top of HTTP. The idea of the communication is almost the same as in the case of HTTP long polling. To transfer messages from a server to a client, the server holds the request and responds only when a message is available. Sending data from a client to a server is straightforward: an ordinary HTTP request is sufficient.
Apart from HTTP requests, Bayeux defines the structure of the transmitted data, contained in the body of HTTP requests and responses. Each message has to be in the form of JSON, containing structured data such as a channel name, a client ID and of course the transmitted data itself. \docbooktolatexcite{bayeux}{} Although Bayeux provides an interesting way to communicate, it has not been used in this thesis. The main reason for choosing XMPP instead is its better interoperability with other existing services and the ability to easily communicate with clients not running in a web browser.
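To make the message structure more concrete, the sketch below builds one such JSON message. The field names {\texttt{{channel}}}, {\texttt{{clientId}}} and {\texttt{{data}}} correspond to the three pieces of information named above. The helper name, the channel {\texttt{{/chat/demo}}} and the client ID value are made up for illustration, and the HTTP transport around the message is left out.
\begin{Verbatim}[fontsize=\small]
// Hypothetical helper: serialize a single Bayeux-style message.
function buildBayeuxMessage(channel, clientId, data) {
    return JSON.stringify({
        channel: channel,    // where the message is published
        clientId: clientId,  // identifies the sending client
        data: data           // the application payload itself
    });
}

var body = buildBayeuxMessage("/chat/demo", "client-1",
    { text: "Hello" });
\end{Verbatim}
The resulting string would be placed in the body of an ordinary HTTP request, as described above.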
\subsection{SignalR framework\index{SignalR}}
\label{idp5694384}\hypertarget{idp5694384}{}%
SignalR is a framework taking care of both the client (a web browser) and the server side of an application. On the server side, SignalR is designed for the .NET platform, so the web application should be powered by the ASP.NET framework. SignalR relieves the developer of finding out which technology is supported by the web browser the application runs in. Instead, SignalR provides an API for sending messages and handling the incoming ones. Internally, it uses WebSockets for establishing the connection. When the WebSocket API is not available (for example in IE9, which is ironic considering that SignalR is a Microsoft-platform-based technology), it tries Server Sent Events and then falls back on the long polling technique. \docbooktolatexcite{signalr}{} In short, SignalR provides an envelope for the client-side WebSockets and HTTP long polling techniques, with an ASP.NET API for handling the messages on a server.
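The fallback order described above can be illustrated by a few lines of plain JavaScript. This is not SignalR code and does not use the SignalR API; it is only a sketch of the same decision, assuming that the presence of the {\texttt{{WebSocket}}} and {\texttt{{EventSource}}} globals is a sufficient test. The function name and the returned labels are hypothetical.
\begin{Verbatim}[fontsize=\small]
// Hypothetical sketch of a transport fallback (not the SignalR source).
// scope is the global object to inspect, i.e. window in a browser.
function pickTransport(scope) {
    if (scope.WebSocket) {
        return "webSockets";        // preferred: full-duplex connection
    }
    if (scope.EventSource) {
        return "serverSentEvents";  // server-to-client stream over HTTP
    }
    return "longPolling";           // works with plain HTTP requests
}
\end{Verbatim}
A real framework has to do more than this, for instance falling back again when the chosen transport fails to connect, which the sketch does not attempt.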
\subsection{Real-time communication tools from Google\index{Google}}
\label{chap-google-tools}\hypertarget{chap-google-tools}{}%
There are three topics briefly mentioned in this section, all developed by Google -- Channel API, the Google Talk chat service and Google Hangouts API, all of which serve as RTC tools or even complete applications.
The first one, Google Channel API\index{Channel API}, is a tool for creating persistent connections between a browser-based client and a server. It simply wraps common Comet techniques (i.e. HTTP long polling or streaming) with convenient methods for binding the handlers to incoming events, usually messages. Nevertheless, the Channel API is part of Google App Engine and it is intended to be used only within the web applications built on it. Although it is possible to use only the client side of the Channel API in a custom web application, using only a half of the tool does not seem to be very helpful.
The other tool (more precisely, a standalone application) from Google is a chat service. It became a natural part of Gmail, serving for quick and more informal messages than e-mail. The service is built on the XMPP protocol, not unlike the Talker application. Thanks to XMPP, it is possible to connect with a Gmail account from various IM clients such as Pidgin or Miranda. Recently, Google added support for video calls in a web browser -- originally powered by a browser plugin that had to be installed, now switching to WebRTC.
The last topic in this section is the Hangouts API. Hangouts is a platform for creating group video chats, running within a web application. Although Hangouts offers a neat user experience, it can only be customised to a small extent. In other words, it is possible to add a Hangout to an arbitrary web application but its look and feel cannot be altered. This is an insurmountable problem for Celebrio Talker since it must strictly match the user interface of Celebrio, the system it is built into.
Recently (on May 16, 2013), Google announced the transformation of {\em{Hangouts}}\index{Hangouts} into a complex real-time communication platform, incorporating the Google Talk service, the previously mentioned {``}old{''} Hangouts and the Google+ Messenger chat application. \docbooktolatexcite{new-hangouts}{} The new Hangouts platform is currently available either as a standalone Android/iOS application or as an extension for Google Chrome, so it is not purely a web browser technology. On the other hand, the browser extension allows the user to use the new Hangouts on any web page, offering a very simple user interface for both text chat and video calls.
The new platform will preserve XMPP as an underlying protocol, so it will basically extend the Google Talk service. However, Google declared one important constraint. Their XMPP service will no longer support a server-to-server communication. \docbooktolatexcite{new-hangouts}{} More information about interoperability problems is to be found in \hyperlink{chap-interoperability-problems}{Section {\ref{chap-interoperability-problems}}}.
% -------------------------------------------------------------
% Chapter Extensible Messaging and Presence Protocol
% -------------------------------------------------------------
\chapter{Extensible Messaging and Presence Protocol}
\label{chap-xmpp}\hypertarget{chap-xmpp}{}%
Extensible Messaging and Presence Protocol (XMPP) technologies were invented by Jeremie Miller in 1998. \docbooktolatexcite{xmpp-the-definitive-guide}{} XMPP is one of the most widespread technologies for instant messaging (IM),\label{idp5710960}\begingroup\catcode`\#=12\footnote{
Actually, the IM client or even the technology itself is sometimes called {``}Instant Messenger{''}. This term is registered as a trademark by the AOL company. \docbooktolatexcite{aol-trademarks}{}
}\endgroup\docbooktolatexmakefootnoteref{idp5710960} i.e. exchanging text or multimedia data between several endpoints. The {``}native{''} implementation of XMPP works right on top of the TCP protocol: an XMPP endpoint (called a client, as it represents the first actor in the client-server architecture) opens a long-lived TCP connection. Then, the client and the server negotiate and open XML streams, one in each direction. \docbooktolatexcite{xmpp-the-definitive-guide}{} Once the connection is established, both the client and the server can push any changes as XML elements to the stream and the other side obtains them immediately. The usual XMPP clients are standalone applications able to open a TCP connection and listen to the stream opened by the server.
XMPP is a communication protocol which handles not only sending and receiving messages, but also presence notification, contact list (roster) management and more. The architecture is distributed and decentralized. There is no central or top-level XMPP server. Anyone can run an XMPP server, much like an HTTP or FTP server. Identification and recognition on the network is also similar -- XMPP relies on the Domain Name System (DNS), so that every server is identified by a domain name with an arbitrary subdomain level (e.g. xmpp.example.com or just example.org). \docbooktolatexcite{xmpp-the-definitive-guide}{}
The user name, called a Jabber ID (JID for short), has the same structure as an e-mail address: the user name is followed by {\texttt{{@}}} and the server domain name. This rule also guarantees that every XMPP user is registered on a certain server. If there is a message or notification for a particular user, her {``}home{''} server is looked up first, the message is transferred to that server, and that server is then responsible for delivering the message to the user or saving it until she logs in. Therefore, there are two connection types in XMPP. The first one is client-to-server communication, in which clients can talk only to their {``}home{''} server. The second one, server-to-server communication, is designed for delivering messages to users at different domains. When two servers exchange any data, a direct connection to the target server is established. This approach differs from the way SMTP servers exchange e-mail messages and helps to prevent address spoofing and spamming. \docbooktolatexcite{xmpp-the-definitive-guide}{}
XMPP has been chosen as the communication protocol for the topic of this thesis -- the Talker application. XMPP has been proven in practice by big companies such as Google or Facebook. In addition, the openness of the protocol allows a very easy connection to existing large communication networks, using their server infrastructure, client software and existing user base.
% ------------------------
% Section
\section{XML Stanzas -- XMPP building blocks}
\label{chap-xmpp-stanzas}\hypertarget{chap-xmpp-stanzas}{}%
As mentioned in the introduction to this chapter, when an XMPP connection is established, two streams are opened and both the client and the server can send any XML elements at any time. The meanings of various pieces of XML are described in this section.
There are three basic XML elements that every XMPP communication consists of. Those are {\texttt{{\textless{}message/\textgreater{}}}}, {\texttt{{\textless{}presence/\textgreater{}}}} and {\texttt{{\textless{}iq/\textgreater{}}}} (which stands for Info/Query), altogether called {\texttt{{Stanza}}}\index{stanza}s. \docbooktolatexcite{xmpp-the-definitive-guide}{} Each stanza element usually contains several attributes which specify its exact meaning. The actual content is usually placed in the element body. An example message\index{XMPP!message} stanza looks as follows:
\begin{Verbatim}[fontsize=\small]
<message from="[email protected]/talker"
to="[email protected]"
type="chat">
<body>Hello, how are you?</body>
</message>
\end{Verbatim}
The attributes {\texttt{{from}}} and {\texttt{{to}}} identify the sender and the recipient of the message. In fact, the value the client sets in the {\texttt{{from}}} attribute does not matter at all and it can even be left out, because the {``}home{''} XMPP server the sender is registered at (the one running at {\texttt{{celebrio.cz}}} in the previous example) has to set the {\texttt{{from}}} attribute according to the real user name and domain name. This is one of the interesting defensive mechanisms distinguishing XMPP from other communication protocols such as SMTP.
You might have noticed that the {\texttt{{from}}} field does not contain only an XMPP address. There is a {\em{resource}}\index{XMPP!resource} identifier following the domain name. Since it is possible to connect multiple times with the same user name, the resource distinguishes between the sessions of the same user. In addition, it is useful information for other peers the user might communicate with. It is usual to set the resource according to the place the user logs in from or the device she uses.
Receiving a message stanza is not acknowledged by the recipient, so the sender has no information whether it has been delivered successfully or not. In contrast, an IQ stanza can be used when the sender requires an answer -- it usually constitutes a {\em{query}}. The best example is obtaining a contact list -- in XMPP terms called {\em{a roster}}\index{XMPP!roster}:
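The three parts of such an address (user name, domain name and resource) can be separated with a few string operations. The following function is only a sketch under simplifying assumptions: it does not validate the individual parts, the name {\texttt{{parseJid}}} is hypothetical, and {\texttt{{alice@example.com/talker}}} is a made-up address.
\begin{Verbatim}[fontsize=\small]
// Hypothetical helper: split a JID into its three parts.
// Missing parts are reported as null; no validation is performed.
function parseJid(jid) {
    var slash = jid.indexOf("/");
    // everything after the first slash is the resource
    var bare = slash === -1 ? jid : jid.slice(0, slash);
    var resource = slash === -1 ? null : jid.slice(slash + 1);
    var at = bare.indexOf("@");
    return {
        local: at === -1 ? null : bare.slice(0, at),
        domain: at === -1 ? bare : bare.slice(at + 1),
        resource: resource
    };
}

var parts = parseJid("alice@example.com/talker");
// parts.local is "alice", parts.domain is "example.com",
// parts.resource is "talker"
\end{Verbatim}
A server address such as {\texttt{{example.org}}} yields only the domain part, with the other two parts set to {\texttt{{null}}}.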
\begin{Verbatim}[fontsize=\small]
<iq id="123456789" type="get">
<query xmlns="jabber:iq:roster"/>
</iq>
\end{Verbatim}
As an answer, a server sends the result as another IQ stanza (notice that the {\texttt{{id}}} attribute remains the same while the {\texttt{{type}}} attribute changed): \docbooktolatexcite{xmpp-the-definitive-guide}{}
\begin{Verbatim}[fontsize=\small]
<iq id="123456789" type="result">
<query xmlns="jabber:iq:roster">
<item jid="[email protected]"/>
<item jid="[email protected]"/>
<item jid="[email protected]"/>
</query>
</iq>
\end{Verbatim}
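Since the response carries the same {\texttt{{id}}} as the request, a client can pair the two by remembering a callback for every identifier it has sent. The sketch below shows one possible way to do so. It is not taken from any XMPP library: the names {\texttt{{createIqTracker}}}, {\texttt{{request}}} and {\texttt{{handleResponse}}} are hypothetical, and the parsing of the incoming XML is assumed to happen elsewhere.
\begin{Verbatim}[fontsize=\small]
// Hypothetical sketch: pair IQ responses with requests by their id.
// send is a function that transmits a serialized stanza.
function createIqTracker(send) {
    var pending = {};
    var nextId = 1;
    var has = Object.prototype.hasOwnProperty;
    return {
        // Send an IQ stanza and remember the callback for its id.
        request: function (type, payload, onResponse) {
            var id = "iq" + nextId++;
            pending[id] = onResponse;
            send("<iq id='" + id + "' type='" + type + "'>" +
                payload + "</iq>");
            return id;
        },
        // Call this with the id and type of every incoming IQ stanza.
        // Returns false when nobody is waiting for that id.
        handleResponse: function (id, type, payload) {
            if (!has.call(pending, id)) {
                return false;
            }
            var callback = pending[id];
            delete pending[id];
            callback(type, payload);
            return true;
        }
    };
}
\end{Verbatim}
Each callback is removed once it has been invoked, so a duplicated or unexpected response is simply reported as unhandled.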
\subsection{Subscription mechanism}
\label{chap-xmpp-subscriptions}\hypertarget{chap-xmpp-subscriptions}{}%
The third letter of the abbreviation XMPP stands for {\em{presence}}\index{XMPP!presence}, in practice represented by sending {\texttt{{presence}}} stanzas. It is one of the important signs of real-time communication (not only in XMPP but in general) that the peers can see each other's presence -- whether the other side is online and whether it is available or busy. Even though such functionality is generally desired, it would turn into a serious privacy breach if anyone could see your presence status.
XMPP solves the potential privacy problem with a subscription\index{XMPP!subscriptions} mechanism. Each user has full control over the peers who can monitor her online status. If anyone else wants to track the presence status, a subscription request must be sent. When it is received, the user decides whether the permission will be granted or not. Unfortunately, the subscription request can be blocked by the {``}home{''} XMPP server of the user we try to reach. To be specific: there are two widely used XMPP providers -- {\texttt{{jappix.com}}} and {\texttt{{gmail.com}}}. If a user of the former sends a subscription request to a user registered at the latter, it is not guaranteed that it will be delivered (actually, it is not; see \hyperlink{chap-interoperability-problems}{Section {\ref{chap-interoperability-problems}}} for details). It is one of the drawbacks of an open protocol that one can never be sure that the other party co-operates.
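On the wire, the subscription handshake consists of {\texttt{{presence}}} stanzas with a special {\texttt{{type}}} attribute: {\texttt{{subscribe}}} for the request, {\texttt{{subscribed}}} for the approval and {\texttt{{unsubscribed}}} for the refusal. The following sketch only assembles such stanzas as strings. The function names are hypothetical and {\texttt{{bob@example.com}}} is a made-up address.
\begin{Verbatim}[fontsize=\small]
var SUBSCRIPTION_TYPES =
    ["subscribe", "subscribed", "unsubscribe", "unsubscribed"];

// Escape a value so that it can be placed in a quoted XML attribute.
function escapeAttribute(value) {
    return String(value)
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/"/g, "&quot;");
}

// Hypothetical helper: build a subscription-related presence stanza.
function buildSubscriptionPresence(to, type) {
    if (SUBSCRIPTION_TYPES.indexOf(type) === -1) {
        throw new Error("Unknown subscription type: " + type);
    }
    return "<presence to=\"" + escapeAttribute(to) +
        "\" type=\"" + type + "\"/>";
}

// request permission to see the presence of bob@example.com
var request = buildSubscriptionPresence("bob@example.com",
    "subscribe");
\end{Verbatim}
The contact would answer with the same kind of stanza addressed back to the requester, carrying either the {\texttt{{subscribed}}} or the {\texttt{{unsubscribed}}} type.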
% ------------------------
% Section
\section{XMPP over BOSH}
\label{chap-bosh}\hypertarget{chap-bosh}{}%
Having described XMPP as a communication protocol over TCP, it might be unclear how it is related to the topic of this thesis. XMPP is a mature technology and it would be convenient to use it in a web browser, but on its own it does not support communication over HTTP. Fortunately, XMPP offers many extensions (indeed, the first letter X stands for {``}extensible{''}) providing additional functionality. They are formally named XMPP {\em{Extension Protocols}} and thus called XEPs\index{XEP}.
This section briefly describes one of the XEP extensions called BOSH\index{BOSH} (XEP-0124), designed for transferring XMPP over HTTP.\label{idp5756144}\begingroup\catcode`\#=12\footnote{
In fact, there are two more XEPs related to HTTP. First of them, XEP-0025: Jabber HTTP Polling, has been replaced by BOSH. It is obsolete and recommended not to be used any longer. \docbooktolatexcite{xep-0025}{} The other one is XEP-0206: XMPP Over BOSH. It is currently used as a standard but it constitutes just a supplement for BOSH (XEP-0124). XEP-0206 describes mainly the session creation and an authentication process in BOSH. \docbooktolatexcite{xep-0206}{}
}\endgroup\docbooktolatexmakefootnoteref{idp5756144} \docbooktolatexcite{xep-0124}{} The idea behind this extension is very simple: BOSH uses the HTTP long polling technique (described in \hyperlink{chap-http-long-polling}{Section {\ref{chap-http-long-polling}}}) to imitate the bidirectional TCP communication necessary for XMPP. We can imagine BOSH (itself a protocol) as a middle layer or a wrapper protocol between HTTP (only capable of sending requests from a client to a server) and XMPP (understanding only the XML stanzas). BOSH requests and responses are a subset of all conceivable HTTP requests and responses (they include all HTTP features such as an HTTP method in a request or a status code in a response). The constraint defined by the BOSH protocol restricts the body part to a specific structure.
Each BOSH request or response body should be valid XML, which wraps up XMPP stanzas in a special {\texttt{{\textless{}body/\textgreater{}}}} element. For the purposes of the protocol itself, it is also possible to send just the {\texttt{{body}}} element with no child (XMPP) nodes -- for example when starting a session or reporting an error. So, the XMPP part of the communication is clearly separated from the BOSH part: the former is represented by payload elements inside the {\texttt{{body}}}, the latter consists of {\texttt{{body}}} attributes. Have a look at an example of a BOSH request: \docbooktolatexcite{xep-0124}{}
\begin{Verbatim}[fontsize=\small]
POST /webclient HTTP/1.1
Host: httpcm.example.com
Accept-Encoding: gzip, deflate
Content-Type: text/xml; charset=utf-8
Content-Length: 188

<body rid='1249243562'
sid='SomeSID'
xmlns='http://jabber.org/protocol/httpbind'>
<message to='[email protected]'
xmlns='jabber:client'>
<body>Good morning!</body>
</message>
<message to='[email protected]'
xmlns='jabber:client'>
<body>Hey, what's up?</body>
</message>
</body>
\end{Verbatim}
As you can see, the request header is an ordinary HTTP header. So much for the HTTP part. The request body consists of a {\texttt{{body}}} element, which represents the BOSH layer, along with the element attributes (plus the namespace). The {\texttt{{sid}}} attribute represents a {\em{session}} ID, identifying the connection. It must not change during a session. The other one, {\texttt{{rid}}}, stands for the ID of the {\em{request}} and it gets incremented with each request. Finally, the child nodes of the {\texttt{{body}}} element represent the XMPP stanzas, two message stanzas in this case. Evidently, multiple XMPP stanzas can be transmitted via a single BOSH request.
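The wrapping can be sketched in a few lines of JavaScript. The helper below is hypothetical (it is not a part of any BOSH library) and covers only the body of the request: it keeps the {\texttt{{sid}}} constant, increments the {\texttt{{rid}}} after every request and accepts any number of already serialized stanzas. The HTTP transport and the session creation are assumed to be handled elsewhere.
\begin{Verbatim}[fontsize=\small]
// Hypothetical helper; sid and the initial rid are assumed to come
// from the session creation, which is not shown here.
function createBoshSession(sid, initialRid) {
    var rid = initialRid;
    return {
        // Wrap zero or more serialized stanzas into one BOSH body.
        wrap: function (stanzas) {
            var body = "<body rid='" + rid + "' sid='" + sid + "'" +
                " xmlns='http://jabber.org/protocol/httpbind'>" +
                stanzas.join("") +
                "</body>";
            rid += 1; // the next request carries the next request ID
            return body;
        }
    };
}

var session = createBoshSession("SomeSID", 1249243562);
var first = session.wrap([
    "<message to='alice@example.com' xmlns='jabber:client'>" +
        "<body>Good morning!</body></message>",
    "<presence xmlns='jabber:client'/>"
]);
var second = session.wrap([]); // an empty body is valid as well
\end{Verbatim}
The first call produces a body with the request ID 1249243562 and two payload stanzas, the second one an empty body with the request ID 1249243563.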
The {\texttt{{sid}}} and {\texttt{{rid}}} attributes are rather important in BOSH. Because of the stateless nature of HTTP, a BOSH session is secured just by that pair of values. If an attacker stole {\texttt{{sid}}} and {\texttt{{rid}}}, he could communicate with a server on behalf of the actual user. On the other hand, there is one big advantage: a~connection can be established and handed over from one point to another. For example, a web server can initiate a connection (carry out a handshake) and then only {\texttt{{sid}}} and {\texttt{{rid}}} are passed to a client (a web browser), which can continue communicating. This approach is used in the Talker application so the user credentials (JID and a password) are not sent to a browser at all.
The BOSH protocol is an important part of the Talker application implemented as the programming part of this thesis. Despite bearing the disadvantages of bidirectional communication over HTTP, as described before, it is the only reliable technology nowadays. There are several mature client-side libraries using BOSH (such as Strophe.js, which we used) and it is also easy to install, configure and run a BOSH extension on the server side. An HTTP server usually hands over a BOSH HTTP request to an XMPP server with the relevant module enabled, as described in \docbooktolatexcite{setting-up-bosh}{} and depicted in \hyperlink{fig-xmpp-server-plugins}{Figure {\ref{fig-xmpp-server-plugins}}}. However, the server-side XMPP is not the topic of this thesis so it is not discussed further.
% ------------------------
% Section
\section{XMPP over WebSockets}
\label{chap-xmpp-ws}\hypertarget{chap-xmpp-ws}{}%
Since it is possible to transmit arbitrary data from a web browser application to a server via WebSockets, it could be handy to transfer XMPP stanzas using WS as well. Using WebSockets saves a considerable amount of overhead and avoids several issues that can occur with BOSH (for example the unreliability of HTTP). It generally works, yet the programmer should be wary of several pitfalls that WebSockets bring. First, the server side must accept WebSocket connections. Usually, XMPP servers provide such functionality through add-ons or modules.\label{idp5777600}\begingroup\catcode`\#=12\footnote{
An additional module for the Prosody server has been used when running the Talker application. The installation consists of downloading the module and adding it to the path that Prosody searches for modules. Then, the module must be enabled in the configuration file. Moreover, the {\texttt{{luajit}}} and {\texttt{{liblua5.1-bitop0}}} packages had to be installed for the module to work correctly (assuming Debian/Ubuntu on the server side).
}\endgroup\docbooktolatexmakefootnoteref{idp5777600} Provided that a WebSocket extension to the XMPP server is running on localhost, using WS to connect to the server is as simple as follows:
\begin{Verbatim}[fontsize=\small]
var ws = new WebSocket("ws://localhost:5280/xmpp-websocket/", "xmpp");
// XMPP handshake takes place here, omitting in the example
ws.send(
"<message to='[email protected]' xmlns='jabber:client'> \
<body>Hello, lab!</body> \
</message>"
);
\end{Verbatim}
Probably the most important difference compared to BOSH is that every WebSocket message (i.e. one chunk of incoming or outgoing communication, comparable to a BOSH request) can contain only one XMPP stanza. \docbooktolatexcite{xmpp-over-websockets}{} It means that a client or a server cannot send several XMPP stanzas packed together, even if they are all available at the time a WS message is sent.
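In practice, a client holding several stanzas ready to be sent therefore has to issue one {\texttt{{send}}} call per stanza. A minimal sketch follows, in which the function name {\texttt{{sendStanzas}}} is hypothetical and {\texttt{{socket}}} is any object with a {\texttt{{send}}} method, such as an opened {\texttt{{WebSocket}}}:
\begin{Verbatim}[fontsize=\small]
// Hypothetical helper: send every stanza as a separate WS message.
// Returns the number of WebSocket messages that have been sent.
function sendStanzas(socket, stanzas) {
    stanzas.forEach(function (stanza) {
        socket.send(stanza); // exactly one stanza per message
    });
    return stanzas.length;
}
\end{Verbatim}
With BOSH, the same list of stanzas could be placed into the {\texttt{{body}}} element of a single request.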
The main drawback of using XMPP over WebSockets is the still partial lack of support both in the web browsers (which includes WS support itself and the JavaScript XMPP libraries) and on the XMPP servers. Nevertheless, there is a strong trend towards implementing it on all sides.\label{idp5783056}\begingroup\catcode`\#=12\footnote{
You might want to have a look at some of the current discussions concerning client-side (i.e. JavaScript) libraries:
{\textless}\url{https://github.com/metajack/strophejs/issues/68}{\textgreater}
{\textless}\url{https://github.com/metajack/strophejs/pull/95}{\textgreater}
{\textless}\url{http://stackoverflow.com/questions/1850162/}{\textgreater}
}\endgroup\docbooktolatexmakefootnoteref{idp5783056} Both BOSH and WS require additional plugins to be installed on a server since they are not part of XMPP. The plugins basically handle a connection (or a request, in the case of BOSH) and hand over a pure XMPP message to the core XMPP server. In the case of BOSH, the request can be sent to the plugin directly when the port is specified, or forwarded by the default HTTP server. The paths between the users on one side (using either WS, direct BOSH with a specified plugin port or a BOSH request to the default HTTP port) and the server side are depicted in \hyperlink{fig-xmpp-server-plugins}{Figure {\ref{fig-xmpp-server-plugins}}}.
% figure ------------------------------------------------------
\begin{figure}[hbt]
\hypertarget{fig-xmpp-server-plugins}{}%
\begin{center}
{{\includegraphics[]{vp/bosh-ws-xmpp}}\hypertarget{idp5788864}{}%
\label{idp5788864}
}
{{\caption[{Communicating to XMPP server with WS and BOSH plugins}]{{{Communicating to XMPP server with WS and BOSH plugins}}}\label{fig-xmpp-server-plugins}}}
\end{center}
\end{figure}
% ------------------------