-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathindex.html
executable file
·668 lines (358 loc) · 39.8 KB
/
index.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
<!DOCTYPE HTML>
<html>
<head>
<meta charset="utf-8" />
<title>Flatorize</title>
<link rel="stylesheet" type="text/css" href="prettify/prettify.css">
<link rel="stylesheet" type="text/css" href="style.css">
<script type="text/javascript" src="log.js"></script>
<script type="text/javascript" src="flatorize.js"></script>
<script type="text/javascript" src="examples.js"></script>
<script type="text/javascript" src="expl.js"></script>
<script type="text/javascript" src="expl/matmulrows_loops.js"></script>
<script type="text/javascript" src="expl/matmulrows_zip.js"></script>
<script type="text/javascript" src="expl/matmulrows_zip_flatorize.js"></script>
<script type="text/javascript">generate_small_functions();</script>
</head>
<body>
<div id="top"></div>
<ul class="screen-hidden">
<li>github: <a href="https://github.com/glathoud/flatorize">https://github.com/glathoud/flatorize</a></li>
<li>[1] <a href="http://glat.info">http://glat.info</a></li>
<li>[2] <a href="http://asmjs.org">http://asmjs.org</a></li>
</ul>
<a href="https://github.com/glathoud/flatorize" class="print-hidden">
<img style="position: fixed; top: 0; right: 0; border: 0; margin: 0; padding: 0;"
src="https://s3.amazonaws.com/github/ribbons/forkme_right_green_007200.png"
alt="Fork me on GitHub">
</a>
<h2>Flatorize: Generate fast, flat, factorized code for mathematical expressions</h2>
<div class="fixed-br print-hidden"><a href="#top">Back to the top</a></div>
<p style="text-align:right">by <a href="http://glat.info">Guillaume Lathoud</a><span class="print"> [1]</span>, April 2013, October 2014</p>
<p><code>flatorize</code> lets you write well-encapsulated math code - i.e. many small functions - and automatically generates flat, <strong>very fast</strong> JavaScript code.</p>
<p>Speedups exceed +1000% in many cases, including matrix multiplication and the Discrete Fourier Transform (DFT).</p>
<p>Additionally:</p>
<ul>
<li>A library for <a href="lib/flatmat_speedtest.html">fast linear algebra</a>.</li>
<li>A plugin for <a href="asmjs.html"><code>asm.js</code></a> generates even faster JavaScript using <code>TypedArray</code>.</li>
<li>Plugins for the <a href="c.html">C</a> and <a href="d.html">D</a> languages are also available.
<ul><li>On the DFT task: <code>flatorize_d</code> <a href="test/speed_test_c_fftw/dftreal.html#i7_dell_ubuntu15_04">faster</a> than state-of-the-art FFTW.</li></ul>
</li>
</ul>
<p>Tests on the command line:</p>
<ul>
<li>Unit tests for <code>asm.js</code>, C and D (source on <a href="https://github.com/glathoud/flatorize/tree/master/test">GitHub</a>).</li>
<li><a href="test/speed_test_c_fftw/dftreal.html">Speed tests</a> using JavaScript, <code>asm.js</code>, C, D… (<a href="https://github.com/glathoud/flatorize/tree/master/test/speed_test_c_fftw/">GitHub</a>).</li>
</ul>
<p>External links:</p>
<ul>
<li>Source files on <a href="https://github.com/glathoud/flatorize">GitHub</a>.</li>
<li>Budapest 2014 nodebp/bpjs meetup <a href="http://glat.info/bpjs2014/index.html">slides</a> & <a href="http://www.youtube.com/watch?v=FxNNSvNDbW8">video</a>.</li>
</ul>
<h3 class="anchorable" id="editable-use-case">Update 2019-04-26: Editable use case<a href="#editable-use-case" class="anchor">#</a></h3>
<p>Here is an editable practical use case, to extract the rotation quaternion of a smartphone,
which should illustrate well the various basic commands of <code>flatorize</code>:
<a href="http://glat.info/hafi/#o=7KjcK5K99varJF%5CJXJflatorizeQZnZnvarJfJXJF%5C('alpha5float9beta5float9gamma5float-4%3C5H3JfloatW'9%3CInfo2Gen)QZnoutput.inner%5BTMLJXJf.getDirect()!''QZnZnfunctionJ%3CInfo2Gen(alpha9beta9gamma)Zn7Zn~1returnJ%3CInfo2GenImpl(alpha9beta9gamma)QZn6ZnZnfunctionJ%3CInfo2GenImpl(alpha9beta9gamma)ZnYYJflatorizeJexpressionJofJtheJ%3CuaternionJimplementationJgivenJbyZnYYJhttps5YYw3c.github.ioYdeviceorientationY%23worked-example-2Zn7Zn~1varJdegtoradJXJMath.PIJYJ1%400.0Zn~19~0radtodegJXJ1%400.0JYJMath.PIZnZn~19JJ_xJXJF%5C.expr(Jbeta9J'%3F'9Jbeta9J'*'9Jdegtorad9J'5'9J0.0J)Zn~19JJ_yJXJF%5C.expr(Jgamma9J'%3F'9Jgamma9J'*'9Jdegtorad9J'5'9J0.0J)Zn~19JJ_zJXJF%5C.expr(Jalpha9J'%3F'9Jalpha9J'*'9Jdegtorad9J'5'9J0.0J)ZnZn~19JJcosJXJF%5C('a'9JfunctionJ(a)J7JreturnJF%5C.expr('Math.cos('9Ja9J')'J)QJ6)Zn~19JJsinJXJF%5C('a'9JfunctionJ(a)J7JreturnJF%5C.expr('Math.sin('9Ja9J')'J)QJ6)ZnZn~19JhalfJXJF%5C('a'9Jfunction(a)7returnJF%5C.expr(a9'Y'92.0)Q6)Zn~1Zn~19Jc%3DJXJcos(Jhalf(J_xJ)J)Zn~19Jc%2FJXJcos(Jhalf(J_yJ)J)Zn~19Jc%5CJXJcos(Jhalf(J_zJ)J)Zn~19Js%3DJXJsin(Jhalf(J_xJ)J)Zn~19Js%2FJXJsin(Jhalf(J_yJ)J)Zn~19Js%5CJXJsin(Jhalf(J_zJ)J)ZnZn~19JwJXJF%5C.expr(Jc%3D9J'*'9Jc%2F9J'*'9Jc%5C9J'-'9Js%3D9J'*'9Js%2F9J'*'9Js%5C)Zn~19JxJXJF%5C.expr(Js%3D9J'*'9Jc%2F9J'*'9Jc%5C9J'-'9Jc%3D9J'*'9Js%2F9J'*'9Js%5C)Zn~19JyJXJF%5C.expr(Jc%3D9J'*'9Js%2F9J'*'9Jc%5C9J'!'9Js%3D9J'*'9Jc%2F9J'*'9Js%5C)Zn~19JzJXJF%5C.expr(Jc%3D9J'*'9Jc%2F9J'*'9Js%5C9J'!'9Js%3D9J'*'9Js%2F9J'*'9Jc%5C)Zn~3Zn~19J%3CJXJHw9x9y9zWZn~1QZn~1returnJ%3CQZn6ZnK9KhtmlK5KqscriptJtypeXZKtextYjavascriptZKJsrcXZKhttp5YYglat.infoYflatorizeYflatorize.jsZK4qYscript4ZnZnqpre4qcodeJidXZKoutputZK4qYcode4qYpre4K6">click here to show, edit, and play with it</a>.</p>
<h3 class="anchorable" id="contents">Contents <a href="#contents" class="anchor">#</a></h3>
<ul class="contents-ul"></ul>
<h3 class="anchorable" id="tldr">Idea <a href="#tldr" class="anchor">#</a></h3>
<p>Think like this:</p>
<pre>Matrix multiplication a * b: Each row of a with each column of b</pre>
<p>...write code like you think, without caring about performance...</p>
<pre class="prettyprint lang-python"># Python c = a * b
c = [[ sum(x*y for x,y in zip(ra,cb)) for cb in zip(*b) ] for ra in a ]
</pre>
<pre class="prettyprint lang-js">// JavaScript
function matmulrows_zip( a, b )
{
return a.map( function (ra) {
return zip.apply( null, b ).map( function (cb) {
return sum(
zip( ra, cb )
.map( function (xy) { return xy[ 0 ] * xy[ 1 ]; } )
);
} );
} );
}
// --> Well encapsulated code, but slow due to function call overhead.
</pre>
<p>...and let <code>flatorize</code> transform the above into <strong>very fast</strong> JavaScript code:</p>
<pre class="prettyprint lang-js">// Generated "flatorized" code == factorized + flattened
//
// - factorized: avoid repeating computations.
// - flattened: no function call.
<span id="tldr-generated-js"></span></pre>
<h3 class="anchorable" id="details-howto">Details (HOWTO)<a href="#details-howto" class="anchor">#</a></h3>
<p>The two core ideas are to <a href="#factorize">factorize</a> and to <a href="#flatten">flatten</a> the implementation of mathematical expressions, hence the name <code>flatorize</code>.</p>
<p>Consider the function <code>f</code> of complex numbers <code class="prettyprint lang-js">a,b,c</code>:</p>
<pre class="prettyprint lang-js">function f(a,b,c) { return csub( csub(a,cadd(b,c)), cadd(b,c) ); };</pre>
<h4 class="anchorable" id="factorize">Factorize <a href="#factorize" class="anchor">#</a></h4>
<p>To spare CPU, one can modify <code>f</code>
by evaluating <code class="prettyprint lang-js">add(b,c)</code>
only once:</p>
<pre class="prettyprint lang-js"><code>function f(a,b,c) {
var bc = add(b,c);
return sub( sub(a,bc), bc );
}
</code></pre>
<h4 class="anchorable" id="flatten">Flatten <a href="#flatten" class="anchor">#</a></h4>
<p>Small "building block" functions like <code class="prettyprint
lang-js">add</code> or <code class="prettyprint
lang-js">sub</code> are useful to <em>write</em> code, but <em>at runtime</em> the
CPU spends more time in function overhead than just doing the simple operation(s) in them.<p>
<p>In more complex cases like recursive expressions, the function overhead becomes even bigger.</p>
<p>One can eliminate function overhead by inlining each small function, that is replacing each small function with its implementation:</p>
<pre class="prettyprint lang-js"><code>function f(a,b,c) {
var bc = b+c;
return (a - bc) - bc;
}
</code></pre>
<p>The CPU performance should be pretty good at this point.</p>
<p>One could try to further optimize,
e.g. writing <code class="prettify lang-js">a-2*bc</code>, which
amounts to mathematical expression simplification. This can be quite
complicated, and it is not sure how much CPU speed can be gained this
way - especially compared to function overhead elimination -
so if we ever attempt simplification, we'll keep it very simple.
</p>
<h4 class="anchorable" id="usage">Usage: <code>expr</code>, <code>part</code> and <code>flatorize</code> <a href="#usage" class="anchor">#</a></h4>
<p>
<code class="prettyprint lang-js">flatorize('a,b,c', function f(a,b,c) { ... })</code> returns a new function, obtained by:
<ol>
<li>calling <code class="prettyprint lang-js">f('a','b','c')</code>, which returns an expression,</li>
<li>factorizing and flattening the expression,</li>
<li>generating the new code for the expression.</li>
</ol>
</p>
<p><code class="prettyprint lang-js">f(a,b,c)</code> can use 3 ways to build and return an expression:</p>
<ul>
<li><code class="prettyprint lang-js">flatorize.expr(a,'+',b)</code> defines the code expression <code class="prettyprint lang-js">a+b</code>.</li>
<li><code class="prettyprint lang-js">flatorize.part(arr,3)</code> or <code class="prettyprint lang-js">flatorize.part(obj,'prop')</code> describe the extraction of a property value from an object (array or not).</li>
<li><code class="prettyprint lang-js">[ 'a', anExpr ]</code> is an array expression composed of 2 sub-expressions. Similary an object <code class="prettyprint lang-js">{ a : anExpr }</code> can be returned.
</li>
</ul>
<h4 class="anchorable" id="example">Example: complex numbers <a href="#example" class="anchor">#</a></h4>
<p>We now work with complex numbers, which we represent as arrays of two numbers <code class="prettyprint lang-js">[re,im]</code>. Consider the function <code>f</code> of complex numbers <code class="prettyprint lang-js">a,b,c</code>:</p>
<pre class="prettyprint lang-js"><code>f = <script type="text/javascript">document.write(f+'')</script>;</code></pre>
<p>To produce an optimized implementation <code>f2</code> of the function <code>f</code>,
we use <a href="flatorize.js">flatorize</a> and write the following code:</p>
<pre class="prettyprint lang-js"><code>FZ = flatorize;
cadd = FZ('a,b', function (a,b) {
return cplx( FZ.expr( creal(a), '+', creal(b) ),
FZ.expr( cimag(a), '+', cimag(b) )
);
});
csub = FZ('a,b', function (a,b) {
return cplx( FZ.expr( creal(a), '-', creal(b) ),
FZ.expr( cimag(a), '-', cimag(b) )
);
});
creal = FZ('a', function (a) { return FZ.part( a, 0 ); });
cimag = FZ('a', function (a) { return FZ.part( a, 1 ); });
cplx = FZ('re,im', function (re,im) { return [ re, im ]; });
f = <script type="text/javascript">document.write(f+'');</script>
f2 = FZ('a,b,c',f);
</code></pre>
<p>All functions returned by <code class="prettyprint lang-js">flatorize</code>
can be used to build further expressions: in the above example, <code>creal</code> is used to express <code>cadd</code>, which itself is used to expressed <code>f</code>. Definition order does not matter.
</p>
<p>All functions returned by <code class="prettyprint lang-js">flatorize</code>
can also be called with values, for computation. For example, calling <code class="prettyprint lang-js">f2(a,b,c)</code> executes
the flat, factorized:</p>
<pre class="prettyprint lang-js" id="example-generated"><code><script type="text/javascript">document.write(f2.getDirect()+'')</script></code></pre>
<h3 class="anchorable" id="matmulrows">Example: matrix multiplication (arrays of rows)<a href="#matmulrows" class="anchor">#</a></h3>
<h4 class="anchorable" id="loops">LOOPS <a href="#loops" class="anchor">#</a></h4>
<p>Consider matrices expressed as 2-dimensional arrays of rows. Here is an implementation of matrix multiplication using 3 nested loops:</p>
<pre class="prettyprint lang-js">// LOOPS
<script type="text/javascript">expl_run( expl_matmulrows_loops );</script></pre>
<p>Guessing the intent of <code>matmulrows_loops</code> from its implementation alone can be hard. Concepts are <em>not</em> encapsulated into separate functions, which slows down understanding, debugging and maintenance.</p>
<h4 class="anchorable" id="zip">ZIP <a href="#zip" class="anchor">#</a></h4>
<p>Can we express the concept of matrix multiplication in a more concise and better encapsulated way? For example, in Python:</p>
<pre class="prettyprint lang-python"><code># Python
a = [ [1,2,3,4,], [5,6,7,8,], [9,10,11,12,], ]
b = [ [13,14,], [15,16,], [17,18,], [19,20,]]
# Matrix multiplication: c = a * b
# Each number in c is the dot-product
# of a row ra with a column cb
c = [[ sum(x*y for x,y in zip(ra,cb)) for cb in zip(*b) ] for ra in a ]
# zip(*b) implements the "transposed matrix"
# Result:
# >>> c
# [[170, 180], [426, 452], [682, 724]]
</code></pre>
<p>The expression is concise, shorter than the nested loops used above, and the various concepts are well encapsulated: <code>map</code>, <code>sum</code> and <code>zip</code>.</p>
<p>The reference to rows of <code>a</code> and columns of <code>b</code> is <em>explicit</em> through variables <code>ra</code> and <code>cb</code>, respectively. This can facilitate understanding.</p>
<p>A similar implementation in JavaScript requires to define <code>zip</code> and <code>sum</code>:</p>
<pre class="prettyprint lang-js">// ZIP (JavaScript)
<script type="text/javascript">expl_run( expl_matmulrows_zip );</script></pre>
<p>Great! But given the number of nested function calls involved, <code>matmulrows_zip</code> may well run much slower than <code>matmulrows_loops</code>:</p>
<p><button id="tryit_speed_matmulrows_zip" onclick="tryit_speed_matmulrows_zip( this )">Measure the speed!</button> (Feel free to do it multiple times.)</p><pre id="tryit_speed_matmulrows_zip_output">(The speed measurement can last long in some browsers.)</pre>
<p>Example of result:</p>
<pre>ZIP matrix multip.: 3.56e+4 matmulrows_zip(a,b) calls per second
LOOPS matrix multip.: 1.76e+6 matmulrows_loops(a,b) calls per second
-> speedup relative to ZIP: +4855%
</pre>
<!--p>So we have a fast but hard-to-read <code>matmulrows_loops</code>, and a concise, readable, well encapsulated but slow <code>matmulrows_zip</code>:
<pre class="prettyprint lang-js"><script type="text/javascript">document.write(expl_matmulrows_zip.matmulrows_zip)+''</script></pre-->
<h4 class="anchorable" id="zipflat">ZIPFLAT: symbols first, numbers later <a href="#zipflat" class="anchor">#</a></h4>
<p>Can we keep the nice encapsulations in <code>matmulrows_zip</code> *and* speed it up? We now write similar code using <code>flatorize</code>: instead of working on numbers, we work on symbols:</p>
<pre class="prettyprint lang-js"><script type="text/javascript">expl_run( expl_matmulrows_zip_flatorize );</script></pre>
<p>The core remains very similar. Indeed, compare the number version:</p>
<pre class="prettyprint lang-js">// ZIP
<script type="text/javascript">document.write( expl_matmulrows_zip.matmulrows_zip );</script></pre>
<p>with the new symbolic version:</p>
<pre class="prettyprint lang-js">// ZIPFLAT
<script type="text/javascript">document.write(expl_matmulrows_zip_flatorize.symbol_matmulrows_zip)</script></pre>
<p>All we had to do was to replace:</p>
<ul><li><code>sum</code> with <code>symbol_sum</code></li>
<li><code>xy[ 0 ] * xy[ 1 ]</code> with <code>FZ.expr( xy[ 0 ], '*', xy[ 1 ] )</code></li>
</ul>
<p>Here is the generated implementation:</p>
<pre class="prettyprint lang-js">// ZIPFLAT
// Generated code (function matmulrows_zip_342)
<script type="text/javascript">
var tmp = expl_matmulrows_zip_flatorize.matmulrows_zip_342.getDirect();
document.write( tmp );
document.getElementById( 'tldr-generated-js' ).textContent = tmp;
</script>
</pre>
<p class="small">(This code, as a few others below, was produced and inserted as you loaded the page.)</p>
<p>Now a look at speed:</p>
<p><button id="tryit_speed_matmulrows_zip_342" onclick="tryit_speed_matmulrows_zip_342( this )">Measure the speed!</button> (Feel free to do it multiple times.)</p><pre id="tryit_speed_matmulrows_zip_342_output">(The speed measurement can last long in some browsers.)</pre>
<p>Example of result:</p>
<pre>ZIP matrix multip.: 3.53e+4 matmulrows_zip(a,b) calls per second
LOOPS matrix multip.: 1.50e+6 matmulrows_loops(a,b) calls per second
-> speedup relative to ZIP: +4147%
ZIPFLAT matrix multip.: 1.69e+6 matmulrows_zip_342(a,b) calls per second
-> speedup relative to ZIP: +4698%
</pre>
<p>Depending on the JavaScript engine, LOOPS written by hand may run faster or slower than the generated ZIPFLAT code. However, both speeds have similar orders of magnitude.</p><p> More importantly:</p>
<blockquote class="important">We write well-encapsulated code, <code>flatorize</code> it automatically, and the generated code runs orders of magnitude faster (ZIPFLAT).</blockquote>
<p>Trade-off: the matrix dimensions must be known to generate the <code>flatorize</code>'d code.</p>
<h3 class="anchorable" id="dft">Example: Discrete Fourier Transform (DFT) <a href="#dft" class="anchor">#</a></h3>
<p>We now implement a DFT using the
Cooley-Tukey algorithm for both
real & complex inputs (option). The implementation <code>dft_ditfft2</code> shows that we can express recursive
mathematical expressions in a "natural" way, that is very close to the
original <a href='http://en.wikipedia.org/wiki/Cooley%E2%80%93Tukey_FFT_algorithm#Pseudocode'>pseudocode</a>.</p>
<pre class="prettyprint lang-js"><code>dftreal16flat = flatorize('arr', dft_exprgenF( 4, { real: true } ));
<script type="text/javascript">document.write(dft_exprgenF+'');</script>
</code></pre>
<p id="dft16-generate">Let us now test it on a "small" scale (16 point). First, let us generate the code:</p>
<p><button id="tryit_flatorize_dft16" onclick="tryit_flatorize_dft( this, 16 )">Try <code>flatorize</code>!</button> </p><p id="tryit_flatorize_dft16_output" class="generated-code-output"></p>
<p>So the code generation of a "flatorized" 16-point DFT takes very little time. This has to be done only one time.</p>
<p>Now that we have a "flatorized" implementation, let us check whether it runs faster than the original Cooley-Tukey:</p>
<p><button id="tryit_speed_dft16" onclick="tryit_speed_dft( this, 16 )">Measure the speed!</button> (Feel free to do it multiple times.)</p><pre id="tryit_speed_dft16_output">(The speed measurement can last long in some browsers.)</pre>
<p>On a linux Ubuntu 12.04, I obtained with Firefox 21 a speedup of about +2000%, and with Chrome 27 about +1250%.</p>
<h3 class="anchorable" id="push-dft1024">Pushing it: 1024-point Discrete Fourier Transform (DFT) <a href="#push-dft1024" class="anchor">#</a></h3>
<pre class="prettyprint lang-js"><code>dftreal1024flat = flatorize('arr', dft_exprgenF( 10, { real: true } ));</code></pre>
<p>Since the recursion depth \(log_2{1024}=10\) starts to be quite significant, one can wonder how long the call to `flatorize` itself will take.</p>
<p><button id="tryit_flatorize_dft1024" onclick="console.profile('xxxdft1024gen');tryit_flatorize_dft( this, 1024 );console.profileEnd('xxxdft1024gen');">Try <code>flatorize</code>!</button> (This one can last a few seconds.)</p><p id="tryit_flatorize_dft1024_output" class="generated-code-output"></p>
<p>Now, let us measure how fast the flatorized DFT1024 implementation runs, as compared
with the original Cooley-Tukey:</p>
<p><button id="tryit_speed_dft1024" onclick="tryit_speed_dft( this, 1024 )">Measure the speed!</button> (Feel free to do it multiple times.)</p><pre id="tryit_speed_dft1024_output">(The speed measurement can last long in some browsers.)</pre>
<p>In Chrome 27 on a linux Ubuntu 12.04, I got a small speedup +50% by the first measurement, and afterwards a consistent speedup of about +150%. In Firefox 21 I got a consistent speedup of about +350%.</p>
<h3 class="anchorable" id="push-all">Pushing it: all performance results <a href="#push-all" class="anchor">#</a></h3>
<p>Speedup brought by <code>flatorize</code>:</p>
<ul>
<li>System: linux Ubuntu 12.04 on a Thinkpad T420s.</li>
<li>(1st): first measurement.</li>
<li>(2+): second measurement and following measurements.</li>
</ul>
<table id="all-perf">
<thead>
<tr><th class="table-left">Speedup on:</th><th><code><a href="#matmul">matmul342</a></code></th><th><code><a href="#dft">DFT16</a></code></th><th><code>DFT32</code></th><th><code>DFT64</code></th><th><code>DFT128</code></th><th><code>DFT256</code></th><th><code>DFT512</code></th><th><code><a href="#push-dft1024">DFT1024</a></code></th></tr>
</thead>
<tbody>
<tr class="table-sep"><td class="table-left" colspan="9"><hr></td></tr>
<!--tr id="all-perf-firefox19-first"><td class="table-left">Firefox 19 (1st)</td><td class="sad" id="all-perf-firefox19-first-matmul342">-28%</td><td class="happy" id="all-perf-firefox19-first-dft16">+1574%</td><td class="happy" id="all-perf-firefox19-first-dft32">+1822%</td><td class="happy" id="all-perf-firefox19-first-dft64">+1733%</td><td class="happy" id="all-perf-firefox19-first-dft128">+819%</td><td class="happy" id="all-perf-firefox19-first-dft256">+452%</td><td class="happy" id="all-perf-firefox19-first-dft512">+324%</td><td class="happy" id="all-perf-firefox19-first-dft1024">+291%</td></tr>
<tr id="all-perf-firefox19-after"><td class="table-left">Firefox 19 (2+)</td><td class="sad" id="all-perf-firefox19-after-matmul342">-27%</td><td class="happy" id="all-perf-firefox19-after-dft16">+1546%</td><td class="happy" id="all-perf-firefox19-after-dft32">+1798%</td><td class="happy" id="all-perf-firefox19-after-dft64">+1858%</td><td class="happy" id="all-perf-firefox19-after-dft128">+842%</td><td class="happy" id="all-perf-firefox19-after-dft256">+470%</td><td class="happy" id="all-perf-firefox19-after-dft512">+334%</td><td class="happy" id="all-perf-firefox19-after-dft1024">+297%</td></tr-->
<!-- <tr><td class="table-left">Firefox 21 (1st)</td><td class="sad" id="all-perf-firefox-21-run-1-matmul342">-33%</td><td class="happy" id="all-perf-firefox-21-run-1-dft16">+2094%</td><td class="happy" id="all-perf-firefox-21-run-1-dft32">+2451%</td><td class="happy" id="all-perf-firefox-21-run-1-dft64">+2383%</td><td class="happy" id="all-perf-firefox-21-run-1-dft128">+1562%</td><td class="happy" id="all-perf-firefox-21-run-1-dft256">+429%</td><td class="happy" id="all-perf-firefox-21-run-1-dft512">+354%</td><td class="happy" id="all-perf-firefox-21-run-1-dft1024">+329%</td></tr> -->
<!-- <tr><td class="table-left">Firefox 21 (2nd)</td><td class="sad" id="all-perf-firefox-21-run-2-matmul342">-33%</td><td class="happy" id="all-perf-firefox-21-run-2-dft16">+1665%</td><td class="happy" id="all-perf-firefox-21-run-2-dft32">+2361%</td><td class="happy" id="all-perf-firefox-21-run-2-dft64">+2399%</td><td class="happy" id="all-perf-firefox-21-run-2-dft128">+1593%</td><td class="happy" id="all-perf-firefox-21-run-2-dft256">+811%</td><td class="happy" id="all-perf-firefox-21-run-2-dft512">+425%</td><td class="happy" id="all-perf-firefox-21-run-2-dft1024">+371%</td></tr> -->
<!-- <tr><td class="table-left">Firefox 21 (3rd)</td><td class="sad" id="all-perf-firefox-21-run-3-matmul342">-33%</td><td class="happy" id="all-perf-firefox-21-run-3-dft16">+2425%</td><td class="happy" id="all-perf-firefox-21-run-3-dft32">+2434%</td><td class="happy" id="all-perf-firefox-21-run-3-dft64">+2350%</td><td class="happy" id="all-perf-firefox-21-run-3-dft128">+1572%</td><td class="happy" id="all-perf-firefox-21-run-3-dft256">+783%</td><td class="happy" id="all-perf-firefox-21-run-3-dft512">+416%</td><td class="happy" id="all-perf-firefox-21-run-3-dft1024">+363%</td></tr> -->
<tr><td class="table-left">Firefox 26 (1st)</td><td class="happy" id="all-perf-yours-run-1-matmul342">+615%</td><td class="happy" id="all-perf-yours-run-1-dft16">+2532%</td><td class="happy" id="all-perf-yours-run-1-dft32">+1755%</td><td class="happy" id="all-perf-yours-run-1-dft64">+1011%</td><td class="happy" id="all-perf-yours-run-1-dft128">+638%</td><td class="happy" id="all-perf-yours-run-1-dft256">+602%</td><td class="happy" id="all-perf-yours-run-1-dft512">+597%</td><td class="happy" id="all-perf-yours-run-1-dft1024">+624%</td></tr>
<tr><td class="table-left">Firefox 26 (2nd)</td><td class="happy" id="all-perf-yours-run-2-matmul342">+399%</td><td class="happy" id="all-perf-yours-run-2-dft16">+2067%</td><td class="happy" id="all-perf-yours-run-2-dft32">+1309%</td><td class="happy" id="all-perf-yours-run-2-dft64">+830%</td><td class="happy" id="all-perf-yours-run-2-dft128">+649%</td><td class="happy" id="all-perf-yours-run-2-dft256">+654%</td><td class="happy" id="all-perf-yours-run-2-dft512">+702%</td><td class="happy" id="all-perf-yours-run-2-dft1024">+301%</td></tr>
<tr><td class="table-left">Firefox 26 (3rd)</td><td class="happy" id="all-perf-yours-run-3-matmul342">+434%</td><td class="happy" id="all-perf-yours-run-3-dft16">+1945%</td><td class="happy" id="all-perf-yours-run-3-dft32">+2076%</td><td class="happy" id="all-perf-yours-run-3-dft64">+936%</td><td class="happy" id="all-perf-yours-run-3-dft128">+713%</td><td class="happy" id="all-perf-yours-run-3-dft256">+743%</td><td class="happy" id="all-perf-yours-run-3-dft512">+685%</td><td class="happy" id="all-perf-yours-run-3-dft1024">+383%</td></tr>
<tr class="table-sep"><td class="table-left" colspan="9"><hr></td></tr>
<!-- <tr><td class="table-left">Chrome 27 (1st)</td><td class="happy" id="all-perf-chrome27-run-1-matmul342">+220%</td><td class="happy" id="all-perf-chrome27-run-1-dft16">+1063%</td><td class="happy" id="all-perf-chrome27-run-1-dft32">+1040%</td><td class="happy" id="all-perf-chrome27-run-1-dft64">+951%</td><td class="happy" id="all-perf-chrome27-run-1-dft128">+685%</td><td class="happy" id="all-perf-chrome27-run-1-dft256">+557%</td><td class="happy" id="all-perf-chrome27-run-1-dft512">+541%</td><td class="happy" id="all-perf-chrome27-run-1-dft1024">+472%</td></tr> -->
<!-- <tr><td class="table-left">Chrome 27 (2)</td><td class="happy" id="all-perf-chrome27-run-2-matmul342">+216%</td><td class="happy" id="all-perf-chrome27-run-2-dft16">+981%</td><td class="happy" id="all-perf-chrome27-run-2-dft32">+1063%</td><td class="happy" id="all-perf-chrome27-run-2-dft64">+945%</td><td class="happy" id="all-perf-chrome27-run-2-dft128">+695%</td><td class="happy" id="all-perf-chrome27-run-2-dft256">+624%</td><td class="happy" id="all-perf-chrome27-run-2-dft512">+597%</td><td class="happy" id="all-perf-chrome27-run-2-dft1024">+572%</td></tr> -->
<!-- <tr><td class="table-left">Chrome 27 (3)</td><td class="happy" id="all-perf-chrome27-run-3-matmul342">+221%</td><td class="happy" id="all-perf-chrome27-run-3-dft16">+1060%</td><td class="happy" id="all-perf-chrome27-run-3-dft32">+1063%</td><td class="happy" id="all-perf-chrome27-run-3-dft64">+924%</td><td class="happy" id="all-perf-chrome27-run-3-dft128">+683%</td><td class="happy" id="all-perf-chrome27-run-3-dft256">+614%</td><td class="happy" id="all-perf-chrome27-run-3-dft512">+610%</td><td class="happy" id="all-perf-chrome27-run-3-dft1024">+573%</td></tr> -->
<tr><td class="table-left">Chrome 32 (1st)</td><td class="happy" id="all-perf-yours-run-1-matmul342">+87%</td><td class="happy" id="all-perf-yours-run-1-dft16">+1388%</td><td class="happy" id="all-perf-yours-run-1-dft32">+1072%</td><td class="happy" id="all-perf-yours-run-1-dft64">+1421%</td><td class="happy" id="all-perf-yours-run-1-dft128">+577%</td><td class="happy" id="all-perf-yours-run-1-dft256">+264%</td><td class="sad" id="all-perf-yours-run-1-dft512">-3%</td><td class="sad" id="all-perf-yours-run-1-dft1024">-67%</td></tr>
<tr><td class="table-left">Chrome 32 (2nd)</td><td class="happy" id="all-perf-yours-run-2-matmul342">+102%</td><td class="happy" id="all-perf-yours-run-2-dft16">+1044%</td><td class="happy" id="all-perf-yours-run-2-dft32">+1284%</td><td class="happy" id="all-perf-yours-run-2-dft64">+1557%</td><td class="happy" id="all-perf-yours-run-2-dft128">+1409%</td><td class="happy" id="all-perf-yours-run-2-dft256">+1600%</td><td class="happy" id="all-perf-yours-run-2-dft512">+1440%</td><td class="happy" id="all-perf-yours-run-2-dft1024">+1462%</td></tr>
<tr><td class="table-left">Chrome 32 (3rd)</td><td class="happy" id="all-perf-yours-run-4-matmul342">+98%</td><td class="happy" id="all-perf-yours-run-4-dft16">+1107%</td><td class="happy" id="all-perf-yours-run-4-dft32">+1141%</td><td class="happy" id="all-perf-yours-run-4-dft64">+1285%</td><td class="happy" id="all-perf-yours-run-4-dft128">+1428%</td><td class="happy" id="all-perf-yours-run-4-dft256">+1404%</td><td class="happy" id="all-perf-yours-run-4-dft512">+1515%</td><td class="happy" id="all-perf-yours-run-4-dft1024">+1580%</td></tr>
<tr class="table-sep"><td class="table-left" colspan="9"><hr></td></tr>
<!-- <tr id="all-perf-v8_3_19_10_first"><td class="table-left">V8 3.19.10 (1st)</td><td class="happy" id="all-perf-v8_3_19_10_first-matmul342">+72%</td><td class="happy" id="all-perf-v8_3_19_10_first-dft16">+1618%</td><td class="happy" id="all-perf-v8_3_19_10_first-dft32">+1783%</td><td class="happy" id="all-perf-v8_3_19_10_first-dft64">+858%</td><td class="happy" id="all-perf-v8_3_19_10_first-dft128">+396%</td><td class="sad" id="all-perf-v8_3_19_10_first-dft256">+95</td><td class="sad" id="all-perf-v8_3_19_10_first-dft512">-48%</td><td class="happy" id="all-perf-v8_3_19_10_first-dft1024">-87%</td></tr> -->
<!-- <tr id="all-perf-v8_3_19_10-after"><td class="table-left">V8 3.19.10 (2nd)</td><td class="happy" id="all-perf-v8_3_19_10-after-matmul342">+83%</td><td class="happy" id="all-perf-v8_3_19_10-after-dft16">+1781%</td><td class="happy" id="all-perf-v8_3_19_10-after-dft32">+2639%</td><td class="happy" id="all-perf-v8_3_19_10-after-dft64">+1544%</td><td class="happy" id="all-perf-v8_3_19_10-after-dft128">+1146%</td><td class="happy" id="all-perf-v8_3_19_10-after-dft256">+1373%</td><td class="happy" id="all-perf-v8_3_19_10-after-dft512">+1238%</td><td class="happy" id="all-perf-v8_3_19_10-after-dft1024">+1221%</td></tr> -->
<!-- <tr id="all-perf-v8_3_19_10-after"><td class="table-left">V8 3.19.10 (3rd)</td><td class="happy" id="all-perf-v8_3_19_10-after-matmul342">+79%</td><td class="happy" id="all-perf-v8_3_19_10-after-dft16">+1790%</td><td class="happy" id="all-perf-v8_3_19_10-after-dft32">+2639%</td><td class="happy" id="all-perf-v8_3_19_10-after-dft64">+1547%</td><td class="happy" id="all-perf-v8_3_19_10-after-dft128">+1389%</td><td class="happy" id="all-perf-v8_3_19_10-after-dft256">+1373%</td><td class="happy" id="all-perf-v8_3_19_10-after-dft512">+1233%</td><td class="happy" id="all-perf-v8_3_19_10-after-dft1024">+1187%</td></tr> -->
<tr id="all-perf-v8_3_19_10_first"><td class="table-left">V8 3.19.10 (1st)</td><td class="happy" id="all-perf-v8_3_19_10_first-matmul342">+74%</td><td class="happy" id="all-perf-v8_3_19_10_first-dft16">+1548%</td><td class="happy" id="all-perf-v8_3_19_10_first-dft32">+1987%</td><td class="happy" id="all-perf-v8_3_19_10_first-dft64">+1271%</td><td class="happy" id="all-perf-v8_3_19_10_first-dft128">+548%</td><td class="happy" id="all-perf-v8_3_19_10_first-dft256">+194%</td><td class="sad" id="all-perf-v8_3_19_10_first-dft512">-12%</td><td class="sad" id="all-perf-v8_3_19_10_first-dft1024">-77%</td></tr>
<tr id="all-perf-v8_3_19_10-after"><td class="table-left">V8 3.19.10 (2nd)</td><td class="happy" id="all-perf-v8_3_19_10-after-matmul342">+80%</td><td class="happy" id="all-perf-v8_3_19_10-after-dft16">+1735%</td><td class="happy" id="all-perf-v8_3_19_10-after-dft32">+2618%</td><td class="happy" id="all-perf-v8_3_19_10-after-dft64">+2233%</td><td class="happy" id="all-perf-v8_3_19_10-after-dft128">+1695%</td><td class="happy" id="all-perf-v8_3_19_10-after-dft256">+1785%</td><td class="happy" id="all-perf-v8_3_19_10-after-dft512">+1548%</td><td class="happy" id="all-perf-v8_3_19_10-after-dft1024">+1587%</td></tr>
<tr id="all-perf-v8_3_19_10-after"><td class="table-left">V8 3.19.10 (3rd)</td><td class="happy" id="all-perf-v8_3_19_10-after-matmul342">+79%</td><td class="happy" id="all-perf-v8_3_19_10-after-dft16">+1643%</td><td class="happy" id="all-perf-v8_3_19_10-after-dft32">+2461%</td><td class="happy" id="all-perf-v8_3_19_10-after-dft64">+2233%</td><td class="happy" id="all-perf-v8_3_19_10-after-dft128">+1695%</td><td class="happy" id="all-perf-v8_3_19_10-after-dft256">+1793%</td><td class="happy" id="all-perf-v8_3_19_10-after-dft512">+1546%</td><td class="happy" id="all-perf-v8_3_19_10-after-dft1024">+1590%</td></tr>
<tr class="table-sep"><td class="table-left" colspan="9"><hr></td></tr>
<tr id="all-perf-yours-choice">
<td></td>
<td><input type="checkbox" id="tryit_cb_matmul342" checked="checked"></td>
<td><input type="checkbox" id="tryit_cb_dft16" checked="checked"></td>
<td><input type="checkbox" id="tryit_cb_dft32" checked="checked"></td>
<td><input type="checkbox" id="tryit_cb_dft64" checked="checked"></td>
<td><input type="checkbox" id="tryit_cb_dft128" checked="checked"></td>
<td><input type="checkbox" id="tryit_cb_dft256" checked="checked"></td>
<td><input type="checkbox" id="tryit_cb_dft512" checked="checked"></td>
<td><input type="checkbox" id="tryit_cb_dft1024" checked="checked"></td>
</tr>
<tr id="all-perf-yours"><td class="table-left"><button id="tryit_all" onclick="tryit_all( this, { except : { matmul342 : !document.getElementById('tryit_cb_matmul342').checked, dft16 : !document.getElementById('tryit_cb_dft16').checked, dft32 : !document.getElementById('tryit_cb_dft32').checked, dft64 : !document.getElementById('tryit_cb_dft64').checked, dft128 : !document.getElementById('tryit_cb_dft128').checked, dft256 : !document.getElementById('tryit_cb_dft256').checked, dft512 : !document.getElementById('tryit_cb_dft512').checked, dft1024 : !document.getElementById('tryit_cb_dft1024').checked } } )">Run now!</button><td id="all-perf-yours-matmul342"></td><td id="all-perf-yours-dft16"></td><td id="all-perf-yours-dft32"></td><td id="all-perf-yours-dft64"></td><td id="all-perf-yours-dft128"></td><td id="all-perf-yours-dft256"></td><td id="all-perf-yours-dft512"></td><td id="all-perf-yours-dft1024"></td></tr>
</tbody>
</table>
<p>(Feel free to do it multiple times.)</p>
<pre id="tryit_speed_matmul342_output">(The speed measurement can last long in some browsers.)</pre>
<h4 class="anchorable" id="push-all-speedup">Result analysis: speedup <a href="#push-all-speedup" class="anchor">#</a></h4>
<p>Firefox 26 shows excellent speedups everywhere.</p>
<p>Chrome 32 and V8 3.19.10 mostly show excellent speedups, especially after the first pass, which likely triggers code optimization.</p>
<h4 class="anchorable" id="push-all-memory">Result analysis: memory <a href="#push-all-memory" class="anchor">#</a></h4>
<p>In all environments the memory consumption remains very stable.</p>
<h3 class="anchorable" id="conclusion">Conclusion <a href="#conclusion" class="anchor">#</a></h3>
<p>The huge performance improvements validate the interest
of <code>flatorize</code>.</p>
<p> A <a href="asmjs.html">plugin</a> is available that
generates <code> asm.js
</code>
code, which runs even faster (tested in
Firefox and Chrome).</p>
<p>Two other plugins generate fast code for the <a href="c.html">C language</a> and the <a href="d.html">D language</a>.</p>
<p>See also: <a href="test/speed_test_c_fftw/dftreal.html">Speed tests</a> of the various solutions & languages (JS, <code>asm.js</code>, C).</p>
<p>In the future, one could also try <a href="http://www.khronos.org/webcl/">WebCL</a>, <a href="http://code.google.com/p/nativeclient/">NaCl</a> or <a href="http://www.chromium.org/nativeclient/pnacl/building-and-testing-portable-native-client">PNaCl</a>.
</p>
<hr class="section-hr">
<h3 class="anchorable" id="ack">Acknowledgments <a href="#ack" class="anchor">#</a></h3>
<p>Big thanks to the V8 developers, who solved issue <a href="http://code.google.com/p/v8/issues/detail?id=2612">#2612</a> in record time (concerning Chrome 24 and V8 3.17).</p>
<hr class="section-hr">
<h3 class="anchorable" id="disclaimer">Disclaimer <a href="#disclaimer" class="anchor">#</a></h3>
<p>By no means did I intend <code>flatorize</code> to compete with highly specialized and/or native implementations <em>of a particular task</em> (e.g. <a href="http://fftw.org">FFTW</a> to implement DFT).</p>
<p>Rather, this simple, task-agnostic approach – factorizing and flattening – can give <em>huge enough</em> performance gains in many cases, which spares the development and maintainance costs of writing optimized native code by hand.</p>
<hr class="section-hr">
<h3 class="anchorable" id="misc-details">Miscellaneous details <a href="#misc-details" class="anchor">#</a></h3>
<p>Above, the three main functions were described: <code>flatorize</code>, <code>flatorize.expr</code> and <code>flatorize.part</code>. In addition:</p>
<ol>
<li>The method <code>getDirect</code> and the convenience function <code>flatorize.now</code> give you direct access to the flatorized implementation, which you can use for computations (even better performance). <code>flatorize.now</code> makes sense when you know that you will never use the flatorized function to build further expressions, e.g. in specific cases of a parameterizable implementation (<code>matmul342</code>); else use <code>getDirect</code>.</li>
<li>Within a function that builds an expression, a call to the method <code>evalnow</code> delivers a value (instead of an expression). See <a href="#dft">above</a> the call <code>cpol.evalnow</code> within <code>dft_exprgenF</code>.</li>
</ol>
<p>For more details about these two points, and examples, you can have a look at the sources on <a href="https://github.com/glathoud/flatorize">github</a> or here (files <a href="flatorize.js">flatorize.js</a> and <a href="examples.js">examples.js</a>).</p>
<p>.</p>
<p>V8: Several <code>.sh</code> files are provided to run the tests on
V8,
including <a href="d8_tryit_speed_all.sh">d8_tryit_speed_all.sh</a>
and <a href="d8_tryit_speed_dft1024.sh">d8_tryit_speed_dft1024.sh</a>;
the rest on <a href="https://github.com/glathoud/flatorize">github</a>.</p>
<p>.</p>
<p>Flatorizing DFT1024 in a tractable time – order of magnitude: 1 second – actually required a bit of work. The method is not specific to DFT, but rather very generic. The resulting, cleaned-up implementation: <a href="flatorize.js">flatorize.js</a>
</p>
<hr class="section-hr">
<h3 class="anchorable" id="license">License <a href="#license" class="anchor">#</a></h3>
<object width="100%" height="350px" type="text/plain" data="LICENSE.TXT"></object>
<script type="text/javascript" src="contents_ul.js"></script>
<script type="text/javascript" src="prettify/prettify.js"></script>
<script type="text/javascript">setTimeout( prettyPrint );</script>
<script type="text/javascript" src="btn_anchor.js"></script>
<script type="text/javascript" src="../MathJax/MathJax.js?config=default"></script>
<script type="text/javascript" src="ga.js"></script>
</body>
</html>