<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<title>CudaMat - using the computing power of graphics cards within
Matlab</title>
<style type="text/css">
<!--
@media print {
body {
padding-top: 0.000000in;
padding-bottom: 0.000000in;
padding-left: 0.982639in;
padding-right: 0.982639in;
}
}
body {
font-family: 'Times New Roman';
font-style: normal;
text-indent: 0in;
font-weight: normal;
font-variant: normal;
color: #000000;
text-decoration: none;
text-align: left;
font-size: 12pt;
widows: 2;
font-stretch: normal;
background-color: #ffffff;
}
h1, .Heading1 {
font-size: 17pt;
margin-bottom: 0.0417in;
font-weight: bold;
font-family: 'Arial';
margin-top: 0.3056in;
}
h2, .Heading2 {
font-size: 14pt;
margin-bottom: 0.0417in;
font-weight: bold;
font-family: 'Arial';
margin-top: 0.3056in;
}
h3, .Heading3 {
font-size: 12pt;
margin-bottom: 0.0417in;
font-weight: bold;
font-family: 'Arial';
margin-top: 0.3056in;
}
p, .Normal {
font-family: 'Times New Roman';
font-style: normal;
margin-left: 0pt;
text-indent: 0in;
margin-top: 0pt;
font-weight: normal;
font-variant: normal;
color: #000000;
text-decoration: none;
margin-bottom: 0pt;
text-align: left;
margin-right: 0pt;
font-size: 12pt;
widows: 2;
font-stretch: normal;
}
-->
</style>
<meta content="This package lets you harness the GPU power within
Matlab with no or minimal changes to the code" name="description">
<meta http-equiv="CONTENT-TYPE" content="text/html; charset=UTF-8">
<meta name="GENERATOR" content="OpenOffice.org 3.0 (Unix)">
<style type="text/css">
<!--
@page { margin: 2cm }
P { margin-bottom: 0.21cm }
-->
</style>
</head>
<body>
<div>
<div style="text-align: center;"></div>
<h1 style="margin-right: 0in; text-align: center;" dir="ltr"><span
style="font-weight: bold; font-size: 17pt; font-family:
'Arial';">Information on CudaMat </span></h1>
<h1 style="text-align: center;"><span style="font-weight: bold;
font-size: 14pt; font-family: 'Arial';">CudaMat (current
version: 2.0.00 beta, 01. August 2016)<br>
</span></h1>
<div style="text-align: center;"><span style="font-weight: bold;
font-size: 14pt; font-family: 'Arial';"> </span><span
style="font-weight: bold; font-family: 'Arial';">Rainer
Heintzmann, Friedrich Schiller University of Jena &amp;
IPHT, Jena, Germany.</span><br>
<span style="font-weight: bold; font-family: 'Arial';"></span></div>
<h3 style="text-align: center; margin-right: 0in;" dir="ltr"><span
style="font-weight: bold; font-family: 'Arial';">(heintzmann
at gmail dot com)</span></h3>
<br>
<p class="western" style="margin-bottom: 0cm;">CudaMat enables
fast computing on graphics cards that support the <a
href="http://www.nvidia.com/object/cuda_home.html">CUDA
programming language</a>. Currently such cards are available
from NVidia. CudaMat is, as far as possible, invisible to the
user. The idea is that the user can transform any existing
Matlab code into CudaMat code with minimal effort. E.g. with a
single line like <i>a=cuda(a)</i> the Matlab object '<span
style="font-style: italic;">a</span>' gets transformed into a
CudaMat object '<span style="font-style: italic;">a</span>'.
This can be checked using the Matlab command <span
style="font-style: italic;">whos</span>.</p>
<h2>Under which conditions will CudaMat be fast?</h2>
CudaMat will greatly improve the speed of your code when most of
its run time is spent on 'expensive' operations between large
matrices and/or vectors, sums over them or Fourier
transformations. However, when the problem consists of many
operations on small matrices and vectors, CudaMat will probably
not help you and might in fact turn out to be slower than
standard Matlab code. One way to think of this is that every start
of a function execution in CudaMat has some overhead, but once it
is running, it is quite fast.<br>
It may be possible to tune the performance a little by
changing the two <span style="font-style: italic;">#define</span>
commands for <span style="font-style: italic;">BLOCKSIZE</span>
and <span style="font-style: italic;">CUIMAGE_REDUCE_THREADS</span>
given at the top of the file <span style="font-style: italic;">cudaArith.cu</span>.<br>
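<p>The trade-off described above can be made concrete with a toy cost model. The following Python sketch is only an illustration: the overhead and per-element costs are made-up numbers, not measurements of CudaMat, and <i>run_time</i> and <i>faster_device</i> are hypothetical helper names.</p>

```python
# Toy cost model: every call pays a fixed start-up overhead, after which the
# work scales with the number of elements. All numbers are invented for
# illustration and are NOT measurements of CudaMat or of any real hardware.

def run_time(n_calls, elements_per_call, overhead_s, seconds_per_element):
    """Total time for n_calls operations on elements_per_call elements each."""
    return n_calls * (overhead_s + elements_per_call * seconds_per_element)

# Hypothetical devices: the "GPU" has a larger per-call overhead but a much
# smaller per-element cost than the "CPU".
CPU = dict(overhead_s=1e-6, seconds_per_element=1e-8)
GPU = dict(overhead_s=1e-4, seconds_per_element=1e-10)

def faster_device(n_calls, elements_per_call):
    """Return which hypothetical device finishes the workload first."""
    cpu = run_time(n_calls, elements_per_call, **CPU)
    gpu = run_time(n_calls, elements_per_call, **GPU)
    return "gpu" if cpu > gpu else "cpu"

print(faster_device(1_000_000, 16))   # many operations on tiny vectors
print(faster_device(10, 4_000_000))   # few operations on large matrices
```

<p>With these assumed parameters the per-call overhead dominates for the small vectors, while the large matrices amortise it. The real break-even point depends on the card, the driver and the operation.</p>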
<h2>Is there a demo to quickly check the performance increase?</h2>
Yes. CudaMat comes with two test programs, '<span
style="font-style: italic;">applemantest.m</span>' and '<span
style="font-style: italic;">speedtestDeconv.m</span>'.<br>
<span style="font-style: italic;"><br>
applemantest.m</span> calculates the famous Mandelbrot set in a
straightforward way. This test has the advantage that it does not
require any toolboxes other than <a
href="https://github.com/RainerHeintzmann/CudaMat">CudaMat</a>
and <a href="http://www.nvidia.com/object/cuda_get.html">NVidia's
cuda library</a> to be installed. The speedup obtained depends on
the chosen data size. On my Intel(R) Core(TM) i7 CPU @ 2.8 GHz, 64
bit processor, Windows 7, it is about a factor of 30 (2.35 versus
75.5 seconds) for a 2048x2048 image with iteration depth 300.<br>
The new (as of version 1.0.0.06 beta) on-the-fly compilation
allows a further speedup by writing code snippets for the GPU. In
this case the graphics card needs 0.088 seconds for the example
above, yielding <span style="font-weight: bold;">a total speedup
bigger than 850</span>! Type "edit applemantest" in Matlab to
see an example of how to achieve such speed.<br>
<br>
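<p>For readers unfamiliar with the benchmark, the classic escape-time iteration behind a Mandelbrot image can be sketched as follows. This is a plain Python illustration and not the code of <i>applemantest.m</i>; the coordinate window and the names <i>escape_count</i> and <i>mandelbrot_image</i> are assumptions made for this example. Only the iteration depth of 300 is taken from the text above.</p>

```python
def escape_count(c, max_iter=300):
    """Count iterations of z -> z*z + c until |z| exceeds 2 (capped at max_iter)."""
    z = 0j
    for n in range(max_iter):
        if abs(z) > 2.0:
            return n
        z = z * z + c
    return max_iter

def mandelbrot_image(width, height, max_iter=300):
    """Escape counts on a width x height grid over [-2, 1] x [-1.5, 1.5].

    The window is an assumption; width and height must be at least 2.
    """
    rows = []
    for iy in range(height):
        im = -1.5 + 3.0 * iy / (height - 1)
        row = []
        for ix in range(width):
            re = -2.0 + 3.0 * ix / (width - 1)
            row.append(escape_count(complex(re, im), max_iter))
        rows.append(row)
    return rows

image = mandelbrot_image(64, 48)
print(len(image), len(image[0]))
```

<p>Every pixel runs the same short loop independently of all others, which is why this kind of problem maps well onto the many processors of a graphics card.</p>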
<span style="font-style: italic;">speedtestDeconv.m</span>
measures the performance for an example deconvolution of a 3D
microscopy dataset (using the DipImage '<span style="font-style:
italic;">chromo3d</span>' example image).<br>
To run this demo, <a href="http://www.diplib.org">DipImage</a>
with the example images and <a
href="https://github.com/RainerHeintzmann/CudaMat">CudaMat</a>
need to be installed, as well as the optimisation toolbox with the
function<span style="font-style: italic;"> <a
href="http://www.di.ens.fr/%7Emschmidt/Software/minFunc.html">minFunc()</a>
</span>written by <a href="http://www.di.ens.fr/%7Emschmidt/">Mark
Schmidt</a> (line 103 in the file <span style="font-style:
italic;">polyinterp.m</span> needs to be changed to: <span
style="font-style: italic;">for qq=1:length(cp);xCP=cp(qq);</span>
and the appearances of <span style="font-style: italic;">ones()</span>
and <span style="font-style: italic;">zeros()</span> need to be
changed to <span style="font-style: italic;">ones_cuda() </span>and<span
style="font-style: italic;"> zeros_cuda()</span>). A GeForce GTX 280
card gave about a 10x speedup (3.3 versus 30.3 seconds) in
comparison to a 2.4 GHz AMD Hammer 64 bit processor with gcc 4.3.2
running under OpenSuse 11.1.<br>
<h2>What is CUDA?</h2>
CUDA is a programming language extension to C which enables code
to run in parallel on multi-processor graphics cards. Current
graphics cards can have more than 200 processors running
simultaneously. They all execute the same code (SIMD = single
instruction, multiple data). If a branch point (e.g. initiated by
an '<span style="font-style: italic;">if</span>') is reached,
where some processes have to execute different code than others,
these processes are temporarily suspended. The beauty of the
hardware is that this switching between many thousands of
processes is very efficient.<br>
<h2>What changes may be necessary to existing Matlab code to run
under CudaMat?</h2>
Note that CudaMat currently only supports the <span
style="font-style: italic; font-weight: bold;">single</span>
floating point <span style="font-weight: bold;">datatype</span> of
Matlab (4 bytes). Since Matlab usually computes with doubles, the
results can differ depending on how sensitive the algorithm is to
round-off errors.<br>
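<p>How much single precision can change a result is easy to try out. The following Python sketch emulates 4-byte floats with the standard <i>struct</i> module and compares a running sum in single and double precision. It is an illustration of round-off in general, not of CudaMat itself; <i>to_single</i> and <i>sum_single</i> are hypothetical helper names.</p>

```python
import struct

def to_single(x):
    """Round a Python float (a double) to the nearest 4-byte IEEE 754 single."""
    return struct.unpack("f", struct.pack("f", x))[0]

def sum_single(values):
    """Accumulate in single precision by rounding after every addition."""
    total = 0.0
    for v in values:
        total = to_single(total + to_single(v))
    return total

values = [0.1] * 100000
in_double = sum(values)          # ordinary double precision accumulation
in_single = sum_single(values)   # emulated single precision accumulation
print(in_double, in_single, abs(in_double - in_single))
```

<p>Long accumulations like this one are a typical case where the difference becomes visible. Algorithms that combine only a few values per output element are usually far less sensitive.</p>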
The general idea is that <span style="font-weight: bold;">only
large matrix (image) input objects </span>requiring time
intensive computations should be converted to cuda before the
existing Matlab code is run. <span style="font-weight: bold;">Ideally
no changes to the Matlab code should be necessary.</span><br>
In practice, however, minor changes can be necessary if CudaMat
does not support an operation used in the Matlab code. This is
especially the case for <br>
<ul>
<li>Additional datatypes defined by the Matlab code</li>
<li>Using a standard Matlab operation that is not yet
implemented in CudaMat</li>
<li>If the Matlab code checks for the datatype with operations
other than isreal() or isfloat(). E.g. if the operation isa()
is used, the result is probably wrong.</li>
<li>for loops iterating over the contents of a vector need a
minor change (iterating over an access index and then assigning
the component by indexing into the vector) to be compatible with
CudaMat<br>
</li>
</ul>
Sometimes the system may perform an automatic conversion to a
Matlab object, with the associated overhead involved in
transferring from the graphics card.<br>
In other cases the user will have to either force this conversion
(e.g. using <span style="font-style: italic;">single_force(a)</span>),
find an alternative expression, which is supported in CudaMat or
extend the CudaMat algorithms to support this additional feature
(please send me an email with the new code, so I can put it up on
the website). <br>
<br>
In addition, there may be changes necessary inside the Matlab
code, if new objects are generated, as these will be by default
Matlab matrices.<br>
Prominent examples are the Matlab commands <span
style="font-style: italic;">zeros() </span>and <span
style="font-style: italic;">ones() </span>, which by default
generate Matlab objects. These function calls should be changed to<span
style="font-style: italic;"> zeros_cuda()</span>, <span
style="font-style: italic;">ones_cuda()</span>.<br>
Global variables influence the behaviour of <span
style="font-style: italic;">zeros_cuda()</span>, <span
style="font-style: italic;">ones_cuda()</span> and also of the
overloaded DIPImage functions <span
style="font-style: italic;">newim(), xx(), yy(), zz(), rr(),
phiphi()</span>.<br>
Whether they generate a standard or a <span
style="font-style: italic;">cuda</span> object can conveniently
be set via the functions <span style="font-style: italic;">set_ones_cuda(state)</span>,
<span style="font-style: italic;">set_zeros_cuda(state)</span> and
the like.<br>
<br>
Other commands which generate Matlab objects are enumerations such
as <span style="font-style: italic;">[1:N]</span> or <span
style="font-style: italic;">meshgrid()</span>.<br>
In future versions, it will be possible to define by a set of
global variables whether these functions should generate standard
Matlab objects or cuda objects.<br>
In addition it may (in rare cases) be necessary to convert
standard Matlab matrices to cuda (e.g. using the command <span
style="font-style: italic;">cuda(a)</span>) within the Matlab
code to run, as some CudaMat functions may not yet automatically
do so.<br>
<h2>Why a separate datatype 'cuda'?</h2>
<br>
To realize the idea of accessing the speed of the graphics cards
from within the convenient programming environment of Matlab
efficiently, one has to avoid memory transfer to and from the
graphics card as much as possible. To this aim a datatype 'cuda'
was introduced. <br>
Whenever Matlab needs to execute a function that involves a cuda
object as one of its arguments, it checks for the presence of
this function in the folder <span style="font-style: italic;">@cuda
</span>and executes the code given there. In this way it is
ensured that code can efficiently be executed on the graphics
card, without the cuda objects leaving the card.<br>
<h2>When will transfers be made to and from the graphics card?</h2>
<br>
If a cuda object is created (e.g.<span style="font-style: italic;">
a=cuda(a)</span>), the matlab object is transferred to the
graphics card. This costs some time and should thus ideally not be
performed within the inner loop of a calculation. With every
output operation (e.g. printing the values on the screen or
displaying an image) the data is transferred back from the
graphics card to Matlab. <br>
The commands <span style="font-style: italic;">double_force(a) </span>and
<span style="font-style: italic;">single_force(a) </span>will
force a conversion from a cuda object back to matlab (and not
affect the object if it is already a standard matlab double or
single).<br>
In the event that a CudaMat operation results in a single value,
the result will automatically be transferred back to an ordinary
Matlab object.<br>
<br>
Why do the ordinary conversion operations '<span style="font-style:
italic;">single(a)</span>' and '<span style="font-style:
italic;">double(a)</span>' not convert back to a
Matlab matrix?<br>
Currently these operations leave the objects on the graphics card,
so that existing Matlab programs need as little modification as
possible to run under CudaMat. The commands are essentially
ignored. To force a
conversion use the command <span style="font-style: italic;">single_force(a)
</span>or <span style="font-style: italic;">double_force(a) </span>with
a cuda object 'a'.<br>
<h2>How can I reset the graphics card when something went wrong?</h2>
<br>
If an error occurred during the execution of code on the graphics
card, it is possible that cuda is in a state where it needs a
reset. In this case the first thing to try is the Matlab command '<span
style="font-style: italic;">clear classes</span>', which will
reload the cuda class and force cuda to initialize on the next
cuda call. If this does not work, one will have to quit Matlab
and restart it.<br>
<h2>Supported Datatypes</h2>
<br>
Currently only the datatypes single and single complex are fully
supported by CudaMat. This means that in the current version all
computations in double are simply performed at single precision.
This results in a loss of precision, which is sometimes not
acceptable in an application. Future versions will support more
datatypes (e.g. int datatypes). Currently the cuda libraries (and
in part the hardware) often also support only single precision
computations.<br>
<h2>How can I change the behaviour of certain operations in
CudaMat?</h2>
<br>
Currently there are very few possibilities to influence the
behaviour of CudaMat. However, it is planned that the following
can be influenced by global environment variables in the future:<br>
<ul>
<li>Adjusting (optimizing) the threading parameters for the cuda
code by entering the number of processors that the code
should assume. Other optimisation parameters could also be set.</li>
<li>Defining whether the commands <span style="font-style:
italic;">double()</span> and <span style="font-style:
italic;">single()</span> will convert cuda objects back to
Matlab objects or not.</li>
<li>Defining how subsasgn should be executed (optimized
or compatible).</li>
<li>Controlling whether warnings should be printed when automatic
conversions to cuda objects are performed.</li>
</ul>
<h2>Interfacing with DipImage</h2>
<br>
CudaMat is designed to be compatible with standard Matlab objects
as well as objects of the dipimage datatype. This does not mean
that DipImage needs to be installed. If no version of DipImage is
installed, all objects are simply of Matlab origin (<span
style="font-style: italic;">object.fromDip=false</span>).<br>
DipImage is an image processing toolbox from Delft University (see<a
href="http://www.diplib.org"> www.diplib.org</a>) which can be
obtained free of charge by the academic community.<br>
This compatibility is achieved by having the datatype cuda
remember where each object came from, using a tag 'fromDip' within
each object. However, currently only very basic operations of
DipImage are supported within CudaMat.<br>
<h2>Known incompatibilities</h2>
<br>
Matlab subassign operations such as '<span style="font-style:
italic;">b=a;a(3:5,7:10)=10</span>' also change the variable b
in the current version. The reason is that the code performs the
subassign directly on the object 'a' in order to avoid an extra copy
and delete operation. However,
if another identical copy of the object exists, this object '<span
style="font-style: italic;">b</span>' will be modified too
(contrary to standard Matlab code), as Matlab is tricked into
avoiding the extra copy operation.<br>
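<p>The difference between the two behaviours can be illustrated by analogy with Python lists, where plain assignment shares the underlying storage and an explicit copy does not. This is only an analogy for the effect described above, not CudaMat code; the function names are hypothetical.</p>

```python
import copy

def assign_with_copy():
    """Value semantics, as standard Matlab guarantees: b owns its own data."""
    a = [[0] * 4 for _ in range(3)]
    b = copy.deepcopy(a)
    a[1][2] = 10          # the write to a does not reach b
    return b[1][2]

def assign_shared():
    """Shared storage, analogous to the incompatibility described above."""
    a = [[0] * 4 for _ in range(3)]
    b = a                 # both names now refer to the same data
    a[1][2] = 10          # the in-place write is visible through b as well
    return b[1][2]

print(assign_with_copy())
print(assign_shared())
```

<p>Code that keeps an untouched copy of an array and then modifies the original in place is the pattern to look out for when porting.</p>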
<h2>Additional CudaMat operations not present in standard Matlab</h2>
<br>
Many of the dipimage operations are implemented also for the cuda
datatype when imported from a standard matlab object.<br>
E.g. ft and ift perform fft and fft shift operations<br>
<br>
<h2>The really big speedup: Implementing your own Cuda function</h2>
If you type<br>
edit applemantest.m<br>
and look at the code, you get an idea of how to really speed
up the code. The essential bit is to write a small piece of C-style
code which is automatically wrapped up by CudaMat into its own
function that can then be called. This is possible for a number of
standard functions.<br>
The two essential commands which do the magic are
"cuda_define" and "cuda_compile_all". The former defines a new
cuda function with its own name and a program code as given by a
string. Many such definitions can be collected, and finally
the cuda_compile_all command wraps them all up in the correct way
and compiles them such that they can be called from within Matlab
simply by their given name.<br>
However, the programming of such new functions has to observe
certain rules as described in the <a
href="on-the-fly-programming-guide.html">on-the-fly-programming-guide</a>.<br>
<h2>Known errors / incompatibilities<br>
</h2>
<ul>
<li>sum, min and max for arrays always operate over all elements in
CudaMat. This has to be changed to be compatible with standard
Matlab code (partial sums) and the possibility in DipImage to
sum over arbitrary dimensions.</li>
<li>for loops iterating directly over a cuda vector do not work (e.g.: <span
style="font-style: italic;">for q=cuda([1 2 3 4 5 4 3 2
1]);fprintf('Hello World\n');end</span> would not
produce the same result as standard Matlab code)<br>
</li>
<li>as CudaMat always works with floating point datatypes,
certain kinds of operations (integer division) and overflow
errors (e.g. for the byte datatype in dipimage) are not supported.<br>
</li>
</ul>
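<p>To make the first point of the list above concrete: for a matrix, standard Matlab's <i>sum(A)</i> returns one partial sum per column, whereas a reduction over all elements returns a single number. The Python sketch below contrasts the two on a small nested list; the function names are hypothetical and only illustrate the two conventions.</p>

```python
def sum_all(matrix):
    """Reduce over every element, as CudaMat's sum is described to do."""
    return sum(sum(row) for row in matrix)

def sum_columns(matrix):
    """One partial sum per column, as Matlab's sum(A) returns for a matrix."""
    return [sum(column) for column in zip(*matrix)]

def sum_rows(matrix):
    """One partial sum per row, i.e. a reduction along the other dimension."""
    return [sum(row) for row in matrix]

A = [[1, 2, 3],
     [4, 5, 6]]
print(sum_all(A), sum_columns(A), sum_rows(A))
```

<p>Code that relies on the per-column result therefore has to be checked when it is moved to cuda objects.</p>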
<h2>The internal structure of CudaMat</h2>
<br>
CudaMat is based on the cuda datatype. All the methods operating
on this datatype are stored in the <span style="font-style:
italic;">@cuda </span>folder and other methods (which also do
something for other datatypes) are stored outside in the main
CudaMat folder.<br>
A cuda object stores a reference (<span style="font-style:
italic;">myobject.ref</span>) and the information whether it
should be treated according to Matlab or DipImage conventions (<span
style="font-style: italic;">myobject.fromDIP</span>). The cuda
functions are either taken directly from the CUFFT and CUBLAS
libraries or are written in CUDA (all in the file <span
style="font-style: italic;">cudaArith.cu</span>). The mex file <span
style="font-style: italic;">cuda_cuda.c</span> is a frontend to
cuda which supports all the functionality. The main mex function
in this file is always invoked with a command string, telling it
which command to execute. At the moment this string is parsed
simply by a daisy chain of strcmp operations. As the number of
commands has grown, this might eventually present an unacceptable
overhead, but I believe at the moment it should still not pose a
problem.<br>
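<p>The dispatch pattern described above can be sketched in a few lines. The Python example below shows a chain of string comparisons next to a table lookup, which is a common alternative once the number of commands grows. The command names are invented for illustration and are not the actual command strings of <i>cuda_cuda.c</i>, and the table variant is not something the current code does.</p>

```python
def dispatch_chain(command, a, b):
    """Compare against each known name in turn, like a chain of strcmp calls."""
    if command == "plus":
        return a + b
    elif command == "minus":
        return a - b
    elif command == "times":
        return a * b
    raise ValueError("unknown command: " + command)

# Hypothetical alternative: one hash lookup instead of successive comparisons.
HANDLERS = {
    "plus": lambda a, b: a + b,
    "minus": lambda a, b: a - b,
    "times": lambda a, b: a * b,
}

def dispatch_table(command, a, b):
    """Look the command up once in a table and call its handler."""
    try:
        handler = HANDLERS[command]
    except KeyError:
        raise ValueError("unknown command: " + command)
    return handler(a, b)

print(dispatch_chain("times", 3, 4), dispatch_table("times", 3, 4))
```

<p>In the chain, the cost of finding a command grows with its position in the list. For a moderate number of commands this is usually small compared to the work of the command itself.</p>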
This interface should make it comparatively easy to adapt the code
to work under Octave, Mathematica or in fact any other
interpreter-driven language.<br>
<h2 style="text-align: left; margin-right: 0in;" dir="ltr"><span
style="font-weight: bold; font-size: 14pt; font-family:
'Arial';">How to obtain</span></h2>
<p style="text-align: left; margin-bottom: 0in; margin-top: 0in;
margin-right: 0in;" dir="ltr">Download <a
href="https://github.com/RainerHeintzmann/CudaMat">the current
version</a> as a tar-gzip-file with all the necessary classes
and an example html-file in it. Just place the CudaMat folder
somewhere, add it to the Matlab path and call initCuda (see
installation details below). Depending on the operating system
it may be necessary to recompile the modules cudaArith.cu and
cuda_cuda.c. A makefile for Unix environments is provided. <br>
</p>
For CudaMat you will need <a
href="http://www.nvidia.com/object/cuda_get.html">NVidia's cuda
library</a> installed on your operating system and a
graphics card which can run cuda programs (above GeForce 8800).<br>
This software is released under the GPL2 license. It can be used
for non-commercial purposes.
<h2>Installation instructions</h2>
<p>CudaMat can be installed in two different ways. The easy way
applies if there is no need to modify any cuda code. You can simply
download the newest version of CudaMat and unzip it. It will
contain a folder called "user64bitCuda6VC11" or similar.<br>
This folder has to be copied to the temp file location as
obtained by typing "tempdir" in your Matlab installation and
renamed to "user". This directory will be user-specific.<br>
Then only a Cuda Runtime library needs to be installed
corresponding to the Cuda version in the folder name and possibly
C-runtime libraries corresponding to the C-version in the
folder name.<br>
</p>
<p>However, it should be noted that this does not give you the
capability of recompiling code or introducing user-defined cuda
functions. Thus you do not get the full benefit of CudaMat but
should be able to run some fast code anyway.<br>
</p>
</p>
<h2>Installation instructions (64 bit Linux system)<br>
</h2>
<p style="text-align: left; margin-bottom: 0in; margin-top: 0in;
margin-right: 0in;" dir="ltr">Download <a
href="https://github.com/RainerHeintzmann/CudaMat">the current
version</a> into a folder<span style="font-style: italic;">
/usr/local/CudaMat/ </span>and unpack it with <span
style="font-style: italic;">tar -xzf CudaMat.tgz</span> .<br>
<a href="http://www.nvidia.com/object/cuda_get.html">NVidia's
cuda driver and toolkit</a> need to be installed according to
the manufacturer's instructions. Make sure the driver really is the
version corresponding to the Cuda Toolkit.<br>
Then edit the nvcc profile:<br>
sudo vi /usr/local/cuda/bin/nvcc.profile<br>
</p>
<div id=":12o">Add the option "-fPIC" to nvcc.profile. The line should
now read:<br>
INCLUDES += -fPIC
"-I$(TOP)/include" "-I$(TOP)/include/cudart" $(_SPACE_)</div>
<p style="text-align: left; margin-bottom: 0in; margin-top: 0in;
margin-right: 0in;" dir="ltr"><br>
</p>
<p style="text-align: left; margin-bottom: 0in; margin-top: 0in;
margin-right: 0in;" dir="ltr">To leave the X-window system under
SuSe Linux, log off, then click on "menu" and select Console.
Then in the console (as superuser) you can run the driver
installation program.<br>
</p>
<p style="text-align: left; margin-bottom: 0in; margin-top: 0in;
margin-right: 0in;" dir="ltr">Edit the file ".profile" in your
user home directory and add the lines:<br>
export
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/MATLAB/dip/Linuxa64/lib:/usr/local/cuda/lib64:/usr/local/cula/lib64:/usr/lib64:/usr/lib<br>
export PATH=$PATH:/usr/local/cuda/bin</p>
<p style="text-align: left; margin-bottom: 0in; margin-top: 0in;
margin-right: 0in;" dir="ltr"><br>
export CULA_ROOT=/usr/local/cula/<br>
export CULA_INC_PATH=/usr/local/cula/include<br>
export CULA_BIN_PATH_64=/usr/local/cula/<br>
export CULA_LIB_PATH_64=/usr/local/cula/lib64<br>
export CULA_BIN_PATH_32=/usr/local/cula/<br>
export CULA_LIB_PATH_32=/usr/local/cula/lib<br>
<br>
</p>
<p style="text-align: left; margin-bottom: 0in; margin-top: 0in;
margin-right: 0in;" dir="ltr">Install CULA (needs a free
registration) from <a href="http://www.culatools.com/">http://www.culatools.com/</a>
to add support for the matlab "svd" and equation system solving
commands.<br>
</p>
<p style="text-align: left; margin-bottom: 0in; margin-top: 0in;
margin-right: 0in;" dir="ltr"><br>
To fix a problem with mex compilation in Matlab, modify the file<br>
/usr/local/matlab2010a/bin/.matlab7rc.sh<br>
</p>
<div id=":131">and set LDPATH_PREFIX to<br>
LDPATH_PREFIX='/usr/lib64'<br>
in all the architecture configurations.<br>
<br>
edit<br>
/usr/local/MATLAB/R2010a/bin/gccopts.sh<br>
<div id=":12o">and delete all occurrences of "-ansi" to avoid
compilation problems with C++ style comments.<br>
Type<br>
mex -setup<br>
as a standard user in Matlab to copy the above change into
the local user directory.<br>
<div class="im"><br>
</div>
</div>
If compiling with mex inside Matlab (after a restart of Matlab)
still does not work, it might have to be done outside Matlab,
since Matlab uses a wrong LD_LIBRARY_PATH. The same mex command
also works outside Matlab.<br>
</div>
<p style="text-align: left; margin-bottom: 0in; margin-top: 0in;
margin-right: 0in;" dir="ltr">In some versions of Matlab the
following links need to be created:</p>
<p style="text-align: left; margin-bottom: 0in; margin-top: 0in;
margin-right: 0in;" dir="ltr">su<br>
cd /usr/lib64<br>
ln -s libGLU.so.1 libGLU.so<br>
ln -s libX11.so.6 libX11.so<br>
ln -s libXi.so.6 libXi.so<br>
ln -s libXmu.so.6 libXmu.so<br>
ln -s libglut.so.3 libglut.so<br>
ln -s libcuda.so.1 libcuda.so<br>
exit<br>
</p>
<p style="text-align: left; margin-bottom: 0in; margin-top: 0in;
margin-right: 0in;" dir="ltr"><br>
Some Matlab versions need to know about the library. If
Matlab is installed in<span style="font-style: italic;">
/usr/local/matlab</span> type:<br>
su<br>
</p>
<p style="text-align: left; margin-bottom: 0in; margin-top: 0in;
margin-right: 0in;" dir="ltr"><span style="font-style: italic;">cd
/usr/local/matlab/bin/glnxa64/</span><br>
<span style="font-style: italic;">ln -s
/usr/local/CudaMat/libcudaArith.so </span><br
style="font-style: italic;">
</p>
<p style="text-align: left; margin-bottom: 0in; margin-top: 0in;
margin-right: 0in;" dir="ltr">exit<br>
</p>
<p style="text-align: left; margin-bottom: 0in; margin-top: 0in;
margin-right: 0in;" dir="ltr"><br>
</p>
<p style="text-align: left; margin-bottom: 0in; margin-top: 0in;
margin-right: 0in;" dir="ltr">The commands for compilation under
Matlab are<br>
system('nvcc -c cudaArith.cu -I/usr/local/cuda/include/')<br>
</p>
<p style="text-align: left; margin-bottom: 0in; margin-top: 0in;
margin-right: 0in;" dir="ltr">and</p>
<p style="text-align: left; margin-bottom: 0in; margin-top: 0in;
margin-right: 0in;" dir="ltr">mex cuda_cuda.c cudaArith.o
-I/usr/local/cula/include -I/usr/local/cuda/include
-L/usr/local/cula/lib64 -L/usr/local/cuda/lib64 -lcublas -lcufft
-lcudart -lcula<br>
</p>
<p style="text-align: left; margin-bottom: 0in; margin-top: 0in;
margin-right: 0in;" dir="ltr"><br>
</p>
<p style="text-align: left; margin-bottom: 0in; margin-top: 0in;
margin-right: 0in;" dir="ltr">with appropriately modified -I and
-L paths from the cuda and cula installation.<br>
</p>
<p style="text-align: left; margin-bottom: 0in; margin-top: 0in;
margin-right: 0in;" dir="ltr"></p>
<p style="text-align: left; margin-bottom: 0in; margin-top: 0in;
margin-right: 0in;" dir="ltr"><br>
</p>
<p style="text-align: left; margin-bottom: 0in; margin-top: 0in;
margin-right: 0in;" dir="ltr"> For more details on the setup and
testing see Windows 64 bit installation below.<br>
</p>
<h2>Installation instructions (Windows 32 bit system)</h2>
Add the path of the (Visual Studio) cl.exe compiler to PATH
(windows -> home, or right click computer).<br>
<a href="http://www.nvidia.com/object/cuda_get.html">NVidia's cuda
library</a> and SDK need to be installed according to the
manufacturer's instructions.<br>
Compile under Matlab: change to the directory where CudaMat was
downloaded to, e.g.:<br>
<span style="font-style: italic;">cd 'c:\Program Files\dip\CudaMat\'</span><br>
<span style="text-decoration: underline;"><br>
</span>Compile the cuda part of the program using NVidia's nvcc
compiler:<br style="font-style: italic;">
<span style="font-style: italic;">system('nvcc --compile
cudaArith.cu')</span><br style="font-style: italic;">
<span style="font-style: italic;">mex -setup</span><br
style="font-style: italic;">
<span style="font-style: italic;">mex cuda_cuda.c cudaArith.obj
-Ic:\CUDA\include\ -LC:\CUDA\lib -lcublas -lcufft -lcuda
-lcudart</span><br style="font-style: italic;">
Check whether the installation was successful by typing in Matlab:<br
style="font-style: italic;">
<span style="font-style: italic;">applemantest(1)</span>;<br>
<br>
For more details on the setup and testing see Windows 64 bit
installation below.<br>
<h2>Installation instructions (Windows 64 bit system)</h2>
<br>
<b>- Install VC++ Express and <span class="il">Windows</span>
SDK:</b> Visual Studio does not come with the 64-bit libraries and,
as far as we know, not with a 64-bit compiler either. You have to obtain
the <span class="il">Windows</span> SDK for your OS, which
provides the 64-bit libraries, headers, and compiler. Ensure
that the 64-bit packages are selected when installing the <span
class="il">Windows</span> SDK.<br>
VC++ Express: <a
href="http://www.microsoft.com/express/Downloads/#2010-Visual-CPP"
target="_blank">http://www.microsoft.com/<wbr>express/Downloads/#2010-<wbr>Visual-CPP</a><br>
<span class="il">Windows</span> SDK: <a
href="http://msdn.microsoft.com/en-us/windows/bb980924.aspx"
target="_blank">http://msdn.microsoft.com/en-<wbr>us/<span
class="il">windows</span>/bb980924.aspx</a><br>
<b><br>
- Install <span class="il">CUDA</span>: </b>There are three
things to install, all available from <a
href="http://www.nvidia.com/content/cuda/cuda-downloads.html">http://www.nvidia.com/content/cuda/cuda-downloads.html</a><br>
Download and install the development version of the NVIDIA
drivers, the <span class="il">CUDA</span> Toolkit and the <span class="il">CUDA</span>
SDK. The current version is 4.2.<br>
<br>
<b>- Install CudaMat as described above</b>. To be able to use
CudaMat, one needs to compile the custom library cudaArith.obj
(with nvcc) and the mex file cuda_cuda.mexw64 (with mex).
Precompiled versions might work, but this is not guaranteed (the
systems may not match).<br>
<br>
<b>Configuration of mex and MatLab:</b><br>
&gt; mex -setup<br>
This works only if VC++ and the <span class="il">Windows</span> SDK are
installed and the 64-bit compiler (cl.exe) is visible on the
system PATH.<br>
<br>
If you have not installed and set up Cula you should add the
following lines to your startup.m file:<br>
<span style="font-style: italic;">addpath('/usr/local/CudaMat/');<br>
initCuda();</span><br>
<br>
To create arrays of ones or zeros as cuda objects, replace the
Matlab function "ones" with "ones_cuda" and "zeros" with
"zeros_cuda" at the relevant places in your code. See ones_cuda
and zeros_cuda for more detail.<br>
<br>
The DIPImage generator functions "newimage", "xx", "yy", "zz", "rr" and
"phiphi" are overloaded by CudaMat. By default they now generate
cuda output. However, this behaviour (and also that of "ones_cuda" and
"zeros_cuda") can be controlled individually by the global
variables:<br>
use_zeros_cuda=1; use_ones_cuda=1; use_newim_cuda=1;
use_newimar_cuda=1; use_xyz_cuda=1;<br>
<br>
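As an illustration, the settings above could be combined in the
startup.m file roughly as follows. This is a sketch under
assumptions, not a tested configuration: it assumes that the value 0
switches the corresponding functions back to ordinary (non-cuda)
output, and that the variables can be set after initCuda() has run.
Neither point is stated above, so please check the behaviour on your
system.<br>
<pre>
% startup.m fragment (sketch)
addpath('/usr/local/CudaMat/');   % path as in the lines above; adjust to your system
initCuda();

% The control variables are global, so declare them before assigning.
global use_zeros_cuda use_ones_cuda use_newim_cuda use_newimar_cuda use_xyz_cuda

use_zeros_cuda   = 1;   % zeros_cuda generates cuda output
use_ones_cuda    = 1;   % ones_cuda generates cuda output
use_newim_cuda   = 1;   % newim generates cuda output
use_newimar_cuda = 1;   % newimar generates cuda output
use_xyz_cuda     = 0;   % ASSUMPTION: 0 makes xx, yy, zz, rr, phiphi return non-cuda output
</pre>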
<br>
<b> Configuration of nvcc:</b> <br>
When you try to compile the cuda file (e.g. by going to the cuda
directory and executing "applemantest(2)"), you will get the error:<br>
<div id=":ew"> "nvcc fatal : Visual Studio configuration file
'(null)' could not be found ..."<br>
This can be fixed by creating a file named<br>
C:\Program Files (x86)\Microsoft Visual Studio
10.0\VC\bin\vcvars64.bat<br>
containing only the line:<br>
CALL setenv /x64<br>
You can also download this file <a href="vcvars64.bat">here</a>.<br>
<br>
See also<br>
<a
href="http://stackoverflow.com/questions/8900617/how-can-i-setup-nvcc-to-use-visual-c-express-2010-x64-from-windows-sdk-7-1"
target="_blank">http://stackoverflow.com/<wbr>questions/8900617/how-can-i-<wbr>setup-nvcc-to-use-visual-c-<wbr>express-2010-x64-from-windows-<wbr>sdk-7-1</a><br>
<br>
<b>Testing the installation</b><br>
Go to the CudaMat installation directory and type<br>
applemantest(1)<br>
After about 6 seconds an image should be displayed.<br>
If the compilation tools are all installed correctly, you can type<br>
applemantest(2)<br>
which will first recompile but then yield a result in a few
milliseconds. Running it again will be even faster.<br>
</div>
<h2>Bug reports</h2>
If you find any bugs, please send them to me at<span
style="font-style: italic;"> heintzmannd at gmail dot com</span>,
stating the system you were using as well as the version of CudaMat.
Please put 'CudaMat bug' in the subject line.
<h2>History of CudaMat and Acknowledgements<br>
</h2>
CudaMat started with the incentive to write faster deconvolution
software for microscopy image processing. Using the fft code
provided by NVidia, it quickly became clear that something more
general would be useful and the idea of CudaMat was born. CudaMat
was written by Rainer Heintzmann with discussions and
contributions from Martin Kielhorn, Kai Wicker, Wouter Caarls,
Bernd Rieger and Keith Lidke. <br>
<h2>Recent changes:</h2>
<ul>
<li>The first version <a href="CudaMat1_0_00.tgz">V 1.0.0beta</a>
was started around November 2008 and finished March 2009.</li>
<li><a href="CudaMat1_0_01.tgz">V 1.0.1beta</a>, bug fixes,
added <span style="font-style: italic;">newim</span> overload
and <span style="font-style: italic;">complex</span>
function.</li>
<li><a href="CudaMat1_0_02.tgz">V 1.0.2beta</a>, bug fixes,
added <span style="font-style: italic;">repmat</span> and
assignment and referencing with mask images <span
style="font-style: italic;">(subsref</span> and <span
style="font-style: italic;">subsasgn</span>) and <span
style="font-style: italic;">dip_fouriertransform</span>.</li>
<li><a href="CudaMat1_0_03.tgz">V 1.0.3beta</a>, bug fixes.
Partial reduction functions (such as <span style="font-style: italic;">[m,mm]=max(cuda(readim('chromo3d')),[],3)</span>)
are now fully supported. Also, sum, max and min now work
correctly for Matlab-type arrays. The functions <span
style="font-style: italic;">phase</span> and <span
style="font-style: italic;">angle</span> were added. The
functions<span style="font-style: italic;"> zeros()</span>, <span
style="font-style: italic;">ones()</span> and <span
style="font-style: italic;">newim()</span> were renamed to<span
style="font-style: italic;"> zeros_cuda()</span>, <span
style="font-style: italic;">ones_cuda()</span> and <span
style="font-style: italic;">newim_cuda()</span> due to
conflicts with the native code of dipimage and Matlab.</li>
<li><a href="CudaMat1_0_04.tgz">V 1.0.4beta</a>, made the file
cuda_cuda.c compatible with older style ANSI C, as it would
previously not compile under some compilers which require
declarations at the beginning of a block.</li>
<li><a
href="http://www.nanoimaging.de/CudaMat/CudaMat1_0_05.tgz">V
1.0.5beta</a>, a few bug fixes. Introduced the first version
of on-the-fly compilation (commands: 'cuda_define' and
'cuda_compile_all') for new cuda functions and included an
impressive example (speedup 54000), run with the command appleman(2).</li>
<li><a
href="http://www.nanoimaging.de/CudaMat/CudaMat1_0_06.tgz">V
1.0.6beta</a>, bug fixes. Added support for CULA, the cuda
lapack library, which needs to be installed. This provides svd and
equation system solving ("\" and "/", i.e. mldivide and mrdivide).
Binary function on-the-fly compilation is now possible.
Updated installation instructions and web page.</li>
<li><a
href="http://www.nanoimaging.de/CudaMat/CudaMat1_0_07.zip">V
1.0.7beta</a>, bug fixes. Added functions (e.g. circshift).
Improved the performance significantly by using an internal
heap. Half-complex ffts are now available ("rft" and "rift").
They are fast and memory-efficient. Deconvolution toolbox now
works with cudaMat. Now available as a zip file.</li>
<li><a
href="http://www.nanoimaging.de/CudaMat/CudaMat1_0_08.zip">V
1.0.8beta</a>, bug fix. </li>
<li><a
href="http://www.nanoimaging.de/CudaMat/CudaMat1_1_00.zip">V
1.1.0beta</a>, bug fixes (especially memory bug for reduce
operations in older versions). New generator functions xx, yy,
zz, rr and phiphi. These are now overloaded DIPImage
functions. The same holds for newim and newimar, which are
from now on (sorry for no backward compatibility here!)
overloaded. The functions "disableCuda()" and "enableCuda()" were
introduced, which allow the use of cuda to be switched off and on
easily. New functions were introduced (real and complex datatype):
sin, cos, sinh, cosh. Also mpower (only partially implemented)
was added. A reshape bug was fixed and the function permute was
implemented.</li>
<li><a
href="http://www.nanoimaging.de/CudaMat/CudaMat1_1_01.zip">V
1.1.1beta</a>, bug fixes ("plus" with a complex number was buggy
and the sum function had hiccups). Introduced the rfftshift
and rifftshift functions.</li>
<li><a
href="http://www.nanoimaging.de/CudaMat/CudaMat1_1_02.zip">V
1.1.2beta</a>, bug fixes. Introduced the "initCuda()" function,
which should be called in the startup.m file. disableCuda()
and enableCuda() allow CudaMat to be turned off and on easily.</li>
<li><a
href="http://www.nanoimaging.de/CudaMat/CudaMat1_1_03.zip">V
1.1.3beta</a>, bug fixes (especially the subassign
function). The cuda_compile_all() function now uses the local
temp directory to store the user-defined cuda sources and