Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Showing 1 changed file with 2 additions and 4 deletions.
e4bd1af
Lux Benchmarks
| Benchmark suite | Current | Previous | Ratio |
|-|-|-|-|
| Dense(512 => 512, identity)(512 x 128)/forward/CPU/2 thread(s) | 412125 ns | 414875 ns | 0.99 |
| Dense(512 => 512, identity)(512 x 128)/forward/CPU/4 thread(s) | 322375 ns | 321479 ns | 1.00 |
| Dense(512 => 512, identity)(512 x 128)/forward/CPU/8 thread(s) | 321625 ns | 322521 ns | 1.00 |
| Dense(512 => 512, identity)(512 x 128)/forward/CPU/1 thread(s) | 739375 ns | 740000 ns | 1.00 |
| Dense(512 => 512, identity)(512 x 128)/forward/GPU/CUDA | 44132 ns | 40861 ns | 1.08 |
| Dense(512 => 512, identity)(512 x 128)/zygote/CPU/2 thread(s) | 647917 ns | 1343250 ns | 0.48 |
| Dense(512 => 512, identity)(512 x 128)/zygote/CPU/4 thread(s) | 2404667 ns | 2434250 ns | 0.99 |
| Dense(512 => 512, identity)(512 x 128)/zygote/CPU/8 thread(s) | 13901084 ns | 474937.5 ns | 29.27 |
| Dense(512 => 512, identity)(512 x 128)/zygote/CPU/1 thread(s) | 2211917 ns | 2252271 ns | 0.98 |
| Dense(512 => 512, identity)(512 x 128)/zygote/GPU/CUDA | 201549 ns | 182562 ns | 1.10 |
| Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/2 thread(s) | 740042 ns | 1328292 ns | 0.56 |
| Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/4 thread(s) | 2593084 ns | 2620521 ns | 0.99 |
| Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/8 thread(s) | 14418542 ns | 610500 ns | 23.62 |
| Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/1 thread(s) | 2199209 ns | 2229562.5 ns | 0.99 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) | 1526583 ns | 1765917 ns | 0.86 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) | 1096708 ns | 1031334 ns | 1.06 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) | 1529625 ns | 1365416 ns | 1.12 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) | 3028083 ns | 2818125 ns | 1.07 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/GPU/CUDA | 210375.5 ns | 204521 ns | 1.03 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) | 12223834 ns | 12152917 ns | 1.01 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) | 8813167 ns | 8828833 ns | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) | 9206687.5 ns | 9300834 ns | 0.99 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) | 18597853.5 ns | 18599875 ns | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/GPU/CUDA | 1948580 ns | 1492272 ns | 1.31 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) | 17338770.5 ns | 17275187 ns | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) | 13950583.5 ns | 13914875 ns | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) | 14476791.5 ns | 14281833 ns | 1.01 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) | 21850833 ns | 21819042 ns | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) | 124925271 ns | 250296521 ns | 0.50 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) | 148389000 ns | 148101750 ns | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) | 115877562.5 ns | 148130792 ns | 0.78 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) | 447112875 ns | 448565625 ns | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/GPU/CUDA | 5460574 ns | 5496241 ns | 0.99 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) | 600322042 ns | 1226292708 ns | 0.49 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) | 930867334 ns | 930446334 ns | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) | 825580604 ns | 443990041 ns | 1.86 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) | 1687470250.5 ns | 1653613542 ns | 1.02 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA | 31224338 ns | 35420264 ns | 0.88 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) | 706851312.5 ns | 1147479875 ns | 0.62 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) | 988058125.5 ns | 996058750 ns | 0.99 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) | 1348418729 ns | 629339646 ns | 2.14 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) | 1806342854 ns | 1740843604 ns | 1.04 |
| lenet(28, 28, 1, 32)/forward/CPU/2 thread(s) | 863834 ns | 1116250 ns | 0.77 |
| lenet(28, 28, 1, 32)/forward/CPU/4 thread(s) | 1622583.5 ns | 1624229.5 ns | 1.00 |
| lenet(28, 28, 1, 32)/forward/CPU/8 thread(s) | 3450625 ns | 1206375.5 ns | 2.86 |
| lenet(28, 28, 1, 32)/forward/CPU/1 thread(s) | 784875 ns | 782041 ns | 1.00 |
| lenet(28, 28, 1, 32)/forward/GPU/CUDA | 267055.5 ns | 260633 ns | 1.02 |
| lenet(28, 28, 1, 32)/zygote/CPU/2 thread(s) | 2714792 ns | 2984374.5 ns | 0.91 |
| lenet(28, 28, 1, 32)/zygote/CPU/4 thread(s) | 4119812 ns | 4127166 ns | 1.00 |
| lenet(28, 28, 1, 32)/zygote/CPU/8 thread(s) | 10424458 ns | 3295208.5 ns | 3.16 |
| lenet(28, 28, 1, 32)/zygote/CPU/1 thread(s) | 3144166 ns | 3137625 ns | 1.00 |
| lenet(28, 28, 1, 32)/zygote/GPU/CUDA | 1090149.5 ns | 1049614 ns | 1.04 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) | 2166312 ns | 2315396 ns | 0.94 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) | 1479000 ns | 1424437 ns | 1.04 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) | 1744292 ns | 1685208 ns | 1.04 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) | 4339875 ns | 4196250 ns | 1.03 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/GPU/CUDA | 208596 ns | 208669.5 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) | 20428875 ns | 19413145.5 ns | 1.05 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) | 16963479 ns | 16084375 ns | 1.05 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) | 17405708 ns | 17133041.5 ns | 1.02 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) | 26734729 ns | 25866542 ns | 1.03 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA | 2018993 ns | 1576194 ns | 1.28 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) | 45033583 ns | 34217167 ns | 1.32 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) | 40993666.5 ns | 30754459 ns | 1.33 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) | 41173500 ns | 31341542 ns | 1.31 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) | 47738437 ns | 37132709 ns | 1.29 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) | 4301666.5 ns | 4525792 ns | 0.95 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) | 2844667 ns | 2744125 ns | 1.04 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) | 2996709 ns | 2881375 ns | 1.04 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) | 8653334 ns | 8371458 ns | 1.03 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/GPU/CUDA | 472874 ns | 423036 ns | 1.12 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) | 40060542 ns | 38892667 ns | 1.03 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) | 33920959 ns | 32085104.5 ns | 1.06 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) | 33907687.5 ns | 32057770.5 ns | 1.06 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) | 53575541.5 ns | 52159979.5 ns | 1.03 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA | 3254220 ns | 2618584 ns | 1.24 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) | 90139000 ns | 89172458 ns | 1.01 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) | 135574958.5 ns | 113776875 ns | 1.19 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) | 249787833 ns | 62985709 ns | 3.97 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) | 96223792 ns | 74986500 ns | 1.28 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) | 142522459 ns | 268884125 ns | 0.53 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) | 161123167 ns | 159000000 ns | 1.01 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) | 128478042 ns | 158925750 ns | 0.81 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) | 493238750 ns | 486715083 ns | 1.01 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/GPU/CUDA | 7031961.5 ns | 6941165 ns | 1.01 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) | 881412625 ns | 1474467645.5 ns | 0.60 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) | 1203181667 ns | 1134657750 ns | 1.06 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) | 1089986000.5 ns | 687890791.5 ns | 1.58 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) | 2129205729 ns | 2033574500 ns | 1.05 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA | 34708690 ns | 33495275 ns | 1.04 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) | 1668841500 ns | 1720167208 ns | 0.97 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) | 1865068750 ns | 1551435312.5 ns | 1.20 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) | 2075940833.5 ns | 1147814729 ns | 1.81 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) | 2608730625 ns | 2245015792 ns | 1.16 |
| lenet(28, 28, 1, 128)/forward/CPU/2 thread(s) | 1545708 ns | 2039500 ns | 0.76 |
| lenet(28, 28, 1, 128)/forward/CPU/4 thread(s) | 3042541 ns | 3006583 ns | 1.01 |
| lenet(28, 28, 1, 128)/forward/CPU/8 thread(s) | 7339916 ns | 1618791.5 ns | 4.53 |
| lenet(28, 28, 1, 128)/forward/CPU/1 thread(s) | 2318125 ns | 2424854.5 ns | 0.96 |
| lenet(28, 28, 1, 128)/forward/GPU/CUDA | 277569.5 ns | 258194 ns | 1.08 |
| lenet(28, 28, 1, 128)/zygote/CPU/2 thread(s) | 7874959 ns | 9325667 ns | 0.84 |
| lenet(28, 28, 1, 128)/zygote/CPU/4 thread(s) | 12022125 ns | 11994291.5 ns | 1.00 |
| lenet(28, 28, 1, 128)/zygote/CPU/8 thread(s) | 23765959 ns | 7128604 ns | 3.33 |
| lenet(28, 28, 1, 128)/zygote/CPU/1 thread(s) | 11654708 ns | 11753792 ns | 0.99 |
| lenet(28, 28, 1, 128)/zygote/GPU/CUDA | 1196174 ns | 1096609.5 ns | 1.09 |
| vgg16(32, 32, 3, 32)/forward/CPU/2 thread(s) | 186253812 ns | 380363500.5 ns | 0.49 |
| vgg16(32, 32, 3, 32)/forward/CPU/4 thread(s) | 283266353.5 ns | 286893354 ns | 0.99 |
| vgg16(32, 32, 3, 32)/forward/CPU/8 thread(s) | 242835500 ns | 129833291 ns | 1.87 |
| vgg16(32, 32, 3, 32)/forward/CPU/1 thread(s) | 463794333 ns | 456069146 ns | 1.02 |
| vgg16(32, 32, 3, 32)/forward/GPU/CUDA | 4830735 ns | 5018425 ns | 0.96 |
| vgg16(32, 32, 3, 32)/zygote/CPU/2 thread(s) | 630927250 ns | 1154815958 ns | 0.55 |
| vgg16(32, 32, 3, 32)/zygote/CPU/4 thread(s) | 990257541 ns | 934037667 ns | 1.06 |
| vgg16(32, 32, 3, 32)/zygote/CPU/8 thread(s) | 1035740417 ns | 609039458 ns | 1.70 |
| vgg16(32, 32, 3, 32)/zygote/CPU/1 thread(s) | 1415342041 ns | 1585642292 ns | 0.89 |
| vgg16(32, 32, 3, 32)/zygote/GPU/CUDA | 16300060 ns | 19065478 ns | 0.85 |
| lenet(28, 28, 1, 64)/forward/CPU/2 thread(s) | 1085229 ns | 1049833.5 ns | 1.03 |
| lenet(28, 28, 1, 64)/forward/CPU/4 thread(s) | 2098166 ns | 2073542 ns | 1.01 |
| lenet(28, 28, 1, 64)/forward/CPU/8 thread(s) | 4972000 ns | 1348479.5 ns | 3.69 |
| lenet(28, 28, 1, 64)/forward/CPU/1 thread(s) | 1299500 ns | 1287021 ns | 1.01 |
| lenet(28, 28, 1, 64)/forward/GPU/CUDA | 278783 ns | 259724.5 ns | 1.07 |
| lenet(28, 28, 1, 64)/zygote/CPU/2 thread(s) | 6008145.5 ns | 6258792 ns | 0.96 |
| lenet(28, 28, 1, 64)/zygote/CPU/4 thread(s) | 12421208 ns | 12411416 ns | 1.00 |
| lenet(28, 28, 1, 64)/zygote/CPU/8 thread(s) | 20005041 ns | 4953146 ns | 4.04 |
| lenet(28, 28, 1, 64)/zygote/CPU/1 thread(s) | 6082792 ns | 6086709 ns | 1.00 |
| lenet(28, 28, 1, 64)/zygote/GPU/CUDA | 1220466 ns | 1149352.5 ns | 1.06 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) | 23693938 ns | 70546083 ns | 0.34 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) | 43500791.5 ns | 43491792 ns | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) | 39526833.5 ns | 37811479.5 ns | 1.05 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) | 132823145.5 ns | 134717229.5 ns | 0.99 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/GPU/CUDA | 1948314 ns | 1859024 ns | 1.05 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) | 184396041 ns | 355554354.5 ns | 0.52 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) | 270116291 ns | 270317625 ns | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) | 253589145.5 ns | 146113896 ns | 1.74 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) | 534281562.5 ns | 537066979.5 ns | 0.99 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/GPU/CUDA | 13222993 ns | 12142155.5 ns | 1.09 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) | 297123437 ns | 396257791 ns | 0.75 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) | 404377895.5 ns | 404428375.5 ns | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) | 696065958 ns | 302176729 ns | 2.30 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) | 713613916 ns | 712116709 ns | 1.00 |
| vgg16(32, 32, 3, 128)/forward/CPU/2 thread(s) | 656595541 ns | 1190477625 ns | 0.55 |
| vgg16(32, 32, 3, 128)/forward/CPU/4 thread(s) | 689413604.5 ns | 689814958.5 ns | 1.00 |
| vgg16(32, 32, 3, 128)/forward/CPU/8 thread(s) | 634330625 ns | 404795334 ns | 1.57 |
| vgg16(32, 32, 3, 128)/forward/CPU/1 thread(s) | 1789031312.5 ns | 1876404250 ns | 0.95 |
| vgg16(32, 32, 3, 128)/forward/GPU/CUDA | 12386066 ns | 12324333 ns | 1.01 |
| vgg16(32, 32, 3, 128)/zygote/CPU/2 thread(s) | 1908648333.5 ns | 3610008479.5 ns | 0.53 |
| vgg16(32, 32, 3, 128)/zygote/CPU/4 thread(s) | 2827932125 ns | 2831662833 ns | 1.00 |
| vgg16(32, 32, 3, 128)/zygote/CPU/8 thread(s) | 2698654250 ns | 1516977229.5 ns | 1.78 |
| vgg16(32, 32, 3, 128)/zygote/CPU/1 thread(s) | 5716413416 ns | 5143819000 ns | 1.11 |
| vgg16(32, 32, 3, 128)/zygote/GPU/CUDA | 49345511 ns | 50066391.5 ns | 0.99 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) | 3047688 ns | 3345708.5 ns | 0.91 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) | 2062437 ns | 2078625 ns | 0.99 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) | 2519583 ns | 2287083 ns | 1.10 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) | 6053042 ns | 6026917 ns | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/GPU/CUDA | 574063 ns | 330146 ns | 1.74 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) | 25654333 ns | 25733291.5 ns | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) | 19054583.5 ns | 18989125 ns | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) | 19323500 ns | 19553792 ns | 0.99 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) | 39330000 ns | 39739583.5 ns | 0.99 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA | 3195551.5 ns | 2459398 ns | 1.30 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) | 35130041.5 ns | 54593479 ns | 0.64 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) | 82097417 ns | 78905375 ns | 1.04 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) | 168348625 ns | 29660083.5 ns | 5.68 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) | 45591875 ns | 45812146 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) | 1644375 ns | 1660583.5 ns | 0.99 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) | 1090250 ns | 1105770.5 ns | 0.99 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) | 1572750 ns | 1392229 ns | 1.13 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) | 3038167 ns | 3035959 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/GPU/CUDA | 214850 ns | 210818 ns | 1.02 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) | 12701083 ns | 12525958.5 ns | 1.01 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) | 9189625 ns | 9221375 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) | 9640458.5 ns | 9699583 ns | 0.99 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) | 18968854.5 ns | 19002416.5 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA | 1987617.5 ns | 1509113 ns | 1.32 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) | 17682875 ns | 17662604.5 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) | 14327834 ns | 14311479 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) | 14625958 ns | 14590875 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) | 22177500 ns | 22225541 ns | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) | 23739833.5 ns | 70524583 ns | 0.34 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) | 43469541 ns | 43452458 ns | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) | 39647750 ns | 37882479.5 ns | 1.05 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) | 132812271.5 ns | 132685187 ns | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/GPU/CUDA | 1879600 ns | 1859436 ns | 1.01 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) | 189384875 ns | 359287667 ns | 0.53 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) | 346944938 ns | 347693812.5 ns | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) | 303748958 ns | 197401167 ns | 1.54 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) | 748909417 ns | 730607333 ns | 1.03 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA | 14283912.5 ns | 13254127 ns | 1.08 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) | 302085833 ns | 420436292 ns | 0.72 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) | 421708625 ns | 419235583 ns | 1.01 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) | 689499625 ns | 310533750 ns | 2.22 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) | 719890000 ns | 718184500 ns | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/forward/CPU/2 thread(s) | 1926375 ns | 1442542 ns | 1.34 |
| mlp7layer_bn(gelu)(32 x 256)/forward/CPU/4 thread(s) | 1579042 ns | 1346416.5 ns | 1.17 |
| mlp7layer_bn(gelu)(32 x 256)/forward/CPU/8 thread(s) | 1571792 ns | 1331812.5 ns | 1.18 |
| mlp7layer_bn(gelu)(32 x 256)/forward/CPU/1 thread(s) | 2497917 ns | 2403021 ns | 1.04 |
| mlp7layer_bn(gelu)(32 x 256)/forward/GPU/CUDA | 573991 ns | 549048 ns | 1.05 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/2 thread(s) | 6186000 ns | 8851250 ns | 0.70 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/4 thread(s) | 13018375 ns | 12939667 ns | 1.01 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/8 thread(s) | 31151958 ns | 5552708 ns | 5.61 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/1 thread(s) | 9378042 ns | 9880416.5 ns | 0.95 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/GPU/CUDA | 1403069 ns | 1258951 ns | 1.11 |
| mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/2 thread(s) | 18793000 ns | 16575062 ns | 1.13 |
| mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/4 thread(s) | 27709979.5 ns | 20954208 ns | 1.32 |
| mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/8 thread(s) | 49574542 ns | 13338833 ns | 3.72 |
| mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/1 thread(s) | 18852542 ns | 13092416 ns | 1.44 |
| Dense(512 => 512, relu)(512 x 128)/forward/CPU/2 thread(s) | 68959 ns | 822708 ns | 0.08381953256805574 |
| Dense(512 => 512, relu)(512 x 128)/forward/CPU/4 thread(s) | 541125 ns | 528084 ns | 1.02 |
| Dense(512 => 512, relu)(512 x 128)/forward/CPU/8 thread(s) | 1011562 ns | 71146 ns | 14.22 |
| Dense(512 => 512, relu)(512 x 128)/forward/CPU/1 thread(s) | 728542 ns | 725750 ns | 1.00 |
| Dense(512 => 512, relu)(512 x 128)/forward/GPU/CUDA | 47294 ns | 46414.5 ns | 1.02 |
| Dense(512 => 512, relu)(512 x 128)/zygote/CPU/2 thread(s) | 277500 ns | 1506500 ns | 0.18 |
| Dense(512 => 512, relu)(512 x 128)/zygote/CPU/4 thread(s) | 988020.5 ns | 1020854 ns | 0.97 |
| Dense(512 => 512, relu)(512 x 128)/zygote/CPU/8 thread(s) | 1388416.5 ns | 323833 ns | 4.29 |
| Dense(512 => 512, relu)(512 x 128)/zygote/CPU/1 thread(s) | 2250812 ns | 2281417 ns | 0.99 |
| Dense(512 => 512, relu)(512 x 128)/zygote/GPU/CUDA | 225164 ns | 211160.5 ns | 1.07 |
| Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/2 thread(s) | 407500 ns | 1512416 ns | 0.27 |
| Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/4 thread(s) | 1045583 ns | 1090125 ns | 0.96 |
| Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/8 thread(s) | 1418917 ns | 446562.5 ns | 3.18 |
| Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/1 thread(s) | 2256958 ns | 2259375 ns | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) | 3042083 ns | 3176750 ns | 0.96 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) | 2062771 ns | 2053979 ns | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) | 2510104.5 ns | 2268708 ns | 1.11 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) | 6011000 ns | 6008875 ns | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/GPU/CUDA | 564983 ns | 282441.5 ns | 2.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) | 23609021 ns | 24059292 ns | 0.98 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) | 17178792 ns | 17235458 ns | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) | 17120458 ns | 16956292 ns | 1.01 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) | 37462729 ns | 37778228.5 ns | 0.99 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/GPU/CUDA | 3146695 ns | 2390107 ns | 1.32 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) | 33304750 ns | 52955708.5 ns | 0.63 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) | 83679583.5 ns | 84900333 ns | 0.99 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) | 167872042 ns | 27496312.5 ns | 6.11 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) | 44785187.5 ns | 44513375.5 ns | 1.01 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) | 120247125 ns | 250307750 ns | 0.48 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) | 148479500 ns | 148084625 ns | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) | 115610813 ns | 148444250 ns | 0.78 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) | 447816417 ns | 455285000 ns | 0.98 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/GPU/CUDA | 5450922 ns | 5327018 ns | 1.02 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) | 470730291 ns | 1102117541 ns | 0.43 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) | 856712645.5 ns | 856978792 ns | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) | 825513875.5 ns | 437778208 ns | 1.89 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) | 1750589417 ns | 1768146583 ns | 0.99 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/GPU/CUDA | 28864938 ns | 33525724 ns | 0.86 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) | 640143291 ns | 1027855937 ns | 0.62 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) | 964190458 ns | 965570792 ns | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) | 1286413958 ns | 584455270.5 ns | 2.20 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) | 1842051438 ns | 1726926104.5 ns | 1.07 |
| mlp7layer_bn(relu)(32 x 256)/forward/CPU/2 thread(s) | 1241583 ns | 1135584 ns | 1.09 |
| mlp7layer_bn(relu)(32 x 256)/forward/CPU/4 thread(s) | 917166 ns | 989209 ns | 0.93 |
| mlp7layer_bn(relu)(32 x 256)/forward/CPU/8 thread(s) | 906584 ns | 923667 ns | 0.98 |
| mlp7layer_bn(relu)(32 x 256)/forward/CPU/1 thread(s) | 1938583 ns | 2052500 ns | 0.94 |
| mlp7layer_bn(relu)(32 x 256)/forward/GPU/CUDA | 553409.5 ns | 548882.5 ns | 1.01 |
| mlp7layer_bn(relu)(32 x 256)/zygote/CPU/2 thread(s) | 2941333 ns | 5867833 ns | 0.50 |
| mlp7layer_bn(relu)(32 x 256)/zygote/CPU/4 thread(s) | 6314437.5 ns | 6531896 ns | 0.97 |
| mlp7layer_bn(relu)(32 x 256)/zygote/CPU/8 thread(s) | 24719833.5 ns | 2613541.5 ns | 9.46 |
| mlp7layer_bn(relu)(32 x 256)/zygote/CPU/1 thread(s) | 7090125 ns | 7097417 ns | 1.00 |
| mlp7layer_bn(relu)(32 x 256)/zygote/GPU/CUDA | 1346593.5 ns | 1222578 ns | 1.10 |
| mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/2 thread(s) | 6639250 ns | 9683896 ns | 0.69 |
| mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/4 thread(s) | 13128667 ns | 13118666 ns | 1.00 |
| mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/8 thread(s) | 30481375 ns | 6497583 ns | 4.69 |
| mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/1 thread(s) | 7632854 ns | 7614083.5 ns | 1.00 |
| Dense(128 => 128, gelu)(128 x 128)/forward/CPU/2 thread(s) | 39042 ns | 512667 ns | 0.07615469690851956 |
| Dense(128 => 128, gelu)(128 x 128)/forward/CPU/4 thread(s) | 372792 ns | 391292 ns | 0.95 |
| Dense(128 => 128, gelu)(128 x 128)/forward/CPU/8 thread(s) | 1833875 ns | 32750 ns | 56.00 |
| Dense(128 => 128, gelu)(128 x 128)/forward/CPU/1 thread(s) | 91792 ns | 87812.5 ns | 1.05 |
| Dense(128 => 128, gelu)(128 x 128)/forward/GPU/CUDA | 27047.5 ns | 25759 ns | 1.05 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/2 thread(s) | 175458 ns | 382125 ns | 0.46 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/4 thread(s) | 455792 ns | 444875 ns | 1.02 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/8 thread(s) | 4338875 ns | 160875 ns | 26.97 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/1 thread(s) | 272792 ns | 258750 ns | 1.05 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/GPU/CUDA | 210187.5 ns | 188723 ns | 1.11 |
| Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/2 thread(s) | 441709 ns | 420291.5 ns | 1.05 |
| Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/4 thread(s) | 728375 ns | 475750 ns | 1.53 |
| Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/8 thread(s) | 4896125 ns | 194375 ns | 25.19 |
| Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/1 thread(s) | 511041.5 ns | 270958 ns | 1.89 |
| Dense(128 => 128, relu)(128 x 128)/forward/CPU/2 thread(s) | 12416.5 ns | 461312.5 ns | 0.026915594092941336 |
| Dense(128 => 128, relu)(128 x 128)/forward/CPU/4 thread(s) | 303334 ns | 326666.5 ns | 0.93 |
| Dense(128 => 128, relu)(128 x 128)/forward/CPU/8 thread(s) | 721771 ns | 14792 ns | 48.79 |
| Dense(128 => 128, relu)(128 x 128)/forward/CPU/1 thread(s) | 55209 ns | 54145.5 ns | 1.02 |
| Dense(128 => 128, relu)(128 x 128)/forward/GPU/CUDA | 27615.5 ns | 26082 ns | 1.06 |
| Dense(128 => 128, relu)(128 x 128)/zygote/CPU/2 thread(s) | 25917 ns | 340312 ns | 0.07615658572133807 |
| Dense(128 => 128, relu)(128 x 128)/zygote/CPU/4 thread(s) | 336500 ns | 342500 ns | 0.98 |
| Dense(128 => 128, relu)(128 x 128)/zygote/CPU/8 thread(s) | 850083 ns | 25958.5 ns | 32.75 |
| Dense(128 => 128, relu)(128 x 128)/zygote/CPU/1 thread(s) | 151500 ns | 151625 ns | 1.00 |
| Dense(128 => 128, relu)(128 x 128)/zygote/GPU/CUDA | 198567.5 ns | 181930 ns | 1.09 |
| Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/2 thread(s) | 45208.5 ns | 357792 ns | 0.13 |
| Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/4 thread(s) | 351625 ns | 357833 ns | 0.98 |
| Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/8 thread(s) | 712459 ns | 46437.5 ns | 15.34 |
| Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/1 thread(s) | 151084 ns | 151209 ns | 1.00 |
| vgg16(32, 32, 3, 64)/forward/CPU/2 thread(s) | 318202459 ns | 602226667 ns | 0.53 |
| vgg16(32, 32, 3, 64)/forward/CPU/4 thread(s) | 430387020.5 ns | 427648645.5 ns | 1.01 |
| vgg16(32, 32, 3, 64)/forward/CPU/8 thread(s) | 368378458.5 ns | 207084708 ns | 1.78 |
| vgg16(32, 32, 3, 64)/forward/CPU/1 thread(s) | 883484291 ns | 882976625 ns | 1.00 |
| vgg16(32, 32, 3, 64)/forward/GPU/CUDA | 7628205 ns | 6984740 ns | 1.09 |
| vgg16(32, 32, 3, 64)/zygote/CPU/2 thread(s) | 1097576562.5 ns | 1997486771 ns | 0.55 |
| vgg16(32, 32, 3, 64)/zygote/CPU/4 thread(s) | 1620619666.5 ns | 1621644791.5 ns | 1.00 |
| vgg16(32, 32, 3, 64)/zygote/CPU/8 thread(s) | 1583682354 ns | 856167166 ns | 1.85 |
| vgg16(32, 32, 3, 64)/zygote/CPU/1 thread(s) | 2698758083 ns | 2637178042 ns | 1.02 |
| vgg16(32, 32, 3, 64)/zygote/GPU/CUDA | 26674131 ns | 26468421.5 ns | 1.01 |
| Dense(512 => 512, gelu)(512 x 128)/forward/CPU/2 thread(s) | 189813 ns | 520062.5 ns | 0.36 |
| Dense(512 => 512, gelu)(512 x 128)/forward/CPU/4 thread(s) | 443792 ns | 429271 ns | 1.03 |
| Dense(512 => 512, gelu)(512 x 128)/forward/CPU/8 thread(s) | 1747875 ns | 166000 ns | 10.53 |
| Dense(512 => 512, gelu)(512 x 128)/forward/CPU/1 thread(s) | 873374.5 ns | 866083 ns | 1.01 |
| Dense(512 => 512, gelu)(512 x 128)/forward/GPU/CUDA | 46821 ns | 46206 ns | 1.01 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/2 thread(s) | 1205958.5 ns | 1874625 ns | 0.64 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/4 thread(s) | 2354667 ns | 2508792 ns | 0.94 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/8 thread(s) | 14475333.5 ns | 1021958 ns | 14.16 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/1 thread(s) | 2826417 ns | 2650063 ns | 1.07 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/GPU/CUDA | 237435.5 ns | 217141.5 ns | 1.09 |
| Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/2 thread(s) | 2299604.5 ns | 1862417 ns | 1.23 |
| Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/4 thread(s) | 5735750 ns | 5033959 ns | 1.14 |
| Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/8 thread(s) | 14836917 ns | 1161917 ns | 12.77 |
| Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/1 thread(s) | 3683375 ns | 2752500 ns | 1.34 |
| mlp7layer_bn(tanh)(32 x 256)/forward/CPU/2 thread(s) | 1579292 ns | 1462229 ns | 1.08 |
| mlp7layer_bn(tanh)(32 x 256)/forward/CPU/4 thread(s) | 1180250 ns | 1192834 ns | 0.99 |
| mlp7layer_bn(tanh)(32 x 256)/forward/CPU/8 thread(s) | 1174479 ns | 1192667 ns | 0.98 |
| mlp7layer_bn(tanh)(32 x 256)/forward/CPU/1 thread(s) | 2370125 ns | 2221791 ns | 1.07 |
| mlp7layer_bn(tanh)(32 x 256)/forward/GPU/CUDA | 570253.5 ns | 550464 ns | 1.04 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/2 thread(s) | 3184000 ns | 5883792 ns | 0.54 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/4 thread(s) | 4719584 ns | 4676563 ns | 1.01 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/8 thread(s) | 24816709 ns | 2871000 ns | 8.64 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/1 thread(s) | 7307438 ns | 7325000.5 ns | 1.00 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/GPU/CUDA | 1344428.5 ns | 1196239.5 ns | 1.12 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/2 thread(s) | 8830562.5 ns | 11670958.5 ns | 0.76 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/4 thread(s) | 15640333.5 ns | 16372334 ns | 0.96 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/8 thread(s) | 34223791 ns | 8780584 ns | 3.90 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/1 thread(s) | 9547375 ns | 9544250 ns | 1.00 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/2 thread(s) | 2209 ns | 2458 ns | 0.90 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/4 thread(s) | 2167 ns | 2542 ns | 0.85 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/8 thread(s) | 3541 ns | 2875 ns | 1.23 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/1 thread(s) | 2625 ns | 4625 ns | 0.57 |
| Dense(16 => 16, relu)(16 x 128)/forward/GPU/CUDA | 24463 ns | 22670 ns | 1.08 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/2 thread(s) | 7000 ns | 6916 ns | 1.01 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/4 thread(s) | 6833 ns | 7083 ns | 0.96 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/8 thread(s) | 7292 ns | 7250 ns | 1.01 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/1 thread(s) | 7167 ns | 7333 ns | 0.98 |
| Dense(16 => 16, relu)(16 x 128)/zygote/GPU/CUDA | 202989.5 ns | 180475.5 ns | 1.12 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/2 thread(s) | 8334 ns | 8250 ns | 1.01 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/4 thread(s) | 8250 ns | 8292 ns | 0.99 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/8 thread(s) | 8375 ns | 8542 ns | 0.98 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/1 thread(s) | 6041 ns | 6125 ns | 0.99 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/2 thread(s) | 10583 ns | 10916.5 ns | 0.97 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/4 thread(s) | 15875 ns | 12625 ns | 1.26 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/8 thread(s) | 10333 ns | 10459 ns | 0.99 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/1 thread(s) | 7625.5 ns | 9729 ns | 0.78 |
| Dense(16 => 16, gelu)(16 x 128)/forward/GPU/CUDA | 24500 ns | 22420 ns | 1.09 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/2 thread(s) | 21542 ns | 19916 ns | 1.08 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/4 thread(s) | 21625 ns | 19875 ns | 1.09 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/8 thread(s) | 21750 ns | 19958 ns | 1.09 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/1 thread(s) | 21667 ns | 20000 ns | 1.08 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/GPU/CUDA | 221414.5 ns | 195313 ns | 1.13 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/2 thread(s) | 56833 ns | 23542 ns | 2.41 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/4 thread(s) | 53708 ns | 23541 ns | 2.28 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/8 thread(s) | 53625 ns | 27125 ns | 1.98 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/1 thread(s) | 51583.5 ns | 21334 ns | 2.42 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/2 thread(s) | 28834 ns | 28834 ns | 1 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/4 thread(s) | 28584 ns | 28708 ns | 1.00 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/8 thread(s) | 28458 ns | 29042 ns | 0.98 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/1 thread(s) | 46708 ns | 46291 ns | 1.01 |
| Dense(128 => 128, identity)(128 x 128)/forward/GPU/CUDA | 25617 ns | 23925 ns | 1.07 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/2 thread(s) | 44375 ns | 224750 ns | 0.20 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/4 thread(s) | 274708 ns | 276542 ns | 0.99 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/8 thread(s) | 4275000 ns | 44250 ns | 96.61 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/1 thread(s) | 145000 ns | 145000 ns | 1 |
| Dense(128 => 128, identity)(128 x 128)/zygote/GPU/CUDA | 206652.5 ns | 197967 ns | 1.04 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/2 thread(s) | 68542 ns | 242125 ns | 0.28 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/4 thread(s) | 292958 ns | 293916 ns | 1.00 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/8 thread(s) | 4229958 ns | 68604.5 ns | 61.66 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/1 thread(s) | 145666 ns | 145584 ns | 1.00 |
| Dense(16 => 16, identity)(16 x 128)/forward/CPU/2 thread(s) | 1833 ns | 1583 ns | 1.16 |
| Dense(16 => 16, identity)(16 x 128)/forward/CPU/4 thread(s) | 1750 ns | 2166 ns | 0.81 |
| Dense(16 => 16, identity)(16 x 128)/forward/CPU/8 thread(s) | 2500 ns | 2166.5 ns | 1.15 |
| Dense(16 => 16, identity)(16 x 128)/forward/CPU/1 thread(s) | 1666 ns | 4333.5 ns | 0.38 |
| Dense(16 => 16, identity)(16 x 128)/forward/GPU/CUDA | 22972 ns | 20975.5 ns | 1.10 |
| Dense(16 => 16, identity)(16 x 128)/zygote/CPU/2 thread(s) | 5208 ns | 5084 ns | 1.02 |
| Dense(16 => 16, identity)(16 x 128)/zygote/CPU/4 thread(s) | 5167 ns | 5125 ns | 1.01 |
| Dense(16 => 16, identity)(16 x 128)/zygote/CPU/8 thread(s) | 5208 ns | 5209 ns | 1.00 |
| Dense(16 => 16, identity)(16 x 128)/zygote/CPU/1 thread(s) | 5250 ns | 5500 ns | 0.95 |
| Dense(16 => 16, identity)(16 x 128)/zygote/GPU/CUDA | 244140 ns | 234449.5 ns | 1.04 |
| Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/2 thread(s) | 8208 ns | 7375 ns | 1.11 |
| Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/4 thread(s) | 7375 ns | 7458 ns | 0.99 |
| Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/8 thread(s) | 7542 ns | 8125 ns | 0.93 |
| Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/1 thread(s) | 5292 ns | 5459 ns | 0.97 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) | 34124291 ns | 80045708 ns | 0.43 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) | 49799333 ns | 49037958.5 ns | 1.02 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) | 45669229.5 ns | 42791749.5 ns | 1.07 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) | 153888625 ns | 151490583 ns | 1.02 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/GPU/CUDA | 2656121 ns | 2680013 ns | 0.99 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) | 481321500.5 ns | 606632959 ns | 0.79 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) | 424493583 ns | 411440583 ns | 1.03 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) | 412050834 ns | 292411917 ns | 1.41 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) | 724714916 ns | 737907354 ns | 0.98 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA | 15594271 ns | 16971190.5 ns | 0.92 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) | 744920541 ns | 714524875 ns | 1.04 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) | 840757958.5 ns | 672104708 ns | 1.25 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) | 1131213854 ns | 580514646 ns | 1.95 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) | 1186689479.5 ns | 1012152875 ns | 1.17 |
This comment was automatically generated by workflow using github-action-benchmark.
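
The ratio reported for each benchmark matches the current timing divided by the previous timing (for example, 13901084 / 474937.5 ≈ 29.27), so values above 1 mean the current commit is slower. The Python sketch below recomputes that ratio for three rows copied from the results and lists the rows above a cutoff. The function name `flag_regressions` and the cutoff of 2.0 are assumptions chosen for illustration; they are not taken from the workflow's configuration.

```python
# Minimal sketch: recompute current/previous ratios and flag large slowdowns.
# `flag_regressions` and `threshold=2.0` are hypothetical, not the workflow's settings.

def flag_regressions(rows, threshold=2.0):
    """Return (name, ratio) for rows whose current/previous ratio exceeds threshold."""
    flagged = []
    for name, current_ns, previous_ns in rows:
        ratio = current_ns / previous_ns  # > 1 means the current run is slower
        if ratio > threshold:
            flagged.append((name, round(ratio, 2)))
    return flagged


# Three rows copied from the results: (benchmark, current ns, previous ns).
rows = [
    ("Dense(512 => 512, identity)(512 x 128)/forward/CPU/2 thread(s)", 412125, 414875),
    ("Dense(512 => 512, identity)(512 x 128)/zygote/CPU/2 thread(s)", 647917, 1343250),
    ("Dense(512 => 512, identity)(512 x 128)/zygote/CPU/8 thread(s)", 13901084, 474937.5),
]

for name, ratio in flag_regressions(rows):
    print(f"{ratio:>7.2f}x  {name}")
```

With these three rows only the 8-thread Zygote entry exceeds the cutoff; the 2-thread Zygote entry has a ratio below 1, meaning it got faster.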