6683721
@JuliaRegistrator register
Registration pull request created: JuliaRegistries/General/113958
Tip: Release Notes
Did you know you can add release notes too? Just add markdown formatted text underneath the comment after the text
"Release notes:" and it will be added to the registry PR, and if TagBot is installed it will also be added to the
release that TagBot creates. i.e.
To add them here just re-invoke and the PR will be updated.
Tagging
After the above pull request is merged, it is recommended that a tag is created on this repository for the registered package version.
This will be done automatically if the Julia TagBot GitHub Action is installed, or can be done manually through the GitHub interface, or via:
Lux Benchmarks
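Each entry under this heading pairs a benchmark name with two timings in nanoseconds and a ratio. The ratio appears to be the first timing divided by the second, rounded to two decimals, so a value above 1 means the first run was slower. The Python sketch below checks that reading against three entries copied from this report; the helper names and the 1.2 threshold are illustrative assumptions, not part of the benchmark tooling.

```python
# Sketch only: helper names and the 1.2 threshold are hypothetical,
# not taken from the tooling that produced this report.


def ratio(first_ns: float, second_ns: float) -> str:
    """Format first/second to two decimals, matching the report's ratio style."""
    return f"{first_ns / second_ns:.2f}"


def slower_than(rows, threshold=1.2):
    """Return names of rows whose first timing exceeds the second by more than `threshold`x."""
    return [name for name, first_ns, second_ns in rows
            if first_ns / second_ns > threshold]


# Three entries copied from the report: (name, first timing, second timing).
rows = [
    ("Dense(512 => 512, identity)(512 x 128)/forward/CPU/4 thread(s)", 324166, 243959),
    ("Dense(512 => 512, identity)(512 x 128)/zygote/CPU/4 thread(s)", 2457333, 1249333),
    ("mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/4 thread(s)", 4639479, 7999604),
]

for name, first_ns, second_ns in rows:
    print(name, ratio(first_ns, second_ns))

print(slower_than(rows))
```

The three printed ratios (1.33, 1.97, 0.58) agree with the values listed in the report, and only the two Dense entries exceed the assumed 1.2 threshold.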
Dense(512 => 512, identity)(512 x 128)/forward/CPU/2 thread(s)
412000
ns411750
ns1.00
Dense(512 => 512, identity)(512 x 128)/forward/CPU/4 thread(s)
324166
ns243959
ns1.33
Dense(512 => 512, identity)(512 x 128)/forward/CPU/8 thread(s)
322604.5
ns323604
ns1.00
Dense(512 => 512, identity)(512 x 128)/forward/CPU/1 thread(s)
740459
ns741209
ns1.00
Dense(512 => 512, identity)(512 x 128)/forward/GPU/CUDA
44259
ns44008
ns1.01
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/2 thread(s)
1321500
ns1392458
ns0.95
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/4 thread(s)
2457333
ns1249333
ns1.97
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/8 thread(s)
14240875
ns14034875
ns1.01
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/1 thread(s)
2200812.5
ns2247000
ns0.98
Dense(512 => 512, identity)(512 x 128)/zygote/GPU/CUDA
208145
ns206485
ns1.01
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/2 thread(s)
1417146
ns1411375
ns1.00
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/4 thread(s)
938625
ns949209
ns0.99
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/8 thread(s)
1523833
ns1539667
ns0.99
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/1 thread(s)
2268625
ns2262146
ns1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s)
1759166.5
ns1751333.5
ns1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s)
1089459
ns1096875
ns0.99
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s)
1525291.5
ns1541583
ns0.99
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s)
2994187.5
ns3026749.5
ns0.99
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/GPU/CUDA
209673
ns209127
ns1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s)
12094750
ns12111771
ns1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s)
8806875
ns8833083
ns1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s)
9239291
ns9198584
ns1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s)
18581458
ns18601167
ns1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/GPU/CUDA
1488235.5
ns1480357.5
ns1.01
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s)
17307520.5
ns17231270.5
ns1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s)
13967750
ns13987541.5
ns1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s)
14528333
ns14519729
ns1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s)
21842125
ns21836292
ns1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s)
250015541.5
ns250395646
ns1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s)
148333104
ns148855375
ns1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s)
116106000
ns115834062
ns1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s)
447467333
ns446839208
ns1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/GPU/CUDA
5463022
ns5444163
ns1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s)
1184048458
ns1176608458
ns1.01
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s)
987050292
ns976012000
ns1.01
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s)
853459771
ns837397979.5
ns1.02
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s)
1759966833
ns1759902458
ns1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA
31128789
ns31490812.5
ns0.99
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s)
1134468583
ns1129305209
ns1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s)
1007793771
ns991324229.5
ns1.02
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s)
1296264208
ns1295080375.5
ns1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s)
1736536375
ns1730828646
ns1.00
lenet(28, 28, 1, 32)/forward/CPU/2 thread(s)
1099667
ns1075249.5
ns1.02
lenet(28, 28, 1, 32)/forward/CPU/4 thread(s)
1640166
ns1662353.5
ns0.99
lenet(28, 28, 1, 32)/forward/CPU/8 thread(s)
3652875
ns3521959
ns1.04
lenet(28, 28, 1, 32)/forward/CPU/1 thread(s)
787583
ns782750
ns1.01
lenet(28, 28, 1, 32)/forward/GPU/CUDA
266751
ns268581
ns0.99
lenet(28, 28, 1, 32)/zygote/CPU/2 thread(s)
3019833
ns3020312.5
ns1.00
lenet(28, 28, 1, 32)/zygote/CPU/4 thread(s)
4176833
ns4174708
ns1.00
lenet(28, 28, 1, 32)/zygote/CPU/8 thread(s)
10973896
ns11483792
ns0.96
lenet(28, 28, 1, 32)/zygote/CPU/1 thread(s)
3301792
ns3174584
ns1.04
lenet(28, 28, 1, 32)/zygote/GPU/CUDA
1182347
ns1187325
ns1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s)
2286833.5
ns2334458.5
ns0.98
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s)
1435167
ns1326625
ns1.08
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s)
1676229
ns1671667
ns1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s)
4190875
ns4228083
ns0.99
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/GPU/CUDA
209336.5
ns208877
ns1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s)
19397292
ns19371042
ns1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s)
16083708.5
ns16106687.5
ns1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s)
17390250
ns17334333
ns1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s)
25860291
ns25864812.5
ns1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA
1591006
ns1587675
ns1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s)
34232625
ns33974334
ns1.01
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s)
30842208
ns30652312
ns1.01
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s)
31266750
ns30965958
ns1.01
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s)
36658416
ns36591917
ns1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s)
4529667
ns4502750
ns1.01
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s)
2779874.5
ns2520667
ns1.10
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s)
2911875
ns2914750
ns1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s)
8371875
ns8397770.5
ns1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/GPU/CUDA
421528
ns422071
ns1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s)
38980750.5
ns38880875
ns1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s)
32266104
ns32118083
ns1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s)
32323083
ns32210354
ns1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s)
51956541
ns51886833.5
ns1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA
2622867
ns2617174
ns1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s)
89073083
ns88740458.5
ns1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s)
114658792
ns114655499.5
ns1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s)
228331667
ns222624292
ns1.03
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s)
74438750
ns74153520.5
ns1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s)
268170542
ns267012709
ns1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s)
159370333
ns156293291
ns1.02
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s)
126596000
ns126425563
ns1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s)
489032083
ns484968208
ns1.01
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/GPU/CUDA
7010105
ns7022844
ns1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s)
1476972354.5
ns1472853458
ns1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s)
1173299958
ns1171430875
ns1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s)
1063552375.5
ns1066813500
ns1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s)
2013745249.5
ns2007065229.5
ns1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA
34689426.5
ns34464520
ns1.01
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s)
1691606583
ns1687201334
ns1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s)
1531432625
ns1531380729
ns1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s)
1826034958
ns1779981833
ns1.03
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s)
2222547417
ns2205561250
ns1.01
lenet(28, 28, 1, 128)/forward/CPU/2 thread(s)
2015500
ns2055417
ns0.98
lenet(28, 28, 1, 128)/forward/CPU/4 thread(s)
3021583
ns3039333
ns0.99
lenet(28, 28, 1, 128)/forward/CPU/8 thread(s)
8014708.5
ns6418334
ns1.25
lenet(28, 28, 1, 128)/forward/CPU/1 thread(s)
2428062.5
ns2491084
ns0.97
lenet(28, 28, 1, 128)/forward/GPU/CUDA
270747.5
ns270182.5
ns1.00
lenet(28, 28, 1, 128)/zygote/CPU/2 thread(s)
9364083
ns9710917
ns0.96
lenet(28, 28, 1, 128)/zygote/CPU/4 thread(s)
12044396.5
ns12102375
ns1.00
lenet(28, 28, 1, 128)/zygote/CPU/8 thread(s)
24967708
ns24324021
ns1.03
lenet(28, 28, 1, 128)/zygote/CPU/1 thread(s)
11780916.5
ns11813792
ns1.00
lenet(28, 28, 1, 128)/zygote/GPU/CUDA
1257862
ns1260525.5
ns1.00
vgg16(32, 32, 3, 32)/forward/CPU/2 thread(s)
383008958
ns379862541
ns1.01
vgg16(32, 32, 3, 32)/forward/CPU/4 thread(s)
292145604.5
ns310947959
ns0.94
vgg16(32, 32, 3, 32)/forward/CPU/8 thread(s)
236038125
ns239644500
ns0.98
vgg16(32, 32, 3, 32)/forward/CPU/1 thread(s)
452557520.5
ns453270542
ns1.00
vgg16(32, 32, 3, 32)/forward/GPU/CUDA
4850720
ns4854774.5
ns1.00
vgg16(32, 32, 3, 32)/zygote/CPU/2 thread(s)
1285445416
ns1326926500
ns0.97
vgg16(32, 32, 3, 32)/zygote/CPU/4 thread(s)
1002736292
ns962218875
ns1.04
vgg16(32, 32, 3, 32)/zygote/CPU/8 thread(s)
859907625
ns954450208
ns0.90
vgg16(32, 32, 3, 32)/zygote/CPU/1 thread(s)
1579569083
ns1593232541
ns0.99
vgg16(32, 32, 3, 32)/zygote/GPU/CUDA
18356160
ns19082921
ns0.96
lenet(28, 28, 1, 64)/forward/CPU/2 thread(s)
1411708
ns1392292
ns1.01
lenet(28, 28, 1, 64)/forward/CPU/4 thread(s)
2105292
ns1700416
ns1.24
lenet(28, 28, 1, 64)/forward/CPU/8 thread(s)
4691958
ns5764584
ns0.81
lenet(28, 28, 1, 64)/forward/CPU/1 thread(s)
1379542
ns1353979
ns1.02
lenet(28, 28, 1, 64)/forward/GPU/CUDA
271009
ns270953.5
ns1.00
lenet(28, 28, 1, 64)/zygote/CPU/2 thread(s)
6609291.5
ns6765209
ns0.98
lenet(28, 28, 1, 64)/zygote/CPU/4 thread(s)
12489687.5
ns13257604.5
ns0.94
lenet(28, 28, 1, 64)/zygote/CPU/8 thread(s)
20500208
ns19997334
ns1.03
lenet(28, 28, 1, 64)/zygote/CPU/1 thread(s)
6137500
ns6085271
ns1.01
lenet(28, 28, 1, 64)/zygote/GPU/CUDA
1323036
ns1315018
ns1.01
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s)
70468999.5
ns70450771.5
ns1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s)
43540000
ns43794458
ns0.99
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s)
39560750
ns39565125
ns1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s)
132524459
ns132519812
ns1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/GPU/CUDA
1883621
ns1877581
ns1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s)
385927667
ns383421896
ns1.01
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s)
301384020.5
ns297391833.5
ns1.01
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s)
280325709
ns282075208
ns0.99
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s)
535114146
ns534360167
ns1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/GPU/CUDA
12284164
ns12294712.5
ns1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s)
415345667
ns407452917
ns1.02
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s)
391116104
ns368882167
ns1.06
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s)
700495875.5
ns664901875
ns1.05
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s)
710476750
ns711106916
ns1.00
vgg16(32, 32, 3, 128)/forward/CPU/2 thread(s)
1197297083
ns1188807000
ns1.01
vgg16(32, 32, 3, 128)/forward/CPU/4 thread(s)
699386042
ns829881458
ns0.84
vgg16(32, 32, 3, 128)/forward/CPU/8 thread(s)
633884500
ns629069625
ns1.01
vgg16(32, 32, 3, 128)/forward/CPU/1 thread(s)
1862351209
ns1864484709
ns1.00
vgg16(32, 32, 3, 128)/forward/GPU/CUDA
12303587
ns12531429
ns0.98
vgg16(32, 32, 3, 128)/zygote/CPU/2 thread(s)
3586148500
ns3583219562.5
ns1.00
vgg16(32, 32, 3, 128)/zygote/CPU/4 thread(s)
2755457042
ns2743701542
ns1.00
vgg16(32, 32, 3, 128)/zygote/CPU/8 thread(s)
2695342250
ns2801027834
ns0.96
vgg16(32, 32, 3, 128)/zygote/CPU/1 thread(s)
4936198167
ns5095250291
ns0.97
vgg16(32, 32, 3, 128)/zygote/GPU/CUDA
49530988
ns49598783
ns1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s)
3410084
ns3396375
ns1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s)
2070063
ns2056770.5
ns1.01
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s)
2514250
ns2516292
ns1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s)
6005542
ns6032625
ns1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/GPU/CUDA
290096.5
ns288270
ns1.01
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s)
25455000
ns25431417
ns1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s)
18660291
ns18519687.5
ns1.01
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s)
18838271
ns18816834
ns1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s)
38883542
ns38902042
ns1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA
2476777
ns2461496
ns1.01
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s)
54080542
ns53959750
ns1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s)
79047875
ns80411625
ns0.98
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s)
172454667
ns170419166.5
ns1.01
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s)
45387917
ns45563604
ns1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s)
1659041
ns1774500
ns0.93
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s)
1103229.5
ns1086541.5
ns1.02
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s)
1539041.5
ns1585833
ns0.97
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s)
3029625
ns3036958.5
ns1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/GPU/CUDA
212328
ns210199.5
ns1.01
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s)
12461896
ns12515229.5
ns1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s)
9201125
ns9203541
ns1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s)
9593125
ns9648916
ns0.99
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s)
18956624.5
ns18975125
ns1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA
1542747
ns1537145.5
ns1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s)
17493333
ns17611666
ns0.99
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s)
14286958
ns14341209
ns1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s)
14556291
ns14587625.5
ns1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s)
22159583
ns22164499.5
ns1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s)
70478895.5
ns70184479
ns1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s)
43472958
ns43685354
ns1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s)
39536104
ns39447354
ns1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s)
132668666.5
ns132435229.5
ns1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/GPU/CUDA
1893825.5
ns1876651
ns1.01
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s)
359326208
ns363077416
ns0.99
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s)
304163583
ns286830229.5
ns1.06
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s)
286141562.5
ns287419458
ns1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s)
623006583
ns619680250
ns1.01
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA
13366172
ns13399859
ns1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s)
422001604
ns417202479
ns1.01
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s)
419915667
ns427236167
ns0.98
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s)
690603396
ns702264812
ns0.98
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s)
715494834
ns716642625
ns1.00
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/2 thread(s)
1482979
ns1597458
ns0.93
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/4 thread(s)
1220187.5
ns1041875
ns1.17
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/8 thread(s)
1164583
ns1238521
ns0.94
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/1 thread(s)
2192333
ns2311000
ns0.95
mlp7layer_bn(gelu)(32 x 256)/forward/GPU/CUDA
584838
ns591624
ns0.99
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/2 thread(s)
8835750
ns8828125
ns1.00
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/4 thread(s)
12790625
ns13456667
ns0.95
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/8 thread(s)
30251979
ns30478124.5
ns0.99
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/1 thread(s)
9820166
ns9827250
ns1.00
mlp7layer_bn(gelu)(32 x 256)/zygote/GPU/CUDA
1434577.5
ns1454764
ns0.99
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/2 thread(s)
16799396
ns17855792
ns0.94
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/4 thread(s)
17260041.5
ns17325084
ns1.00
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/8 thread(s)
29571958
ns28978667
ns1.02
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/1 thread(s)
13433125.5
ns14301167
ns0.94
Dense(512 => 512, relu)(512 x 128)/forward/CPU/2 thread(s)
796958.5
ns785166.5
ns1.02
Dense(512 => 512, relu)(512 x 128)/forward/CPU/4 thread(s)
518145.5
ns635083
ns0.82
Dense(512 => 512, relu)(512 x 128)/forward/CPU/8 thread(s)
1037104
ns1023416
ns1.01
Dense(512 => 512, relu)(512 x 128)/forward/CPU/1 thread(s)
726083
ns724437.5
ns1.00
Dense(512 => 512, relu)(512 x 128)/forward/GPU/CUDA
47492.5
ns48101
ns0.99
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/2 thread(s)
1538395.5
ns1546042
ns1.00
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/4 thread(s)
1012813
ns1039334
ns0.97
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/8 thread(s)
1870375
ns1418437.5
ns1.32
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/1 thread(s)
2191167
ns2186167
ns1.00
Dense(512 => 512, relu)(512 x 128)/zygote/GPU/CUDA
232459.5
ns237446
ns0.98
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/2 thread(s)
1704104
ns1701854
ns1.00
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/4 thread(s)
1240521
ns1239604.5
ns1.00
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/8 thread(s)
2293542
ns1785437.5
ns1.28
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/1 thread(s)
2258583
ns2312500
ns0.98
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s)
3312229
ns3387542
ns0.98
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s)
2051417
ns2038042
ns1.01
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s)
2523041
ns2513500
ns1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s)
6001521
ns6020041
ns1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/GPU/CUDA
282441
ns285597
ns0.99
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s)
23984333
ns24084791
ns1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s)
17181209
ns17173834
ns1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s)
17275083.5
ns17124375
ns1.01
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s)
37526208
ns37508333
ns1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/GPU/CUDA
2394472
ns2411179
ns0.99
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s)
52515208.5
ns52430334
ns1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s)
78819125
ns80022625
ns0.98
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s)
170453417
ns168792792
ns1.01
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s)
44401084
ns44511146
ns1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s)
249599771
ns248838833
ns1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s)
148233249.5
ns148459125
ns1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s)
115724417
ns115539229
ns1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s)
453949687
ns447599646
ns1.01
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/GPU/CUDA
5444539
ns5455438
ns1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s)
1133035459
ns1123889625
ns1.01
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s)
886883229
ns882004229
ns1.01
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s)
803351208
ns805342291
ns1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s)
1759720167
ns1746632750
ns1.01
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/GPU/CUDA
28907451
ns29283460
ns0.99
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s)
1060040999.5
ns1005172333.5
ns1.05
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s)
980328375
ns985652583
ns0.99
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s)
1300157375
ns1246518292
ns1.04
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s)
1738857750
ns1720077833.5
ns1.01
mlp7layer_bn(relu)(32 x 256)/forward/CPU/2 thread(s)
1307791
ns1224042
ns1.07
mlp7layer_bn(relu)(32 x 256)/forward/CPU/4 thread(s)
904958
ns780375
ns1.16
mlp7layer_bn(relu)(32 x 256)/forward/CPU/8 thread(s)
905041.5
ns903792
ns1.00
mlp7layer_bn(relu)(32 x 256)/forward/CPU/1 thread(s)
1946625
ns1941500
ns1.00
mlp7layer_bn(relu)(32 x 256)/forward/GPU/CUDA
564216
ns574544.5
ns0.98
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/2 thread(s)
5887083
ns5625979
ns1.05
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/4 thread(s)
6679291.5
ns8687417
ns0.77
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/8 thread(s)
24152125
ns23834542
ns1.01
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/1 thread(s)
7076792
ns7099270.5
ns1.00
mlp7layer_bn(relu)(32 x 256)/zygote/GPU/CUDA
1369695
ns1400097
ns0.98
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/2 thread(s)
11316000.5
ns10299146
ns1.10
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/4 thread(s)
9070125
ns10509708.5
ns0.86
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/8 thread(s)
17109084
ns16674333
ns1.03
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/1 thread(s)
8181250
ns8726562.5
ns0.94
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/2 thread(s)
484209
ns386792
ns1.25
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/4 thread(s)
483542
ns494500
ns0.98
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/8 thread(s)
2022333.5
ns2152000
ns0.94
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/1 thread(s)
89541
ns88104.5
ns1.02
Dense(128 => 128, gelu)(128 x 128)/forward/GPU/CUDA
27791
ns27980
ns0.99
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/2 thread(s)
385084
ns339145.5
ns1.14
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/4 thread(s)
431542
ns436062.5
ns0.99
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/8 thread(s)
4556458.5
ns4118937.5
ns1.11
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/1 thread(s)
262041.5
ns261500
ns1.00
Dense(128 => 128, gelu)(128 x 128)/zygote/GPU/CUDA
219615
ns223966.5
ns0.98
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/2 thread(s)
707167
ns643312.5
ns1.10
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/4 thread(s)
710063
ns709125
ns1.00
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/8 thread(s)
1060083.5
ns884729
ns1.20
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/1 thread(s)
444625
ns445958
ns1.00
Dense(128 => 128, relu)(128 x 128)/forward/CPU/2 thread(s)
431166.5
ns331291
ns1.30
Dense(128 => 128, relu)(128 x 128)/forward/CPU/4 thread(s)
425375
ns438416
ns0.97
Dense(128 => 128, relu)(128 x 128)/forward/CPU/8 thread(s)
743313
ns601249.5
ns1.24
Dense(128 => 128, relu)(128 x 128)/forward/CPU/1 thread(s)
54583
ns53958
ns1.01
Dense(128 => 128, relu)(128 x 128)/forward/GPU/CUDA
27936
ns28317
ns0.99
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/2 thread(s)
339062.5
ns277271
ns1.22
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/4 thread(s)
321875
ns319417
ns1.01
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/8 thread(s)
869459
ns679875.5
ns1.28
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/1 thread(s)
154000
ns153292
ns1.00
Dense(128 => 128, relu)(128 x 128)/zygote/GPU/CUDA
204132
ns208854.5
ns0.98
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/2 thread(s)
403687.5
ns344375
ns1.17
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/4 thread(s)
385604.5
ns389750
ns0.99
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/8 thread(s)
1000750
ns870042
ns1.15
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/1 thread(s)
174292
ns174063
ns1.00
vgg16(32, 32, 3, 64)/forward/CPU/2 thread(s)
605442833
ns602013959
ns1.01
vgg16(32, 32, 3, 64)/forward/CPU/4 thread(s)
433469166.5
ns430551375
ns1.01
vgg16(32, 32, 3, 64)/forward/CPU/8 thread(s)
371264875
ns375744687.5
ns0.99
vgg16(32, 32, 3, 64)/forward/CPU/1 thread(s)
873652000
ns873334145.5
ns1.00
vgg16(32, 32, 3, 64)/forward/GPU/CUDA
7020347
ns7025763
ns1.00
vgg16(32, 32, 3, 64)/zygote/CPU/2 thread(s)
2058909438
ns2078756625
ns0.99
vgg16(32, 32, 3, 64)/zygote/CPU/4 thread(s)
1575303375
ns1607808875
ns0.98
vgg16(32, 32, 3, 64)/zygote/CPU/8 thread(s)
1603582812.5
ns1638666770.5
ns0.98
vgg16(32, 32, 3, 64)/zygote/CPU/1 thread(s)
2764206375
ns2782335083
ns0.99
vgg16(32, 32, 3, 64)/zygote/GPU/CUDA
25821907
ns25908154
ns1.00
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/2 thread(s)
521958
ns518875
ns1.01
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/4 thread(s)
433458
ns395396
ns1.10
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/8 thread(s)
1910708.5
ns1924520.5
ns0.99
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/1 thread(s)
868375.5
ns865667
ns1.00
Dense(512 => 512, gelu)(512 x 128)/forward/GPU/CUDA
46967
ns46907
ns1.00
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/2 thread(s)
1805874.5
ns1851083
ns0.98
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/4 thread(s)
2512750.5
ns1779896
ns1.41
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/8 thread(s)
14720729.5
ns14384125
ns1.02
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/1 thread(s)
2772083
ns2660187.5
ns1.04
Dense(512 => 512, gelu)(512 x 128)/zygote/GPU/CUDA
246589
ns247071
ns1.00
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/2 thread(s)
2754708
ns2699458.5
ns1.02
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/4 thread(s)
2247625
ns2245375
ns1.00
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/8 thread(s)
3885729
ns3691833
ns1.05
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/1 thread(s)
3383500
ns3398958
ns1.00
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/2 thread(s)
1492500
ns1486833.5
ns1.00
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/4 thread(s)
1177000
ns933395.5
ns1.26
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/8 thread(s)
1171916
ns1185562.5
ns0.99
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/1 thread(s)
2316833
ns2210083
ns1.05
mlp7layer_bn(tanh)(32 x 256)/forward/GPU/CUDA
585700
ns550416
ns1.06
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/2 thread(s)
5795062
ns5783000
ns1.00
| Benchmark suite | Current | Previous | Ratio |
|---|---|---|---|
| mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/4 thread(s) | 4639479 ns | 7999604 ns | 0.58 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/8 thread(s) | 24514833 ns | 23905084 ns | 1.03 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/1 thread(s) | 7281062.5 ns | 7315812.5 ns | 1.00 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/GPU/CUDA | 1345001 ns | 1359665.5 ns | 0.99 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/2 thread(s) | 12960708 ns | 12501603.5 ns | 1.04 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/4 thread(s) | 10744333 ns | 12176000 ns | 0.88 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/8 thread(s) | 20566896 ns | 20858687.5 ns | 0.99 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/1 thread(s) | 10662416.5 ns | 10743417 ns | 0.99 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/2 thread(s) | 2937.5 ns | 3916.5 ns | 0.75 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/4 thread(s) | 3083 ns | 2875 ns | 1.07 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/8 thread(s) | 3792 ns | 5104.5 ns | 0.74 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/1 thread(s) | 2666 ns | 2645.5 ns | 1.01 |
| Dense(16 => 16, relu)(16 x 128)/forward/GPU/CUDA | 24904 ns | 22876 ns | 1.09 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/2 thread(s) | 8583 ns | 8541 ns | 1.00 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/4 thread(s) | 9354 ns | 8500 ns | 1.10 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/8 thread(s) | 9000 ns | 8583.5 ns | 1.05 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/1 thread(s) | 8708 ns | 8625 ns | 1.01 |
| Dense(16 => 16, relu)(16 x 128)/zygote/GPU/CUDA | 208868.5 ns | 209808.5 ns | 1.00 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/2 thread(s) | 16770.5 ns | 16667 ns | 1.01 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/4 thread(s) | 16667 ns | 16875 ns | 0.99 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/8 thread(s) | 17000 ns | 16708 ns | 1.02 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/1 thread(s) | 10958 ns | 10792 ns | 1.02 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/2 thread(s) | 10270.5 ns | 10166.5 ns | 1.01 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/4 thread(s) | 13333 ns | 15625 ns | 0.85 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/8 thread(s) | 10917 ns | 11375 ns | 0.96 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/1 thread(s) | 7458 ns | 7625 ns | 0.98 |
| Dense(16 => 16, gelu)(16 x 128)/forward/GPU/CUDA | 24820 ns | 24722 ns | 1.00 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/2 thread(s) | 22417 ns | 22270.5 ns | 1.01 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/4 thread(s) | 22333 ns | 22333 ns | 1 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/8 thread(s) | 22667 ns | 22667 ns | 1 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/1 thread(s) | 22500 ns | 22500 ns | 1 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/GPU/CUDA | 229875 ns | 230109 ns | 1.00 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/2 thread(s) | 52375 ns | 52292 ns | 1.00 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/4 thread(s) | 52334 ns | 52458 ns | 1.00 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/8 thread(s) | 52812.5 ns | 52250 ns | 1.01 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/1 thread(s) | 44146 ns | 43916 ns | 1.01 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/2 thread(s) | 29354.5 ns | 28708 ns | 1.02 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/4 thread(s) | 29292 ns | 29083 ns | 1.01 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/8 thread(s) | 29562.5 ns | 29708 ns | 1.00 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/1 thread(s) | 46270.5 ns | 46250 ns | 1.00 |
| Dense(128 => 128, identity)(128 x 128)/forward/GPU/CUDA | 25924 ns | 25756.5 ns | 1.01 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/2 thread(s) | 210354 ns | 207687.5 ns | 1.01 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/4 thread(s) | 259833.5 ns | 259000 ns | 1.00 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/8 thread(s) | 4170916 ns | 4070042 ns | 1.02 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/1 thread(s) | 147875 ns | 147583 ns | 1.00 |
| Dense(128 => 128, identity)(128 x 128)/zygote/GPU/CUDA | 216626 ns | 223667.5 ns | 0.97 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/2 thread(s) | 309041 ns | 309125 ns | 1.00 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/4 thread(s) | 284542 ns | 289833 ns | 0.98 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/8 thread(s) | 751146 ns | 766104 ns | 0.98 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/1 thread(s) | 161083 ns | 161834 ns | 1.00 |
| Dense(16 => 16, identity)(16 x 128)/forward/CPU/2 thread(s) | 2125 ns | 2000 ns | 1.06 |
| Dense(16 => 16, identity)(16 x 128)/forward/CPU/4 thread(s) | 4125 ns | 2292 ns | 1.80 |
| Dense(16 => 16, identity)(16 x 128)/forward/CPU/8 thread(s) | 2583 ns | 4416 ns | 0.58 |
| Dense(16 => 16, identity)(16 x 128)/forward/CPU/1 thread(s) | 1791.5 ns | 2083 ns | 0.86 |
| Dense(16 => 16, identity)(16 x 128)/forward/GPU/CUDA | 23179 ns | 22925 ns | 1.01 |
| Dense(16 => 16, identity)(16 x 128)/zygote/CPU/2 thread(s) | 7375 ns | 7604.5 ns | 0.97 |
| Dense(16 => 16, identity)(16 x 128)/zygote/CPU/4 thread(s) | 7792 ns | 7208 ns | 1.08 |
| Dense(16 => 16, identity)(16 x 128)/zygote/CPU/8 thread(s) | 8000 ns | 7542 ns | 1.06 |
| Dense(16 => 16, identity)(16 x 128)/zygote/CPU/1 thread(s) | 7417 ns | 7333 ns | 1.01 |
| Dense(16 => 16, identity)(16 x 128)/zygote/GPU/CUDA | 268810.5 ns | 270317 ns | 0.99 |
| Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/2 thread(s) | 11500 ns | 11541 ns | 1.00 |
| Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/4 thread(s) | 11375 ns | 11542 ns | 0.99 |
| Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/8 thread(s) | 11834 ns | 11542 ns | 1.03 |
| Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/1 thread(s) | 7125 ns | 7125 ns | 1 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) | 79982812 ns | 79878271 ns | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) | 49064104 ns | 47895875 ns | 1.02 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) | 44980062 ns | 44952396.5 ns | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) | 151479750 ns | 151396791 ns | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/GPU/CUDA | 2652652.5 ns | 2712095.5 ns | 0.98 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) | 605980917 ns | 498390209 ns | 1.22 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) | 414050750 ns | 410182750 ns | 1.01 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) | 397397813 ns | 398143833 ns | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) | 685097750 ns | 683908500 ns | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA | 14638421 ns | 14599220 ns | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) | 686689124.5 ns | 686490667 ns | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) | 677754209 ns | 660533166 ns | 1.03 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) | 1006185875 ns | 1012950542 ns | 0.99 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) | 999323084 ns | 997113125 ns | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.