set expand option as default for @layer
#2532
Conversation
```diff
@@ -27,7 +27,7 @@ export gradient
     CUDADevice, AMDGPUDevice, MetalDevice, oneAPIDevice,
     XLADevice,
     # get_device, # we define get_device here for retrocompatibility
-    # gpu_backend!, # have to define here due to https://github.com/JuliaPackaging/Preferences.jl/issues/39
+    gpu_backend!,
```
Unrelated change: with v0.15 we don't need to define `gpu_backend!` here, we can just reexport the one from MLDataDevices.
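A minimal sketch of what such a reexport could look like (the surrounding module structure is assumed for illustration and is not taken from the diff):

```julia
module FluxSketch  # hypothetical module, stands in for the real Flux module

# Instead of redefining gpu_backend!, pull it in from MLDataDevices
# and reexport it so downstream `Flux.gpu_backend!` calls keep working.
using MLDataDevices: gpu_backend!
export gpu_backend!

end
```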
These break, and probably others. Before `@layer` there was lots of special-case code to catch all of them.

```julia
julia> LayerNorm(10)
LayerNorm(10)  # 20 parameters

julia> MultiHeadAttention(64 => 1024 => 1024, nheads = 8)
MultiHeadAttention(64 => 1024 => 1024; nheads=8)  # 1_245_184 parameters
```
This PR:

```julia
julia> LayerNorm(10)
LayerNorm(
  identity,
  Scale(10),  # 20 parameters
  1.0f-5,
  10,
  true,
)
```
Codecov Report

Attention: Patch coverage is

Additional details and impacted files:

```
@@            Coverage Diff            @@
##           master    #2532     +/-  ##
=========================================
+ Coverage   33.54%   33.56%   +0.01%
=========================================
  Files          31       31
  Lines        1881     1871      -10
=========================================
- Hits          631      628       -3
+ Misses       1250     1243       -7
```
Ok, now we have a `:noexpand` option for `@layer`. I'm not sure only `LayerNorm` and `MultiHeadAttention` need it.
This introduces the new show option `:noexpand` for `@layer` and sets the default to `:expand` instead.

Fix #2531
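A sketch of how the two show options would be used after this PR (the wrapper type here is hypothetical, invented only to illustrate the macro; the `@layer :noexpand` syntax is what the PR introduces):

```julia
using Flux

# Hypothetical container layer, used only to demonstrate the show options.
struct MyWrapper
    inner
end

# With this PR, the plain form defaults to :expand, so printing a
# MyWrapper would list its fields across multiple lines:
#   Flux.@layer MyWrapper
#
# Opting out keeps the compact one-line display, as LayerNorm and
# MultiHeadAttention did before:
Flux.@layer :noexpand MyWrapper
```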