Add "a" parameter to softplus() #83 #85
Conversation
I'm not sure how widely used this variant is (and whether there are other commonly used alternatives; the issue also mentions Liu and Furber 2016?). If it's added, we should make sure to test it and to also add support for it in the ChainRules, InverseFunctions, and ChangesOfVariables extensions.
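For reference, a rough sketch of what such tests might look like, assuming the PR adds two-argument methods `softplus(x, a)` and `invsoftplus(y, a)` with the parameterization `log1pexp(a*x)/a` (an assumption on my part; the final implementation may differ):

```julia
using Test
using LogExpFunctions

# Hypothetical test sketch; assumes softplus(x, a) == log1pexp(a*x)/a and that
# invsoftplus(y, a) is its inverse, as proposed in #83.
@testset "generalized softplus" begin
    for x in (-3.0, -0.5, 0.0, 0.5, 3.0), a in (0.5, 1.0, 2.0)
        @test softplus(x, a) ≈ log1pexp(a * x) / a
        @test invsoftplus(softplus(x, a), a) ≈ x
    end
    # the default parameter should recover the existing one-argument form
    @test softplus(1.2, 1) ≈ softplus(1.2)
end
```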
src/basicfuns.jl
@@ -165,9 +165,14 @@ Return `log(1+exp(x))` evaluated carefully for largish `x`.
This is also called the ["softplus"](https://en.wikipedia.org/wiki/Rectifier_(neural_networks))
transformation, being a smooth approximation to `max(0,x)`. Its inverse is [`logexpm1`](@ref).

The generalized `softplus` function (Wiemann et al., 2024) takes an additional optional parameter `a` that controls …
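For concreteness, this is roughly the shape of the function being discussed, assuming the Wiemann et al. (2024) parameterization `softplus(x, a) = log1pexp(a*x) / a` with inverse `logexpm1(a*y) / a` (my reading of the proposal, with placeholder names, not necessarily the exact code in this PR):

```julia
# Sketch only: generalized softplus and its inverse in terms of the existing
# numerically careful primitives log1pexp and logexpm1.
using LogExpFunctions: log1pexp, logexpm1

gen_softplus(x::Real, a::Real) = log1pexp(a * x) / a     # a = 1 recovers log1pexp(x)
gen_invsoftplus(y::Real, a::Real) = logexpm1(a * y) / a
```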
I assume there exist earlier references for this function?
I went through Liu and Furber to double-check.
From my understanding (ML is not my field), they validate "noisy softplus" as an improvement over other activation functions for neurons in NNs.
However, it seems like they named the `a` parameter sigma. Their plot looks similar but different in terms of values (?)

> I'm not sure how widely used this variant is

I share your concern here; I'm also careful not to add niche features to such a base package and increase the maintenance burden.
I can't say how commonly the generalized version is already used; its development seems fairly recent.
However, I can see its usefulness in quite a lot of cases: the default softplus only becomes close to the identity for x > 2, and from experience we often do model parameters smaller than that (typical sigmas in neuroscience/psychology are roughly between 0 and 1), so using adjusted softplus links would make sense in these contexts. I suppose it's a tradeoff between the complexity of the feature and its (potential) usage.
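To make that point concrete, a small illustration (using the same assumed parameterization `log1pexp(a*x)/a` and a placeholder name):

```julia
using LogExpFunctions: log1pexp

softplus_a(x, a) = log1pexp(a * x) / a  # assumed parameterization, placeholder name

softplus_a(0.5, 1)   # ≈ 0.974  -- far from the identity for a small parameter value
softplus_a(0.5, 10)  # ≈ 0.5007 -- a sharper link is already close to x
```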
Co-authored-by: David Widmann <[email protected]>
I think the remaining items here are:
- Include the new docstrings in the documentation
- Add tests for `softplus` and `invsoftplus`
- Add support for InverseFunctions for `softplus` and `invsoftplus` and test it (a sketch follows below)
- Add support for ChangesOfVariables for `softplus` and `invsoftplus` and test it

I think ChainRules support should not be needed since `log1pexp` and `log1mexp` are already supported, and we can expect AD to differentiate through the remaining parts of the functions.
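For the InverseFunctions item, a hedged sketch of what the extension could look like, assuming the parameter is curried with `Base.Fix2` (i.e. users write `Base.Fix2(softplus, a)`); the PR may of course choose a different mechanism:

```julia
# Hypothetical sketch of InverseFunctions support for the two-argument methods.
import InverseFunctions
using LogExpFunctions: softplus, invsoftplus

# Base.Fix2 stores the fixed second argument (the parameter a) in the field `x`.
InverseFunctions.inverse(f::Base.Fix2{typeof(softplus)}) = Base.Fix2(invsoftplus, f.x)
InverseFunctions.inverse(f::Base.Fix2{typeof(invsoftplus)}) = Base.Fix2(softplus, f.x)
```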
Can you clarify?
Kind bump
Sorry, I missed your previous comment.
Since this PR adds new functions, we should also add definitions of …
I am not sure how to specify the ChangesOfVariables one 🤔
Kind bump
Since there is no preexisting …
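Regarding how to specify the ChangesOfVariables one: a possible sketch, under the same `Base.Fix2` assumption and the assumed parameterization `softplus(x, a) = log1pexp(a*x)/a`, whose derivative is `logistic(a*x)`, so the log-abs-det-Jacobian is `-log1pexp(-a*x)`:

```julia
# Hypothetical sketch of ChangesOfVariables support for the parameterized softplus.
import ChangesOfVariables
using LogExpFunctions: softplus, log1pexp

function ChangesOfVariables.with_logabsdet_jacobian(f::Base.Fix2{typeof(softplus)}, x::Real)
    a = f.x
    # d/dx softplus(x, a) = logistic(a*x), so log|J| = log(logistic(a*x)) = -log1pexp(-a*x)
    return f(x), -log1pexp(-a * x)
end
```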
I had re-reviewed the PR but apparently forgotten to submit the review on GH.
LGTM
Following up on the issues related to an exp link function (TuringLang/Turing.jl#2310) reinforced the idea that a softplus link could actually be a good alternative. However, I feel like implementing its generalized version (#83) would be key (useful when modelling small parameters), so here is my shot at it.