Skip to content

linear2d_layer_no_bias: add optional argument to disable biases #212

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

OneAdder
Copy link
Collaborator

Add option to disable biases

Simple additional option. Removing biases helps speed up inference. This technique is used in Llama and, to an extent, in Qwen. Being able to disable biases here is a simple quality of life addition

@jvdp1
Copy link
Collaborator

jvdp1 commented Mar 5, 2025

Being able to disable biases here is a simple quality of life addition

Should we extend this to all types of layers?

@OneAdder
Copy link
Collaborator Author

@jvdp1 yep, I think we should

@OneAdder
Copy link
Collaborator Author

Perhaps, I should rename it to FeedForward or MLP. Thoughts?

@milancurcic
Copy link
Member

milancurcic commented Mar 19, 2025

I don't think MLP is appropriate because it refers to a network. We already have dense. Since you mention naming, and I don't recall if we discussed this in the original linear2d PR: linear layer is just a special case of a dense layer without an activation (which we have via the linear activation function). Should this be called dense2d for consistency?

Regarding the question of whether this should be an option for all layers. My opinion depends on whether this is used more broadly in various layers. If it's used only in dense2d (or linear2d) layers, this option should be available only here.

@milancurcic milancurcic reopened this Mar 19, 2025
@milancurcic
Copy link
Member

This also means that we could have a generic maxpool after all. We could make it so that maxpool1d is invoked by maxpool(pool_size=2), while maxpool2d is invoked by maxpool(pool_size=[2, 2]), because a scalar and array are type-kind-rank distinguishable. However, using this pattern we couldn't extend this to maxpool3d because maxpool(pool_size=[2, 2, 2]) would not be distinguishable from the 2d variant.

Alternatively, we could do:

maxpool(pool_width=2)  ! maxpool1d
maxpool(pool_width=2, pool_height=2)  ! maxpool2d
maxpool(pool_width=2, pool_height=2, pool_depth=2)  ! maxpool3d

Same approach could apply to conv layers.

The more I think about this the more I like it. What do you think?

@OneAdder
Copy link
Collaborator Author

@milancurcic Disregard the naming comment. It belongs in a different PR: #208
I like the maxpool ideas

@OneAdder
Copy link
Collaborator Author

@milancurcic Should we merge it? I really would like to do it before the dense2d refactoring

@milancurcic
Copy link
Member

Yes, that's fine, please go ahead and squash-merge, thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants