linear2d_layer_no_bias: add optional argument to disable biases #212

OneAdder · 2025-02-28T15:59:42Z

Add option to disable biases

A simple additional option. Removing biases helps speed up inference; this technique is used in Llama and, to an extent, in Qwen. Being able to disable biases here is a simple quality-of-life addition.
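
As a rough illustration of what disabling biases means for the forward pass, here is a minimal sketch. It assumes an input of shape (sequence_length, in_features) and weights of shape (in_features, out_features). The names `linear2d_sketch` and `linear2d_forward` and the optional `b` argument are hypothetical and are not the interface this PR adds.

```fortran
! Sketch only: hypothetical names, not neural-fortran's actual API.
module linear2d_sketch
  implicit none
  private
  public :: linear2d_forward

contains

  function linear2d_forward(x, w, b) result(y)
    real, intent(in) :: x(:, :)         ! assumed shape (sequence_length, in_features)
    real, intent(in) :: w(:, :)         ! assumed shape (in_features, out_features)
    real, intent(in), optional :: b(:)  ! (out_features); omit to disable biases
    real :: y(size(x, 1), size(w, 2))
    integer :: i

    y = matmul(x, w)

    ! The bias addition is skipped entirely when no bias is supplied.
    if (present(b)) then
      do i = 1, size(y, 1)
        y(i, :) = y(i, :) + b
      end do
    end if
  end function linear2d_forward

end module linear2d_sketch

program linear2d_demo
  use linear2d_sketch, only: linear2d_forward
  implicit none
  real :: x(2, 3), w(3, 2), b(2)

  x = 1.0
  w = 0.5
  b = 0.1

  print *, linear2d_forward(x, w, b)  ! with biases
  print *, linear2d_forward(x, w)     ! biases disabled
end program linear2d_demo
```

Without biases, the layer stores and updates one fewer parameter array and skips the per-row addition.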

jvdp1 · 2025-03-05T18:22:06Z

> Being able to disable biases here is a simple quality-of-life addition

Should we extend this to all types of layers?

OneAdder · 2025-03-15T20:13:04Z

@jvdp1 yep, I think we should

OneAdder · 2025-03-18T11:59:38Z

Perhaps I should rename it to FeedForward or MLP. Thoughts?

milancurcic · 2025-03-19T21:24:54Z

I don't think MLP is appropriate because it refers to a network. We already have dense. Since you mention naming (I don't recall whether we discussed this in the original linear2d PR): a linear layer is just a special case of a dense layer without an activation, which we already have via the linear activation function. Should this be called dense2d for consistency?

Regarding the question of whether this should be an option for all layers: my opinion depends on whether it is used more broadly across layer types. If it's used only in dense2d (or linear2d) layers, the option should be available only here.

milancurcic · 2025-03-19T21:38:37Z

This also means that we could have a generic maxpool after all. We could make it so that maxpool1d is invoked by maxpool(pool_size=2), while maxpool2d is invoked by maxpool(pool_size=[2, 2]), because a scalar and an array are type-kind-rank (TKR) distinguishable. However, we couldn't extend this pattern to maxpool3d, because maxpool(pool_size=[2, 2, 2]) is also a rank-1 integer array and so would not be distinguishable from the 2d variant.
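
To make the rank-based resolution concrete, here is a minimal sketch of a generic interface whose specific procedures differ only in the rank of `pool_size`. The module and procedure names are hypothetical, and the functions return the number of pooled dimensions instead of a real layer so that the example stays self-contained.

```fortran
! Sketch only: hypothetical names, not neural-fortran's actual API.
module maxpool_rank_sketch
  implicit none
  private
  public :: maxpool

  ! One generic name; the compiler selects a specific procedure at
  ! compile time from the rank of the actual argument.
  interface maxpool
    module procedure maxpool_scalar, maxpool_rank1
  end interface maxpool

contains

  function maxpool_scalar(pool_size) result(ndims)
    integer, intent(in) :: pool_size     ! scalar -> 1-d pooling
    integer :: ndims
    ndims = 1
  end function maxpool_scalar

  function maxpool_rank1(pool_size) result(ndims)
    integer, intent(in) :: pool_size(:)  ! rank-1 array -> 2-d pooling
    integer :: ndims
    ndims = 2
  end function maxpool_rank1

end module maxpool_rank_sketch

program maxpool_rank_demo
  use maxpool_rank_sketch, only: maxpool
  implicit none

  print *, maxpool(pool_size=2)       ! resolves to maxpool_scalar
  print *, maxpool(pool_size=[2, 2])  ! resolves to maxpool_rank1
end program maxpool_rank_demo
```

A third specific procedure taking `pool_size(:)` for the 3d case could not be added to this interface: `[2, 2]` and `[2, 2, 2]` differ only in size, not in type, kind, or rank, so the compiler would reject the interface as ambiguous.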

Alternatively, we could do:

maxpool(pool_width=2)  ! maxpool1d
maxpool(pool_width=2, pool_height=2)  ! maxpool2d
maxpool(pool_width=2, pool_height=2, pool_depth=2)  ! maxpool3d
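
A minimal sketch of how this keyword-based variant could resolve, assuming every size argument is non-optional so that the specific procedures differ in their number of required arguments. The module and procedure names are hypothetical, and the functions return the number of pooled dimensions instead of a real layer.

```fortran
! Sketch only: hypothetical names, not neural-fortran's actual API.
module maxpool_keyword_sketch
  implicit none
  private
  public :: maxpool

  ! The specifics are distinguishable because each has a different
  ! number of non-optional integer arguments.
  interface maxpool
    module procedure maxpool_1d, maxpool_2d, maxpool_3d
  end interface maxpool

contains

  function maxpool_1d(pool_width) result(ndims)
    integer, intent(in) :: pool_width
    integer :: ndims
    ndims = 1
  end function maxpool_1d

  function maxpool_2d(pool_width, pool_height) result(ndims)
    integer, intent(in) :: pool_width, pool_height
    integer :: ndims
    ndims = 2
  end function maxpool_2d

  function maxpool_3d(pool_width, pool_height, pool_depth) result(ndims)
    integer, intent(in) :: pool_width, pool_height, pool_depth
    integer :: ndims
    ndims = 3
  end function maxpool_3d

end module maxpool_keyword_sketch

program maxpool_keyword_demo
  use maxpool_keyword_sketch, only: maxpool
  implicit none

  print *, maxpool(pool_width=2)                                ! maxpool_1d
  print *, maxpool(pool_width=2, pool_height=2)                 ! maxpool_2d
  print *, maxpool(pool_width=2, pool_height=2, pool_depth=2)   ! maxpool_3d
end program maxpool_keyword_demo
```

Making any of the size arguments optional would weaken this: the specifics could then stop being distinguishable by argument count alone.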

The same approach could apply to conv layers.

The more I think about this, the more I like it. What do you think?

OneAdder · 2025-03-26T16:17:00Z

@milancurcic Disregard the naming comment. It belongs in a different PR: #208.
I like the maxpool ideas.

OneAdder · 2025-04-13T16:59:41Z

@milancurcic Should we merge it? I would really like to do it before the dense2d refactoring.

milancurcic · 2025-04-13T17:22:16Z

Yes, that's fine, please go ahead and squash-merge, thank you!

linear2d_layer_no_bias: add optional argument to disable biases

e1c226f

milancurcic closed this Mar 19, 2025

milancurcic reopened this Mar 19, 2025
