
Move optimizer from the network level to the layer level #184


Draft: wants to merge 4 commits into main

Conversation

@jvdp1 (Collaborator) commented on Jun 14, 2024

As discussed, here is a draft in which I suggest moving the optimizer from the network level to the layer level.

This is just a draft with an implementation for the dense layer only.
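
A minimal Fortran sketch of the idea, assuming hypothetical names (`sgd`, `dense_layer`, `weights`, `dw`, `minimize`, `update` are illustrative, not the actual neural-fortran API): each layer stores its own optimizer instance and applies the update directly to its own parameters, instead of the network collecting all parameters and running a single network-level optimizer step.

```fortran
! Sketch only: layer-level optimizer ownership (names are illustrative).
module layer_level_optimizer_sketch
  implicit none
  private
  public :: sgd, dense_layer

  type :: sgd
    real :: learning_rate = 0.01
  contains
    procedure :: minimize
  end type sgd

  type :: dense_layer
    real, allocatable :: weights(:,:), dw(:,:)  ! parameters and their gradients
    type(sgd) :: optimizer                      ! optimizer lives in the layer
  contains
    procedure :: update
  end type dense_layer

contains

  subroutine minimize(self, param, grad)
    ! Plain SGD step applied in place to one parameter array.
    class(sgd), intent(in) :: self
    real, intent(inout) :: param(:,:)
    real, intent(in) :: grad(:,:)
    param = param - self % learning_rate * grad
  end subroutine minimize

  subroutine update(self)
    ! The layer updates its own parameters; no gathering or copying of
    ! all network parameters into one flat array is needed.
    class(dense_layer), intent(inout) :: self
    call self % optimizer % minimize(self % weights, self % dw)
    self % dw = 0
  end subroutine update

end module layer_level_optimizer_sketch
```

Because each layer works in place on its own arrays, the network-level gathering of parameters and gradients at every step is avoided, which is presumably where the reduction in update time below comes from.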

Here are the wall clock times using my dataset (with 2 hidden dense layers):

v0.17.0

  • Forward + backward: 4.79s
  • Update: 4.59s

Current PR

  • Forward + backward: 4.81s
  • Update: 1.40s

@OneAdder (Collaborator) commented on Mar 5, 2025

@jvdp1 That's actually a great idea. Apart from the obvious performance gains, it can simplify the code for combined layers. I will arrange everything in a similar fashion in my project here: https://github.com/OneAdder/llm.f
Then we can backport it here along with implementations for all the other layers.
