
Added RMSProp Optimizer subroutine #144

Merged: 4 commits into modern-fortran:main on Jun 20, 2023
Conversation

@Spnetic-5 (Collaborator) commented Jun 16, 2023

Solves #136
This pull request adds an implementation of the RMSprop optimizer subroutine to the existing quadratic example.

Approach:

  • Initialized rms_weights and rms_gradients arrays of appropriate dimensions.
  • Added a nested loop over the network layers to update the weights using the RMSprop update rule.
  • Calculated rms_weights and rms_gradients using the decay rate and current weights/gradients.
  • Updated the weights using the RMSprop rule: weights = weights - (learning_rate / sqrt(rms_weights + epsilon)) * gradients (see the sketch after this list).
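
For reference, here is a minimal, self-contained sketch of the textbook RMSprop rule this list refers to; note that standard RMSprop accumulates a running average of the squared gradients (called rms_gradients below), and the names and values are illustrative rather than the exact code in this PR:

```fortran
program rmsprop_rule_sketch
  implicit none
  integer, parameter :: n = 3
  real, parameter :: learning_rate = 0.01, decay_rate = 0.9, epsilon = 1e-8
  real :: weights(n), gradients(n), rms_gradients(n)

  weights = [1.0, 2.0, 3.0]
  gradients = [0.1, -0.2, 0.3]  ! stand-in gradients for illustration
  rms_gradients = 0

  ! Decaying moving average of the squared gradients
  rms_gradients = decay_rate * rms_gradients + (1 - decay_rate) * gradients**2

  ! Scale each weight's step by the inverse RMS of its recent gradients
  weights = weights - learning_rate / sqrt(rms_gradients + epsilon) * gradients

  print *, weights
end program rmsprop_rule_sketch
```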

@Spnetic-5 requested a review from milancurcic on June 16, 2023, 14:14
@Spnetic-5 requested a review from milancurcic on June 18, 2023, 22:12
@Spnetic-5 (Collaborator, Author)

Apologies for the late reply, @milancurcic.

@milancurcic (Member)

@Spnetic-5 I mostly rewrote the subroutine so that it now compiles and converges. It doesn't use mini-batching; for simplicity, for now the update is applied once after the entire batch of forward and backward passes.
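
To make that placement concrete, here is a small self-contained sketch (not the actual example code; the gradient values are placeholders) showing gradients accumulated over the whole batch first, followed by a single RMSprop update per epoch:

```fortran
program epochwise_rmsprop_sketch
  implicit none
  integer, parameter :: n = 2, batch_size = 4, num_epochs = 100
  real, parameter :: lr = 0.01, decay = 0.9, eps = 1e-8
  real :: w(n), g(n), rms_g(n)
  integer :: epoch, i

  w = [0.5, -0.5]
  rms_g = 0

  do epoch = 1, num_epochs
    ! Forward and backward passes over the entire batch first,
    ! accumulating the gradient (placeholder values here) ...
    g = 0
    do i = 1, batch_size
      g = g + [0.1, -0.2] / batch_size  ! stand-in per-sample gradient
    end do
    ! ... then a single RMSprop step per epoch, i.e. no mini-batching.
    rms_g = decay * rms_g + (1 - decay) * g**2
    w = w - lr / sqrt(rms_g + eps) * g
  end do

  print *, w
end program epochwise_rmsprop_sketch
```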

I understand that this PR was challenging; it took me a bit to find the right approach. In your most recent commit, you made some changes and wrote "made suggested corrections", which made it sound like the PR was good to go. However, the example was not even compiling at that stage. Whenever you struggle with an implementation, please write a comment in the PR explaining where you got stuck and whether you need help, rather than just leaving a short commit message.

Also, please study the implementation in this PR. It introduces a new derived type that tracks a moving average of gradients across epochs, for each layer. We are likely to use this approach for other optimizers that need moving-average logic.
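
A hedged sketch of that idea, assuming a simple derived type that stores the running average of (squared) gradients per layer; the actual type name and components in the PR may differ:

```fortran
module rmsprop_state_m
  implicit none

  ! Holds the moving average of squared gradients for one layer,
  ! so the average persists across epochs.
  type :: rmsprop_state
    real, allocatable :: rms_gradient(:)
  end type rmsprop_state

end module rmsprop_state_m
```

One such element would be kept per network layer, e.g. `type(rmsprop_state), allocatable :: state(:)`, and updated on every epoch.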

@milancurcic merged commit 44833c2 into modern-fortran:main on Jun 20, 2023
@Spnetic-5 (Collaborator, Author) commented Jun 20, 2023

I apologize for the confusion caused by my commit message; it was not my intention to imply that the code was ready to go. The code was compiling and running well on my PC. I'll make sure to provide detailed comments in the pull request in the future.

Thanks for the changes, I'll study those.

@milancurcic (Member)

Thank you, @Spnetic-5, and no worries. I apologize for jumping the gun and finishing the implementation in this PR.

Going forward, would you like to take a shot at continuing the work in #139, or would you like to implement another optimizer in the quadratic fit example program? Recall that once we implement #139 for SGD, the new optimizers in quadratic will serve as prototype implementations to be ported into the library.

@Spnetic-5 (Collaborator, Author)

Thank you, @milancurcic. I would like to work on #137 first. Once we have completed that, we can move on to #139 and then additional new optimizers.
