-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: nanoGPT implementation using Reactant #1062
base: main
Are you sure you want to change the base?
Conversation
844a274
to
d2e7299
Compare
d2e7299
to
f4504bc
Compare
f4504bc
to
9b97721
Compare
|
||
train_loader = DataLoader( | ||
(trainX, trainY); batchsize, shuffle=true, parallel=true | ||
) |> dev |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
) |> dev | |
) .|> dev |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this needed? broadcasting over the dataloader will avoid constructing the DeviceIterator
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Atleast, on GPU without this the labels weren't being transferred to the device
end | ||
|
||
test_loss = loss_fn( | ||
Array(first(model_compiled(testX, ps, Lux.testmode(st)))), testY |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Array(first(model_compiled(testX, ps, Lux.testmode(st)))), testY | |
first(model_compiled(testX, ps, Lux.testmode(st))), testY |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to compile the loss_fn for the non-Array
version to work
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yeah this might not be needed for your case here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot reviewed 3 out of 4 changed files in this pull request and generated no suggestions.
Files not reviewed (1)
- examples/NanoGPT/main.jl: Language not supported
Needs
NNlib.gather
support in Reactant: feat: partial NNlib.gather support + better indexing support EnzymeAD/Reactant.jl#252NNlib.make_causal_mask
: feat: add support for NNlib make causal mask EnzymeAD/Reactant.jl#249dynamic_gather
xref Jmp/scatter gather EnzymeAD/Enzyme-JAX#142