feat: nanoGPT implementation using Reactant #1062

Draft
avik-pal wants to merge 3 commits into main from ap/nanogpt_reactant
Conversation

@avik-pal (Member) commented Nov 9, 2024:

Needs

error: could not compute the adjoint for this operation %51 = "stablehlo.dynamic_gather"(%15, %0, %12) <{dimension_numbers = #stablehlo.gather<offset_dims = [0], collapsed_slice_dims = [1], start_index_map = [1], index_vector_dim = 1>}> : (tensor<64x64xf32>, tensor<64xi64>, tensor<2xi64>) -> tensor<64x64xf32>
error: could not compute the adjoint for this operation %47 = "stablehlo.dynamic_gather"(%14, %46, %12) <{dimension_numbers = #stablehlo.gather<offset_dims = [0], collapsed_slice_dims = [1], start_index_map = [1], index_vector_dim = 1>}> : (tensor<64x67xf32>, tensor<8192xi64>, tensor<2xi64>) -> tensor<64x8192xf32>
ERROR: LoadError: "failed to run pass manager on module"
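
For context, here is a hedged sketch (all names are illustrative, not taken from the PR) of the kind of indexing that lowers to stablehlo.dynamic_gather: an embedding-style lookup of matrix columns by an integer index vector. The forward pass traces; the errors above come from the gradient pass, which cannot derive an adjoint for that op.

using Reactant

embedding = Reactant.to_rarray(rand(Float32, 64, 64))   # (embed_dim, vocab_size)
tokens    = Reactant.to_rarray(rand(1:64, 64))          # integer token ids

# Column lookup with a traced integer vector lowers to `stablehlo.dynamic_gather`.
lookup(E, idx) = E[:, idx]

lookup_compiled = @compile lookup(embedding, tokens)    # forward pass compiles
lookup_compiled(embedding, tokens)                      # 64×64 result, as in the error above
# Requesting a reverse-mode gradient through `lookup` during training is where
# the "could not compute the adjoint" failure would surface.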

@avik-pal marked this pull request as draft on November 9, 2024 02:19
examples/NanoGPT/main.jl review thread (outdated)
@avik-pal force-pushed the ap/nanogpt_reactant branch 4 times, most recently from 844a274 to d2e7299 on November 15, 2024 02:58
@avik-pal force-pushed the ap/nanogpt_reactant branch from d2e7299 to f4504bc on November 16, 2024 03:33
@avik-pal force-pushed the ap/nanogpt_reactant branch from f4504bc to 9b97721 on November 16, 2024 18:53

train_loader = DataLoader(
    (trainX, trainY); batchsize, shuffle=true, parallel=true
) |> dev


Suggested change:
- ) |> dev
+ ) .|> dev

@avik-pal (Member, Author) replied:

Why is this needed? Broadcasting over the dataloader will avoid constructing the DeviceIterator.

Reply:

At least on GPU, without this the labels weren't being transferred to the device.
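
For context, a hedged sketch of the two variants under discussion, using dummy data and assuming the usual MLDataDevices semantics (this is not code from the PR):

using MLUtils, MLDataDevices

dev = gpu_device()                 # assumed stand-in; the PR targets a Reactant/GPU device
trainX = rand(Float32, 64, 256)    # dummy feature array
trainY = rand(Float32, 64, 256)    # dummy label array
batchsize = 32

loader = DataLoader((trainX, trainY); batchsize, shuffle=true, parallel=true)

# `loader |> dev` wraps the DataLoader in a DeviceIterator: each (x, y) batch
# is moved to the device lazily, as it is iterated.
train_loader_lazy = loader |> dev

# `loader .|> dev` broadcasts `dev` over the batches instead: every (x, y)
# pair is transferred up front (fixing the shuffle order) and no
# DeviceIterator is constructed.
train_loader_eager = loader .|> dev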

end

test_loss = loss_fn(
    Array(first(model_compiled(testX, ps, Lux.testmode(st)))), testY


Suggested change:
- Array(first(model_compiled(testX, ps, Lux.testmode(st)))), testY
+ first(model_compiled(testX, ps, Lux.testmode(st))), testY

@avik-pal (Member, Author) replied:

We need to compile the loss_fn for the non-Array version to work

Reply:

Ah yeah, this might not be needed for your case here.
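
For reference, a hedged sketch of what compiling the loss end to end could look like here; the eval_loss wrapper is hypothetical and the other names are reused from the snippet above, so this is not the PR's actual code:

# Hypothetical: compile the forward pass and the loss together, so the loss
# consumes device arrays directly and the Array(...) host copy is unnecessary.
function eval_loss(model, x, y, ps, st)
    ŷ, _ = model(x, ps, Lux.testmode(st))   # Lux call: returns (output, new_state)
    return loss_fn(ŷ, y)
end

eval_loss_compiled = @compile eval_loss(model, testX, testY, ps, st)
test_loss = eval_loss_compiled(model, testX, testY, ps, st)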

@avik-pal requested a review from Copilot on December 11, 2024 14:23


Copilot reviewed 3 out of 4 changed files in this pull request and generated no suggestions.

Files not reviewed (1)
  • examples/NanoGPT/main.jl: Language not supported