HF Phi3-mini-128k returns very different gradients than reference #1441
Labels: high priority; huggingface (for supporting HF models); thunderfx (for things that could be applicable to the dynamo+thunder frontend)
🐛 Bug
When testing HF Phi3, the gradients differ from the reference by several orders of magnitude.
To Reproduce
Steps to reproduce the behavior:
Check out the test-hf-phi3 branch.
Run: pytest thunder/tests/test_networks.py -k phi3
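To put a number on "several orders of magnitude", one option is to compare gradient norms per parameter. The sketch below is not taken from the test suite. The helper name `grad_magnitude_gap` and the sample values are hypothetical, and it assumes the gradients have already been exported to plain lists of floats (for example with `p.grad.flatten().tolist()` on each parameter of the reference and the compiled model).

```python
import math


def grad_magnitude_gap(ref_grads, test_grads):
    """Return {name: log10(||test|| / ||ref||)} for parameters present in both dicts.

    A value near 0 means the gradient norms agree; a value of 3 means the
    tested gradient norm is about 1000x the reference gradient norm.
    """
    gaps = {}
    for name, ref in ref_grads.items():
        if name not in test_grads:
            continue  # skip parameters that only exist on one side
        ref_norm = math.sqrt(sum(x * x for x in ref))
        test_norm = math.sqrt(sum(x * x for x in test_grads[name]))
        if ref_norm == 0.0 and test_norm == 0.0:
            gaps[name] = 0.0  # both gradients are exactly zero
        elif ref_norm == 0.0:
            gaps[name] = math.inf  # reference is zero, tested gradient is not
        elif test_norm == 0.0:
            gaps[name] = -math.inf  # tested gradient vanished entirely
        else:
            gaps[name] = math.log10(test_norm / ref_norm)
    return gaps


# Made-up values for illustration only, not data from this issue.
ref = {"w": [1.0, 2.0], "b": [0.5]}
test = {"w": [1000.0, 2000.0], "b": [0.5]}
for name, gap in grad_magnitude_gap(ref, test).items():
    print(f"{name}: {gap:+.2f} orders of magnitude")
```

Sorting the result by absolute gap would point at the parameters (and therefore the layers) where the divergence is largest, which may help narrow down where the mismatch first appears.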
Environment
Container 20241114 with Thunder at test-phi3@4c71eaa4f15028f94910e365ce6c3894769578a5
Additional context
This is part of #1278.
cc @apaz-cli