Hi! Loving the textbook so far :) I've encountered a minor issue though in the chapter 3 section Choosing a single token from the probability distribution (sampling / decoding)...
When I run lm_head_output.shape I get an output shape of [1, 5, 32064], whereas the source code and the textbook state that it should be [1, 6, 32064]. I'm not sure why there's a difference — I've kept all the preceding code the same...
Interestingly, running the next line of code returns the expected output ("Paris"):
token_id = lm_head_output[0, -1].argmax(-1)
tokenizer.decode(token_id)
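The indexing above can be sketched with dummy data (NumPy as a stand-in for the torch tensor; the shapes are taken from the post, the values are made up):

```python
import numpy as np

# Dummy logits: batch=1, sequence length=5, vocab size=32064
# (shapes from the post; the actual values here are random placeholders)
rng = np.random.default_rng(0)
lm_head_output = rng.standard_normal((1, 5, 32064))

# [0, -1] selects the logits for the LAST input position of the first
# batch item; argmax(-1) then picks the highest-scoring vocabulary id.
token_id = lm_head_output[0, -1].argmax(-1)

print(lm_head_output.shape)  # (1, 5, 32064)
print(int(token_id))         # some id in [0, 32064)
```

Because `[0, -1]` always selects the last position, the greedy next-token prediction is the same whether the sequence length is 5 or 6 — which would explain why "Paris" still comes out despite the shape mismatch.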