src/SequenceTokenizers.jl: Handle indexes out of range which can occur due to zero masking for padding.
mashu committed Aug 31, 2024
1 parent 0ac7304 commit 3a56dfc
Showing 2 changed files with 9 additions and 3 deletions.
2 changes: 1 addition & 1 deletion Project.toml
@@ -1,7 +1,7 @@
name = "SequenceTokenizers"
uuid = "782fbe24-d740-44eb-8e9f-f0831b1cb5e5"
authors = ["Mateusz Kaduk <[email protected]>"]
-version = "0.0.4"
+version = "0.0.5"

[deps]
OneHotArrays = "0b1bfda6-eb8a-41d2-88d8-f5af5cad476f"
10 changes: 8 additions & 2 deletions src/SequenceTokenizers.jl
@@ -178,8 +178,14 @@ module SequenceTokenizers
println(tokenizer(1)) # Output: 'x' (unknown token)
```
"""
-@inline (tokenizer::SequenceTokenizer)(idx::Integer) = tokenizer.alphabet[idx]

+@inline function (tokenizer::SequenceTokenizer)(idx::Integer)
+    if 1 <= idx <= length(tokenizer.alphabet)
+        return tokenizer.alphabet[idx]
+    else
+        return tokenizer.unksym
+    end
+end

"""
(tokenizer::SequenceTokenizer{T})(input::AbstractString) where T
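The effect of the bounds check in this diff can be tried in isolation with a minimal sketch. `ToyTokenizer` below is a hypothetical stand-in, not the package's `SequenceTokenizer`; it models only the two fields the diff touches (`alphabet` and `unksym`), and placing the unknown symbol first in the alphabet is an assumption based on the docstring example.

```julia
# Hypothetical stand-in type; only the fields used in the diff are modeled.
struct ToyTokenizer{T}
    alphabet::Vector{T}
    unksym::T
end

# Same bounds check as the patched method: any index outside
# 1:length(alphabet), including the padding index 0, decodes to `unksym`.
function (tokenizer::ToyTokenizer)(idx::Integer)
    if 1 <= idx <= length(tokenizer.alphabet)
        return tokenizer.alphabet[idx]
    else
        return tokenizer.unksym
    end
end

# Assumed layout: the unknown symbol 'x' sits at index 1.
tok = ToyTokenizer(['x', 'a', 'b', 'c'], 'x')

# A zero-padded index sequence, as zero masking for padding might produce.
padded = [2, 3, 4, 0, 0]
decoded = [tok(i) for i in padded]
```

With the one-line method that the commit removes, `tok(0)` would raise a `BoundsError`, because Julia arrays are 1-indexed; with the check, padding positions decode to the unknown symbol.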

2 comments on commit 3a56dfc

@mashu (Owner, Author) commented on 3a56dfc Aug 31, 2024

@JuliaRegistrator register

Release notes:

Handle transforming back zero tokens, which should be decoded to the unknown symbol instead of raising an indexing error for index zero.

@JuliaRegistrator

Registration pull request created: JuliaRegistries/General/114235

Tagging

After the above pull request is merged, it is recommended that a tag is created on this repository for the registered package version.

This will be done automatically if the Julia TagBot GitHub Action is installed, or can be done manually through the GitHub interface, or via:

git tag -a v0.0.5 -m "<description of version>" 3a56dfc12340c30de6fc95a8d475cd1ebf81ea93
git push origin v0.0.5
