Made Vocabulary's properties be initialized only ONCE on creation #1110
Conversation
@dpmm99 does this fix (or at least improve) the issue you were talking about in Discord?
(force-pushed from c982047 to 8b2b7cc)
Looks good 👍
Negative.
Since this didn't fix the bug, shall we close this PR?
I think it’s still a decent improvement — potentially a huge one performance-wise, depending on the use case (e.g. batching). I can see how this could be viewed as memory duplication, but it's quite minimal (~7MB, assuming ~128k tokens mapping to strings averaging 5 chars). Ultimately it's your call. Let me know.
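For reference, the ~7MB figure above can be sanity-checked with a back-of-envelope calculation. This is just an illustrative sketch; the per-object overhead constants are rough assumptions about the .NET runtime, not measured values from this PR:

```csharp
// Rough estimate of the extra memory a cached token-to-string map costs.
// Assumptions (not from the PR): ~128k vocab entries averaging 5 chars each,
// ~26 bytes of .NET string object overhead plus 2 bytes per UTF-16 char,
// and ~30 bytes per Dictionary<int, string> entry.
const int vocabSize = 128_000;
const int avgChars = 5;
const int stringOverhead = 26;
const int dictEntryOverhead = 30;

long bytes = (long)vocabSize * (stringOverhead + 2 * avgChars + dictEntryOverhead);
// Roughly 8 MB under these assumptions — the same ballpark as the ~7MB quoted above.
Console.WriteLine($"~{bytes / (1024.0 * 1024.0):F1} MB");
```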
Oh, if it's a large performance improvement as well, let's merge it! I thought it was just a potential bugfix. Do you have any benchmarks for how much of a gain it is?
These are some quick benchmarks from release (ctrl+F5):
The tests were done with this code:

```csharp
// Placed in the Vocabulary class. Requires System.Diagnostics (Stopwatch,
// Debug) and System.Text (Encoding).
public void RunTest()
{
    const int totalIters = 1_000_000;
    Console.WriteLine($"Starting test with {totalIters} iterations.");

    Console.WriteLine("\nTesting EOS access");
    var sw = Stopwatch.StartNew();
    for (int i = 0; i < totalIters; i++) { var x = EOS; }
    sw.Stop();
    Console.WriteLine($"EOS access - PR's way: {sw.ElapsedMilliseconds}");
    unsafe
    {
        var _vocabNative = llama_model_get_vocab(_model);
        sw.Restart();
        for (int i = 0; i < totalIters; i++) { var x = Normalize(LLamaVocabNative.llama_vocab_eos(_vocabNative)); }
        sw.Stop();
        Console.WriteLine($"EOS access - Current way: {sw.ElapsedMilliseconds}");
    }

    Console.WriteLine("\nTesting Vocab access");
    var decoder = new StreamingTokenDecoder(Encoding.UTF8, _model);
    var llamaToken = (LLamaToken)42;
    sw.Restart();
    for (int i = 0; i < totalIters; i++) { var x = this.TokenToString[llamaToken]; }
    sw.Stop();
    Console.WriteLine($"Single token to string - PR's way: {sw.ElapsedMilliseconds}");
    sw.Restart();
    for (int i = 0; i < totalIters; i++) { decoder.Add(llamaToken); var x = decoder.Read(); }
    sw.Stop();
    Console.WriteLine($"Single token to string - Current way: {sw.ElapsedMilliseconds}");

    Console.WriteLine("\nTesting IsEOG");
    sw.Restart();
    for (int i = 0; i < totalIters; i++) { var x = EOGTokens.Contains((int)llamaToken); }
    sw.Stop();
    Console.WriteLine($"EOG Test - PR's way: {sw.ElapsedMilliseconds}");
    unsafe
    {
        sw.Restart();
        for (int i = 0; i < totalIters; i++) { var x = LLamaVocabNative.llama_vocab_is_eog(VocabNative, llamaToken); }
        sw.Stop();
    }
    Console.WriteLine($"EOG Test - Current way: {sw.ElapsedMilliseconds}");

    // Accuracy check: compare the cached data against the native calls for
    // every token. (Iterating over i exercises the whole vocabulary; the
    // original loop reused the fixed token 42 on every pass.)
    unsafe
    {
        for (int i = 0; i < Count; i++)
        {
            var token = (LLamaToken)i;
            decoder.Add(token);
            Debug.Assert(this.TokenToString[token] == decoder.Read());
            Debug.Assert(LLamaVocabNative.llama_vocab_is_eog(VocabNative, token) == EOGTokens.Contains((int)token));
            Debug.Assert(LLamaVocabNative.llama_vocab_is_control(VocabNative, token) == ControlTokens.Contains((int)token));
        }
    }

    Console.WriteLine("\nThe test has ended");
}
```
Minor QOL improvement on the `Vocabulary` class with a slight performance increase. The motivation was that I'm worried about llama.cpp internal deadlocks/memory corruption, based on an issue reported by @dpmm99.
I'm not 100% sure this will solve that issue, but it's one less thing to worry about. Maybe we can narrow it down little by little.
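The change boils down to a simple pattern: read the native vocabulary once at construction and serve every later property access from cached managed state, instead of making an interop call per access. A minimal, self-contained sketch of that pattern (the names are illustrative only, not the actual LLamaSharp API; a plain dictionary stands in for the native vocabulary):

```csharp
using System;
using System.Collections.Generic;

sealed class CachedVocabulary
{
    private readonly int _eos;
    private readonly Dictionary<int, string> _tokenToString;

    // In the real class this would call into llama.cpp exactly once;
    // here the "native" source is just a managed dictionary.
    public CachedVocabulary(int eosToken, Dictionary<int, string> nativeSource)
    {
        _eos = eosToken;                                            // read once
        _tokenToString = new Dictionary<int, string>(nativeSource); // copied once
    }

    // Subsequent accesses are plain field/dictionary reads: no native
    // call, no marshalling, and no re-entry into llama.cpp internals.
    public int EOS => _eos;
    public string TokenToString(int token) => _tokenToString[token];
}
```

The trade-off discussed above follows directly from this: the copy costs a one-time allocation (the ~7MB estimate), but it removes every per-access round trip into native code, which also sidesteps whatever llama.cpp-internal state those calls touch.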