
Made Vocabulary's properties be initialized only ONCE on creation #1110

Open · wants to merge 5 commits into master
Conversation

@Lyrcaxis (Contributor) commented Feb 26, 2025

Minor QOL improvement to the Vocabulary class, with a slight performance increase.
The reason: I'm worried about llama.cpp internal deadlocks/memory corruption, based on an issue reported by @dpmm99.
[screenshot of the reported issue]
I'm not 100% sure if this will solve the issue, but it's one less thing to worry about. Maybe we can narrow it down little by little.
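For readers skimming the diff, the pattern is roughly the following. This is only a sketch: aside from llama_model_get_vocab, llama_vocab_eos and Normalize, which appear in the benchmark code further down, every name here is a placeholder, not the actual LLamaSharp API.

```csharp
public sealed class Vocabulary
{
    // Read once from native code in the constructor; every later access
    // is a plain managed field read, with no P/Invoke call per property get.
    public LLamaToken? EOS { get; }

    public unsafe Vocabulary(SafeLlamaModelHandle model)
    {
        var vocab = llama_model_get_vocab(model);
        EOS = Normalize(LLamaVocabNative.llama_vocab_eos(vocab));
        // ... likewise for BOS, the EOG/control token sets,
        // and the token-to-string table, all built exactly once here.
    }
}
```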

@martindevans (Member)

@dpmm99 does this fix (or at least improve) the issue you were talking about in Discord?

@Lyrcaxis force-pushed the vocabulary-minor-QOL-improvement branch from c982047 to 8b2b7cc on February 26, 2025 at 15:07
@martindevans (Member) left a comment

Looks good 👍

@dpmm99 (Contributor) commented Feb 27, 2025

@dpmm99 does this fix (or at least improve) the issue you were talking about in Discord?

Negative.

@martindevans (Member)

Since this didn't fix the bug shall we close this PR?

@Lyrcaxis (Contributor, Author) commented Mar 16, 2025

Since this didn't fix the bug shall we close this PR?

I think it’s still a decent improvement — potentially huge performance-wise depending on use case (e.g.: batching).

But I can see how this could be viewed as memory duplication, although it's quite minimal (~7MB, assuming ~128k tokens mapping to 5-character strings on average). The Dictionary<LLamaToken, string> actually isn't needed if it won't be public for users, so it could also be deleted to bring the footprint under 1KB -- but I think something like TokenToString deserves to exist and be public.
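The ~7MB figure roughly checks out under standard .NET sizing assumptions (about 22 bytes of header/length overhead per x64 string plus 2 bytes per char, and roughly 24 bytes per Dictionary entry; these are rules of thumb, not measurements):

```csharp
// Back-of-envelope check of the ~7MB estimate (assumptions, not measurements).
const int tokens = 128_000;
const int avgChars = 5;
long stringBytes = tokens * (22L + 2 * avgChars); // ~4.1 MB of string objects
long dictBytes   = tokens * 24L;                  // ~3.1 MB of dictionary entries
Console.WriteLine($"~{(stringBytes + dictBytes) / 1_000_000.0:F1} MB"); // ~7.2 MB
```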

Ultimately your call. Let me know.

@martindevans (Member)

Oh, if it's a large performance improvement as well, let's merge it! I thought it was just a potential bugfix. Do you have any benchmarks for how much of a gain it is?

@Lyrcaxis (Contributor, Author) commented Mar 16, 2025

These are some quick benchmarks from a Release build (Ctrl+F5); timings are in milliseconds:

Starting test with 1000000 iterations.

Testing EOS access
EOS access - PR's way: 1
EOS access - Current way: 10

Testing Vocab access (a bit unfair but QOL)
Single token to string - PR's way: 23
Single token to string - Current way: 175

Testing IsEOG
EOG Test - PR's way: 1
EOG Test - Current way: 17

The test has ended

The tests were done with this code:

// Placed in Vocabulary class
public void RunTest() {
    const int totalIters = 1_000_000;
    Console.WriteLine($"Starting test with {totalIters} iterations.");
    Console.WriteLine("\nTesting EOS access");
    var sw = Stopwatch.StartNew();
    for (int i = 0; i < totalIters; i++) { var x = EOS; }
    sw.Stop();
    Console.WriteLine($"EOS access - PR's way: {sw.ElapsedMilliseconds}");
    unsafe {
        var _vocabNative = llama_model_get_vocab(_model);
        sw.Restart();
        for (int i = 0; i < totalIters; i++) { var x = Normalize(LLamaVocabNative.llama_vocab_eos(_vocabNative)); }
        sw.Stop();
        Console.WriteLine($"EOS access - Current way: {sw.ElapsedMilliseconds}");
    }

    Console.WriteLine("\nTesting Vocab access");
    var decoder = new StreamingTokenDecoder(Encoding.UTF8, _model);
    var llamaToken = (LLamaToken) 42;
    sw.Restart();
    for (int i = 0; i < totalIters; i++) { var x = this.TokenToString[llamaToken]; }
    sw.Stop();
    Console.WriteLine($"Single token to string - PR's way: {sw.ElapsedMilliseconds}");
    sw.Restart();
    for (int i = 0; i < totalIters; i++) { decoder.Add(llamaToken); var x = decoder.Read(); }
    sw.Stop();
    Console.WriteLine($"Single token to string - Current way: {sw.ElapsedMilliseconds}");

    Console.WriteLine("\nTesting IsEOG");
    sw.Restart();
    for (int i = 0; i < totalIters; i++) { var x = EOGTokens.Contains((int) llamaToken); }
    sw.Stop();
    Console.WriteLine($"EOG Test - PR's way: {sw.ElapsedMilliseconds}");
    unsafe {
        sw.Restart();
        for (int i = 0; i < totalIters; i++) { var x = LLamaVocabNative.llama_vocab_is_eog(VocabNative, llamaToken); }
        sw.Stop();
    }
    Console.WriteLine($"EOG Test - Current way: {sw.ElapsedMilliseconds}");

    // Test accuracy across the whole vocabulary, one token id at a time
    unsafe {
        for (int i = 0; i < Count; i++) {
            var token = (LLamaToken) i;
            decoder.Add(token);
            Debug.Assert(this.TokenToString[token] == decoder.Read());
            Debug.Assert(LLamaVocabNative.llama_vocab_is_eog(VocabNative, token) == EOGTokens.Contains(i));
            Debug.Assert(LLamaVocabNative.llama_vocab_is_control(VocabNative, token) == ControlTokens.Contains(i));
        }
    }
    }
    Console.WriteLine($"\nThe test has ended");
}

TokenToString's performance could be improved further by using a List<string> instead.
But the main motivation for exposing it is QOL, not performance (it cannot replace StreamingTokenDecoder).
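A sketch of that List-based alternative (illustrative names only; it assumes token ids are dense and non-negative, so they can index an array directly instead of being hashed):

```csharp
// Built once in the constructor; lookup is a bounds-checked array index, no hashing.
private readonly string[] _tokenToString;

public string TokenToString(LLamaToken token) => _tokenToString[(int) token];
```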

@Lyrcaxis (Contributor, Author) commented Mar 16, 2025

Btw, all tests passed on my PC, even after updating from upstream/master.

[screenshot of test results]
