
Made Vocabulary's properties be initialized only ONCE on creation #1110

Open · wants to merge 5 commits into master
Conversation

@Lyrcaxis (Contributor) commented Feb 26, 2025

Minor QOL improvement to the Vocabulary class, with a slight performance increase.
The reason: I'm worried about llama.cpp internal deadlocks/memory corruption, based on an issue reported by @dpmm99.
[screenshot of the reported issue]
I'm not 100% sure if this will solve the issue, but it's one less thing to worry about. Maybe we can narrow it down little by little.
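For readers skimming the diff, the pattern is roughly the following. This is only a sketch: aside from llama_model_get_vocab, llama_vocab_eos and Normalize, which appear in the benchmark code further down, every name here is a placeholder, not the actual LLamaSharp API.

```csharp
public sealed class Vocabulary
{
    // Read once from native code in the constructor; every later access
    // is a plain managed field read, with no P/Invoke call per property get.
    public LLamaToken? EOS { get; }

    public unsafe Vocabulary(SafeLlamaModelHandle model)
    {
        var vocab = llama_model_get_vocab(model);
        EOS = Normalize(LLamaVocabNative.llama_vocab_eos(vocab));
        // ... likewise for BOS, the EOG/control token sets,
        // and the token-to-string table, all built exactly once here.
    }
}
```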

@martindevans (Member)

@dpmm99 does this fix (or at least improve) the issue you were talking about in Discord?

@Lyrcaxis force-pushed the vocabulary-minor-QOL-improvement branch from c982047 to 8b2b7cc on February 26, 2025 at 15:07
@martindevans (Member) left a comment

Looks good 👍

@dpmm99 (Contributor) commented Feb 27, 2025

@dpmm99 does this fix (or at least improve) the issue you were talking about in Discord?

Negative.

@martindevans (Member)

Since this didn't fix the bug shall we close this PR?

@Lyrcaxis (Contributor, Author) commented Mar 16, 2025

Since this didn't fix the bug shall we close this PR?

I think it’s still a decent improvement — potentially huge performance-wise depending on use case (e.g.: batching).

But I can see how this could be viewed as memory duplication, although it's quite minimal (~7MB, assuming ~128k tokens mapping to 5-character strings on average). The Dictionary<LLamaToken, string> actually isn't needed if it won't be public for users, so it could also be deleted to bring the footprint under 1KB -- but I think something like TokenToString deserves to exist and be public.
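The ~7MB figure roughly checks out under standard .NET sizing assumptions (about 22 bytes of header/length overhead per x64 string plus 2 bytes per char, and roughly 24 bytes per Dictionary entry; these are rules of thumb, not measurements):

```csharp
// Back-of-envelope check of the ~7MB estimate (assumptions, not measurements).
const int tokens = 128_000;
const int avgChars = 5;
long stringBytes = tokens * (22L + 2 * avgChars); // ~4.1 MB of string objects
long dictBytes   = tokens * 24L;                  // ~3.1 MB of dictionary entries
Console.WriteLine($"~{(stringBytes + dictBytes) / 1_000_000.0:F1} MB"); // ~7.2 MB
```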

Ultimately your call. Let me know.

@martindevans (Member)

Oh, if it's a large performance improvement as well, let's merge it! I thought it was just a potential bugfix. Do you have any benchmarks for how much of a gain it is?

@Lyrcaxis (Contributor, Author) commented Mar 16, 2025

These are some quick benchmarks from a Release build (Ctrl+F5); timings are in milliseconds:

Starting test with 1000000 iterations.

Testing EOS access
EOS access - PR's way: 1
EOS access - Current way: 10

Testing Vocab access (a bit unfair but QOL)
Single token to string - PR's way: 23
Single token to string - Current way: 175

Testing IsEOG
EOG Test - PR's way: 1
EOG Test - Current way: 17

The test has ended

The tests were done with this code:

// Placed in Vocabulary class
public void RunTest() {
    const int totalIters = 1_000_000;
    Console.WriteLine($"Starting test with {totalIters} iterations.");
    Console.WriteLine("\nTesting EOS access");
    var sw = Stopwatch.StartNew();
    for (int i = 0; i < totalIters; i++) { var x = EOS; }
    sw.Stop();
    Console.WriteLine($"EOS access - PR's way: {sw.ElapsedMilliseconds}");
    unsafe {
        var _vocabNative = llama_model_get_vocab(_model);
        sw.Restart();
        for (int i = 0; i < totalIters; i++) { var x = Normalize(LLamaVocabNative.llama_vocab_eos(_vocabNative)); }
        sw.Stop();
        Console.WriteLine($"EOS access - Current way: {sw.ElapsedMilliseconds}");
    }

    Console.WriteLine("\nTesting Vocab access");
    var decoder = new StreamingTokenDecoder(Encoding.UTF8, _model);
    var llamaToken = (LLamaToken) 42;
    sw.Restart();
    for (int i = 0; i < totalIters; i++) { var x = this.TokenToString[llamaToken]; }
    sw.Stop();
    Console.WriteLine($"Single token to string - PR's way: {sw.ElapsedMilliseconds}");
    sw.Restart();
    for (int i = 0; i < totalIters; i++) { decoder.Add(llamaToken); var x = decoder.Read(); }
    sw.Stop();
    Console.WriteLine($"Single token to string - Current way: {sw.ElapsedMilliseconds}");

    Console.WriteLine("\nTesting IsEOG");
    sw.Restart();
    for (int i = 0; i < totalIters; i++) { var x = EOGTokens.Contains((int) llamaToken); }
    sw.Stop();
    Console.WriteLine($"EOG Test - PR's way: {sw.ElapsedMilliseconds}");
    unsafe {
        sw.Restart();
        for (int i = 0; i < totalIters; i++) { var x = LLamaVocabNative.llama_vocab_is_eog(VocabNative, llamaToken); }
        sw.Stop();
    }
    Console.WriteLine($"EOG Test - Current way: {sw.ElapsedMilliseconds}");

    // Test accuracy across the whole vocabulary, one token id at a time
    unsafe {
        for (int i = 0; i < Count; i++) {
            var token = (LLamaToken) i;
            decoder.Add(token);
            Debug.Assert(this.TokenToString[token] == decoder.Read());
            Debug.Assert(LLamaVocabNative.llama_vocab_is_eog(VocabNative, token) == EOGTokens.Contains(i));
            Debug.Assert(LLamaVocabNative.llama_vocab_is_control(VocabNative, token) == ControlTokens.Contains(i));
        }
    }
    }
    Console.WriteLine($"\nThe test has ended");
}

TokenToString's performance could be improved further by using a List<string> instead.
But the main motivation for exposing it is QOL, not performance (it cannot replace StreamingTokenDecoder).
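A sketch of that List-based alternative (illustrative names only; it assumes token ids are dense and non-negative, so they can index an array directly instead of being hashed):

```csharp
// Built once in the constructor; lookup is a bounds-checked array index, no hashing.
private readonly string[] _tokenToString;

public string TokenToString(LLamaToken token) => _tokenToString[(int) token];
```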

@Lyrcaxis (Contributor, Author) commented Mar 16, 2025

Btw, all tests passed on my PC, even after updating from upstream/master.

[screenshot of test results]
