Speed-up calls to LLM by parallelization of the topic categorization #11
Comments
@jucor thanks for sharing and pointing out the relevant code locations. We're definitely interested in speeding this up, and are looking to implement parallelization, likely after our current sprint to tackle hallucinations. There are Vertex quota limits, so we'll need to do some rate limiting ourselves or find a helper library. We did recently create a helper function resolvePromisesInParallel for summarization; this could be a good starting point for categorization.
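For readers who haven't seen that helper, the general pattern is a concurrency-limited resolver. A minimal sketch, assuming tasks are passed in as thunks so that work only starts when launched (the actual resolvePromisesInParallel in the repo may differ in name, signature, and details):

```ts
// Minimal sketch of a concurrency-limited resolver, not the repo's real helper.
async function resolveInBatches<T>(
  tasks: (() => Promise<T>)[],
  concurrency: number
): Promise<T[]> {
  const results: T[] = [];
  for (let i = 0; i < tasks.length; i += concurrency) {
    // Launch `concurrency` tasks at a time and wait for all of them before
    // starting the next group, which keeps us under the Vertex quota.
    const group = tasks.slice(i, i + concurrency).map((task) => task());
    results.push(...(await Promise.all(group)));
  }
  return results;
}
```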
Nice, thanks! I know very little about TypeScript (mostly a Python guy) and its …
Hi @jucor, I just submitted this commit that parallelizes this loop. For a test set of 300 comments it brought the categorization time down from 3.58 minutes to 1.4 minutes. This uses the resolvePromisesInParallel function that @dborkan mentioned, with the default of 2 parallel calls at a time, which is what the Vertex models allow on the free tier.
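To make the shape of the change concrete, here is a hypothetical sketch of how such a helper could be applied to the categorization mini-batches. categorizeBatch, Comment, and Topic are placeholders, not the repository's actual API, and resolveInBatches refers to the sketch above:

```ts
// Hypothetical illustration of the parallelized categorization loop.
// categorizeBatch, Comment, and Topic are placeholders for the real types.
interface Comment { id: string; text: string; topics?: string[] }
interface Topic { name: string; subtopics?: Topic[] }
declare function categorizeBatch(batch: Comment[], topics: Topic[]): Promise<Comment[]>;

const CONCURRENCY = 2; // free-tier Vertex models allow 2 concurrent requests

async function categorizeAll(batches: Comment[][], topics: Topic[]): Promise<Comment[]> {
  // Build one thunk per mini-batch, then resolve them two at a time.
  const tasks = batches.map((batch) => () => categorizeBatch(batch, topics));
  const perBatch = await resolveInBatches(tasks, CONCURRENCY);
  return perBatch.flat();
}
```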
Woohoo! Amazing! A 2.5x speedup! Thanks a lot team! 🚀🎉
Oh, and that's a super interesting observation in the comments of the commit: sensemaking-tools/src/sensemaker.ts, lines 223 to 225 in 44600ff.
I'm super excited to hear discussion of the distribution, and the factors that affect it! That's closely linked to the evals discussed in compdemocracy/polis#1866 and compdemocracy/polis#1878! Could you say a bit more about what you observed, please, @alyssachvasta? And @akonya, is this change of distribution depending on the batch size something you observed too in your own LLM experiments, please?
For a batch size of one comment per LLM call, I found that the model was 50% more likely to categorize a comment under multiple topics/subtopics than when 100 comments were categorized at once. The current categorization behavior is quite good, so I didn't want to make that change. In the future I may come back to it with some additional prompt changes and other tweaks, but only if I can ensure the categorization behavior stays the same.
It's great you've observed that behavior. For me that would be a reason to dig a little more into it, to double-check how robust the results are. In theory, if we were using a regular classifier, then conditionally on the topics, the classification of each comment should be independent of the others, and thus independent of the batch size or the batch content. The way I would suggest investigating the robustness (which would also quantify the "quite good" behaviour of the current categorization) would be, for a fixed set of comments and a fixed set of topics, to run the categorization for several batch sizes, several times for each batch size, and look even just at the histograms of the count of comments per topic (or per subtopic). That would also allow us to diagnose at a glance whether there is any change of categorization behaviour, and provide a useful tool for debugging if there is. What do you think?
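To make that concrete, here is a hypothetical sketch of such a sweep; categorize() is a stand-in for the library's real categorization call, and the return shapes are assumptions for illustration only:

```ts
// Hypothetical sketch of the robustness check: categorize the same fixed set
// of comments at several batch sizes, several runs per size, and tally how
// many comments land in each topic. categorize() is a placeholder.
declare function categorize(
  comments: string[],
  topics: string[],
  batchSize: number
): Promise<Map<string, string[]>>; // comment -> assigned topics

async function topicHistograms(
  comments: string[],
  topics: string[],
  batchSizes: number[],
  runsPerSize: number
): Promise<Map<number, Map<string, number>[]>> {
  const histograms = new Map<number, Map<string, number>[]>();
  for (const batchSize of batchSizes) {
    const runs: Map<string, number>[] = [];
    for (let run = 0; run < runsPerSize; run++) {
      const assignments = await categorize(comments, topics, batchSize);
      // Count how many comments were assigned to each topic in this run.
      const counts = new Map<string, number>();
      for (const topic of topics) counts.set(topic, 0);
      for (const assigned of assignments.values()) {
        for (const topic of assigned) {
          counts.set(topic, (counts.get(topic) ?? 0) + 1);
        }
      }
      runs.push(counts);
    }
    histograms.set(batchSize, runs);
  }
  return histograms;
}
```

Plotting the per-topic counts for each batch size, even as simple histograms, should make any batch-size dependence visible at a glance.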
For reference, our current benchmark on unparallelized code with 318 comments: https://github.com/compdemocracy/polis/actions/runs/12589605245/job/35089707403#step:9:752. At 17 minutes, a 2.5x speedup would still be over 6 minutes, which would work for a microservice that runs periodically but is still above the thresholds for real-time users over HTTP; just flagging that.
@jucor -- Yep, we've seen similar batch-dependent effects in our LLM tagger. As batch size increases -- i.e. more comments being topic-tagged per prompt -- we saw fewer tags per comment as well as some degradation in general tag accuracy. We focus on two levers to optimize the quality and speed tradeoff: model size and batch size. Optimal would obviously be the biggest/best model, with a batch size of 1, run fully in parallel. But bigger models from 3rd-party providers have more aggressive throttling limits (at least for us), so you get a tradeoff between speed and quality that is mediated by batch size. Smaller models are way faster and have less aggressive throttling, but quality can be lower in general (and the degradation with batch size ramps up much quicker). This creates different speed-quality Pareto curves for different models. So you get the best possible speed-quality Pareto curve if you make model choice an optimization parameter. Currently, using a mid-size off-the-shelf model with a batch size of 10 seems to be an OK sweet spot. But this changes as new models are released and as we graduate to higher rate limits. The batch-size degradation effects generally seem to kick in as you go above 10.
Amazing, thanks @akonya for sharing your experience. This is super valuable. |
We actually have a collection of human-tagged datasets we could use to do this, if you're interested in the super rigorous thing. There are 5 different datasets, each with 300-500 statements, ground-truth topic and subtopic taxonomies, and per-statement tags done by experts.
Dear Jigsaw team,
As discussed by email, it would be really helpful if the library could run faster. The topic learning is very fast, but the categorization could do with being faster. In our IRL conversations I remember you mentioned that you definitely had this in mind, so I'm just adding it here as a follow-up.
When looking at the categorization code, I suspect you were probably thinking of parallelizing the calls across the mini-batches, i.e. this loop:
sensemaking-tools/src/sensemaker.ts, lines 217 to 235 in 8eb482e
Parallelizing this loop seems like the highest-leverage option: the least amount of work needed for the maximum return.
Of course, as you also pointed out, there's the question of whether Vertex will throttle the requests. Does Vertex offer an async caller which automatically respects its throttling limits? That would be neat :)
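I'm not aware of such a caller shipping with Vertex, but as a sketch of what client-side throttling with an off-the-shelf helper could look like (assuming the p-limit npm package; callVertexOnBatch is a made-up placeholder, not the SDK's API):

```ts
// Illustrative only: client-side throttling with the p-limit npm package.
// callVertexOnBatch is a placeholder for whatever actually issues the request.
import pLimit from "p-limit";

declare function callVertexOnBatch(batch: string[]): Promise<string[]>;

const limit = pLimit(2); // cap concurrent requests to stay under the quota

async function categorizeWithThrottle(batches: string[][]): Promise<string[][]> {
  // All batches are queued up front, but at most two are in flight at a time.
  return Promise.all(batches.map((batch) => limit(() => callVertexOnBatch(batch))));
}
```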
Thanks!