Is your feature request related to a problem? Please describe.
Streaming mode currently doesn't support phased ranking. This makes it tricky to run inference efficiently with more expensive models, e.g. ColBERT MaxSim.
Describe the solution you'd like
For streaming mode to support phased ranking in the same way as indexing mode, or (if not possible within the design) an alternative approach that achieves something similar.
Describe alternatives you've considered
Using conditional logic to determine whether to run inference:
function myFunction() {
    expression: if (cheapExpression > cutoff, cheapExpression, expensiveExpression)
}
Additional context
It's possible I've overlooked some existing features and the use case I'm describing is already doable within the current design.
With indexed search, second-phase is usually preferable because it runs locally in parallel on each content node, and as fan-out increases this becomes important to achieve parallelism and avoid network saturation. However, with streaming, fan-out is close to 1 on average regardless of the size of the content cluster (since queries are only routed to the buckets having content for that user/group), so global-phase performs well.
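To make the idea of phased ranking concrete, here is a minimal Python sketch of the general pattern: score every candidate with a cheap function, then apply the expensive function only to the top `rerank_count` hits. This is not Vespa's implementation. The names `phased_rank`, `cheap`, `expensive`, and `rerank_count` are hypothetical and chosen only for illustration, and the sketch ignores details such as how a real engine reconciles scores between phases.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def phased_rank(
    docs: List[T],
    cheap: Callable[[T], float],
    expensive: Callable[[T], float],
    rerank_count: int,
) -> List[T]:
    """Illustrative two-phase ranking (hypothetical helper, not a Vespa API)."""
    # First phase: order all candidates by the cheap score, best first.
    first_phase = sorted(docs, key=cheap, reverse=True)

    # Split into the hits that get reranked and the remainder.
    head = first_phase[:rerank_count]
    tail = first_phase[rerank_count:]

    # Later phase: the expensive function runs only on the head,
    # so its cost is bounded by rerank_count rather than len(docs).
    reranked = sorted(head, key=expensive, reverse=True)

    # Reranked hits stay ahead of the hits that were never reranked.
    return reranked + tail
```

With this structure the expensive model is evaluated at most `rerank_count` times per query, regardless of how many documents matched. That bound is what the conditional-expression workaround in the issue tries to approximate inside a single expression.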