
Multi-User Chat Generation #53

Closed
kevinthedang opened this issue Apr 21, 2024 · 6 comments · Fixed by #83

Comments

@kevinthedang
Owner

Issue

  • When multiple people want to talk with the bot at the same time, it does not handle their requests asynchronously. Refer to the image below.
  • This applies to both streaming and non-streaming generation from the Chat Stream Integration PR (#52) and to both messaging styles.

[image]

Solution

  • Implement a mechanism that allows the bot client to handle multiple generations asynchronously (see the sketch after this list).
  • Implement a proxy server to handle multiple Ollama containers (refer to notes).
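
A minimal sketch of the asynchronous idea, assuming the bot reaches Ollama's REST API over HTTP at a placeholder host with a placeholder model name: two users' prompts are awaited together so neither blocks the other.

```ts
// Minimal sketch: handle two users' prompts concurrently against one Ollama host.
// OLLAMA_HOST and the model name are placeholders, not the bot's actual config.
const OLLAMA_HOST = process.env.OLLAMA_HOST ?? 'http://localhost:11434'

async function chatOnce(user: string, prompt: string): Promise<string> {
  const response = await fetch(`${OLLAMA_HOST}/api/chat`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model: 'llama3',   // placeholder model name
      stream: false,     // non-streaming kept simple here; streaming works the same per request
      messages: [{ role: 'user', content: prompt }],
    }),
  })
  const data = await response.json()
  return `${user}: ${data.message.content}`
}

// Both requests are in flight at once, so one user's generation does not block the other's.
const replies = await Promise.all([
  chatOnce('userA', 'Why is the sky blue?'),
  chatOnce('userB', 'Write a haiku about Discord bots.'),
])
console.log(replies)
```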

Notes

  • This could be problematic, as multiple instances of Ollama might be required to make this work if the method does not include "streaming."
  • Streaming delivers the generation incrementally, so while one response is still completing server-side, the bot can start working on another user's request.
    • This also needs to be handled as part of this issue.
  • A "proxy server" might be needed to make this happen.
    • Note: we will need to know the number of Ollama containers on startup (see the sketch below).
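
A rough sketch of the proxy idea, assuming the container URLs are supplied at startup through a hypothetical OLLAMA_HOSTS variable; the names and the round-robin strategy are illustrative, not the project's actual design.

```ts
// Rough sketch: round-robin chat requests across Ollama containers known at startup.
// OLLAMA_HOSTS is an assumed comma-separated list, e.g. "http://ollama1:11434,http://ollama2:11434".
const hosts = (process.env.OLLAMA_HOSTS ?? 'http://localhost:11434').split(',')
let next = 0

// Pick the next container in the rotation.
function pickHost(): string {
  const host = hosts[next]
  next = (next + 1) % hosts.length
  return host
}

// Forward one chat request to whichever container is up next.
async function forwardChat(prompt: string): Promise<string> {
  const res = await fetch(`${pickHost()}/api/chat`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model: 'llama3', // placeholder model name
      stream: false,
      messages: [{ role: 'user', content: prompt }],
    }),
  })
  const data = await res.json()
  return data.message.content
}
```

With the built-in concurrency that later Ollama releases provide (see the comments below), a single container may be enough; the rotation only shows where the container count known at startup would be consumed.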
@kevinthedang kevinthedang added the enhancement New feature or request label Apr 21, 2024
@kevinthedang
Owner Author

Looks like as of May, the developers made it possible to support this feature (refer to the references below).

Another possibility, mentioned back on Jan. 31st, is still viable and I believe was noted above: the use of proxies can stay on the table if needed.

References

@kevinthedang kevinthedang self-assigned this Jul 9, 2024
@kevinthedang
Owner Author

With Ollama v0.2.0, concurrency and parallel generation are possible for the bot.

https://github.com/ollama/ollama/releases/tag/v0.2.0

@kevinthedang
Owner Author

Likely no implementation is needed, but that will need to be tested. This can probably be closed after #82 is resolved.

@JT2M0L3Y

@kevinthedang
Owner Author

Looks like concurrency works as intended out of the box.

Discord:
[image]

Logging of the two conversations generating simultaneously:
[image]

This can be closed along with #82 now.

@kevinthedang kevinthedang linked a pull request Jul 11, 2024 that will close this issue
@kevinthedang
Owner Author

kevinthedang commented Aug 3, 2024

Something I did not read about initially when 0.2.0 was released: we might need some kind of implementation that allows users to select:

  1. OLLAMA_MAX_LOADED_MODELS - how many models are allowed to be loaded at the same time.
  2. OLLAMA_NUM_PARALLEL - how many concurrent requests each model will process.

This might be an issue we should create as a new feature. Possibly done through Slash Commands? (See the sketch at the end of this comment.)

  • Note: Concurrency is automatic as noted, but I have not yet looked into the maximums for it.

Thread Reference: Parallel Requests
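
A hypothetical sketch of the Slash Command idea using discord.js; the command and option names are assumptions, and since both settings are environment variables read by the Ollama server, any values collected this way would only take effect after restarting the server with them set.

```ts
import { SlashCommandBuilder } from 'discord.js'

// Hypothetical command for surfacing the two Ollama concurrency settings.
// Applying them still means restarting the Ollama server with the env vars set,
// e.g. OLLAMA_MAX_LOADED_MODELS=2 OLLAMA_NUM_PARALLEL=4 ollama serve
export const ollamaConcurrency = new SlashCommandBuilder()
  .setName('ollama-concurrency')
  .setDescription('Choose Ollama concurrency limits (applied on the next server restart)')
  .addIntegerOption(option =>
    option
      .setName('max-loaded-models') // maps to OLLAMA_MAX_LOADED_MODELS
      .setDescription('How many models may be loaded at the same time')
      .setMinValue(1))
  .addIntegerOption(option =>
    option
      .setName('num-parallel') // maps to OLLAMA_NUM_PARALLEL
      .setDescription('How many concurrent requests each model will process')
      .setMinValue(1))
```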

@JT2M0L3Y
