Configuring localGPT for production #472
Unanswered
AnandMoorthy asked this question in Q&A
Replies: 3 comments
-
@PromtEngineer Need your input on this!
-
@PromtEngineer My team and I are running into the same issue.
-
@AnandMoorthy @matheus-mondaini We will need to implement a queue in the API to handle multiple users; it should be relatively easy to implement. I will have a look. Getting back to this project soon.
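A minimal sketch of the queue idea mentioned above: a single background worker drains a `queue.Queue` so that GPU-bound inference calls are served one at a time, no matter how many Flask request threads arrive concurrently. This is an illustration, not localGPT's actual code; `run_inference` is a hypothetical stand-in for the real model call, and the class name is invented here.

```python
import queue
import threading
from concurrent.futures import Future

class InferenceQueue:
    """Serialize calls to a GPU-bound function behind one worker thread."""

    def __init__(self, run_inference):
        # run_inference: placeholder for the real (non-thread-safe) model call.
        self._run = run_inference
        self._jobs = queue.Queue()
        self._worker = threading.Thread(target=self._loop, daemon=True)
        self._worker.start()

    def _loop(self):
        # Only this thread ever touches the model, so requests never
        # overlap on the GPU.
        while True:
            prompt, fut = self._jobs.get()
            try:
                fut.set_result(self._run(prompt))
            except Exception as exc:
                fut.set_exception(exc)
            finally:
                self._jobs.task_done()

    def submit(self, prompt, timeout=None):
        # Called from each Flask request handler; blocks until this
        # request's turn on the GPU has completed.
        fut = Future()
        self._jobs.put((prompt, fut))
        return fut.result(timeout=timeout)

if __name__ == "__main__":
    # Example with a fake model in place of the real one.
    q = InferenceQueue(lambda p: f"answer to {p!r}")
    print(q.submit("hello"))
```

In a Flask handler this would look like `answer = inference_queue.submit(request.json["prompt"], timeout=120)`, so concurrent users wait their turn instead of crashing the app; a timeout keeps clients from hanging forever if the queue backs up.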
-
Hi,
I am planning to deploy the project to production and expect around 10 people to use it concurrently. My current setup is an RTX 4090 with 24 GB of memory. The Flask app works fine when a single user is using localGPT, but it crashes when multiple requests come in at the same time.
I also see GPU utilization hit 100% whenever a request comes in. Is there any way, with the current configuration, to serve around 10 people concurrently? Other suggestions are welcome too :)
Thanks!