You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Hi
Great project, i really like how fast things are moving here!
We have the following scenario: Multiple servers with GPUs in the same network, but they are "unstable". Some are used for other tasks while others are idling, i changes from time to time.
Creating a VM (or a physical machine, if necessary) outside this network is easy, but it won't have GPUs. Also, they can't directly reach each other.
So the primary/secondary architecture other projects use is quite usefull: primary runns all the time, GPUs join and leave. It would be great to have the possibility to specify a primary- or non-gpu-node which stays in charge (or stays online, for the matter) and have the other hosts find each other automatically as they already do with UDP.
Do you think something like this could be feasable?
Chris
The text was updated successfully, but these errors were encountered:
Hi Great project, i really like how fast things are moving here!
We have the following scenario: Multiple servers with GPUs in the same network, but they are "unstable". Some are used for other tasks while others are idling, i changes from time to time.
Creating a VM (or a physical machine, if necessary) outside this network is easy, but it won't have GPUs. Also, they can't directly reach each other.
So the primary/secondary architecture other projects use is quite usefull: primary runns all the time, GPUs join and leave. It would be great to have the possibility to specify a primary- or non-gpu-node which stays in charge (or stays online, for the matter) and have the other hosts find each other automatically as they already do with UDP.
Do you think something like this could be feasable?
Chris
If I understood your use case correctly, this should already work out of the box.
There's no "master" node in exo.
You would just use the endpoint from the "primary" node you described.
Any other nodes that join/leave will automatically have an effect on the topology.
Hi
Great project, i really like how fast things are moving here!
We have the following scenario: Multiple servers with GPUs in the same network, but they are "unstable". Some are used for other tasks while others are idling, i changes from time to time.
Creating a VM (or a physical machine, if necessary) outside this network is easy, but it won't have GPUs. Also, they can't directly reach each other.
So the primary/secondary architecture other projects use is quite usefull: primary runns all the time, GPUs join and leave. It would be great to have the possibility to specify a primary- or non-gpu-node which stays in charge (or stays online, for the matter) and have the other hosts find each other automatically as they already do with UDP.
Do you think something like this could be feasable?
Chris
The text was updated successfully, but these errors were encountered: