Row Locking
Some relevant Prisma issues to track:
Also, until the above are implemented, the official recommendation without raw queries.
-
As the software stands today, would you consider self-hosting on fly.io to fall into this category of long-running server? I'm specifically wondering about the approach in the context of a Remix app running on top of Express (Kent C. Dodds' Epic Stack). I understand that fly.io sits in a hybrid area between server and serverless, with the option to autoscale.
-
Long-running servers likely also mean long-running tasks that don't get offloaded to the platform. Simple timeouts probably won't cut it. There's no reason why you shouldn't be able to run your 10h task, if you so choose.
Health checks could be done in addition, so the platform hears when a worker is healthy (or rather, notices when it stops hearing from it). Workers with active runs / tasks could be required to "check in" on an interval.
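To make the check-in idea concrete, here is a minimal TypeScript sketch of how a platform might spot runs whose worker has gone quiet. Everything in it is hypothetical: `CheckInTracker`, the method names, and the 30-second window are assumptions for illustration, not part of Trigger.dev.

```typescript
// Hypothetical check-in tracker; none of these names are Trigger.dev APIs.
class CheckInTracker {
  private readonly lastSeen = new Map<string, number>();
  private readonly timeoutMs: number;

  constructor(timeoutMs: number) {
    this.timeoutMs = timeoutMs;
  }

  // Called by a worker on an interval for each run it is still executing.
  checkIn(runId: string, nowMs: number): void {
    this.lastSeen.set(runId, nowMs);
  }

  // Returns the runs whose last check-in is older than the allowed window.
  stalled(nowMs: number): string[] {
    return Array.from(this.lastSeen.entries())
      .filter(([, seen]) => nowMs - seen > this.timeoutMs)
      .map(([runId]) => runId);
  }
}

const tracker = new CheckInTracker(30000); // assumed 30s window
tracker.checkIn("run_long", 0);
tracker.checkIn("run_dead", 0);
tracker.checkIn("run_long", 25000); // still alive, so its window is extended
const stalledRuns = tracker.stalled(40000);
```

With this shape a 10h task is never cut off by a fixed timeout; it only counts as failed once its worker stops checking in.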
-
Any news about this topic? I guess not, but is there any ETA? Thanks!
-
CMIIW, but does this mean that I cannot use Trigger.dev for a self-hosted Next.js app?
-
I do not think a pull-based solution solves the listed issues:
An HTTP backlog / queue is a problem, but I would argue it is less of an issue than the other points raised, especially cleaning up after run execution. Pulling itself could also be implemented in Prometheus Pushgateway fashion, with a middleman process accepting pushes from trigger.dev and offloading tasks to a queue. This should allow the current, much simpler, serverless-first architecture to be preserved.
-
I'm closing this as we have long-running server support in version 3. Learn more and get early access: https://trigger.dev/blog/v3-developer-preview-launch/
-
I wanted to write this GitHub discussion to think through (and get feedback on) supporting Long-running servers on Trigger.dev.
Currently, to use Trigger.dev, you need to deploy your code to a serverless platform (think Next.js on Vercel). And the way we "invoke" jobs in this setup is through an HTTP request to the exposed route, for example in Next.js:
Internally the `TriggerClient` routes the "invoke job" request to the job's `run` function and returns the results in the response.
This architecture works in serverless deployments for several key reasons:
Why Not HTTP on Long-Running Servers?
This architecture would not work for long-running servers in a Node.js environment for a few reasons:
Push vs Pull
Basically, what it boils down to is the difference between a push vs. a pull based queue.
In a push-based queue, messages or tasks are automatically sent (pushed) to consumers as they arrive in the queue. The consumers don't have to request or check for new messages; they're simply pushed to them.
In a pull-based queue, consumers request (pull) messages or tasks from the queue when they are ready to process them. This means the consumers have to actively check the queue for new items.
In other words, a push-based queue works best for Serverless deployments, since code won't even start running unless a serverless function is invoked.
And a pull-based queue is best for long-running server deployments, since code is always (hopefully!) running and able to ask for new messages from the queue to process.
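As a rough illustration of the distinction above, the following TypeScript sketch models both queue styles in memory. The class and method names (`PushQueue`, `PullQueue`, `acquire`) are made up for this example and are not Trigger.dev APIs.

```typescript
// Hypothetical in-memory model; none of these names are Trigger.dev APIs.
type Task = { id: string };

// Push: the queue owns delivery and calls the consumer as soon as a task arrives.
class PushQueue {
  private readonly deliver: (task: Task) => void;

  constructor(deliver: (task: Task) => void) {
    this.deliver = deliver;
  }

  enqueue(task: Task): void {
    // Comparable to an HTTP request that wakes a serverless function.
    this.deliver(task);
  }
}

// Pull: the queue only stores tasks; a consumer asks for one when it has capacity.
class PullQueue {
  private readonly pending: Task[] = [];

  enqueue(task: Task): void {
    this.pending.push(task);
  }

  // Returns the oldest waiting task, or undefined if the queue is empty.
  acquire(): Task | undefined {
    return this.pending.shift();
  }
}

const pushed: string[] = [];
const pushQueue = new PushQueue((task) => {
  pushed.push(task.id);
});
pushQueue.enqueue({ id: "run_1" }); // delivered immediately

const pullQueue = new PullQueue();
pullQueue.enqueue({ id: "run_2" }); // waits until a worker asks for it
const pulled = pullQueue.acquire();
```

The difference that matters is who decides when work starts: in the push model the queue does, and in the pull model the worker does, so a long-running server can hold off on taking new work until it has capacity.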
Adding pull-based queue mode to Trigger.dev
This means we need to add a pull-based queue mode to Trigger.dev to support long-running servers.
Client code
I'm proposing the following sketch of an API for the client-code side of the pull-based mode:
This change from the `createAppRoute` approach in the pull-based mode gives a very clear indication to the user that this is a different approach to running jobs with Trigger.dev, in addition to it being provided in the `@trigger.dev/nodejs` package.
The `createWorkerPool` function would be responsible for asking the Trigger.dev platform for new "messages" to process in a way that works best for long-running servers. Some prior art here would be things like how Graphile Worker is designed.
Platform changes
The bulk of the work for implementing this feature would be in the platform changes, and I'm going to start to document them here but this is not exhaustive and will require ongoing updates:
The `Endpoint` model will need to be updated to support multiple modes (e.g. pull and push). And does it still make sense to call it that?
"Pulling" new runs
There are a couple of ways we could implement the communication layer for "pulling" new runs in pull mode:
(`POST /api/v1/endpoints/<endpoint slug>/runs/acquire`)
Irrespective of which we choose above, we will still need to implement the pull-based queue semantics in the app & database layer to ensure only a single worker is processing any one run at a time, with the least amount of db contention possible. This will probably be the meat of the development of this feature, and may require a system that utilizes PostgreSQL `SKIP LOCKED`, similar to how Graphile Worker works.
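The "single worker per run" requirement could be sketched as follows. This is an in-memory TypeScript model of the semantics only. The SQL in the comment shows the general shape of a `FOR UPDATE SKIP LOCKED` claim query, but the `runs` table, its columns, and the `RunStore` class are assumptions for illustration, not the Trigger.dev schema or implementation.

```typescript
// Hypothetical model of "acquire" semantics. The SQL below uses assumed
// table and column names, not the real Trigger.dev schema:
//
//   UPDATE runs SET status = 'RUNNING', worker_id = $1
//   WHERE id = (
//     SELECT id FROM runs
//     WHERE status = 'PENDING'
//     ORDER BY created_at
//     LIMIT 1
//     FOR UPDATE SKIP LOCKED
//   )
//   RETURNING id;

type Run = { id: string; status: "PENDING" | "RUNNING"; workerId?: string };

class RunStore {
  // Rows currently locked by some other in-flight transaction.
  private readonly locked = new Set<string>();
  private readonly runs: Run[];

  constructor(runs: Run[]) {
    this.runs = runs;
  }

  // Simulates another transaction holding a row lock on this run.
  lock(id: string): void {
    this.locked.add(id);
  }

  // Claims the oldest pending run that is not locked, skipping locked rows
  // instead of waiting on them (the behaviour SKIP LOCKED provides).
  acquire(workerId: string): Run | undefined {
    const run = this.runs.find(
      (r) => r.status === "PENDING" && !this.locked.has(r.id)
    );
    if (!run) return undefined;
    run.status = "RUNNING";
    run.workerId = workerId;
    return run;
  }
}

const store = new RunStore([
  { id: "run_1", status: "PENDING" },
  { id: "run_2", status: "PENDING" },
]);
store.lock("run_1"); // worker A's transaction is mid-claim on run_1
const claimedByB = store.acquire("worker_b"); // skips run_1 and takes run_2
const nothingLeft = store.acquire("worker_c"); // run_1 is locked, run_2 is taken
```

Skipping locked rows rather than blocking on them is what keeps contention low: concurrent workers each move on to the next available run instead of queueing behind the same row.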
Other considerations
Another thing to be aware of as we develop this feature is if we do ever add things like interactive webhook delivery or HTTP triggers, we will need to support HTTP handlers even for long-running servers.
Also, #400 will probably have some aspects of a pull-based queue, so these techniques will need to be developed for either feature.