Ability to expose how loaded waitress is #182
No, waitress at this point in time doesn't make any of that sort of information available.
As a Prometheus and Waitress user this would be great! Obviously waitress should not depend on
Waitress will currently log when there are more requests in the queue than there are threads to process said requests. It is logged under
It might not be a terrible idea to allow people to interrogate for this info via a signal handler and stdout/stderr and let people build on that.
It would be a great idea. I would also suggest adding tracebacks for all threads.
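To make the signal-handler idea concrete, here is a minimal sketch, not existing waitress functionality: a SIGUSR1 handler that dumps every thread's traceback and the dispatcher queue depth to stderr. It assumes `server` is the instance returned by `waitress.server.create_server()`, and `task_dispatcher.queue` is an internal attribute that may change between releases.

```python
import faulthandler
import signal
import sys

def install_debug_handler(server):
    """Dump thread tracebacks and queue depth to stderr on SIGUSR1."""
    def dump_state(signum, frame):
        # Stack traces for all live threads.
        faulthandler.dump_traceback(file=sys.stderr)
        # Depth of waitress's internal task queue (not a public API).
        print(
            f"waitress task queue length: {len(server.task_dispatcher.queue)}",
            file=sys.stderr,
        )
    signal.signal(signal.SIGUSR1, dump_state)
```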
How about making
Is it safe to check the queue like this?

```python
d = Gauge('waitress_task_queue_length', 'Waitress task queue length')
d.set_function(lambda: len(server.task_dispatcher.queue))
```

How to get the number of connections? The
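For context, a fuller version of that sketch might look like the following. It is only a sketch: it leans on the internal `task_dispatcher.queue` attribute and serves the metrics from a separate `prometheus_client` HTTP server on its own port.

```python
from prometheus_client import Gauge, start_http_server
from waitress.server import create_server

app = ...  # your WSGI application

# Build the server by hand so we keep a reference to it.
server = create_server(app, host="0.0.0.0", port=8080)

queue_gauge = Gauge("waitress_task_queue_length", "Waitress task queue length")
# set_function makes the gauge read the value lazily at scrape time.
queue_gauge.set_function(lambda: len(server.task_dispatcher.queue))

# Serve /metrics on a separate port so scrapes keep working even when
# the main listener is saturated.
start_http_server(9100)

server.run()
```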
Using Prometheus to monitor the queue is not a viable solution, because the

Maybe using statsd is the better solution.
It doesn't seem like there is consensus on the technology to use to report metrics. One way to tackle the problem could be to not pick any technology, but instead provide a set of signals users could use. By signals I mean something like blinker signals, not Unix signals.

Waitress would only need to define a set of signals like "new connection", "started task", "finished task"... and users would handle these signals if they want to derive metrics useful to them. Popular integrations like statsd or Prometheus could be created in a contrib part of waitress, or even in a separate repository, because signals would act as an API between core waitress and the integration.

This is the approach used by Flask, Django, Celery and many more to provide developers a way to integrate with and extend the frameworks. For a very down-to-earth example of how this could be used to provide metrics, see how I used signals in a project of mine to create an integration with Datadog.
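As an illustration of that idea, here is a rough sketch of what blinker-based signals could look like. The signal names are hypothetical; waitress does not define any of these today.

```python
from blinker import signal

# Hypothetical signals that waitress could emit at the right places.
connection_opened = signal("waitress.connection_opened")
connection_closed = signal("waitress.connection_closed")
task_queued = signal("waitress.task_queued")
task_finished = signal("waitress.task_finished")

# User-side subscriber deriving a metric from the events.
active_connections = 0

@connection_opened.connect
def _on_open(sender, **extra):
    global active_connections
    active_connections += 1

@connection_closed.connect
def _on_close(sender, **extra):
    global active_connections
    active_connections -= 1
```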
Here is a simple monkey patch solution I have used: https://gist.github.com/pauls-username/56744f5bd43b531a348bb59c36541cd7

There are probably more metrics that you could add using this method. The key is getting the server instance. Maybe adding some sort of a hook to get the server instance (without using monkey patching) would be a good official solution?
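The gist itself is linked above; in broad strokes the trick is something like the following sketch, which wraps `waitress.server.create_server` so monitoring code can get hold of the server instance. Depending on how your code imports and calls waitress, the patch may need to target a different reference (for example `waitress.create_server`), so treat this as an assumption to verify rather than an official hook.

```python
import waitress.server

_servers = []
_original_create_server = waitress.server.create_server

def _capturing_create_server(*args, **kwargs):
    # Delegate to the real factory, but remember the instance so that
    # monitoring code can read e.g. server.task_dispatcher.queue later.
    server = _original_create_server(*args, **kwargs)
    _servers.append(server)
    return server

waitress.server.create_server = _capturing_create_server
```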
We really should have an optional export of open metrics for Waitress. That said, the question of whether to serve the

I like the monkeypatch code sample a lot, but — seconding what the previous commenter said — the code should also start a separate HTTP server using the Prometheus routines to serve

Without metrics, Waitress is unobservable, and that does not bode well for production.
Practically, this is not a problem. Monitoring is not for the point in time where the server doesn't serve anymore; by then, you know that already. That's the reason you are looking at the monitoring data in the first place. Monitoring data is an indicator of what may be the reason and how fast you got there.

Yes, the

In most implementations it doesn't matter if it's in the same HTTP server or a different one. They consume the same resources. The overhead of a different process is probably higher. When serving it in the same process, Python's GIL may play a role.

So my personal assessment is: metrics consuming a small part of the resources are better than having no metrics at all.
Agreed on the assessment — note that serving on a different port allows metrics to still be collected if the main server is overloaded in the specific sense that the main port has an enormous queue.
I don't quite understand why this is the case. Can't Prometheus expose a format like that? I'm thinking of https://prometheus.io/docs/tutorials/understanding_metric_types/#gauge
FYI you can create and serve Prometheus metrics without depending on any library. They're simple lines of text your program serves on a port.
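A minimal sketch of that point, using only the standard library: the Prometheus text exposition format is just lines of `name value` served over HTTP. `get_queue_length()` is a placeholder for however you actually read the value.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

def get_queue_length():
    return 0  # placeholder: read the real value from your server instance

class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = (
            "# HELP waitress_task_queue_length Tasks waiting for a worker thread\n"
            "# TYPE waitress_task_queue_length gauge\n"
            f"waitress_task_queue_length {get_queue_length()}\n"
        ).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

# Serve the metrics on a side port, in a daemon thread.
threading.Thread(
    target=HTTPServer(("0.0.0.0", 9100), MetricsHandler).serve_forever,
    daemon=True,
).start()
```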
It is viable; you just have to serve them on a different port, possibly with a priority queue, or maybe on a separate thread entirely. For the record, this is standard practice at Google: you don't serve your production traffic on the same port as your metrics traffic, because otherwise you don't get any metrics when you are swamped in production. Different buses for different purposes.
That's push, and therefore it's pretty hard to get right (there are many corner cases with push-based solutions, especially on dynamic service stacks). You miss metrics sometimes, sometimes you don't know that you are supposed to be getting metrics, et cetera. With systems like Prometheus, if the service isn't giving you metrics, you will get a metric that will fire an alert telling you that the service is down, absolutely no exceptions.
FYI I am using Prometheus today to monitor Plone (waitress-based) servers, and this works quite well for obtaining Zope metrics (of course, until the server is swamped, but then I also have Varnish and Nginx metrics to tell me the other aspects of the story).
waitress is interested in exposing a hook feature that folks could tie into Prometheus or whatever library/system they want, but the issue has been (in order of priority, highest to lowest):
I don't have a particular opinion on issues 2 and 3, but on issue 1 I think we need at least:
if not more, and in histogram form.

On the Prometheus side this is usually accomplished by directly embedding calls to the respective metrics accumulators in opportune places within the code. With Python being a dynamic language, it should be possible to conditionally import and set up e.g. the Prometheus library if it imports successfully. I understand that waitress the project does not want to gain this dependency, though. But the number of hooks that would need to be added for implementers of waitress-based services can be daunting as well, and it's not necessarily going to lead to more code clarity. Hooks may just be too low-level.

Perhaps there is a way to implement this all using Zope-style adapters / interfaces, have waitress call the right methods of such code at the right times, and then the implementer can register the right adapter class (Prometheus / statsd / collectd), which in turn concretely implements listening on a socket and serving
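To sketch what that could mean in practice (the names here are hypothetical, not an existing waitress API): waitress would call a small, neutral interface at the right times, and the implementer registers whichever concrete adapter they want.

```python
class MetricsAdapter:
    """Neutral interface waitress could call; all methods are optional no-ops."""
    def connection_opened(self): ...
    def connection_closed(self): ...
    def task_queued(self, queue_length): ...
    def task_finished(self, duration_seconds): ...

class PrometheusAdapter(MetricsAdapter):
    def __init__(self, port=9100):
        # Imported lazily so waitress itself never needs the dependency.
        from prometheus_client import Counter, Histogram, start_http_server
        self.tasks = Counter("waitress_tasks_total", "Tasks processed")
        self.duration = Histogram(
            "waitress_task_duration_seconds", "Task duration in seconds"
        )
        start_http_server(port)  # serves /metrics on its own socket

    def task_finished(self, duration_seconds):
        self.tasks.inc()
        self.duration.observe(duration_seconds)
```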
I really think hooks/signals is a great way to solve the problem at hand.
But there are dozens of monitoring libraries and products. I get that Prometheus is popular right now, but so are Datadog and statsd. Obviously waitress cannot support all of that in the long run. So it is better to expose low-level hooks and let others use them to create extensions for each monitoring product.
I don't think it is that hard of a task; as @mmerickel said, most of the HTTP metrics should really be computed by WSGI middleware. Waitress should really focus on the low-level stuff that is not accessible anywhere else.
With hooks, waitress doesn't really have to compute any metrics. Instead it lets the subscriber know that something happened (e.g. new connection in queue, connection out of queue, ...). The subscriber is then responsible for computing the metrics it wants from that information.
I don't want to beat a dead horse, but signals, à la blinker, solve this problem quite elegantly by decoupling publisher from subscriber. Moreover, it is a common pattern used by tons of high-profile libraries (Flask, Django, Celery...) that most Python developers are already familiar with.
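Building on the hypothetical signals sketched earlier, an integration for another backend would just be a different subscriber. For example, a statsd subscriber (using the `statsd` package; the signal names are still hypothetical) could look like this:

```python
from blinker import signal
from statsd import StatsClient

# Same hypothetical signals as before; waitress does not define these today.
task_queued = signal("waitress.task_queued")
task_finished = signal("waitress.task_finished")

statsd = StatsClient("localhost", 8125, prefix="waitress")

@task_queued.connect
def _on_queued(sender, **extra):
    statsd.incr("tasks.queued")

@task_finished.connect
def _on_finished(sender, duration_seconds=0.0, **extra):
    statsd.timing("tasks.duration", duration_seconds * 1000)  # statsd expects ms
```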
I get that. For signals, there are two solutions:
I would favor that approach too.
Maintainers, if you are willing to entertain the idea of solving this problem via signals, I can get started on a PR to show what it would look like.
@NicolasLM I'd love it if you got into this. The conversation above is correct that it impacts quite a lot of the codebase... if you wanted to propose the general API first, that would be great for getting buy-in before you have to get too deep into things. My primary concerns are:
Basically all of those points are related to the API, what's global vs per-server, and limiting the global part. :-) Maybe it's as simple as defining an interface and then passing an adapter into

I'm not sure what prior art exists here in other WSGI servers; I think aiohttp had some when I looked a while ago.
It just occurred to me that I would also like to have the connection count metric (as well as the queue count), simply because that lets me do clean blue/green deployment — I won't take down a waitress-based backend (which has already been removed from the frontend proxy) until the connection count drops to 1.
If the goal is to keep waitress free from external dependencies, then the approach in romank0#1 can be used. That implementation is far from generic; it is mostly there to demonstrate the approach and to make it possible to discuss options for how to add the ability to monitor without coupling to a specific provider. It defines an interface that the waitress user can implement to collect metrics if they need to.

If it is OK to introduce dependencies, I would use the OpenTelemetry API: waitress would depend on the opentelemetry metrics API, and users could plug in their favourite collectors.
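To show what the OpenTelemetry option might look like, here is a rough sketch using only the opentelemetry-api package (without a configured SDK these calls are no-ops). The `server` reference and the `task_dispatcher.queue` attribute are assumptions, as in the earlier sketches.

```python
from opentelemetry import metrics
from opentelemetry.metrics import CallbackOptions, Observation

server = ...  # the instance returned by waitress.server.create_server()

meter = metrics.get_meter("waitress")

def _observe_queue(options: CallbackOptions):
    # task_dispatcher.queue is an internal attribute, not a public API.
    yield Observation(len(server.task_dispatcher.queue))

queue_gauge = meter.create_observable_gauge(
    "waitress.task_queue_length",
    callbacks=[_observe_queue],
    description="Tasks waiting for a worker thread",
)
```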
I would vote for this as a preferable solution.
For anyone who might need it, I repurposed this monkeypatch solution to use a simple class that exposes the metrics to me wherever I need them: https://github.com/Haidra-Org/AI-Horde/blob/1e2629a73ddbfa873a61b5ae849b19662b4c591d/server.py#L25
Is it possible to grab Waitress metrics (how many channels currently active, how many threads active, how many TCP connections open, for example)?
Is this currently possible? If not, I think it'd be a good feature :-)
(My plan is to shove them in prometheus_client, which can compute averages, monitor for high watermark breaches and trigger Kubernetes pod scaling, etc.)