Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add denylist to stats logger #25118

Closed
wants to merge 1 commit into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 8 additions & 0 deletions docs/docs/installation/event-logging.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -56,5 +56,13 @@ from superset.stats_logger import StatsdStatsLogger
STATS_LOGGER = StatsdStatsLogger(host='localhost', port=8125, prefix='superset')
```

You can also pass metric names that should be ignored via an optional `metric_denylist` parameter:

```
STATS_LOGGER = StatsdStatsLogger(host='localhost', port=8125, prefix='superset', metric_denylist={"reports.scheduler"})

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it's better to promote metrics_denylist to be a config that can be overridden.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

something like

Suggested change
STATS_LOGGER = StatsdStatsLogger(host='localhost', port=8125, prefix='superset', metric_denylist={"reports.scheduler"})
STATS_LOGGER_METRICS_DENYLIST: Set[str] = {}
STATS_LOGGER = StatsdStatsLogger(host='localhost', port=8125, prefix='superset', metrics_denylist=STATS_LOGGER_METRICS_DENYLIST)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't that be the same as just overwriting the STATS_LOGGER with different arguments?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zephyring or @craig-rueda mentioned a good point that there's no guarantee that a metric name might not change. What do you all think about having an enum of strings that we can reference instead? It will have to be maintained and updated of course, but at least then we can export it and refer to the enum in the config rather than the string.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If promoting to be a config, we can just override the config and use the default StatsdStatsLogger logger.
If later on there's any default in the denylist that will be easier to reference and merge/override also

```

This would log all metrics, except `reports.scheduler`.

Note that it’s also possible to implement you own logger by deriving
`superset.stats_logger.BaseStatsLogger`.
Empty file added superset/models/actor.py
Empty file.
62 changes: 43 additions & 19 deletions superset/stats_logger.py
Original file line number Diff line number Diff line change
Expand Up @@ -14,8 +14,9 @@
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
from __future__ import annotations

import logging
from typing import Optional

from colorama import Fore, Style

Expand All @@ -25,8 +26,19 @@
class BaseStatsLogger:
"""Base class for logging realtime events"""

def __init__(self, prefix: str = "superset") -> None:
prefix: str
metric_denylist: set[str]

def should_log(self, key: str) -> bool:
return key not in self.metric_denylist

def __init__(
self,
prefix: str = "superset",
metric_denylist: set[str] | None = None,
) -> None:
self.prefix = prefix
self.metric_denylist = metric_denylist or set()

def key(self, key: str) -> str:
if self.prefix:
Expand All @@ -51,24 +63,30 @@ def gauge(self, key: str, value: float) -> None:

class DummyStatsLogger(BaseStatsLogger):
def incr(self, key: str) -> None:
logger.debug(Fore.CYAN + "[stats_logger] (incr) " + key + Style.RESET_ALL)
if self.should_log(key):
logger.debug(Fore.CYAN + "[stats_logger] (incr) " + key + Style.RESET_ALL)

def decr(self, key: str) -> None:
logger.debug(Fore.CYAN + "[stats_logger] (decr) " + key + Style.RESET_ALL)
if self.should_log(key):
logger.debug(Fore.CYAN + "[stats_logger] (decr) " + key + Style.RESET_ALL)

def timing(self, key: str, value: float) -> None:
logger.debug(
Fore.CYAN + f"[stats_logger] (timing) {key} | {value} " + Style.RESET_ALL
)
if self.should_log(key):
logger.debug(
Fore.CYAN
+ f"[stats_logger] (timing) {key} | {value} "
+ Style.RESET_ALL
)

def gauge(self, key: str, value: float) -> None:
logger.debug(
Fore.CYAN
+ "[stats_logger] (gauge) "
+ f"{key}"
+ f"{value}"
+ Style.RESET_ALL
)
if self.should_log(key):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@villebro so anyone who wants to use this denylist is going to have to implement the functionality in their own statslogger if they don't use the dummy. Should we put something in UPDATING to explain how to do this or do you think it's evident enough when reading this local example?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, maybe we should just add it to StatsdStatsLogger, and remove it from BaseStatsLogger. This will cover the vanilla StatsD use case, and should serve as an example for anyone who wants similar functionality in their own custom logger class. Thoughts @eschutho

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that sounds like a good idea.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to comments above.
I don't find much value in having the deny list and find def should_log sufficient for filtering and good enough example on how to define a custom logger with filtering.

But that's just a minor preference and current solution is fine by me as well

logger.debug(
Fore.CYAN
+ "[stats_logger] (gauge) "
+ f"{key}"
+ f"{value}"
+ Style.RESET_ALL
)


try:
Expand All @@ -80,30 +98,36 @@ def __init__( # pylint: disable=super-init-not-called
host: str = "localhost",
port: int = 8125,
prefix: str = "superset",
statsd_client: Optional[StatsClient] = None,
statsd_client: StatsClient | None = None,
metric_denylist: set[str] | None = None,
) -> None:
"""
Initializes from either params or a supplied, pre-constructed statsd client.

If statsd_client argument is given, all other arguments are ignored and the
supplied client will be used to emit metrics.
"""
super().__init__(metric_denylist=metric_denylist)
if statsd_client:
self.client = statsd_client
else:
self.client = StatsClient(host=host, port=port, prefix=prefix)

def incr(self, key: str) -> None:
self.client.incr(key)
if self.should_log(key):
self.client.incr(key)

def decr(self, key: str) -> None:
self.client.decr(key)
if self.should_log(key):
self.client.decr(key)

def timing(self, key: str, value: float) -> None:
self.client.timing(key, value)
if self.should_log(key):
self.client.timing(key, value)

def gauge(self, key: str, value: float) -> None:
self.client.gauge(key, value)
if self.should_log(key):
self.client.gauge(key, value)

except Exception: # pylint: disable=broad-except
pass
9 changes: 9 additions & 0 deletions tests/integration_tests/stats_logger_tests.py
Original file line number Diff line number Diff line change
Expand Up @@ -47,3 +47,12 @@ def test_init_with_params(self):

stats_logger = StatsdStatsLogger()
self.verify_client_calls(stats_logger, mock_client)

def test_with_metric_deny_list(self):
client = Mock()
stats_logger = StatsdStatsLogger(statsd_client=client, metric_denylist={"foo1"})
stats_logger.incr("foo1")
client.incr.assert_not_called()
stats_logger.decr("foo2")
client.decr.assert_called_once()
client.decr.assert_called_with("foo2")