Sample project that demonstrates how to use the whylogs container with a custom whylogs configuration to enable features like embeddings.
This only applies to the new Python-based whylogs container. The original Java-based whylogs container can't be extended like this.
If you want to run the container without any custom configuration, you don't need the Dockerfile or the Python code. You can just run the following:
docker run -it --net=host --env-file local.env whylabs/whylogs:py-latest
With a local.env
file with your whylabs credentials.
First, you'll need
- A folder named
- A file named
- A variable in
of type Dict[str, DatasetOptions]
- Optional: Anything that should be deployed along with the container should go
. This might be some text/json files that you need to parse/consume for your schema creation logic, or maybe a custom library you depend on for making whylogs UDFs for constraints.
- A Dockerfile that copies your
folder into the right spot (see the Dockerfile in this repo)
At startup, the container will try to run from ...whylogs_config.config import schemas
. If there is something there then it will be used to configure the whylogs loggers in the container. The schemas
var is a map from dataset id to configuration options. The container will be tied to a single org and api key.
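As a rough sketch of the shape described above, a config module might look like the following. The DatasetOptions class here is a hypothetical stand-in defined locally so the example runs on its own; the real class ships with the container, and its import path and fields are not shown in this document, so the `schema` field is an assumption. The dataset id `model-1` is made up as well.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


# Hypothetical stand-in for the container's DatasetOptions type.
# The real class and its fields come from the container package.
@dataclass
class DatasetOptions:
    schema: Optional[Any] = None  # assumed field: a whylogs schema object


# The container looks for a module-level variable named `schemas`.
# Keys are dataset ids, values are the per-dataset options.
schemas: Dict[str, DatasetOptions] = {
    "model-1": DatasetOptions(schema=None),  # "model-1" is a made-up dataset id
}
```

Keeping the variable at module level matters because the container imports it by name rather than calling a function.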
After you create your Python config (as this repo demonstrates) and a Dockerfile similar to the one in this repo, you'll be able to build the image and run the container.
docker build . -t my-whylogs-container
docker run -it --net=host --env-file local.env my-whylogs-container
There are two types of config: simple config that can be passed via env variables, and custom config that is specified as Python source and built into the container.
Here are the current env configuration options. These should be stored in a .env
file and passed to docker when running the container.
# Your WhyLabs org id
# An api key from the org above
# One of these two must be set
# Sets the container password to `password`. See the auth section for details
# If you don't care about password protecting the container then you can set this to True.
# Safeguard if you're using custom configuration to guarantee the container is correctly built to use it.
# The default dataset type to use between HOURLY and DAILY. This determines how data is grouped up into
# profiles before being uploaded. You need to make sure this matches what you configured the dataset as
# in your WhyLabs settings page.
# The frequency that uploads occur, being denoted in either minutes (M), hours (H), or days (D).
# The interval, given the configured cadence. Setting this to 15 with a cadence of M would result in uploads every 15 minutes.
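To make the cadence and interval arithmetic concrete, here is a small illustrative helper that turns the two settings into an upload period. It is not the container's own code; the function name `upload_period` is hypothetical and only mirrors the M/H/D rule described in the comments above.

```python
from datetime import timedelta

# Maps the cadence letter to the matching timedelta keyword.
_UNITS = {"M": "minutes", "H": "hours", "D": "days"}


def upload_period(cadence: str, interval: int) -> timedelta:
    """Return how often uploads would occur for a cadence/interval pair."""
    if cadence not in _UNITS:
        raise ValueError(f"cadence must be one of {sorted(_UNITS)}, got {cadence!r}")
    if interval <= 0:
        raise ValueError("interval must be a positive integer")
    return timedelta(**{_UNITS[cadence]: interval})


print(upload_period("M", 15))  # 0:15:00, i.e. every 15 minutes
```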
This repo shows the project structure you need if you want to use custom configuration. You would need to use this type of configuration if your use case requires you to use a schema with whylogs. Some examples of use cases would be embeddings and segments. This repo demonstrates an embedding configuration in whylogs_config/
. The file path is important. Anything alongside that file path will also be bundled with the container if you use the Dockerfile here, which is useful if you need to use embeddings. You can also use custom configuration to independently configure multiple datasets, rather than having them all fall back to the defaults set in the env variables.
The Python dependencies in this package don't actually matter. They're just installed so that you can use an IDE to create the configuration file. You can use any of the dependencies that are already packaged in the base container, things like pandas, numpy, whylogs, etc.
If the container is configured to use a password then you'll have to send a special auth header along with requests. If the password is set to my-password
then the header (in curl format) would be
-H "Authorization: Bearer my-password"
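The same header can be built in Python. This is a minimal sketch; the helper name `auth_headers` is hypothetical, and only the header format comes from the curl example above.

```python
from typing import Dict


def auth_headers(password: str) -> Dict[str, str]:
    """Build the Authorization header for a password-protected container."""
    return {"Authorization": f"Bearer {password}"}


print(auth_headers("my-password"))
```

The resulting dict can be passed as the `headers=` argument of an HTTP client call such as `requests.post`.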
There isn't a published client yet (coming soon), so requests can be made via HTTP calls using the requests
module. See the examples folder for a calling example. There are examples for logging normal tabular data as well as embeddings, which require custom configuration.
The container has special endpoints that take requests forwarded from Google Pub/Sub: /log-pubsub
and /log-pubsub-embeddings
. You'll send the same payloads that you would send to the container directly, except you'll send them to Pub/Sub and they'll be forwarded instead. Don't do any extra escaping on your JSON data.
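The "no extra escaping" point can be illustrated with a short sketch: serialize the payload to JSON exactly once and use the resulting bytes as the Pub/Sub message data. The helper name `to_pubsub_data` and the payload field are hypothetical; the real payload shape is whatever you would send to the container directly.

```python
import json
from typing import Any, Dict


def to_pubsub_data(payload: Dict[str, Any]) -> bytes:
    """Serialize a payload once; the bytes become the Pub/Sub message data."""
    return json.dumps(payload).encode("utf-8")


payload = {"example_field": 1}  # hypothetical payload, for illustration only
data = to_pubsub_data(payload)

# Serializing once round-trips back to the original object.
assert json.loads(data) == payload

# Serializing twice is the "extra escaping" to avoid: decoding it
# yields a string of JSON text instead of the object itself.
double_encoded = json.dumps(json.dumps(payload))
assert isinstance(json.loads(double_encoded), str)
```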