This repository contains an example Triton response cache implementation that uses Redis as its backing store.

Ask questions or report problems on the main Triton issues page.
If you don't have it installed already, install rapidjson-dev:

```shell
apt install rapidjson-dev
```

Use a recent cmake to build, running the following:

```shell
$ mkdir build
$ cd build
$ cmake -DCMAKE_INSTALL_PREFIX:PATH=`pwd`/install ..
$ make install
```
The following required Triton repositories will be pulled and used in the build. By default, the "main" branch/tag will be used for each repo, but the following CMake arguments can be used to override that:

- triton-inference-server/core: `-D TRITON_CORE_REPO_TAG=[tag]`
- triton-inference-server/common: `-D TRITON_COMMON_REPO_TAG=[tag]`
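For example, a configure step that pins both repos to a matching release (the `r23.06` tag below is illustrative; substitute whichever branch or tag matches your Triton version) might look like:

```shell
# Sketch: pin both Triton dependency repos to an example release tag.
# r23.06 is a placeholder; use the tag matching your server version.
cmake -DCMAKE_INSTALL_PREFIX:PATH=`pwd`/install \
      -DTRITON_CORE_REPO_TAG=r23.06 \
      -DTRITON_COMMON_REPO_TAG=r23.06 ..
```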
To deploy the Redis cache to Triton, build the binary (see the build instructions above) and copy the `libtritoncache_redis.so` file to the folder `redis` in the cache directory on the server you are running Triton from. By default this directory is `/opt/tritonserver/caches`, but it can be changed with the `--cache-dir` CLI option as needed.
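As a concrete sketch of that step, assuming the default cache directory (the source path below is an assumption about where your build placed the shared library; adjust it to match your build tree):

```shell
# Sketch: place the built cache library where Triton looks by default.
# The source path is an assumption -- locate libtritoncache_redis.so
# in your own build output before copying.
mkdir -p /opt/tritonserver/caches/redis
cp build/install/caches/redis/libtritoncache_redis.so /opt/tritonserver/caches/redis/
```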
It is also required that Redis be running on a system reachable by Triton. There are many ways to deploy Redis; to get started, see Redis's getting started guide.
The cache is configured using `--cache-config` CLI options. The `--cache-config` option is variadic, meaning it can be repeated multiple times to set multiple configuration fields. The format of a `--cache-config` option is `<cache_name>,<key>=<value>`. At a minimum you must provide a `host` and `port` to allow the client to connect to Redis. For example, to connect to a Redis instance living on the host `redis-host` and listening on port `6379`:

```shell
tritonserver --cache-config redis,host=redis-host --cache-config redis,port=6379
```
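The `<cache_name>,<key>=<value>` format splits at the first comma and the first equals sign. A minimal shell sketch of that parsing (the `opt` value is just an example, not a real invocation):

```shell
# Split a --cache-config value of the form <cache_name>,<key>=<value>
opt="redis,host=redis-host"
cache_name="${opt%%,*}"   # everything before the first comma -> "redis"
kv="${opt#*,}"            # everything after the first comma  -> "host=redis-host"
key="${kv%%=*}"           # before the first '='              -> "host"
value="${kv#*=}"          # after the first '='               -> "redis-host"
echo "$cache_name $key $value"   # prints "redis host redis-host"
```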
| Configuration Option | Required | Description | Default |
| --- | --- | --- | --- |
| `host` | Yes | The hostname or IP address of the server where Redis is running. | N/A |
| `port` | Yes | The port number to connect to on the server. | N/A |
| `user` | No | The username to use for ACL authentication with the Redis server. | `default` |
| `password` | No | The password to use for authentication with Redis. | N/A |
| `db` | No | The db number to use. NOTE: use of the db number is considered an anti-pattern in Redis, so it is advised that you do not use this option. | 0 |
| `connect_timeout` | No | The maximum time, in milliseconds, to wait for a connection to Redis to be established. 0 means wait forever. | 0 |
| `socket_timeout` | No | The maximum time, in milliseconds, the client will wait for a response from Redis. 0 means wait forever. | 0 |
| `pool_size` | No | The number of pooled connections to Redis the client will maintain. | 1 |
| `wait_timeout` | No | The maximum time, in milliseconds, to wait for a connection from the pool. | 1000 |
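Putting several of these together, a fuller invocation could look like the following sketch (the host, user, and password values are placeholders, not values from this repo):

```shell
# Sketch: required host/port plus a few optional settings.
# redis-host, triton, and my-secret are placeholder values.
tritonserver \
  --cache-config redis,host=redis-host \
  --cache-config redis,port=6379 \
  --cache-config redis,user=triton \
  --cache-config redis,password=my-secret \
  --cache-config redis,pool_size=4 \
  --cache-config redis,wait_timeout=2000
```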
Optionally, you may configure your `user`/`password` via environment variables. The corresponding `user` environment variable is `TRITONCACHE_REDIS_USERNAME`, whereas the corresponding `password` environment variable is `TRITONCACHE_REDIS_PASSWORD`.
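For example, credentials could be supplied through the environment instead of on the command line (the credential values below are placeholders):

```shell
# Sketch: pass credentials via the environment rather than --cache-config.
# The values here are placeholders.
export TRITONCACHE_REDIS_USERNAME=default
export TRITONCACHE_REDIS_PASSWORD=my-secret
tritonserver --cache-config redis,host=redis-host --cache-config redis,port=6379
```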
Transport Layer Security (TLS) can be enabled in Redis and within the Triton Redis cache. To do so, you will need a TLS-enabled version of Redis, e.g. OSS Redis or Redis Enterprise. You will also need to configure Triton Server to use TLS with Redis through the following `--cache-config` TLS options.
| Configuration Option | Required | Description |
| --- | --- | --- |
| `tls_enabled` | Yes | Set to `true` to enable TLS. |
| `cert` | No | The certificate to use for TLS. |
| `key` | No | The certificate key to use for TLS. |
| `cacert` | No | The Certificate Authority certificate to use for TLS. |
| `sni` | No | Server name indication for TLS. |
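A TLS-enabled invocation might look like the following sketch (the certificate paths are placeholders; point them at your own certificate files):

```shell
# Sketch: enable TLS on the Redis cache connection.
# All /path/to/... values are placeholders.
tritonserver \
  --cache-config redis,host=redis-host \
  --cache-config redis,port=6379 \
  --cache-config redis,tls_enabled=true \
  --cache-config redis,cert=/path/to/client.crt \
  --cache-config redis,key=/path/to/client.key \
  --cache-config redis,cacert=/path/to/ca.crt
```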
There are many ways to go about monitoring what's going on in Redis. One popular mode is to export metrics data from Redis to Prometheus, and use Grafana to observe them.
- If you're using OSS Redis, use the Redis Exporter to export metrics from Redis into Prometheus.
- If you're using Redis Enterprise or Redis Cloud, you can use the built-in integrations for Prometheus.
You can try out the Redis cache with Triton in Docker:

- Clone this repo:

  ```shell
  git clone https://github.com/triton-inference-server/redis_cache
  ```

- Follow the build instructions enumerated above.
- Clone the Triton server repo:

  ```shell
  git clone https://github.com/triton-inference-server/server
  ```

- Add the following to `docs/examples/model_repository/densenet_onnx/config.pbtxt`:

  ```
  response_cache {
    enable: true
  }
  ```

- `cd` into `redis_cache`.
- Install NVIDIA's container toolkit.
- Create an account on NGC.
- Log Docker into NVIDIA's container repository:

  ```shell
  docker login nvcr.io
  Username: $oauthtoken
  Password: <MY API KEY>
  ```

  NOTE: `Username: $oauthtoken` in this context means that your username is literally `$oauthtoken`; your API key serves as the unique part of your credentials.

- Run `docker-compose build`.
- Run `docker-compose up`.
- In a separate terminal, run:

  ```shell
  docker run -it --rm --net=host nvcr.io/nvidia/tritonserver:23.06-py3-sdk
  ```

- Run:

  ```shell
  /workspace/install/bin/image_client -m densenet_onnx -c 3 -s INCEPTION /workspace/images/mug.jpg
  ```

  - On the first run, this will miss the cache.
  - Subsequent runs will pull the inference out of the cache.
  - You can validate this by watching Redis with:

    ```shell
    docker exec -it redis_cache_triton-redis_1 redis-cli monitor
    ```