Enhancements to Docs
martinpeck committed Sep 2, 2024
1 parent b4b6f2c commit 0a26211
Showing 6 changed files with 187 additions and 115 deletions.
69 changes: 42 additions & 27 deletions README.md
@@ -11,10 +11,12 @@ WARNING: This is a work in progress!
- [Overview](#overview)
- [What is the Azure OpenAI API Simulator?](#what-is-the-azure-openai-api-simulator)
- [Simulator Modes](#simulator-modes)
- [Generator Mode](#generator-mode)
- [Record/Replay Mode](#recordreplay-mode)
- [When to use the Azure OpenAI API Simulator](#when-to-use-the-azure-openai-api-simulator)
- [How to Get Started with the Azure OpenAI API Simulator](#how-to-get-started-with-the-azure-openai-api-simulator)
- [Running and Deploying the Azure OpenAI API Simulator](#running-and-deploying-the-azure-openai-api-simulator)
- [Configuring the Azure OpenAI API Simulator](#configuring-the-azure-openai-api-simulator)
- [Extending the Azure OpenAI API Simulator](#extending-the-azure-openai-api-simulator)
- [Contributing to the Azure OpenAI API Simulator](#contributing-to-the-azure-openai-api-simulator)
- [Changelog](#changelog)
- [Contributing](#contributing)
- [Trademarks](#trademarks)
@@ -24,9 +26,11 @@ WARNING: This is a work in progress!
### What is the Azure OpenAI API Simulator?

The Azure OpenAI API Simulator is a tool that allows you to easily deploy endpoints that simulate the OpenAI API.
A common use-case for the simulator is to test the behaviour of your application under load, without making calls to the live OpenAI API endpoints.

Let's illustrate this with an example...

Let's assume that you have built a chatbot that uses the OpenAI API to generate responses to user queries. Before your chatbot becomes popular, you want to ensure that it can handle a large number of users. One of the factors that will impact whether your chatbot can gracefully handle such load will be the way that your chatbot handles calls to OpenAI. However, when load testing your chatbot there are a number of reasons why you might not want to call the OpenAI API directly:

- **Cost**: The OpenAI API is a paid service, and running load tests against it can be expensive.
- **Consistency**: The OpenAI API is a live service, and the responses you get back can change over time. This can make it difficult to compare the results of load tests run at different times.
@@ -35,49 +39,60 @@

In fact, Rate Limits and Latency might be things you'd like to control. You may want to inject latency, or inject rate limit errors, so that you can test how your chatbot deals with these issues.
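For example, injected rate limit responses let you verify your chatbot's retry behaviour. The sketch below is illustrative only and is not part of the simulator; the backoff policy and the fake transport are assumptions standing in for your application's real OpenAI call.

```python
# Sketch: the kind of client-side handling you might want to load-test by
# having the simulator inject 429 (rate limit) responses.
import time


def call_with_backoff(send, max_retries=3, base_delay=0.01):
    """Call `send()`, retrying with exponential backoff on 429 responses."""
    for attempt in range(max_retries + 1):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_retries:
            time.sleep(base_delay * (2 ** attempt))
    return status, body


# Fake transport standing in for the chatbot's OpenAI call:
# two rate-limit responses, then success.
responses = iter([(429, ""), (429, ""), (200, "ok")])
print(call_with_backoff(lambda: next(responses)))  # -> (200, 'ok')
```

Running this against the simulator (rather than a canned iterator) lets you confirm the same logic holds under sustained load.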

**This is where the Azure OpenAI API Simulator plays its part!**

By using the Azure OpenAI API Simulator, instead of the live OpenAI API, you can reduce the cost of running load tests against the OpenAI API and ensure that your application behaves as expected under different conditions.

The Azure OpenAI API Simulator presents the same interface as the live OpenAI API, allowing you to easily switch between the two, and gives you full control over the responses that are returned.
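Because the interfaces match, switching is just a matter of changing the base URL your client targets. The sketch below assumes the simulator is listening on `localhost:8000`; the deployment name, API version, and key are illustrative placeholders, not documented defaults.

```python
# Sketch: building the same Azure OpenAI-style chat completion request your
# app would send to the live service, but aimed at the local simulator.
import json
import urllib.request

SIMULATOR_ENDPOINT = "http://localhost:8000"  # assumed local simulator address
DEPLOYMENT = "gpt-35-turbo"                   # hypothetical deployment name
API_VERSION = "2024-02-01"                    # illustrative api-version


def build_chat_request(prompt: str) -> urllib.request.Request:
    """Build a chat completion request in the Azure OpenAI REST shape."""
    url = (f"{SIMULATOR_ENDPOINT}/openai/deployments/{DEPLOYMENT}"
           f"/chat/completions?api-version={API_VERSION}")
    body = json.dumps({"messages": [{"role": "user", "content": prompt}]})
    return urllib.request.Request(
        url,
        data=body.encode(),
        headers={"api-key": "your-simulator-api-key",
                 "Content-Type": "application/json"},
        method="POST",
    )


req = build_chat_request("Hello!")
print(req.full_url)
```

Swapping `SIMULATOR_ENDPOINT` for your real Azure OpenAI endpoint is the only change needed to target the live service.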

### Simulator Modes

The Azure OpenAI API Simulator has two approaches to simulating API responses:

1. **Generator Mode** - If you don't have any requirements around the content of the responses, the **Generator** approach is probably the easiest for you to use.
2. **Record/Replay Mode** - If you need to simulate specific responses, then the **Record/Replay** approach is likely the best fit for you.

#### Generator Mode

When run in Generator mode the Azure OpenAI API Simulator will create responses to requests on the fly. This mode is useful for load testing scenarios where it would be costly/impractical to record the full set of responses, or where the content of the response is not critical to the load testing.

![Simulator in generator mode](./docs/images/mode-generate.drawio.png "The Simulator in generate mode showing lorem ipsum generated content in the response")

#### Record/Replay Mode

With record/replay, the Azure OpenAI API Simulator is set up to act as a proxy between your application and Azure OpenAI. The Azure OpenAI API Simulator will then record requests that are sent to it along with the corresponding response from the OpenAI API.

![Simulator in record mode](./docs/images/mode-record.drawio.png "The Simulator in record mode proxying requests to Azure OpenAI and persisting the responses to disk")

Once a set of recordings has been made, the Azure OpenAI API Simulator can then be run in replay mode, where it uses these saved responses without forwarding anything to the OpenAI API.

Recordings are stored in YAML files which can be edited if you want to customise the responses.
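To show the kind of edit you might make, here is a purely hypothetical recording entry. The field names below are illustrative assumptions only; open one of your own recording files to see the actual schema the simulator uses.

```yaml
# Hypothetical sketch of a recording entry (field names are assumptions):
- request:
    method: POST
    uri: /openai/deployments/gpt-35-turbo/chat/completions
    body: '{"messages": [{"role": "user", "content": "Hello"}]}'
  response:
    status: 200
    body: '{"choices": [{"message": {"role": "assistant", "content": "Hi!"}}]}'
```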

![Simulator in replay mode](./docs/images/mode-replay.drawio.png "The Simulator in replay mode reading responses from disk and returning them to the client")

## When to use the Azure OpenAI API Simulator

The Azure OpenAI API Simulator has been used in the following scenarios:

- **Load Testing**: The Azure OpenAI API Simulator can be used to simulate the Azure OpenAI API in a development environment, allowing you to test how your application behaves under load. This can be useful both to save money when load testing and to let you test scaling the system beyond the Azure OpenAI capacity available in your development environment.
- **Integration Testing**: The Azure OpenAI API Simulator can be used to run integration tests, for example in CI builds, without needing credentials for an Azure OpenAI endpoint.

The Azure OpenAI API Simulator is not a replacement for testing against the real Azure OpenAI API, but it can be a useful tool in your testing toolbox.

## How to Get Started with the Azure OpenAI API Simulator

### Running and Deploying the Azure OpenAI API Simulator

The document [Running and Deploying the Azure OpenAI API Simulator](./docs/running-deploying.md) includes instructions on running the Azure OpenAI API Simulator locally, packaging and deploying it in a Docker container, and deploying it to Azure Container Apps.

### Configuring the Azure OpenAI API Simulator

The behaviour of the Azure OpenAI API Simulator is controlled via a range of [Azure OpenAI API Simulator Configuration Options](./docs/config.md).

### Extending the Azure OpenAI API Simulator
There are also a number of [Azure OpenAI API Simulator Extensions](./docs/extensions.md) that allow you to customise the behaviour of the Azure OpenAI API Simulator. Extensions can be used to modify the request/response, add latency, or even generate responses.

### Contributing to the Azure OpenAI API Simulator

Finally, if you're looking to contribute to the Azure OpenAI API Simulator you should refer to the [Azure OpenAI API Simulator Development Guide](./docs/developing.md).

## Changelog

73 changes: 29 additions & 44 deletions docs/config.md
@@ -1,21 +1,22 @@
# Configuring the Azure OpenAI API Simulator

- [Configuring the Azure OpenAI API Simulator](#configuring-the-azure-openai-api-simulator)
- [Environment Variables](#environment-variables)
- [Setting Environment Variables via the `.env` File](#setting-environment-variables-via-the-env-file)
- [Configuring Latency](#configuring-latency)
- [Configuring Rate Limiting](#configuring-rate-limiting)
- [Open Telemetry Configuration](#open-telemetry-configuration)
- [Config API Endpoint](#config-api-endpoint)

There are a number of [environment variables](#environment-variables) that can be used to configure the Azure OpenAI API Simulator.

Additionally, some configuration can be changed while the simulator is running using the [config API endpoint](#config-api-endpoint).

## Environment Variables

When running the Azure OpenAI API Simulator, there are a number of environment variables to configure:

| Variable | Description |
| ------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `SIMULATOR_MODE` | The mode the simulator should run in. Current options are `record`, `replay`, and `generate`. |
| `SIMULATOR_API_KEY` | The API key used by the simulator to authenticate requests. If not specified a key is auto-generated (see the logs). It is recommended to set a deterministic key value in `.env` |
@@ -30,31 +31,21 @@
| `EXTENSION_PATH` | The path to a Python file that contains the extension configuration. This can be a single python file or a package folder - see [Extending the simulator](./extensions.md) |
| `AZURE_OPENAI_DEPLOYMENT` | Used by the test app to set the name of the deployed model in your Azure OpenAI service. Use a gpt-35-turbo-instruct deployment. |


### Setting Environment Variables via the `.env` File

You can set the environment variables in the shell before running the simulator, or on the command line before running commands.

However, when running the Azure OpenAI API Simulator locally you may find it more convenient to set them via a `.env` file in the root directory.

The file `sample.env` lives in the root of this repository, and provides a starting point for the environment variables you may want to set. Copy this file, rename the copy to `.env`, and update the values as needed.
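As an illustration, a minimal `.env` might look like the sketch below. The values are placeholder assumptions; `sample.env` lists the full set of variables and their meanings.

```bash
# Illustrative .env sketch (values are placeholders, not real credentials)
SIMULATOR_MODE=generate
SIMULATOR_API_KEY=my-fixed-dev-key
AZURE_OPENAI_ENDPOINT=https://mysvc.openai.azure.com/
AZURE_OPENAI_KEY=your-api-key
```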

The `.http` files for testing the endpoints also use the `.env` file to set the environment variables for calling the API.

> Note: when running the simulator it will auto-generate an API Key. This needs to be passed to the API when making requests. To avoid the API Key changing each time the simulator is run, set the `SIMULATOR_API_KEY` environment variable to a fixed value.

## Configuring Latency

When running in `record` mode, the simulator captures the duration of the forwarded response.
This is stored in the recording file and used to add latency to requests in `replay` mode.
@@ -76,9 +67,10 @@ The default values are:
| `LATENCY_OPENAI_COMPLETIONS` | 15 | 2 |
| `LATENCY_OPENAI_CHAT_COMPLETIONS` | 19 | 6 |
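A mean and standard deviation pair like the defaults above can be turned into a per-request delay by sampling a normal distribution. This is an illustrative sketch, not the simulator's actual code, and the units are whatever the corresponding latency variable documents.

```python
# Sketch: sampling a per-request latency from a mean/std-dev pair.
import random


def sample_latency(mean, std_dev, rng=random):
    """Draw a latency from a normal distribution, clamped at zero."""
    return max(0.0, rng.gauss(mean, std_dev))


# e.g. using the chat completions defaults above: mean 19, std dev 6
delay = sample_latency(19, 6)
print(delay >= 0)  # -> True: the clamp prevents negative delays
```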

## Configuring Rate Limiting

The simulator contains built-in rate limiting for OpenAI endpoints, but this is still being refined.

The current implementation is a combination of token- and request-based rate-limiting.
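To make the token-based half of this concrete, here is a minimal sketch of a tokens-per-minute budget. It illustrates the idea only and is not the simulator's actual implementation; the class and method names are invented for this example.

```python
# Sketch: a simple tokens-per-minute budget of the kind used for
# token-based rate limiting.
class TokenBudget:
    def __init__(self, tokens_per_minute):
        self.capacity = tokens_per_minute
        self.used = 0

    def try_consume(self, tokens):
        """Record usage and return True if the request fits in this minute's
        budget; return False when the caller should answer with a 429."""
        if self.used + tokens > self.capacity:
            return False
        self.used += tokens
        return True

    def reset(self):
        """Called at the start of each minute window."""
        self.used = 0


budget = TokenBudget(tokens_per_minute=1000)
print(budget.try_consume(800))  # -> True
print(budget.try_consume(300))  # -> False: would exceed the minute's budget
```

A request-based limiter works the same way with a count of requests in place of tokens.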

To control the rate-limiting, set the `OPENAI_DEPLOYMENT_CONFIG_PATH` environment variable to the path to a JSON config file that defines the deployments and associated models and token limits. An example config file is shown below.
@@ -100,13 +92,14 @@
}
```

## Open Telemetry Configuration

The simulator supports a set of basic Open Telemetry configuration options. These are:

| Variable                      | Description |
| ----------------------------- | ----------- |
| `OTEL_SERVICE_NAME`           | Sets the value of the service name reported to Open Telemetry. Defaults to `aoai-api-simulator`. |
| `OTEL_METRIC_EXPORT_INTERVAL` | The time interval (in milliseconds) between the start of two export attempts. |

## Config API Endpoint

@@ -126,12 +119,4 @@

```json
{"latency": {"open_ai_embeddings": {"mean": 1000}}}
```