
Workload Generator

About The Project

Benchmarking differential privacy (DP) systems remains a major challenge, and the difficulty is exacerbated for mixed workloads. We provide a configurable DP workload generator that accommodates various request types and workload characteristics. Its flexibility comes from the observation that any DP application can be represented as a combination of a small number of fundamental DP mechanisms. For example, training an ML model with DP-SGD can be represented as a combination of Gaussian mechanisms. This lets us abstract a diverse range of real-world use cases into a more general setting. We make the workload generator available as an open-source tool, as we believe better benchmarks for complex mixed DP workloads are of independent interest.


Getting Started

To get a local copy up and running, follow these steps.

Prerequisites

Python Poetry

curl -sSL https://install.python-poetry.org | python3 -

Make

Local clone of the repository (with submodules)

git clone --recurse-submodules git@github.com:pps-lab/cohere.git

Installation

To re-create the four workloads from the evaluation section in the paper:

# takes ~170 mins
make create-workloads


Usage

In workload-simulator/workload_simulator/main.py, we provide the workload configurations used in Cohere's evaluation. The function default_scenario() contains Cohere's evaluation scenario, while lineargaussian_workload(..), basic_workload(..), ml_workload(..), and mixed_workload(..) represent the four distinct workload configurations.

In the following, we outline the individual components to construct additional workloads.

Workload Configuration

At the core of the workload generator is the request distribution, which models diverse request families as a categorical distribution over request types. Request types (see Cat) are parameterized with data and privacy requirements, which allows fine-grained control over workload dynamics.

Data requirements are sampled from two distributions: a distribution over partitioning attributes (see, e.g., IntervalPopulationSelection), which defines a target population, and a categorical distribution over the percentage of that population to be requested (see CategoricalSampling). For example, a request type might require either 25% or 50% of all active users selected by the highly_partitioned selection strategy with equal probability.

Privacy requirements specify a DP mechanism (e.g., the Laplace mechanism) with privacy costs sampled from a categorical distribution of possible costs (see CategoricalCostCalibration). For example, a request type might specify the Gaussian mechanism and an equal split between low and high privacy costs.

# define schema of partitioning attributes
schema = create_single_attribute_schema(domain_size=204800)

# define target population (via selection of partitioning attributes)
highly_partitioned = IntervalPopulationSelection(schema, beta_a=1, beta_b=10)

# request type requires 25% or 100% of all active users with equal probability
subsampling = CategoricalSampling([
    {"name": "25%", "prob": 0.5, "fraction": 0.25},
    {"name": "100%", "prob": 0.5, "fraction": 1},
])

# request type has privacy costs calibrated to eps=0.2 or eps=0.75 with equal probability
cost = CategoricalCostCalibration([
    {"name": "cost-1", "prob": 0.5, "epsilon": 0.2, "delta": 1.0e-9},
    {"name": "cost-2", "prob": 0.5, "epsilon": 0.75, "delta": 1.0e-9},
])

# workload consists of an equal combination (weight 1) of the Gaussian and Laplace Mechanism
workload = request.Workload(
    name,
    schema,
    [
        Cat(highly_partitioned, mlib.GaussianMechanism(cost, subsampling), 1),
        Cat(highly_partitioned, mlib.LaplaceMechanism(cost, subsampling), 1),
    ],
)
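To make the sampling semantics concrete, the following is a minimal, self-contained sketch that does not use the generator's own classes. It assumes that each request draws its mechanism, subsampling fraction, and privacy cost independently, and that the weights passed to Cat are relative weights; the names `sample_categorical` and `sample_request` are hypothetical, and the selection of the target population is left out.

```python
import random

# Hypothetical stand-ins mirroring the configuration values above.
SUBSAMPLING = [
    {"name": "25%", "prob": 0.5, "fraction": 0.25},
    {"name": "100%", "prob": 0.5, "fraction": 1.0},
]
COSTS = [
    {"name": "cost-1", "prob": 0.5, "epsilon": 0.2, "delta": 1.0e-9},
    {"name": "cost-2", "prob": 0.5, "epsilon": 0.75, "delta": 1.0e-9},
]
# (mechanism name, relative weight) pairs; equal weights mean an equal mix.
REQUEST_TYPES = [("gaussian", 1), ("laplace", 1)]


def sample_categorical(options, rng):
    """Draw one option with probability proportional to its 'prob' field."""
    weights = [option["prob"] for option in options]
    return rng.choices(options, weights=weights, k=1)[0]


def sample_request(rng):
    """Draw one request: a mechanism, a population fraction, and a privacy cost."""
    names = [name for name, _ in REQUEST_TYPES]
    weights = [weight for _, weight in REQUEST_TYPES]
    mechanism = rng.choices(names, weights=weights, k=1)[0]
    subsample = sample_categorical(SUBSAMPLING, rng)
    cost = sample_categorical(COSTS, rng)
    return {
        "mechanism": mechanism,
        "fraction": subsample["fraction"],
        "epsilon": cost["epsilon"],
        "delta": cost["delta"],
    }


rng = random.Random(0)  # fixed seed so repeated runs draw the same requests
requests = [sample_request(rng) for _ in range(5)]
for req in requests:
    print(req)
```

Each printed request corresponds to one sample from the request distribution; the actual generator additionally attaches the selected partitioning attributes.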


Scenario Configuration

Alongside the workload, the generator expects a deployment scenario that defines aspects such as the frequency of allocation rounds and the arrival rates of users and requests.

As each system expects a certain request format, we provide adapters (see WorkloadVariationConfig) designed to transform samples into concrete requests for specific budget management systems (e.g., Cohere or PrivateKube). In addition, these workload variations can also be used to choose different allocation objectives.

Based on these configurations, the generator employs a discrete event simulator to produce random instances of workloads aligned with a specified deployment scenario.

# define a scenario
scenario = ScenarioConfig(
    name="40-1w-12w",
    allocation_interval=timedelta(weeks=1),
    active_time_window=timedelta(weeks=12), # ~3 months (one quarter)
    user_expected_interarrival_time=timedelta(seconds=10), # ~726k active users in 12 weeks in expectation (~786k over a 91-day quarter)
    request_expected_interarrival_time=timedelta(minutes=20), # resulting in 504 requests per week in expectation (batch)
    n_allocations=40,
    schema=schema,
)


# create workload variants with different utilities per request, and formatted for different systems (Cohere / PrivateKube)
workload_variations = WorkloadVariationConfig.product(
    default_utility_assigners(), all_mode_encoders(scenario)
)


# generate arriving users / requests according to the configuration, and create a json with all requests
simulation = Simulation(
    scenario=scenario,
    workloads=[workload],
    workload_variations=workload_variations,
    output_dir=output_dir,
)
run_in_parallel(simulation, n_repetitions=5)
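The expected counts implied by the scenario parameters can be checked with a short standalone sketch. It also illustrates how a discrete event simulation might draw arrival times, under the assumption (not confirmed by this README) that inter-arrival gaps are exponentially distributed around the configured expected value. The function `arrival_times` is hypothetical and not part of the generator.

```python
import random
from datetime import timedelta

# Scenario parameters copied from the configuration above.
allocation_interval = timedelta(weeks=1)
active_time_window = timedelta(weeks=12)
user_interarrival = timedelta(seconds=10)
request_interarrival = timedelta(minutes=20)

# Dividing two timedeltas yields a float: expected arrivals per period.
expected_requests_per_round = allocation_interval / request_interarrival
expected_active_users = active_time_window / user_interarrival


def arrival_times(mean_interarrival, horizon, rng):
    """Return arrival times in [0, horizon), assuming exponential gaps."""
    rate = 1.0 / mean_interarrival.total_seconds()
    end = horizon.total_seconds()
    times = []
    t = rng.expovariate(rate)
    while t < end:
        times.append(timedelta(seconds=t))
        t += rng.expovariate(rate)
    return times


rng = random.Random(0)  # fixed seed for reproducibility
requests_in_round = arrival_times(request_interarrival, allocation_interval, rng)

print(expected_requests_per_round)  # 504.0 requests per weekly round
print(expected_active_users)        # 725760.0 users per 12-week window
print(len(requests_in_round))       # one random draw, close to 504
```

With a 10-second expected inter-arrival time, a 12-week window holds about 726k users in expectation, whereas a 91-day quarter would hold about 786k.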
