Authors:
Akshay Pundle, Google Privacy Sandbox
The Protected Audience API (Android, Chrome) proposes multiple real-time services (e.g. Bidding and Auction and Key/Value services) running in a trusted execution environment (TEE). These are isolated environments for securely processing sensitive data with limited data egress. Debugging is essential for running a production system: reproducing issues and finding root causes is part of the iterative process of building high-quality, robust software. An isolated system running in a TEE poses unique debugging challenges because the data it processes cannot be freely inspected.
In this document, we propose various methods of debugging TEE servers while retaining their privacy characteristics. We introduce three ways for AdTechs to debug the system:
- AdTech consented debugging: Provides a way for AdTechs to gain access to debugging data and requests that can be replayed on local systems.
- Local, Debug mode: Provides a way for AdTechs to use standard debugging tools. This can be used along with AdTech consented debugging to replay and debug requests.
- Aggregate error reporting: Provides aggregate error counts for tracking errors in production systems.
AdTech consented debugging enables AdTechs to get plaintext requests and other debugging data from their servers. The Protected Audience API normally protects this data and it is not available to the AdTechs, or anyone else other than the user who owns the data. This feature is aimed at AdTech developers looking to debug the system, and not regular users of the system or API.
The main use case is an AdTech debugging an issue: the AdTech developer acts as the user and grants consent on their client by enabling a special mode in Chrome or Android, thereby making their requests and associated data available for debugging. A user (in this case, the AdTech developer) is fully in control of their Protected Audience API data and chooses to make it available for debugging; debug reporting will not be provided for requests from any users other than those who have enabled this special mode. This consent allows the system to collect the plaintext request and the debug data and provide them to the AdTech.
The production system collects logs, metrics, plaintext requests and other details for such consented requests and makes them available to the AdTech. Privacy-impacting metrics (see Monitoring Protected Audience API Services) that are normally noised to protect privacy will be collected without noise for these consented requests.
With logs and metrics, AdTechs have more information to find the source of the problem. They will also have the plaintext request that their server was called with. This means they can replay this request on a local (or debug) instance of the server, examine logs and use other standard debugging tools to find misbehaving code and performance problems. It also means there is no need to replicate the entire set of servers involved in serving the client request, since the AdTech already has the request their servers were called with.
We will add ways for the AdTech developer to give consent in Chrome and Android to make data from their client available for debugging. Care will be taken to prevent abuse of the setting. The details are being designed and this explainer will be updated to reflect them in the future. Note that this feature is targeted at the AdTech developers, not the general Chrome or Android user.
When an AdTech wants to reproduce a problem, they will start Chrome or Android and give consent via the UI. This will enable the system to collect debug data. While giving consent, the AdTech developer will also need to provide a secret debug-token known only to the AdTech. This will help ensure that the set of debug requests is only available for debugging on that particular AdTech's servers.
Subsequent Protected Audience API requests from this client will include the consent and the secret debug-token in the request to Protected Audience API servers. Consent and the secret debug-token will be encrypted as part of the request (request internals are not visible outside the TEE servers). If the server calls other servers, consent will propagate to all Protected Audience API TEE servers involved.
The AdTech will provide each Protected Audience API server with the secret debug-token at startup. Upon receiving a request, the server will compare the secret debug-token from the request with the secret debug-token it was started with. If the tokens match, the server will treat the request as a debug request. Otherwise, the request will be treated as a normal request. The consent will be propagated to other Protected Audience API TEE servers in either case (whether the request is treated as debug or not).
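The matching rule described above can be sketched as follows. This is a minimal illustration, not the actual server implementation: the `ConsentedDebugConfig` type, its field names, and the function names are hypothetical, and the use of a constant-time comparison and the rejection of empty tokens are assumptions rather than stated requirements.

```python
import hmac
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ConsentedDebugConfig:
    """Hypothetical debug fields carried inside the decrypted request."""
    is_consented: bool
    token: str


def is_debug_request(cfg: Optional[ConsentedDebugConfig], server_token: str) -> bool:
    """Return True only if the request carries consent and a matching token."""
    if cfg is None or not cfg.is_consented:
        return False
    # Assumption: an empty token on either side never matches, so a server
    # started without a debug-token treats every request as normal.
    if not cfg.token or not server_token:
        return False
    # Constant-time comparison (a design choice in this sketch).
    return hmac.compare_digest(cfg.token.encode(), server_token.encode())


def handle_request(
    cfg: Optional[ConsentedDebugConfig], server_token: str
) -> Tuple[bool, Optional[ConsentedDebugConfig]]:
    """Decide debug treatment; the consent config is forwarded in either case."""
    treat_as_debug = is_debug_request(cfg, server_token)
    config_to_forward = cfg  # propagated downstream whether or not it matched
    return treat_as_debug, config_to_forward
```

In this sketch a request whose token does not match is simply handled as a normal request, while the unchanged config is still passed on so that a downstream server started with the matching token can recognize it.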
When a server determines a request to be a consented debug request, it will collect logs, granular metrics and the plaintext request. These will be published to the AdTech's cloud store, where the AdTech can access them. Publishing will happen through OpenTelemetry and will use the same channels as described in the Monitoring Protected Audience API Services document. Privacy-impacting metrics that are usually exported with noise will be published without noise for consented debug requests.
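The difference between noised and un-noised export can be illustrated with a short sketch. The noise mechanism here (Laplace noise with a caller-supplied scale) is an assumption made purely for illustration; this document does not specify how noise is generated, and the function names are hypothetical.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def value_to_export(
    true_value: float,
    is_consented_debug: bool,
    noise_scale: float,
    rng: random.Random,
) -> float:
    """Return the metric value that would be published for one request."""
    if is_consented_debug:
        # Consented debug requests: publish the true value, no noise.
        return true_value
    # All other requests: add noise before export (assumed Laplace here).
    return true_value + laplace_noise(noise_scale, rng)
```

For example, `value_to_export(7.0, True, 2.0, random.Random())` returns exactly `7.0`, while the same call with `is_consented_debug=False` returns a perturbed value.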
- A secret debug-token helps filter out requests originating from other AdTechs who may be debugging at the same time. This prevents intentional or unintentional spam that would otherwise make it difficult to isolate the requests sent by that particular AdTech.
- Without effective filtering, an AdTech's cloud storage could fill up with debug requests from other AdTechs, even when that AdTech is not doing any debugging itself.
The Protected Audience API services will be runnable on a local machine. This means it will be possible to set up a local instance of a Protected Audience API server (e.g. B&A or K/V services), send it plaintext requests collected using the debugging mechanism described above, and use standard debugging tools (since the server will not be running inside a TEE). This includes examining system logs, stack traces, and so on. Such a locally running server will not be able to decrypt or respond to production traffic.
The AdTech consented debugging described above gives the AdTech access to the plaintext requests that their servers were called with. They can then set up local instances of servers and use these requests for further debugging and for reproducing the problem.
TEE servers provide a special execution mode for debugging called Debug mode. In this mode, the server runs inside the TEE but cannot attest with the production key management systems, so it cannot decrypt any production traffic. This mode provides some debug capabilities, such as console logs. Debug mode is useful for testing a more realistically deployed system than local testing allows. The request replay method described above can also be used with servers launched in Debug mode.
We will provide cumulative error counts per server. These will include request-level and system errors. For security reasons, we will not report errors originating directly from AdTech-provided JavaScript execution. Aggregate errors will also be published using OpenTelemetry as described in Monitoring Protected Audience API Services. Error counts may also be noised depending on the source of the error. For consented debug requests, the true error counts will be reported without noise.
The following error counts will be available for tracking:
- Number of requests responded with error: The number of requests that were not processed successfully and resulted in an error.
- Number of requests responded with error, partitioned by type of error: Similar to the above metric, but the data will be at gRPC status granularity.
- Number of requests initiated by this server that resulted in an error, partitioned by type of error: This will count the errors that occurred in requests initiated by the server (available at gRPC status granularity). E.g. for the Seller Front End (SFE), this would include errors that occurred when calling Buyer Front End (BFE) servers.
- Number of requests initiated by this server that resulted in an error, partitioned by destination: Similar to the above, but partitioned by the request destination.
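The four counts listed above could be tracked with a structure along these lines. This is a simplified, in-memory sketch: the `ErrorCounters` class and its method names are hypothetical, status values are represented as gRPC status code names, and the real services publish these counts through OpenTelemetry (possibly with noise) rather than holding them in plain counters.

```python
from collections import Counter


class ErrorCounters:
    """In-memory tally of the per-server error counts (illustrative only)."""

    def __init__(self) -> None:
        self.responded_with_error = 0               # total error responses
        self.responded_by_status = Counter()        # error responses by gRPC status
        self.initiated_by_status = Counter()        # outgoing-call errors by gRPC status
        self.initiated_by_destination = Counter()   # outgoing-call errors by destination

    def record_response(self, status: str) -> None:
        """Record the final status of a request this server responded to."""
        if status != "OK":
            self.responded_with_error += 1
            self.responded_by_status[status] += 1

    def record_outgoing(self, destination: str, status: str) -> None:
        """Record the status of a request this server initiated."""
        if status != "OK":
            self.initiated_by_status[status] += 1
            self.initiated_by_destination[destination] += 1


# Usage: an SFE-like server that saw two failed calls to BFE servers.
counters = ErrorCounters()
counters.record_response("OK")
counters.record_response("INTERNAL")
counters.record_outgoing("bfe-1", "DEADLINE_EXCEEDED")
counters.record_outgoing("bfe-2", "DEADLINE_EXCEEDED")
counters.record_outgoing("bfe-1", "OK")
```

Successful calls are not counted at all, which mirrors the fact that every metric in the list is an error count.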