Add the ability to anonymize the body content sent to AI providers #734

Open
sguisse opened this issue Oct 14, 2024 · 2 comments
Labels: enhancement (New feature or request)

sguisse commented Oct 14, 2024

Describe the need of your request

It would be great to have a feature that allows users to anonymize parts of the data (source files) sent to AI providers.

In practice, companies don't want to send their code to a third party (even if protections are in place), so it is important for them that the code cannot be associated with them.

It would also be useful to allow users to save the sent body content to a local log file so it can be analyzed by their security team.

Proposed solution

For the anonymization part, this could be done by adding a configuration table to the plugin settings.
This table would contain two columns:

  • one with a regexp that identifies the pattern to anonymize, and
  • a second with the action to take (shuffle, random, specific text)

[screenshot of the proposed settings table]
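
For illustration, here is a minimal Kotlin sketch of what such a rule table could look like; all names here (`MaskingRule`, `MaskingAction`, `applyRules`) are hypothetical, not part of CodeGPT:

```kotlin
// One row of the proposed settings table: a regex plus a masking action.
enum class MaskingAction { SHUFFLE, RANDOM, SPECIFIC_TEXT }

data class MaskingRule(
    val pattern: Regex,              // identifies the text to anonymize
    val action: MaskingAction,       // how each match is replaced
    val replacement: String = "***"  // used when action == SPECIFIC_TEXT
)

// Applies every rule in order to the outgoing text.
fun applyRules(input: String, rules: List<MaskingRule>): String =
    rules.fold(input) { text, rule ->
        rule.pattern.replace(text) { match ->
            when (rule.action) {
                MaskingAction.SHUFFLE ->
                    match.value.toList().shuffled().joinToString("")
                MaskingAction.RANDOM ->
                    (1..match.value.length).map { ('a'..'z').random() }.joinToString("")
                MaskingAction.SPECIFIC_TEXT -> rule.replacement
            }
        }
    }
```

A rule such as `MaskingRule(Regex("AcmeCorp"), MaskingAction.SPECIFIC_TEXT, "CompanyX")` would then replace the company name in every outgoing request.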

Here is the link to the code I have started (branch feat/anonymization).

Additional context

To simplify the implementation, it would be great to have a centralized method in CodeGPT that is responsible for calling the external AI provider. That way, we could do the anonymization and logging in a single place ;-)
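
For example, such a centralized hook could look roughly like this: a single function that every outgoing request body passes through, so masking and logging happen in one place. This is only a sketch; `prepareRequestBody` and the log-file handling are assumptions rather than CodeGPT's actual code, and it reuses the `MaskingRule` type from the sketch above.

```kotlin
import java.nio.file.Files
import java.nio.file.Path
import java.nio.file.StandardOpenOption

// Hypothetical central hook: every provider call would route its request
// body through here, giving one place for masking and security logging.
fun prepareRequestBody(
    rawBody: String,
    rules: List<MaskingRule>,  // the rule table from the sketch above
    logFile: Path?             // optional local log for the security team
): String {
    val masked = applyRules(rawBody, rules)
    // Append the (already masked) body to the local audit log, if configured.
    logFile?.let {
        Files.writeString(
            it, masked + System.lineSeparator(),
            StandardOpenOption.CREATE, StandardOpenOption.APPEND
        )
    }
    return masked
}
```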

sguisse added the enhancement (New feature or request) label on Oct 14, 2024
carlrobertoh (Owner) commented

Something I have never thought about, thank you! This could definitely benefit people who use the extension within their company (especially those who work with proprietary data). However, this seems like a very use-case-specific feature that I unfortunately can't find the time to implement, but I'm happy to accept PRs.

A few notes though:

From a UI/UX perspective, in my opinion, 'Editor Anonymizations' is a bit misleading, and I would rather call it 'Data Masking' or something similar. This is because there are many things not related to the editor, such as git commit message generation or even basic chatting. Also, I think the masking and log path configuration deserve their own settings page: Tools | CodeGPT | Security, since this is something that most users probably don't find that useful.

Implementation-wise, in theory, we could mask the data after the request body is built (the same place where you added the logging) and work on the final string. However, there is always a risk of masking some important keys, which could break everything; this seems avoidable if you convert the final string back into a map and mask the values only.
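
For illustration, a rough sketch of that value-only masking, assuming the body is JSON and using Jackson for parsing (both the library choice and the names are assumptions, not CodeGPT's actual code):

```kotlin
import com.fasterxml.jackson.databind.JsonNode
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.databind.node.ArrayNode
import com.fasterxml.jackson.databind.node.ObjectNode
import com.fasterxml.jackson.databind.node.TextNode

// Walks the parsed JSON tree and rewrites string *values* only, so the
// structural keys of the request body stay intact.
fun maskJsonValues(json: String, mask: (String) -> String): String {
    val mapper = ObjectMapper()
    val root = mapper.readTree(json)

    fun walk(node: JsonNode) {
        when (node) {
            is ObjectNode -> node.fields().forEach { (key, value) ->
                if (value is TextNode) node.set<JsonNode>(key, TextNode(mask(value.textValue())))
                else walk(value)
            }
            is ArrayNode -> node.forEachIndexed { i, value ->
                if (value is TextNode) node.set(i, TextNode(mask(value.textValue())))
                else walk(value)
            }
            else -> Unit // numbers, booleans, nulls are left untouched
        }
    }

    walk(root)
    return mapper.writeValueAsString(root)
}
```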

sguisse (Author) commented Oct 24, 2024

Hello, thanks a lot for your feedback.

Yes, the objective is to anonymize the data when sending, and de-anonymize the result before displaying the response in the text field.
The de-anonymization will be done by caching the transformations, as sketched below.
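
For illustration, a minimal sketch of such a transformation cache (hypothetical names; thread safety and cache eviction are omitted):

```kotlin
// Remembers each original -> masked substitution made at send time, so the
// substitution can be reversed on the provider's response.
class AnonymizationCache {
    private val originalToMasked = linkedMapOf<String, String>()

    // Called while building the request: returns a stable placeholder.
    fun anonymize(original: String): String =
        originalToMasked.getOrPut(original) { "ANON_${originalToMasked.size}" }

    // Called on the response: swaps every placeholder back to the original.
    fun deanonymize(response: String): String =
        originalToMasked.entries.fold(response) { text, (original, masked) ->
            text.replace(masked, original)
        }
}
```

For example, `anonymize("AcmeCorp")` would return `ANON_0`, and a response containing `ANON_0` would be restored to mention `AcmeCorp` before being shown in the UI.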

I will take your remarks into account and modify my code.
