Add the ability to anonymize the body content sent to AI providers #734

Open
sguisse opened this issue Oct 14, 2024 · 2 comments
Labels: enhancement (New feature or request)

sguisse commented Oct 14, 2024

Describe the need of your request

It would be great to have a feature that allows users to anonymize parts of the data (source files) sent to AI providers.

In practice, companies don't want to send their code to a third party (even if protections are in place), so it is important for them that the code cannot be associated with them.

It would also be useful to allow users to save the sent body content to a local log file so it can be analyzed by their security team.

Proposed solution

For the anonymization part, this could be done by adding a configuration table to the plugin settings.
This table would contain two columns:

  • one with a regexp that identifies the pattern to anonymize, and
  • a second with the action to take (shuffle, random, specific text)

[screenshot of the proposed settings table]
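
For illustration, here is a minimal Kotlin sketch of what such a rule table could look like; all names here (`MaskingRule`, `MaskingAction`, `applyRules`) are hypothetical, not part of CodeGPT:

```kotlin
// One row of the proposed settings table: a regex plus a masking action.
enum class MaskingAction { SHUFFLE, RANDOM, SPECIFIC_TEXT }

data class MaskingRule(
    val pattern: Regex,              // identifies the text to anonymize
    val action: MaskingAction,       // how each match is replaced
    val replacement: String = "***"  // used when action == SPECIFIC_TEXT
)

// Applies every rule in order to the outgoing text.
fun applyRules(input: String, rules: List<MaskingRule>): String =
    rules.fold(input) { text, rule ->
        rule.pattern.replace(text) { match ->
            when (rule.action) {
                MaskingAction.SHUFFLE ->
                    match.value.toList().shuffled().joinToString("")
                MaskingAction.RANDOM ->
                    (1..match.value.length).map { ('a'..'z').random() }.joinToString("")
                MaskingAction.SPECIFIC_TEXT -> rule.replacement
            }
        }
    }
```

A rule such as `MaskingRule(Regex("AcmeCorp"), MaskingAction.SPECIFIC_TEXT, "CompanyX")` would then replace the company name in every outgoing request.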

Here is the link to the code I have started (branch feat/anonymization).

Additional context

To simplify the implementation, it would be great to have a centralized method in CodeGPT that is responsible for calling the external AI provider. That way, we could do the anonymization and logging in a single place ;-)
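
For example, such a centralized hook could look roughly like this: a single function that every outgoing request body passes through, so masking and logging happen in one place. This is only a sketch; `prepareRequestBody` and the log-file handling are assumptions rather than CodeGPT's actual code, and it reuses the `MaskingRule` type from the sketch above.

```kotlin
import java.nio.file.Files
import java.nio.file.Path
import java.nio.file.StandardOpenOption

// Hypothetical central hook: every provider call would route its request
// body through here, giving one place for masking and security logging.
fun prepareRequestBody(
    rawBody: String,
    rules: List<MaskingRule>,  // the rule table from the sketch above
    logFile: Path?             // optional local log for the security team
): String {
    val masked = applyRules(rawBody, rules)
    // Append the (already masked) body to the local audit log, if configured.
    logFile?.let {
        Files.writeString(
            it, masked + System.lineSeparator(),
            StandardOpenOption.CREATE, StandardOpenOption.APPEND
        )
    }
    return masked
}
```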

sguisse added the enhancement (New feature or request) label on Oct 14, 2024
carlrobertoh (Owner) commented

Something I have never thought about, thank you! This could definitely benefit people who use the extension within their company (especially those who work with proprietary data). However, this seems like a very use-case-specific feature that I unfortunately can't find the time to implement, but I'm happy to accept PRs.

A few notes though:

From a UI/UX perspective, in my opinion, 'Editor Anonymizations' is a bit misleading, and I would rather call it 'Data Masking' or something similar. This is because there are many things not related to the editor, such as git commit message generation or even basic chatting. Also, I think the masking and log path configuration deserve their own settings page: Tools | CodeGPT | Security, since this is something that most users probably don't find that useful.

Implementation-wise, in theory, we could mask the data after the request body is built (the same place where you added the logging) and work on the final string. However, there is always a risk of masking some important keys, which could break everything; this seems avoidable if you convert the final string back into a map and mask the values only.
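
For illustration, a rough sketch of that value-only masking, assuming the body is JSON and using Jackson for parsing (both the library choice and the names are assumptions, not CodeGPT's actual code):

```kotlin
import com.fasterxml.jackson.databind.JsonNode
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.databind.node.ArrayNode
import com.fasterxml.jackson.databind.node.ObjectNode
import com.fasterxml.jackson.databind.node.TextNode

// Walks the parsed JSON tree and rewrites string *values* only, so the
// structural keys of the request body stay intact.
fun maskJsonValues(json: String, mask: (String) -> String): String {
    val mapper = ObjectMapper()
    val root = mapper.readTree(json)

    fun walk(node: JsonNode) {
        when (node) {
            is ObjectNode -> node.fields().forEach { (key, value) ->
                if (value is TextNode) node.set<JsonNode>(key, TextNode(mask(value.textValue())))
                else walk(value)
            }
            is ArrayNode -> node.forEachIndexed { i, value ->
                if (value is TextNode) node.set(i, TextNode(mask(value.textValue())))
                else walk(value)
            }
            else -> Unit // numbers, booleans, nulls are left untouched
        }
    }

    walk(root)
    return mapper.writeValueAsString(root)
}
```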

sguisse (Author) commented Oct 24, 2024

Hello, thanks a lot for your feedback.

Yes, the objective is to anonymize the data when sending, and de-anonymize the result before displaying the response in the text field.
The de-anonymization will be done by caching the transformations, as sketched below.
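
For illustration, a minimal sketch of such a transformation cache (hypothetical names; thread safety and cache eviction are omitted):

```kotlin
// Remembers each original -> masked substitution made at send time, so the
// substitution can be reversed on the provider's response.
class AnonymizationCache {
    private val originalToMasked = linkedMapOf<String, String>()

    // Called while building the request: returns a stable placeholder.
    fun anonymize(original: String): String =
        originalToMasked.getOrPut(original) { "ANON_${originalToMasked.size}" }

    // Called on the response: swaps every placeholder back to the original.
    fun deanonymize(response: String): String =
        originalToMasked.entries.fold(response) { text, (original, masked) ->
            text.replace(masked, original)
        }
}
```

For example, `anonymize("AcmeCorp")` would return `ANON_0`, and a response containing `ANON_0` would be restored to mention `AcmeCorp` before being shown in the UI.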

I will take your remarks into account and modify my code.
