Safety check submission input #399
Open
EneaGore wants to merge 10 commits into develop from security-prompt
EneaGore changed the title from "add keyword, fuzzy match and embeddings" to "Safety check submission input" on Jan 20, 2025
EneaGore temporarily deployed to athena-test1.ase.cit.tum.de on January 20, 2025 17:47 with GitHub Actions (Inactive)
github-actions bot added the lock:athena-test1 label (Is currently deployed to Athena Test Server 1) and removed the deploy:athena-test1 label (Athena Test Server 1) on Jan 20, 2025
EneaGore added the deploy:athena-test1 label and removed the lock:athena-test1 label on Jan 20, 2025
EneaGore temporarily deployed to athena-test1.ase.cit.tum.de on January 20, 2025 18:28 with GitHub Actions (Inactive)
github-actions bot added the lock:athena-test1 label and removed the deploy:athena-test1 label on Jan 20, 2025
LeonWehrhahn removed the lock:athena-test1 label on Jan 20, 2025
EneaGore temporarily deployed to athena-test1.ase.cit.tum.de on January 20, 2025 21:54 with GitHub Actions (Inactive)
github-actions bot added the lock:athena-test1 label and removed the deploy:athena-test1 label on Jan 20, 2025
EneaGore added the deploy:athena-test1 label and removed the lock:athena-test1 label on Jan 20, 2025
EneaGore temporarily deployed to athena-test1.ase.cit.tum.de on January 20, 2025 22:32 with GitHub Actions (Inactive)
github-actions bot added the lock:athena-test1 label on Jan 20, 2025
EneaGore removed the lock:athena-test1 label on Jan 21, 2025
Motivation and Context
Rogue prompts embedded in a submission can sometimes lead the LLM to produce unintended responses, such as awarding credits unfairly or disclosing information it should not reveal.
Description
This PR introduces a mechanism to detect such prompts using a set of keywords and phrases, from which embeddings are generated. The keywords are compared to the submission using fuzzy matching, and their embeddings are compared with the submission's embedding using cosine similarity. If the combined score exceeds a configurable threshold, a secondary LLM check is triggered to confirm or dismiss the suspicion.
When the suspicion is confirmed, the system returns a single unreferenced feedback message that addresses content policy concerns.
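The flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the weights, the threshold value, the shape of the feedback object, and the `confirm_with_llm` callback are all hypothetical, `difflib` stands in for whatever fuzzy matcher is actually used, and a bag-of-words count vector stands in for a real embedding model.

```python
import math
from collections import Counter
from difflib import SequenceMatcher
from typing import Callable, Optional, Sequence

# Hypothetical values: the PR does not state the weights or the threshold.
FUZZY_WEIGHT = 0.5
EMBEDDING_WEIGHT = 0.5
THRESHOLD = 0.6

# Hypothetical shape of the single unreferenced feedback message.
POLICY_FEEDBACK = {
    "title": "Content policy",
    "description": "This submission appears to violate the content policy.",
    "referenced": False,
}


def toy_embedding(text: str) -> Counter:
    """Stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fuzzy_score(phrase: str, submission: str) -> float:
    """Best similarity ratio of the phrase against any same-length word window."""
    words = submission.lower().split()
    size = max(1, len(phrase.split()))
    window_count = max(1, len(words) - size + 1)
    windows = [" ".join(words[i:i + size]) for i in range(window_count)]
    return max(SequenceMatcher(None, phrase.lower(), w).ratio() for w in windows)


def suspicion_score(submission: str, phrases: Sequence[str]) -> float:
    """Weighted combination of the best fuzzy match and the best cosine similarity."""
    if not phrases:
        return 0.0
    submission_vector = toy_embedding(submission)
    fuzzy = max(fuzzy_score(p, submission) for p in phrases)
    embedding = max(cosine_similarity(toy_embedding(p), submission_vector) for p in phrases)
    return FUZZY_WEIGHT * fuzzy + EMBEDDING_WEIGHT * embedding


def check_submission(
    submission: str,
    phrases: Sequence[str],
    confirm_with_llm: Callable[[str], bool],
) -> Optional[list]:
    """Return the policy feedback if the suspicion is confirmed, otherwise None."""
    if suspicion_score(submission, phrases) <= THRESHOLD:
        return None  # below the threshold: no secondary check is triggered
    if confirm_with_llm(submission):
        return [POLICY_FEEDBACK]
    return None  # the secondary check dismissed the suspicion
```

Keeping the LLM call behind the threshold means ordinary submissions never pay for the secondary check; only submissions that already look suspicious are escalated.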
The keywords are stored in an encrypted file. To decrypt it, the encryption key must be provided in the .env file.
Steps for Testing
Submit text that attempts to manipulate the prompt. The response should be a single unreferenced feedback message addressing the content policy.
Testserver States
Note
These badges show the state of the test servers.
Green = Currently available, Red = Currently locked
Click on the badges to get to the test servers.
Screenshots