hallucination with log probs #281

Draft
wants to merge 4 commits into
base: main

Conversation

@cotran2 cotran2 commented Nov 18, 2024

No description provided.

@cotran2 cotran2 self-assigned this Nov 18, 2024
tuple: A tuple containing two lists - filtered tokens and their corresponding probabilities.
"""
# Use regex to identify tokens without special characters
special_tokens = ["\\n", '{"', '":', ' "', '",', ' {"', '"}}\\n', " ", '"}}\n']

Is this list exhaustive, or are there more tokens to add? And how did you come up with it?

Comment on lines +25 to +28
filtered_tokens = [token for token in tokens if token not in special_tokens]
filtered_probs = [
    prob for token, prob in zip(tokens, probs) if token not in special_tokens
]

You could zip them at the beginning, filter once, and then use zip(*...) to unzip.
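
A minimal sketch of that suggestion, assuming `tokens` and `probs` are equal-length sequences. The function name `filter_tokens_and_probs` and the `SPECIAL_TOKENS` constant are hypothetical; the token list itself is copied from the diff.

```python
from typing import List, Sequence, Tuple

# Copied from the diff under review; whether it is exhaustive is an open question.
SPECIAL_TOKENS = frozenset(
    ["\\n", '{"', '":', ' "', '",', ' {"', '"}}\\n', " ", '"}}\n']
)


def filter_tokens_and_probs(
    tokens: Sequence[str], probs: Sequence[float]
) -> Tuple[List[str], List[float]]:
    """Drop special tokens and their probabilities in a single pass."""
    kept = [(t, p) for t, p in zip(tokens, probs) if t not in SPECIAL_TOKENS]
    if not kept:
        # zip(*[]) yields nothing to unpack, so the empty case returns early.
        return [], []
    filtered_tokens, filtered_probs = zip(*kept)
    return list(filtered_tokens), list(filtered_probs)
```

Filtering the pairs once keeps the two output lists aligned by construction, instead of relying on two comprehensions applying the same condition.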

combined_token = tokens[i] # Start a new combination

# Check if the combined token matches any parameter name
for func, params in parameter_names.items():

func => _ (the loop variable is unused)

Comment on lines +90 to +92
if found_param:
    break  # Exit the outer loop if parameter was matched
i += 1  # Move to the next token if no match was found yet

So we exit the loop as soon as a single parameter is matched?

Comment on lines +59 to +61
combined_token += tokens[
    i
]  # Append next token to the current combination

Will the first parameter match contain every token from the start up to the matched token?
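
One possible way to address both loop questions, as a sketch only and not the code in this PR. It assumes `parameter_names` maps a function name to a list of parameter names (as the `.items()` loop in the diff suggests). The function name `find_parameter_spans` and its return shape are hypothetical.

```python
from typing import Dict, List, Tuple


def find_parameter_spans(
    tokens: List[str], parameter_names: Dict[str, List[str]]
) -> List[Tuple[str, int, int]]:
    """Return (name, start, end) for each parameter name spelled by consecutive tokens."""
    # Function names are not needed for matching, so only the values are used.
    known = {name for params in parameter_names.values() for name in params}
    spans: List[Tuple[str, int, int]] = []
    start = 0
    while start < len(tokens):
        combined = ""
        matched = False
        for end in range(start, len(tokens)):
            combined += tokens[end]
            if combined in known:
                spans.append((combined, start, end))
                start = end + 1  # Resume after the match instead of stopping.
                matched = True
                break
            if not any(name.startswith(combined) for name in known):
                break  # No parameter name can begin with this text.
        if not matched:
            start += 1
    return spans
```

Because the combination restarts at each candidate position, a match only covers the tokens that spell the parameter name, and scanning continues so later parameters are found too. If one parameter name is a prefix of another, this sketch reports the shorter one.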

return property_name in parameter_info


def calculate_entropy(log_probs):

Add a type annotation for the parameter.

- entropy (float): The calculated entropy.
- varentropy (float): The calculated variance of entropy.
"""
log_probs = torch.tensor(log_probs)

Don't shadow the variable; bind the tensor to a new name instead of reassigning log_probs.
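
A rough illustration of both points (typed parameter, no rebinding of the input). This is a pure-Python sketch rather than the torch version in the PR. It assumes `log_probs` are natural-log probabilities of a normalized distribution and takes varentropy to mean the variance of the surprisal `-log p`. The name `entropy_and_varentropy` is hypothetical.

```python
import math
from typing import List, Tuple


def entropy_and_varentropy(log_probs: List[float]) -> Tuple[float, float]:
    """Return (entropy, varentropy) in nats for natural-log probabilities."""
    # A new name is used for the derived values; the input is left untouched.
    probs = [math.exp(lp) for lp in log_probs]
    entropy = -sum(p * lp for p, lp in zip(probs, log_probs))
    # Variance of -log p under p: sum over p * (-lp - entropy) ** 2.
    varentropy = sum(p * (lp + entropy) ** 2 for p, lp in zip(probs, log_probs))
    return entropy, varentropy
```

If only the top-k log probabilities are available, the probabilities will not sum to one and both values are approximations.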

Comment on lines +188 to +190
    entropy_thd: float = 0.7,
    varentropy_thd: float = 4.0,
) -> bool:

Take the default values from constants, e.g. ENTROPY_DEFAULT_THRESHOLD = 0.7.
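
For example, something along these lines. The constant names and the helper `exceeds_thresholds` are hypothetical, and requiring both statistics to exceed their thresholds is an assumption, since the diff excerpt does not show how they are combined.

```python
ENTROPY_DEFAULT_THRESHOLD = 0.7
VARENTROPY_DEFAULT_THRESHOLD = 4.0


def exceeds_thresholds(
    entropy: float,
    varentropy: float,
    entropy_thd: float = ENTROPY_DEFAULT_THRESHOLD,
    varentropy_thd: float = VARENTROPY_DEFAULT_THRESHOLD,
) -> bool:
    """Hypothetical check: True when both statistics are above their thresholds."""
    # Assumption: both conditions must hold; the PR may combine them differently.
    return entropy > entropy_thd and varentropy > varentropy_thd
```

Named constants give the defaults one place to change and make them importable by callers and tests.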

def hallucination_detect(
    token: str,
    log_probs: List[float],
    current_state: Dict[str, Any],

Does the caller need to see current_state? If not, maybe keep it local to this function instead of passing it as a parameter.

return log_probs.tolist(), entropy.item(), varentropy.item()


def hallucination_detect(

I think current_state could be captured better if you rewrite this as a class and move the state into class variables. I see that you didn't want to use global vars here and used a state dict instead, but I think a class would be better.
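
Roughly what I have in mind, as a sketch. The class name `HallucinationDetector`, the `detect` method, and the state fields (`tokens`, `flagged_tokens`) are all hypothetical, since I don't know everything current_state holds. The threshold rule (both statistics above their thresholds) is likewise an assumption.

```python
import math
from typing import List


class HallucinationDetector:
    """Hypothetical wrapper that owns the state previously passed as a dict."""

    def __init__(self, entropy_thd: float = 0.7, varentropy_thd: float = 4.0) -> None:
        self.entropy_thd = entropy_thd
        self.varentropy_thd = varentropy_thd
        self.tokens: List[str] = []          # every token seen so far
        self.flagged_tokens: List[str] = []  # tokens judged uncertain

    def detect(self, token: str, log_probs: List[float]) -> bool:
        """Record the token and report whether it looks uncertain."""
        probs = [math.exp(lp) for lp in log_probs]
        entropy = -sum(p * lp for p, lp in zip(probs, log_probs))
        varentropy = sum(p * (lp + entropy) ** 2 for p, lp in zip(probs, log_probs))
        self.tokens.append(token)
        # Assumption: a token is flagged only when both thresholds are exceeded.
        uncertain = entropy > self.entropy_thd and varentropy > self.varentropy_thd
        if uncertain:
            self.flagged_tokens.append(token)
        return uncertain
```

Callers would then create one detector per generation and call `detect` for each token, without threading a state dict through every call.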

@adilhafeez adilhafeez marked this pull request as draft November 19, 2024 18:56
2 participants