
evaluation metric #2

Open
serenawame opened this issue Oct 31, 2023 · 1 comment

@serenawame

In run_eval.py:

results["overall"] = {
    "PSR": sum(sr) / len(sr),
    "SR": sr.count(1.0) / len(sr),
    "Precision": 1 - sum(unchanged_conds) / sum(total_unchanged_conds),
    "Exec": sum(exec_per_task) / len(exec_per_task),
}
Could you please explain which of these corresponds to "SR", "Exec", and "GCR" in the paper? Based on my understanding, SR is computed from either "PSR" or "SR", and Exec is obtained from "Exec" in the code. But how is "GCR" obtained? Is it the same as "Precision", i.e., checking that the executor keeps the states that should remain unchanged throughout the whole execution unchanged, and translating that into the overlap between the final achieved state g' and the ground-truth final state g?
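To make my reading concrete, here is roughly how I imagine the per-task lists feeding that dictionary are populated (a sketch only; the loop and helper names below are hypothetical, not the actual run_eval.py code):

# Hypothetical sketch of the per-task bookkeeping (not the real run_eval.py).
sr = []                      # per-task fraction of final goal conditions satisfied
unchanged_conds = []         # per-task count of should-stay-unchanged conditions that were violated
total_unchanged_conds = []   # per-task count of should-stay-unchanged conditions
exec_per_task = []           # per-task fraction of actions that executed successfully

for task in tasks:  # `tasks` and the helpers below are placeholders
    satisfied, total = count_satisfied_goal_conditions(task)      # hypothetical helper
    sr.append(satisfied / total)
    violated, n_unchanged = count_unchanged_violations(task)      # hypothetical helper
    unchanged_conds.append(violated)
    total_unchanged_conds.append(n_unchanged)
    exec_per_task.append(fraction_of_executable_actions(task))    # hypothetical helper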

@ishikasingh

GCR (goal condition recall) = PSR (partial success rate). We additionally have a precision metric, which was mostly 100% for all agents (meaning all agents mostly perform task-relevant actions only), so we didn't report it in the paper. Yes, it keeps track of unchanged states and evaluates only based on the changes that happened in the final state over the execution.
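To make the mapping concrete, here is a minimal sketch with toy per-task values (the list semantics are assumed from the snippet above, not verified against run_eval.py):

# Toy example of the metric computation (assumed semantics, not the verbatim run_eval.py code).
sr = [1.0, 0.5, 1.0]               # per-task fraction of final goal conditions satisfied
unchanged_conds = [0, 1, 0]        # per-task count of should-stay-unchanged conditions violated
total_unchanged_conds = [4, 5, 3]  # per-task count of should-stay-unchanged conditions
exec_per_task = [1.0, 0.8, 1.0]    # per-task fraction of actions that executed successfully

GCR = sum(sr) / len(sr)                     # "PSR" in the code = GCR in the paper (~0.83 here)
SR = sr.count(1.0) / len(sr)                # success only when every goal condition holds (~0.67)
Precision = 1 - sum(unchanged_conds) / sum(total_unchanged_conds)  # not reported in the paper (~0.92)
Exec = sum(exec_per_task) / len(exec_per_task)                     # executability (~0.93)
print(GCR, SR, Precision, Exec)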
