
evaluation metric #2

Open
serenawame opened this issue Oct 31, 2023 · 1 comment

@serenawame

In run_eval.py:

results["overall"] = {
    "PSR": sum(sr) / len(sr),
    "SR": sr.count(1.0) / len(sr),
    "Precision": 1 - sum(unchanged_conds) / sum(total_unchanged_conds),
    "Exec": sum(exec_per_task) / len(exec_per_task),
}
Could you please explain which of these corresponds to "SR", "Exec", and "GCR" in the paper? Based on my understanding, SR is computed from either "PSR" or "SR", and Exec is obtained from "Exec" in the code. But how is "GCR" obtained? Is it the same as "Precision", i.e., checking that the executor keeps the states that should remain unchanged throughout the whole execution unchanged, and translating that into the overlap between the final achieved state g' and the ground-truth final state g?
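To make my reading concrete, here is roughly how I imagine the per-task lists feeding that dictionary are populated (a sketch only; the loop and helper names below are hypothetical, not the actual run_eval.py code):

# Hypothetical sketch of the per-task bookkeeping (not the real run_eval.py).
sr = []                      # per-task fraction of final goal conditions satisfied
unchanged_conds = []         # per-task count of should-stay-unchanged conditions that were violated
total_unchanged_conds = []   # per-task count of should-stay-unchanged conditions
exec_per_task = []           # per-task fraction of actions that executed successfully

for task in tasks:  # `tasks` and the helpers below are placeholders
    satisfied, total = count_satisfied_goal_conditions(task)      # hypothetical helper
    sr.append(satisfied / total)
    violated, n_unchanged = count_unchanged_violations(task)      # hypothetical helper
    unchanged_conds.append(violated)
    total_unchanged_conds.append(n_unchanged)
    exec_per_task.append(fraction_of_executable_actions(task))    # hypothetical helper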

@ishikasingh

GCR (goal condition recall) = PSR (partial success rate). We additionally have a precision metric, which was mostly 100% for all agents (meaning all agents mostly perform task-relevant actions only), so we didn't report it in the paper. Yes, it keeps track of unchanged states and evaluates only based on the changes that happened in the final state over the execution.
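To make the mapping concrete, here is a minimal sketch with toy per-task values (the list semantics are assumed from the snippet above, not verified against run_eval.py):

# Toy example of the metric computation (assumed semantics, not the verbatim run_eval.py code).
sr = [1.0, 0.5, 1.0]               # per-task fraction of final goal conditions satisfied
unchanged_conds = [0, 1, 0]        # per-task count of should-stay-unchanged conditions violated
total_unchanged_conds = [4, 5, 3]  # per-task count of should-stay-unchanged conditions
exec_per_task = [1.0, 0.8, 1.0]    # per-task fraction of actions that executed successfully

GCR = sum(sr) / len(sr)                     # "PSR" in the code = GCR in the paper (~0.83 here)
SR = sr.count(1.0) / len(sr)                # success only when every goal condition holds (~0.67)
Precision = 1 - sum(unchanged_conds) / sum(total_unchanged_conds)  # not reported in the paper (~0.92)
Exec = sum(exec_per_task) / len(exec_per_task)                     # executability (~0.93)
print(GCR, SR, Precision, Exec)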
