In run_eval.py:
results["overall"] = {
    "PSR": sum(sr) / len(sr),
    "SR": sr.count(1.0) / len(sr),
    "Precision": 1 - sum(unchanged_conds) / sum(total_unchanged_conds),
    "Exec": sum(exec_per_task) / len(exec_per_task),
}
Could you please explain which of these correspond to "SR", "Exec", and "GCR" in the paper? As I understand it, SR in the paper comes from either "PSR" or "SR" in the code, and Exec comes from "Exec". But how is GCR obtained? Is it the same as "Precision", i.e. checking whether the executor leaves unchanged the states that should stay unchanged over the whole set of executions, and expressing that as the overlap between the final achieved state g' and the ground-truth final state g?
GCR (goal condition recall) is the same as PSR (partial success rate). We additionally compute a precision metric, but it was mostly 100% for all agents (meaning the agents mostly take only task-relevant actions), so we didn't report it in the paper. And yes, the evaluation keeps track of unchanged states and scores only the changes that happened in the final state over the execution.
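To make the mapping concrete, here is a minimal sketch that recomputes the four values from the snippet quoted in the question and labels PSR as GCR. The function name `overall_metrics` and the toy inputs are hypothetical, and the sketch assumes that `sr`, `unchanged_conds`, `total_unchanged_conds`, and `exec_per_task` are per-task lists of numbers; how run_eval.py actually fills those lists is not shown in this thread.

```python
def overall_metrics(sr, unchanged_conds, total_unchanged_conds, exec_per_task):
    """Recompute the "overall" dict using the formulas quoted from run_eval.py.

    Assumption: every argument is a per-task list of numbers. The exact
    meaning of each list is defined in run_eval.py, not in this sketch.
    """
    return {
        # PSR in the code is what the paper reports as GCR.
        "PSR (GCR)": sum(sr) / len(sr),
        # SR counts only tasks whose score is exactly 1.0.
        "SR": sr.count(1.0) / len(sr),
        # Computed in the code but not reported in the paper.
        "Precision": 1 - sum(unchanged_conds) / sum(total_unchanged_conds),
        "Exec": sum(exec_per_task) / len(exec_per_task),
    }


# Toy values for four tasks, made up purely for illustration.
metrics = overall_metrics(
    sr=[1.0, 0.5, 1.0, 0.0],
    unchanged_conds=[0, 1, 0, 0],
    total_unchanged_conds=[4, 4, 4, 4],
    exec_per_task=[1.0, 0.5, 1.0, 0.5],
)
print(metrics)
```

With these toy inputs, two of the four tasks score exactly 1.0, so SR is 0.5, while PSR (GCR) also credits the partially solved task and comes out at 0.625.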