Hi, thanks for this wonderful dataset~ I ran into some issues while running the tests.
Problem and Analysis
During the execution of the testing code, the program seemed to get stuck. While debugging, I found that the code kept retrying the greedy ChatGPT calls. This happens when `extract_answer` tries to extract an answer and the model returns a blank reply because no answer can be found. There is also an infinite loop in `score_answer`: it keeps retrying as long as the output is not 0/1.
In both cases, the number of retries is extremely large (potentially infinite), and because greedy decoding is deterministic, the retries will never produce the expected response. This stalls the code and burns a lot of API quota.
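To make the failure mode concrete, here is a minimal, self-contained sketch of the two retry patterns; this is a simplification with a stubbed model call, not the repo's exact code:

```python
# Simplified sketch of the stalling pattern. `fake_greedy_gpt` is a stub for the
# real ChatGPT call: with temperature=0 it deterministically returns the same
# (here: blank) reply on every retry.

def fake_greedy_gpt(prompt: str) -> str:
    return ""  # greedy decoding is deterministic, and empty when no answer is found

def get_chat_response(prompt: str, patience: int = 1000) -> str:
    while patience > 0:            # up to 1000 identical, wasted API calls
        patience -= 1
        prediction = fake_greedy_gpt(prompt).strip()
        if prediction:
            return prediction
    return ""

def score_answer(save_inst: dict) -> None:
    judgement = fake_greedy_gpt("judge this instance")          # expects "0" or "1"
    while True:                                                 # never exits while the format is wrong
        if judgement.strip() not in ['0', '1']:
            judgement = fake_greedy_gpt("judge this instance")  # same greedy call, same reply
        else:
            save_inst['judgement'] = int(judgement)
            break
```

With the stub above, `get_chat_response` only returns after exhausting its entire `patience` budget, and `score_answer` never returns at all.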
I also noticed that the prompt doesn't explicitly ask the model to output null when it can't extract an answer, or to output only 0/1 without any explanation when scoring; the few-shot examples alone are not enough to constrain the output format.
What I tried to fix
I added format requirements to the prompts:
For extract:
Directly output the extracted answer with no explanation.
For score:
Output the Judgement (0 or 1) DIRECTLY without any explanation.
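For concreteness, here is a rough sketch of how these instructions could be wired into the existing few-shot prompts; the helper names and prompt layout (`demo_prompt`, `build_extract_prompt`, etc.) are my own illustration, not necessarily how the repo builds its prompts:

```python
# Hypothetical sketch: appending explicit format instructions to the few-shot
# prompts. Names and layout are assumptions for illustration only.

EXTRACT_INSTRUCTION = "Directly output the extracted answer with no explanation."
SCORE_INSTRUCTION = "Output the Judgement (0 or 1) DIRECTLY without any explanation."

def build_extract_prompt(demo_prompt: str, response: str) -> str:
    # few-shot demos, then the format requirement, then the new case to extract from
    return f"{demo_prompt}\n\n{EXTRACT_INSTRUCTION}\n\nModel response: {response}\nExtracted answer:"

def build_score_prompt(demo_prompt: str, question: str, answer: str, extraction: str) -> str:
    # few-shot demos, then the format requirement, then the case to judge
    return (f"{demo_prompt}\n\n{SCORE_INSTRUCTION}\n\n"
            f"Question: {question}\nGround truth: {answer}\n"
            f"Extracted answer: {extraction}\nJudgement:")
```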
In the code, for the extraction call (`get_chat_response`):
```diff
 def get_chat_response(self, prompt, temperature=0, max_tokens=256, n=1, patience=1000, sleep_time=0):
     messages = [
         {"role": "user", "content": prompt},
     ]
     payload = {"model": self.gpt_model, "messages": messages, "temperature": temperature, "max_tokens": max_tokens, "n": n}
     while patience > 0:
         patience -= 1
         try:
             response = self._post_request(payload)
             if n == 1:
                 prediction = response["choices"][0]["message"]["content"].strip()
                 if prediction and prediction != "":
                     return prediction
                 else:
+                    if temperature == 0:
+                        # no need to retry: greedy search always returns the same result
+                        return ""
```
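The early return relies on the fact that greedy decoding (temperature=0) is deterministic: repeating the identical request can only return the same blank reply, so further retries would just burn the `patience` budget. When `temperature > 0`, sampling can change the output, so the existing retry loop still applies there.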
And for the scoring loop:

```diff
 judgement = match_answer(save_inst, args.api_key, args.quick_match)
-while True:
-    if judgement.strip() not in ['0', '1']:
-        judgement = match_answer(save_inst, args.api_key, args.quick_match)
-    else:
-        save_inst['judgement'] = int(judgement)
-        break
+if judgement[0] not in ['0', '1']:
+    print('Wrong return format: ', judgement)
+else:
+    save_inst['judgement'] = int(judgement)
```
With these changes, both issues are alleviated.
Discussion
I am curious whether others have encountered this issue; as far as I can tell it should be universal, given the deterministic nature of greedy decoding. I also look forward to the developers' feedback on my modifications, and I am happy to submit a PR if my understanding and fix are correct.
Thanks!