I'm trying to reproduce the summaries generated by the HF models, namely Phi-2 and Llama 3.2-1B Instruct, since the results I'm getting by following the described prompt / pipeline are not close to those on the leaderboard. Comparing the summaries I'm generating with those in the HF dataset, I found a large difference. For example, one thing these models are struggling with in my runs, which I don't see in the HF dataset, is sentence repetition.
So my question is: what generation config is used for the Hugging Face models? I'm currently using the text-generation pipeline with do_sample=False (another issue mentioned that a temperature of 0 was used). If the generation code could be shared, that would also help me see what is giving rise to this variation in results.
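For reference, this is roughly what I'm running (a minimal sketch; the model ID, prompt, and token limit are placeholders for my actual setup):

```python
from transformers import pipeline

# Minimal sketch of my current setup; model ID and prompt are placeholders.
pipe = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.2-1B-Instruct",
)

out = pipe(
    "Provide a concise summary of the following passage: <passage here>",
    do_sample=False,          # greedy decoding, i.e. effectively temperature 0
    max_new_tokens=250,
    return_full_text=False,   # only return the generated summary, not the prompt
)
print(out[0]["generated_text"])
```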
Edit: I also can't reproduce the leaderboard score for Llama 3.2-1B using the generated summaries in the linked HF dataset, because:
1 - I don't know what threshold was used to determine whether a response is hallucinated / consistent. Edit: I will use top_k = 1.
2 - The dataset includes the omitted samples (its length is 1006, not ~850).
Hi @Noor-Nizar, thanks for your interest in our leaderboard.
For the summary generation:
Phi-2 is accessed via the LiteLLM Python SDK (Hugging Face provider) with temperature=0.0.
Llama 3.2 1B is accessed via the Together AI chat endpoint with temperature=0 and max_tokens=250.
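For reference, a minimal sketch of those two calls (the model slugs and prompt below are placeholders / assumptions, so adapt them to your environment):

```python
from litellm import completion   # pip install litellm
from together import Together    # pip install together

prompt = "Provide a concise summary of the following passage: <passage here>"

# Phi-2 via the LiteLLM Python SDK, Hugging Face provider, temperature 0.
# The "huggingface/<repo>" naming is LiteLLM's provider-prefix convention;
# you may need to set api_base if you host your own inference endpoint.
phi2_resp = completion(
    model="huggingface/microsoft/phi-2",
    messages=[{"role": "user", "content": prompt}],
    temperature=0.0,
)
print(phi2_resp.choices[0].message.content)

# Llama 3.2 1B via the Together AI chat endpoint, temperature 0, max_tokens 250.
client = Together()  # reads TOGETHER_API_KEY from the environment
llama_resp = client.chat.completions.create(
    model="meta-llama/Llama-3.2-1B-Instruct-Turbo",  # exact slug on Together is an assumption
    messages=[{"role": "user", "content": prompt}],
    temperature=0,
    max_tokens=250,
)
print(llama_resp.choices[0].message.content)
```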
Please note that the leaderboard is scored with the HHEM-2.1 model, which excels at hallucination detection but is not open-sourced. While we offer HHEM-2.1-Open as an open-source alternative, it may produce slightly different results.
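If you want to score your own summaries with HHEM-2.1-Open, a minimal sketch following its model card looks like this (the 0.5 cut-off below is only an illustrative assumption, not necessarily the threshold used by the leaderboard):

```python
from transformers import AutoModelForSequenceClassification

# HHEM-2.1-Open is published as vectara/hallucination_evaluation_model on the Hub;
# trust_remote_code exposes its predict() helper for (premise, hypothesis) pairs.
model = AutoModelForSequenceClassification.from_pretrained(
    "vectara/hallucination_evaluation_model", trust_remote_code=True
)

pairs = [
    ("The source article text goes here.", "The model-generated summary goes here."),
]
scores = model.predict(pairs)  # consistency scores in [0, 1]; higher = more consistent

# Illustrative assumption: treat a summary as consistent if its score is >= 0.5.
for score in scores:
    print(float(score), "consistent" if float(score) >= 0.5 else "hallucinated")
```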