Well done! My colleagues and I have found similar results to yours when training a domain-specific LLM, but we could not run experiments as clear and clean as yours.

After reading, however, it occurred to me that other works suggest millions of IFT examples can still be used to train domain LLMs, such as Huatuo-I/II (https://arxiv.org/abs/2304.06975, https://arxiv.org/pdf/2311.09774.pdf) and "Adapting Large Language Models via Reading Comprehension" from MSR. Why are the conclusions so different, and what are the boundaries/preconditions separating these two lines of work? There may be more work to be done here.

Also, could you explain why this paper supports "weak-to-strong alignment," especially if the weak model cannot identify the knowledge it has learned badly?