Feature Request: Separate out reasoning content in API response #269
Thanks for the suggestion. We will likely add this feature. However, we still have to decide the field names. For the time being, you can do this yourself:

```ts
const model = await client.llm.model("deepseek-r1-distill-llama-8b");

let reasoningPart = "";
let nonReasoningPart = "";

const result = await model.respond("What is the meaning of life?", {
  onPredictionFragment: ({ reasoningType, content }) => {
    if (reasoningType === "reasoning") {
      reasoningPart += content;
    } else if (reasoningType === "none") {
      nonReasoningPart += content;
    }
  },
});

console.info("Reasoning part:", reasoningPart);
console.info("Non-reasoning part:", nonReasoningPart);
```
Alternatively, if you are streaming already:

```ts
const model = await client.llm.model("deepseek-r1-distill-llama-8b");

let reasoningPart = "";
let nonReasoningPart = "";

for await (const { reasoningType, content } of model.respond("What is the meaning of life?")) {
  if (reasoningType === "reasoning") {
    reasoningPart += content;
  } else if (reasoningType === "none") {
    nonReasoningPart += content;
  }
}

console.info("Reasoning part:", reasoningPart);
console.info("Non-reasoning part:", nonReasoningPart);
```
Thanks Ryan, I appreciate the explanation. I'm already doing that, as my terminal-based app handles both streaming and non-streaming responses. While it works fine for a short example, it quickly gets more complicated than it needs to be in a real app. To explain my reasoning (see what I did there!):

```ts
const prediction = mainAgent.sendMessage();

if (options.md) {
  conversationLog.await('Generating response');
  // As this is a terminal-based app, if the response should be parsed as MD,
  // we have to wait for the entire result, as we cannot go back and parse it
  // after it's displayed, like we could with a web-based app.
} else {
  // This could be removed if reasoning was separate, as we wouldn't need to
  // compile it ourselves.
  last_response = {
    text: '',
    thinking: '',
  };
  // No MD parsing means we can just output immediately.
  for await (const { reasoningType, content } of prediction) {
    if (options.hideThinking) {
      if (reasoningType === 'reasoning') {
        last_response.thinking += content;
      } else if (reasoningType === 'none') {
        last_response.text += content;
        process.stdout.write(content);
      }
    } else {
      // No need to hide reasoning content, output everything.
      process.stdout.write(content);
    }
  }
  process.stdout.write('\n\n');
}

const result = await prediction;

conversationLog.success(
  `Got ${result.stats.totalTokensCount} tokens at ${_.round(result.stats.tokensPerSecond, 2)} t/s`,
);

// This section could be removed if reasoning was separate, and we could just
// cache the last response directly with: last_response = result;
const parts = result.content.split('</think>');
last_response = {
  text: parts[1],
  thinking: `${parts[0]}</think>`,
};

if (options.md) {
  // Output the final result parsed as MD.
  console.log(
    cliMd(
      options.hideThinking ? last_response.text : result.content,
    ),
  );
}
```

If reasoning was separated out on the final result, all of that could be reduced to:

```ts
const prediction = mainAgent.sendMessage();

if (options.md) {
  conversationLog.await('Generating response');
} else {
  for await (const { reasoningType, content } of prediction) {
    if (options.hideThinking) {
      if (reasoningType === 'none') {
        process.stdout.write(content);
      }
    } else {
      process.stdout.write(content);
    }
  }
  process.stdout.write('\n\n');
}

const result = await prediction;
last_response = result;

conversationLog.success(
  `Got ${result.stats.totalTokensCount} tokens at ${_.round(result.stats.tokensPerSecond, 2)} t/s`,
);

if (options.md) {
  console.log(
    cliMd(
      options.hideThinking ? result.content : `${result.reasoning_content}\n${result.content}`,
    ),
  );
}
```

It's also important to consider that DeepSeek's official recommendation is that thinking content should not be passed back to the model as part of the conversation, so splitting it out before adding the response to the conversation stack is essential, even if it's displayed in normal output. Hope this makes sense, and I appreciate that you and the rest of the team are working incredibly hard on keeping up with a mountain of requests! 🤜
@Trippnology Thanks for the detailed information. We will definitely add the support. 👍 Regarding stripping the reasoning content, here is something I typed earlier:
> Reasoning content is automatically stripped from previous messages before they are passed back to the model, so you don't need to remove it yourself.
Oh, that's really good information. I had no idea reasoning content was automatically stripped like that. Pleased to hear this will land eventually; I can muddle through in the short term. Many thanks!
Following on from this issue, it would be very helpful if reasoning content were separated from regular content when using the SDK, the same way it can be separated when using the REST API.

@ryan-the-crayon explained the reasoningType property of fragments when streaming, but that is not currently available on the final result (please refer to the issue above for code examples).
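For illustration, the requested behaviour on the final result might look something like the sketch below. The `reasoningContent` field name is purely hypothetical, since per the first comment the field names had not yet been decided:

```ts
const result = await model.respond("What is the meaning of life?");

// Hypothetical field: reasoning separated out on the final result, mirroring
// the reasoningType split already available on streamed fragments.
console.info("Reasoning part:", result.reasoningContent);
console.info("Non-reasoning part:", result.content);
```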