
Feature Request: Separate out reasoning content in API response #269

Open
Trippnology opened this issue Mar 8, 2025 · 5 comments
Labels: enhancement (New feature or request)

Comments

@Trippnology

Following on from this issue, it would be very helpful if reasoning content was separated from regular content when using the SDK, the same way it can be separated when using the REST API.

@ryan-the-crayon explained the reasoningType property of fragments when streaming, but that is not currently available on the final result (please refer to the issue above for code examples).
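
For reference, the REST API can already return the two parts separately; the response body looks roughly like this (an illustrative excerpt only, with the exact shape depending on the LM Studio version and whether reasoning separation is enabled in the server settings):

{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "reasoning_content": "Okay, the user is asking ...",
        "content": "The meaning of life is ..."
      }
    }
  ]
}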

ryan-the-crayon added the enhancement label on Mar 10, 2025
@ryan-the-crayon
Collaborator

Thanks for the suggestion. We will likely add this feature. However, we still have to decide the field names.

For the time being, you can do this yourself:

import { LMStudioClient } from "@lmstudio/sdk";

const client = new LMStudioClient();
const model = await client.llm.model("deepseek-r1-distill-llama-8b");

let reasoningPart = "";
let nonReasoningPart = "";

const result = await model.respond("What is the meaning of life?", {
  // Sort each streamed fragment into the reasoning or non-reasoning bucket.
  onPredictionFragment: ({ reasoningType, content }) => {
    if (reasoningType === "reasoning") {
      reasoningPart += content;
    } else if (reasoningType === "none") {
      nonReasoningPart += content;
    }
  },
});

console.info("Reasoning part:", reasoningPart);
console.info("Non-reasoning part:", nonReasoningPart);

@ryan-the-crayon
Collaborator

Alternatively, if you are streaming already:

const model = await client.llm.model("deepseek-r1-distill-llama-8b");

let reasoningPart = "";
let nonReasoningPart = "";

// Accumulate fragments directly from the stream.
for await (const { reasoningType, content } of model.respond("What is the meaning of life?")) {
  if (reasoningType === "reasoning") {
    reasoningPart += content;
  } else if (reasoningType === "none") {
    nonReasoningPart += content;
  }
}

console.info("Reasoning part:", reasoningPart);
console.info("Non-reasoning part:", nonReasoningPart);

@Trippnology
Author

Thanks Ryan, I appreciate the explanation. I'm already doing that, as my terminal-based app handles both streaming and non-streaming responses. While it works fine for a short example, it quickly gets more complicated than it needs to be in a real app.

To explain my reasoning (see what I did there!):

const prediction = mainAgent.sendMessage();

if (options.md) {
	conversationLog.await('Generating response');
	// As this is a terminal-based app, if the response should be parsed as MD,
	// we have to wait for the entire result, as we cannot go back and parse it after
	// it's displayed, like we could with a web-based app.
} else {
	// This could be removed if reasoning was separate as we wouldn't need to compile it ourselves
	last_response = {
		text: '',
		thinking: '',
	};
	// No MD parsing means we can just output immediately
	for await (const { reasoningType, content } of prediction) {
		if (options.hideThinking) {
			if (reasoningType === 'reasoning') {
				last_response.thinking += content;
			} else if (reasoningType === 'none') {
				last_response.text += content;
				process.stdout.write(content);
			}
		} else {
			// No need to hide reasoning content, output everything
			process.stdout.write(content);
		}
	}
	process.stdout.write('\n\n');
}

const result = await prediction;

conversationLog.success(
	`Got ${result.stats.totalTokensCount} tokens at ${_.round(result.stats.tokensPerSecond, 2)} t/s`,
);

// This section could be removed if reasoning was separate, and we could just cache the
// last response directly with: last_response = result;
const parts = result.content.split('</think>');
last_response = {
	text: parts[1],
	thinking: `${parts[0]}</think>`,
};

if (options.md) {
	// Output the final result parsed as MD
	console.log(
		cliMd(
			options.hideThinking ? last_response.text : result.content,
		),
	);
}

If result had a reasoning_content property (as found in REST API responses), this could be reduced to:

const prediction = mainAgent.sendMessage();

if (options.md) {
	conversationLog.await('Generating response');
} else {
	for await (const { reasoningType, content } of prediction) {
		if (options.hideThinking) {
			if (reasoningType === 'none') {
				process.stdout.write(content);
			}
		} else {
			process.stdout.write(content);
		}
	}
	process.stdout.write('\n\n');
}

const result = await prediction;
last_response = result;

conversationLog.success(
	`Got ${result.stats.totalTokensCount} tokens at ${_.round(result.stats.tokensPerSecond, 2)} t/s`,
);

if (options.md) {
	console.log(
		cliMd(
			options.hideThinking ? result.content : `${result.reasoning_content}\n${result.content}`,
		),
	);
}

It's also important to consider that DeepSeek's official recommendation is that the thinking content should not be passed back to the model as part of the conversation, so splitting it out before adding the response to the conversation stack is essential, even if it's displayed in normal output.
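
For illustration, here is a minimal sketch of that split before the reply goes into the conversation history (conversationStack is just a placeholder for whatever structure the app actually uses):

// Hypothetical sketch: keep only the non-reasoning text in the conversation history.
// conversationStack is a placeholder for the app's conversation history.
const parts = result.content.split('</think>');
const visibleText = (parts.length > 1 ? parts[1] : parts[0]).trim();

conversationStack.push({
	role: 'assistant',
	content: visibleText, // reasoning never goes back to the model
});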

Hope this makes sense, and I appreciate that you and the rest of the team are working incredibly hard to keep up with a mountain of requests! 🤜

@ryan-the-crayon
Collaborator

ryan-the-crayon commented Mar 11, 2025

@Trippnology Thanks for the detailed information. We will definitely add support for this. 👍

Regarding stripping the reasoning content, here is something I typed earlier:

tl;dr: LM Studio performs reasoning content stripping via the prompt template, i.e. we only remove content between <think> tags if the prompt template asks us to.

Explanation:

On its own, LM Studio aims to preserve a model's output, as we have no way of knowing whether a model expects to see its previous reasoning content in subsequent generations. However, many recent reasoning models (such as deepseek) do expect reasoning content to be stripped. This is achieved by the prompt template shipped with the model. If you check the jinja template of deepseek (go to the My Models page -> your model -> gears -> Prompt -> Prompt Template), it will likely contain something like {% if '</think>' in content %}{% set content = content.split('</think>')|last %}{% endif %}, which removes the content before </think>. Since the prompt template is applied before the context is fed to the model, this ensures reasoning content from previous generations is not passed to the model.
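
To make the effect of that snippet concrete, here is a rough JavaScript equivalent of what the template does to each previous assistant message when the prompt is rendered (illustration only; the real stripping happens inside the jinja template, and messages here is just a stand-in for the chat history):

// Rough JS equivalent of: {% if '</think>' in content %}{% set content = content.split('</think>')|last %}{% endif %}
function stripReasoning(content) {
  return content.includes("</think>")
    ? content.split("</think>").pop()
    : content;
}

// `messages` is a stand-in for the chat history array.
// Applied to every earlier assistant turn before the context is fed to the model.
const history = messages.map((m) =>
  m.role === "assistant" ? { ...m, content: stripReasoning(m.content) } : m,
);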

If you suspect reasoning content from previous generations is not being stripped properly, please use the command lms log stream. It shows you the raw input right before it is passed to the model. (In general, lms log stream is what you should use whenever you need to make sure the correct content is being fed to the model.)

If you do see reasoning content from a previous generation in lms log stream, it is most likely caused by an incorrect prompt template. There are usually two causes:

  • The prompt template shipped with the model is incorrect. (Check the prompt template in the My Models page.) If you are unsure, you can ask here.
  • You have provided a prompt template override that does not strip reasoning content (you can check this by right-clicking the top gear in Chat -> Copy Model Debug Information).
    • If you are using the "Manual" prompt template, it does not support reasoning content stripping.
    • If you are using a custom jinja template, make sure it strips reasoning content.

@Trippnology
Author

Oh, that's really good information. I had no idea reasoning content was automatically stripped like that.

Pleased to hear this will land eventually; I can muddle through in the short term. Many thanks!
