Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Force stdin/stdout encoding to UTF-8 #198

Conversation

rnortman
Copy link
Contributor

The character encoding of the stdin/stdout streams in Python is platform- dependent. On Windows it will be something weird, like CP437 or CP1252, depending on the locale. This change ensures that no matter the platform, UTF-8 is used.

Motivation and Context

How Has This Been Tested?

Breaking Changes

None known.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update

Checklist

  • I have read the MCP Documentation
  • My code follows the repository's style guidelines
  • New and existing tests pass locally (but see below)
  • I have added appropriate error handling
  • I have added or updated documentation as needed

Existing tests fail on Windows because of backslashes vs forward slashes in path names before my change. The same tests still fail on Windows with my change. Tests pass on Linux.

Additional context

This enforces that stdin/stdout are UTF-8-encoded independent of platform; otherwise it's a grab-bag what you get on Windows. The JSON spec requires UTF-8, UTF-16, or UTF-32 encoding, and UTF-8 is by far the most common, so this seems safe.

A proper "fix" would be for the MCP specification to either specify an exact encoding or have a handshake where encoding is established before any other messages are exchanged. (I would lean toward requiring UTF-8.)

The character encoding of the stdin/stdout streams in Python is platform-
dependent. On Windows it will be something weird, like CP437 or CP1252,
depending on the locale. This change ensures that no matter the platform,
UTF-8 is used.
@dsp-ant
Copy link
Member

dsp-ant commented Feb 11, 2025

Thank you. I think this is a pretty valid concern. In fact, I believe we should start enforcing UTF-8 in the spec. Here is my draft: modelcontextprotocol/specification#162.

I will wait for the discussion on the draft pr before merging this.

@dsp-ant dsp-ant self-requested a review February 18, 2025 11:21
@dsp-ant dsp-ant changed the base branch from v1.2.x to main February 18, 2025 11:21
@dsp-ant dsp-ant changed the base branch from main to v1.2.x February 18, 2025 11:22
@dsp-ant dsp-ant merged commit f3c8aad into modelcontextprotocol:v1.2.x Feb 18, 2025
3 checks passed
@rnortman rnortman deleted the fix/windows-stdin-stdout-encoding branch February 18, 2025 12:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants