UnicodeEncodeError when the input .md file is in Simplified Chinese #163
Comments
Thank you for reporting this. I'm away from my computer for the next couple of days, so I can't look at this right away. Can you get me a minimal reproducible example? A possible workaround is to use hexadecimal character references, but that probably doesn't scale.
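To illustrate the workaround mentioned above, here is a minimal sketch that rewrites every non-ASCII character as a hexadecimal character reference. The helper name `to_hex_char_refs` is hypothetical and not part of md2pptx, and whether md2pptx renders such references correctly depends on the version in use.

```python
import html


def to_hex_char_refs(text: str) -> str:
    """Replace each non-ASCII character with a hexadecimal character reference."""
    return "".join(
        ch if ord(ch) < 128 else f"&#x{ord(ch):X};"
        for ch in text
    )


sample = "# 中文"
encoded = to_hex_char_refs(sample)
print(encoded)  # → # &#x4E2D;&#x6587;

# html.unescape reverses the transformation, so no information is lost.
print(html.unescape(encoded) == sample)  # → True
```

The output is pure ASCII, which sidesteps encoding questions entirely, at the cost of making the Markdown source unreadable for people.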
Thanks. I need the 222.md file, or a minimal version of it (with no confidential data).
Thanks for your reply. Here is an example Markdown file:
Thanks for this. I note your attempt to use paragraph tags. If you think paragraph tags should be supported, and have a clear idea of how they should be rendered, please open another issue.
Thanks for your reply.
Right. BBEdit (one of my editors of choice) thinks the file is UTF-8, but I suspect it isn't. Sniffing the actual encoding is an approach I might take.
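A rough sketch of what such sniffing could look like: check for a byte order mark first, then try strict decoding with a short list of candidate encodings. The function name `sniff_encoding` and the candidate list are assumptions for illustration, not md2pptx code, and `gb18030` is included only because the file is said to be Simplified Chinese.

```python
import codecs
from typing import Optional


def sniff_encoding(data: bytes) -> Optional[str]:
    """Guess the encoding of raw file bytes; return None if no candidate fits."""
    # A byte order mark is the most reliable hint, so check it first.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    # Otherwise try strict decoding, most restrictive encoding first.
    for name in ("utf-8", "gb18030"):
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None


print(sniff_encoding("中文".encode("utf-8")))
print(sniff_encoding("中文".encode("gb18030")))
print(sniff_encoding("中文".encode("utf-16")))
```

For a real file, the bytes would come from opening it in binary mode (`open(path, "rb").read()`). The order matters: bytes that decode as strict UTF-8 are very likely UTF-8, whereas a permissive multi-byte encoding accepts many byte sequences, so a match there is only a guess.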
This is strange: My run with your file yields this:
I suspect your problem is with python-pptx or lxml rather than md2pptx, but I'm keeping an open mind about this.
Thank you so much for helping me with this question. If it's a problem with python-pptx or lxml, what do you suggest to fix it?
I've just fixed a problem with numeric character references. So with the very latest push, a workaround for you might well be to use character references such as
Thank you very much.
Please let me know how you get on. And do you think the text is really UTF-16 rather than UTF-8? The U+DC80 character isn't valid in UTF-8, apparently. (I also pushed some doc changes after the commit that fixes numeric character references, so don't be confused by what the latest commit says.)
UnicodeEncodeError: 'utf-8' codec can't encode character '\udc80' in position 23: surrogates not allowed
I'm trying to convert an .md file whose content is in Simplified Chinese, but I'm running into encoding problems. I've read that the latest version mentions fixing #161, but I still can't get it to work on my end, so I'd like to ask what the best way to fix it is.
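For context on the error message, the following sketch shows one common way a `'\udc80'` character ends up in a Python string: decoding a byte that is not valid UTF-8 with the `surrogateescape` error handler, which Python applies by default to things like command-line arguments and file names. That this is what happened here is an assumption; the thread does not confirm where the character came from.

```python
# 0x80 on its own is not a valid UTF-8 sequence.
raw = b"\x80"

# With surrogateescape, the undecodable byte becomes the lone surrogate U+DC80.
text = raw.decode("utf-8", errors="surrogateescape")
print(text == "\udc80")  # → True

# Encoding that string strictly as UTF-8 reproduces the reported failure.
try:
    text.encode("utf-8")
except UnicodeEncodeError as exc:
    message = str(exc)
    print(message)

# The same handler turns the surrogate back into the original byte.
print(text.encode("utf-8", errors="surrogateescape") == raw)  # → True
```

If this is the mechanism, a U+DC80 in the traceback would mean that a raw 0x80 byte was present somewhere in the input, which is consistent with text that is not actually UTF-8.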