UnicodeEncodeError when the input .md file is in Simplified Chinese #163

Lydiagugugaga · 2024-08-21T08:06:07Z

UnicodeEncodeError: 'utf-8' codec can't encode character '\udc80' in position 23: surrogates not allowed

I'm trying to convert an .md file whose content is in Simplified Chinese, but I'm running into encoding problems. I've read that the latest version mentions fixing #161, but I still can't get it to work on my end, so I'd like to ask what the best way to fix it is.

MartinPacker · 2024-08-21T15:40:48Z

Thank you for reporting this.

I'm away from my computer for the next couple of days, so I can't look at this right away.

Can you somehow get me a minimal reproducible example? I will also say that a workaround might be to use a hexadecimal entity reference. But that's probably not a scalable approach.

Lydiagugugaga · 2024-08-22T07:20:44Z

I simply ran: python md2pptx output.pptx < 222.md

MartinPacker · 2024-08-22T08:42:06Z

Thanks. I need the 222.md file, or a minimal version of it. (Please leave out any confidential data.)

Lydiagugugaga · 2024-08-22T08:55:01Z

Thanks for your reply. Here is just an example markdown file:

222.md

MartinPacker · 2024-08-23T11:02:50Z

Thanks for this. I note your attempt to use <p> paragraph tags. Those aren't supported by md2pptx - if I remember correctly. I would use asterisks * instead.

If you think paragraph tags should be supported - and have a clear idea as to how they should be rendered - please open another issue.

Lydiagugugaga · 2024-08-23T11:35:20Z

Thanks for your reply.
About the <p> paragraph tags: I also thought they were the problem at first, but I tried removing them and using plain Markdown instead, and it still doesn't work.

MartinPacker · 2024-08-23T12:23:11Z

Right. BBEdit (one of my editors of choice) thinks the file is UTF-8 but I suspect it isn't. Sniffing its actual encoding is an approach I might take.
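One rough way to sniff the encoding, sketched in Python with only the standard library. The function name sniff_encoding and the candidate list are illustrative assumptions, not part of md2pptx, and a successful decode only shows that the bytes are valid in that encoding, not that it is the one the author used.

```python
import codecs

# Hypothetical candidate list: UTF-8 first, then a common Simplified Chinese encoding.
CANDIDATES = ("utf-8", "gb18030")


def sniff_encoding(data: bytes, candidates=CANDIDATES):
    """Return the first encoding that decodes `data` strictly, or None."""
    # A byte-order mark is the strongest hint, so check for one first.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    for name in candidates:
        try:
            data.decode(name)  # strict decoding raises on any invalid byte
        except UnicodeDecodeError:
            continue
        return name
    return None


if __name__ == "__main__":
    import sys

    # Usage: python sniff.py 222.md
    with open(sys.argv[1], "rb") as handle:
        print(sniff_encoding(handle.read()))
```

If this reports something other than utf-8, re-saving the file as UTF-8 in an editor before running md2pptx would be the first thing to try.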

MartinPacker · 2024-08-23T12:30:12Z

This is strange: My run with your file yields this:

md2pptx Markdown To Powerpoint Converter 5.0.2+ 15 August, 2024
===============================================================

Open source project: https://github.com/MartinPacker/md2pptx

External Dependencies:

  Python: 3.9.6
  python-pptx: 0.6.23
  Pillow: 10.3.0
  CairoSVG: Not Installed
  graphviz: Not Installed

Internal Dependencies:

  funnel: 0.1
  runPython: 0.4

No slide to document metadata on. Continuing without it.

Slides:
=======

   1   初学者骑车之路：掌握自行车技巧的必备指南
   2   自行车基础知识
   3       自行车的组成部分
   4       自行车的类型和用途
   5   准备骑行前的注意事项
   6       自行车装备和保养
   7       骑行安全知识和规则
   8   学习骑行技巧
   9       自行车平衡和姿势
  10       踩踏和换挡技巧
  11       转弯和刹车技巧

MartinPacker · 2024-08-23T12:31:55Z

I'm suspecting your problem is with python-pptx or lxml, rather than md2pptx. But I keep an open mind about this.

Lydiagugugaga · 2024-08-23T12:49:42Z

Thank you so much for helping me with this question.
I've looked through some of the previously reported issues and also tried changing the python-pptx version, which is currently v0.6.23. But it didn't work.

If it's a problem with python-pptx or lxml, what do you suggest to fix it?

MartinPacker · 2024-08-23T12:57:33Z

I've just fixed a problem with numeric character references. So with the very latest push, a workaround for you might well be to use hexadecimal character references such as &#xdc80;. Fiddly, I know.
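To make that workaround less fiddly, the conversion could be scripted. This is a minimal sketch, assuming md2pptx accepts hexadecimal references in the HTML style &#xHHHH; (an assumption, not confirmed in this thread). The helper name to_char_refs is hypothetical.

```python
def to_char_refs(text: str) -> str:
    """Replace every non-ASCII character with a hexadecimal character reference.

    ASCII characters pass through unchanged, so Markdown syntax is preserved.
    Note: a lone surrogate such as U+DC80 would also be converted, but it does
    not name a real character, so the source encoding should be fixed first.
    """
    return "".join(ch if ord(ch) < 128 else f"&#x{ord(ch):X};" for ch in text)


if __name__ == "__main__":
    print(to_char_refs("# 中"))
```

The result is pure ASCII, so it no longer depends on how the file or standard input is decoded.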

Lydiagugugaga · 2024-08-23T13:12:03Z

Thank you very much.
I'll try it.

MartinPacker · 2024-08-23T13:16:59Z

Please let me know how you get on. And do you think the text is really UTF-16 rather than UTF-8? The U+DC80 character isn't valid in UTF-8, apparently.

(And I just pushed some doc changes after the one that fixes numeric character references - so don't get confused by what the latest commit says.)
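On the U+DC80 question: it is a lone low surrogate, which strict UTF-8 refuses to encode. One way such a character can appear, offered as a possibility rather than a diagnosis, is Python's surrogateescape error handler, which maps each undecodable byte 0xNN to U+DCNN. The sketch below reproduces the same error message from a stray 0x80 byte. The function name surrogate_demo and the sample bytes are illustrative.

```python
def surrogate_demo(raw: bytes):
    """Decode with surrogateescape, then report why strict UTF-8 encoding fails."""
    # Each byte that is invalid in UTF-8 is smuggled through as U+DC00 + byte.
    text = raw.decode("utf-8", errors="surrogateescape")
    try:
        text.encode("utf-8")  # strict encoding rejects lone surrogates
        reason = None
    except UnicodeEncodeError as exc:
        reason = exc.reason
    return text, reason


if __name__ == "__main__":
    text, reason = surrogate_demo(b"caf\x80")
    print(repr(text), reason)
    # Encoding with the same handler restores the original bytes.
    print(text.encode("utf-8", errors="surrogateescape"))
```

If this is what is happening, the input probably contains bytes that are not valid UTF-8 (for example text saved in a GBK-family encoding), which would point at the file's encoding rather than at UTF-16.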
