Feature request: CJK and more characters support #8

liyiheng · 2018-09-13T08:27:04Z

tree -L 1     
.
├── Cargo.lock
├── Cargo.toml
├── Chinese.md
├── Chinese.pdf
├── images
├── README.md
├── src
├── target
├── test.md
└── test.pdf

thread 'main' panicked at 'byte index 7 is not a char boundary; it is inside '─' (bytes 6..9) of ├── src ', libcore/str/mod.rs:2111:5

leroycep · 2018-09-13T21:49:22Z

Thanks for the issue! Could you provide the file that gives you this error? A test case would be helpful.

liyiheng · 2018-09-14T02:00:54Z

Chinese characters are rendered as ?? in the PDF. I think the panic is caused by the output of the tree command.

File contents:

中文
```
tree -L 1
.
├── Cargo.lock
├── Cargo.toml
├── Chinese.md
├── Chinese.pdf
├── images
├── README.md
├── src
├── target
├── test.md
└── test.pdf
```

leroycep · 2018-09-15T00:36:30Z

Ah, thanks. The error is located in src/sectioner.rs.

Relevant code:

Event::Text(ref text) if self.is_code => {
    let mut start = 0;
    for (pos, c) in text.chars().enumerate() {
        if c == '\n' {
            self.write(&text[start..pos]);
            self.new_line();
            start = pos + 1;
        }
    }
    if start < text.len() {
        self.write(&text[start..]);
    }
}

On line 3 of that snippet I call text.chars().enumerate(), which yields each character together with its character index. Then, on line 5, I use that character index as a byte position for slicing, which works for ASCII but not for multi-byte UTF-8 characters such as '─'.
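To make the mismatch concrete, here is a small sketch (the helper name `first_newline` is made up for illustration and is not part of the project) that compares the character index and the byte offset of the first newline in a string:

```rust
/// Returns (character index, byte offset) of the first '\n' in `text`, if any.
/// Hypothetical helper, for illustration only.
fn first_newline(text: &str) -> Option<(usize, usize)> {
    // Index counted in characters, which is what chars().enumerate() yields.
    let char_pos = text.chars().position(|c| c == '\n')?;
    // Index counted in bytes, which is what string slicing expects.
    let byte_pos = text.find('\n')?;
    Some((char_pos, byte_pos))
}
```

For `"├── src\n"` the two values differ, because each box-drawing character occupies three bytes in UTF-8, so slicing at the character index lands inside '─'. For a pure ASCII string such as `"abc\n"` they are equal, which is why the bug only shows up with non-ASCII input.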

I changed text.chars().enumerate() to text.char_indices(). That solves the panicking, but the characters are still rendered as question marks.
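For reference, a standalone sketch of the same loop using char_indices(). The function name `split_code_lines` is hypothetical, and the lines are collected into a Vec instead of being passed to self.write / self.new_line, so this is not the actual code in src/sectioner.rs:

```rust
/// Splits `text` on '\n' using the byte offsets from `char_indices`,
/// so a slice boundary never falls inside a multi-byte UTF-8 character.
/// Hypothetical standalone version of the loop, for illustration only.
fn split_code_lines(text: &str) -> Vec<&str> {
    let mut lines = Vec::new();
    let mut start = 0;
    for (pos, c) in text.char_indices() {
        if c == '\n' {
            lines.push(&text[start..pos]);
            // '\n' is a single byte in UTF-8, so the next line starts at pos + 1.
            start = pos + 1;
        }
    }
    // Keep any trailing text that is not terminated by a newline.
    if start < text.len() {
        lines.push(&text[start..]);
    }
    lines
}
```

Calling it on the tree output from the report no longer panics, since `pos` is now a byte offset that always sits on a character boundary.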

fschutt · 2018-09-20T06:06:49Z

Can you select the text in the PDF and copy the original characters out? If yes, that means the font simply can't display the characters (or is encoded badly). Does the font you are embedding support CJK? I've always used http://bluejamesbond.github.io/CharacterMap/ for debugging font-related issues.

You'll probably need to do some kind of font-selection-based-on-character-plane, i.e. if CJK characters are detected, then embed Roboto-CJK, otherwise, use Roboto-Medium.
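A rough sketch of what that selection could look like, assuming a simple check against a few common CJK Unicode blocks. The function names are made up, the block list is not exhaustive, and the font names are just the ones mentioned above used as placeholders, not verified font files:

```rust
/// Rough check for a handful of common CJK blocks (not exhaustive).
/// Hypothetical helper, for illustration only.
fn is_cjk(c: char) -> bool {
    matches!(
        c as u32,
        0x3000..=0x303F       // CJK Symbols and Punctuation
        | 0x3040..=0x30FF     // Hiragana and Katakana
        | 0x3400..=0x4DBF     // CJK Unified Ideographs Extension A
        | 0x4E00..=0x9FFF     // CJK Unified Ideographs
        | 0xAC00..=0xD7AF     // Hangul Syllables
        | 0xF900..=0xFAFF     // CJK Compatibility Ideographs
        | 0x20000..=0x2A6DF   // CJK Unified Ideographs Extension B
    )
}

/// Picks a font name for a run of text: the CJK font if any CJK
/// character is present, otherwise the default font.
fn pick_font(text: &str) -> &'static str {
    if text.chars().any(is_cjk) {
        "Roboto-CJK" // placeholder name taken from the comment above
    } else {
        "Roboto-Medium"
    }
}
```

With this check, `pick_font("中文")` selects the CJK font, while the box-drawing characters in the tree output fall outside the listed blocks and keep the default font, so they would still need a default font that contains those glyphs.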
