Utf-8 decode fails on chunk if character is split #78

Open
Jolbas opened this issue Apr 3, 2021 · 0 comments

bufs[fd] += os.read(fd, 4096).decode('UTF-8')

A multi-byte Unicode character happened to start at byte position 4095 of the chunk, so it was split across two reads and the UTF-8 decode of the first chunk failed:

    bufs[fd] += os.read(fd, 4096).decode('UTF-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc3 in position 4095: unexpected end of data

I'm not sure about this solution but it seems to work:

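        if sys.version_info < (3, 0):
            for fd in fds:
                bufs[fd] += os.read(fd, 4096)
        else:
            for fd in fds:
                b = os.read(fd, 4096)
                # A UTF-8 character is at most 4 bytes long, so at most 3
                # extra bytes are needed to complete one split at the end.
                for i in range(4):
                    try:
                        bufs[fd] += b.decode('UTF-8')
                        break
                    except UnicodeDecodeError:
                        if i < 3:
                            b += os.read(fd, 1)
                        else:
                            # Still failing after 3 extra bytes: real error.
                            raise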
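
An alternative that avoids the retry loop might be the standard library's incremental decoder, which holds back an incomplete trailing sequence until the next chunk arrives. The following is only a sketch for the Python 3 branch, not tested against this project: `make_decoder`, `read_text` and `demo` are hypothetical names, and the pipe stands in for the real file descriptors in `fds`. In the actual loop one decoder per fd would presumably be kept next to `bufs`.

```python
import codecs
import os


def make_decoder():
    # Hypothetical helper: one stateful decoder is needed per file descriptor.
    return codecs.getincrementaldecoder('UTF-8')()


def read_text(fd, decoder, size=4096):
    """Read one chunk from fd; return (decoded_text, reached_eof)."""
    chunk = os.read(fd, size)
    eof = (chunk == b'')
    # With final=False the decoder keeps an incomplete trailing sequence
    # and prepends it to the next chunk. final=True at EOF makes a truly
    # truncated sequence raise UnicodeDecodeError instead of being dropped.
    return decoder.decode(chunk, final=eof), eof


def demo():
    # Reproduce the report: byte 0xc3 (first byte of 'ö') lands at position 4095.
    r, w = os.pipe()
    os.write(w, ('a' * 4095 + 'ö').encode('UTF-8'))
    os.close(w)

    decoder = make_decoder()
    buf = ''
    while True:
        text, eof = read_text(r, decoder)
        buf += text
        if eof:
            break
    os.close(r)
    return buf
```

Calling `demo()` reads the data in two chunks (4096 bytes, then 1 byte) and should return the original string without raising. Unlike reading extra single bytes, this never issues an additional `os.read` that could block when the rest of the character has not been written yet.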