Content-Transfer-Encoding badly interpreted when charset in Content-Type is between quote #1522

Open
guijemont opened this issue Jun 4, 2020 · 17 comments

Comments

@guijemont


Describe the bug
I have a bunch of emails with headers that contain:

Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable 

In all the examples I could find, several other headers sat between these two. For all these emails, non-ASCII characters are displayed incorrectly. The debug log shows that the Content-Transfer-Encoding is misinterpreted for these messages:

DEBUG:utils:Content-Transfer-Encoding: "8bit"
DEBUG:utils:assuming Content-Transfer-Encoding: 8bit
DEBUG:utils:command: more /tmp/pb54ils2
DEBUG:utils:parms: ('text/plain=', 'charset=UTF-8')

Worth noting: I have found another quoted-printable email that is displayed correctly, with headers that look like:

Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

Note that unlike in the previous case, the charset is not between quotes.

In that case, non-ascii characters are displayed properly, and the debug log yields:

DEBUG:utils:Content-Transfer-Encoding: "quoted-printable"
DEBUG:utils:assuming Content-Transfer-Encoding: quoted-printable

Software Versions

  • Python version: 3.8.2-0ubuntu2
  • Notmuch version: 0.29.3-1ubuntu2
  • Alot version: 2348014

To Reproduce
Steps to reproduce the behaviour:

  1. open a correctly formed email, with headers set as above, that contains non-ASCII characters (ones that would be transformed by quoted-printable)
  2. look at accented characters, see that they are garbled

Error Log
See description of the bug.

@pazz
Owner

pazz commented Jun 4, 2020

Thanks for reporting this.
Would you mind sending a few problematic anonymized mails our way? Ideally in the form of a PR that adds (failing) unit tests.
Email etiquette is unfortunately not very rigorous when it comes to encoding issues and fiddly header syntax. Alot mostly uses Python's email module in order to stay standards compliant, but of course, lots of malformed mails make the rounds.

@guijemont
Author

Would you mind sending a few problematic anonymized mails our way? Ideally in the form of a PR that adds (failing) unit tests.

I'll see if I find time to do that over the week-end.

@guijemont
Author

OK, I experimented some more, and it turns out that my minimal test is already in the tree: it is https://github.com/pazz/alot/blob/master/tests/static/mail/utf8.eml
Here are a few screencaps of how it looks for me after notmuch insert:

Default view:
[screenshot]

With togglesource (note how alot seems to have transformed the email to quoted-printable, though the original is not):
[screenshot]

With toggleheaders (here the headers match the original):
[screenshot]

Finally, this is how notmuch sees it:

$ notmuch show --format=raw tag:utf8test
From: lucc@github
To: tests@alot
Subject: plain utf8 8bit message
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit

Liebe Grüße!

@guijemont
Author

A couple additional notes on tests:

  • If I remove the mock decorator from test_simple_utf8_file(), the test fails, which matches my experience in alot. I do not have a ~/.mailcap and /etc/mailcap is unmodified from ubuntu 20.04
  • If I revert b1c93c4 then the test passes again.

That leads me to think that either:

  • there is an issue with my system, though it's a fresh ubuntu using the docker ubuntu:latest image, with the only additions being the dependencies for alot, afew (from pip) and its dependencies and vim, offlineimap and opensmtpd, so I would expect my system to be fairly "vanilla" and among what should be supported by alot (though I'd be happy to get suggestions on workaround involving modifications to my system); or
  • there is a problem with the commit linked above, which creates this bug, and I think that adding the @mock.patch() is just hiding the bug.
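
To illustrate the second possibility, here is a purely hypothetical sketch of how a patch like this can mask a code path. `Renderer`, `lookup_handler` and `render` are made-up names, not alot's API, and the bug in the handler branch is planted on purpose. The point is only that a test which always patches the lookup never exercises the branch taken on a system that has a text/plain mailcap entry.

```python
from unittest import mock


class Renderer:
    """Toy stand-in for code that consults mailcap before rendering."""

    def lookup_handler(self, mimetype):
        # Pretend the system mailcap has an entry for this type.
        return "more %s"

    def render(self, body_bytes, charset):
        if self.lookup_handler("text/plain") is None:
            # No external handler: decode the bytes with the right charset.
            return body_bytes.decode(charset)
        # Handler branch with a deliberately planted bug (wrong charset).
        return body_bytes.decode("latin-1")


body = "Liebe Grüße!".encode("utf-8")
renderer = Renderer()

# Unpatched: the buggy handler branch runs and the text comes out garbled.
unpatched = renderer.render(body, "utf-8")

# Patched: the lookup is forced to None, so the bug stays hidden.
with mock.patch.object(Renderer, "lookup_handler", return_value=None):
    patched = renderer.render(body, "utf-8")

print(repr(unpatched), repr(patched))
```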

@jonassmedegaard

I experience what seems like the same issue:

Using alot 0.9 (or more accurately 0.9-2 from Debian unstable) emails behaved correctly, but with 0.9.1 (or git snapshot of 0.9.1 from Debian packaging git) I get tofu characters for non-ascii characters.

Example header which fails to render correctly contains this:

Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="iso-8859-1"; Format="flowed"

So this issue does not seem to be caused by a broken local configuration on @guijemont's side, and it also occurs when the source email is not UTF-8 encoded.

@jonassmedegaard

...and for me, reverting b1c93c4 also makes the issue go away.

@pazz
Owner

pazz commented Jun 9, 2020

@ryneeverett can you comment on this?

@ryneeverett
Contributor

ryneeverett commented Jun 13, 2020

If I remove the mock decorator from test_simple_utf8_file(), the test fails, which matches my experience in alot. I do not have a ~/.mailcap and /etc/mailcap is unmodified from ubuntu 20.04

I cannot reproduce this and wonder if there isn't a mailcap elsewhere on your system. See https://docs.python.org/3/library/mailcap.html#mailcap.getcaps. Can you confirm the following @guijemont?

$ python
>>> import mailcap
>>> mailcap.getcaps()
{}
>>>

ryneeverett added a commit to ryneeverett/alot that referenced this issue Jun 14, 2020
Without this change, temporary file in db.utils.render_part reads
"Liebe Grüße!".

This seems to essentially revert 777823f,
the reasoning of which I don't yet follow.

This may resolve the issue in pazz#1522.
ryneeverett added a commit to ryneeverett/alot that referenced this issue Jun 14, 2020
This seems to essentially revert 777823f,
the reasoning of which I don't yet follow.

This may resolve the issue in pazz#1522.
@ryneeverett
Contributor

See #1526. This fixes the issues I see with utf8 and a mailcap entry, but I'm not convinced this is the same issue you're facing.

A closer review of the code convinces me further that you have some entry for text/plain in a mailcap, because b1c93c4 should not have changed the behavior at all if you do not.

@pazz
Owner

pazz commented Jun 14, 2020

I can confirm that I see the same issue as @guijemont reports, displaying the message in tests/mail/utf8.eml in alot. Also, #1526 fixes that issue for me, yes.

I do not have text/plain in my user mailcap but in the (debian testing) system-wide mailcap there are quite a few such entries:

 grep "^text/plain" /etc/mailcap 
text/plain; less '%s'; needsterminal
text/plain; more %s; needsterminal
text/plain; env ATOM_DISABLE_SHELLING_OUT_FOR_ENVIRONMENT=false /usr/bin/atom %s; test=test -n "$DISPLAY"
text/plain; /usr/share/code/code --no-sandbox --new-window %s; test=test -n "$DISPLAY"
text/plain; /usr/bin/emacs -nw %s; needsterminal
text/plain; /usr/bin/emacs %s; test=test -n "$DISPLAY"
text/plain; gvim -f %s; test=test -n "$DISPLAY"
text/plain; nvim %s; needsterminal
text/plain; okular %s; test=test -n "$DISPLAY"
text/plain; gedit --new-document %s; test=test -n "$DISPLAY"
text/plain; vim %s; needsterminal
text/plain; view %s; edit=vim %s; compose=vim %s; test=test -x /usr/bin/vim; needsterminal
text/plain; gview -f %s; edit=gvim -f %s; compose=gvim -f %s; test=test "$DISPLAY" != ""
text/plain; view %s; edit=vi %s; compose=vi %s; needsterminal

Accordingly, mailcap.getcaps() is not empty.
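
A quick way to check for such entries without relying on the `mailcap` module (deprecated since Python 3.11 and removed in 3.13) is to scan the file directly. The helper below is a simplified sketch: `text_plain_entries` is a hypothetical name, and the parsing ignores line continuations and escaped semicolons.

```python
def text_plain_entries(mailcap_text):
    """Return the view commands of all text/plain lines in mailcap text."""
    commands = []
    for line in mailcap_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        # Fields are separated by ';': type, view command, then options.
        fields = [field.strip() for field in line.split(";")]
        if len(fields) >= 2 and fields[0].lower() == "text/plain":
            commands.append(fields[1])
    return commands


# Two entries quoted above plus one made-up non-text line.
SAMPLE = """\
text/plain; more %s; needsterminal
text/plain; vim %s; needsterminal
image/png; display %s
"""

print(text_plain_entries(SAMPLE))
```

To run it against a real system, pass in the contents of /etc/mailcap (and ~/.mailcap, if present) instead of `SAMPLE`.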

@pazz pazz closed this as completed Jun 14, 2020
@guijemont
Author

FWIW, I have the following text/plain lines in my /etc/mailcap (unmodified by me, fresh ubuntu docker image):

text/plain; more %s; needsterminal
text/plain; vim %s; needsterminal
text/plain; view %s; edit=vim %s; compose=vim %s; test=test -x /usr/bin/vim; needsterminal
text/plain; view %s; edit=vi %s; compose=vi %s; needsterminal

Will try #1526.

Also, I am a bit confused, not knowing the usual workflow on this project: why is this issue closed if the pull request is not merged yet?

@pazz
Owner

pazz commented Jun 14, 2020

Closing this was an accident, sorry.

@pazz pazz reopened this Jun 14, 2020
pazz pushed a commit that referenced this issue Jun 15, 2020
This seems to essentially revert 777823f,
the reasoning of which I don't yet follow.

This may resolve the issue in #1522.
@pazz
Owner

pazz commented Jun 15, 2020

I have an email that may or may not be related:

From: pazz@github
To: tests@alot
Subject: iso-8859-1 quoted-printable
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0

Viele Gr=FC=DFe!

This one never gets displayed correctly, neither with nor without the quotes, and not by notmuch as you show above. Is this message simply broken?

The strange thing is that notmuch does find it when I search for "grüße":

notmuch show --format=raw  to:tests@alot from:pazz grüße
From: pazz@github
To: tests@alot
Subject: iso-8859-1 quoted-printable
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0

Viele Gr=FC=DFe!
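
As a sanity check on whether the message itself is broken, here is a small sketch that feeds the same bytes to Python's `email` module. It assumes the file on disk looks exactly like the message quoted above. Under that assumption the body decodes cleanly, which would suggest the message is well formed.

```python
import email

RAW = (
    b"From: pazz@github\n"
    b"To: tests@alot\n"
    b"Subject: iso-8859-1 quoted-printable\n"
    b'Content-Type: text/plain; charset="iso-8859-1"\n'
    b"Content-Transfer-Encoding: quoted-printable\n"
    b"MIME-Version: 1.0\n"
    b"\n"
    b"Viele Gr=FC=DFe!\n"
)

msg = email.message_from_bytes(RAW)
# decode=True undoes the quoted-printable transfer encoding -> raw bytes.
payload = msg.get_payload(decode=True)
# The charset parameter then says how to turn those bytes into text.
text = payload.decode(msg.get_content_charset())
print(text)
```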

@guijemont
Author

Just updated to latest master with #1526 merged. It does seem to fix the main issue for me (non-ascii utf-8 characters are displayed correctly), though togglesource still shows me inaccurate information. E.g. for https://github.com/pazz/alot/blob/master/tests/static/mail/utf8.eml it shows me:

lucc@github (Jan 1970)
From: lucc@github
To: tests@alot
Subject: plain utf8 8bit message
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

Liebe Gr=C3=BC=C3=9Fe!

@pazz
Owner

pazz commented Jun 15, 2020 via email

@guijemont
Author

But this is to be expected: togglesource will result in alot displaying the email's source text verbatim, including not yet decoded quoted-printables.

In this case, the problem is that what is displayed is precisely not the verbatim source (which is not quoted-printable), which I am pasting here for completeness:

From: lucc@github
To: tests@alot
Subject: plain utf8 8bit message
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit

Liebe Grüße!

Notice the difference in the Content-Transfer-Encoding and in how the body is encoded.
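
The two bodies do carry the same text, just in different transfer encodings. A short sketch with the standard `quopri` module makes the relationship explicit:

```python
import quopri

shown_by_togglesource = b"Liebe Gr=C3=BC=C3=9Fe!"
on_disk = "Liebe Grüße!".encode("utf-8")  # the 8bit body, as UTF-8 bytes

# Undo the quoted-printable escapes; the result is plain UTF-8 bytes.
decoded = quopri.decodestring(shown_by_togglesource)
print(decoded == on_disk)
print(decoded.decode("utf-8"))
```

So nothing is lost, but what togglesource shows is a re-encoded form rather than the bytes in the file.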

@pazz
Owner

pazz commented Jun 16, 2020

@guijemont you are right, this is weird.
I've dug into it and it seems that the email module changes the source when representing the message as string:

>>> m=ui.current_buffer.get_selected_message()
>>> e=m.get_email()
>>>
>>> # This is what alot shows (see widgets.thread.MessageTree)
>>> str(e)
'From: lucc@github\r\nTo: tests@alot\r\nSubject: plain utf8 8bit message\r\nMIME-Version: 1.0\r\nContent-Type: text/plain; charset="utf-8"\r\nContent-Transfer-Encoding: quoted-printable\r\n\r\nLiebe Gr=C3=BC=C3=9Fe!\r\n'
>>>
>>> # This is the content of the file.
>>> open(m.get_filename()).read()
'From: lucc@github\nTo: tests@alot\nSubject: plain utf8 8bit message\nMIME-Version: 1.0\nContent-Type: text/plain; charset="UTF-8"\nContent-Transfer-Encoding: 8bit\n\nLiebe Grüße!\n'

So I suggest we change https://github.com/pazz/alot/blob/master/alot/widgets/thread.py#L260
so that it reads the source text from disk instead.
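
A rough sketch of the difference, outside of alot: `read_source` is a hypothetical helper, the temporary file stands in for the path that `m.get_filename()` would return, and parsing with `email.policy.SMTP` is an assumption on my part. The exact form of the re-serialized message may vary with Python version and charset settings, so the sketch only checks that it differs from the file.

```python
import email
import email.policy
import os
import tempfile

RAW = (
    "From: lucc@github\n"
    "To: tests@alot\n"
    "Subject: plain utf8 8bit message\n"
    "MIME-Version: 1.0\n"
    'Content-Type: text/plain; charset="UTF-8"\n'
    "Content-Transfer-Encoding: 8bit\n"
    "\n"
    "Liebe Grüße!\n"
).encode("utf-8")


def read_source(path):
    """Return the file content verbatim, replacing undecodable bytes."""
    with open(path, "rb") as handle:
        return handle.read().decode("utf-8", errors="replace")


# Write the sample message to a temporary file to stand in for the mail file.
with tempfile.NamedTemporaryFile(suffix=".eml", delete=False) as tmp:
    tmp.write(RAW)
path = tmp.name

try:
    with open(path, "rb") as handle:
        msg = email.message_from_binary_file(handle, policy=email.policy.SMTP)
    reserialized = str(msg)       # generated by the email module
    verbatim = read_source(path)  # what is actually on disk
finally:
    os.unlink(path)

print(reserialized == verbatim)
```

Decoding with `errors="replace"` is a design choice for display purposes: a source view should never fail on a malformed mail, even if a few bytes end up shown as replacement characters.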

GuillaumeSeren pushed a commit to GuillaumeSeren/alot that referenced this issue Oct 3, 2021
This seems to essentially revert 777823f,
the reasoning of which I don't yet follow.

This may resolve the issue in pazz#1522.