Lots of headers that can't be parsed #34

Open
Ndolam opened this issue Feb 28, 2023 · 6 comments

Ndolam commented Feb 28, 2023

I just downloaded the .zip file and compiled mairix. When I run it (and this is the same as V0.24), I get many complaints about headers that can't be parsed. For example:

Header 'content-type: image/*; name="20221017_130844_resized.jpg"' in [89420989,90670144) could not be parsed

I'm not a mail wizard, but that looks OK to me.

A more lengthy example:

Header 'content-disposition: inline; filename="image004.png"; size=79197; creation-date=Fri, 06 May 2022 16:51:48 GMT; modification-date=Fri, 06 May 2022 20:09:01 GMT' in [28093802,28267769) could not be parsed

Q1: Is it just me, or is this happening to other people?

Q2: Are these complaints valid, or are they spurious?

Thanks.

@edgewood

They happen to me. My "get mail" script, which calls mairix to index after getting new email, pipes mairix output through:
egrep -v '(could not.*parse|Can.t (find|process).*boundary|mtime failed)'
According to git, I added that line to my script in June 2013.

It always seems to be on MIME headers, which I never want to search, so I just ignore the errors.
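
For anyone who wants the same filtering without egrep, here is a minimal Python sketch of that filter. The regular expression is copied from the egrep command above; the function name `drop_known_noise` and the second sample line are only illustrative and are not part of mairix.

```python
import re

# Same pattern as the egrep -v filter quoted above.
NOISE = re.compile(r"(could not.*parse|Can.t (find|process).*boundary|mtime failed)")


def drop_known_noise(lines):
    """Return only the lines that do NOT match the noise pattern (like egrep -v)."""
    return [line for line in lines if not NOISE.search(line)]


sample = [
    # Complaint quoted from the first comment in this thread.
    "Header 'content-type: image/*; name=\"20221017_130844_resized.jpg\"' "
    "in [89420989,90670144) could not be parsed",
    # Hypothetical placeholder line, not real mairix output.
    "some unrelated line",
]

for line in drop_known_noise(sample):
    print(line)
```

Like the egrep version, this hides every message that matches, so a genuinely new kind of parse failure would also be silenced.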

Ndolam commented Feb 28, 2023

Thanks for the response.
I guess I always get concerned about programs not handling (what I assume is) perfectly valid input.
But your way of dealing with it has pragmatic appeal.

edgewood commented Mar 8, 2023 via email

Ndolam commented Mar 8, 2023

I decided to take a look at the code as well.

During my very quick look, I see that one of the headers it complains about contains
Content-type: image/*; name="..."
and (for what it is worth) the answer in
https://stackoverflow.com/questions/27790669/is-the-contenttype-image-valid
claims that image/* is not valid. I suppose mairix is right to complain about this.
(I tried changing the line to "... image/jpeg; ..." and mairix is happy with it.)

I'll try another example and see if anything else illuminating pops up.
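
To make the image/* point concrete, here is a small Python sketch that checks a type/subtype pair against the restricted-name grammar of RFC 6838 (the media type registration rules), under which `*` is not an allowed character. Treating that grammar as the relevant rule is an assumption: the check is not taken from mairix's code, and the looser token grammar in RFC 2045 would not reject `*` on syntax alone. The names `RESTRICTED_NAME` and `looks_like_registered_media_type` are made up for illustration.

```python
import re

# RFC 6838 section 4.2: a restricted-name starts with a letter or digit and
# continues with up to 126 characters from a small allowed set ("*" is not in it).
RESTRICTED_NAME = r"[A-Za-z0-9][A-Za-z0-9!#$&\-^_.+]{0,126}"
MEDIA_TYPE = re.compile(rf"{RESTRICTED_NAME}/{RESTRICTED_NAME}")


def looks_like_registered_media_type(value: str) -> bool:
    """Return True if value is 'type/subtype' with both parts valid restricted-names."""
    return MEDIA_TYPE.fullmatch(value) is not None


for candidate in ("image/*", "image/jpeg"):
    print(candidate, looks_like_registered_media_type(candidate))
```

Under this rule the wildcard form fails and image/jpeg passes, which matches what I saw when I edited the header by hand.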

Ndolam commented Mar 8, 2023

Another complaint from mairix arises because some mailers send out lines like
creation-date=Thu, 09 Feb 2023 18:33:40 GMT
and mairix wants double quotes around the date.

I took a very quick look but didn't find out whether the quotes are required or not. (In this case the entire header group is

Content-Type: image/jpeg;
	name="image002.jpg"
Content-Description: image002.jpg
Content-Disposition: inline;
	filename="image002.jpg";
	creation-date=Thu, 09 Feb 2023 18:33:40 GMT
Content-ID: <[email protected]>
Content-Transfer-Encoding: base64

and it occurs to me that the rules could be different for multi-line headers as opposed to single-line headers.)

Anyone reading this know?
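
As far as I can tell from the RFCs, the quotes are required: RFC 2045 defines a parameter value as either a token or a quoted-string, a token cannot contain spaces, commas, or colons, and RFC 2183 defines creation-date with a quoted date-time. Folded (multi-line) headers are unfolded before parsing, so the rule should be the same for both forms. The Python sketch below illustrates this with a simplified parameter checker. It is not mairix's NFA, and `unfold` and `params_are_valid` are hypothetical helper names.

```python
import re

# RFC 2045 token: printable ASCII except space and the "tspecials".
TOKEN = r"[!#$%&'*+\-.0-9A-Z^_`a-z{|}~]+"
# Quoted-string: double quotes around text, with backslash escapes allowed.
QUOTED = r'"(?:[^"\\\r\n]|\\.)*"'
# One parameter: ";" name "=" (token | quoted-string), with optional whitespace.
PARAM = re.compile(rf"\s*;\s*{TOKEN}\s*=\s*(?:{TOKEN}|{QUOTED})\s*")


def unfold(raw: str) -> str:
    """Undo header folding: drop each line break that is followed by whitespace."""
    return re.sub(r"\r?\n(?=[ \t])", "", raw)


def params_are_valid(value: str) -> bool:
    """Return True if everything from the first ';' onward is a run of valid parameters."""
    _, sep, rest = value.partition(";")
    rest = sep + rest
    pos = 0
    while pos < len(rest):
        match = PARAM.match(rest, pos)
        if match is None:
            return False
        pos = match.end()
    return True


# Header value from this thread, folded as in the original message.
folded = 'inline;\n\tfilename="image002.jpg";\n\tcreation-date=Thu, 09 Feb 2023 18:33:40 GMT'
# Same value with the date quoted.
quoted = 'inline; filename="image002.jpg"; creation-date="Thu, 09 Feb 2023 18:33:40 GMT"'

print(params_are_valid(unfold(folded)))
print(params_are_valid(quoted))
```

The unquoted date stops being a token at the first comma, so the rest of the value cannot be read as further parameters. Quoting the date makes the whole value parse, whether or not the header was folded.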

Ndolam commented Mar 9, 2023

In any case, to allow things like creation dates with unquoted strings I'd guess the NFA definition in nvp.nfa would have to be modified, and that might be a job best suited to either
(a) the original NFA author, or
(b) someone who loves playing with NFAs.
