This repository has been archived by the owner on Aug 10, 2024. It is now read-only.

No content parsed if an Atom entry contains <summary> XHTML tag in <content> #72

Open
handlerug opened this issue Sep 20, 2022 · 9 comments

Comments

@handlerug

<summary> tags are most often present in <details> tags. If such a tag is present in an Atom entry and the content is formatted as XHTML, the entry shows up empty in NetNewsWire.

Works:

<feed xmlns="http://www.w3.org/2005/Atom">
	<id>79EED2EA-0E1E-4DEF-8772-B4BA7EF6429D</id>
	<title>Test feed</title>
	<updated>2022-07-15T21:49:06-07:00</updated>
	<entry>
		<id>72D83467-20FE-4BA5-9508-2B16660BADDC</id>
		<title>Test entry new title</title>
		<updated>2022-07-15T21:49:06-07:00</updated>
		<content type="xhtml">
			<div xmlns="http://www.w3.org/1999/xhtml">Test</div>
		</content>
	</entry>
</feed>

Fails:

<feed xmlns="http://www.w3.org/2005/Atom">
	<id>79EED2EA-0E1E-4DEF-8772-B4BA7EF6429D</id>
	<title>Test feed</title>
	<updated>2022-07-15T21:49:06-07:00</updated>
	<entry>
		<id>72D83467-20FE-4BA5-9508-2B16660BADDC</id>
		<title>Test entry new title</title>
		<updated>2022-07-15T21:49:06-07:00</updated>
		<content type="xhtml">
			<div xmlns="http://www.w3.org/1999/xhtml">Test<summary /></div>
		</content>
	</entry>
</feed>

Works:

<feed xmlns="http://www.w3.org/2005/Atom">
	<id>79EED2EA-0E1E-4DEF-8772-B4BA7EF6429D</id>
	<title>Test feed</title>
	<updated>2022-07-15T21:49:06-07:00</updated>
	<entry>
		<id>72D83467-20FE-4BA5-9508-2B16660BADDC</id>
		<title>Test entry new title</title>
		<updated>2022-07-15T21:49:06-07:00</updated>
		<content type="html">
			Test&lt;summary /&gt;
		</content>
	</entry>
</feed>

My gut feeling is that the Atom feed parser still looks for <summary> tags when parsing <content type="xhtml">, even when it's already inside a <content> tag.

@vincode-io
Member

Could you please point out in the Atom spec where it specifies that the summary element is allowed in content? I can't find anything.

https://www.rfc-editor.org/rfc/rfc4287.html#section-4.1.3

@handlerug
Author

It doesn't explicitly disallow a <summary> tag from appearing inside content as part of XHTML markup (not with the meaning of atom:summary), so it seems allowed to me. Otherwise I don't see a way of using the tag inside <content type="xhtml">. Please correct me if I'm wrong.

@vincode-io
Member

You can put it in there, but if the spec doesn't say it should be there, you can't expect parsers to pick it up. In other words, if you are producing an Atom feed, you cannot put summary inside content and expect it to work.

@vincode-io closed this as not planned on Sep 20, 2022
@handlerug
Author

Aren't other elements like p, b, h1, etc. allowed? I don't see them explicitly allowed in the spec either. If I understand correctly, xhtmlDiv already implies the parser must be comfortable parsing XHTML or discarding the surrounding markup per section 6.3. For example, Miniflux doesn't have this issue and parses any valid XHTML markup inside content (including any elements which happen to have names also defined by the Atom spec, like summary) correctly.

What I meant in my issue description is that checks here perhaps should be enhanced to record the starting element when parsingXHTML was set to YES, or keep track of opening-closing elements some way, because I don't think the assumption that an ending summary element may not appear in XHTML is valid. (I'm not exactly sure that they're the checks which need to be modified.)
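
The "keep track of opening-closing elements" idea can be illustrated with a minimal Python sketch. This is not the NetNewsWire parser code: the class name `EntryHandler` and its attributes are hypothetical, and for brevity it matches unprefixed element names only. Once a `<content type="xhtml">` element starts, a depth counter absorbs every nested start and end tag, so a nested `summary` never reaches the Atom-level checks.

```python
import xml.sax

FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>Test entry new title</title>
    <content type="xhtml">
      <div xmlns="http://www.w3.org/1999/xhtml">Test<summary /></div>
    </content>
  </entry>
</feed>"""


class EntryHandler(xml.sax.ContentHandler):
    """Hypothetical handler: tracks nesting depth inside XHTML content."""

    def __init__(self):
        super().__init__()
        self.xhtml_depth = 0          # 0 means "not inside <content type="xhtml">"
        self.xhtml_text = []          # character data collected from the content
        self.atom_summary_seen = False

    def startElement(self, name, attrs):
        if self.xhtml_depth > 0:
            # Already inside XHTML content: count the element, ignore its name.
            self.xhtml_depth += 1
            return
        if name == "content" and attrs.get("type") == "xhtml":
            self.xhtml_depth = 1
        elif name == "summary":
            # Only reached outside XHTML content, so this is atom:summary.
            self.atom_summary_seen = True

    def endElement(self, name):
        if self.xhtml_depth > 0:
            # Depth returns to 0 exactly when the opening <content> is closed.
            self.xhtml_depth -= 1

    def characters(self, data):
        if self.xhtml_depth > 0:
            self.xhtml_text.append(data)


handler = EntryHandler()
xml.sax.parseString(FEED.encode("utf-8"), handler)
content_text = "".join(handler.xhtml_text).strip()
print(content_text, handler.atom_summary_seen)
```

The cost is one integer comparison per element, which may matter given the concern about running extra code on every Atom feed.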

@brentsimmons
Collaborator

p, b, h1, etc. are (X)HTML tags, while summary is not.

@meissnem

Not to split hairs, but <summary> is indeed an HTML tag.

And doesn't the namespace declared on <div xmlns="http://www.w3.org/1999/xhtml"> propagate to child tags, so that a nested <summary> should be ignored by the Atom parser?
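
As a quick check of the namespace point, here is a small sketch using Python's standard `xml.etree.ElementTree` (chosen only for illustration; it says nothing about how NetNewsWire parses feeds). It parses the failing feed from the issue description and prints the namespace-qualified tag names.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
XHTML_NS = "http://www.w3.org/1999/xhtml"

FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>Test entry new title</title>
    <content type="xhtml">
      <div xmlns="http://www.w3.org/1999/xhtml">Test<summary /></div>
    </content>
  </entry>
</feed>"""

root = ET.fromstring(FEED)
content = root.find(f"{{{ATOM_NS}}}entry/{{{ATOM_NS}}}content")
div = content[0]        # the XHTML wrapper div
summary = div[0]        # the nested <summary /> element

# ElementTree reports tags as "{namespace-uri}localname".
print(div.tag)
print(summary.tag)

# Searching for atom:summary anywhere in the feed finds nothing,
# because the nested element inherited the XHTML default namespace.
atom_summaries = root.findall(f".//{{{ATOM_NS}}}summary")
print(len(atom_summaries))
```

A namespace-aware parser therefore sees the nested element as `summary` in the XHTML namespace, not as `atom:summary`.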

@brentsimmons
Collaborator

Good point! But it’s not an XHTML tag — it came in with HTML 5.1, it looks like.

I want to avoid adding code to the Atom parser for the sake of a single feed, because this means running that code for everyone on all Atom feeds.

Question: what are you trying to do with summary inside content? Use it as HTML or as an Atom summary?

@handlerug
Author

It happens with a feed that has articles which use the <details> tag. It's often paired with the <summary> tag to customize the text that appears near the drop-down arrow. So, as HTML (HTML5). I'd imagine there are other feeds that do this, too.

@brentsimmons
Collaborator

Reopening this.

My take is that inside <content type="xhtml"> the summary tag is not valid, because it’s not XHTML. (It’s HTML 5.1.)

However, it’s perfectly understandable that a website with an Atom feed would just put the article content inside the content tag, same as it’s done for years, and not one person would ever think about the fact that the site is now using some version of HTML newer than XHTML. Nobody cares! (Quite rightly.)

This means we have to add code to the relevant part of the Atom parser to parse a summary tag as HTML:summary instead of as Atom:summary when it’s inside a <content type="xhtml">.
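
One way to picture that distinction is the rough Python sketch below. It is an assumption-laden illustration, not the planned NetNewsWire change: it decides by element namespace rather than by tracking the `type` attribute, the class name `NamespaceAwareHandler` is hypothetical, and XHTML attributes are dropped when the markup is rebuilt.

```python
import io
import xml.sax
from xml.sax.handler import feature_namespaces
from xml.sax.saxutils import escape

ATOM_NS = "http://www.w3.org/2005/Atom"
XHTML_NS = "http://www.w3.org/1999/xhtml"

FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <summary>Short description</summary>
    <content type="xhtml">
      <div xmlns="http://www.w3.org/1999/xhtml"><details><summary>More</summary>Hidden text</details></div>
    </content>
  </entry>
</feed>"""


class NamespaceAwareHandler(xml.sax.ContentHandler):
    """Hypothetical handler: routes summary by namespace."""

    def __init__(self):
        super().__init__()
        self.html_parts = []           # rebuilt markup for the entry content
        self.atom_summary_parts = []   # text of atom:summary
        self.xhtml_depth = 0
        self.in_atom_summary = False

    def startElementNS(self, name, qname, attrs):
        uri, local = name
        if uri == XHTML_NS:
            # Any XHTML-namespace element, including summary, is content markup.
            self.xhtml_depth += 1
            self.html_parts.append(f"<{local}>")   # attributes omitted in this sketch
        elif uri == ATOM_NS and local == "summary":
            self.in_atom_summary = True

    def endElementNS(self, name, qname):
        uri, local = name
        if uri == XHTML_NS:
            self.xhtml_depth -= 1
            self.html_parts.append(f"</{local}>")
        elif uri == ATOM_NS and local == "summary":
            self.in_atom_summary = False

    def characters(self, data):
        if self.xhtml_depth > 0:
            self.html_parts.append(escape(data))
        elif self.in_atom_summary:
            self.atom_summary_parts.append(data)


handler = NamespaceAwareHandler()
parser = xml.sax.make_parser()
parser.setFeature(feature_namespaces, True)   # deliver (uri, localname) pairs
parser.setContentHandler(handler)
parser.parse(io.BytesIO(FEED.encode("utf-8")))

html = "".join(handler.html_parts)
atom_summary = "".join(handler.atom_summary_parts)
print(html)
print(atom_summary)
```

In this sketch the entry keeps both its Atom summary and the `<details>`/`<summary>` markup in its content, which is the behavior the feed author was expecting.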

@brentsimmons brentsimmons reopened this Sep 24, 2022