make conversion non-destructive to soup; improve div/article/section handling #184

Open
wants to merge 1 commit into
base: develop

chrispy-snps
Collaborator

This pull request does the following:

  • Makes convert_soup() non-destructive (soup left as-is)
  • Implements block-element newline separation for <div>, <article>, <section> elements
  • Fixes div mis-converted #107

Unit tests are updated.
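
A unit test for the non-destructive property could compare a serialized snapshot of the tree before and after conversion. The sketch below is only an illustration: it uses the standard library's `xml.etree.ElementTree` as a stand-in for a BeautifulSoup soup, and `destructive_text`, `non_destructive_text`, and `leaves_tree_unchanged` are hypothetical names, not functions from this project. It makes no claim about how `convert_soup()` modified the soup before this change.

```python
import copy
import xml.etree.ElementTree as ET


def destructive_text(node):
    """Collect text, but remove children while walking (the behavior to avoid)."""
    pieces = [node.text or ""]
    for child in list(node):
        pieces.append(destructive_text(child))
        pieces.append(child.tail or "")
        node.remove(child)  # mutation: the caller's tree loses this child
    return "".join(pieces)


def non_destructive_text(node):
    """Collect the same text with a read-only traversal."""
    pieces = [node.text or ""]
    for child in node:
        pieces.append(non_destructive_text(child))
        pieces.append(child.tail or "")
    return "".join(pieces)


def leaves_tree_unchanged(convert, tree):
    """Return True if calling convert(tree) does not alter the serialized tree."""
    before = ET.tostring(tree)
    convert(tree)
    return ET.tostring(tree) == before


tree = ET.fromstring("<div>foo<p>bar</p>baz</div>")
print(leaves_tree_unchanged(non_destructive_text, tree))             # True
print(leaves_tree_unchanged(destructive_text, copy.deepcopy(tree)))  # False
```

Both walkers return the same text; only the second one leaves the input usable for a later pass, which is the property the first bullet describes.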

Regarding #107, I believe that block-element newline separation, not line continuation, is the correct behavior for <div>, <article>, and <section>, since all three are block elements. In the following HTML example, both the <p> and <div> cases separate "foo" from "bar" with block-element separation, while <br /> only breaks the line between "bar" and "baz" inside the same block:

<!DOCTYPE html>
<html>
 <head>
  <title>Page Title</title>
  <style>

p, div {
 margin-top: 1em;
 margin-bottom: 1em;
 border: 1px black dotted;
 background-color: yellow;
}

  </style>
 </head>
 <body>

  <p>foo</p>
  <p>bar<br />baz</p>

  foo
  <div>bar<br />baz</div>

 </body>
</html>
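
The same distinction can be sketched on the Markdown side. The toy converter below is not this project's implementation: it uses only the standard library's `html.parser`, and `BLOCK_TAGS`, `BlockAwareText`, and `to_text` are hypothetical names chosen for illustration. It assumes block elements are separated by a blank line and `<br />` becomes a Markdown hard line break (two trailing spaces), and it ignores inline elements entirely.

```python
import re
from html.parser import HTMLParser

# Assumption: a small subset of block elements, enough for the example above.
BLOCK_TAGS = {"p", "div", "article", "section"}


class BlockAwareText(HTMLParser):
    """Toy converter: blank line around block elements, hard break at <br>."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.parts.append("\n\n")   # block-element separation
        elif tag == "br":
            self.parts.append("  \n")   # line continuation within a block

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append("\n\n")

    def handle_data(self, data):
        # Collapse source indentation and newlines; whitespace-only data becomes "".
        self.parts.append(" ".join(data.split()))


def to_text(html):
    parser = BlockAwareText()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    text = re.sub(r"\n{3,}", "\n\n", text)  # adjacent blocks share one blank line
    return text.strip()


print(repr(to_text("<p>foo</p><p>bar<br />baz</p>")))  # 'foo\n\nbar  \nbaz'
print(repr(to_text("foo<div>bar<br />baz</div>")))     # 'foo\n\nbar  \nbaz'
```

Under these assumptions the `<p>` and `<div>` cases produce identical output, which matches the rendering argument made for #107.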

chrispy-snps requested a review from AlexVonB on February 1, 2025, 23:24
@chrispy-snps
Collaborator Author

@jsm28 - I am interested in your feedback on this pull request, if you have time.

Successfully merging this pull request may close these issues.

div mis-converted