Odd margin being added to bullets #224

Open
jhubert opened this issue Dec 21, 2016 · 8 comments

Comments

jhubert (Contributor) commented Dec 21, 2016

When certain docx files that have adjusted margins get imported, the resulting HTML places the margin in the wrong place. This results in oddly formatted HTML.

For example, here are two lists in Word:

image

The first list has been indented, the second one has the standard doc indentation.

Here is the result in HTML:

image

The resulting HTML has a span inside the li with a margin-left set on it:

<li><span style="margin-left:3.00em">This is a list item</span></li>

It seems that the whole ul should have the margin, if anything at all.

Here is the sample file:
list-item-margin.docx

And here is the cleaned up docx source from the document.xml file:

  <w:p w14:paraId="25B98899" w14:textId="77777777" w:rsidR="00442583" w:rsidRDefault="00442583" w:rsidP="00442583">
    <w:r>
      <w:t>Headline:</w:t>
    </w:r>
  </w:p>
  <w:p w14:paraId="59BA247B" w14:textId="77777777" w:rsidR="00442583" w:rsidRDefault="00442583" w:rsidP="00442583">
    <w:pPr>
      <w:pStyle w:val="ListParagraph"/>
      <w:numPr>
        <w:ilvl w:val="0"/>
        <w:numId w:val="2"/>
      </w:numPr>
      <w:ind w:left="720"/>
    </w:pPr>
    <w:r>
      <w:t>This is a list item</w:t>
    </w:r>
  </w:p>
  <w:p w14:paraId="551666C9" w14:textId="77777777" w:rsidR="00442583" w:rsidRDefault="00442583" w:rsidP="00442583">
    <w:pPr>
      <w:pStyle w:val="ListParagraph"/>
      <w:numPr>
        <w:ilvl w:val="0"/>
        <w:numId w:val="2"/>
      </w:numPr>
      <w:ind w:left="720"/>
    </w:pPr>
    <w:r>
      <w:t>This is a list item</w:t>
    </w:r>
  </w:p>
  <w:p w14:paraId="1A7915D0" w14:textId="08BE8BEB" w:rsidR="005D0069" w:rsidRDefault="005D0069" w:rsidP="00892FBD"/>
  <w:p w14:paraId="29C5692C" w14:textId="323CB438" w:rsidR="00FB2CED" w:rsidRDefault="00FB2CED" w:rsidP="00892FBD">
    <w:r>
      <w:t>Headline:</w:t>
    </w:r>
  </w:p>
  <w:p w14:paraId="156080DF" w14:textId="24763A50" w:rsidR="00FB2CED" w:rsidRDefault="00FB2CED" w:rsidP="00FB2CED">
    <w:pPr>
      <w:pStyle w:val="ListParagraph"/>
      <w:numPr>
        <w:ilvl w:val="0"/>
        <w:numId w:val="3"/>
      </w:numPr>
    </w:pPr>
    <w:r>
      <w:t>This is a list item</w:t>
    </w:r>
  </w:p>
  <w:p w14:paraId="5DDA8D93" w14:textId="4FE9F6C2" w:rsidR="00FB2CED" w:rsidRDefault="00FB2CED" w:rsidP="00FB2CED">
    <w:pPr>
      <w:pStyle w:val="ListParagraph"/>
      <w:numPr>
        <w:ilvl w:val="0"/>
        <w:numId w:val="3"/>
      </w:numPr>
    </w:pPr>
    <w:r>
      <w:t>This is a list item</w:t>
    </w:r>
  </w:p>

The difference seems to be the presence of the <w:ind w:left="720"/> element, which I'm assuming tells pydocx to add an indentation.
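
To make the difference concrete, here is a minimal sketch (not pydocx's actual code) that scans a trimmed copy of the XML above for list paragraphs and reports any direct `w:ind w:left` value. The names `list_item_indents`, `twips_to_em`, and `TWIPS_PER_EM` are hypothetical, and the 240-twips-per-em ratio is an assumption chosen only because it reproduces the observed `margin-left:3.00em` for `w:left="720"`.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}

# Assumption: 240 twips per em, so 720 twips -> 3.00em as seen in the HTML output.
TWIPS_PER_EM = 240

# Trimmed version of the sample: one item from each of the two lists.
SAMPLE = f"""
<w:body xmlns:w="{W_NS}">
  <w:p>
    <w:pPr>
      <w:pStyle w:val="ListParagraph"/>
      <w:numPr><w:ilvl w:val="0"/><w:numId w:val="2"/></w:numPr>
      <w:ind w:left="720"/>
    </w:pPr>
    <w:r><w:t>This is a list item</w:t></w:r>
  </w:p>
  <w:p>
    <w:pPr>
      <w:pStyle w:val="ListParagraph"/>
      <w:numPr><w:ilvl w:val="0"/><w:numId w:val="3"/></w:numPr>
    </w:pPr>
    <w:r><w:t>This is a list item</w:t></w:r>
  </w:p>
</w:body>
""".strip()


def twips_to_em(twips):
    """Convert twips to em under the assumed ratio above."""
    return twips / TWIPS_PER_EM


def list_item_indents(xml_text):
    """Return (numId, left indent in em or None) for each list paragraph."""
    root = ET.fromstring(xml_text)
    results = []
    for p in root.iter(f"{{{W_NS}}}p"):
        num_id = p.find("w:pPr/w:numPr/w:numId", NS)
        if num_id is None:
            continue  # no numbering properties, so not a list item
        ind = p.find("w:pPr/w:ind", NS)
        left = ind.get(f"{{{W_NS}}}left") if ind is not None else None
        em = twips_to_em(int(left)) if left is not None else None
        results.append((num_id.get(f"{{{W_NS}}}val"), em))
    return results


for num_id, em in list_item_indents(SAMPLE):
    print(num_id, em)
```

Only the paragraphs in the list with `numId` 2 carry a paragraph-level indent, which lines up with the first list being the one that gets the extra span margin.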

botzill (Contributor) commented Dec 23, 2016

OK, I will investigate this issue and try to come up with a PR.
If there are any suggestions, let me know.

botzill (Contributor) commented Dec 23, 2016

Btw @jhubert, what is the desired output for this? Should we not have that margin at all? I ask because I don't see any margin when opening the file in LibreOffice, unlike what you showed in the first screenshot. But when converting from LibreOffice to HTML I get:

screen shot 2016-12-23 at 12 48 40 pm

which is a little different from what we have with pydocx.

jhubert (Contributor, Author) commented Dec 23, 2016

I think the desired output is that the inset matches the Word document. For this simple case, that should just mean removing the margin on the inner span.

botzill (Contributor) commented Dec 23, 2016

Hm, but couldn't there be cases where we actually need this margin?

jhubert (Contributor, Author) commented Dec 23, 2016

There are definitely more complex cases, and I don't think any of them are being handled properly. Here are some examples.

When the Word document has this:
image

The HTML output is this:
image

In the first case, the nested list items are getting a margin added to the content of each item, but the bullet should be in line with the headline. Basically everything is wrong.

In the second case, the list should have a negative margin so the list items match the indent of the headline.

In the third case, the list should have additional margin so that it's inset more into the page than the headline.

I would call these more or less edge cases... the only one that really feels broken when looking at it is:
image

So, that's probably worth spending the most time on. If the rest of them get solved in the process, hurrah! 💯

botzill (Contributor) commented Dec 23, 2016

@jhubert can you also attach the .docx files for the examples you mention, so we have some for tests? Thx

botzill (Contributor) commented Dec 23, 2016

I just don't understand when we need to ignore this margin and when we should not. Maybe @winhamwr @kylegibson can give some advice on this.

jhubert (Contributor, Author) commented Jan 4, 2017

@botzill I can't think of a time where we would want the margin next to the list item. If anything, I think there would be a case where we want the margin on the whole ul.
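
As one possible reading of "margin on the whole ul", the sketch below post-processes a well-formed list fragment and moves a `margin-left` shared by every item's span onto the list element itself. This is an illustration under stated assumptions, not a proposed pydocx patch: `hoist_list_margin` is a hypothetical helper, it only handles the simple case where each `li` contains exactly one styled span, and it leaves anything else untouched.

```python
import xml.etree.ElementTree as ET


def hoist_list_margin(html):
    """Move a margin-left shared by every <li><span> onto the enclosing list.

    Expects a single well-formed <ul> or <ol> fragment. Returns the input
    unchanged whenever the simple pattern does not apply.
    """
    root = ET.fromstring(html)
    if root.get("style"):
        return html  # do not clobber an existing style on the list
    spans = []
    for li in root.findall("li"):
        children = list(li)
        # Simple case only: the item consists of exactly one span.
        if len(children) != 1 or children[0].tag != "span" or li.text:
            return html
        spans.append(children[0])
    styles = {span.get("style") for span in spans}
    if len(styles) != 1 or None in styles:
        return html  # items disagree, or carry no style at all
    style = styles.pop()
    if not style.startswith("margin-left:"):
        return html
    root.set("style", style)
    for span in spans:
        del span.attrib["style"]
    return ET.tostring(root, encoding="unicode", method="html")


fragment = (
    '<ul>'
    '<li><span style="margin-left:3.00em">This is a list item</span></li>'
    '<li><span style="margin-left:3.00em">This is a list item</span></li>'
    '</ul>'
)
print(hoist_list_margin(fragment))
```

Whether the margin should be hoisted like this or dropped entirely for the simple case is exactly the open question in this thread; the sketch only shows that the two placements are easy to tell apart in the output.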
