Incorrect parsing of list items - missing tabs/spaces. #202
Comments
It looks like there is one definite bug, along with some confusion about the styling. "Gather Items for Re-pricing" should definitely be in the same list as "Prepare your markdown gun" and it's not obvious to me why it isn't. The first step will be adding a fixture test case by adding both a .docx and .html file in the fixtures directory. That will let us define the input and then the expected output. If anyone could help with that part, it would be appreciated. From there, someone will need to dive into the OOXML in the .docx to figure out why we're parsing the .docx as separate lists instead of one list.
I dove in and took a look at the OOXML for this. I've added the fixtures as well. It looks like the document is being treated as three different lists because the bulleted list breaks up the numbered list. Here is the simplified relevant OOXML:
<w:p><w:pPr><w:numPr><w:ilvl w:val="0"/><w:numId w:val="1"/></w:numPr></w:pPr><w:r><w:t>one</w:t></w:r></w:p>
<w:p><w:pPr><w:numPr><w:ilvl w:val="0"/><w:numId w:val="1"/></w:numPr></w:pPr><w:r><w:t>two</w:t></w:r></w:p>
<w:p><w:pPr><w:numPr><w:ilvl w:val="0"/><w:numId w:val="1"/></w:numPr></w:pPr><w:r><w:t>three</w:t></w:r></w:p>
<w:p w:rsidP="007F6A48"><w:pPr><w:numPr><w:ilvl w:val="0"/><w:numId w:val="5"/></w:numPr><w:tabs><w:tab w:val="clear" w:pos="709"/></w:tabs></w:pPr><w:r><w:t>AAA</w:t></w:r></w:p>
<w:p w:rsidP="007F6A48"><w:pPr><w:numPr><w:ilvl w:val="0"/><w:numId w:val="5"/></w:numPr><w:tabs><w:tab w:val="clear" w:pos="709"/></w:tabs></w:pPr><w:r><w:t>BBB</w:t></w:r></w:p>
<w:p w:rsidP="007F6A48"><w:pPr><w:numPr><w:ilvl w:val="0"/><w:numId w:val="5"/></w:numPr><w:tabs><w:tab w:val="clear" w:pos="709"/></w:tabs></w:pPr><w:r><w:t>CCC</w:t></w:r></w:p>
<w:p><w:pPr><w:numPr><w:ilvl w:val="2"/><w:numId w:val="1"/></w:numPr></w:pPr><w:r><w:t>alpha</w:t></w:r></w:p>
<w:p><w:pPr><w:numPr><w:ilvl w:val="0"/><w:numId w:val="1"/></w:numPr></w:pPr><w:r><w:t>four</w:t></w:r></w:p>
<w:p/>
<w:p/>
<w:p><w:pPr><w:numPr><w:ilvl w:val="0"/><w:numId w:val="2"/></w:numPr></w:pPr><w:r><w:t>xxx</w:t></w:r></w:p>
<w:p w:rsidP="007F6A48"><w:pPr><w:numPr><w:ilvl w:val="1"/><w:numId w:val="6"/></w:numPr></w:pPr><w:r><w:t>yyy</w:t></w:r></w:p>
<w:p/>
<w:p><w:pPr><w:numPr><w:ilvl w:val="0"/><w:numId w:val="3"/></w:numPr></w:pPr><w:r><w:t>www</w:t></w:r></w:p>
<w:p w:rsidP="007F6A48"><w:pPr><w:numPr><w:ilvl w:val="1"/><w:numId w:val="7"/></w:numPr></w:pPr><w:r><w:t>zzz</w:t></w:r></w:p>
The full
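To make the split described above concrete, here is a minimal Python sketch. It is not the project's actual parser; it assumes a naive strategy that starts a new list whenever `w:numId` changes between consecutive paragraphs. The names `SAMPLE`, `list_info`, and `consecutive_runs` are hypothetical, and the sample is a shortened version of the OOXML above wrapped in a `w:body` element so that it parses on its own.

```python
import xml.etree.ElementTree as ET
from itertools import groupby

# WordprocessingML namespace used by the w: prefix in document.xml.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def _p(num_id, ilvl, text):
    """Build one list paragraph in the same shape as the snippet above."""
    return (
        f'<w:p><w:pPr><w:numPr><w:ilvl w:val="{ilvl}"/>'
        f'<w:numId w:val="{num_id}"/></w:numPr></w:pPr>'
        f"<w:r><w:t>{text}</w:t></w:r></w:p>"
    )


# Shortened, hypothetical sample: a numbered list (numId 1) interrupted
# by a bulleted list (numId 5), then resumed.
SAMPLE = (
    f'<w:body xmlns:w="{W}">'
    + _p(1, 0, "one") + _p(1, 0, "two") + _p(1, 0, "three")
    + _p(5, 0, "AAA") + _p(5, 0, "BBB")
    + _p(1, 2, "alpha") + _p(1, 0, "four")
    + "</w:body>"
)


def list_info(p):
    """Return (numId, ilvl, text) for a list paragraph, else None."""
    num_pr = p.find(f"{{{W}}}pPr/{{{W}}}numPr")
    if num_pr is None:
        return None
    num_id = num_pr.find(f"{{{W}}}numId").get(f"{{{W}}}val")
    ilvl = int(num_pr.find(f"{{{W}}}ilvl").get(f"{{{W}}}val"))
    text = "".join(t.text or "" for t in p.iter(f"{{{W}}}t"))
    return num_id, ilvl, text


def consecutive_runs(body_xml):
    """Group list paragraphs into runs of identical, adjacent numId."""
    root = ET.fromstring(body_xml)
    items = [list_info(p) for p in root.iter(f"{{{W}}}p")]
    items = [item for item in items if item is not None]
    return [
        (num_id, [text for _, _, text in group])
        for num_id, group in groupby(items, key=lambda item: item[0])
    ]


for num_id, texts in consecutive_runs(SAMPLE):
    print(num_id, texts)
```

Under that assumption, the sample yields three runs (numId 1, then 5, then 1 again), which matches the "three different lists" symptom: the numbered list is cut in two by the bulleted one. Whether a fix should merge later paragraphs back into an earlier list with the same `w:numId`, or nest the interrupting list instead, is a design decision this sketch does not settle.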
Can you include
Of course. Gist update: https://gist.github.com/jhubert/29f7899073b765e74297 Also, here is the docx file:
@kylegibson I'm about to work on this issue. Have you already started?
Hi Jeremy. None of us have started work on this issue. I expect it will be a while before we have time to dedicate to fixing this. We'll be happy to review any PRs that you submit!
Awesome. Good to know. We'll get a PR together. :)
Hi.
I have a file like this:
subsections_format.docx
After converting it to HTML, we get:
As you can see, the subsections are not formatted properly. If you can point me to where to look for this issue, I can submit a pull request to fix it.
Thanks a lot.