Deal with nested paragraphs after #203 #213

Open
jlward opened this issue May 16, 2016 · 1 comment

Comments

@jlward
Contributor

jlward commented May 16, 2016

The problem:
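#203 introduced a bug that results in nested paragraphs (a <p> rendered inside another <p>) in the HTML output when a run contains an mce:AlternateContent element, such as a text box.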

The solution:

We will need to build a method similar to pydocx.export.base:PyDocXExporter._convert_complex_fields_into_simple_fields. The complex-fields-to-simple-fields (cf2sf) method was built to handle complex hyperlinks: in OOXML it's possible to build a hyperlink that spans multiple run tags, and it was not possible to roll all of that into the original hyperlink without using some sort of lookahead. I think the solution for the nested paragraphs is going to be very similar. Basically, when we find an mce:AlternateContent tag, we need to close whatever paragraph (and maybe other elements as well) is holding the mce:AlternateContent, then open a new paragraph (and maybe others as well) after the mce:AlternateContent tag has finished.
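The lookahead part could be as simple as scanning each paragraph before rendering and flagging the ones whose runs hold an AlternateContent element. The sketch below is only an illustration, not pydocx code: it uses Python's standard xml.etree.ElementTree on the simplified, namespace-free tag names used in this issue (real documents use w:p, w:r and mce:AlternateContent), and the helper name paragraphs_needing_split is hypothetical.

```python
import xml.etree.ElementTree as ET


def paragraphs_needing_split(body):
    """Return the direct child <p> elements of `body` that hold an
    <AlternateContent> inside one of their runs.

    Hypothetical helper: tag names are un-namespaced for readability;
    real OOXML would use w:p, w:r and mce:AlternateContent.
    """
    flagged = []
    for p in body.findall("p"):
        # Only look at runs that belong directly to this paragraph.
        for r in p.findall("r"):
            if r.find("AlternateContent") is not None:
                flagged.append(p)
                break
    return flagged


doc = ET.fromstring(
    "<body>"
    "<p><r><t>plain</t></r></p>"
    "<p><r><t>BBB</t><AlternateContent/><t>DDD</t></r></p>"
    "</body>"
)
print(len(paragraphs_needing_split(doc)))  # 1
```

Only the second paragraph is flagged, so only that one would need to be closed and reopened.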

For example:

Let's say the XML looks like this:

<p>
<r><t>AAA</t></r>
<r>
    <t>BBB</t>
    <AlternateContent>
    <Fallback>
        <pict>
        <shape>
            <textbox>
            <txbxContent>
                <p>
                <r>
                    <t>CCC</t>
                </r>
                </p>
            </txbxContent>
            </textbox>
        </shape>
        </pict>
    </Fallback>
    </AlternateContent>
    <t>DDD</t>
</r>
<r><t>EEE</t></r>
</p>
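To see why this structure produces nested paragraphs, here is a toy recursive renderer. It is a hypothetical stand-in, not pydocx's exporter: every <p> becomes an HTML paragraph, every <t> contributes its text, and all other tags (including the text-box wrappers) are transparent.

```python
import xml.etree.ElementTree as ET


def render(el):
    """Toy renderer: each <p> becomes <p>...</p>, each <t> contributes its
    text, and every other tag is transparent (its children are rendered
    in place). Hypothetical; this is not pydocx's exporter."""
    if el.tag == "t":
        return el.text or ""
    inner = "".join(render(child) for child in el)
    return f"<p>{inner}</p>" if el.tag == "p" else inner


before = ET.fromstring(
    "<p><r><t>AAA</t></r>"
    "<r><t>BBB</t><AlternateContent><Fallback><pict><shape><textbox>"
    "<txbxContent><p><r><t>CCC</t></r></p></txbxContent>"
    "</textbox></shape></pict></Fallback></AlternateContent><t>DDD</t></r>"
    "<r><t>EEE</t></r></p>"
)
print(render(before))  # <p>AAABBB<p>CCC</p>DDDEEE</p>
```

Because the text box's paragraph is a descendant of the outer paragraph, any renderer that walks the tree this way emits one <p> inside another.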

We want it to end up looking like this:

<p>
<r><t>AAA</t></r>
<r>
    <t>BBB</t>
</r>
</p>
<AlternateContent>
<Fallback>
    <pict>
    <shape>
        <textbox>
        <txbxContent>
            <p>
            <r>
                <t>CCC</t>
            </r>
            </p>
        </txbxContent>
        </textbox>
    </shape>
    </pict>
</Fallback>
</AlternateContent>
<p>
<r>
    <t>DDD</t>
</r>
<r><t>EEE</t></r>
</p>

This would change the HTML output from:

<p>
AAABBB
<p>
    CCC
</p>
DDDEEE
</p>

to:

<p>AAABBB</p>
<p>CCC</p>
<p>DDDEEE</p>
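The restructuring described above could be sketched as a tree transformation that runs before rendering. This is a minimal sketch under stated assumptions, not the actual fix: it uses xml.etree.ElementTree with the simplified tag names from the example, the helper name split_paragraph is hypothetical, and paragraph or run properties are not copied onto the reopened paragraph.

```python
import xml.etree.ElementTree as ET


def split_paragraph(p):
    """Split one <p> around any <AlternateContent> found inside its runs.

    Returns a list of top-level elements: new <p> elements for the content
    before/after each AlternateContent, with the AlternateContent itself
    hoisted out between them. Hypothetical helper; tag names are
    un-namespaced, and properties are not copied to reopened paragraphs.
    """
    result = []
    current_p = ET.Element("p")
    for r in p:
        if r.tag != "r" or r.find("AlternateContent") is None:
            # Runs without AlternateContent (and any other children) are kept as-is.
            current_p.append(r)
            continue
        current_r = ET.Element("r")
        for child in r:
            if child.tag == "AlternateContent":
                # Close the open run and paragraph, emit the AlternateContent,
                # then reopen a fresh paragraph and run for whatever follows.
                if len(current_r):
                    current_p.append(current_r)
                if len(current_p):
                    result.append(current_p)
                result.append(child)
                current_p = ET.Element("p")
                current_r = ET.Element("r")
            else:
                current_r.append(child)
        if len(current_r):
            current_p.append(current_r)
    if len(current_p):
        result.append(current_p)
    return result


source = (
    "<p><r><t>AAA</t></r>"
    "<r><t>BBB</t><AlternateContent><Fallback><pict><shape><textbox>"
    "<txbxContent><p><r><t>CCC</t></r></p></txbxContent>"
    "</textbox></shape></pict></Fallback></AlternateContent><t>DDD</t></r>"
    "<r><t>EEE</t></r></p>"
)
parts = split_paragraph(ET.fromstring(source))
print([el.tag for el in parts])  # ['p', 'AlternateContent', 'p']
print(["".join(t.text for t in el.iter("t")) for el in parts])
# ['AAABBB', 'CCC', 'DDDEEE']
```

The three resulting elements line up with the target XML: AAA and BBB stay in the first paragraph, the AlternateContent sits at the top level, and DDD and EEE move into a new paragraph.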
@kylegibson
Contributor

This looks like a good write-up
