
Content of merged cells #1442


Closed
jodyphelan opened this issue Oct 30, 2024 · 4 comments

Comments

@jodyphelan

I am merging cells with the same content where I want the following table

[Screenshot 2024-10-30 at 14 25 30: the table before merging]

to merge to

[Screenshot 2024-10-30 at 14 25 16: the desired table after merging]

Based on this (from the documentation):

When two or more cells are merged, any existing content is concatenated and placed in the resulting merged cell. Content from each original cell is separated from that in the prior original cell by a paragraph mark. An original cell having no content is skipped in the concatenation process.
Merging four cells with content 'a', 'b', '', and 'd' respectively results in a merged cell having text 'a\nb\nd'.

I expected the following code to produce a merged cell with only one paragraph; however, the merged cell appears to contain two paragraphs.

from docx import Document
doc = Document('fuits.docx')
tab = doc.tables[0]
c1 = tab.cell(1,0)
c2 = tab.cell(2,0)

c2.text = ''
cm = c1.merge(c2)
cm.paragraphs

Result:

[<docx.text.paragraph.Paragraph at 0x112cb9330>,
 <docx.text.paragraph.Paragraph at 0x112cba6b0>]

Is this expected?

@scanny
Contributor

scanny commented Oct 30, 2024

Sounds plausible. What is the specific before and after text of the cells? The "\n" in the docs refers to what you see in cell.text. In cell.paragraphs, each "\n" corresponds to the start of a new paragraph.

@scanny
Contributor

scanny commented Oct 30, 2024

If you merged the cells in the top table, you would get the bottom table, except that the merged cell would contain "Apple\nApple". The merging algorithm does nothing to deduplicate text; you'll have to take care of that yourself.

scanny closed this as completed on Oct 30, 2024
@jodyphelan
Author

jodyphelan commented Oct 30, 2024

Thanks for taking a look at this, @scanny. The documentation says "An original cell having no content is skipped in the concatenation process", so I thought that setting c2.text = '' would cause the whole paragraph to be skipped. Thanks for clearing that up!

@scanny
Contributor

scanny commented Oct 30, 2024

This is the code that controls that behavior:
https://github.com/python-openxml/python-docx/blob/master/src/docx/oxml/table.py#L616-L630

I expect what's happening is that a paragraph containing a run with an empty string is considered distinct from a paragraph with no runs. The latter is what you get when a cell is empty from the start.

If you want to hack something in, this might produce the result you're looking for:

# -- instead of `cell.text = ""`... --
tc = cell._tc
tc.clear_content()
p = tc.add_p()

This is what the Cell.text setter does, but the setter goes one step further and also adds a run to the new paragraph (even when the text is empty):
https://github.com/python-openxml/python-docx/blob/master/src/docx/table.py#L273-L284
