Content of merged cells #1442

jodyphelan · 2024-10-30T14:40:57Z

I am merging cells with the same content where I want the following table

to merge to

Based on this (from the documentation):

When two or more cells are merged, any existing content is concatenated and placed in the resulting merged cell. Content from each original cell is separated from that in the prior original cell by a paragraph mark. An original cell having no content is skipped in the concatenation process.
Merging four cells with content 'a', 'b', '', and 'd' respectively results in a merged cell having text 'a\nb\nd'.

I thought the following code would produce a merged cell with only one paragraph. However, it looks like there are two paragraphs in the merged cell.

from docx import Document

doc = Document('fuits.docx')
tab = doc.tables[0]
c1 = tab.cell(1, 0)  # second row, first column
c2 = tab.cell(2, 0)  # third row, first column

c2.text = ''         # blank the lower cell before merging
cm = c1.merge(c2)
cm.paragraphs

Result:

[<docx.text.paragraph.Paragraph at 0x112cb9330>,
 <docx.text.paragraph.Paragraph at 0x112cba6b0>]

Is this expected?

scanny · 2024-10-30T16:39:30Z

Sounds plausible. What is the specific before and after text of the cells? The "\n" in the docs would be in cell.text. If you're looking at cell.paragraphs, each "\n" corresponds to the start of a new paragraph.

scanny · 2024-10-30T16:47:26Z

If you merged the top version you would receive the bottom version but with "Apple\nApple". The merging algorithm doesn't have anything to do with deduplicating text. You'll have to take care of that yourself.
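Since the merge only concatenates, one way to take care of the deduplication yourself is to collapse repeated lines in the merged text afterwards. A minimal sketch, where `dedupe_lines` is a hypothetical helper (not part of python-docx) and the merged cell's text is assumed to be newline-separated as described above:

```python
def dedupe_lines(text: str) -> str:
    """Drop repeated lines from newline-separated text.

    The first occurrence of each line is kept and the original order
    is preserved. Hypothetical helper, not part of python-docx.
    """
    seen = set()
    kept = []
    for line in text.split("\n"):
        if line not in seen:
            seen.add(line)
            kept.append(line)
    return "\n".join(kept)


# Hypothetical usage after a merge:
#   merged = c1.merge(c2)
#   merged.text = dedupe_lines(merged.text)
print(dedupe_lines("Apple\nApple"))  # prints: Apple
```

Note that assigning to the cell's text replaces its content with a single paragraph, so any run-level formatting in the original cells would be lost.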

jodyphelan · 2024-10-30T16:52:53Z

Thanks for taking a look at this @scanny. The documentation says "An original cell having no content is skipped in the concatenation process", so I thought that by doing c2.text = '' the whole paragraph would be skipped. Thanks for clearing that up!

scanny · 2024-10-30T18:17:45Z

This is the code that controls that behavior:
https://github.com/python-openxml/python-docx/blob/master/src/docx/oxml/table.py#L616-L630

I expect what's happening is that a paragraph with a run that contains the empty string is considered distinct from a paragraph with no runs, the latter being what you get if the cell was empty from the start.
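To see which of the two cases applies to a given cell before merging, the public API should be enough to tell them apart. The sketch below uses a hypothetical helper, `has_no_runs`, and assumes the hypothesis above is right, i.e. that what matters is whether the cell's only paragraph contains any runs. It is not the library's own check.

```python
def has_no_runs(cell) -> bool:
    """Return True if `cell` has exactly one paragraph and it has no runs.

    `cell` is expected to be a python-docx table cell. Only its public
    `paragraphs` property and each paragraph's `runs` property are used.
    Hypothetical helper, not part of python-docx.
    """
    paragraphs = cell.paragraphs
    return len(paragraphs) == 1 and len(paragraphs[0].runs) == 0


# Hypothetical usage: compare a cell that was never filled in with one
# that had `cell.text = ""` applied.
#   has_no_runs(tab.cell(2, 0))
```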

If you wanted to hack something in then this might produce the result you're looking for:

# -- instead of `cell.text = ""`... --
tc = cell._tc
tc.clear_content()
p = tc.add_p()

This is what the Cell.text setter does, although the setter goes a little further:
https://github.com/python-openxml/python-docx/blob/master/src/docx/table.py#L273-L284
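The snippet above could be wrapped in a small helper so that it is applied consistently before each merge. This is only a sketch: `reset_to_empty_paragraph` is a hypothetical name, and it relies on the private `_tc` attribute, which is not part of the public API and may change between releases.

```python
def reset_to_empty_paragraph(cell):
    """Replace the content of `cell` with a single paragraph that has no runs.

    Relies on python-docx internals (`cell._tc`) as used in the snippet
    above, so treat it as a workaround rather than supported API.
    Hypothetical helper, not part of python-docx.
    """
    tc = cell._tc
    tc.clear_content()  # remove the existing block content from the cell
    return tc.add_p()   # add back one paragraph, with no runs in it


# Hypothetical usage, following the original example:
#   reset_to_empty_paragraph(c2)   # instead of `c2.text = ""`
#   cm = c1.merge(c2)
```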

scanny closed this as completed Oct 30, 2024
