I need to fetch the text, tables, and image addresses from a Word document. When I loop over doc.element.body I can't detect or recognize the images in the document, and when I loop over doc.part.rels.values() I only get the images. How can I get the text and tables together with the images, in the same order as in the source Word document?
I use one list variable to store the results. The problem is that I can't detect where an image occurs while looping over element.body, so I don't know when to run the doc.part.rels.values() loop.
There are two separate questions here: how to get both paragraphs and tables from the document body in document (reading) order, and how to get images in reading order.
Paragraphs and tables are both block items, meaning they take up a whole vertical segment of the document and extend between the margins, like "blocks" stacked on top of each other.
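As a minimal sketch (not python-docx's own code), the block-level walk can be done with only the standard library: the direct children of w:body in word/document.xml are the w:p and w:tbl elements in reading order, which is what a loop over doc.element.body visits. The function names are hypothetical, and the sketch assumes plain text is enough for table cells; nested tables and content controls (w:sdt) are skipped.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace in ElementTree's {uri}tag form
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def paragraph_text(p):
    """Concatenate the text of every w:t element under a w:p."""
    return "".join(t.text or "" for t in p.iter(W + "t"))


def cell_text(tc):
    """Text of a table cell, one line per direct paragraph (nested tables are ignored)."""
    return "\n".join(paragraph_text(p) for p in tc.findall(W + "p"))


def iter_block_items(docx_path):
    """Yield ("paragraph", text) or ("table", rows) for each body block, in document order."""
    with zipfile.ZipFile(docx_path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    for child in root.find(W + "body"):
        if child.tag == W + "p":
            yield ("paragraph", paragraph_text(child))
        elif child.tag == W + "tbl":
            rows = [[cell_text(tc) for tc in tr.findall(W + "tc")] for tr in child.findall(W + "tr")]
            yield ("table", rows)
        # anything else (w:sectPr, w:sdt, ...) is skipped in this sketch
```

Because only direct children of w:body are visited, paragraphs inside table cells are reported as part of their table rather than as separate items.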
Images are inline elements, meaning they occur inside block elements (specifically inside Paragraph.runs), and a single paragraph can contain more than one. The closest python-docx currently gets you to them is run.iter_inner_content() -> Iterator[str | Drawing | RenderedPageBreak], where images show up as Drawing items.
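Inside a paragraph the same ordered walk applies one level down. This stdlib-only sketch roughly approximates what run.iter_inner_content() reports: it visits the runs of a w:p element (including runs nested in w:hyperlink) and yields a str for each w:t and the raw w:drawing element for each drawing. The helper names are hypothetical; tabs, breaks, rendered page breaks, and drawings wrapped in mc:AlternateContent are not handled here.

```python
# WordprocessingML namespace in ElementTree's {uri}tag form
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def iter_runs(p):
    """Yield the w:r elements of a paragraph in order, including runs inside hyperlinks."""
    for child in p:
        if child.tag == W + "r":
            yield child
        elif child.tag == W + "hyperlink":
            yield from child.findall(W + "r")


def iter_paragraph_content(p):
    """Yield str for each text element and the w:drawing element for each drawing, in reading order."""
    for run in iter_runs(p):
        for child in run:
            if child.tag == W + "t":
                yield child.text or ""
            elif child.tag == W + "drawing":
                yield child
```

Since a drawing is yielded at the exact point where it appears between text pieces, the caller knows where each image sits relative to the surrounding text.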
A Drawing object currently exposes only its XML, on drawing._drawing. Using XPath on that XML, pictures are found at either ./wp:inline/a:graphic/a:graphicData/pic:pic (an "inline" picture) or ./wp:anchor/a:graphic/a:graphicData/pic:pic (a "placed", or so-called "floating", picture). The picture's rId is the r:embed attribute of the pic:blipFill/a:blip element inside pic:pic. So to get images in document order, you'd have to dig into that XML for the rId of each picture and match it up with the corresponding ImagePart.
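Putting it together for the original question, a sketch along these lines produces one list in document order. It assumes the picture rId sits on pic:blipFill/a:blip/@r:embed under either the wp:inline or the wp:anchor path, and it resolves that rId through word/_rels/document.xml.rels to the image's part name inside the .docx zip. It reads the package with zipfile and ElementTree instead of python-docx; with python-docx you would look the rId up with doc.part.related_parts[rId] to get the ImagePart. All function names are hypothetical, and images inside table cells, headers, and footers are not collected.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET

NS = {
    "w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main",
    "wp": "http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing",
    "a": "http://schemas.openxmlformats.org/drawingml/2006/main",
    "pic": "http://schemas.openxmlformats.org/drawingml/2006/picture",
    "r": "http://schemas.openxmlformats.org/officeDocument/2006/relationships",
    "rel": "http://schemas.openxmlformats.org/package/2006/relationships",
}
W = "{%s}" % NS["w"]
R_EMBED = "{%s}embed" % NS["r"]

# Inline and floating pictures, down to the a:blip that carries r:embed.
BLIP_PATHS = (
    "wp:inline/a:graphic/a:graphicData/pic:pic/pic:blipFill/a:blip",
    "wp:anchor/a:graphic/a:graphicData/pic:pic/pic:blipFill/a:blip",
)


def drawing_rids(drawing):
    """Return the r:embed rIds of every picture in a w:drawing element."""
    return [
        blip.get(R_EMBED)
        for path in BLIP_PATHS
        for blip in drawing.findall(path, NS)
        if blip.get(R_EMBED)
    ]


def image_targets(zf):
    """Map rId -> part name inside the package, e.g. 'word/media/image1.png'."""
    rels = ET.fromstring(zf.read("word/_rels/document.xml.rels"))
    targets = {}
    for rel in rels.findall("rel:Relationship", NS):
        if rel.get("TargetMode") == "External":
            continue  # linked files are not stored inside the .docx
        target = rel.get("Target")
        if target.startswith("/"):
            targets[rel.get("Id")] = target.lstrip("/")  # absolute part name
        else:
            targets[rel.get("Id")] = posixpath.normpath(posixpath.join("word", target))
    return targets


def runs(p):
    """Runs of a paragraph in order, including runs nested in hyperlinks."""
    for child in p:
        if child.tag == W + "r":
            yield child
        elif child.tag == W + "hyperlink":
            yield from child.findall("w:r", NS)


def extract_in_order(docx_path):
    """Return ("text", str), ("table", rows) and ("image", part_name) items in document order."""
    with zipfile.ZipFile(docx_path) as zf:
        body = ET.fromstring(zf.read("word/document.xml")).find("w:body", NS)
        targets = image_targets(zf)

    results = []
    for block in body:
        if block.tag == W + "tbl":
            # Cell text only; pictures inside cells are not collected in this sketch.
            rows = [
                ["".join(t.text or "" for t in tc.iter(W + "t")) for tc in tr.findall("w:tc", NS)]
                for tr in block.findall("w:tr", NS)
            ]
            results.append(("table", rows))
        elif block.tag == W + "p":
            text = []
            for run in runs(block):
                for child in run:
                    if child.tag == W + "t":
                        text.append(child.text or "")
                    elif child.tag == W + "drawing":
                        if text:
                            # Flush text seen so far so it stays ahead of the image.
                            results.append(("text", "".join(text)))
                            text = []
                        for rid in drawing_rids(child):
                            # Fall back to the raw rId if the relationship is missing.
                            results.append(("image", targets.get(rid, rid)))
            if text:
                results.append(("text", "".join(text)))
    return results
```

Calling extract_in_order on a .docx path returns tuples such as ("image", "word/media/image1.png"); the image bytes can then be read from the same archive with zipfile.ZipFile(path).read(part_name).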