Extract text from Word textboxes [proposed label: enhancement] #688

Mrodent · 2024-03-16T20:42:28Z

As part of testing for my project, I just ran read_docx on a test .docx file containing various things, including a textbox.
Examining the resulting Value::Object, I can't find the text from my textbox anywhere.
At the bottom of the crates.io page, under "Features", I can see that "Textbox" is left unticked.
Does this mean that the parser simply ignores all textboxes?

And yet, when I uncompress the .docx file, in document.xml there it is, near the end:

"v:textbox style="mso-fit-shape-to-text:t"><w:txbxContent><w:p w:rsidR="0094123E" w:rsidRPr="00DF617B" w:rsidRDefault="0094123E" w:rsidP="0094123E"><w:pPr><w:ind w:left="0" w:firstLine="0"/></w:pPr><w:r w:rsidRPr="00DF617B"><w:t>Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</w:t></w:r></w:p><w:p w:rsidR="0094123E" w:rsidRDefault="0094123E"/></w:txbxContent></v:textbox>"

Have I got this right, that textboxes are currently omitted?

If so, is there any reason why this is apparently not currently included in the parsing? It's slightly irksome because it means I'll have to cobble together my own code to parse document.xml.
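For reference, the kind of stopgap I have in mind could look roughly like the sketch below. It is not part of docx-rs: `textbox_texts`, `run_text` and `unescape` are hypothetical helper names, and it assumes document.xml has already been unzipped and read into a string. It uses plain string scanning from the standard library rather than a real XML parser, so it assumes well-formed input with the usual `w:` prefix, and it only decodes the five predefined XML entities.

```rust
/// Returns one String per `<w:txbxContent>` element, holding the
/// concatenated text of the `<w:t>` runs inside it.
/// Hypothetical helper, not part of docx-rs.
fn textbox_texts(xml: &str) -> Vec<String> {
    const OPEN: &str = "<w:txbxContent>";
    const CLOSE: &str = "</w:txbxContent>";
    let mut boxes = Vec::new();
    let mut rest = xml;
    while let Some(start) = rest.find(OPEN) {
        let after_open = &rest[start + OPEN.len()..];
        let Some(end) = after_open.find(CLOSE) else { break };
        boxes.push(run_text(&after_open[..end]));
        rest = &after_open[end + CLOSE.len()..];
    }
    boxes
}

/// Concatenates the contents of all `<w:t>` elements in `fragment`.
fn run_text(fragment: &str) -> String {
    let mut out = String::new();
    let mut rest = fragment;
    while let Some(pos) = rest.find("<w:t") {
        let tail = &rest[pos + 4..];
        // Skip look-alike tags such as <w:tab/> or <w:tbl>.
        let is_text_tag = tail.starts_with('>') || tail.starts_with(' ');
        let Some(gt) = tail.find('>') else { break };
        let self_closing = tail[..gt].ends_with('/');
        let body = &tail[gt + 1..];
        if is_text_tag && !self_closing {
            let Some(close) = body.find("</w:t>") else { break };
            out.push_str(&unescape(&body[..close]));
            rest = &body[close + "</w:t>".len()..];
        } else {
            rest = body;
        }
    }
    out
}

/// Decodes the five predefined XML entities; `&amp;` is handled last
/// so that text like `&amp;lt;` is not decoded twice.
fn unescape(s: &str) -> String {
    s.replace("&lt;", "<")
        .replace("&gt;", ">")
        .replace("&quot;", "\"")
        .replace("&apos;", "'")
        .replace("&amp;", "&")
}
```

Called on the contents of document.xml, `textbox_texts` would return a vector with one entry per textbox. One caveat I'm aware of: files saved by newer versions of Word may store the same textbox twice (a DrawingML version plus a VML fallback), in which case the text would show up twice and need de-duplicating.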


bokuweb · 2024-03-18T14:20:28Z

@Mrodent Thanks for your report. Could you please provide the .docx file?

Mrodent · 2024-03-27T21:38:58Z

Here's a small .docx file with a text box. On my setup the text in the text box is just ignored when I parse.
test_file_2.docx

... but if you uncompress it, you'll find the XML I included in my previous post.

By the way, I have only Word 2007 installed ... this may make a difference to something.

Mrodent · 2024-05-17T08:50:17Z

Edited the title in the hope that you might find time to give this some thought. Omitting text from textboxes seems a bit of an oversight, and one that could seemingly be corrected fairly easily...

Mrodent changed the title ~~Clarification about Word textboxes?~~ Extract text from Word textboxes [proposed label: enhancement] May 17, 2024
