Export content only #240

tritium21 · 2017-05-27T05:24:14Z

It would be extremely helpful to me if it were possible to export only the content of the document, with none of the surrounding wrapper tags. I intend to pass the document on to further processing and will provide those parts myself.

bitscompagnie · 2017-11-02T08:36:47Z

Hello @tritium21,

You could use the pandoc tool to achieve what you are looking for. Once installed, you can convert a document to plain text with the following command in the terminal or command prompt:
pandoc test.docx -f docx -t plain -s -o test.txt

Hope the above helps you.
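
If the conversion has to happen from inside a Python script, the same pandoc call can be wrapped with the standard library's subprocess module. This is only a minimal sketch: the helper names `build_pandoc_command` and `docx_to_text` are hypothetical, and it assumes the `pandoc` executable is installed and on the PATH.

```python
import subprocess


def build_pandoc_command(source, target):
    """Return the argument list for the pandoc command quoted above.

    Hypothetical helper: it only assembles the arguments, it does not run them.
    """
    return ['pandoc', source, '-f', 'docx', '-t', 'plain', '-s', '-o', target]


def docx_to_text(source, target):
    """Run pandoc to convert a .docx file to plain text (assumes pandoc is on PATH)."""
    # check=True raises subprocess.CalledProcessError if pandoc exits with an error.
    subprocess.run(build_pandoc_command(source, target), check=True)
```

Calling `docx_to_text('test.docx', 'test.txt')` would then be equivalent to running the command by hand.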

jlward · 2017-11-02T14:13:16Z

It would not be difficult to create a custom parser that strips out all the tags. It's something we've wanted to include anyway, so if you end up using that approach, PRs are welcome.
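
Until such a parser exists in the library, one possible workaround is to strip the tags from the HTML output after the fact. The sketch below is not part of PyDocX and is not the custom parser described above; it uses only Python's standard `html.parser` module, and the names `TextOnlyParser` and `strip_tags` are hypothetical.

```python
from html.parser import HTMLParser


class TextOnlyParser(HTMLParser):
    """Collect text nodes and discard every tag (hypothetical helper)."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into plain characters.
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        # Called for each run of text between tags.
        self.chunks.append(data)


def strip_tags(html):
    """Return the text content of an HTML string with all tags removed."""
    parser = TextOnlyParser()
    parser.feed(html)
    parser.close()
    return ''.join(parser.chunks)
```

Note that this keeps the text inside every element, including any `<style>` or `<script>` blocks, so those would need to be filtered separately if they appear in the input.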

IuryAlves · 2019-05-24T10:47:51Z

I have done something similar:

from pydocx.export.base import PyDocXExporter


class RawExporter(PyDocXExporter):

    def apply_newlines(self, nodes):
        if nodes:
            return '\n'.join(node for node in nodes)
        return ''

    def export_paragraph(self, paragraph):
        nodes = super(RawExporter, self).export_paragraph(paragraph)
        return self.apply_newlines(nodes)

    def export_break(self, br):
        nodes = super(RawExporter, self).export_break(br)
        return self.apply_newlines(nodes)


# A .docx file is a zip archive, so open it in binary mode.
with open('test.docx', 'rb') as fp:
    output = ''.join(RawExporter(fp).export())
    print(output)

@tritium21
