Export content only #240

Open
tritium21 opened this issue May 27, 2017 · 3 comments

Comments

@tritium21

It would be extremely helpful to me if it were possible to export only the content of <body>, with no <html> or <head> tags. I intend to pass the document on to further processing and will provide those parts myself.
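
Until an exporter option exists, one possible workaround is to post-process the exported HTML. The following is a minimal sketch, assuming the exporter hands back a complete HTML document as a string; `extract_body` and the sample `full_html` value are hypothetical names used for illustration and are not part of PyDocX.

```python
import re

# Matches the first <body ...>...</body> element, case-insensitively and
# across line breaks. A regex is adequate here only because the input is
# assumed to be machine-generated HTML with a single body element.
_BODY_RE = re.compile(r"<body\b[^>]*>(.*?)</body>", re.IGNORECASE | re.DOTALL)


def extract_body(html):
    """Return the markup between <body> and </body>.

    If no body element is found, the input is returned unchanged.
    """
    match = _BODY_RE.search(html)
    return match.group(1).strip() if match else html


# Hypothetical stand-in for the exporter's output.
full_html = "<html><head><title>t</title></head><body><p>Hello</p></body></html>"
print(extract_body(full_html))
```

The result can then be wrapped in whatever head and body markup the later processing steps supply.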

@bitscompagnie

bitscompagnie commented Nov 2, 2017

Hello @tritium21,

You could use the pandoc tool to achieve what you are looking for. Once installed, you can convert a document to plain text with the following command in the terminal or command prompt:
pandoc test.docx -f docx -t plain -s -o test.txt

Hope the above helps you.
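
If the conversion needs to run from a Python script, the same command could be wrapped with the standard library. This is only a sketch: it assumes `pandoc` is installed and on the PATH, and `build_pandoc_command` and `docx_to_text` are hypothetical helper names.

```python
import subprocess


def build_pandoc_command(source, target):
    """Build the argument list for the pandoc call quoted above."""
    return ["pandoc", source, "-f", "docx", "-t", "plain", "-s", "-o", target]


def docx_to_text(source, target):
    """Run pandoc; raises CalledProcessError if pandoc exits with an error."""
    subprocess.run(build_pandoc_command(source, target), check=True)
```

Calling `docx_to_text("test.docx", "test.txt")` would then be equivalent to running the command by hand.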

@jlward
Contributor

jlward commented Nov 2, 2017

It would not be difficult to create a custom parser that strips out all the tags. It's something we've wanted to include anyway, so if you end up using that approach, PRs are welcome.
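
As a rough illustration of tag stripping, the sketch below removes markup from an HTML string using only the standard library. It is a post-processing approach and not the PyDocX parser or exporter API; `TextOnlyParser` and `strip_tags` are hypothetical names.

```python
from html.parser import HTMLParser


class TextOnlyParser(HTMLParser):
    """Collects text nodes and ignores every tag."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_tags(html):
    """Return the text content of an HTML string with all tags removed."""
    parser = TextOnlyParser()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.chunks)


print(strip_tags("<p>Hello <b>world</b> &amp; friends</p>"))
# prints: Hello world & friends
```

Note that this keeps the text of every element, including anything inside a head element such as a title, so it fits best when applied to body content only.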

@IuryAlves
Contributor

IuryAlves commented May 24, 2019

I have done something similar:

from pydocx.export.base import PyDocXExporter


class RawExporter(PyDocXExporter):
    """Base exporter subclass that joins paragraph and break output with newlines."""

    def apply_newlines(self, nodes):
        # Join the exported pieces with newlines; nothing to join yields ''.
        if nodes:
            return '\n'.join(nodes)
        return ''

    def export_paragraph(self, paragraph):
        nodes = super(RawExporter, self).export_paragraph(paragraph)
        return self.apply_newlines(nodes)

    def export_break(self, br):
        nodes = super(RawExporter, self).export_break(br)
        return self.apply_newlines(nodes)


# A .docx file is a zip archive, so open it in binary mode.
with open('test.docx', 'rb') as fp:
    output = ''.join(RawExporter(fp).export())
    print(output)
