Export content only #240
Comments
Hello @tritium21, you could use the pandoc tool to achieve what you are looking for. Once it is installed, you can convert a document to plain text from the terminal or command prompt. Hope that helps.
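For anyone scripting this, here is a minimal sketch of calling pandoc from Python. It assumes pandoc is installed and on the PATH. The `-f`, `-t`, and `-o` options are standard pandoc flags, but the helper names and file names are hypothetical, and this is not necessarily the exact command the commenter had in mind.

```python
import shutil
import subprocess


def build_pandoc_command(source_path, output_path):
    """Return the argument list for a docx -> plain text pandoc run."""
    # -f/--from and -t/--to select the input and output formats;
    # -o/--output names the destination file.
    return ["pandoc", "-f", "docx", "-t", "plain", "-o", output_path, source_path]


def convert_to_plain_text(source_path, output_path):
    """Run pandoc, raising if it is missing or exits with an error."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc is not installed or not on PATH")
    subprocess.run(build_pandoc_command(source_path, output_path), check=True)
```

Calling `convert_to_plain_text("test.docx", "test.txt")` would then write the plain-text version next to the source file.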
It would not be difficult to create a custom parser that strips out all the tags. It's something we've wanted to include anyway, so if you end up using that approach, PRs are welcome.
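As a rough illustration of the tag-stripping idea, the sketch below post-processes an already exported HTML string with the standard library's `html.parser`. It is not part of PyDocX; `TextOnlyParser` and `strip_tags` are hypothetical names, and treating `</p>` as a line break is an assumption about the desired output.

```python
from html.parser import HTMLParser


class TextOnlyParser(HTMLParser):
    """Collect text nodes and drop every tag (hypothetical helper)."""

    SKIPPED = ("head", "style", "script")

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "p":
            # Assumption: each closed paragraph ends a line of text.
            self.chunks.append("\n")

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def strip_tags(html):
    """Return the text content of an HTML string with all tags removed."""
    parser = TextOnlyParser()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks).strip()
```

For example, `strip_tags("<p>Hello <b>world</b></p>")` returns `"Hello world"`.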
I have done something similar:

    from pydocx.export.base import PyDocXExporter


    class RawExporter(PyDocXExporter):
        def apply_newlines(self, nodes):
            # Join the exported pieces with newlines; empty input gives ''.
            if nodes:
                return '\n'.join(nodes)
            return ''

        def export_paragraph(self, paragraph):
            nodes = super(RawExporter, self).export_paragraph(paragraph)
            return self.apply_newlines(nodes)

        def export_break(self, br):
            nodes = super(RawExporter, self).export_break(br)
            return self.apply_newlines(nodes)


    # A .docx file is a zip archive, so open it in binary mode.
    with open('test.docx', 'rb') as fp:
        output = ''.join(RawExporter(fp).export())
    print(output)
It would be extremely helpful to me if it were possible to export only the document content, without the surrounding tags. I intend to pass the document on to further processing and will provide those parts myself.
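Until such an option exists, one possible workaround is to trim the wrapper from the exported HTML afterwards. The sketch below assumes the export is a complete HTML document whose content sits inside a single `<body>` element. `extract_body` is a hypothetical helper, and a regular expression is only adequate here because the input is machine-generated and predictable.

```python
import re

# Matches the opening <body ...> tag, captures everything up to </body>.
_BODY_RE = re.compile(r"<body[^>]*>(.*?)</body\s*>", re.IGNORECASE | re.DOTALL)


def extract_body(html):
    """Return the inner HTML of <body>, or the input unchanged if none is found."""
    match = _BODY_RE.search(html)
    if match is None:
        return html
    return match.group(1).strip()
```

The returned fragment can then be embedded in whatever surrounding markup the later processing step provides.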