-
Notifications
You must be signed in to change notification settings - Fork 55
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for textbox -> txbxContent #203
Changes from 9 commits
340f67e
d0b2cb6
368e957
6015680
47b0a2e
d2b373c
2937c06
fed6498
563354c
b80b088
afb093d
f735783
338a25b
7e5fc4b
030b0d0
1ce2887
97438e5
6b738c1
5554636
4d12156
7e20b60
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -15,7 +15,7 @@ | |
NumberingSpan, | ||
NumberingSpanBuilder, | ||
) | ||
from pydocx.openxml import wordprocessing, vml | ||
from pydocx.openxml import wordprocessing, vml, markup_compatibility | ||
from pydocx.openxml.packaging import WordprocessingDocument | ||
|
||
|
||
|
@@ -67,6 +67,10 @@ def __init__(self, path): | |
wordprocessing.EmbeddedObject: self.export_embedded_object, | ||
NumberingSpan: self.export_numbering_span, | ||
NumberingItem: self.export_numbering_item, | ||
markup_compatibility.AlternateContent: self.export_alternate_content, | ||
markup_compatibility.Fallback: self.export_fallback, | ||
wordprocessing.Textbox: self.export_textbox, | ||
wordprocessing.TxBxContent: self.export_textbox_content, | ||
} | ||
self.field_type_to_export_func_map = { | ||
'HYPERLINK': getattr(self, 'export_field_hyperlink', None), | ||
|
@@ -531,3 +535,16 @@ def export_field_char(self, field_char): | |
|
||
def export_field_code(self, field_code): | ||
pass | ||
|
||
def export_textbox(self, textbox): | ||
return self.yield_nested(textbox.children, self.export_node) | ||
|
||
def export_textbox_content(self, textbox_content): | ||
return self.yield_nested(textbox_content.children, self.export_node) | ||
|
||
# Markup Compatibility exporters | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Seems like we no longer need this comment. |
||
def export_alternate_content(self, alternate_content): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Could we call this |
||
return self.yield_nested(alternate_content.children, self.export_node) | ||
|
||
def export_fallback(self, fallback): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
|
||
return self.yield_nested(fallback.children, self.export_node) |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -5,6 +5,7 @@ | |
unicode_literals, | ||
) | ||
|
||
import importlib | ||
import inspect | ||
from collections import defaultdict | ||
|
||
|
@@ -120,9 +121,24 @@ class ParkingLot(XmlModel): | |
def __init__(self, *types, **kwargs): | ||
default = kwargs.pop('default', []) | ||
super(XmlCollection, self).__init__(self, default=default) | ||
self.types = set(types) | ||
self._types = types | ||
self._name_to_type_map = None | ||
|
||
@property | ||
def types(self): | ||
return set(self._set_types(*self._types)) | ||
|
||
def _set_types(self, *types): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do we want to consider caching this result? Would be expensive to have to run it more than once, if stuff is needing to be imported. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I had considered it. Do you recall if pydocx has a built in caching tool? Or should I do it the old fashioned way? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. A thought. This gets called for each instance. I might need to make There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think a class method would actually fix this. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. My research leads me to believe that no caching is needed (it won't actually change anything). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Could you clarify what you researched, Jason? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I printed the types in this method and could only get that printing the first time it was used in a test. The second test did not show any of the printing. |
||
base_path = 'pydocx.openxml.{0}' | ||
for _type in types: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I really don't like moving all of the imports to be string-based. That removes some of the developer-friendly niceness of immediately knowing when you typo'd an import. Django's approach of mostly using actual imports, but supporting string-based imports where necessary, seems like the best of both worlds. We use strings when we need it, and get immediate feedback on import fails for all of the other cases. |
||
try: | ||
path, klass, = _type.rsplit('.', 1) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. A stray There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. :( |
||
except AttributeError: | ||
yield _type | ||
else: | ||
module = importlib.import_module(base_path.format(path)) | ||
yield getattr(module, klass) | ||
|
||
@property | ||
def name_to_type_map(self): | ||
if self._name_to_type_map is None: | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
from pydocx.openxml.markup_compatibility.alternate_content import AlternateContent | ||
from pydocx.openxml.markup_compatibility.fallback import Fallback | ||
|
||
__all__ = [ | ||
'AlternateContent', | ||
'Fallback', | ||
] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
# coding: utf-8 | ||
from __future__ import ( | ||
absolute_import, | ||
print_function, | ||
unicode_literals, | ||
) | ||
|
||
from pydocx.models import XmlModel, XmlCollection | ||
|
||
|
||
class AlternateContent(XmlModel): | ||
XML_TAG = 'AlternateContent' | ||
children = XmlCollection('markup_compatibility.Fallback') |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
# coding: utf-8 | ||
from __future__ import ( | ||
absolute_import, | ||
print_function, | ||
unicode_literals, | ||
) | ||
|
||
from pydocx.models import XmlModel, XmlCollection | ||
|
||
|
||
class Fallback(XmlModel): | ||
XML_TAG = 'Fallback' | ||
children = XmlCollection('wordprocessing.Picture') | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do you know what other child types Fallback supports? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I will be dealing with that in #204 as I mentioned during stand up :) I don't plan to push a new release until both this and 203 are done. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you add this comment? |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -8,18 +8,6 @@ | |
|
||
from pydocx.models import XmlModel, XmlCollection, XmlChild | ||
from pydocx.openxml.wordprocessing.run_properties import RunProperties | ||
from pydocx.openxml.wordprocessing.br import Break | ||
from pydocx.openxml.wordprocessing.drawing import Drawing | ||
from pydocx.openxml.wordprocessing.field_char import FieldChar | ||
from pydocx.openxml.wordprocessing.field_code import FieldCode | ||
from pydocx.openxml.wordprocessing.picture import Picture | ||
from pydocx.openxml.wordprocessing.no_break_hyphen import NoBreakHyphen | ||
from pydocx.openxml.wordprocessing.text import Text | ||
from pydocx.openxml.wordprocessing.tab_char import TabChar | ||
from pydocx.openxml.wordprocessing.deleted_text import DeletedText | ||
from pydocx.openxml.wordprocessing.footnote_reference import FootnoteReference | ||
from pydocx.openxml.wordprocessing.footnote_reference_mark import FootnoteReferenceMark | ||
from pydocx.openxml.wordprocessing.embedded_object import EmbeddedObject | ||
from pydocx.util.memoize import memoized | ||
|
||
|
||
|
@@ -29,18 +17,19 @@ class Run(XmlModel): | |
properties = XmlChild(type=RunProperties) | ||
|
||
children = XmlCollection( | ||
EmbeddedObject, | ||
TabChar, | ||
Break, | ||
NoBreakHyphen, | ||
Text, | ||
Drawing, | ||
Picture, | ||
DeletedText, | ||
FootnoteReference, | ||
FootnoteReferenceMark, | ||
FieldChar, | ||
FieldCode, | ||
'wordprocessing.EmbeddedObject', | ||
'wordprocessing.TabChar', | ||
'wordprocessing.Break', | ||
'wordprocessing.NoBreakHyphen', | ||
'wordprocessing.Text', | ||
'wordprocessing.Drawing', | ||
'wordprocessing.Picture', | ||
'wordprocessing.DeletedText', | ||
'wordprocessing.FootnoteReference', | ||
'wordprocessing.FootnoteReferenceMark', | ||
'wordprocessing.FieldChar', | ||
'wordprocessing.FieldCode', | ||
'markup_compatibility.AlternateContent', | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Are you doing this just to eliminate the import clutter? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yup There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Okay, one concern I have about this is that we're not validating the types until the class is actually used, which means errors could be hidden. Before, that validation would occur at initial run time. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As long as we have a test on each of these nodes, it doesn't matter if it happens at run time vs compile time. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Could you verify that is the case? We'll also need to ensure that any future additions have the requisite test cases as well. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think testcase coverage is sufficient to mitigate that concern. We're not the only users of PyDocx. We also need to think about the developer experience for other folks that will be using and extending the library. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't know how to solve your problem then. If we want to allow strings (which I am +1 on), then we have to rely on tests. If we don't want strings, then we have to update things in odd places. |
||
) | ||
|
||
def get_style_chain_stack(self): | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,23 @@ | ||
# coding: utf-8 | ||
from __future__ import ( | ||
absolute_import, | ||
print_function, | ||
unicode_literals, | ||
) | ||
|
||
from pydocx.models import XmlModel, XmlCollection | ||
|
||
|
||
class TxBxContent(XmlModel): | ||
XML_TAG = 'txbxContent' | ||
children = XmlCollection( | ||
'wordprocessing.Paragraph', | ||
) | ||
|
||
|
||
class Textbox(XmlModel): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This guy should really live in |
||
XML_TAG = 'textbox' | ||
|
||
children = XmlCollection( | ||
TxBxContent, | ||
) |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -24,34 +24,35 @@ def convert(path, *args, **kwargs): | |
|
||
class ConvertDocxToHtmlTestCase(DocXFixtureTestCaseFactory): | ||
cases = ( | ||
'read_same_image_multiple_times', | ||
'all_configured_styles', | ||
'export_from_googledocs', | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Most of this is alphabetizing. |
||
'external_image', | ||
'has_missing_image', | ||
'has_missing_image', | ||
'has_title', | ||
'inline_tags', | ||
'has_missing_image', | ||
'justification', | ||
'list_in_table', | ||
'external_image', | ||
'export_from_googledocs', | ||
'has_missing_image', | ||
'lists_with_styles', | ||
'missing_numbering', | ||
'missing_style', | ||
'nested_lists', | ||
'nested_table_rowspan', | ||
'nested_tables', | ||
'no_break_hyphen', | ||
'read_same_image_multiple_times', | ||
'rotate_image', | ||
'shift_enter', | ||
'simple', | ||
'simple_lists', | ||
'simple_table', | ||
'special_chars', | ||
'styled_bolding', | ||
'table_col_row_span', | ||
'table_with_multi_rowspan', | ||
'tables_in_lists', | ||
'textbox', | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This one is new. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. IMO for ourselves, we should avoid adding test cases via this method and prefer adding explicit XML test cases. Being able to test against a test file is good for new developers, since it eliminates some of the overhead in getting an issue fixed. It's been by goal to gradually reduce this list down by converting them into explicit XML test cases. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would disagree. Way back when, I wanted to have an actual docx file that was being tested for each case we were working on. It's very easy to build an XML snippet that can't actually exist in a docx file. These are smoke tests. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's not scalable to use full documents for each specific case. You will end up with thousands of poorly named, poorly organized files. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not trying to say each docx file has one and only one thing it's testing. I want to make sure that newly considered nodes exist in a docx file that we are parsing in a test. This ensures that, at least when the new node is first added, that we have a real life example that we know now works. I feel pretty strongly about having each of the nodes being processed in our tests from docx files. Mostly because back when I would fake it (in XML), I created more than a few instances of XML that could not exist in an actual docx. Meaning I was writing fixes for things that could not happen. Meaning I quite likely didn't solve the problem. |
||
'track_changes_on', | ||
'table_with_multi_rowspan', | ||
'rotate_image' | ||
) | ||
|
||
@raises(MalformedDocxException) | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
<p> | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Fun fact, if you look at the XML for this, it is actually nested paragraphs. Not really sure how to stop it from doing the outside one. Or even if that will be the correct solution in the general case. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is probably going to be bad. I think we need to figure out a solution for this. I suspect we will also want additional test cases that consist of having text within the parent level paragraph as well. For example:
IMO this needs to be de-normalized into three separate paragraphs for HTML purposes. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'll see what I can do, but I don't know if I know how to do something like this anymore. |
||
<p>AAA</p> | ||
<p>BBB</p> | ||
<p>CCCDDD</p> | ||
</p> |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -12,6 +12,7 @@ commands = | |
deps = | ||
-rrequirements/testing.txt | ||
defusedxml: defusedxml==0.4.1 | ||
py26: importlib | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. :( Might be time to drop support for python 2.6 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. +1 on dropping python 2.6 support if it makes anything easier. We could do that in this PR or in a separate PR and then bump the major version for the next release. |
||
|
||
[testenv:docs] | ||
commands = | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
alphabetize?