-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NEW phpoffice/phpword support #61
Conversation
4ce580f
to
36db4ef
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my testing there was one sample docx file I tried which that caused the browser to become basically non-responsive (25 second load time at 97% CPU usage according to firefox performance profiler). That same sample doc did work in CMS 4. I think the difference is that this particular document, for whatever reason, every work was wrapped in its own
<span>
tag which you could see when viewing source. I think perhaps the older version of tinymce would tolerate this while the newer version doesn't.
I take it the file is imported properly, but when TinyMCE tries to display, the browser crashes because there's too many weird tag.
Could we just do something like called strip_tags
on the HTML output? Or do some other process to normalise the HTML?
@@ -25,38 +23,24 @@ | |||
use SilverStripe\Versioned\Versioned; | |||
use SilverStripe\View\Parsers\HTMLValue; | |||
use Tidy; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a hard dependency on Tidy.
ERROR [Emergency]: Uncaught Error: Class "Tidy" not found IN POST /admin/pages/edit/EditForm/2/field/ImportedFromFile/upload Line 232 in /home/max/Projects/cms5x-doc-convert/vendor/silverstripe/documentconverter/src/ImportField.php Source ====== 223: /** @var Importer $importer */ 224: $importer = Injector::inst()->create($importerClass, $fileDescriptor, $chosenFolderID); 225: $content = $importer->import(); 226: 227: if (is_array($content) && isset($content['error'])) { 228: return $content; 229: } 230: 231: // Clean up with tidy (requires tidy module) * 232: $tidy = new Tidy(); 233:
Can we add this to the composer requirements?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
36db4ef
to
ff1f418
Compare
strip_tags() is not not feasible because the whole point of the document converter is that converts a word document to HTML. I'm strongly of the opinion we shouldn't try to normalise the output of the converter, because that's the job of the converter to get it right, we shouldn't be trying to fix its mistakes Here's small snippet of the HTML imported from this particular word doc which was the 1MB download for here:
The bottom part with the I tried uploaded two different sample docx files, and this was the result of one of them. Given the 50% failure rate, and the multitude of different ways that a docx file could be imported wrong due to all of the edges cases, I'm strongly of the opinion that we shouldn't be trying to "bandaid fix" when the phpoffice dependency doesn't do it's job well. We rely on the dependency to do all of the heavy lifting and we shouldn't spend any of our own time trying to "fix" it. As noted above this dependency has over 1K open issues on it. |
A more realistic solution would be to simply put a big red warning on the import screen that says "Importing certain documents may make the CMS become unresponsive, if that happens don't use the importer for this particular batch of word documents." - though I'd really question why we'd want to release software in that sort of state. |
Not doing. |
Issue #58
Notes:
<span>
tag which you could see when viewing source. I think perhaps the older version of tinymce would tolerate this while the newer version doesn't.