Adds text layer and copy&paste #94
base: master
Wow, @lafickens, this is amazing -- thanks so much for making a pull request, I'd love to get this into pdf-view. ⚡ I was just testing this and noticed a few things:
Thanks again!
@izuzak Oops, I missed the syncTeX functionality when adding the text layer. I will look into that. I think adding a config item for enabling the text layer is a good idea. There is indeed some problem in positioning the text layer on top of the PDF canvas, which is probably the reason for the peculiar behavior when selecting text. There is, however, another possible reason for this problem: the font ascent and descent attributes are not properly set when generating the PDF. Although the PDF.js developers claim to have solved the problem (see mozilla/pdf.js#4665), it still exists for some documents. I have encountered one such document. Download it and try opening it in Firefox (or the PDF.js viewer) and select some text; you will see the text layer is about 20 pixels above the actual text in the canvas... What I've discovered so far is that PDFs converted from LaTeX work pretty well in PDF.js.
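To illustrate why missing ascent/descent values can shift the text layer vertically, here is a minimal sketch. It assumes the baseline position and the rendered font height are already known in canvas pixels, and that ascent and descent are given as fractions of the font height. The function name `estimateSpanTop` and the shape of `style` are hypothetical, chosen for illustration only; this is not the actual PDF.js code.

```javascript
// Hypothetical helper (not part of PDF.js): estimate the top edge of a
// text-layer span from the baseline of the glyphs drawn on the canvas.
//   baselineY  - y coordinate of the baseline, in pixels from the top
//   fontHeight - rendered font height in pixels
//   style      - { ascent, descent } as fractions of the font height;
//                either value may be missing for some documents
function estimateSpanTop(baselineY, fontHeight, style) {
  // Fallback when no metrics exist: treat the whole height as ascent.
  let ascentPx = fontHeight;
  if (style && style.ascent) {
    ascentPx = style.ascent * fontHeight;
  } else if (style && style.descent) {
    // Descent is assumed to be negative, so 1 + descent approximates ascent.
    ascentPx = (1 + style.descent) * fontHeight;
  }
  return baselineY - ascentPx;
}
```

With a 20 px font and no metrics, the span is placed a full 20 px above the baseline; with an ascent of 0.5 it would sit only 10 px above. A gap of this kind between the fallback and the real metrics is one way the layer could end up noticeably above the rendered text.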
Just a heads-up -- I switched the package to use JavaScript in #98, so this branch is no longer mergeable. If you do continue working on this -- we can tackle the conversion to JS last, after we get things working. Thanks again! 🙇
This adds a text layer on top of the original PDF canvas, along with a preliminary solution for copy and paste.
Changes made:
- `@on` method call.
- `textlayerbuilder.js`: This is extracted from Mozilla's pdf.js viewer. I did not wish to do so as this adds a large chunk of code, but it appears that the code cannot be directly required from the package.

About copy & paste: The fragmented nature of the text contents passed by `Page.getTextContent()` makes it hard to find a universal way to copy flawlessly. So far, when copying multiple lines, the native `document.execCommand('copy')` is used. This won't preserve formatting if the copy destination is in Atom, but works when copying to Word or TextEdit.

There is also a `select()` method that will add a line break after the contents that are approximately on the same line. Currently it's commented out since it only makes copied contents look nicer within Atom. While the method would work for strictly formatted PDFs (e.g. TeX converted), it does not function very well for some irregularly formatted documents, or when word separation is purely done by pixel arrangements 😂
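
For reference, the line-break idea can be sketched roughly as follows. This is not the `select()` method from this branch, only a minimal illustration. It assumes each text item has a `str` field and a six-element `transform` array whose last entry is the vertical position, which is how pdf.js text content items are commonly shaped. The name `joinTextItems` and the `tolerance` parameter are hypothetical.

```javascript
// Hypothetical sketch: join text items into lines, starting a new line
// whenever the vertical position differs from the current line by more
// than `tolerance` units.
function joinTextItems(items, tolerance = 2) {
  const lines = [];
  let current = null;
  for (const item of items) {
    const y = item.transform[5]; // vertical position of this fragment
    if (current !== null && Math.abs(y - current.y) <= tolerance) {
      // Approximately on the same line: append to the current line.
      current.parts.push(item.str);
    } else {
      // Different vertical position: start a new line.
      current = { y, parts: [item.str] };
      lines.push(current);
    }
  }
  // Fragments are concatenated as-is; no spaces are inferred.
  return lines.map((line) => line.parts.join('')).join('\n');
}

const sample = [
  { str: 'Hello ', transform: [1, 0, 0, 1, 10, 700] },
  { str: 'world', transform: [1, 0, 0, 1, 50, 700.5] },
  { str: 'Next line', transform: [1, 0, 0, 1, 10, 680] },
];
console.log(joinTextItems(sample));
```

Because fragments are concatenated without inferring spaces, this shares the limitation described above: documents that separate words purely by pixel position would come out with the words run together.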