-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create annotations from highlights #6
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all of the work on this, and sorry it's taken me a while to get to reviewing the work. In truth, part of the delay was knowing that it would take a while for me to wrap my head around this code.
I've put a bunch of nit-picky comments within the code. These are generally based on my own style opinions, which aren't documented anywhere. And since this code was adapted from RCU, it's generally not consistent anyway. If you don't feel like cleaning these up, I'll take care of them.
But before dealing with those, I see two large problems to address:
First, with this patch I get all highlighted sections drawn as rectangles. While this works well for highlighted text, this doesn't work well for drawings with the highlighter. At the very least, we should provide a setting to toggle between drawn highlights and annotations.
However, with the new split in how highlighting works with the tablet, I wonder if the easier solution is to only turn highlights from the highlights file into annotations, and leave the highlights in the lines files as drawn lines on the PDF. I feel like this would do the right thing in at least 95% of the cases going forward. (It would also let us avoid passing the annotations list into the pen classes, which has struck me as a confusion of responsibilities.)
Second, this doesn't handle some of the weird and crazy things PDF files do with rotations and crop boxes. Rather than try to explain, I'll attach some acid test files I used to figure out the transformations. Each page has a line of text in the middle of the form (page size) - (crop box) - rotation
. The page coordinates are plotted around the edges. On each page, I've drawn a highlight over the line of text, as well as an arrow pointing upwards above this line. Note that these are lines-file-type highlights, so this may cause a testing problem if you follow my suggest approach above.
In brief, the tablet displays the CropBox, not the PageBox, following the page rotation. But it adds a -90 rotation to any page that would end up being landscape, based on its original size and rotation. If the page does not match the aspect ratio of the tablet, the CropBox is placed in the upper-left corner of the device, meaning extra space appears to the bottom or right. But this is based on the device, not the page orientation, so that space may be above or to the left of the CropBox in page coordinates.
(I wish I could say that I worked this out through clever calculation, but in truth I just tried every transformation until I found one that worked. For whatever reason, I've never been properly able to wrap my head around coordinate transformations. So I hope the explanation above makes some kind of sense.)
Please feel free to discuss this further here. I should be able to be more responsive in the days ahead.
self.x = x | ||
self.y = y | ||
|
||
def toList(self) -> list: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: to_list
. (The existing code base is not terribly consistent here, since it derived from some Qt code. But I'm mostly following pep8 with new code, I think.)
Alternatively, we could implement __iter__
and then just call list(point)
. And depending what it's used for, we might just iterate through the point directly. But I'm happy with to_list
; only go this way if it seems to provide other benefits.
# the line cannot have positive overlap | ||
return False | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: No more than one empty line.
max(self.ur.y, rectB.ur.y)) | ||
return Rect(ll, ur) | ||
|
||
def toList(self) -> list: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: to_list
.
|
||
|
||
@staticmethod | ||
def fromRect(rect: Rect): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: from_rect
.
self.annotype = annotype | ||
self.rect = rect | ||
if quadpoints: | ||
self.quadpoints = quadpoints |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't this produce some inconsistency, if quadpoints and rect aren't the same?
@@ -74,9 +75,11 @@ def render(source, *, | |||
# key of zero length, so it doesn't break the rest of the | |||
# process. | |||
pages = [] | |||
highlihgts = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: spelling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, this doesn't appear to be used anywhere?
for k, layer_a in enumerate(page_annot): | ||
layerannots = layer_a[1] | ||
for a in layerannots: | ||
# PDF origin is in bottom-left, so invert all | ||
# y-coordinates. | ||
author = 'RCU' #self.model.device_info['rcuname'] | ||
author = 'reMarkable' #self.model.device_info['rcuname'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Remove old comment. (Admittedly, not your fault.)
|
||
def rotate_annot_points(points: list) -> list: | ||
rotated = [] | ||
for n in range(0,len(points),2): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps for x, y in zip(points[::2], points[1::2])
?
Or there are more clever solutions here: https://stackoverflow.com/questions/5389507/iterating-over-every-two-elements-in-a-list. (I'm actually surprised there isn't a solution in itertools.)
def scale_annot_points(points: list, scale:float, adjust: list) -> list: | ||
scaled = [] | ||
for i, p in enumerate(points): | ||
scaled.append(p*scale + adjust[i%2]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: use a list comprehension.
for i, p in enumerate(points): | ||
scaled.append(p*scale + adjust[i%2]) | ||
|
||
return scaled |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Trailing newline.
Implements the PDF annotation spec for highlights, including QuadPoints which are used by many utilities to extract the underlying text. Tested and working with a range of sample PDFs on Windows 10 Home, OS build 19042.867