Create annotations from highlights #6

brush701 · 2021-04-07T20:39:38Z

Implements the PDF annotation spec for highlights, including QuadPoints, which many utilities use to extract the underlying text. Tested and working with a range of sample PDFs on Windows 10 Home, OS build 19042.867.

rschroll

Thanks for all of the work on this, and sorry it's taken me a while to get to reviewing the work. In truth, part of the delay was knowing that it would take a while for me to wrap my head around this code.

I've put a bunch of nit-picky comments within the code. These are generally based on my own style opinions, which aren't documented anywhere. And since this code was adapted from RCU, it's generally not consistent anyway. If you don't feel like cleaning these up, I'll take care of them.

But before dealing with those, I see two large problems to address:

First, with this patch I get all highlighted sections drawn as rectangles. While this works well for highlighted text, this doesn't work well for drawings with the highlighter. At the very least, we should provide a setting to toggle between drawn highlights and annotations.

However, with the new split in how highlighting works with the tablet, I wonder if the easier solution is to only turn highlights from the highlights file into annotations, and leave the highlights in the lines files as drawn lines on the PDF. I feel like this would do the right thing in at least 95% of the cases going forward. (It would also let us avoid passing the annotations list into the pen classes, which has struck me as a confusion of responsibilities.)

Second, this doesn't handle some of the weird and crazy things PDF files do with rotations and crop boxes. Rather than try to explain, I'll attach some acid test files I used to figure out the transformations. Each page has a line of text in the middle, of the form (page size) - (crop box) - rotation. The page coordinates are plotted around the edges. On each page, I've drawn a highlight over the line of text, as well as an arrow pointing upwards above this line. Note that these are lines-file-type highlights, so this may cause a testing problem if you follow my suggested approach above.

In brief, the tablet displays the CropBox, not the PageBox, following the page rotation. But it adds a -90 rotation to any page that would end up being landscape, based on its original size and rotation. If the page does not match the aspect ratio of the tablet, the CropBox is placed in the upper-left corner of the device, meaning extra space appears to the bottom or right. But this is based on the device, not the page orientation, so that space may be above or to the left of the CropBox in page coordinates.

(I wish I could say that I worked this out through clever calculation, but in truth I just tried every transformation until I found one that worked. For whatever reason, I've never been properly able to wrap my head around coordinate transformations. So I hope the explanation above makes some kind of sense.)
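To make the rotation rule a little more concrete, here is a rough sketch of one reading of it. The function name displayed_rotation is hypothetical, the use of the crop box dimensions to decide orientation is an assumption, and the sign convention for the extra -90 has not been checked against the attached test files. It does not attempt the upper-left placement of the CropBox.

```python
def displayed_rotation(crop_width: float, crop_height: float, rotation: int) -> int:
    """Guess the total rotation (in degrees, 0-359) the tablet applies.

    Assumption: orientation is judged from the crop box size after the
    page's own rotation has been applied.
    """
    rotation %= 360

    # A 90 or 270 degree page rotation swaps the visible width and height.
    if rotation in (90, 270):
        width, height = crop_height, crop_width
    else:
        width, height = crop_width, crop_height

    # Pages that would end up landscape get an extra -90 rotation.
    if width > height:
        rotation = (rotation - 90) % 360

    return rotation
```

For a portrait page with no rotation this leaves the rotation at 0, while a landscape page with no rotation comes back as 270 (that is, -90).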

Please feel free to discuss this further here. I should be able to be more responsive in the days ahead.

boxes.2.zip
skinny.2.zip
wide.2.zip

rschroll · 2021-05-31T02:41:09Z

rmrl/annotation.py

+        self.x = x
+        self.y = y
+
+    def toList(self) -> list:


Nit: to_list. (The existing code base is not terribly consistent here, since it derived from some Qt code. But I'm mostly following PEP 8 with new code, I think.)

Alternatively, we could implement __iter__ and then just call list(point). And depending what it's used for, we might just iterate through the point directly. But I'm happy with to_list; only go this way if it seems to provide other benefits.
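A minimal sketch of the __iter__ alternative, assuming Point holds only x and y as in the diff above (the constructor signature here is a guess, not copied from the patch):

```python
class Point:
    def __init__(self, x: float, y: float):
        self.x = x
        self.y = y

    def __iter__(self):
        # Yielding the coordinates in order makes list(point),
        # tuple(point) and "x, y = point" all work.
        yield self.x
        yield self.y

    def to_list(self) -> list:
        # Kept as a thin wrapper so existing callers still work.
        return list(self)
```

With this in place, callers can unpack a point directly (x, y = point) instead of going through a list.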

rschroll · 2021-05-31T02:41:50Z

rmrl/annotation.py

+            # the line cannot have positive overlap
+            return False
+
+


Nit: No more than one empty line.

rschroll · 2021-05-31T02:42:16Z

rmrl/annotation.py

+                    max(self.ur.y, rectB.ur.y))
+        return Rect(ll, ur)
+
+    def toList(self) -> list:


Nit: to_list.

rschroll · 2021-05-31T02:43:49Z

rmrl/annotation.py

+
+
+    @staticmethod
+    def fromRect(rect: Rect):


Nit: from_rect.

rschroll · 2021-05-31T02:46:41Z

rmrl/annotation.py

+        self.annotype = annotype
+        self.rect = rect
+        if quadpoints:
+            self.quadpoints = quadpoints


Won't this produce some inconsistency, if quadpoints and rect aren't the same?
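One possible way to avoid the mismatch is to derive the rectangle from the quadpoints whenever they are supplied. The sketch below assumes quadpoints is a flat list of coordinates [x1, y1, x2, y2, ...]; the helper name bounding_rect is hypothetical and is not part of the patch.

```python
def bounding_rect(quadpoints: list) -> list:
    """Return [llx, lly, urx, ury] enclosing every point in quadpoints."""
    xs = quadpoints[::2]   # even indices hold x-coordinates
    ys = quadpoints[1::2]  # odd indices hold y-coordinates
    return [min(xs), min(ys), max(xs), max(ys)]
```

The constructor could then either compute rect this way or check that the rect it was given matches.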

rschroll · 2021-05-31T02:59:14Z

rmrl/render.py

@@ -74,9 +75,11 @@ def render(source, *,
    # key of zero length, so it doesn't break the rest of the
    # process.
    pages = []
+    highlihgts = []


Nit: spelling.

Also, this doesn't appear to be used anywhere?

rschroll · 2021-05-31T03:00:56Z

rmrl/render.py

    for k, layer_a in enumerate(page_annot):
        layerannots = layer_a[1]
        for a in layerannots:
            # PDF origin is in bottom-left, so invert all
            # y-coordinates.
-            author = 'RCU' #self.model.device_info['rcuname']
+            author = 'reMarkable' #self.model.device_info['rcuname']


Nit: Remove old comment. (Admittedly, not your fault.)

rschroll · 2021-05-31T03:10:11Z

rmrl/render.py

+
+def rotate_annot_points(points: list) -> list:
+    rotated = []
+    for n in range(0,len(points),2):


Perhaps for x, y in zip(points[::2], points[1::2])?

Or there are more clever solutions here: https://stackoverflow.com/questions/5389507/iterating-over-every-two-elements-in-a-list. (I'm actually surprised there isn't a solution in itertools.)
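For illustration, here are two ways to walk a flat coordinate list two values at a time. The helper names are hypothetical, and swap_axes is only a stand-in transform to show the loop shape, not what rotate_annot_points actually computes.

```python
def xy_pairs(points: list):
    """Pair up [x0, y0, x1, y1, ...] using slices (makes two copies)."""
    return zip(points[::2], points[1::2])


def xy_pairs_iter(points: list):
    """Same pairing without copying: zip pulls twice from one iterator."""
    it = iter(points)
    return zip(it, it)


def swap_axes(points: list) -> list:
    """Stand-in transform: swap x and y in each pair."""
    out = []
    for x, y in xy_pairs(points):
        out.extend([y, x])
    return out
```

Both versions silently drop a trailing value if the list has odd length, which may or may not be what is wanted here.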

rschroll · 2021-05-31T03:11:10Z

rmrl/render.py

+def scale_annot_points(points: list, scale:float, adjust: list) -> list:
+    scaled = []
+    for i, p in enumerate(points):
+        scaled.append(p*scale + adjust[i%2])


Nit: use a list comprehension.
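Something along these lines should behave the same as the loop in the diff, assuming adjust is a two-element [x_offset, y_offset] list:

```python
def scale_annot_points(points: list, scale: float, adjust: list) -> list:
    # Even indices are x-coordinates and odd indices are y-coordinates,
    # so i % 2 selects the matching offset from adjust.
    return [p * scale + adjust[i % 2] for i, p in enumerate(points)]
```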

rschroll · 2021-05-31T03:11:26Z

rmrl/render.py

+    for i, p in enumerate(points):
+        scaled.append(p*scale + adjust[i%2])
+
+    return scaled


Nit: Trailing newline.

Get annotations working

fd408a8

brush701 force-pushed the master branch from ab9b844 to e32bb54 Compare April 7, 2021 20:46

Fixed y-offset error for wider aspect ratios

7af721f

brush701 force-pushed the master branch from e32bb54 to 7af721f Compare April 7, 2021 20:48

brush701 added 2 commits April 21, 2021 12:41

Update to support 2.7 firmware

21a91a0

re-implement support for legacy highlighter

25b9d74

rschroll requested changes May 31, 2021
