Convert ticket v3 HTML to JSON tickets #22

Kuba314 · 2024-09-22T22:36:51Z

Port diogotcorreia/lidl-to-grocy's fix for the ticket v2 API change. The fix tries the v2 API first and, if that fails, falls back to the v3 API, which returns the ticket formatted as HTML. The JSON data is then reconstructed from that HTML.

Credit to https://github.com/diogotcorreia/lidl-to-grocy/blob/master/lidl/src/html_receipt.rs.

Closes #20

This is a draft, because this implementation has not been tested much. Feel free to test it yourself.
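
For illustration, the fallback described above could look roughly like the sketch below. The three callables (`fetch_v2`, `fetch_v3_html`, `convert_html`) are hypothetical placeholders, not the names used in this PR, and the broad `except` stands in for whatever error the real v2 call raises.

```python
from typing import Any, Callable


def fetch_ticket(
    ticket_id: str,
    fetch_v2: Callable[[str], dict[str, Any]],
    fetch_v3_html: Callable[[str], str],
    convert_html: Callable[[str], dict[str, Any]],
) -> dict[str, Any]:
    """Return a JSON-like ticket, preferring v2 and falling back to v3 HTML.

    All three callables are hypothetical stand-ins for illustration only.
    """
    try:
        # Where v2 still works, it already returns the ticket as JSON.
        return fetch_v2(ticket_id)
    except Exception:
        # Assumption: any failure of v2 means we should try v3. Real code
        # would likely catch a narrower HTTP error type here.
        return convert_html(fetch_v3_html(ticket_id))
```

Passing the fetchers in as arguments keeps the sketch independent of the actual HTTP client and makes the fallback easy to exercise with fakes.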

Kuba314 · 2024-09-22T23:37:04Z

I have noticed that weighted items (at least in cs/cz) do not work currently. @diogotcorreia This should be the case for your implementation at https://github.com/diogotcorreia/lidl-to-grocy/blob/master/lidl/src/html_receipt.rs as well.

I'm seeing the following entries in the HTML for a weighted item (apples):

<span id="purchase_list_line_21">Jablka Gala                        21,11 B</span>
<span id="purchase_list_line_22">N   0,922 kg   x 22,90  Kč/kg </span>
<span id="purchase_list_line_23">PT: 0,002 kg                              </span>

There's no easy way to parse this AFAICS.

One way would be to treat the first line without a class that ends in B as a weighted item, taking its number as originalAmount, and then expect a following line starting with N from which the weight can optionally be parsed. Technically, the originalAmount alone should be sufficient.
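
A minimal sketch of parsing the optional weight line, assuming it always looks like the single sample above. The names `WEIGHT_LINE`, `parse_decimal` and `parse_weight_line` are made up for this example and are not taken from the PR.

```python
import re
from typing import Optional

# Written against one sample only ("N   0,922 kg   x 22,90  Kč/kg");
# other countries may format this line differently.
WEIGHT_LINE = re.compile(r"^N\s+(\d+[.,]\d+)\s*kg\s+x\s+(\d+[.,]\d+)")


def parse_decimal(text: str) -> float:
    """Parse a number that may use a comma as the decimal separator."""
    return float(text.replace(",", "."))


def parse_weight_line(line: str) -> Optional[tuple[float, float]]:
    """Return (weight in kg, price per kg), or None if the line does not match."""
    match = WEIGHT_LINE.match(line.strip())
    if match is None:
        return None
    return parse_decimal(match.group(1)), parse_decimal(match.group(2))
```

For the sample line this gives a weight of 0.922 and a unit price of 22.90, whose product rounds to 21.11, the amount printed on the first line. The `PT:` line does not match and yields `None`.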

Kuba314 · 2024-09-22T23:56:15Z

I implemented the method for the weighted items that I mentioned in my last comment. I believe this PR is ready for a review.

diogotcorreia · 2024-09-23T10:43:30Z

@Kuba314 Does your receipt not have data- attributes?
This is a sample receipt from my account (in SE): https://github.com/diogotcorreia/lidl-to-grocy/blob/a7856fb5d7369f827d627978bac3460ceab9e0fa/lidl/test/receipt.html
I'm using those data attributes to determine whether an item is weighted, depending on whether the amount has a decimal separator or not (https://github.com/diogotcorreia/lidl-to-grocy/blob/a7856fb5d7369f827d627978bac3460ceab9e0fa/lidl/src/html_receipt.rs#L52). I think your strategy with detecting the B will not work for everyone since that's country-dependent (it's the VAT type AFAICT).

May I ask, what is the value of type in the v3 ticket response? I took a look inside the APK and there seem to be three possible values, HTML, HTML_OLD (something like that?) and NATIVE.
My guess is that NATIVE would result in the same JSON as the v2 API (because again, in the APK that is still a field in the response), but I have HTML in my receipts (even in the ones that still work on v2).
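
As a rough Python illustration of the decimal-separator check mentioned in this comment (the linked implementation is in Rust and is not reproduced here): the attribute name `data-art-quantity` is the one discussed in this thread, while `is_weighted` and the sample markup in the usage note are made up for the example.

```python
import xml.etree.ElementTree as ET


def is_weighted(node: ET.Element) -> bool:
    """Guess whether an item is sold by weight from its quantity attribute.

    Assumption: a quantity containing "," or "." is a weight, while a whole
    number is a count of identical items.
    """
    quantity = node.attrib.get("data-art-quantity", "")
    return any(separator in quantity for separator in ",.")
```

With made-up markup, `is_weighted(ET.fromstring('<span data-art-quantity="0,922">Apples</span>'))` is `True`, while a span with `data-art-quantity="2"` or with no such attribute at all is reported as not weighted.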

Kuba314 · 2024-09-23T13:25:31Z

> Does your receipt not have data- attributes?

@diogotcorreia It does, just not for weighted items. These 3 lines are the only information that I have in the receipt. I'm not seeing what you're seeing. I only see data-art-quantity when it's a whole number (not weighted, just N amount of the same product).

> I think your strategy with detecting the B will not work for everyone since that's country-dependent (it's the VAT type AFAICT).

Yeah... this is very possible. Maybe detecting [A-Z] would be better...

> May I ask, what is the value of type in the v3 ticket response?

Do you mean ticketType? That's set to HTML, same as yours. I fear that the CZ API for Lidl is somehow worse than the SE one, or it's an issue with this particular store, or something else I don't know about.

diogotcorreia · 2024-09-23T16:08:55Z

> These 3 lines are the only information that I have in the receipt.

@Kuba314 That's unfortunate. I'm not sure how you would fix it then, since you don't have an article number either :/

Kuba314 · 2024-09-24T10:49:52Z

I have changed the VAT line detection from what is essentially B$ to [A-Z]$. I hope that this works for everyone. I'm not aware of all the possible VAT types and what their values could be, but I assume it's always an uppercase letter.
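
A minimal sketch of what matching such a line could look like, assuming the name, the amount and a single uppercase VAT letter appear in that order, as in the apples sample earlier in this thread. `ITEM_LINE` and `parse_item_line` are illustrative names, not the ones used in this PR.

```python
import re
from typing import Optional

# Name, then an amount with two decimals, then one uppercase VAT letter.
# Assumption: the VAT type is always a single letter in A-Z.
ITEM_LINE = re.compile(
    r"^(?P<name>.+?)\s+(?P<price>-?\d+[.,]\d{2})\s+(?P<vat>[A-Z])$"
)


def parse_item_line(text: str) -> Optional[tuple[str, float, str]]:
    """Return (name, price, VAT letter), or None if the line does not match."""
    match = ITEM_LINE.match(text.strip())
    if match is None:
        return None
    price = float(match["price"].replace(",", "."))
    return match["name"], price, match["vat"]
```

The apples line parses to `("Jablka Gala", 21.11, "B")`, while the `N ... Kč/kg` and `PT:` lines end in a lowercase letter and are rejected.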

vilmosnagy · 2024-10-01T05:58:35Z

FYI: the Hungarian Lidl Plus API broke as well a couple of days ago (some days after 09.21), but this PR solves the issue for me.

Thanks @Kuba314

salvadorbs · 2024-10-08T21:33:07Z

So no barcode, no match with openfoodfacts?

diogotcorreia · 2024-10-08T21:38:39Z

@salvadorbs unfortunately yeah, there's no way to get the barcode now :/

Fanis10V

I was able to get receipts that didn't have discounts. I just started using this so can't tell for sure if everything else is working fine. Will continue testing. So far everything else looks good! Thanks!

Fanis10V · 2024-10-10T20:09:54Z

lidlplus/html_receipt.py

+                }
+            )
+        elif node.attrib["class"] == "discount":
+            discount = abs(parse_float(node.text.split()[-1]))


This throws an IndexError when I run it, because some of the span elements contain only whitespace, so node.text.split() returns an empty list.

Here's the HTML on my receipt:

```
 Coupon Plus reward -0.69
```

Kuba314 · 2024-10-11

Man, they just can't be consistent... Thank you for providing another data point with which we can figure out all the formats they use for this! I'll try to implement the format you provided once I have time to actually do this though... You can always suggest changes and I'll be happy to use them of course.

For now I'm thinking a regex searching for something like -\d+[\.,]\d{2}$ would be best.

Btw shouldn't the code currently fail in parsing the first line's reward word as float instead of the whitespace-split-index-error that you're describing?
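
A minimal sketch of the regex idea from this comment, assuming the discount amount is the last thing in the span's text. `DISCOUNT_AMOUNT` and `parse_discount` are hypothetical names; returning `None` for empty or whitespace-only text is one possible way to avoid the IndexError reported above.

```python
import re
from typing import Optional

# A negative amount with two decimals at the end of the text, e.g. "-0.69".
DISCOUNT_AMOUNT = re.compile(r"-\d+[.,]\d{2}$")


def parse_discount(text: Optional[str]) -> Optional[float]:
    """Return the discount as a positive number, or None if there is none."""
    if not text:
        return None
    match = DISCOUNT_AMOUNT.search(text.strip())
    if match is None:
        # Whitespace-only spans and label-only spans end up here.
        return None
    return abs(float(match.group().replace(",", ".")))
```

Searching for the amount instead of splitting on whitespace means a span holding both the label and the amount, only the amount, or only whitespace can all go through the same function.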

Fanis10V · 2024-10-12

No rush to implement this. I would have done it myself but I was hesitant to suggest changes cause I'm still trying to understand what's happening. I will for sure once I get more familiar with the project. :)

I think because the class of the first line is discount css_bold instead of just discount it's not parsed at all.

So, does the HTML differ from country to country? Or does it depend on the coupon you use and whether it's a percentage/flat discount? That's the only receipt with a coupon I have so that's my only data point :/

Kuba314 · 2024-10-12

> I was hesitant to suggest changes cause I'm still trying to understand what's happening. I will for sure once I get more familiar with the project.

No worries :) This is not even that tied to this specific project, but more to the actual lidl API since AFAIK there's no public documentation for it and people just somehow reverse engineered it.

> I think because the class of the first line is discount css_bold instead of just discount it's not parsed at all.

Right, of course, missed that.

> So, does the HTML differ from country to country? Or does it depend on the coupon you use and whether it's a percentage/flat discount?

There's definitely some difference for some reason. See diogotcorreia/lidl-to-grocy's lidl/test/receipt.html. It uses, I think, the same format as what I saw and implemented in this PR. It's weird that your receipt is different, but we'll probably have to implement common parsing for all possible formats. Currently I'm blocked on #23 though, so I can't verify whether anything changed recently in my receipts, but in my lidl-plus Android app I don't see any discounts in bold, as you probably would.

We could probably do something like this to support both formats:

    if ...:
        ...
    elif {"discount", "css_bold"}.issubset(node.attrib["class"].split()) and try_parse_float(node.text):
        ...
    elif node.attrib["class"] == "discount":
        ...
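
The snippet above relies on a `try_parse_float` helper that does not exist yet. A minimal sketch of how it could look, assuming it should return `None` rather than raise; `has_classes` is likewise a hypothetical helper that wraps the class-set check.

```python
from typing import Optional


def try_parse_float(text: Optional[str]) -> Optional[float]:
    """Hypothetical helper: parse a number, or return None if text is not one."""
    if text is None:
        return None
    try:
        # Accept a comma as the decimal separator as well.
        return float(text.strip().replace(",", "."))
    except ValueError:
        return None


def has_classes(class_attr: str, required: set[str]) -> bool:
    """True if every required class is present, whatever the order."""
    return required.issubset(class_attr.split())
```

One caveat with this design: a parsed value of `0.0` is falsy, so a condition written as `and try_parse_float(node.text)` would skip it. Comparing against `None` explicitly avoids that.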

bchhabra · 2024-10-16T15:09:28Z

> @Kuba314 Does your receipt not have data- attributes? This is a sample receipt from my account (in SE): https://github.com/diogotcorreia/lidl-to-grocy/blob/a7856fb5d7369f827d627978bac3460ceab9e0fa/lidl/test/receipt.html I'm using those data attributes to determine whether an item is weighted, depending on whether the amount has a decimal separator or not (https://github.com/diogotcorreia/lidl-to-grocy/blob/a7856fb5d7369f827d627978bac3460ceab9e0fa/lidl/src/html_receipt.rs#L52). I think your strategy with detecting the B will not work for everyone since that's country-dependent (it's the VAT type AFAICT).
>
> May I ask, what is the value of type in the v3 ticket response? I took a look inside the APK and there seem to be three possible values, HTML, HTML_OLD (something like that?) and NATIVE. My guess is that NATIVE would result in the same JSON as the v2 API (because again, in the APK that is still a field in the response), but I have HTML in my receipts (even in the ones that still work on v2).

Is there any way to instruct the API to return only JSON (NATIVE)?

diogotcorreia · 2024-10-16T15:11:06Z

> Is there any way to instruct the API to return only JSON (NATIVE)?

@bchhabra Not that I could find

Kuba314 force-pushed the ticket-v3-html branch from 8419bd7 to 18af73a on September 22, 2024 23:55

Kuba314 marked this pull request as ready for review September 22, 2024 23:56

Convert ticket v3 HTML to JSON tickets

3f81806

Kuba314 force-pushed the ticket-v3-html branch from 18af73a to 3f81806 on September 24, 2024 10:48

Fanis10V reviewed Oct 10, 2024


Fanis10V Oct 10, 2024

Kuba314 Oct 11, 2024

Fanis10V Oct 12, 2024

Kuba314 Oct 12, 2024
