Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: Log more expressive errors for missing child links #381

Merged
merged 4 commits into from
Feb 7, 2024

Conversation

eric-nguyen-cs
Copy link
Contributor

@eric-nguyen-cs eric-nguyen-cs commented Feb 1, 2024

What

  • Taxonomy parser now checks for valid child links and logs out the missing child links in the parsing errors

Screenshot

Screenshot 2024-02-01 at 17 45 58

Fixes bug(s)

@eric-nguyen-cs eric-nguyen-cs mentioned this pull request Feb 1, 2024
Copy link
Member

@alexgarel alexgarel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great ! Just a suggestion of an improvement that we can do.

Comment on lines 348 to 358
for lc, lc_child_links in lc_child_links_map.items():
# we collect all the tags_ids in a certain language
tags_ids = set()
for node in entry_nodes:
if "tags_ids_" + lc in node.tags:
tags_ids.update(node.tags["tags_ids_" + lc])

# we check if the parent_id exists in the tags_ids
for child_link in lc_child_links:
if child_link["parent_id"] not in tags_ids:
missing_child_links.append((lc, child_link))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good implementation, but it seems of a pity that we do not make the most of this computation.

If, instead of a set, tags_ids is a dict mapping tags_ids to their canonical id, it still can be used to check for existence, but we can also retrieve the real parent at the same cost.

This will simplify (and improve performance on) parent links creation in the graph.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes good idea!

Copy link
Member

@alexgarel alexgarel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect !

@alexgarel alexgarel merged commit 25f73dc into main Feb 7, 2024
8 checks passed
@alexgarel alexgarel deleted the ericn/log-more-expressive-errors-for-child-links branch February 7, 2024 09:57
@eric-nguyen-cs eric-nguyen-cs changed the title chore: Log more expressive errors for missing child links label:test Log more expressive errors for missing child links Mar 9, 2024
@eric-nguyen-cs eric-nguyen-cs changed the title label:test Log more expressive errors for missing child links chore: Log more expressive errors for missing child links Mar 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

Fix parsing errors
2 participants