Sort only Edge fid so edge_id becomes more stable #1363

Merged
merged 3 commits into from
Apr 9, 2024


@visr visr commented Apr 8, 2024

Previously, we sorted the Edge table by these keys:

        sort_keys = [
            "from_node_type",
            "from_node_id",
            "to_node_type",
            "to_node_id",
        ]

This made the table look a bit neater, but it served no other purpose.

The fid index was mostly an implementation detail that the user did not specify; it ran from 1 to n, following the sort order above. However, fid becomes edge_id in flow.arrow. As a result, when a user added a new edge, usually about half of all edge_ids changed, making post-processing unnecessarily difficult. This PR therefore removes that sorting so that the input order is retained and existing edge_ids stay stable.
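
To illustrate the effect, here is a minimal pandas sketch. It is not the Ribasim implementation: the helper names (`assign_fid_sorted`, `assign_fid_input_order`, `changed_ids`) and the example node types and ids are made up for this comment. It numbers rows 1 to n either after sorting on the keys above or in input order, appends one new edge, and counts how many existing edges get a different fid.

```python
import pandas as pd

SORT_KEYS = ["from_node_type", "from_node_id", "to_node_type", "to_node_id"]


def assign_fid_sorted(edges: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical old behaviour: sort on SORT_KEYS, then number rows 1..n."""
    out = edges.sort_values(SORT_KEYS, kind="stable").reset_index(drop=True)
    out.index = pd.RangeIndex(1, len(out) + 1, name="fid")
    return out


def assign_fid_input_order(edges: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical new behaviour: keep input order, then number rows 1..n."""
    out = edges.reset_index(drop=True)
    out.index = pd.RangeIndex(1, len(out) + 1, name="fid")
    return out


def changed_ids(before: pd.DataFrame, after: pd.DataFrame) -> int:
    """Count edges present in `before` whose fid is different in `after`."""
    old = before.reset_index().set_index(SORT_KEYS)["fid"]
    new = after.reset_index().set_index(SORT_KEYS)["fid"]
    return int((old != new.reindex(old.index)).sum())


# Made-up example network with four edges.
edges = pd.DataFrame(
    [
        ("Basin", 1, "Pump", 2),
        ("Pump", 2, "Basin", 3),
        ("Basin", 3, "Outlet", 4),
        ("Outlet", 4, "Terminal", 5),
    ],
    columns=SORT_KEYS,
)

# The user appends one new edge at the end of the table.
new_edge = pd.DataFrame([("Basin", 1, "LinearResistance", 6)], columns=SORT_KEYS)
edges_plus = pd.concat([edges, new_edge], ignore_index=True)

changed_when_sorted = changed_ids(
    assign_fid_sorted(edges), assign_fid_sorted(edges_plus)
)
changed_when_input_order = changed_ids(
    assign_fid_input_order(edges), assign_fid_input_order(edges_plus)
)

print("fids changed with sorting:    ", changed_when_sorted)
print("fids changed with input order:", changed_when_input_order)
```

In this small example the new edge sorts to the front, so every existing edge is renumbered under the sorted scheme, while none change when input order is retained. How many ids shift in a real model depends on where the new edge lands in the sort order.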

With this, I think we can close #1310. It is useful to have a single identifier value for an Edge, even though it is somewhat superfluous. With this PR it becomes stable, unless users start modifying fids themselves.

@visr visr changed the title from "Stop sorting Edge fid so edge_id becomes more stable" to "Sort only Edge fid so edge_id becomes more stable" Apr 9, 2024
@visr visr marked this pull request as ready for review April 9, 2024 14:41
@visr visr merged commit 35dfc2c into main Apr 9, 2024
24 checks passed
@visr visr deleted the flow branch April 9, 2024 14:54
Development

Successfully merging this pull request may close these issues.

Remove column edge_id from flow.arrow schema