PFB schema is inconsistent for non-schema AnVIL tables #6678

Open
nadove-ucsc opened this issue Nov 1, 2024 · 2 comments
Labels
- ([priority] Medium), bug ([type] A defect preventing use of the system as specified), manifests ([subject] Generation and contents of manifests), orange ([process] Done by the Azul team)

Comments

@nadove-ucsc
Contributor

nadove-ucsc commented Nov 1, 2024

We derive the PFB schema from the AnVIL schema where possible, but for non-schema tables, we fall back to our old approach of building the schema dynamically from the observed shape of the replicas' contents. These dynamic schemas may differ from one manifest to the next if the replicas in one manifest exhibit shapes not observed in the other.

Since the replica shapes are constrained by the BigQuery table schema, I expect that the only place this will be observable is with nullable columns. A BigQuery column with type NULLABLE STRING may appear in the PFB schema with the type null, [null, string], or string, depending respectively on whether all, some, or none of the values for that column are NULL within a given PFB manifest.
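
To illustrate the nullable-column case, here is a minimal sketch of shape-based type inference. It is not Azul's actual code: `infer_avro_type` is a hypothetical name, and the sketch assumes a single string column whose NULLs arrive as Python `None`.

```python
from typing import Optional, Union


def infer_avro_type(values: list[Optional[str]]) -> Union[str, list[str]]:
    """
    Hypothetical helper: infer an Avro type for one NULLABLE STRING column
    purely from the values observed in a single manifest.
    """
    has_null = any(v is None for v in values)
    has_str = any(v is not None for v in values)
    if has_null and has_str:
        # Some values are NULL: a union of null and string
        return ['null', 'string']
    elif has_str:
        # No value is NULL: the column looks non-nullable
        return 'string'
    elif has_null:
        # Every value is NULL: the string type is never observed
        return 'null'
    else:
        raise ValueError('Cannot infer a type from an empty column')


# The same BigQuery column, observed through three different manifests
manifest_a = [None, None]
manifest_b = ['x', None]
manifest_c = ['x', 'y']

for column in (manifest_a, manifest_b, manifest_c):
    print(infer_avro_type(column))
```

The three manifests draw from the same column but yield three different types, which is the inconsistency described above. A type derived from the BigQuery column mode rather than from observed values would not depend on which rows a manifest happens to contain.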

@nadove-ucsc nadove-ucsc added the orange [process] Done by the Azul team label Nov 1, 2024
@nadove-ucsc
Contributor Author

nadove-ucsc commented Nov 4, 2024

[edit, @hannes-ucsc, moved to description]

@nadove-ucsc
Contributor Author

Related, not a dupe: #6270

@nadove-ucsc nadove-ucsc removed their assignment Nov 4, 2024
@hannes-ucsc hannes-ucsc added bug [type] A defect preventing use of the system as specified manifests [subject] Generation and contents of manifests - [priority] Medium labels Nov 5, 2024