Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discussion: Metadata; internal to external data type conversion #360

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

Allam76
Copy link

@Allam76 Allam76 commented Aug 25, 2022

Following out discussion in #359 and #358 here comes a possible implementation.
Note this is not intended to be a ready fix but rather an idea on what I have in mind.

I've added 2 new fields to the metadata.Column. Internal and external data type.
External is the one that is well, external and should be uniform between databases. The internal comes from the fact that most databases allow to write the same data type in many different format. VCHAR, VARCHAR, VARIABLE CHARACTER, etc all are the same type. This is the underlying type to reduce the number entries in the mapping.

It is implemented for only two databases (postgres and sqlite) that covers a bit of the spectrum of available databases.

I'm also agreeing that the metadata maybe should be extracted into a more general purpose package.

@nineinchnick
Copy link
Member

What is the purpose of having unified types? Would it be used only for display in the describe commands (\d) or for formatting values when presenting query results? This is kind of a slippery slope, because you'd have to be careful when mapping types to make sure they match, that is any conversion is not lossy. This is true even when only displaying them to users.

If there are equivalent types in a database, like PostgreSQL, maybe consider normalizing them inside a single metadata reader?

@Allam76
Copy link
Author

Allam76 commented Aug 25, 2022

I have two use cases:

  1. Federated SQL engine. This is an engine without own storage. So it uses other (several) databases underneath. Like one database to rule them all. Presto/Trino is an example in the enterprise sphere. This engine expects a metadata format and writing one mapping per individual database is tedious.

  2. Migration. I have an ancient database with 14 K tables to migrate with metadata in yet another database so total databases to handle is three. Then having unified metadata is a great plus.

I'll also have a look at your xo for the migration. This could potentially be helpful.

@nineinchnick
Copy link
Member

I'm a Trino contributor and I work for Starburst Data. I wrote a few Trino connectors so I have hands on experience with data type mapping.

You have not answered my questions why unifying types helps in those use cases. Right now it looks like you only need to read types. My concern is that you can't simply define type aliases, without also defining how to convert data to avoid loosing precision.

@Allam76
Copy link
Author

Allam76 commented Aug 26, 2022

Aha ok, I understand your concern now. I was simple assuming from my exploration that the rest of the data needed to avoid loosing precision is available as is. That is, only external data type was needed to be added.

If not, then indeed I have a bigger problem.

@nineinchnick
Copy link
Member

Maybe we could report the Go's scan type, but I have a very subjective feeling most drivers don't implement this at all. Since there are very few Go types, I'm not sure if that'll be useful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants