If you are already running SDV, please indicate the following details about the environment in
which you are running it:
SDV version: 1.17.3
Python version: 3.9
Operating System: macOS
Problem description
I am reading a Parquet file into a pandas DataFrame and using Metadata.detect_from_dataframe to detect the metadata. The DataFrame has multiple fields that each contain an array of sub-elements, so it is more of a denormalized dataset. Will SDV work with this kind of structure, or does my dataset have to be completely flattened (normalized)?
To explain better, here is an example of the schema:
Hi @jaysara, nice to meet you. It would be very helpful if you are able to share what a few rows of your data look like, just as an example (you can redact any private info, but it would be helpful to see the format).
In the absence of any examples, I am assuming here that each of the Fields you specified represents a different column of your data? If so, then your understanding is correct: SDV will not accept columns whose values contain arrays, dictionaries, etc. The data should be in a flat structure so that each column contains a simple value such as a string, a number, or a datetime.
Are you able to modify your data to be in such a format? Perhaps you can expand out Fields 1-6 so that they are each separate columns?
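As a rough illustration of what expanding the nested field could look like, here is a minimal pandas sketch. It assumes each Field3 value is a non-empty list of dictionaries with the same keys; the sample rows, the helper name `flatten_array_column`, and the `item_` prefix are all made up for this example and are not part of SDV.

```python
import pandas as pd

# Hypothetical rows mirroring the schema in the issue; every value is invented.
df = pd.DataFrame({
    "Field1": ["id-1", "id-2"],
    "Field2": ["A", "B"],
    "Field3": [
        [{"Field3": "x", "Field4": 1, "Field5": 1.5},
         {"Field3": "y", "Field4": 2, "Field5": 2.5}],
        [{"Field3": "z", "Field4": 3, "Field5": 3.5}],
    ],
    "Field6": ["P", "Q"],
})


def flatten_array_column(frame, column, prefix):
    """Expand a column holding lists of dicts into one row per element."""
    # One row per array element; the parent columns are repeated on each row.
    exploded = frame.explode(column, ignore_index=True)
    # Turn each dict into its own set of simple-valued columns.
    children = pd.json_normalize(exploded[column].tolist()).add_prefix(prefix)
    return pd.concat([exploded.drop(columns=[column]), children], axis=1)


flat = flatten_array_column(df, "Field3", prefix="item_")
print(flat)
```

The resulting `flat` DataFrame has only simple values in each column, which is the shape described above, so it could then be passed to Metadata.detect_from_dataframe. Rows with empty or missing arrays would need separate handling before the `json_normalize` step, since this sketch does not cover them.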
Field1: String (id)
Field2: String (Category)
Field3: Array[] of [{Field3, Field4, Field5}, {Field3, Field4, Field5}, {Field3, Field4, Field5}]
Field6: String (Category)
What I already tried
I tried using the Metadata API on this complex dataset and got the following error: