-
Notifications
You must be signed in to change notification settings - Fork 919
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change mixed type as string to have higher priority over JSON schema #16731
Change mixed type as string to have higher priority over JSON schema #16731
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please update docs to explicitly clarify this behavior.
I need to spend some more time looking at this as it has not fixed the test cases that I have. I might have simplified the issue too much for the test case that I included when I filed the issue, or I might have lumped too many things under the same issue. I am also a little concerned about this code. I don't think I want mixed type as string to have higher priority compared to the schema that I passed in, but I need to do some more testing to really understand what is happening here and not just my speculation from the description. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sorry this just does not work for the issues I am having right now.
std::map<std::string, cudf::io::schema_element> data_types; | ||
data_types.insert( | ||
std::pair{"data", cudf::io::schema_element{cudf::data_type{cudf::type_id::LIST}}}); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am a little confused by this test. The data type of the schema element is just LIST? but the test I posed in the bug #15260 (comment) has it as a LIST of STRING. I'm not really sure if that makes much of a difference, but it makes a big difference if I want a LIST<STRUCT<a:STRING,b:STRING>> compared to just a LIST
EXPECT_EQ(result.tbl->num_rows(), 2); | ||
EXPECT_EQ(result.tbl->get_column(0).type().id(), cudf::type_id::STRING); | ||
// expected output without whitespace | ||
cudf::test::strings_column_wrapper expected({R"({"A": 0, "B": 1})", "[1,0]"}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not the result I want to come out of this. I want an LIST of strings to come out. Not a column of strings. I filed the issue because my users asked us (Spark) to return a LIST for the data column, but because the data types don't match that I don't want to get back a STRING column. I want to get back a column that has a LIST of STRINGS in it. So the first entry would be null because {"A":0, "B": 1} is not something that can be coerced into a LIST. The second entry would be a list with two STRING value in it "1" and "0" because each of them can be coerced into a string.
Issue is fixed with #16545 |
Description
Fixes #15260
Prefer mixed type as string than input schema in json reader, to fix the issue.
Checklist