-
Notifications
You must be signed in to change notification settings - Fork 921
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change mixed type as string to have higher priority over JSON schema #16731
Changes from all commits
1802a8b
55c84d3
cdc417e
14298a4
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2776,4 +2776,29 @@ TEST_F(JsonReaderTest, JSONMixedTypeChildren) | |
} | ||
} | ||
|
||
TEST_F(JsonReaderTest, MixedTypesWithSchema) | ||
{ | ||
std::string data = "{\"data\": {\"A\": 0, \"B\": 1}}\n{\"data\": [1,0]}\n"; | ||
|
||
std::map<std::string, cudf::io::schema_element> data_types; | ||
data_types.insert( | ||
std::pair{"data", cudf::io::schema_element{cudf::data_type{cudf::type_id::LIST}}}); | ||
|
||
cudf::io::json_reader_options in_options = | ||
cudf::io::json_reader_options::builder(cudf::io::source_info{data.data(), data.size()}) | ||
.dtypes(data_types) | ||
.recovery_mode(cudf::io::json_recovery_mode_t::RECOVER_WITH_NULL) | ||
.mixed_types_as_string(true) | ||
.keep_quotes(true) | ||
.lines(true); | ||
cudf::io::table_with_metadata result = cudf::io::read_json(in_options); | ||
|
||
EXPECT_EQ(result.tbl->num_columns(), 1); | ||
EXPECT_EQ(result.tbl->num_rows(), 2); | ||
EXPECT_EQ(result.tbl->get_column(0).type().id(), cudf::type_id::STRING); | ||
// expected output without whitespace | ||
cudf::test::strings_column_wrapper expected({R"({"A": 0, "B": 1})", "[1,0]"}); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is not the result I want to come out of this. I want an LIST of strings to come out. Not a column of strings. I filed the issue because my users asked us (Spark) to return a LIST for the data column, but because the data types don't match that I don't want to get back a STRING column. I want to get back a column that has a LIST of STRINGS in it. So the first entry would be null because {"A":0, "B": 1} is not something that can be coerced into a LIST. The second entry would be a list with two STRING value in it "1" and "0" because each of them can be coerced into a string. |
||
CUDF_TEST_EXPECT_COLUMNS_EQUAL(expected, result.tbl->get_column(0)); | ||
} | ||
|
||
CUDF_TEST_PROGRAM_MAIN() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am a little confused by this test. The data type of the schema element is just LIST? but the test I posed in the bug #15260 (comment) has it as a LIST of STRING. I'm not really sure if that makes much of a difference, but it makes a big difference if I want a LIST<STRUCT<a:STRING,b:STRING>> compared to just a LIST