Columns of same type mixed up during grouping, select and as #411
Comments
Fiddling around further with the example, it seems the `select` does not change the column order:
grouped.collect()
// WrappedArray(("snakeoil Inc.", "Foo", 0L), ("ACME", "Bar", 42L))
grouped.select(
  grouped('_2),
  grouped('_1),
  grouped('_3)
).collect()
// WrappedArray(("snakeoil Inc.", "Foo", 0L), ("ACME", "Bar", 42L))
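For comparison, here is what that projection would be expected to return if the reordering took effect. This is only a sketch over a plain Scala collection, not frameless or Spark code; the object name `ExpectedProjection` is made up for illustration, and the rows are copied from the output above.

```scala
object ExpectedProjection {
  // Rows as printed by grouped.collect() in the comment above.
  val rows: Seq[(String, String, Long)] =
    Seq(("snakeoil Inc.", "Foo", 0L), ("ACME", "Bar", 42L))

  // Same column order as the select: _2, _1, _3.
  val reordered: Seq[(String, String, Long)] =
    rows.map { case (first, second, third) => (second, first, third) }

  def main(args: Array[String]): Unit =
    reordered.foreach(println)
}
```

With the reordering applied, the first two fields of every row trade places, which is not what the `select` output above shows.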
@mfelsche did you find out what was causing it by any chance?
Hi. I can reproduce a similar bug with an even simpler case. Do you think this could be related?
I've been seeing this issue as well when using scalapb-sparksql in a flow that uses encoders to create Datasets of both protobuf-derived types and normal Scala case classes. When I define the following case class:
And then cast a DataFrame with identical column names and types and map over it using a function which takes the above type as input
It will fail in that customerClientServiceFunction with the following error:
I now realize that this happens because one of the long columns is being swapped with one of the string columns, and there is an attempt to cast it to a string when it is used in the function. I believe this is related; it just seems that sometimes even columns with different types can be shuffled too, which leads to runtime errors.
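The difference between the two symptoms in this thread (silently wrong values versus a runtime cast error) can be illustrated without Spark. The sketch below is not how frameless or scalapb-sparksql decode rows; it only assumes a decoder that reads values by position, and the names `Customer`, `Row` and `decode` are hypothetical.

```scala
// Hypothetical target type with two String fields and one Long field.
final case class Customer(name: String, city: String, visits: Long)

object PositionalDecoding {
  // Stand-in for a row: values are addressed by position only.
  type Row = Vector[Any]

  // Decode strictly by position, ignoring any column names.
  def decode(row: Row): Customer =
    Customer(
      row(0).asInstanceOf[String],
      row(1).asInstanceOf[String],
      row(2).asInstanceOf[Long]
    )

  def main(args: Array[String]): Unit = {
    val intended: Row      = Vector("ACME", "Berlin", 42L)
    val sameTypeSwap: Row  = Vector("Berlin", "ACME", 42L)
    val mixedTypeSwap: Row = Vector("ACME", 42L, "Berlin")

    println(decode(intended))
    // Two String columns swapped: decoding succeeds with wrong values.
    println(decode(sameTypeSwap))
    // A Long where a String is expected: the cast throws at runtime.
    println(scala.util.Try(decode(mixedTypeSwap)).isFailure)
  }
}
```

A swap between columns of the same type goes unnoticed until someone inspects the data, while a swap across types surfaces as a `ClassCastException` at the point where the value is used.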
I took a TypedDataset of case class A and grouped it, mixing the order of two columns of the same type, which resulted in a tupled dataset. I had to do it this way, don't ask. To get things right again, I selected the columns in the right order and finally used `.as[A]` again to get back a nice TypedDataset of my type A.
Expected behaviour: Everything just as it was, with the right columns ending up in the right place.
Actual behaviour: The mixed-up columns weren't put back in the right order by the select I issued at the end.
I suspect the quirk is somewhere within `.as[A]`, but I cannot pinpoint it, to be honest.
Here is a small reproducer:
Output (compare the case classes in `data` above):