-
-
Notifications
You must be signed in to change notification settings - Fork 8.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[R] Remove reshape
argument in predict
#10330
Conversation
Maybe we should consolidate them into a single parameter and pick a default? |
Thing is, they have rather different usages:
But I'm not sure what should be the ideal arguments to offer to control this sort of functionalities. |
Hmm, my personal preference is something that aligns with R and is consistent. Performance is great, but not more important than consistency. I don't know what are the best practices established by other frameworks. I would choose the default prediction to return a consistent shape as stated in https://github.com/dmlc/xgboost/blob/master/doc/prediction.rst , in a format that aligns with R (say, column-major). The type is a bit of a headache, maybe we use a matrix when possible, an array when the dimension is higher than 2? |
The object class for predictions with more than one entry per row should not be an issue, because all 2D arrays in R are both arrays and matrices: is.matrix(array(dim=c(2,3)))
class(matrix(0, nrow=2, ncol=3))
What I'd say is an issue is to return a 2D object as a 1D vector instead of giving it dimensions, which is the current situation in the
For the most part, other frameworks (including base R's |
We can remove it as well. Simplicity seems important.
If we can avoid doing that with this PR, we can always return a matrix with dims instead of a numeric vector. Thank you for sharing that matrix and array are not that different in R. I think that's good news for us, we can use it consistently and avoid other types. |
@trivialfis After a more thorough reading, I'm thinking the following could be a better alternative:
@mayer79 what are your thoughts on this? |
@david-cortes : I would support this, very good idea. With |
61d6e6a
to
ca74400
Compare
Redid the PR so as to let the C++ core library handle the shape, allowing the user to pass As part of the changes, I realized now most of the custom R prediction code is no longer needed so I removed it, alongside with removing the There's one thing that I wasn't very sure about though: there's a function |
reshape=TRUE
in predictionsreshape
argument in predict
Looks great! Not sure about One comment: Why do we use dimnames(arra) <- dim_names ? According to Hadley's book http://adv-r.had.co.nz/Functions.html:
In our case: Instead of if ((predcontrib || predinteraction) && !is.null(colnames(newdata))) {
cnames <- c(colnames(newdata), "(Intercept)")
dim_names <- vector(mode = "list", length = length(dim(arr)))
dim_names[[1L]] <- cnames
if (predinteraction) dim_names[[2L]] <- cnames
.Call(XGSetArrayDimNamesInplace_R, arr, dim_names)
} we could write if ((predcontrib || predinteraction) && !is.null(colnames(newdata))) {
rownames(arr) <- cnames <- c(colnames(newdata), "(Intercept)")
if (predinteraction)
colnames(arr) <- cnames
} |
@mayer79 That tutorial is outdated. |
Here is the new edition: https://adv-r.hadley.nz/functions.html According to that text, "replacement functions actually create a modified copy". Which shows that my comment was wrong! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the work on the prediction function. Could you please help rebase the PR?
Updated. |
ref #9810
By default, the
predict
method will return outputs in a vector having row-major order. This is quite unintuitive in R, since matrices and arrays follow column-major order, thus it's rather unlikely that a user would want to use the prediction output as-is if it comes as row-major.This PR changes the default instead to
reshape=TRUE
, which in the case of multi-column objectives, will return an R matrix with the dimensions that a user would expect.What's more, if a user were looking to avoid computational inefficiencies by avoiding the data transpose from row-major to column-major, xgboost also provides a
strict_shape
parameter which has that effect with a more clearly signaled output.I am nevertheless not sure that the current mix between
reshape
andstrict_shape
as mutually exclusive parameters is ideal - perhaps there could be a parameterstrict_shape
that would determine whether vectors are reshaped to matrices when they are 1D but have no effect for 2D and higher, and a parameterrows_as_first_dim
or similar that would control whether the output shape should have rows as the first or last dimension.