
Expression handling #1126

Merged 22 commits on Mar 2, 2017

Conversation

@kerschke (Contributor)

This PR finally allows using task-dependent expressions in the learner (see #767).

@larskotthoff (Member)

Thanks Pascal.

  • Please document prominently what parameters are available. This should appear not only in the documentation for evaluateLearner, but also in makeLearner, setHyperPars, and anywhere else where people can set hyperparameters.
  • Why is the task itself in the dictionary?
  • In the unit tests, instead of checking the values for randomForest (which are specified somewhere else), could you please use a hand-constructed expression? Otherwise this test will fail for no good reason if the values in randomForest are changed.

@kerschke (Contributor, Author) commented Aug 12, 2016

> Please document prominently what parameters are available. This should appear not only in the documentation for evaluateLearner, but also in makeLearner, setHyperPars, and anywhere else where people can set hyperparameters.

Ok, I'll do that :)

> Why is the task itself in the dictionary?

That's a decision we made together with @mllg and @berndbischl. The main reason is that the user can now use any expression that pulls information from the task. You probably wouldn't write expression(task) itself, but rather expression(task$...).
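
To make that concrete, here is a small sketch of my own (not code from the PR): an expression can use a predefined shortcut such as n, or reach into the Task object directly. minsplit = expression(n) is taken from this thread; the task$task.desc$size field below is only illustrative.

task = makeClassifTask(data = iris, target = "Species")

# predefined shortcut: n = number of observations in the task
lrn1 = makeLearner("classif.rpart", minsplit = expression(n))

# reaching into the task object itself (the exact field is illustrative)
lrn2 = makeLearner("classif.rpart",
  minbucket = expression(round(task$task.desc$size / 10)))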

> In the unit tests, instead of checking the values for randomForest (which are specified somewhere else), could you please use a hand-constructed expression? Otherwise this test will fail for no good reason if the values in randomForest are changed.

The tests already contain hand-constructed expressions, e.g. the rpart learner with hand-constructed hyperparameters. The reason for having the randomForest in there is actually to check whether it also works with the parameter setup that is defined on our side, within the definition of the learner. And if those values are changed by one of us, then that person should also update the unit tests accordingly. Right?
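
For illustration, a minimal sketch of such a hand-constructed test (my sketch, not the PR's actual test code; getTaskDictionary comes from this PR, while the eval call is my assumption about how the dictionary is consumed):

library(testthat)
task = makeClassifTask(data = iris, target = "Species")
lrn = makeLearner("classif.rpart", minsplit = expression(n))
dict = getTaskDictionary(task = task)
# evaluate the stored expression against the dictionary (assumed usage);
# iris has 150 rows, so n should resolve to 150
expect_equal(eval(getHyperPars(lrn)$minsplit[[1]], envir = dict), 150)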

@larskotthoff (Member) commented Aug 12, 2016 via email

@kerschke (Contributor, Author)

The documentation has been updated. Please review again and then merge ;)

@mllg (Member) commented Aug 15, 2016

Reviewed. Everything okay. @berndbischl -> ok to merge?

@kerschke (Contributor, Author)

@berndbischl: ping.

@berndbischl (Member)

You do know that I am on holiday, right?

@kerschke (Contributor, Author)

Yes, and I see you're working on all other ends of the package during your holiday ;)

@berndbischl (Member)

evaluateParset should be called evaluateParamSet. We have to be consistent with the names.

@kerschke (Contributor, Author)

Then we'd be mixing up the names from ParamHelpers (evaluateParamSet) and mlr (evaluateParSet). But if that's the only thing you don't like, I can fix it tomorrow.

@berndbischl (Member)

> @berndbischl @larskotthoff Last call

I am on holiday, please wait at least until Monday.

@schiffner (Contributor)

> Where can I find your tutorial stuff? Can you link it here?

In case you haven't found it already: mlr-archive/mlr-tutorial#49

@larskotthoff (Member)

Looks good to me.

@mllg (Member) commented Sep 5, 2016

@berndbischl ping ping ping

@kerschke (Contributor, Author)

@berndbischl, what's your status on this one? I mean: ping, ping, ping ;)

Merge conflicts:
  • DESCRIPTION
  • R/Learner_properties.R
  • tests/testthat/test_base_measures.R
@jakob-r (Member) commented Feb 2, 2017

Tests fail because of roxygen2 6.0.0.

@jakob-r (Member) commented Feb 16, 2017

@berndbischl please review

@mllg (Member) commented Feb 22, 2017

@berndbischl ping

@jakob-r (Member) commented Feb 23, 2017

@berndbischl ping

@@ -31,6 +31,10 @@
train = function(learner, task, subset, weights = NULL) {
  learner = checkLearner(learner)
  assertClass(task, classes = "Task")
  if (hasExpression(learner)) {
    dict = getTaskDictionary(task = task)
@mb706 (Contributor)

The dict should somehow contain information about subset. The learner only sees the subsetted task, and probably expects the parameters to behave accordingly.

(Member)

I would assume that the task is already subsetted?

@mb706 (Contributor)

Subsetting happens in trainLearner (e.g.).

(Member)

Of course, I remember... what I don't remember is the motivation behind this, though.
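
For context, an illustrative sketch (mine, not the actual mlr source; evaluateExpressions is a hypothetical helper) of the ordering this review thread is worried about: the dictionary is built from the full task in train, while the data is only subsetted later inside trainLearner, so expression(n) is resolved against the full task size.

# simplified train(), illustrative only
train = function(learner, task, subset, weights = NULL) {
  if (hasExpression(learner)) {
    dict = getTaskDictionary(task = task)         # n = size of the FULL task
    learner = evaluateExpressions(learner, dict)  # hypothetical helper
  }
  # the subset is only applied later, inside trainLearner()
  trainLearner(learner, task, subset, weights)
}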

@jakob-r (Member) commented Mar 2, 2017

All reviews were positive. @mb706's comment can be addressed later in another PR.

@jakob-r jakob-r closed this Mar 2, 2017
@jakob-r jakob-r reopened this Mar 2, 2017
@jakob-r jakob-r merged commit 416305d into master Mar 2, 2017
@berndbischl (Member) commented Mar 3, 2017

@jakob-r @mllg, can we please comment on:

a) Why did you merge something that apparently seems wrong? Or please tell me why @mb706 is not correct. If the subsetting is done later, won't the n symbol be incorrect in the dict? If that's the case, this creates invalid results.

b) This is a base change. Who were the 2 core members who reviewed positively?

@berndbischl (Member)

@jakob-r, I am sorry, I wanted to avoid this, but it takes 3 min (!) to check EXACTLY what @mb706 said in a test:

load_all()
task = makeClassifTask(data = iris, target = "Species")
dict = getTaskDictionary(task = task)
lrn1 = makeLearner("classif.rpart", minsplit = expression(n))
print(getHyperPars(lrn1))

# train on the full task (150 observations)
m = train(lrn1, task)
print(m$learner.model$control$minsplit)  # 150, as expected

# train on a subset of 100 observations
m = train(lrn1, task, subset = 1:100)
print(m$learner.model$control$minsplit)  # is 150! wrong

What do you think happens now to people who use that code and generate results from it?

Will revert now.

@jakob-r (Member) commented Mar 3, 2017

n is documented to be the size of the Task, not of any subset. From R/evaluateParamExpressions.R:

> \item{\code{n}:} the number of observations in the task

In other words, this example just assumes something that is never mentioned in the documentation.

@berndbischl (Member)

> In other words, this example just assumes something that is never mentioned in the documentation.

We can discuss this on hangout, but you cannot be serious.

@berndbischl (Member)

Maybe, as a reviewer, it is also your obligation to think about whether the docs make sense? Whether the semantics are good and proper?

Do you want to argue they are, here? That the example I posted is OK? You would like to have that behavior in your own experiments? That the defaults in the other underlying packages work this way?
