tutorial pages for expression handling #49

Open · wants to merge 1 commit into base: gh-pages
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -30,6 +30,7 @@ pages:
- 'Tuning': 'tune.md'
- 'Feature Selection': 'feature_selection.md'
- 'Nested Resampling': 'nested_resampling.md'
- 'Task Dependent Learners': 'task_dependent_learner.md'
- 'Cost-Sensitive Classification': 'cost_sensitive_classif.md'
- 'Imbalanced Classification Problems': 'over_and_undersampling.md'
- 'ROC Analysis': 'roc_analysis.md'
20 changes: 15 additions & 5 deletions src/learner.Rmd
@@ -111,12 +111,21 @@ and defaults of a learning method without explicitly constructing it (by calling
getParamSet("classif.randomForest")
```

As the example above shows, the [ParamSet](&ParamHelpers::makeParamSet) of a
[Learner](&makeLearner) can also include [&Task]-dependent expressions, i.e., parameter values
that differ according to the task at hand. For instance, the default value of the parameter
`mtry`, i.e., the number of randomly chosen variables per split, increases with the number of
features `p`, whereas the lengths of the parameters `classwt` and `cutoff` are defined by the
number of class labels `k`.
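
To make this concrete: for classification, randomForest's default `mtry` is `floor(sqrt(p))`,
and `cutoff` needs one entry per class. A minimal sketch of how these defaults resolve, assuming
a task with 60 features and 2 classes (like `sonar.task`):

```{r}
## Sketch: resolving the expression-based defaults by hand
p = 60  # number of features, e.g. in sonar.task
k = 2   # number of class labels
floor(sqrt(p))  # default mtry: scales with p
rep(1 / k, k)   # default cutoff: one entry per class
```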

Further information on the pre-defined [&Task] dependent parameters can be found in the
section about [task dependent learners](task_dependent_learner.md).

## Modifying a learner

There are also some functions that enable you to change certain aspects
of a [Learner](&makeLearner) without needing to create a new [Learner](&makeLearner) from
scratch. Here are some examples.

```{r}
## Change the ID
@@ -134,10 +143,11 @@
regr.lrn = removeHyperPars(regr.lrn, c("n.trees", "interaction.depth"))
```
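
A few more modifiers in the same spirit; a short sketch, assuming the usual [%mlr] setters
[&setPredictType] and [&setHyperPars]:

```{r}
## Change the prediction type, then set hyperparameters afterwards
lrn = makeLearner("classif.rpart")
lrn = setPredictType(lrn, "prob")
lrn = setHyperPars(lrn, minsplit = 10L)
```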

## Listing learners
A list of all learners integrated in [%mlr] and their respective properties is shown in the
[Appendix](integrated_learners.md).

If you would like a list of available learners, maybe only those with certain properties or
suitable for a certain learning [&Task], use the function [&listLearners].

```{r}
## List everything in mlr
listLearners()
```
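
For instance, a sketch of a filtered query, assuming the `properties` argument of
[&listLearners] (which restricts the result to learners offering the given capabilities):

```{r}
## List all classification learners that can deal with missing values
listLearners("classif", properties = "missings")
```
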
76 changes: 76 additions & 0 deletions src/task_dependent_learner.Rmd
@@ -0,0 +1,76 @@
# Task Dependent Learners

As shown in the [learner](learner.md) section, learners may contain expressions, which are
evaluated right before the learner is used. By default, [%mlr] comes with a built-in dictionary
of task dependent *keys* for these expressions (including the `task` itself). For convenience,
some of them can be accessed directly:

* `task`: the [&Task] itself (allowing to access any of its elements)
* `p`: number of features in the [&Task]
* `n`: number of observations in the [&Task]
* `type`: type of the [&Task] (`"classif"`, `"regr"`, `"surv"`, `"cluster"`, `"costsens"`,
  `"multilabel"`)
* `k`: number of classes of the target variable (only available for classification tasks)
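
Since the `task` key exposes the whole [&Task], an expression can also call [%mlr]'s getter
functions on it. A minimal sketch (here [&getTaskSize] returns the number of observations, so
this is equivalent to using `n` directly):

```{r}
## Sketch: querying the task object itself inside an expression
lrn.tsk = makeLearner("classif.rpart",
  minbucket = expression(max(5L, round(0.01 * getTaskSize(task)))))
```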

This way, one could for instance create a classification tree whose minimum number of
observations per node (`minsplit`) equals approximately 10% of the number of available
observations (`n`).

```{r}
lrn = makeLearner("classif.rpart", minsplit = expression(round(0.1 * n)))
```

```{r, echo=FALSE}
lrn = removeHyperPars(lrn, "xval")
```

Let's have a look at the following example, which uses the `iris` and `Sonar` data sets to
create two classification tasks. These two tasks are then used to evaluate the task dependent
expression within `lrn` (using [&evaluateLearner]).

```{r}
lrn.iris = evaluateLearner(lrn, iris.task)
getHyperPars(lrn.iris)

lrn.sonar = evaluateLearner(lrn, sonar.task)
getHyperPars(lrn.sonar)
```
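
As a quick sanity check: `iris` has 150 observations and `Sonar` has 208, so `round(0.1 * n)`
should yield 15 and 21, respectively.

```{r}
## minsplit resolves per task via round(0.1 * n)
round(0.1 * 150)  # iris
round(0.1 * 208)  # Sonar
```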

Similarly, a model that is based on a learner with task dependent expressions can be trained
as follows:

```{r}
lrn = makeLearner("classif.ksvm", sigma = expression(5 / n))
mod = train(task = iris.task, learner = lrn)
print(mod)
```

Note that in the case of task dependent expressions, [%mlr] creates the built-in dictionary on
its own, based on the provided [&Task].
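
Accordingly, the evaluated value can also be inspected up front; a short sketch using
[&evaluateLearner] as above (for `iris.task`, `n = 150`, so `sigma` resolves to `5 / 150`):

```{r}
## Inspect the resolved hyperparameter without training
getHyperPars(evaluateLearner(lrn, iris.task))
```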


## Task dependent tuning
In the following example, we'll combine tuning, feature selection and task dependent learners.
To be more precise, we will run four iterations of a random feature selection using an SVM.
For each of those iterations, we try out two random configurations of the SVM's parameters
`"sigma"` and `"C"`. Note that the lower and upper bounds of the feasible values of `"sigma"`
depend on the number of features `p` and can therefore vary across the feature selection steps.

```{r}
## define the setup of the inner (tuning) loop
ps = makeParamSet(
  makeNumericLearnerParam("sigma", lower = expression(0.2 * p), upper = expression(2.5 * p)),
  makeDiscreteLearnerParam("C", values = 2^c(-1, 1)),
  keys = "p"  ## the task dependent key used in the bounds above
)
inner.rdesc = makeResampleDesc("Subsample")
ctrl.tune = makeTuneControlRandom(maxit = 2L)
wrapped.lrn = makeTuneWrapper("classif.ksvm", par.set = ps, control = ctrl.tune,
  resampling = inner.rdesc)

## define the setup of the outer (feature selection) loop
ctrl.sf = makeFeatSelControlRandom(maxit = 4L)
outer.rdesc = makeResampleDesc("Holdout")
sf = selectFeatures(learner = wrapped.lrn, task = sonar.task, resampling = outer.rdesc,
  control = ctrl.sf)
```
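
The returned object can then be inspected as usual; a short sketch, assuming the standard `$x`
(selected features) and `$y` (outer performance) slots of the [&selectFeatures] result:

```{r}
## Inspect the selected feature subset and the outer performance
sf$x
sf$y
```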