
PEM Pipeline #417 (Open)

wants to merge 24 commits into main
Conversation

@studener (Collaborator) commented Sep 24, 2024

No description provided.

LinkingTo:
    Rcpp
Remotes:
    xoopR/distr6,
    xoopR/param6,
-   xoopR/set6
+   xoopR/set6,
+   mlr-org/mlr3,
Collaborator:

Remember to remove Remotes; the new mlr3learners version will soon be on CRAN as well (mlr3extralearners is not on CRAN, so it's always the latest version from GitHub).

#' \deqn{S(t | \mathbf{x}) = \exp \left( - \int_{0}^{t} \lambda(s | \mathbf{x}) \, ds \right) = \exp \left( - \sum_{j = 1}^{J} \lambda(j | \mathbf{x}) d_j\, \right),}
#' where \eqn{d_j} specifies the duration of interval \eqn{j},
#'
#' we compute the survival probability from the predicted hazards.
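For illustration, a minimal sketch (not part of the PR) of this computation with piece-wise constant hazards, using hypothetical hazard and duration vectors:

# hazard[j] = lambda(j | x) predicted for interval j, duration[j] = d_j
hazard = c(0.10, 0.25, 0.40)
duration = c(2, 2, 1)
surv = exp(-cumsum(hazard * duration)) # S(t) at each interval end point
surv # approx. 0.819, 0.497, 0.333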
Collaborator:

Excellent! I would suggest: 1) remove the time-dependency as we don't support it, i.e. x instead of x(t); 2) describe the g function a bit; 3) add a reference via the bibtex file => the Andreas 2018 paper (A generalized additive model approach to time-to-event analysis).
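For that reference, a possible entry, assuming the repo follows the usual mlr3 R/bibentries.R pattern (citation details from the published paper):

bibentries = c(
  bender_2018 = bibentry("Article",
    title   = "A generalized additive model approach to time-to-event analysis",
    author  = "Andreas Bender and Andreas Groll and Fabian Scheipl",
    journal = "Statistical Modelling",
    year    = "2018",
    volume  = "18",
    number  = "3-4",
    pages   = "299--321"
  )
)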


unique_end_times = sort(unique(data$tend))
# coerce to distribution and crank
pred_list = .surv_return(times = unique_end_times, surv = surv)
Collaborator:

Task: I think this is the part that sometimes results in survival probabilities that are not decreasing, right?
Example:

task = tsk("lung")
l = po("encode") %>>% lrn("regr.xgboost") |> as_learner()
pem = ppl("survtoregr_PEM", learner = l)
pem$train(task)$predict(task)

Can we please identify why that is happening? Is it some sort of arithmetic instability, or are some of the calculations above with the offset wrong?
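A quick way to check this (sketch; assumes the distr slot holds the matrix of survival probabilities with one row per observation, as referenced later in this review):

# given a PredictionSurv `p` from this pipeline:
surv_mat = p$data$distr # rows = observations, columns = interval end points
all(apply(surv_mat, 1, function(s) all(diff(s) <= 0))) # TRUE if every curve is non-increasing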

#' time into multiple time intervals for each observation. The survival data set
#' stored in [TaskSurv] is transformed into Piece-wise Exponential Data (PED) format
#' which in turn forms the backend for [TaskRegr][mlr3::TaskRegr].
#' This transformation creates a new target variable `PEM_status` that indicates
Collaborator:

Please replace everywhere in all pipeops and pipelines: PEM_status => pem_status
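For context, a sketch of what the PED format looks like via pammtools (illustration only; the PR's own transformation code may differ in details):

library(pammtools)
library(survival)
# each subject becomes one row per interval, with an interval-wise event
# indicator (ped_status), interval bounds (tstart, tend) and an offset column
ped = as_ped(data = survival::veteran, formula = Surv(time, status) ~ age)
head(ped)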

#' The target column is named `"PEM_status"` and indicates whether an event occurred
#' in each time interval.
#' An additional feature named `"tend"` contains the end time point of each interval.
#' Lastly, the "output" task has an offset column `"offset"`.
Collaborator:

More precisely: it has a column with the col_role offset, which is the ... log of something?
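For reference, the standard PED/PEM convention (not taken from this PR) is that the offset is the log of the time an observation spends at risk in the interval:

# hypothetical names: tstart/tend are the interval bounds, obs_time the observed time
offset = log(pmin(obs_time, tend) - tstart)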

#' [TaskRegr][mlr3::TaskRegr].
#' The target column is named `"PEM_status"` and indicates whether an event occurred
#' in each time interval.
#' An additional feature named `"tend"` contains the end time point of each interval.
Collaborator:

...numeric feature... (please verify) => add this also in the DiscTime pipeop
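A quick way to verify (standard mlr3 API, with task_regr as in the example):

task_regr$feature_types[id == "tend"] # expected type: "numeric"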

#'
#' During prediction, the "input" [TaskSurv] is transformed to the "output"
#' [TaskRegr][mlr3::TaskRegr] with `"PEM_status"` as target, while `"tend"`
#' and `"offset"` are included as features.
Collaborator:

More accurately: offset is not a feature, i.e. it doesn't have the col_role feature, but the offset one.

#' unique(task_regr$data(cols = "tend"))[[1L]]
#'
#' # train a regression learner
#' learner = lrn("regr.gam") # won't run unless learner can accept offset column role
Collaborator:

TODO: when I finish the mlr3extralearners PR, we can safely remove this comment here.

Also correct the example + make it a bit more interesting, e.g. => l = lrn("regr.gam", formula = pem_status ~ s(age) + s(tend), family = "poisson") => you definitely need the family poisson argument here
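A sketch of the corrected example along those lines (assumes the regr.gam learner from mlr3extralearners with its formula and family parameters, and the renamed pem_status target):

learner = lrn("regr.gam",
  formula = pem_status ~ s(age) + s(tend),
  family = "poisson"
)
learner$train(task_regr)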

#' # the end time points of the discrete time intervals
#' unique(task_regr$data(cols = "tend"))[[1L]]
#'
#' # train a regression learner
Collaborator:

... that supports poisson regression....

assert(max_time > data[get(event_var) == 1, min(get(time_var))],
"max_time must be greater than the minimum event time.")
}

Collaborator:

removing redundant empty lines in all code would be nice - some space is good, more space is unnecessary

long_data[, id := ids]

task_PEM = TaskRegr$new(paste0(task$id, "_PEM"), long_data,
target = "PEM_status")
Collaborator:

A bit more proper indentation style => target should be aligned below new( <= here; please check all the code for this.

task_PEM = TaskRegr$new(paste0(task$id, "_PEM"), long_data,
target = "PEM_status")
task_PEM$set_col_roles("id", roles = "original_ids")
task_PEM$set_col_roles('offset', roles = "offset")
Collaborator:

style: no ' anywhere in the code please, use only "!

status = data[[event_var]]
data[[event_var]] = 1


Collaborator:

instead of extra space: a good informative comment!
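One possibility for such a comment (my reading of the code; the authors should confirm the intent):

# keep the original event indicator aside and temporarily flag every observation
# as an event, presumably so the PED expansion creates rows for all intervals
# at prediction time; `status` holds the original values for later use
status = data[[event_var]]
data[[event_var]] = 1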

@@ -51,7 +51,8 @@ register_reflections = function() {

 x$task_col_roles$surv = x$task_col_roles$regr
 x$task_col_roles$dens = c("feature", "target", "label", "order", "group", "weight", "stratum")
-x$task_col_roles$classif = unique(c(x$task_col_roles$classif, "original_ids")) # for discrete time
+x$task_col_roles$classif = unique(c(x$task_col_roles$classif, "original_ids"))# for discrete time
+x$task_col_roles$regr = unique(c(x$task_col_roles$regr, "original_ids"))
@bblodfon (Collaborator) commented Mar 18, 2025:

# for pem

#' @param graph_learner `logical(1)`\cr
#' If `TRUE`, wraps the [Graph][mlr3pipelines::Graph] in a
#' [GraphLearner][mlr3pipelines::GraphLearner], otherwise (default) returns the `Graph`.
#' @param rhs (`character(1)`)\cr
Collaborator:

please remove! also in disc time pipeline!

#'
#' grlrn = ppl(
#' "survtoregr_PEM",
#' learner = lrn("regr.xgboost")
Collaborator:

Better example: show encoding of a factor, maybe some modelmatrix trafo inside the learner? (consult Andreas)
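One option along those lines, reusing the encoding idea from the earlier reproduction example (sketch):

grlrn = ppl(
  "survtoregr_PEM",
  learner = as_learner(po("encode", method = "treatment") %>>% lrn("regr.xgboost")),
  graph_learner = TRUE
)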

rhs = NULL, graph_learner = FALSE) {

assert_true("offset" %in% learner$properties)
assert_learner(learner, task_type = "regr")
Collaborator:

combine the two: assert_learner() can check for properties too!
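i.e. something like (sketch; assumes assert_learner()'s properties argument, as used elsewhere in mlr3):

assert_learner(learner, task_type = "regr", properties = "offset")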

Collaborator:

maybe disc time pipeline has the same?

}
}

gr = mlr3pipelines::Graph$new()
Collaborator:

You don't need the mlr3pipelines:: prefix as we import the package now, see other pipelines (so remove).

gr$add_edge(src_id = "nop", dst_id = "trafopred_regrsurv_PEM", src_channel = "output", dst_channel = "transformed_data")


if (!is.null(rhs)) {
Collaborator:

remove => also in disc time!

@@ -0,0 +1,75 @@
test_that("PipeOpTaskSurvRegrPEM", {
Collaborator:

PEM => pem

task = tsk('rats')
# for this section, select only numeric covariates,
# as 'regr.glmnet' does not automatically handle factor type variables
task$select(c('litter', 'rx'))
Collaborator:

or po("encode")!

expect_class(grlrn, "GraphLearner")
suppressWarnings(grlrn$train(task))
p = grlrn$predict(task)
expect_prediction_surv(p)
@bblodfon (Collaborator) commented Mar 18, 2025:

check that ncol(p$data$distr) == 3? and exactly the specific cut points? (if I recall correctly that's the time points used)
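A sketch of such a check (assumes the columns of the distr matrix are named by the interval end points; `cut_points` is a hypothetical name for the cut points used in the test):

expect_equal(ncol(p$data$distr), 3L)
expect_equal(as.numeric(colnames(p$data$distr)), cut_points)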

p = grlrn$predict(task)
expect_prediction_surv(p)

# Test with rhs
Collaborator:

Refactor with modelmatrix as a pipeop (as rhs is removed).
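A possible refactor along those lines (sketch; uses po("modelmatrix") from mlr3pipelines with the rats covariates selected above):

grlrn = ppl(
  "survtoregr_PEM",
  learner = as_learner(po("modelmatrix", formula = ~ litter + rx) %>>% lrn("regr.glmnet")),
  graph_learner = TRUE
)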

Collaborator:

Maybe better with the gam when the mlr3extralearners PR is finished...


private = list(
.train = function(input) {
task = input[[1L]]
Collaborator:

If you want to experiment and implement the validation stuff for xgboost, here is a bit of what is happening: the task will have a predefined validation task here, which is not transformed. What we need to do is something like:

transformed_internal_valid_task = private$.train(list(task$internal_valid_task))
task$internal_valid_task = transformed_internal_valid_task
and go on transforming the task
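A slightly fuller sketch of that idea (hypothetical; the exact return structure of private$.train() needs checking):

if (!is.null(task$internal_valid_task)) {
  # apply the same PED transformation to the validation task and reattach it
  task$internal_valid_task = private$.train(list(task$internal_valid_task))[[1L]]
}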
