MackChainLadder should return the same result if Triangle is passed through a pipe #57

msenn · 2018-08-10T07:42:26Z

Problem

The value returned by MackChainLadder() depends on whether Triangle is passed directly (i.e. as a function argument) or using magrittr's pipe operator (%>%):

library(ChainLadder)
library(magrittr)

# Pass Triangle directly
mcl <- MackChainLadder(RAA)

# Pipe Triangle
mcl_piped <- RAA %>% 
  MackChainLadder()

identical(mcl, mcl_piped)         # Returns FALSE

Further information

Differences are in the elements "call" and "Models":

idx.diff <- which(vapply(
  seq_along(mcl),
  function(i) !identical(mcl[[i]], mcl_piped[[i]]),
  logical(1))
)

names(mcl)[idx.diff]

Arguably, the only difference is in the original name of the Triangle object. This difference may look minor and cosmetic. However, it will confuse anybody trying to verify that two pieces of code lead to the same outcome. Also, pipes are so prevalent these days that they shouldn't be ignored.

System info

I am using the current GitHub version of ChainLadder. Here's my sessionInfo():

R version 3.5.0 (2018-04-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server 7.5 (Maipo)

Matrix products: default
BLAS: /opt/R/3.5.0/lib64/R/lib/libRblas.so
LAPACK: /opt/R/3.5.0/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] magrittr_1.5      ChainLadder_0.2.6

loaded via a namespace (and not attached):
 [1] biglm_0.9-1       statmod_1.4.30    zoo_1.8-2         tidyselect_0.2.4  purrr_0.2.5      
 [6] reshape2_1.4.3    splines_3.5.0     haven_1.1.1       lattice_0.20-35   carData_3.0-1    
[11] colorspace_1.3-2  stats4_3.5.0      yaml_2.1.19       rlang_0.2.1       pillar_1.2.3     
[16] foreign_0.8-70    glue_1.3.0        tweedie_2.3.2     readxl_1.1.0      bindrcpp_0.2.2   
[21] bindr_0.1.1       plyr_1.8.4        stringr_1.3.1     munsell_0.5.0     cplm_0.7-7       
[26] gtable_0.2.0      cellranger_1.1.0  zip_1.0.0         expint_0.1-4      coda_0.19-1      
[31] systemfit_1.1-22  rio_0.5.10        forcats_0.3.0     lmtest_0.9-36     curl_3.2         
[36] Rcpp_0.12.17      scales_0.5.0      abind_1.4-5       ggplot2_3.0.0     stringi_1.2.3    
[41] openxlsx_4.1.0    dplyr_0.7.6       grid_3.5.0        tools_3.5.0       sandwich_2.4-0   
[46] lazyeval_0.2.1    tibble_1.4.2      car_3.0-0         pkgconfig_2.0.1   MASS_7.3-50      
[51] Matrix_1.2-14     data.table_1.11.4 actuar_2.3-1      assertthat_0.2.0  minqa_1.2.4      
[56] R6_2.2.2          nlme_3.1-137      compiler_3.5.0


mages · 2018-10-03T15:40:05Z

The same is true for other functions like lm:

m_piped <- data.frame(x=1:10,  y=1:10) %>% lm
m <- lm(y~x, data=data.frame(x=1:10,  y=1:10))
identical(m , m_piped)
FALSE

How do you deal with those situations?

ryanbthomas · 2019-01-16T22:45:31Z

From what I can tell the only differences are based on the call. I've modified the example to make it more clear. If you look at the differences in the original example they are all about the formula and call.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
m_piped <- data.frame(x=1:10,  y=1:10) %>% lm(formula = y ~ x)
m <- lm(y~x, data=data.frame(x=1:10,  y=1:10))
all.equal(m, m_piped)
#> [1] "Component \"call\": target, current do not match when deparsed"

Created on 2019-01-16 by the reprex package (v0.2.1)

trinostics · 2019-01-17T14:21:33Z

The two objects were created differently -- with different calls:

m_piped$call

lm(formula = y ~ x, data = .)

m$call

lm(formula = y ~ x, data = data.frame(x = 1:10, y = 1:10))

Is your concern the loss of information regarding the source of 'data' in m_piped? "data = ." is a common idiom in the tidyverse. If that source is important to you -- and I can see why it would be -- then I suggest avoiding piping. Otherwise, I am happy that is the only difference in the two objects.

Maybe someone in a tidyverse list can help with the "lm(formula = y ~ x, data = .)" issue.

Thank you for your interest in ChainLadder!

Dan
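The mechanism behind the two different calls can be sketched in base R without any packages. The functions `record_call()` and `pipe_like()` below are hypothetical stand-ins, not part of ChainLadder or magrittr: the first stores its own call via `match.call()`, as `lm()` does, and the second mimics, in a much simplified way, a pipe that binds the left-hand side to `.` before calling the right-hand side.

```r
# Hypothetical stand-in for a modelling function that stores its own call.
record_call <- function(Triangle) {
  list(call = match.call())
}

# Hypothetical, simplified stand-in for a magrittr-style pipe: the left-hand
# side is bound to `.` and the right-hand side is then called with `.`.
pipe_like <- function(.) {
  record_call(.)
}

tri <- matrix(1:4, nrow = 2)

direct <- record_call(tri)
piped  <- pipe_like(tri)

print(direct$call)   # the call names the original object, `tri`
print(piped$call)    # the call only sees the placeholder, `.`
identical(direct$call, piped$call)
```

The stored call records the expression the function was handed, not the value, so once the data arrives under the name `.` the original object name is no longer available to the function.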


msenn · 2019-01-22T07:20:08Z

> From what I can tell the only differences are based on the call

That's what I meant by "the only difference is in the original name of the Triangle object". Apologies for being unclear.

As for my concern: This behavior got me when I wrote unit tests for a function that uses MackChainLadder(). The tests would fail when using the pipe but not otherwise. The reason turned out to be the call object.

I have no strong opinion on what to do about this. By having 'call' in the return value, MackChainLadder() is in line with lm() as demonstrated by @mages. However, its output varies slightly if passed directly versus piped.

Obviously, there are wars fought over the merits and drawbacks of the pipe and we should probably not repeat this here. Therefore, feel free to close the issue if you conclude that consistency over time and with lm() weighs more heavily than consistency when piped.
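For the unit-test situation described above, one possible workaround is to remove the stored call before comparing. The sketch below uses `lm()` from base R rather than MackChainLadder(), and `drop_call()` is a hypothetical helper introduced only for illustration. Whether the same approach is sufficient for MackChainLadder() output, which also contains nested models, is an assumption that has not been checked here.

```r
# Hypothetical helper: remove the stored call before comparing fitted objects.
drop_call <- function(fit) {
  fit$call <- NULL
  fit
}

d  <- data.frame(x = 1:10,
                 y = c(1.2, 1.9, 3.1, 4.2, 4.8, 6.1, 7.0, 8.3, 8.9, 10.1))
dd <- d  # same data under a different name, so the stored calls differ

m1 <- lm(y ~ x, data = d)
m2 <- lm(y ~ x, data = dd)

# Comparing the full objects reports a mismatch in the "call" component.
print(all.equal(m1, m2))

# Comparing everything except the call.
print(isTRUE(all.equal(drop_call(m1), drop_call(m2))))
```

A test written this way checks the fitted results while ignoring how the data argument happened to be spelled.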

trinostics · 2019-01-22T14:53:32Z

I believe this is a *feature*, not an issue, of the piping paradigm. E.g., if the formula had not been omitted in the toy example, the call of the result would have been different still:

# original toy example
m_piped <- data.frame(x=1:10, y=1:10) %>% lm
m_piped$call

lm(formula = .)

# toy example including formula
m_piped <- data.frame(x=1:10, y=1:10) %>% lm(y~x, data = .)
m_piped$call

lm(formula = y ~ x, data = .)

# toy example including formula and another default argument value
m_piped <- data.frame(x=1:10, y=1:10) %>% lm(x~y, data = ., model = TRUE)
m_piped$call

lm(formula = x ~ y, data = ., model = TRUE)

These example results are supported by the following technical note at the magrittr site (https://magrittr.tidyverse.org/reference/pipe.html): "For most purposes, one can disregard the subtle aspects of magrittr's evaluation, but some functions may capture their calling environment, and thus using the operators will not be exactly equivalent to the "standard call" without pipe-operators."
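For comparison with the "standard call" mentioned in the technical note, here is a small sketch using the native pipe `|>`, which is not discussed in this thread and requires R 4.1.0 or later. The native pipe is rewritten by the parser into an ordinary function call, so the function never sees a `.` placeholder. The data frame `d` is made up for illustration.

```r
# Requires R >= 4.1.0 for the native pipe `|>`.
d <- data.frame(x = 1:10,
                y = c(1.2, 1.9, 3.1, 4.2, 4.8, 6.1, 7.0, 8.3, 8.9, 10.1))

m        <- lm(y ~ x, data = d)
m_native <- d |> lm(formula = y ~ x)

# The parser turns the piped expression into a plain call:
print(quote(d |> lm(formula = y ~ x)))

# so the stored call refers to `d` rather than to a placeholder.
print(m_native$call)
print(identical(m$call, m_native$call))
```

Because `formula` is supplied by name, the piped data frame is matched to the next free argument, `data`, and `match.call()` records it under the original name.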

chiefmurph · 2019-01-23T06:27:44Z

I have two final comments:

1. I think this topic belongs on a tidyverse or magrittr mailing list. I would encourage the OP to take it there, and please let this thread know so we can follow the discussion there. I say that because ...
2. When I was working on R code for Mack (and extensions) before I met Markus, I got deep into lm's entrails. In particular, I went to great lengths to construct the function calls so that, in the end, it would be clear what data was being analyzed and what lm levers were being pulled at each step of the development process. Maybe the user's own variable names could be stored for later regurgitation as needed for clarification and communication. Alas, that was overly ambitious at that time. Ten years later, perhaps no longer so, and the magrittr approach may be flexible enough to implement such transparency.

Thanks for raising this issue, and thanks again for your interest in ChainLadder.

kennedymwavu · 2022-08-29T12:09:34Z

Leaving this here in case it might be of help to someone else.

I use all.equal() (and testthat::expect_equal()) to check for "identity" of MackChainLadder() output.

identical() also requires the environments stored in the result (those in which the calls are evaluated) to be exactly the same, so it might not be the best way to check for equality of MCL output.

suppressPackageStartupMessages(library(ChainLadder))

set.seed(1024)
mcl <- MackChainLadder(RAA)

set.seed(1024)
mcl2 <- MackChainLadder(RAA)

identical(mcl, mcl2)
#> [1] FALSE

# TL;DR:
# Difference is in model terms attribute '.Environment'. I suppose that's
# the environment in which the calls are evaluated. Nothing to worry about,
# if you ask me.

# Explanation:

# which elements aren't identical:
for (nm in names(mcl)) {
  if (!identical(mcl[[nm]], mcl2[[nm]])) {
    print(nm)
  }
}
#> [1] "Models"

# The 'Models' are a bunch of calls and coefficients. Let's work with the first
# item in their list:
a <- mcl$Models[[1]]
b <- mcl2$Models[[1]]

# Which elements are different:
for (nm in names(a)) {
  if (!identical(a[[nm]], b[[nm]])) {
    print(nm)
  }
}
#> [1] "terms"
#> [1] "model"

a_terms <- a[['terms']]
b_terms <- b[['terms']]

# check which attributes aren't identical:
for (att in names(attributes(a_terms))) {
  if (!identical(attr(a_terms, which = att), attr(b_terms, which = att))) {
    print(att)
  }
}
#> [1] ".Environment"

# attr '.Environment'

a_model <- a[['model']]
b_model <- b[['model']]

# Again for the models, only the environment attribute is different since the 
# columns are identical:
for (nm in names(a_model)) {
  if (!identical(a_model[[nm]], b_model[[nm]])) {
    print(nm)
  }
}

# checking the attributes:
for (att in names(attributes(a_model))) {
  if (!identical(attr(a_model, which = att), attr(b_model, which = att))) {
    print(att)
  }
}
#> [1] "terms"

# The 'terms' are the same as we had seen before: 
identical(a[['terms']], attr(a_model, which = 'terms'))
#> [1] TRUE

# Meaning only the '.Environment' attribute is different.

Created on 2022-08-29 with reprex v2.0.2
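The '.Environment' effect found above can be reproduced in isolation with base R. In this minimal sketch, `make_formula()` is a hypothetical function standing in for any code that builds a formula inside a function body: each invocation gets a fresh evaluation environment, and the formula keeps a reference to it.

```r
# Hypothetical function: every invocation creates a new environment, and the
# formula created inside it stores that environment as '.Environment'.
make_formula <- function() {
  y ~ x
}

f1 <- make_formula()
f2 <- make_formula()

# Same text, different environments, hence not identical.
print(identical(deparse(f1), deparse(f2)))
print(identical(environment(f1), environment(f2)))
print(identical(f1, f2))

# Giving both formulas the same environment removes the only difference.
environment(f2) <- environment(f1)
print(identical(f1, f2))
```

This is consistent with the observation that two runs of the same model-fitting code can fail `identical()` even when every number in the result matches.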
