MackChainLadder should return the same result if Triangle is passed through a pipe #57

Open
msenn opened this issue Aug 10, 2018 · 7 comments


msenn commented Aug 10, 2018

Problem

The value returned by MackChainLadder() depends on whether Triangle is passed directly (i.e. as a function argument) or using magrittr's pipe operator (%>%):

library(ChainLadder)
library(magrittr)

# Pass Triangle directly
mcl <- MackChainLadder(RAA)

# Pipe Triangle
mcl_piped <- RAA %>% 
  MackChainLadder()

identical(mcl, mcl_piped)         # Returns FALSE

Further information

Differences are in elements "call" and "Models":

idx.diff <- which(vapply(
  seq_along(mcl),
  function(i) !identical(mcl[[i]], mcl_piped[[i]]),
  logical(1))
)

names(mcl)[idx.diff]

Arguably, the only difference is in the original name of the Triangle object. This difference may look minor and cosmetic. However, it will confuse anybody trying to verify that two pieces of code lead to the same outcome. Also, pipes are so prevalent these days that they shouldn't be ignored.
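
To make the mechanism concrete, here is a minimal base-R sketch. record_call() is a hypothetical stand-in for MackChainLadder() that stores match.call(), and the local() block only imitates what a magrittr pipe does (binding the left-hand side to `.` before calling the function). It is an illustration under those assumptions, not magrittr's actual implementation.

```r
# Hypothetical stand-in for MackChainLadder(): stores its own call, like lm() does.
record_call <- function(Triangle) {
  list(call = match.call(), total = sum(Triangle))
}

tri <- matrix(1:4, nrow = 2)

# Direct call: the recorded call mentions `tri`.
direct <- record_call(tri)

# Imitation of a magrittr pipe: the value is bound to `.` and the function is
# called on `.`, so the recorded call mentions `.` instead of `tri`.
piped_like <- local({
  . <- tri
  record_call(.)
})

deparse(direct$call)                       # "record_call(Triangle = tri)"
deparse(piped_like$call)                   # "record_call(Triangle = .)"
identical(direct$total, piped_like$total)  # TRUE: the computed value is the same
identical(direct, piped_like)              # FALSE: only the stored call differs
```

The numbers agree; only the bookkeeping element that records how the function was invoked differs.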

System info

I am using the current GitHub version of ChainLadder. Here's my sessionInfo():

R version 3.5.0 (2018-04-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server 7.5 (Maipo)

Matrix products: default
BLAS: /opt/R/3.5.0/lib64/R/lib/libRblas.so
LAPACK: /opt/R/3.5.0/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] magrittr_1.5      ChainLadder_0.2.6

loaded via a namespace (and not attached):
 [1] biglm_0.9-1       statmod_1.4.30    zoo_1.8-2         tidyselect_0.2.4  purrr_0.2.5      
 [6] reshape2_1.4.3    splines_3.5.0     haven_1.1.1       lattice_0.20-35   carData_3.0-1    
[11] colorspace_1.3-2  stats4_3.5.0      yaml_2.1.19       rlang_0.2.1       pillar_1.2.3     
[16] foreign_0.8-70    glue_1.3.0        tweedie_2.3.2     readxl_1.1.0      bindrcpp_0.2.2   
[21] bindr_0.1.1       plyr_1.8.4        stringr_1.3.1     munsell_0.5.0     cplm_0.7-7       
[26] gtable_0.2.0      cellranger_1.1.0  zip_1.0.0         expint_0.1-4      coda_0.19-1      
[31] systemfit_1.1-22  rio_0.5.10        forcats_0.3.0     lmtest_0.9-36     curl_3.2         
[36] Rcpp_0.12.17      scales_0.5.0      abind_1.4-5       ggplot2_3.0.0     stringi_1.2.3    
[41] openxlsx_4.1.0    dplyr_0.7.6       grid_3.5.0        tools_3.5.0       sandwich_2.4-0   
[46] lazyeval_0.2.1    tibble_1.4.2      car_3.0-0         pkgconfig_2.0.1   MASS_7.3-50      
[51] Matrix_1.2-14     data.table_1.11.4 actuar_2.3-1      assertthat_0.2.0  minqa_1.2.4      
[56] R6_2.2.2          nlme_3.1-137      compiler_3.5.0   
@mages
Owner

mages commented Oct 3, 2018

The same is true for other functions like lm:

m_piped <- data.frame(x=1:10,  y=1:10) %>% lm
m <- lm(y~x, data=data.frame(x=1:10,  y=1:10))
identical(m, m_piped)
#> [1] FALSE

How do you deal with those situations?

@ryanbthomas

From what I can tell the only differences are based on the call. I've modified the example to make it clearer. If you look at the differences in the original example, they are all about the formula and call.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
m_piped <- data.frame(x=1:10,  y=1:10) %>% lm(formula = y ~ x)
m <- lm(y~x, data=data.frame(x=1:10,  y=1:10))
all.equal(m, m_piped)
#> [1] "Component \"call\": target, current do not match when deparsed"

Created on 2019-01-16 by the reprex package (v0.2.1)

@trinostics
Collaborator

trinostics commented Jan 17, 2019 via email

@msenn
Author

msenn commented Jan 22, 2019

"From what I can tell the only differences are based on the call"

That's what I meant by "the only difference is in the original name of the Triangle object". Apologies for being unclear.

As for my concern: This behavior got me when I wrote unit tests for a function that uses MackChainLadder(). The tests would fail when using the pipe but not otherwise. The reason turned out to be the call object.
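
For anyone hitting the same thing in tests, one possible workaround is to drop the stored call before comparing. The sketch below uses lm() from base R so that it runs without ChainLadder; drop_call() is a hypothetical helper, not part of any package. For MackChainLadder() output, the nested "Models" element carries calls of its own and would presumably need the same treatment.

```r
# Same data under two different names, so only the recorded call can differ.
d1 <- data.frame(
  x = 1:10,
  y = c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9)
)
d2 <- d1

m1 <- lm(y ~ x, data = d1)
m2 <- lm(y ~ x, data = d2)

# Hypothetical helper: remove the top-level 'call' element before comparing.
drop_call <- function(fit) {
  fit$call <- NULL
  fit
}

all.equal(m1, m2)                                # reports a mismatch in component "call"
isTRUE(all.equal(drop_call(m1), drop_call(m2)))  # TRUE once the call is removed
```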

I have no strong opinion on what to do about this. By having 'call' in the return value, MackChainLadder() is in line with lm(), as demonstrated by @mages. However, its output varies slightly depending on whether the Triangle is passed directly or piped.

Obviously, there are wars fought over the merits and drawbacks of the pipe, and we should probably not repeat them here. Therefore, feel free to close the issue if you conclude that consistency over time and with lm() weighs more heavily than consistency when piped.
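
A side note for readers on newer R versions: the native pipe `|>` (available from R 4.1.0, so not an option with the R 3.5.0 session above) is rewritten by the parser into an ordinary function call, so a function that stores its call would see the same expression either way. The sketch below only parses the two expressions and evaluates neither, so ChainLadder is not required. The version guard and the use of str2lang() are just there to keep the snippet from failing on older parsers.

```r
# The native pipe needs R >= 4.1.0; parsing from a string avoids a syntax
# error when this file is read by an older R version.
if (getRversion() >= "4.1.0") {
  piped  <- str2lang("RAA |> MackChainLadder()")
  direct <- quote(MackChainLadder(RAA))

  # Both are the same unevaluated call object.
  print(identical(piped, direct))
}
```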

@trinostics
Collaborator

trinostics commented Jan 22, 2019 via email

@chiefmurph
Contributor

I have two final comments:

  1. I think this topic belongs on a tidyverse or magrittr mailing list. I would encourage the OP to take it there, and please let this thread know so we can follow the discussion there. I say that because ...
  2. When I was working on R code for Mack (and extensions) before I met Markus, I got deep into lm's entrails. In particular, I went to great lengths to construct the function calls so that, in the end, it would be clear what data was being analyzed and what lm levers were being pulled at each step of the development process. Maybe the user's own variable names could be stored for later regurgitation as needed for clarification and communication. Alas, that was overly ambitious at that time. Ten years later, perhaps no longer so, and the magrittr approach may be flexible enough to implement such transparency.

Thanks for raising this issue, and thanks again for your interest in ChainLadder.

@kennedymwavu

kennedymwavu commented Aug 29, 2022

Leaving this here in case it might be of help to someone else.

I use all.equal() (and testthat::expect_equal()) to check for "identity" of MackChainLadder() output.

identical() also requires environments to be the very same object, and the model terms carry an '.Environment' attribute that differs between calls, so it might not be the best way to check for equality of MCL output.

suppressPackageStartupMessages(library(ChainLadder))

set.seed(1024)
mcl <- MackChainLadder(RAA)

set.seed(1024)
mcl2 <- MackChainLadder(RAA)

identical(mcl, mcl2)
#> [1] FALSE

# TL;DR:
# The difference is in the '.Environment' attribute of the model terms. I suppose
# that's the environment in which the calls are evaluated. Nothing to worry about,
# if you ask me.

# Explanation:

# which elements aren't identical:
for (nm in names(mcl)) {
  if (!identical(mcl[[nm]], mcl2[[nm]])) {
    print(nm)
  }
}
#> [1] "Models"

# The 'Models' are a bunch of calls and coefficients. Let's work with the first
# item in their list:
a <- mcl$Models[[1]]
b <- mcl2$Models[[1]]

# Which elements are different:
for (nm in names(a)) {
  if (!identical(a[[nm]], b[[nm]])) {
    print(nm)
  }
}
#> [1] "terms"
#> [1] "model"

a_terms <- a[['terms']]
b_terms <- b[['terms']]

# check which attributes aren't identical:
for (att in names(attributes(a_terms))) {
  if (!identical(attr(a_terms, which = att), attr(b_terms, which = att))) {
    print(att)
  }
}
#> [1] ".Environment"

# attr '.Environment'

a_model <- a[['model']]
b_model <- b[['model']]

# Again for the models, only the environment attribute is different since the 
# columns are identical:
for (nm in names(a_model)) {
  if (!identical(a_model[[nm]], b_model[[nm]])) {
    print(nm)
  }
}

# checking the attributes:
for (att in names(attributes(a_model))) {
  if (!identical(attr(a_model, which = att), attr(b_model, which = att))) {
    print(att)
  }
}
#> [1] "terms"

# The 'terms' are the same as we had seen before: 
identical(a[['terms']], attr(a_model, which = 'terms'))
#> [1] TRUE

# Meaning only the '.Environment' attribute is different.

Created on 2022-08-29 with reprex v2.0.2
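
The same effect can be reproduced in isolation with base R only. In this sketch, make_formula() is a hypothetical helper: a formula created inside a function captures that call's environment, so two calls yield formulas that look the same but point at different environments.

```r
# Each call to make_formula() runs in a fresh environment, which the formula captures.
make_formula <- function() y ~ x

f1 <- make_formula()
f2 <- make_formula()

identical(environment(f1), environment(f2))  # FALSE: two distinct environments
identical(f1, f2)                            # FALSE: the '.Environment' attribute differs
isTRUE(all.equal(f1, f2))                    # TRUE: the formulas themselves compare equal
```

This is why all.equal() can be the more forgiving choice when the objects being compared contain formulas or fitted models.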


6 participants