as.triangle.data.frame has column gaps #93

Open
DVDVTAL opened this issue Jul 24, 2024 · 0 comments

DVDVTAL commented Jul 24, 2024

In the definition of as.triangle.data.frame, the triangle is built by aggregating over only the unique origin and development values that appear in the input (Triangles.R, lines 83-85).

When dealing with long tails in smaller books, certain development periods have no new claims in them. This produces triangles with missing columns.

This is inconsistent with other functions in the ChainLadder package, most notably incr2cum with na.rm = TRUE. There, upper is defined as col(Triangle) <= ncol(Triangle) + 1 - row(Triangle), which assumes that the column index equals the development period and that the number of columns equals the maximum development period. Neither assumption holds in the current implementation. The result is a gap between the boundary the function expects and the boundary a human would intuit from looking at the data in a spreadsheet, so NAs can appear along the final diagonal and then break other functions.
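To make the boundary mismatch concrete, here is a small base-R sketch. The data and the names `long` and `tri` are made up for illustration, and the matrix is filled by hand rather than through as.triangle, so it only mimics the gapped layout described above. The positional mask is the expression quoted from incr2cum; the period-based mask is an assumption about what someone reading the data in a spreadsheet would expect.

```r
# Hypothetical long-format data: development period 3 never occurs.
long <- data.frame(
  origin = c(2020, 2020, 2020, 2021, 2021, 2022, 2022, 2023),
  dev    = c(1, 2, 4, 1, 2, 1, 2, 1),
  value  = c(10, 5, 1, 12, 6, 11, 4, 9)
)

# Build a wide matrix with one column per *observed* development period,
# which is the layout that leaves a gap where period 3 should be.
origins <- sort(unique(long$origin))
devs    <- sort(unique(long$dev))
tri <- matrix(NA_real_, nrow = length(origins), ncol = length(devs),
              dimnames = list(origin = as.character(origins),
                              dev    = as.character(devs)))
tri[cbind(match(long$origin, origins), match(long$dev, devs))] <- long$value

# Mask as quoted from incr2cum: the column index stands in for the period.
upper_by_position <- col(tri) <= ncol(tri) + 1 - row(tri)

# Mask based on the actual development period carried by each column.
dev_period      <- matrix(devs, nrow(tri), ncol(tri), byrow = TRUE)
upper_by_period <- dev_period <= max(devs) + 1 - row(tri)

# Cells where the two definitions of "upper triangle" disagree.
mismatch <- which(upper_by_position != upper_by_period, arr.ind = TRUE)
print(mismatch)
```

With this toy data the two masks disagree in two cells (row 3, column 2 and row 4, column 1), both of which hold observed values that the positional mask places outside the upper triangle.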

The same issue applies to rows: origin periods with no claims also leave gaps. However, missing origin periods are not easy to impute, especially when using months or quarters. I believe a complete absence of claims in an incident period before the most recent one would be far less common, and so less important to address.

As a potential solution, a skeleton could be created containing all of the unique origin values crossed with the full range of development periods inferred from the dataset, with the aggregated data then joined onto it. Something like the following:

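# Full range of development periods, assuming sequential integers starting at 1
dev_range <- 1:max(Triangle[[dev]], na.rm = TRUE)
# One row for every origin/development combination
skeleton <- expand.grid(unique(Triangle[[origin]]), dev_range, stringsAsFactors = FALSE)
colnames(skeleton) <- c(origin, dev)

# Left-join the aggregated values onto the skeleton; combinations with no data become NA
aggTriangle <- merge(skeleton, aggTriangle[, c(origin, dev, value)], by = c(origin, dev), all.x = TRUE)

origin_names <- as.character(unique(aggTriangle[, origin]))
dev_names <- as.character(dev_range)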

(Unfortunately, my enterprise permissions prevent me from forking the repo to perform tests so I can't validate this.)
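The skeleton idea can be tried outside the package on toy data. The following is a minimal base-R sketch, not the package's code: the data frame, the `aggregate` call standing in for the package's aggregation step, and the hand-built `wide` matrix are all assumptions made for illustration.

```r
# Hypothetical toy data in which development period 3 never occurs.
Triangle <- data.frame(
  origin = c(2020, 2020, 2020, 2021, 2021, 2022, 2022, 2023),
  dev    = c(1, 2, 4, 1, 2, 1, 2, 1),
  value  = c(10, 5, 1, 12, 6, 11, 4, 9)
)
origin <- "origin"; dev <- "dev"; value <- "value"

# Stand-in for the aggregation step: one row per observed origin/dev pair.
aggTriangle <- stats::aggregate(Triangle[[value]],
                                list(Triangle[[dev]], Triangle[[origin]]), sum)
names(aggTriangle) <- c(dev, origin, value)

# Skeleton of every origin crossed with every development period 1..max.
dev_range <- seq_len(max(Triangle[[dev]], na.rm = TRUE))
skeleton  <- expand.grid(unique(Triangle[[origin]]), dev_range,
                         stringsAsFactors = FALSE)
colnames(skeleton) <- c(origin, dev)

# Left join: combinations without data are kept with an NA value.
filled <- merge(skeleton, aggTriangle, by = c(origin, dev), all.x = TRUE)

# Spread into a wide matrix with one column per period in dev_range.
origin_names <- as.character(sort(unique(filled[[origin]])))
dev_names    <- as.character(dev_range)
wide <- matrix(NA_real_, length(origin_names), length(dev_names),
               dimnames = list(origin = origin_names, dev = dev_names))
wide[cbind(match(filled[[origin]], origin_names),
           match(filled[[dev]], dev_names))] <- filled[[value]]
print(wide)
```

On this data `wide` has four columns, with the column for period 3 present but entirely NA, so column position and development period line up again.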

Such a change inherently assumes that the user supplies sequential development values (1, 2, 3, etc.). A user who instead supplies the month number at the end of each quarter (3, 6, 9, 12) would receive a series of empty development periods, with the associated warning messages. If that change in functionality is undesired, then the documentation of triangles should be updated to state the requirement on the input data.frame, namely that every possible combination of origin and development period is represented in it.
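The quarter-end case can be illustrated with a few lines of base R. The vector `dev_observed` is hypothetical; the point is only to count how many empty periods a 1:max range would introduce.

```r
# Hypothetical development values given as month number at end of quarter.
dev_observed <- c(3, 6, 9, 12)

# Range the skeleton approach would generate under the sequential assumption.
dev_range <- 1:max(dev_observed)

# Periods that would appear as columns without any data behind them.
empty_periods <- setdiff(dev_range, dev_observed)
print(empty_periods)
print(length(empty_periods))
```

Here eight of the twelve generated periods have no data, which is the trade-off against the current behaviour.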
