Simplified Model Relative Influences #72

Open
nffarabaugh opened this issue Mar 23, 2022 · 10 comments

Comments

@nffarabaugh

Is there a way to access the relative influence information for the parameters that are included in the simplified model? It does not appear in the report CSV. Similarly, is there a way to autogenerate plots for these?

Cheers!

@SimonDedman
Owner

mate please could you send me your run script and your data (or a representative chunk so it'll run)? Thanks.

@SimonDedman
Owner

gbm.auto: the Report around L1036 is populated by Bin_Bars$var,
which comes from L858: summary(get(Bin_Best_Model), ...),
and from L686 Bin_Best_Model can be the best simplified bin model if worthy.
So L858 should populate L1036 with the simp model, and thus make simplified bars and simplified best vars / rel inf Report entries.

@nffarabaugh
Author

nffarabaugh commented Mar 29, 2022 via email

@SimonDedman
Owner

I just tried the first run with only tc 1, lr 0.01 and bf 0.5. The best combo was the unsimplified version. Report.csv lists the predictors that simp dropped and kept, but if the best simp run doesn't lower the deviance, it won't outcompete the existing best unsimplified BRT run, so the relative influence values for the simp model aren't included because they're not relevant.

You can tell which model was chosen as best under "Best Gaussian BRT"; if this doesn't end in "_simp" then there's no issue. Please let me know tc lr bf combos for examples where this isn't the case, and the simp run wins but its best variables & their relative influence scores aren't produced correctly.
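That check can also be done in R by testing the suffix of the model name. This is only a minimal sketch: the two names below are invented for illustration and are not copied from a real Report.csv.

```r
# Hypothetical values of the "Best Gaussian BRT" Report entry; real names will differ.
best_models <- c("model_a", "model_a_simp")

# TRUE where the winning model is a simplified one.
is_simp <- endsWith(best_models, "_simp")
print(is_simp)
```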

One confusing element is that simp_dops_gaus.jpeg shows a negative change in predictive deviance for the removal of 1, 2, 6, 7, & 8 variables, with 8 being the greatest reduction. Intuitively this would mean the one with 8 dropped variables was better than the 'parent' combo with all variables retained, but actually simp is only selected if its self.statistics$correlation score is better, aka training data correlation.
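As a rough sketch of that selection rule: the two lists below are stand-ins for fitted BRTs that carry only the self.statistics$correlation field, the numbers are invented, and pick_best is a hypothetical helper rather than a gbm.auto function.

```r
# Mock objects standing in for fitted models; only the field used for selection is included.
parent_model <- list(self.statistics = list(correlation = 0.62))
simp_model   <- list(self.statistics = list(correlation = 0.58))

# Hypothetical helper: keep the simplified model only if its training data
# correlation beats the parent's, whatever the change in predictive deviance.
pick_best <- function(parent, simp) {
  if (simp$self.statistics$correlation > parent$self.statistics$correlation) {
    "simp"
  } else {
    "parent"
  }
}

chosen <- pick_best(parent_model, simp_model)
print(chosen)
```

With these made-up scores the parent is kept, which matches the case where the simp plot looks favourable but the unsimplified run is still reported as best.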

LMK how you get on, and please close this if this answers everything. Cheers!

@nffarabaugh
Author

Thanks, I think this was an error in my understanding. So far none of my models have simp as the "best" model. I was confused because of the negative change in the predictive deviance (simp_dops_gaus.jpeg). Thanks for the help!

@nffarabaugh
Author

Hello, it seems this is an issue even when the best model is a simplified model. I have attached the generated report and code below.
Report_carangidae.csv
Self_CV_Statistics.csv
gbm.auto(
  grids = NULL,
  samples = wide.df1 %>% filter(site != "Nuka Hiva"),
  expvar = c("temp", "ave_npp", "depth", "visibility", "topo", "pop.dens", "bait", "time.no.bait", "isl_grp", "Season", "lagoon.size"), # fix to final variables
  resvar = "carangidae_maxN_a",
  tc = c(5), # add the combos you want to see for initial runs and it will try each; doesn't run the whole gamut like the loops do
  lr = c(0.0005),
  bf = c(0.55),
  n.trees = 50,
  ZI = "CHECK",
  fam1 = c("bernoulli", "binomial", "poisson", "laplace", "gaussian"),
  fam2 = c("poisson"),
  simp = TRUE, # change to TRUE
  gridslat = 2,
  gridslon = 1,
  multiplot = TRUE,
  cols = grey.colors(1, 1, 1),
  linesfiles = TRUE, # change to TRUE for final run
  smooth = TRUE,
  savedir = "~/Documents/My Documents/FinPrint French Poly/Analysis/DataExploration_03_2022/Teleosts",
  savegbm = TRUE, # change to TRUE for final runs
  loadgbm = NULL,
  varint = TRUE,
  map = TRUE,
  shape = NULL,
  RSB = TRUE,
  BnW = TRUE,
  alerts = TRUE, # this is the noise alerts
  pngtype = c("quartz"), # "quartz" is for Mac; on Windows use "cairo-png"
  gaus = TRUE,
  MLEvaluate = TRUE,
  brv = NULL,
  grv = NULL,
  Bin_Preds = NULL,
  Gaus_Preds = NULL)

@nffarabaugh
Author

SimonDedman added a commit that referenced this issue Oct 11, 2022
…ied expvars, based on #72.

DESCRIPTION version to 1.5.9
@SimonDedman
Owner

gaus:
L1143 & L1144:

Report[1:(length(Gaus_Bars[,1])),(reportcolno - 2)] <- as.character(Gaus_Bars$var)
Report[1:(length(Gaus_Bars[,2])),(reportcolno - 1)] <- as.character(round(Gaus_Bars$rel.inf), 2)
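A side observation on the second line as quoted: the 2 sits outside round(), so it is passed to as.character() rather than used as the number of digits, and the values end up rounded to whole numbers. The small base R check below uses invented rel.inf values, and the "intended" form is an assumption about what was meant, not the package's code.

```r
rel_inf <- c(41.237, 33.918, 24.846)  # invented relative influence values

# As quoted: round() gets no digits argument, so it rounds to 0 decimal places;
# the extra 2 is passed to as.character(), where it has no effect.
as_quoted <- as.character(round(rel_inf), 2)

# Assumed intent: round to 2 decimal places, then convert to character.
as_intended <- as.character(round(rel_inf, 2))

print(as_quoted)
print(as_intended)
```

This looks separate from the question of which variables are listed, but it would affect the precision of the rel.inf column in the Report.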

Bin is L1067:75

L887:
if (gaus) {Gaus_Bars <- summary(get(Gaus_Best_Model),
so bin/gaus_bars are already simp if simp was better...
So why are they printing all of the rel.inf's if most of the vars got dropped?

Gaus_Bars <- summary(get(Gaus_Best_Model),
                                      cBars = length(get(Gaus_Best_Model)$var.names),
                                      n.trees = get(Gaus_Best_Model)$n.trees,
                                      plotit = FALSE, order = TRUE, normalize = TRUE, las = 1, main = NULL)
      write.csv(Gaus_Bars, file = paste0("./", names(samples[i]), "/Gaussian BRT Variable contributions.csv"), row.names = FALSE)

Output csv colnames: var, rel.inf. I.e. not cBars nor n.trees. Odd.

L668: simplification.
L671: Gaus_Best_Simp assigned gbm object AFTER simplification, so should have extra variables dropped?

See notes from Bonnie having the same issue, L674:681
L680 & 645 replacements testing now.

@SimonDedman
Owner

Pushed the change. The model re-run by Frances didn't need simplifying, so the change is not tested: danger zone.

@SimonDedman
Owner

NFF any update on this, did the change solve the issue? If so please mark as closed. Cheers!
