
InferenceData (.nc file) not generated for HDDMRegression with stimulus coding #18

shabnamhossein opened this issue Nov 18, 2024 · 5 comments

@shabnamhossein
I am using Docker version 4.35.1 (173168) for Mac (Sequoia 15.1). The Kabuki version is 0.6.5RC4 and the HDDM version is 1.0.1RC.
I am trying to run your tutorial "HDDM_Regression_Stimcoding" from the "OfficialTutorials" folder in a Jupyter notebook, with the addition of saving the InferenceData so that I can run posterior predictive checks later. Line 16 of the tutorial is changed to:

save_name = "model_fitted/hddmregressor_example"
model_reg_infdata = m_reg.sample(500, return_infdata=True, save_name=save_name, sample_prior=True, loglike=True, ppc=True)

However, the .nc file cannot be generated due to this error:

Start converting to InferenceData...
Start to calculate pointwise log likelihood...
The time of calculation of loglikelihood took 99.754 seconds
Start generating posterior prediction...
fail to convert posterior predictive check (self.ppc) to xarray: could not broadcast input array from shape (900,1) into shape (900,)
@panwanke
Collaborator

panwanke commented Nov 20, 2024


Thank you for your feedback. After testing, we successfully replicated the issue you reported. Upon investigation, we found that the problem stems from an update in HDDM. Specifically, this issue does not occur when using HDDM version 0.8.0.

Let me first share a solution, followed by an explanation of the issue's origin.

Solution:

You can pull a Docker image with HDDM version 0.8.0 and fit models that use stimcoding as a regressor there. Use the command:
docker pull hcp4715/hddm:0.8.0
This keeps the environment stable, unaffected by the later HDDM changes.

Here are my test results with version 0.8.0, showing that it works:
[screenshot: successful model fit and InferenceData export under HDDM 0.8.0]


Source of the Issue:

The issue arises from the following line in the HDDM repository:
https://github.com/hddm-devs/hddm/blob/6e766ef315629c20cd0be7267555c90c39cc0446/hddm/models/hddm_regression.py#L130.

Here, the wfpt_reg_like function is defined with the sampling_method="cssm" parameter, which is hard-coded and cannot be adjusted when defining a model. This sampling method causes PPC errors, whereas the default method (sampling_method="drift") does not.
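The broadcast error from the log can be reproduced in isolation: the PPC step produces a column vector of shape (900, 1) where the xarray conversion expects a flat (900,) array. A minimal NumPy sketch of the mismatch and the obvious fix (this only illustrates the error message; it is not HDDM's actual conversion code):

```python
import numpy as np

# Mock PPC output arriving as a column vector, as in the reported error
ppc_values = np.zeros((900, 1))
target = np.empty(900)  # the flat (900,) slot the conversion tries to fill

try:
    target[:] = ppc_values  # raises: could not broadcast (900,1) into (900,)
except ValueError as e:
    print(e)

# Dropping the trailing singleton axis resolves the mismatch
target[:] = ppc_values.squeeze(axis=1)
print(target.shape)  # (900,)
```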

We will report this issue to the official HDDM maintainers, although a fix is not guaranteed, as they appear to be focusing on resolving these problems in HSSM.

@shabnamhossein
Author

Thanks for your reply. I changed the HDDM version to 0.8.0, and the .nc file is now generated for the tutorial data. However, I still have issues generating the .nc file for my own data when using HDDMStimCoding. Am I doing something wrong in defining my model or in sampling? This is my model:

model = hddm.HDDMStimCoding(data, include=['v', 'a', 't'], stim_col='stim', split_param='v', p_outlier=0.05)
model_infdata = model.sample(10000, burn=500, chains=1, save_name='model', return_infdata=True, ppc=True)

And this is the error I get:

Start converting to InferenceData...
/opt/conda/lib/python3.8/site-packages/kabuki/hierarchical.py:1157: UserWarning: n_ppc is not given, set to default 500
  warnings.warn("n_ppc is not given, set to default 500")
Start generating posterior prediction...
fail to convert posterior predictive check (self.ppc) to xarray: Supply a grouping so that at most 1 observed node codes for each group.

I am using these versions of the packages:
The current HDDM version is: 0.8.0
The current kabuki version is: 0.6.5RC4
The current PyMC version is: 2.3.8
The current ArviZ version is: 0.15.1

@panwanke
Collaborator


I could not reproduce the error; everything works fine when I use real data. I suspect it has something to do with your data?

[screenshot: successful run of HDDMStimCoding with real data]

@shabnamhossein
Author

Can you tell me what the "grouping" in the error is referring to?
fail to convert posterior predictive check (self.ppc) to xarray: Supply a **grouping** so that at most 1 observed node codes for each group.

@panwanke
Collaborator


We have not traced the exact source of this error. To locate it, you can run the PPC step on its own: after fitting the model, call model.gen_ppc(n_ppc=500) or model.gen_ppc(n_ppc=500, parallel=False), then inspect the generated posterior predictive data via model.ppc. If that looks fine, try model.to_infdata(ppc=True) again and see whether the problem persists.
If model.gen_ppc doesn't work, you can try:

from kabuki.analyze import post_pred_gen

ppc = post_pred_gen(model, samples=500, parallel=False)
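If inspecting model.ppc reveals arrays with a stray trailing dimension (like the (900, 1) shape from the original report), flattening them before retrying the conversion may help. A generic sketch, assuming the PPC result can be treated as a mapping from node names to NumPy arrays (the helper and the mock data below are hypothetical, not kabuki's actual structure):

```python
import numpy as np

def flatten_ppc_arrays(ppc_dict):
    """Drop singleton axes so each array is 1-D, which is the shape
    the xarray conversion expects. `ppc_dict` is assumed to map node
    names to array-like PPC draws (a hypothetical structure)."""
    return {name: np.asarray(values).squeeze() for name, values in ppc_dict.items()}

# Mock PPC result shaped like the failing case from the first report
mock_ppc = {"wfpt": np.zeros((900, 1))}
fixed = flatten_ppc_arrays(mock_ppc)
print(fixed["wfpt"].shape)  # (900,)
```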
