Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] Creating a new Bayesian Regressor with PyMC as a backend #358

Merged
merged 56 commits into from
Nov 16, 2024

Conversation

meraldoantonio
Copy link
Contributor

@meraldoantonio meraldoantonio commented May 23, 2024

Reference Issues/PRs

#7

What does this implement/fix? Explain your changes.

This WIP PR implements a Bayesian Linear Regressor with PyMC as a backend

Does your contribution introduce a new dependency? If yes, which one?

Yes - it depends on PyMC family: PyMC itself, XArray and ArviZ

What should a reviewer concentrate their feedback on?

The design of the BayesianLinearRegressor. Especially:

  1. The introduction of the priors. For now, the class hardcodes the priors. We need to think about the way in which the users should inject their own priors.

Did you add any tests for the change?

Not yet

Any other comments?

N/A

PR checklist

For all contributions
  • I've added myself to the list of contributors with any new badges I've earned :-)
    How to: add yourself to the all-contributors file in the skpro root directory (not the CONTRIBUTORS.md). Common badges: code - fixing a bug, or adding code logic. doc - writing or improving documentation or docstrings. bug - reporting or diagnosing a bug (get this plus code if you also fixed the bug in the PR).maintenance - CI, test framework, release.
    See here for full badge reference
  • [ X] The PR title starts with either [ENH], [MNT], [DOC], or [BUG]. [BUG] - bugfix, [MNT] - CI, test framework, [ENH] - adding or improving code, [DOC] - writing or improving documentation or docstrings.
For new estimators

(This is not yet done)

  • I've added the estimator to the API reference - in docs/source/api_reference/taskname.rst, follow the pattern.
  • I've added one or more illustrative usage examples to the docstring, in a pydocstyle compliant Examples section.
  • If the estimator relies on a soft dependency, I've set the python_dependencies tag and ensured
    dependency isolation, see the estimator dependencies guide.

Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@meraldoantonio meraldoantonio marked this pull request as draft May 23, 2024 17:11
fkiraly added a commit that referenced this pull request May 25, 2024
This PR removes the legacy base modules.

* base class: equivalent functionality is now contained in
`BaseDistribution`, `BaseProbaRegressor`, `_DelegatedProbaRegressor`
* pymc vendor interface: currently worked on in
#358
* density estimation: tracked via
#7
# Priors for unknown model parameters
self.intercept = pm.Normal("intercept", mu=self.intercept_mu, sigma=self.intercept_sigma)
self.slopes = pm.Normal("slopes", mu=self.slopes_mu, sigma=self.slopes_sigma, shape = self._X.shape[1], dims=("pred_id"))
self.noise = pm.HalfNormal("noise", sigma=self.noise_sigma)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would inverse gamma not be more standard here, as it is conjugate to the normal?

@meraldoantonio
Copy link
Contributor Author

meraldoantonio commented Oct 9, 2024

is this PR ready for merge?

Yes it is!

I would really suggest to chunk PRs in smaller, self-contained additions, so we can merge your contributions more quickly and PR do not get too large. For instance, update we can add later and it should not "hurt" (or would it?)

Sure, noted! Yes, you're right, we can add the update method in later PRs

Also, I've documented the logic behind this class in this PR, could you please take a look at this when you have time? Thank you!

@fkiraly
Copy link
Collaborator

fkiraly commented Oct 10, 2024

is this PR ready for merge?

Yes it is!

Then you should adjust the tagging/signposting:

  • remove the "WIP" tag from the title
  • turn the PR form draft to ready (top right side menu)
  • "re-request a review", press circling arrows next to dev pictures who have already reviewed

@fkiraly
Copy link
Collaborator

fkiraly commented Oct 10, 2024

PS: there is a conflict in the pyproject.toml, as your PR does not seem up to date with the recent release. Fixing is as usual: sync your fork, and merge main into your branch.

@meraldoantonio meraldoantonio changed the title [ENH] (WIP) Creating a new Bayesian Regressor with PyMC as a backend [ENH] Creating a new Bayesian Regressor with PyMC as a backend Oct 10, 2024
@meraldoantonio meraldoantonio marked this pull request as ready for review October 10, 2024 15:39
@meraldoantonio
Copy link
Contributor Author

PS on the update method: Based on my research, there isn't an ideal method for performing update using PyMC. There are 2 possible routes, however, both have significant limitations:

  1. Rebuilding the prior from the posterior using marginal 1D empirical approximation: This approach, described in the PyMC documentation, works fine when dealing with a single variable. However, if we have multiple variables (such as in our case, where we have an intercept and a slope per feature), this method is problematic because it loses the correlation structure between variables.

  2. Rebuilding the prior using a multivariate normal approximation: Another approach is to approximate the posterior as a multivariate normal distribution, as discussed in the PyMC Experimental library. While this method retains the correlation between variables, it assumes normality, which may not hold in cases where the posterior is multimodal or skewed.

Instead of introducing a potentially misleading update method, I think it would be better to omit it at this stage

Copy link
Collaborator

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks great!

There are some testing issues:

  • the failure on 3.9 still seems there, even though we clearly set the tag. Could you check if you can understand what is going on there?
  • get_test_params should be populated with at least two examples.

@meraldoantonio
Copy link
Contributor Author

The code looks great!

There are some testing issues:

  • the failure on 3.9 still seems there, even though we clearly set the tag. Could you check if you can understand what is going on there?
  • get_test_params should be populated with at least two examples.

I'm not sure what's going on with the failing tests with Python 3.9, I've tried playing around with the versions but somehow the test suite ignores this tag :(

I've added examples to get_test_param

@fkiraly
Copy link
Collaborator

fkiraly commented Nov 9, 2024

Summarizing our collaborative coding session on November 8, we have found out what the problem is in the failures on python 3.9, namely a known issue with the dependency checking framework that is already fixed in sktime and skbase.

The todo is summarized in #489, and should happen in a separate pull request.

@fkiraly
Copy link
Collaborator

fkiraly commented Nov 16, 2024

oops, looks like this got accidentally closed, I reopened.

The reason for this is, you accidentally used a reserved "closing keyword" in #490, that is, you wrote "fixes #358" in the preamble. If you do this, then the mentioned issue or PR gets closed once the PR is merged! I did not know btw that this also applied to PR, not just to issues.

Copy link
Collaborator

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good now. I fixed a merge conflict in pyproject, somehow the dead deps tabulate and uncertainties returned.

@fkiraly fkiraly merged commit 09cd723 into sktime:main Nov 16, 2024
36 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement module:regression probabilistic regression module
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

2 participants