Add a fallback distribution in GaussianNormalizer in case the given distribution fails #945

Open
npatki opened this issue Feb 26, 2025 · 0 comments
Labels
feature request Request for a new feature

Comments

npatki (Contributor) commented Feb 26, 2025

Problem Description

The GaussianNormalizer RDT estimates the shape of a marginal distribution using scipy, then applies the fitted distribution's CDF followed by the inverse CDF of a standard normal, mapping the data to a normal distribution.

As noted in SDV #2391, the requested scipy distribution may sometimes fail to converge for reasons that are outside of our control. In such cases, GaussianCopula typically falls back to using the normal distribution.

It would be nice if GaussianNormalizer could do the same. That way, any synthesizer that depends on it can proceed without crashing.
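For context, the overall transform is conceptually equivalent to the standalone sketch below. This is illustrative only, not RDT's actual implementation; the choice of a beta marginal and the clipping epsilon are assumptions for the example.

# Minimal sketch of the CDF / inverse-CDF ("Gaussian normalization") idea.
import numpy as np
from scipy import stats

data = np.random.beta(a=2.0, b=5.0, size=1000)

# Fit the requested marginal distribution with scipy.
# This is the step that can fail to converge.
a, b, loc, scale = stats.beta.fit(data)

# Map the data to a standard normal: z = Phi^{-1}(F(x)).
u = stats.beta.cdf(data, a, b, loc=loc, scale=scale)
z = stats.norm.ppf(np.clip(u, 1e-9, 1 - 1e-9))  # clip to avoid +/- infinity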

Expected behavior

Update GaussianNormalizer to fall back to using 'norm' if something goes wrong during fitting. The fallback distribution cannot officially be changed by the user (via a parameter), but we should keep it as an attribute so it's accessible:

>>> transformer = GaussianNormalizer(distribution='beta')
>>> transformer._fallback_distribution
'norm'
  • During fit, we should use the _fallback_distribution if anything goes wrong in scipy. If this happens, log it (logger.INFO) the same way that we do for Copulas (see the sketch after the example below).
  • After fitting, it should be possible to access the learned parameters via an attribute. The returned values should be similar to GaussianCopulaSynthesizer's get_learned_distributions function.
  • If the fallback is used, make sure to update the name of the distribution from the originally requested distribution (e.g. 'beta') to the fallback distribution (e.g. 'norm').
>>> transformer.learned_distribution
{
  'distribution': 'norm',
  'parameters': { 'loc': 2.100, 'scale': 0.12121 } # these keys vary based on the distribution name
}
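Putting the pieces together, here is a minimal sketch of the proposed fallback behavior. Apart from the distribution parameter, _fallback_distribution, and learned_distribution quoted above, everything is an assumption: the class name, the DISTRIBUTIONS map, and the exception handling are hypothetical placeholders, not RDT's real internals.

# Sketch of the proposed fallback logic; illustrative assumptions only.
import logging
from scipy import stats

LOGGER = logging.getLogger(__name__)

# Hypothetical name-to-scipy mapping for the example.
DISTRIBUTIONS = {'norm': stats.norm, 'beta': stats.beta, 'gamma': stats.gamma}

class GaussianNormalizerSketch:
    _fallback_distribution = 'norm'  # fixed; not user-configurable

    def __init__(self, distribution='norm'):
        self.distribution = distribution
        self.learned_distribution = None

    def fit(self, data):
        name = self.distribution
        dist = DISTRIBUTIONS[name]
        try:
            params = dist.fit(data)
        except Exception:
            # Fall back to 'norm' and log at INFO, mirroring Copulas' behavior.
            LOGGER.info(
                "Unable to fit the distribution '%s'; using '%s' instead.",
                name, self._fallback_distribution,
            )
            name = self._fallback_distribution
            dist = DISTRIBUTIONS[name]
            params = dist.fit(data)

        # Record the distribution actually used (the fallback name if it was
        # applied), plus the learned parameters keyed by scipy's param names.
        param_names = (dist.shapes.split(', ') if dist.shapes else []) + ['loc', 'scale']
        self.learned_distribution = {
            'distribution': name,
            'parameters': dict(zip(param_names, params)),
        }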

Additional Context

This would resolve the CopulaGANSynthesizer bug in SDV #2391.
