Add a fallback distribution in GaussianNormalizer in case the given distribution fails #945

Open
npatki opened this issue Feb 26, 2025 · 0 comments
Labels
feature request Request for a new feature

Comments

npatki (Contributor) commented Feb 26, 2025

Problem Description

The GaussianNormalizer RDT estimates the shape of a marginal distribution using scipy, then applies the fitted distribution's CDF followed by the inverse CDF of a standard normal, mapping the data to a normal distribution.

As noted in SDV #2391, the requested scipy distribution may sometimes fail to converge for reasons that are outside of our control. In such cases, GaussianCopula typically falls back to using the normal distribution.

It would be nice if GaussianNormalizer could do the same. That way, any synthesizer that depends on it can proceed without crashing.
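For context, the overall transform is conceptually equivalent to the standalone sketch below. This is illustrative only, not RDT's actual implementation; the choice of a beta marginal and the clipping epsilon are assumptions for the example.

# Minimal sketch of the CDF / inverse-CDF ("Gaussian normalization") idea.
import numpy as np
from scipy import stats

data = np.random.beta(a=2.0, b=5.0, size=1000)

# Fit the requested marginal distribution with scipy.
# This is the step that can fail to converge.
a, b, loc, scale = stats.beta.fit(data)

# Map the data to a standard normal: z = Phi^{-1}(F(x)).
u = stats.beta.cdf(data, a, b, loc=loc, scale=scale)
z = stats.norm.ppf(np.clip(u, 1e-9, 1 - 1e-9))  # clip to avoid +/- infinity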

Expected behavior

Update GaussianNormalizer to fall back to using 'norm' if something goes wrong during fitting. The fallback distribution cannot officially be changed by the user (via a parameter), but we should keep it as an attribute so it's accessible:

>>> transformer = GaussianNormalizer(distribution='beta')
>>> transformer._fallback_distribution
'norm'
  • During fit, we should use the _fallback_distribution if anything goes wrong in scipy. If this happens, log it (logger.INFO) the same way that we do for Copulas (see the sketch after the example below).
  • After fitting, it should be possible to access the learned parameters via an attribute. The returned values should be similar to GaussianCopulaSynthesizer's get_learned_distributions function.
  • If the fallback is used, make sure to update the name of the distribution from the originally requested distribution (e.g. 'beta') to the fallback distribution (e.g. 'norm').
>>> transformer.learned_distribution
{
  'distribution': 'norm',
  'parameters': { 'loc': 2.100, 'scale': 0.12121 } # these keys vary based on the distribution name
}
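Putting the pieces together, here is a minimal sketch of the proposed fallback behavior. Apart from the distribution parameter, _fallback_distribution, and learned_distribution quoted above, everything is an assumption: the class name, the DISTRIBUTIONS map, and the exception handling are hypothetical placeholders, not RDT's real internals.

# Sketch of the proposed fallback logic; illustrative assumptions only.
import logging
from scipy import stats

LOGGER = logging.getLogger(__name__)

# Hypothetical name-to-scipy mapping for the example.
DISTRIBUTIONS = {'norm': stats.norm, 'beta': stats.beta, 'gamma': stats.gamma}

class GaussianNormalizerSketch:
    _fallback_distribution = 'norm'  # fixed; not user-configurable

    def __init__(self, distribution='norm'):
        self.distribution = distribution
        self.learned_distribution = None

    def fit(self, data):
        name = self.distribution
        dist = DISTRIBUTIONS[name]
        try:
            params = dist.fit(data)
        except Exception:
            # Fall back to 'norm' and log at INFO, mirroring Copulas' behavior.
            LOGGER.info(
                "Unable to fit the distribution '%s'; using '%s' instead.",
                name, self._fallback_distribution,
            )
            name = self._fallback_distribution
            dist = DISTRIBUTIONS[name]
            params = dist.fit(data)

        # Record the distribution actually used (the fallback name if it was
        # applied), plus the learned parameters keyed by scipy's param names.
        param_names = (dist.shapes.split(', ') if dist.shapes else []) + ['loc', 'scale']
        self.learned_distribution = {
            'distribution': name,
            'parameters': dict(zip(param_names, params)),
        }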

Additional Context

This would resolve the CopulaGANSynthesizer bug in SDV #2391.
