
Choleski-Decomposition #1068

Closed
Koenig128 opened this issue Nov 9, 2023 · 10 comments

@Koenig128

Koenig128 commented Nov 9, 2023

Dear Jacob,

I ran into a problem with DenseHMM, and I saw that some previous issues dealt with similar problems - but because of your recent rewrite, I am not sure whether those were fixed or whether I am doing something wrong. I would appreciate it if you could take a look.

A minimal example:

import numpy as np
import pandas as pd
from pomegranate.hmm import DenseHMM
from pomegranate.distributions import Normal

np.random.seed(42)
data = np.random.rand(20000)
df = pd.DataFrame(data, columns=['value'])
sin_values = np.sin(2 * np.pi * df.index / 24)
cos_values = np.cos(2 * np.pi * df.index / 24)
sin_normalized = (sin_values - sin_values.min()) / (sin_values.max() - sin_values.min())
cos_normalized = (cos_values - cos_values.min()) / (cos_values.max() - cos_values.min())

df['sinday'] = sin_normalized
df['cosday'] = cos_normalized

data_array = df.values
tensor_3d = data_array[np.newaxis, :, :]

model = DenseHMM([Normal(), Normal(), Normal(), Normal(), Normal(),
                 Normal(), Normal(), Normal(), Normal(), Normal()], max_iter=10, verbose=True)
model.fit(tensor_3d)

The sine and cosine features encode time-of-day information.

I am getting the following error:
_LinAlgError: linalg.cholesky: The factorization could not be completed because the input is not positive-definite (the leading minor of order 3 is not positive-definite).

I do not quite understand the problem. It works (sometimes) if I reduce the number of states:

model = DenseHMM([Normal(), Normal(), Normal(), Normal(), Normal()],
                 max_iter=10, verbose=True)
model.fit(tensor_3d)

But I would need more states because my real data are more complex. Do you have any suggestions?

Thank you very much in advance!

@jmschrei
Owner

jmschrei commented Nov 9, 2023

This is a recurring problem. Basically, when you have many more states than you need to model your data, some states do not get assigned many/any points because the rest of the states model the data well. Having too few examples can cause an issue with the Cholesky decomposition if there isn't enough variance among them. I know scikit-learn gets around this using some trick -- I need to get around to implementing that. In the meantime, it's basically a sign that your model has too many parameters. If you had more complex real world data you'd be less likely to see this error because the number of states you chose would be needed to capture the heterogeneity of it.
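The failure mode Jacob describes can be sketched in a few lines of numpy (an illustration of the idea, not pomegranate's internals): a state that only ever sees (near-)identical points ends up with a singular covariance matrix, which the Cholesky factorization rejects.

```python
import numpy as np

# A state assigned only identical points has zero variance, so its
# empirical covariance matrix is singular rather than positive-definite.
points = np.tile([1.0, 2.0], (5, 1))   # five copies of the same 2D point
cov = np.cov(points, rowvar=False)     # -> all-zeros 2x2 matrix

try:
    np.linalg.cholesky(cov)
except np.linalg.LinAlgError as exc:
    print("Cholesky failed:", exc)
```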

@AKuederle
Contributor

AKuederle commented Nov 9, 2023

I just had a look at the sklearn code out of interest. They basically add a small value (reg_covar) to the terms of the covariance matrix. For example, see here: https://github.com/scikit-learn/scikit-learn/blob/093e0cf14aff026cca6097e8c42f83b735d26358/sklearn/mixture/_gaussian_mixture.py#L153

This value is set to 1e-6 by default:

    reg_covar : float, default=1e-6
        Non-negative regularization added to the diagonal of covariance.
        Allows to assure that the covariance matrices are all positive.

(from https://github.com/scikit-learn/scikit-learn/blob/093e0cf14aff026cca6097e8c42f83b735d26358/sklearn/mixture/_gaussian_mixture.py#L510)

In case the decomposition still fails, the error message suggests either reducing the number of states or increasing the value of reg_covar.
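The effect of that regularization is easy to demonstrate in numpy (a sketch of the idea, not sklearn's actual code path): adding reg_covar to the diagonal shifts every eigenvalue up by that amount, so even a fully degenerate covariance matrix becomes positive-definite.

```python
import numpy as np

reg_covar = 1e-6           # sklearn's default, from the docstring above
cov = np.zeros((3, 3))     # degenerate covariance of an "empty" state

# Regularizing the diagonal makes the matrix positive-definite,
# so the Cholesky factorization now succeeds.
L = np.linalg.cholesky(cov + reg_covar * np.eye(3))
print(np.allclose(L @ L.T, reg_covar * np.eye(3)))  # True
```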

@jmschrei
Owner

jmschrei commented Nov 9, 2023

Yeah that's what I thought... I'll try adding that in soon. In the meantime, pomegranate is modular enough that you can just copy/paste the Normal class, add this in yourself, and drop those objects into native pomegranate objects.

@AKuederle
Contributor

AKuederle commented Nov 9, 2023

If I understood correctly, it should be possible to subclass Normal and just add a small value to the covs calculated in from_summaries:

from pomegranate.distributions import Normal
from pomegranate._utils import _update_parameter  # pomegranate's internal helper


class StableNormal(Normal):

	def from_summaries(self):
		"""Update the model parameters given the extracted statistics.

		This method uses calculated statistics from calls to the `summarize`
		method to update the distribution parameters. Hyperparameters for the
		update are passed in at initialization time.

		Note: Internally, a call to `fit` is just a successive call to the
		`summarize` method followed by the `from_summaries` method.
		"""

		if self.frozen == True:
			return

		means = self._xw_sum / self._w_sum

		if self.covariance_type == 'full':
			v = self._xw_sum.unsqueeze(0) * self._xw_sum.unsqueeze(1)
			covs = self._xxw_sum / self._w_sum - v / self._w_sum ** 2.0

		elif self.covariance_type in ['diag', 'sphere']:
			covs = self._xxw_sum / self._w_sum - \
				self._xw_sum ** 2.0 / self._w_sum ** 2.0
			if self.covariance_type == 'sphere':
				covs = covs.mean(dim=-1)

		# This is the magic :)
		covs += 1e-6

		_update_parameter(self.means, means, self.inertia)
		_update_parameter(self.covs, covs, self.inertia)
		self._reset_cache()

@jmschrei
Owner

jmschrei commented Nov 9, 2023

Yes, that's fine too. I was thinking you might want to add a user-defined covariance but hardcoding some small number is fine too if it works.

@jmschrei
Owner

jmschrei commented Nov 9, 2023

Sorry, but to be clear you probably want to add covs += numpy.eye(covs.shape[0]) * 1e-6 under the full covariance type, right? You just want to add it to the diagonal of the covariance matrix, not every element.
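The difference is quick to check in numpy (an illustration of Jacob's point, not pomegranate code): a scalar += bumps every entry, including the off-diagonal covariances, while adding a scaled identity leaves the off-diagonal structure untouched.

```python
import numpy as np

cov = np.array([[1.0, 0.9],
                [0.9, 1.0]])

everywhere = cov + 1e-6             # perturbs the off-diagonal terms too
diag_only = cov + np.eye(2) * 1e-6  # only shifts the diagonal

print(everywhere[0, 1])  # 0.900001 -- correlation structure changed
print(diag_only[0, 1])   # 0.9      -- correlation structure preserved
```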

@AKuederle
Contributor

> Yes, that's fine too. I was thinking you might want to add a user-defined covariance but hardcoding some small number is fine too if it works.

Yes, that would of course be the more elegant version. On that note, the distribution has a parameter called min_cov which seems to be unused. Was this maybe intended for exactly this workaround?

> Sorry, but to be clear you probably want to add covs += numpy.eye(covs.shape[0]) * 1e-6 under the full covariance type, right? You just want to add it to the diagonal of the covariance matrix, not every element.

Ah yes, you are right, I misread the scikit-learn code! Only adding to the diagonal makes way more sense.

@Koenig128
Author

Dear Arne and Jacob,

Thank you very much for your clarifications and helpful suggestions! I really appreciate your efforts!

@AKuederle
Contributor

Btw., is this a "duplicate" of #1039?

@jmschrei
Owner

Not exactly a duplicate. min_cov doesn't necessarily fix the issue but could. Regardless, min_cov should be added in too.
