Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MAINT: updates for numpy2 closes #509 #510

Merged
merged 5 commits into from
Apr 5, 2024
Merged

MAINT: updates for numpy2 closes #509 #510

merged 5 commits into from
Apr 5, 2024

Conversation

andyfaff
Copy link
Contributor

@andyfaff andyfaff commented Apr 4, 2024

Starts to address #509

  • bumps the Python versions being tested to a minimum of cp39. This might have a mixed reception...
  • makes a GHA entry for testing with numpy>=2.0.0rc1
  • changes the usage of np.VisibleDeprecationWarning to np.exceptions.VisibleDeprecationWarning.

@andyfaff
Copy link
Contributor Author

andyfaff commented Apr 4, 2024

@MuellerSeb can you provide a minimal working example that would exercise the broadcasting issue in #509 (comment)?

@MuellerSeb
Copy link
Contributor

MuellerSeb commented Apr 5, 2024

Here it is (works with np 1.* but fails with np 2.0.0rc1):

import emcee as mc
import numpy as np


def ln_pdf(x):
    return np.log(np.sqrt(np.pi) * np.exp(-((x / 2.0) ** 2)))


init = np.random.rand(50, 1)
sampler = mc.EnsembleSampler(50, 1, ln_pdf)
burn_in_state = sampler.run_mcmc(initial_state=init, nsteps=20)

This gives:

ValueError: could not broadcast input array from shape (50,0) into shape (50,)

@MuellerSeb
Copy link
Contributor

In backend.py in L227, i added some print statements to look at the shapes:

        if state.blobs is not None:
            print(self.blobs.shape)
            print(state.blobs.shape)
            self.blobs[self.iteration, :] = state.blobs

With np 1.x, I get

(20, 50, 0)
(50, 0)

With np 2 I get:

(20, 50)
(50, 0)

And self.blobs looks like this:

[[b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b''
  b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b''
  b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'']
 [b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b''
  b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b''
  b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'']
 [b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b''
  b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b''
  b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'']
 [b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b''
  b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b''
  b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'']
 [b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b''
  b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b''
  b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'']
 [b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b''
  b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b''
  b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'']
 [b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b''
  b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b''
  b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'']
 [b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b''
  b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b''
  b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'']
 [b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b''
  b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b''
  b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'']
 [b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b''
  b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b''
  b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'']
 [b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b''
  b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b''
  b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'']
 [b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b''
  b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b''
  b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'']
 [b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b''
  b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b''
  b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'']
 [b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b''
  b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b''
  b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'']
 [b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b''
  b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b''
  b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'']
 [b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b''
  b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b''
  b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'']
 [b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b''
  b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b''
  b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'']
 [b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b''
  b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b''
  b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'']
 [b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b''
  b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b''
  b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'']
 [b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b''
  b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b''
  b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'' b'']]

So self.blobs is not correctly generated with numpy 2.

@MuellerSeb
Copy link
Contributor

And this seems to come from L179 in backend.py:

        if blobs is not None:
            dt = np.dtype((blobs.dtype, blobs.shape[1:]))
            a = np.empty((i, self.nwalkers), dtype=dt)

Here, a has shape (20, 50, 0) for np 1.x and (20, 50) with numpy 2.

@MuellerSeb
Copy link
Contributor

MuellerSeb commented Apr 5, 2024

So I guess the special dtype is the root of this error: np.dtype((blobs.dtype, blobs.shape[1:]))
dt=dtype(('<f8', (0,))) for both numpy versions but self.blobs.dtype=dtype('float64') for numpy 1.x and self.blobs.dtype=dtype('V') for numpy 2.

@MuellerSeb
Copy link
Contributor

@MuellerSeb
Copy link
Contributor

Here is what ChatGPT says about the problem:

The Problem

In NumPy 1.x, creating an array with a dtype shaped like (datatype, (0,)) and then attempting to concatenate it along a new axis could lead to an array that still retains the original datatype (float64 in your case), but with an additional dimension of size 0. This behavior effectively results in an array with a shape that includes a 0-sized dimension, which might not be what was intended but works due to how NumPy 1.x handles such cases.

In NumPy 2, the behavior seems to have changed to more strictly interpret the dtype, leading to the creation of an array where the dtype is V (void) or similar, representing a block of memory of the specified size but without a direct numerical datatype associated with it. This change reflects a stricter interpretation of the dtype specification, particularly around structured arrays and zero-sized dimensions.

The Solution

To address this issue and ensure compatibility with both NumPy 1 and 2, you should revise the approach to creating and manipulating these arrays. If the goal is to have self.blobs be a structured array with dimensions that match blobs but starting empty, you need to reconsider how to define dt and how to initialize a.

If blobs.shape[1:] results in a (0,) shape and you still wish to use this structure for consistency, consider explicitly handling this case or rethinking the necessity of the zero-sized dimension. It might be more straightforward to avoid creating a dtype that includes a zero-sized shape unless it's essential for your application.

For applications that don't strictly require the structured array with a zero-sized dimension, you can simplify the dtype definition. If blobs is expected to have additional dimensions that are meaningful, use those directly. If the zero-sized dimension is not essential, you might avoid including it in your dtype definition.

Here's a conceptual fix, assuming you want to preserve the non-zero dimensions of blobs:

if blobs is not None:
    # If blobs.shape[1:] is (0,), this needs handling or reconsideration
    if blobs.shape[1:]:  # This checks if there are meaningful dimensions beyond the first
        dt = np.dtype((blobs.dtype, blobs.shape[1:]))
    else:
        # Adjust the dtype definition to suit your needs when blobs.shape[1:] is (0,)
        dt = blobs.dtype  # Or another appropriate adjustment

    a = np.empty((i, self.nwalkers), dtype=dt)

    if self.blobs is None:
        self.blobs = a
    else:
        self.blobs = np.concatenate((self.blobs, a), axis=0)

This adjustment ensures you're consciously deciding on the dtype structure based on the presence of meaningful dimensions in blobs. If handling zero-sized dimensions is crucial for your application, you may need to include conditional logic to manage these cases explicitly, aligning with the behaviors of NumPy 2.

Given the migration to NumPy 2 and its stricter behaviors, it's also beneficial to consult the NumPy 2.0 migration guide for specific guidance on handling dtype and structured array changes.

@MuellerSeb
Copy link
Contributor

MuellerSeb commented Apr 5, 2024

This did the trick for me:

            if not np.any(blobs.shape[1:]):
                dt = blobs.dtype
                a = np.empty((i, self.nwalkers) + blobs.shape[1:], dtype=dt)
            else:
                dt = np.dtype((blobs.dtype, blobs.shape[1:]))
                a = np.empty((i, self.nwalkers), dtype=dt)

Should work with both, np 1.x and np 2.

@MuellerSeb
Copy link
Contributor

Maybe it is sufficient to always use:

a = np.empty((i, self.nwalkers) + blobs.shape[1:], dtype=blobs.dtype)

To not depend on this custom dtype.

Copy link
Contributor

@MuellerSeb MuellerSeb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would only implement compat fixes to support numpy 1 and 2 and not touch the minimal required python version. The added check job in CI is a good idea until numpy 2 is out.

@@ -16,7 +16,7 @@ jobs:
runs-on: ${{ matrix.os }}
strategy:
matrix:
python-version: ["3.7", "3.8", "3.9", "3.10"]
python-version: ["3.9", "3.10", "3.11", "3.12"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would only add 3.11 and 3.12 and not remove 3.7 (maybe because EOL) and 3.8.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The leading_edge job either way only runs on py 3.12 and numpy 2 wont be available for py 3.8 and lower.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll go with what @dfm decides.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to test 3.9-3.12 as implemented in this PR. Thanks!!

@MuellerSeb
Copy link
Contributor

See: andyfaff#1

@dfm
Copy link
Owner

dfm commented Apr 5, 2024

Thanks for figuring this all out so quickly, @andyfaff! Much appreciated!

@dfm dfm merged commit 239bd3b into dfm:main Apr 5, 2024
13 checks passed
@MuellerSeb
Copy link
Contributor

Maybe it is sufficient to always use:

a = np.empty((i, self.nwalkers) + blobs.shape[1:], dtype=blobs.dtype)

To not depend on this custom dtype.

This issue is still unsolved and I think my suggested fix works with numpy 1 and 2. @dfm what do you think?

Or did I use emcee in a wrong way?

@andyfaff
Copy link
Contributor Author

andyfaff commented Apr 5, 2024

The fixes applied here addressed the original issue (for np1 and 2). Your comments around dtypes were unclear to me. Was there another bug you found?

I tried out the fixes here with returning a blob and the blobs were stored correctly in the state.

@MuellerSeb
Copy link
Contributor

@andyfaff sorry! I misinterpreted the _scalar fix. Everything is working now. Sorry for the spam then!

@MuellerSeb
Copy link
Contributor

@dfm could you create a new release including these fixes?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants