Bug: _select_device_and_dtype does not work on virtualized hardware #232

ibevers · 2025-01-02T20:35:08Z

Description

I ran into this issue because macOS tests were passing locally but failing on GitHub Actions. The cause is that _select_device_and_dtype selected MPS on the macOS GitHub Actions runner, but the runner virtualizes macOS, so no MPS device is actually available.

See details here:
#215

Steps to Reproduce

Run code with DeviceType.MPS on a GitHub Actions macOS runner (ARM).

Expected Results

The tests will segfault.

Actual Results

The tests segfaulted.

Additional Notes

No response

fabiocat93 · 2025-01-09T23:09:19Z

Isn't this more of a GitHub limitation? https://docs.github.com/en/actions/using-github-hosted-runners/using-github-hosted-runners/about-github-hosted-runners#limitations-for-arm64-macos-runners

ibevers · 2025-01-10T22:26:56Z

@fabiocat93 This could come up in other contexts though too. Perhaps we could add a try/except that tries to run something on MPS in the function? We could do the same thing for GPU.

if torch.backends.mps.is_available():
    try:
        torch.empty(0, device="mps")
        available_devices.append(DeviceType.MPS)
    except Exception as e:
        print(f"MPS is available but encountered an error: {e}")

fabiocat93 · 2025-01-11T20:14:40Z

Have you tested this on the GitHub runner? Does it solve the issue you describe above? If yes, it looks like a good solution to me. I agree with you that we should apply the same approach to GPU. Also, I would recommend adding a comment explaining the rationale behind these edits.
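
For reference, a minimal sketch of how the same probe could cover both MPS and CUDA, with the rationale in comments. The names usable_devices and torch_probes are hypothetical and not taken from the repository; the only PyTorch calls assumed are torch.cuda.is_available, torch.backends.mps.is_available, and torch.empty.

```python
from typing import Callable, Dict, List, Tuple

# Each probe pairs "does the backend claim to be available?" with
# "does a trivial allocation on it actually succeed?".
Probe = Tuple[Callable[[], bool], Callable[[], object]]


def usable_devices(probes: Dict[str, Probe]) -> List[str]:
    """Return device names that report available AND pass a trivial allocation.

    Rationale: on virtualized hardware (e.g. a hosted macOS CI runner) a
    backend can report itself as available even though no device can be used.
    """
    usable = []
    for name, (is_available, allocate) in probes.items():
        if not is_available():
            continue
        try:
            allocate()  # cheap smoke test, e.g. an empty tensor on the device
        except Exception as exc:
            print(f"{name} is available but encountered an error: {exc}")
            continue
        usable.append(name)
    return usable


def torch_probes() -> Dict[str, Probe]:
    """Hypothetical wiring for PyTorch; torch is imported lazily."""
    import torch

    return {
        "cuda": (torch.cuda.is_available, lambda: torch.empty(0, device="cuda")),
        "mps": (torch.backends.mps.is_available, lambda: torch.empty(0, device="mps")),
    }
```

With PyTorch installed, usable_devices(torch_probes()) would return only the backends that pass the allocation test. One caveat: try/except catches Python exceptions only, so if the allocation crashes the interpreter outright, the probe would have to run in a separate process.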

ibevers · 2025-01-13T15:18:09Z

@fabiocat93 I haven't tested it yet; I just wanted to see what you thought of the approach in concept. I will make a proper PR and test it. Thanks for your feedback!

ibevers added the "bug" label Jan 2, 2025

ibevers mentioned this issue Jan 2, 2025 in Add Word-Level Alignment #215 (Open)

ibevers linked a pull request Jan 13, 2025 that will close this issue: Test Device Truly Available by Creating Trivial Object #234 (Merged)

fabiocat93 added the "help wanted" label Jan 15, 2025

ibevers self-assigned this Jan 21, 2025

fabiocat93 closed this as completed in #234 Jan 22, 2025
