Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Force CPU, useful when OpenCL bugs are present and prevent running #113

Open
wants to merge 15 commits into
base: main
Choose a base branch
from

Conversation

dccote
Copy link
Contributor

@dccote dccote commented Jul 31, 2024

General

The OpenCL code is fairly complex and needs better testing. This is a first pass at it, with a bug fix on the smoothNormal kernel

I started a new series of tests testsCLProgram.py and testGeneralOpenCL.py.

Three main bugs fixed:

  1. The smoothing kernel has been fixed: there was a compilation problem with global versus local pointers. I fixed it by assigning to a local variable first then assigning back to the global array upon return. (See below for details)
  2. In the case of an error in the kernel (e.g., new OpenCL version, new machine etc...) we do not want to code to fail, it should fall back to CPU code. However, I was unable to find the proper way of doing this, so I added an environment variable and a function call to force CPU calculations. There is now an option to force the calculation to use the CPU. There is also useHardwareAcceleration when calling propagate, but to be able to run the examples as they are, we need an option to disable OpenCL completely without modifying the code. In my case, the smoothingNormal kernel did not compile and everything failed (see below):
    1. environment variable PYTISSUE_FORCE_CPU=1 in your shell
    2. the function forceCalculationOnCPU()
  3. The test kernel in random.c does not work when CLObject is used, but does work in testGeneralOpenCL.py when the basic PyOpenCL structures are used. There may be an error in CLObject.

Smooth kernel bug details:

The kernel for smoothNormal and smoothNormals passes __global Intersection* pointers around, but one function (_findClosestPolygonIntersection) actually does not generate an intersection in global space and on my machine, the OpenCL kernel does not compile. By assigning to a local variable and assigning back to the global array worked.

dccote added 3 commits July 31, 2024 13:41
Removed macOS specific files and allowed for mulitple venvs with different endings (venv-3.x)
Some requirements did not work with python3.12, configobj was missing
Even when available, bugs in the OpenCL code can render the whole module useless if we cannot bypass OpenCL. The user can disable it:

1. with the environement variable PYTISSUE_FORCE_CPU = 1
2. with the function `forceCalculationOnCPU`

It is not perfect, but at least it works.
@dccote dccote requested a review from JLBegin July 31, 2024 18:33
@dccote dccote self-assigned this Jul 31, 2024
@dccote dccote requested a review from PyMarc2 July 31, 2024 18:33
@JLBegin
Copy link
Contributor

JLBegin commented Jul 31, 2024

There's already a useHardwareAcceleration parameter that can be used when instanciating a light source. Is that enough?

dccote added 2 commits July 31, 2024 16:13
The current bug on M2 with OpenCL is related to the smoothNormal kernel.  I added a check for an environment variable to avoid smoothing, but I also had to modify intersection.c and now some other tests don't pass.

The problem in intersection.c is related to passing __global pointers around but one function creates a __local INtersection, so passing the pointer makes no sense and the compiler complains.  We need to find a way to either pass Intersection directly( which I tried to do) or always refer to the __global intersections array.  I tried to make it work by passing Intersections around instead of passing the pointers, but not sure if this is functional (the tests don't pass but I dont have a proof they passed before).
@dccote
Copy link
Contributor Author

dccote commented Jul 31, 2024

There's already a useHardwareAcceleration parameter that can be used when instanciating a light source. Is that enough?

Somewhat: it is a parameter to the method propagate, and I really wanted to be able to run the examples without modifying them (and by default, they use hardwareAcceleration). I really preferred to have a method for completely short-circuiting it in case of bugs, like now: I still want the PyTissueOptics code to run, and now it could not because technically, OpenCL was available, but I really needed to bypass its detection.

@dccote
Copy link
Contributor Author

dccote commented Jul 31, 2024

I have doubts about the testCLSmoothing.py file. Are the smoothed values correct?

dccote added 6 commits August 1, 2024 11:30
1. CLProgram now checks immediately to see if program file exists
2. the fillRandomFloatBuffer kernel does not work as expected: same seeds give different resuts when using CLObject base classes, but fine when not using them.
3. New Buffer base class from which EmptyBuffer derives.
4. PLenty of OpenCL tests added (testGeneralOpenCL.py) to understand OpenCL
I figured out how to get the smoothNormals kernel to work: assing a local variable then assgning back to global array. The tests is fine now.
@dccote
Copy link
Contributor Author

dccote commented Aug 2, 2024

I have doubts about the testCLSmoothing.py file. Are the smoothed values correct?

I fixed it, the kernel for smoothing is fine now. (See other branch)

Test OpenCL code added to force_cpu branch
@dccote dccote changed the title Force CPU Force CPU, useful when OpenCL bugs are present and prevent running Aug 2, 2024
@dccote dccote closed this Aug 5, 2024
@dccote dccote deleted the force_cpu branch August 5, 2024 16:27
@dccote dccote restored the force_cpu branch August 5, 2024 16:38
@dccote dccote reopened this Aug 5, 2024
@@ -12,9 +13,16 @@
else:
CONFIG = None

def forceCalculationOnCPU():
os.environ["PYTISSUE_FORCE_CPU"] = "1"
# warnings.warn("You can define PYTISSUE_FORCE_CPU=1 in your profile to avoid this call.")
Copy link
Contributor

@JLBegin JLBegin Aug 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes a total of 3 different endpoints to disable OpenCL, and I prefer this new apprach, so I think we should remove the parameter useHardwareAcceleration from Source.

I don't think "forceCPU" is the proper formulation, OpenCL is hardware-agnostic.
What about "PYTISSUE_DISABLE_OPENCL" and disableOpenCL() ?

Commented lines could be removed.

def test_call_random_value_with_same_seeds_buffer_should_give_same_results(self):
"""
If we use the same seeds, then the two calculations should be the same.
This tests fails and should not, but the impact is minimal: the cude still runs fine
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was fixed right?

'vtk>=9.2.2',
'mayavi==4.8.1',
'mayavi>=4.8.1',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to bump mayavi >= 4.8.2 for traitsui 8 compatibility.
https://docs.enthought.com/mayavi/mayavi/auto/changes.html

for i, value in enumerate(valueBuffer):
self.assertEqual(value, i)

self.assertEqual(np.sum(local_sizes.hostBuffer), nWorkUnits)
Copy link
Contributor

@JLBegin JLBegin Aug 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I run into a few issues with these tests on windows and linux with OpenCL 3.0. Same results on an old python 3.8 env and fresh 3.10.

  • Line 491: AttributeError: 'Array' object has no attribute 'hostBuffer'
  • Line 425:
   -->  self.assertEqual(i%32, value)
AssertionError: 0 != cl.Array(32, dtype=uint32)
  • And on windows only:
    Line 427: test_trivial_kernel_local_id_set_workgroup_size : exit code 0xC0000094 (Integer division by zero)

setup.py Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants