DX solver function accelerateParticles looks wrong #47

Open
DaveC79 opened this issue Sep 3, 2019 · 1 comment

DaveC79 commented Sep 3, 2019

The CUDA version of that function updates one float per thread, so threads 0, 1, 2, 3 update floats X, Y, Z, W of particle 0.
The DX version looks like a straight port of that, but it uses float4, so threads 0, 1, 2, 3 each read/update/write all of XYZW for the same particle at the same time. Since thread 3 has a zero value for sqrIterDt, you've got 4 threads writing different values back to the same location.

sqrIterDt should probably be changed to
float4 sqrIterDt = float4(gFrameData.mIterDt * gFrameData.mIterDt, gFrameData.mIterDt * gFrameData.mIterDt, gFrameData.mIterDt * gFrameData.mIterDt, 0.0f);
and all the "*4" and "/4" calculations removed, leaving it as one thread per particle (see the sketch below). Making it one thread per float would instead require additional get/set operators for curParticles.
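For illustration, here is a minimal sketch of what the one-thread-per-particle variant could look like. Only gFrameData.mIterDt and curParticles are names taken from this issue; FrameData's layout, mGravity, gNumParticles, BLOCK_SIZE, and the meaning of the w component are placeholder assumptions, not the actual NvCloth DX solver declarations.

```hlsl
// Sketch only: buffer layouts and names other than gFrameData.mIterDt and
// curParticles are assumptions made for this example.
#define BLOCK_SIZE 1024

struct FrameData
{
    float  mIterDt;
    float4 mGravity;    // assumed acceleration term applied each iteration
};

cbuffer cbFrame : register(b0)
{
    FrameData gFrameData;
    uint      gNumParticles;    // assumed particle count
};

// xyz = position, w = inverse mass / sleep flag (assumed)
RWStructuredBuffer<float4> curParticles : register(u0);

[numthreads(BLOCK_SIZE, 1, 1)]
void accelerateParticles(uint3 dtid : SV_DispatchThreadID)
{
    uint i = dtid.x;                 // one thread per particle, no "*4" / "/4" indexing
    if (i >= gNumParticles)
        return;

    // w of sqrIterDt is zero, so whatever the solver stores in w is left
    // untouched, and only this thread ever writes particle i.
    float  sqrDt     = gFrameData.mIterDt * gFrameData.mIterDt;
    float4 sqrIterDt = float4(sqrDt, sqrDt, sqrDt, 0.0f);

    float4 p = curParticles[i];
    if (p.w != 0.0f)                 // skip static / sleeping particles
        curParticles[i] = p + gFrameData.mGravity * sqrIterDt;
}
```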

@KevinGliewe

Yes, this is definitely not intended.

I think it was done like this to reduce bank conflicts. But the performance gain should be negligible, since the w axis needs to be loaded additionally for every thread that handles the corresponding particle, to check whether it is asleep.

And the groupshared particles spread the axes across memory anyway.

It would make sense to calculate a particle on a single thread instead of spreading it, I think.
