DX solver function accelerateParticles looks wrong #47

DaveC79 · 2019-09-03T11:25:03Z

The CUDA version of that function updates a float per thread so threads 0,1,2,3 would update floats X,Y,Z,W of particle 0.
The DX version looks like a straight port of that but is using float4 so threads 0,1,2,3 will read/update/write all of XYZW for the same particle at the same time. Since thread 3 has a zero value for sqrIterDt you've got 4 threads writing different values back to the same location.

Should probably change sqrIterDt to
float4 sqrIterDt = float4(gFrameData.mIterDt * gFrameData.mIterDt, gFrameData.mIterDt * gFrameData.mIterDt, gFrameData.mIterDt * gFrameData.mIterDt, 0.0f);
and remove all the "*4" and "/4" calculations, leaving it as one thread per particle. Making it one thread per float instead would require additional get/set operators for curParticles.

KevinGliewe · 2020-02-20T19:12:16Z

Yes, this is definitely not intended.

I think it was made like this to reduce bank conflicts. But the performance gain should be negligible, since the w component has to be loaded additionally by every thread that handles the corresponding particle in order to check whether it is asleep.

And the groupshared particles spread the axes across memory anyway.

I think it would make sense to calculate each particle on a single thread instead of spreading it across several.

KevinGliewe mentioned this issue Feb 20, 2020

Fix DX Solver particle acceleration #52

Closed
