Optimize add_area_node_boxes in collision code #15719

Merged (4 commits) on Feb 15, 2025

sfan5
Collaborator

@sfan5 sfan5 commented Jan 26, 2025

probably(?) nobody has a perf problem with collision in the real world but I did this anyway

this PR is essentially just three simple optimizations. nothing complex.

References

relates to #15227

somewhat addresses the root cause of #15208: the "load" is now 16² not 16³

To do

This PR is Ready for Review.

How to test

I used a very unscientific benchmark: the client performs a collision test in each of the six axis directions (g_6dirs) on every position update:

diff --git a/src/client/client.cpp b/src/client/client.cpp
--- a/src/client/client.cpp
+++ b/src/client/client.cpp
@@ -22,6 +22,8 @@
 #include "client/mesh_generator_thread.h"
 #include "client/particles.h"
 #include "client/localplayer.h"
+#include "collision.h"
+#include "content_cao.h"
 #include "util/auth.h"
 #include "util/directiontables.h"
 #include "util/pointedthing.h"
@@ -550,6 +552,16 @@ void Client::step(float dtime)
 		{
 			counter = 0.0;
 			sendPlayerPos();
+
+			ScopeProfiler sp(g_profiler, "collide test", SPT_AVG, PRECISION_MICRO);
+			v3f pos = player->getPosition();
+			for (auto dir : g_6dirs) {
+				v3f speed;
+				collisionMoveSimple(&m_env, this,
+					player->getCollisionbox(), 0, 0.25f,
+					&pos, &speed, v3f(dir.X, dir.Y, dir.Z) * 200 * BS,
+					player->getCAO());
+			}
 		}
 	}
 

(Keep the position and the world the same between runs, of course.)
With this benchmark the PR was between 0% and 60% faster.
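
For readers who want to try the same kind of measurement outside the engine, here is a minimal sketch of the scoped-timer idea that the `ScopeProfiler` line in the diff relies on. It is not Luanti's implementation: `AvgProfiler` and `ScopedTimer` are hypothetical names invented for this example, and the only assumption carried over from the diff is that each timed scope contributes one microsecond sample to a per-name average.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a profiler: accumulates total time and
// sample count per name, so an average can be read back later.
struct AvgProfiler {
    struct Entry { double total_us = 0.0; std::uint64_t calls = 0; };
    std::map<std::string, Entry> entries;

    void add(const std::string &name, double us) {
        assert(us >= 0.0); // steady_clock is monotonic, so durations are non-negative
        Entry &e = entries[name];
        e.total_us += us;
        e.calls += 1;
    }

    // Average microseconds per sample; 0 if nothing was recorded under that name.
    double average(const std::string &name) const {
        auto it = entries.find(name);
        if (it == entries.end() || it->second.calls == 0)
            return 0.0;
        return it->second.total_us / static_cast<double>(it->second.calls);
    }
};

// RAII timer: starts on construction, records one sample on destruction.
class ScopedTimer {
public:
    ScopedTimer(AvgProfiler &profiler, std::string name)
        : m_profiler(profiler), m_name(std::move(name)),
          m_start(std::chrono::steady_clock::now()) {}

    ~ScopedTimer() {
        auto end = std::chrono::steady_clock::now();
        double us = std::chrono::duration<double, std::micro>(end - m_start).count();
        m_profiler.add(m_name, us);
    }

    ScopedTimer(const ScopedTimer &) = delete;
    ScopedTimer &operator=(const ScopedTimer &) = delete;

private:
    AvgProfiler &m_profiler;
    std::string m_name;
    std::chrono::steady_clock::time_point m_start;
};
```

Wrapping the code under test in `{ ScopedTimer t(profiler, "collide test"); ... }` and reading `profiler.average("collide test")` after many iterations gives a figure comparable in spirit to the averaged profiler entry used above.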

@lhofhansl
Contributor

Tried with a modified entity that creates two new ones when destroyed. Still works fine, but I see no performance improvement in my case. After 1000 entities it crawls to a halt at less than 10 FPS.

@sfan5
Collaborator Author

sfan5 commented Feb 4, 2025

Yeah it won't change the fixed overhead. You are more likely to measure a difference with high-speed entities.

@SmallJoker SmallJoker self-requested a review February 6, 2025 20:34
Member

@SmallJoker SmallJoker left a comment

|                | master (µs) | PR (µs) |
|----------------|-------------|---------|
| median         | 6.8         | 4.5     |
| average        | 6.7         | 4.5     |
| std. deviation | 0.9         | 0.5     |

Sample size: 30
Joined the same dummy backend world in the same spot and profiled the function call with

// in add_area_node_boxes
ScopeProfiler sp(g_profiler, __func__, SPT_AVG, TimePrecision::PRECISION_MICRO);
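
To put the reported figures in relative terms, the arithmetic can be sketched as follows. The helper names are made up for this example; the inputs are the median (6.8 vs 4.5) and average (6.7 vs 4.5) values from the measurement above. By this calculation the PR spends roughly a third less time in the profiled function, or about 1.5x the throughput.

```cpp
#include <cassert>

// Fraction of the baseline time that the optimized version saves.
// Example: 6.8 -> 4.5 gives (6.8 - 4.5) / 6.8, roughly 0.34.
double timeSavedFraction(double baseline_us, double optimized_us) {
    assert(baseline_us > 0.0);
    return (baseline_us - optimized_us) / baseline_us;
}

// How many times faster the optimized version is.
// Example: 6.8 / 4.5 is roughly 1.5.
double speedupFactor(double baseline_us, double optimized_us) {
    assert(optimized_us > 0.0);
    return baseline_us / optimized_us;
}
```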

The code makes sense, thus 👍. I'm still surprised that getNodeBlockPosWithOffset + getBlockNoCreateNoEx + MapBlock::getNodeNoCheck is apparently faster than Map::getNode.
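
One possible explanation, offered here as an assumption rather than as a description of the merged code: when a loop visits many neighbouring nodes, resolving the containing block once and then reading nodes from it directly avoids repeating the block lookup for every node. The sketch below illustrates that general technique with toy types. `ToyMap`, `ToyBlock`, `blockCoord`, `nodeOffset`, `sumNaive` and `sumCached` are all hypothetical and do not mirror Luanti's real classes or signatures; only the block edge length of 16 is taken from the discussion above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <tuple>
#include <vector>

constexpr int BLOCK_SIZE = 16; // nodes per block edge (assumed)

struct Pos { int x, y, z; };

// Block coordinate containing node coordinate n (floor division, valid for negatives).
inline int blockCoord(int n) {
    return n >= 0 ? n / BLOCK_SIZE : -((-n - 1) / BLOCK_SIZE) - 1;
}
// Offset of node coordinate n inside its block, always in [0, BLOCK_SIZE).
inline int nodeOffset(int n) { return n - blockCoord(n) * BLOCK_SIZE; }

struct ToyBlock {
    std::vector<std::uint16_t> nodes =
        std::vector<std::uint16_t>(BLOCK_SIZE * BLOCK_SIZE * BLOCK_SIZE, 0);

    // Direct array read; the caller guarantees in-range offsets.
    std::uint16_t getNodeNoCheck(int x, int y, int z) const {
        assert(x >= 0 && x < BLOCK_SIZE && y >= 0 && y < BLOCK_SIZE &&
               z >= 0 && z < BLOCK_SIZE);
        return nodes[(z * BLOCK_SIZE + y) * BLOCK_SIZE + x];
    }
};

struct ToyMap {
    std::map<std::tuple<int, int, int>, ToyBlock> blocks;
    mutable std::size_t lookups = 0; // counts container lookups, for illustration

    const ToyBlock *findBlock(int bx, int by, int bz) const {
        ++lookups;
        auto it = blocks.find(std::make_tuple(bx, by, bz));
        return it == blocks.end() ? nullptr : &it->second;
    }

    // Per-node access: pays for a block lookup on every call.
    // Missing blocks read as 0 in this toy model.
    std::uint16_t getNode(Pos p) const {
        const ToyBlock *b = findBlock(blockCoord(p.x), blockCoord(p.y), blockCoord(p.z));
        return b ? b->getNodeNoCheck(nodeOffset(p.x), nodeOffset(p.y), nodeOffset(p.z)) : 0;
    }
};

// Naive area scan: one block lookup per node.
std::uint64_t sumNaive(const ToyMap &m, Pos lo, Pos hi) {
    std::uint64_t sum = 0;
    for (int z = lo.z; z <= hi.z; z++)
    for (int y = lo.y; y <= hi.y; y++)
    for (int x = lo.x; x <= hi.x; x++)
        sum += m.getNode(Pos{x, y, z});
    return sum;
}

// Cached area scan: look the block up again only when the block position changes.
std::uint64_t sumCached(const ToyMap &m, Pos lo, Pos hi) {
    std::uint64_t sum = 0;
    const ToyBlock *cached = nullptr;
    bool have = false;
    int cbx = 0, cby = 0, cbz = 0;
    for (int z = lo.z; z <= hi.z; z++)
    for (int y = lo.y; y <= hi.y; y++)
    for (int x = lo.x; x <= hi.x; x++) {
        int bx = blockCoord(x), by = blockCoord(y), bz = blockCoord(z);
        if (!have || bx != cbx || by != cby || bz != cbz) {
            cached = m.findBlock(bx, by, bz);
            cbx = bx; cby = by; cbz = bz;
            have = true;
        }
        if (cached)
            sum += cached->getNodeNoCheck(nodeOffset(x), nodeOffset(y), nodeOffset(z));
    }
    return sum;
}
```

In this toy model, scanning one full block costs 16³ container lookups with `sumNaive` and a single lookup with `sumCached`, while both return the same sum. Whether that is the dominant effect in the real engine is not established by this thread.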

@sfan5 sfan5 merged commit 75dcd94 into luanti-org:master Feb 15, 2025
17 checks passed
@sfan5 sfan5 deleted the photosemiconductor branch February 15, 2025 11:19
3 participants