Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add indirect draw functionality to MultiMesh #99455

Merged

Conversation

Bonkahe
Copy link
Contributor

@Bonkahe Bonkahe commented Nov 20, 2024

Currently has errors with the automated tests, not sure how to solve this, if anyone with experience could have a look and tell me what I'm missing, I'd be very grateful.

Implements a new indirect mode to MultiMeshes, when allocating data a "use_indirect" Boolean can be set to true, initiating generation of a command buffer, this will be updated whenever a new mesh is added, and works with multiple surfaces, or theoretically non-indexed meshes, this hasn't been tested however.

The command buffer can be used to allow multimeshes, or any buffer alteration function, directly alter the number of instances drawn, in the example project is a C# implementation, as well as a GDScript test, which is much less robust.

NEW example project:
Contains working frustum culling using compute shaders, and a simple grass model with implementation.
ZIP: newindirectdrawmultimeshexample.zip
Video: https://www.youtube.com/watch?v=NPtDSJaN3cQ

New project Gifs for eye candy purposes:
CommandBufferTests8

Also some Examples of pausing the updating of the frustum culling, to show exactly what is being rendered by the gpu, with this method you can have an extreme amount of objects on screen with very good performance.
CommandBufferTests7
CommandBufferTests9

OLD:
Example Project:
IndirectDrawMultimeshExample.zip

The C# example in the project, 32k instances being enabled and disabled using a sine wave controlled through a compute shader with the command buffer bound to one of it's uniforms:
IndirectRenderingexample

There's also a very ham-handed attempt at frustum culling in here which does not work, just an evening of me messing around with it.

Things that may be able to be improved, the method by which I composite the data into the command buffer within the mesh_storage could be improved, I am inexperienced in handling vectors in c++ and as such ended up with a somewhat clumsy struct method of building the data. Every other attempt I had to just write to a single entry in the command buffer (for example to directly update the number of instances without touching anything else) resulted in errors, and even crashing my computer several times, so for now I figured I would go ahead and put out what I have, and see what you all think.

@Bonkahe Bonkahe requested review from a team as code owners November 20, 2024 03:37
@Saul2022
Copy link

Awesome, though could this bring a significant improvement upon instancing thousands of say blade of grass like some unity video had?

Aside, making it more frustrum culling friendly would be a cool thing to investigate, maybe internally partition the multimesh in chunks? Not sure.

@Bonkahe
Copy link
Contributor Author

Bonkahe commented Nov 20, 2024

Awesome, though could this bring a significant improvement upon instancing thousands of say blade of grass like some unity video had?

Aside, making it more frustrum culling friendly would be a cool thing to investigate, maybe internally partition the multimesh in chunks? Not sure.

Oh to be fair, the frustum culling not working is entirely my bad xD I just never implemented frustum culling and I failed at it, this system should work just fine with frustum culling, it's entirely user error in it not working in the test project.

And using it for grass is exactly what I'll be doing, this is mostly to help with grass not being used but still needing to be in memory when you are in a location that doesn't have any grass.

@Bonkahe Bonkahe force-pushed the IndirectMultimeshImplementation branch 7 times, most recently from 688ae8b to 4b3b570 Compare November 21, 2024 19:23
@kitbdev
Copy link
Contributor

kitbdev commented Nov 21, 2024

To fix the error see https://docs.godotengine.org/en/latest/contributing/development/handling_compatibility_breakages.html#id3
You can add the error to the file 4.3-stable.expected.

@Bonkahe Bonkahe force-pushed the IndirectMultimeshImplementation branch from 4b3b570 to 5eba36c Compare November 22, 2024 00:59
@Bonkahe Bonkahe requested review from a team as code owners November 22, 2024 00:59
@Bonkahe Bonkahe force-pushed the IndirectMultimeshImplementation branch from 5eba36c to cc5f18c Compare November 22, 2024 01:04
@Bonkahe
Copy link
Contributor Author

Bonkahe commented Nov 22, 2024

To fix the error see https://docs.godotengine.org/en/latest/contributing/development/handling_compatibility_breakages.html#id3 You can add the error to the file 4.3-stable.expected.

Unfortunately I figured that out as well, and it has not resolved the error, not sure why, maybe I added it incorrectly?

@kitbdev
Copy link
Contributor

kitbdev commented Nov 22, 2024

Looks like the error says RenderingServer but you put RenderingDevice.

@Bonkahe Bonkahe force-pushed the IndirectMultimeshImplementation branch from cc5f18c to 214dbdc Compare November 22, 2024 01:41
@Bonkahe
Copy link
Contributor Author

Bonkahe commented Nov 22, 2024

Looks like the error says RenderingServer but you put RenderingDevice.

I don't think I could sigh any louder.
Thank you very much, we'll see how the checks go.

@Bonkahe Bonkahe force-pushed the IndirectMultimeshImplementation branch from 214dbdc to 711e352 Compare November 22, 2024 21:30
@Bonkahe Bonkahe force-pushed the IndirectMultimeshImplementation branch from e8abf8a to e546ae2 Compare December 4, 2024 16:31
@Bonkahe
Copy link
Contributor Author

Bonkahe commented Dec 4, 2024

Mickeons change implemented, they worded it way better than me lol, thank you!
Rebased and conflicts solved.

@Bonkahe Bonkahe force-pushed the IndirectMultimeshImplementation branch from e237dae to 8ba50d5 Compare January 7, 2025 15:45
@Bonkahe
Copy link
Contributor Author

Bonkahe commented Jan 7, 2025

Resolved both those issues and pushed out the stride count to an enum to resolve just having a magic number 5 everywhere xD
Should be good to go for review again.

@jknightdoeswork
Copy link

Does this allow the multi mesh to use transforms computed by a compute shader? ie: can the mesh instances be moved/rotated/scaled by a compute shader?

@clayjohn
Copy link
Member

clayjohn commented Jan 9, 2025

Does this allow the multi mesh to use transforms computed by a compute shader? ie: can the mesh instances be moved/rotated/scaled by a compute shader?

That is already possible! This PR takes it one step further and allows you to change the instance count via compute shader so you can generate or cull instances fully from compute

@Bonkahe
Copy link
Contributor Author

Bonkahe commented Jan 9, 2025

Clay is correct, that should be fully doable, the example project in the first post has a functional version of that exact thing being done with a compute shader.

Also some random eye candy from said project, cause I was playing around with it (displaying the functionality of frustum culling on the gpu):
CommandBufferTests7
CommandBufferTests8
CommandBufferTests9

Copy link
Member

@clayjohn clayjohn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like there used to be code that handled the case where a mesh had a new surface added? Was I just imagining that?

In either case, this is going to fail if a mesh has a surface added after being assigned to the MultiMesh

@Bonkahe
Copy link
Contributor Author

Bonkahe commented Jan 9, 2025

I feel like there used to be code that handled the case where a mesh had a new surface added? Was I just imagining that?

In either case, this is going to fail if a mesh has a surface added after being assigned to the MultiMesh

I was thinking before that maybe I should push out the actual generation of a command buffer to another function just within the mesh_storage.cpp and have it called when a mesh is added or changed, but was unsure if that would be ideal/if I should be adding new functions to something for specifically that, but it would prevent duplicated code, what do you think?

Edit: Actually that might not be a good idea, I am very unsure how I would actually update a multimesh if it's mesh surface count changed, and yes that would break this, but the only thing I can think of is a rebuild command buffer function which could be called to refresh it.
We would just have to put somewhere that if a surface is added the multimesh has to be refreshed it's command buffer, do you think this is a situation that would actually occur very often though?

@Bonkahe Bonkahe force-pushed the IndirectMultimeshImplementation branch from 8ba50d5 to 783350c Compare January 9, 2025 22:50
@clayjohn
Copy link
Member

We would just have to put somewhere that if a surface is added the multimesh has to be refreshed it's command buffer, do you think this is a situation that would actually occur very often though?

No, I don't think it will occur often, but I do think it is something we have to account for. Currently if someone adds a surface to a mesh they will get a nasty Vulkan error message at best, at worst, they may get a crash due to an out of bounds memory read.

Update doc/classes/RenderingServer.xml

Co-Authored-By: Micky <[email protected]>
@Bonkahe Bonkahe force-pushed the IndirectMultimeshImplementation branch from 783350c to e6daec9 Compare January 13, 2025 20:52
Copy link
Member

@clayjohn clayjohn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would really like to include this in 4.4 so I am approving it despite having two changes I would like to make. Overall the code is great and the feature is amazing, so these nitpicks shouldn't stand in the way

  1. We need to update the command buffer when a new surface is added or removed
  2. We can improve how we set the vertex count in the command buffer in _multimesh_set_mesh

Comment on lines +1628 to +1631
newVector.set(i * sizeof(uint32_t) * INDIRECT_MULTIMESH_COMMAND_STRIDE, static_cast<uint8_t>(count));
newVector.set(i * sizeof(uint32_t) * INDIRECT_MULTIMESH_COMMAND_STRIDE + 1, static_cast<uint8_t>(count >> 8));
newVector.set(i * sizeof(uint32_t) * INDIRECT_MULTIMESH_COMMAND_STRIDE + 2, static_cast<uint8_t>(count >> 16));
newVector.set(i * sizeof(uint32_t) * INDIRECT_MULTIMESH_COMMAND_STRIDE + 3, static_cast<uint8_t>(count >> 24));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an easier way to do this. But let's leave it for a follow up PR so we can sneak this in to 4.4

@clayjohn clayjohn modified the milestones: 4.x, 4.4 Jan 14, 2025
@AThousandShips AThousandShips self-requested a review January 14, 2025 09:16
@akien-mga akien-mga merged commit 85b066a into godotengine:master Jan 14, 2025
20 checks passed
@akien-mga
Copy link
Member

Thanks!

@AThousandShips
Copy link
Member

Didn't get a chance to add my style review (was busy and added a request for it) maybe I can do some of that in a follow-up

@akien-mga
Copy link
Member

Yeah Clay made it clear he wanted this merged before beta1, i.e. right now. Style improvements can be done in a follow-up and merged any time.

@stuartcarnie
Copy link
Contributor

Confirmed that it works on Metal after I converted the C# to GDScript:

CleanShot 2025-01-21 at 06 44 23

@bikemurt
Copy link
Contributor

bikemurt commented Feb 9, 2025

Wondering if this could be used for multimesh batching (appears the answer is no): godotengine/godot-proposals#9673

I might be missing something here in the implementation, but I'm not getting a reduction in draw calls according to the runtime statistics (I haven't tried RenderDoc yet): @fire

Built from latest master.

2025-02-09.14-30-46.mp4
@tool
extends Node3D
@onready var multimeshes: Node3D = $Multimeshes
const NEW_QUAD_MESH = preload("res://mm_test/new_quad_mesh.tres")
func _ready() -> void:
	for m: MultiMeshInstance3D in multimeshes.get_children():
		var mm := MultiMesh.new()
		mm.transform_format = MultiMesh.TRANSFORM_3D
		mm.mesh = NEW_QUAD_MESH
		mm.instance_count = 5
		
		var rid := mm.get_rid()
		RenderingServer.multimesh_allocate_data(rid, 5, RenderingServer.MULTIMESH_TRANSFORM_3D, false, false, true)
		for j in range(5):
			var t := Transform3D()
			t.origin = Vector3(j * 2, 0, 0)
			
			RenderingServer.multimesh_instance_set_transform(rid, j, t)
		
		m.multimesh = mm

Any ideas what I'm doing wrong? Maybe I need to set the mesh using the RenderingServer API as well?

Edit: just noticed I am getting this error.

W 0:00:00:195   RenderingContextDriverVulkan::_debug_messenger_callback: GENERAL - Message Id Number: 0 | Message Id Name: Loader Message
	windows_read_data_files_in_registry: Registry lookup failed to get layer manifest files.
	Objects - 1
		Object[0] - VK_OBJECT_TYPE_INSTANCE, Handle 2420302383344
  <C++ Source>  drivers\vulkan\rendering_context_driver_vulkan.cpp:639 @ RenderingContextDriverVulkan::_debug_messenger_callback()

@fire
Copy link
Member

fire commented Feb 9, 2025

Can you move RenderingServer.multimesh_allocate_data(rid, 5, RenderingServer.MULTIMESH_TRANSFORM_3D, false, false, true) outside the for loop?

Like use one multimesh rather than many multimeshes.

Like have all the meshes in the multimesh but per each instance hide the vertices using trickery.

@bikemurt
Copy link
Contributor

bikemurt commented Feb 9, 2025

Sure, that works as expected, but really that's just the base Multimesh functionality.
image
I also tried setting the mesh from the RenderingServer API, but no difference there:

@tool
extends Node3D
@onready var multimeshes: Node3D = $Multimeshes
const NEW_QUAD_MESH = preload("res://mm_test/new_quad_mesh.tres")
func _ready() -> void:
	#for m: MultiMeshInstance3D in multimeshes.get_children():
	var m: MultiMeshInstance3D = multimeshes.get_child(0)
	var mm := MultiMesh.new()
	mm.transform_format = MultiMesh.TRANSFORM_3D
	#mm.mesh = NEW_QUAD_MESH
	mm.instance_count = 10
	
	var rid := mm.get_rid()
	RenderingServer.multimesh_allocate_data(rid, 10, RenderingServer.MULTIMESH_TRANSFORM_3D, false, false, true)
	RenderingServer.multimesh_set_mesh(rid, NEW_QUAD_MESH.get_rid())
	for i in range(2):
		for j in range(5):
			var t := Transform3D()
			t.origin = Vector3(j * 2, 0, i * 2)
			
			RenderingServer.multimesh_instance_set_transform(rid, j + i*5, t)
	
	m.multimesh = mm

@fire
Copy link
Member

fire commented Feb 9, 2025

My bad, I don't think you can use it as a batch api. I'm assuming that each call to indirect draw is a call.

@bikemurt
Copy link
Contributor

bikemurt commented Feb 9, 2025

My bad, I don't think you can use it as a batch api. I'm assuming that each call to indirect draw is a call.

No worries, I assume this PR has more to do with being able to control instances uniquely within the multimesh (maybe I'm wrong, haven't dug into it too far).

@Bonkahe
Copy link
Contributor Author

Bonkahe commented Feb 9, 2025

My bad, I don't think you can use it as a batch api. I'm assuming that each call to indirect draw is a call.

No worries, I assume this PR has more to do with being able to control instances uniquely within the multimesh (maybe I'm wrong, haven't dug into it too far).

That is correct, this lets you modify the number of instances in a multimesh from within compute shader.
Also for clearification, the number of draw calls/instance/triangle counts, in debugger is handled via the cpu, and as such does not reflect changes done to a multimesh command buffer via compute shader, and thus if you had a multimesh set with 1000 instances, and then used a compute shader to set it to 10, the number of triangles and other metrics would still reflect 1000. This is because of the nature of indirect draw, being the idea that it doesn't have to directly consult with the cpu for the instance counts, as such to accurately track the actual count would defeat the purpose of having it be indirect.

As for batching, that's an interesting concept, what you could do is take all the multimeshes with the same base mesh, instantiate a new one that is the target, and use a compute shader to merge all the buffers and set the count for the target as the total count, then you could even theoretically do frustum culling on that compute shader, while this exact thought was not talked about, the subject of having a toggle on multimeshes to make them automatically cull almost like particles but just based on frustum using the command buffer was floated in one of the rendering team meetings, and I think they both have similar objectives, I'm thinking about revisiting this with the purpose of implementing at least frustum culling.

Not sure if all that helps any, as it's mostly just my musings on the subject, but I hope it at least gave some clarification. We do need more documentation though for sure.

@bikemurt
Copy link
Contributor

bikemurt commented Feb 9, 2025

As for batching, that's an interesting concept, what you could do is take all the multimeshes with the same base mesh, instantiate a new one that is the target, and use a compute shader to merge all the buffers and set the count for the target as the total count, then you could even theoretically do frustum culling on that compute shader, while this exact thought was not talked about, the subject of having a toggle on multimeshes to make them automatically cull almost like particles but just based on frustum using the command buffer was floated in one of the rendering team meetings, and I think they both have similar objectives, I'm thinking about revisiting this with the purpose of implementing at least frustum culling.

Frustum culling would be great. There's a proposal here for it. I'm investigating it today. I think for most users, being able to batch multimeshes finds the best usecase with chunked/cell based systems, and even if there were frustum culling on the broader chunk, that would be good enough. RenderingDevice::draw_list_draw is already intelligent enough to either take in a batch of single draw calls that were batched, or a multimesh, so I'm hoping there's not too much more work to get it to batch multimeshes. They will have already been sorted correctly, as the sort keys are the same.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.