Bevy Render Rework: Initial Framing and Proof of Concept #2265

cart · 2021-05-28T02:36:38Z

cart
May 28, 2021
Maintainer

bevy_render: The Current State of `main`

In my opinion the current bevy_render gets a lot of things right:

Modular render logic (via the Render Graph)
Multiple backends (everything supported by bevy_wgpu as well as a custom WebGL 2 backend)
High level data-driven api that makes it easy and ergonomic to write custom per-entity render logic

However it also has a number of significant shortcomings:

The "high level ease of use" comes at the cost of significant implementation complexity, performance overhead, and invented jargon (RenderResources / RenderResourcesNode / AssetRenderResourcesNode / RenderResourceBindings are notable offenders). Users are often overwhelmed when trying to operate at any level but "high level".
It stores a lot of internal render state on each entity. This takes up space, computing the state is expensive (largely due to hashing and allocations), and it gunks up "userspace" with a bunch of "do not touch" render state. This state (or at least, the component metadata) needs to be written to / read from Scenes, which is also suboptimal.
It doesn't provide any tools to handle "repeated render logic", such as viewports and shadow maps. Implementing this logic is possible, but it requires hard-coding / special casing, which isn't aligned with our goals for modularity.
Features like "sprite rendering" are built on the high level abstractions mentioned above. Performance is not good when compared to other options in the ecosystem. A high-end machine running BevyMark (our implementation of the popular "bunnymark" sprite rendering benchmark) can render ~8,000 sprites before dropping below 60fps. My godot-bunnymark implementation (now a bit out of date) renders ~20,000 godot Sprites with GDScript and ~40,000 Sprites with C++. I hear rafx can currently hit ~53,000.

Bevy is now being used at a scale where these shortcomings are no longer acceptable. Its time to rework our rendering abstractions. My goals for this rework:

Fast by default: Our abstractions shouldn't get in the way of building high-performance render logic. We should be competitive on benchmarks like bunnymark (which expose abstraction overhead).
Simple and Extensible High Level Apis: Modern low level rendering is hard. The current bevy_render makes writing custom shaders and binding entity data accessible to newbies. The new abstractions should (ultimately) be at least as ergonomic and accessible as the current ones.
Modular: Most general purpose engines struggle with this one. They often just provide "fixed" render logic with a few limited hooks for custom behaviors (ex: Godot ShaderMaterials). Engines that provide the option for "custom render paths" often require those paths to be defined by a top-level "render path main function" (see unity and armory). I believe a middle ground exists where logic can be composed in a modular way.
Decoupled: "app state" shouldn't have to care about "render state". "render state" should be derived from "app state".

Potential Paths Forward

There are many paths we could take here, but I want to scope this conversation to three options:

Iterate on bevy_render and continue using wgpu
- Adopt a "pipelined rendering" model
- Remove most existing high level and mid-level abstractions and replace them with better abstractions
- Adapt the Render Graph to handle "dynamic rendering" scenarios
- Apply polish to some rough edges
Iterate on bevy_render, but use rafx-api
- Same as (1), but replace wgpu with rafx-api. We would likely re-export aliased versions of rafx-api types and remove the current low level bevy_render RenderContext / RenderResourceContext abstractions.
Fully adopt rafx-framework
- Embrace rafx's mid/high level abstractions

`bevy_render` Rework: Initial Proof of Concept

This first proof of concept fleshes out Path (1)

The code is available here

All pipelined code / plugins live in the top level pipelined folder. This rework is completely decoupled from the original render code (it isn't a full rewrite, but it does change a lot).

This currently aims at making "low-ish level" code pipelined. The high level abstractions have been stripped out and new ones will need to be designed and built. But we should focus on making the low-ish level abstractions good first.

Render App Model

Render App Stages

Extract: Extracts "app world" data and writes to "render world" using Commands
Prepare: Prepares "render world" data by writing it to the gpu
Queue: Queue up draw calls and create bind groups for things being drawn
Render: Execute the render graph (which reads data produced in the previous stages)

SubApps

To enable a separate "app world", "app schedule", "render world", and "render schedule", I added SubApps. SubApps have their own World and Schedule. They are owned by the main App. Currently they are identified by an integer index for simplicity of implementation, but a final implementation would probably use ZSTs for identifiers.

impl Plugin for SpritePlugin {
    fn build(&self, app: &mut App) {
        // this is registered on the "main app"
        app.register_type::<Sprite>();

        // this adds systems and resources to the "render app"
        let render_app = app.sub_app_mut(0);
        render_app
            .add_system_to_stage(RenderStage::Extract, render::extract_sprites.system())
            .add_system_to_stage(RenderStage::Prepare, render::prepare_sprites.system())
            .add_system_to_stage(RenderStage::Queue, queue_sprites.system())
            .init_resource::<SpriteShaders>()
            .init_resource::<SpriteBuffers>();
    }
}

Currently I don't actually parallel-pipeline subapp execution because it will involve some thought on how we interact with winit. But the pieces are all there / the dataflow is defined in the right way.

I am not convinced this is the best api yet, but its relatively simple and gets the job done. We've been discussing apis like this in the SubWorlds RFC.

New Abstractions

BufferVec<T: Pod>, UniformVec<T: AsStd140>, and DynamicUniformVec<T: AsStd140>
- Simple ways to mirror vectors of gpu-compatible data to gpu buffers.
- Reuses buffers across frames, provided they have enough space for each frame's data. Handles resizes "automatically"
- Fills a similar niche as RenderResourcesNode and AssetRenderResourcesNode, just way simpler
- Uses Crevice and bytemuck types instead of custom bevy Bytes/Byteable and RenderResources types. This improves safety and Crevice ensures that the data is padded for compatibility with glsl
- Not necessarily the final abstraction we'll use (ex: they each have their own staging buffer instead of using something like a staging belt), but they are very simple, decently efficient, and easy to use.

Draw and DrawFunctions

Draw is a trait that draws a specific "thing"

pub trait Draw: Send + Sync + 'static {
    fn draw(
        &mut self,
        world: &World,
        pass: &mut TrackedRenderPass,
        draw_key: usize,
        sort_key: usize,
    );
}

DrawFunctions is a collection of boxed Draw traits

RenderPhase
- A resource to collect and sort "drawn item requests". These include the Draw function index, an id identifying the drawn thing, and a value used for sorting.
MainPassNode
- A render graph node that currently just draws a single RenderPhase (but this will eventually be split up into multiple phases)
TrackedRenderPass
- Tracks the current render pass state to avoid binding resources that are already bound.
- This is a generalization of the logic in the old MainPassNode
ParamState<(Res<A>, ResMut<B>)>
- A way to use system params without all the overhead of systems. ParamState stores cached system param state (to avoid doing that more than once). You can call ParamState::get(world) to return system param values.

Removals and Tweaks

Remove AppBuilder in favor of adding methods directly to App (which made SubApps nicer to work with). This might be controversial. I'm happy to discuss alternatives.
Res<Box<dyn RenderResourceContext>> -> Res<RenderResources> (derefs to &dyn RenderResourceContext). This just makes it nicer / more ergonomic to deal with render resources in userspace
Made shaders and pipelines proper render resources
- This removes bevy_asset types from the renderer api, which is much cleaner imo
Removed old RenderResources / RenderResource traits
- For now, converting Components and Assets to gpu-compatible types is a manual process. We can consider alternative high level apis later, but I'd prefer to focus on making the "explicit" path nice first.
Removed RenderResourcesNode and AssetRenderResourcesNode
- These were extremely complicated and relied on data living on "app entities". Similar functionality is now possible using the new much simpler BufferVec, UniformVec, and DynamicUniformVec
Removed Window from RenderResourceContext::next_spawn_chain_texture in favor of the more generic SwapChainDescriptor
Removed RenderResourceBindings
- This was extremely expensive
- It is now the job of "arbitrary render app logic" to track and create bind groups for entities
- We can build high level abstractions later to fill a similar niche (if needed), but the current impl had to go
Removed ShaderDefs
- This was extremely expensive
- It is now the job of "arbitrary render app logic" to decide what shaders are needed to draw an entity
- We can build high level abstractions later to fill a similar niche (if needed), but the current impl had to go
Removed asset tracking from RenderResourceContext
- RenderResourceContext implementors shouldn't need to think about this mapping. It can happen at a higher level. This was added in bevy's early pre-release stages and I've been regretting it ever since.
Removed SharedBuffers
- We can do better (ex: https://github.com/gfx-rs/wgpu-rs/blob/master/src/util/belt.rs)
Removed other assorted cruft from RenderResourceContext api

Drawing Sprites

Extract: iterate over Sprites and insert a resource containing a vector of ExtractedSprites
- note: ultimately we might want to create an entity for each sprite to increase the modularity of the data. this would allow users to attach new extracted data to extracted sprites.
Prepare: produce a combined "sprite vertex buffer" and "sprite index buffer" by pre-transforming each sprite vertex by its extracted GlobalTransform + size. This cuts down on the amount of data sent to the gpu / the amount of work the shader does.
Queue: setup the camera bind group, sprite texture bind group, and queue up Draw calls in the TransparentPhase
Draw: The MainPassNode iterates over the sorted TransparentPhase, looks up and runs each sprite's Draw function (this is currently the same for each sprite), which adds the draw calls to the command queue.
The SpriteNode also does the final copy from staging buffers to final buffers, but I'm planning on making it easier to do this in the Prepare step without creating new nodes to run these commands.

The pipelined sprite code is here: https://github.com/cart/bevy/tree/pipelined-rendering/pipelined/bevy_sprite2/src

Pipelined BevyMark

Results are currently quite good relative to other options. The PoC can currently render ~89,000 sprites at 60fps on my machine, which is better than all of the other results listed at the beginning of this post. Theres also plenty of room for improvement. We can try drawing everything with a single draw call, moving the mesh data into the shader, replacing Asset hashing with generational indexing, etc.

You can test this by running cargo run --example bevymark_pipelined --release

Next Steps for PoC

"Sub Graphs": define "named" render graphs with inputs and outputs, which can be programmatically executed zero or more times from within Render Graph Nodes.
Allow passing entities along graph edges (currently only textures and buffers can be passed along edges):
- This feeds in to (1). Ex: A ViewportNode might query for "viewport entity ids", then execute a "forward render graph" that takes each viewport entity id as input and renders the entire scene from that view. A Node that depends on the ViewportNode wouldn't run until all viewports have been rendered.
Implement some render features that benefit from (1) and (2) (ex: shadow maps, viewports, etc)
Find a good place to sort Phases. The PoC currently does this in the sprite queue system. It sort of makes sense to do it on the MainPassNode, but it currently (intentionally) only has read access to World data to allow nodes to be processed in parallel.

Next Steps for Productionizing PoC

If we decide to take the "extend bevy_render" path, this is some work we'd need to do to make it "production ready"

Everything in previous section
Remove "command hack extract step" in favor of true multi-world systems
- Enables access to "render world" Resources in the extract step
- Removes the need for spawn_and_forget hack. Commands would allocate entity ids using the "render world" entity allocator
- Synergies with other "multiple world" scenarios instead of context-specific hacks
Ergonomic Custom Shaders
- How will users easily bind component and resource data to "custom shader materials"?
- Will this data be re-usable across shaders and contexts? If so, how?
- How will custom shaders bind bevy-provided uniforms like Camera data?
- Some built in shaders like "sprite.vert" use custom vertex data instead of uniforms. How will users write custom "sprite shaders"? Godot accomplishes this by have specialized "custom shader types". Ex: a "custom sky shader" is slotted into a "custom sky template" and has different bound data and restrictions than a custom sprite shader.
RAII for render resources like TextureId and BufferId. Users shouldn't need to remember to manually free resources
Consider making Draw generic on the "drawn thing" instead of a single static Drawable type. Some "things" won't need a draw_key or a sort_key. Others might need more data (such as distance). Rafx solves this by just including all of these properties on all RenderFeatureSubmitNodes, but adding some flexibility could be nice.
Looking up assets for each entity could be faster. Hashing is expensive. If we can move to normal arrays of assets (probably via generation indexing), we can lighten that load.

Discussion Kickoff

I'd like a general critique of the current PoC / the vision outlined in the "next steps" sections (please focus on critiquing the "vision" over the current PoC)
wgpu vs rafx-api
- I really like how simple and flat rafx-api is. Adding new features feels way less overwhelming and the dependency tree is smaller than wgpu. Adding a feature to wgpu involves: (1) exposing it in each gfx-backend (2) exposing it in gfx-hal (3) exposing it in wgpu-core (4) exposing it in wgpu-rs. Adding a feature to rafx is just (1) expose it in each rafx-backend (2) expose it in rafx-api.
- I really like that wgpu has a lot of momentum, a full time developer, and a higher level "safe" api. rafx-api is comparatively more low level, untested, and unsafe. It puts more burden on us to manage things like sync points, descriptor sets per-swap chain, etc.
- rafx-api currently supports OpenGl and WebGL. wgpu's WebGL impl is not usable for anything real, which means we need to either invest in making that work, or continue supporting a custom bevy_render backend.
- wgpu's primary purpose is to be a WebGPU impl. They plan on adding more safety checks for the web, which might affect native performance too (but this is just speculation).
- wgpu has more backends implemented: Dx12, Dx11, OpenGl, Vulkan, Metal (although some of these don't work as well as others)
- wgpu supports multiple backends in the same binary with runtime "platform support" detection. this enables you to distribute a single Windows binary with both Vulkan, Dx12, and Dx11 support. If the platform only supports Dx11, it will pick that. It will prefer "better" backends first. rafx-api doesn't have this runtime platform detection and only supports one backend per binary. This might work eventually when vulkan is supported everywhere. But that doesn't work when shipping today.
- wgpu supports "render device features" and "render feature detection". it enables you to build backend-agnostic apis like "OpenXR support" into wgpu, then try to request a device with that feature (which will fail if the current system / backend does not support that feature). rafx does not support this. feature detection is very useful for something like a general purpose game engine.
Low level bevy_render (RenderContext + RenderPass + RenderResourceContext) vs rafx-api vs wgpu
- These all fill very similar roles.
- wgpu and rafx-api both use RAII, bevy_render currently does not. I think bevy's rendering abstraction needs this to be ergonomic / user friendly.
- wgpu and rafx-api are both "static" apis. the only way to add a backend is to fork the code
- bevy_render is a "dynamic" api. Downstream users can implement their own backends without forking bevy_render. This is nice, but its uncommon enough that I'm not sure its worth the added complexity / dynamic overhead. A dynamic api implies that backend-handles can't be stored in the "generic interface handle". This means a bevy_id -> backend_id mapping is required, which adds a lot of extra RwLock synchronization and hashing.
- Long term I think rafx-api is probably the right abstraction for us. It fills the same role as low-level bevy_render, but with slightly lower level abstractions. Ultimately I planned on implementing things like a direct Vulkan backend for bevy_render to improve perf / increase flexibility. Rafx-api is basically that! I think we should seriously consider swapping out the low-level bevy_render RenderContext / RenderResourceContext and the wgpu impls for rafx-api. This makes this system as a whole much simpler:
  - current: bevy_render -> wgpu-rs backend impl -> wgpu-rs-> wgpu-core-> gfx-hal-> gfx-backend-vulkan -> ash (vulkan)
  - new: rafx-api-> rafx-vulkan-backend -> ash (vulkan)
High Level bevy_render vs rafx-framework
- rafx-framework is shaping up to be a solid "fully featured" high level render abstraction. It handles general pipelined dataflow, multiple views, abstractions over rafx-api to make it easier to work with, and "immediate mode" render graphs. Recently they've been working on improving modularity with "render plugins".
- rafx-framework introduces a lot of jargon. I have plenty of context when it comes to both modern rendering and the rafx codebase and its still intimidating to scroll through render features and try to understand whats happening and how the pieces fit together. I think the bevy PoC Sprite impl (which is very comparable to the rafx impl) has less type complexity, is much smaller (even accounting for things that haven't been abstracted out yet, such as shader construction), and is more legible. However we also aren't accounting for multiple views yet, which will certainly add some code and complexity (although I don't expect it will be much). Rafx also provides more "structure", whereas the PoC is more "ad-hoc" at the moment.
- rafx-framework aims to be "generically usable" in many contexts (ex: with any ecs framework). This makes interacting with ecs data a little more cumbersome. It also results in a number of concepts that are redundant with bevy_ecs concepts
  - Their own bevy_ecs::Resource-like abstraction (resource_map) and associated reference pointers
  - rafx RenderObjects have parallels with Entities
- rafx-framework uses a lot of macros like declare_render_feature!(SpriteRenderFeature, SPRITE_FEATURE_INDEX), which I would very much prefer to not use in bevy, as I try to avoid macros whenever possible.
- rafx-framework has baked in ways to enable / disable "render features" like "mesh rendering" (including hooks to automatically disable features when relevant). In bevy, it is currently the implementor's job to detect when work shouldn't be done and not do the work. Theres no way to "turn off a render feature" as a high level thing. Baking this in seems nice / valuable, but I'm not yet convinced we need it.
- I have concerns about the "modularity" of rafx-framework. The render graph is constructed on the fly "immediate mode style", so theres no way to build up a graph statically across plugins. Additionally, render features are very "scoped". Its hard to share data between them because the data is stored in the extract job / prepare job / etc. By storing data directly in the bevy_ecs Word (like the PoC currently does), we enable plugin developers to export data that is consumable by other plugin developers. Additionally "render entities" can be used to aggregate data for "render objects" across plugins. This should make it easier to, say, draw a sprite with a custom render plugin that reuses data bindings generated by the main SpritePlugin.
- My current take is that we shouldn't use rafx-framework (at least, not entirely). I do think that long term we should consider adopting rafx-api, and we might be able to use some higher level rafx stuff too. But I think by optimizing for tight integration with bevy apis, we can have a better system (for bevy specifically).

aevyrie · 2021-05-28T07:49:47Z

aevyrie
May 28, 2021
Collaborator

I'd like a general critique of the current PoC / the vision outlined in the "next steps" sections (please focus on critiquing the "vision" over the current PoC)

Beyond the technical merits you've outlined for these three paths, I'd like to discuss the use of the two potential dependencies - rafx and wgpu. You've touched on this in a few different places, but I think it might be worth highlighting on its own. I can distill my thoughts down to two areas: goals and risk.

I hope this doesn't come off as a platitude, but how well do Bevy's goals align with wgpu and rafx? Assuming the two projects continue in perpetuity, are they - or at least the sub-APIs we intend to use - going to meet our needs? You've outlined the current state in the "wgpu vs rafx-api" section really well, but I'm also curious of the future goals of the projects. Is wgpu aiming to be a general-purpose renderer? Is rafx trying to target certain use-cases? Are either of them hoping to target consoles? Is there a risk Bevy will diverge with their stated goals?

While rafx looks very promising in your technical comparison, it also looks like a much higher risk for abandonment. That's not to say I expect the maintainers to just walk away from it, but wgpu has, as you said, a full-time developer and a much more investment. Are we prepared to fork/take ownership of rafx if it loses steam? On top of that, wgpu already has an ecosystem growing around it - learning materials, community, example content. Those things have a lot of hard-to-quantify value, especially for an open-source project like Bevy that will really benefit from having a larger community of contributors to draw from.

I understand you mentioned most, if not all, of these points. However, my perceived risk of using rafx long term gives me pause. That said, I don't know what that value tradeoff is; maybe there are technical merits of rafx that outweigh the associated risk!

3 replies

cart May 28, 2021
Maintainer Author

Is wgpu aiming to be a general-purpose renderer?

No. I think it will continue to occupy the same space it currently does (low level platform agnostic render api)

Is rafx trying to target certain use-cases?

rafx-framework aims to be a generic "mid-level" renderer (and they intend to implement "modern rendering" features on top of that). The scope rafx-api is more along the lines of wgpu and i doubt that will change.

Are either of them hoping to target consoles?

I think both are open to this, but no work is currently being done / it isn't a priority.

Is there a risk Bevy will diverge with their stated goals?

If we depend on "low level" apis like wgpu or rafx-api, the risk is very low.

While rafx looks very promising in your technical comparison, it also looks like a much higher risk for abandonment.

The risk of abandonment is higher, but I'm perfectly comfortable pulling rafx-api under the bevy umbrella and maintaining it (due to its simplicity). The same can't be said for wgpu.

However, my perceived risk of using rafx long term gives me pause.

Adopting rafx-api is a lot like adopting bevy_render in terms of risk. We need something at that level, and the overall complexity isn't high enough to scare me.

superdump May 28, 2021
Collaborator

Adopting rafx-api is a lot like adopting bevy_render in terms of risk

I find statements like this a bit confusing as I've considered the role of bevy_render to be more of the rendering infrastructure and less of the graphics API abstraction. What am I missing?

cart May 31, 2021
Maintainer Author

bevy_render currently includes a low level graphics api abstraction that basically directly mirrors wgpu. It also includes higher level stuff. I was talking about the "low level bevy_render api" in that comment.

superdump · 2021-05-28T08:52:03Z

superdump
May 28, 2021
Collaborator

Thanks for sharing this @cart! As a note, I am generally positive to a lot of the things you've written. To try to be more brief I will avoid +1-ing things and try to only comment on things where I think I have something to add.

In my opinion the current bevy_render gets a lot of things right:
* Modular render logic (via the Render Graph)

The modularity is going in a good direction, but I'm not sure full render graphs can be implemented in a way that does not require writing the render graph code to glue the pieces together in a way that is efficient and makes sense.

Also, modifying things most of the time means modifying the main pass to make use of the thing you added so without a way of composing shaders, people will anyway have to fork the main pass shaders and adapt them to all their pieces.

* Multiple backends (everything supported by bevy_wgpu as well as a custom WebGL 2 backend)

For transparency and context on my opinions - I'm a bit double on abstracting away external dependencies. In the long term it's useful to be able to swap out backends without having to change user code. But it also adds another layer (or multiple layers) on top where features have to be added to expose underlying functionality in the external dependency. It risks lowest common denominator abstraction that limits what can be done. That said, the existence of bevy_render, bevy_wgpu doesn't require their use, so this abstraction approach doesn't have to limit developers, rather provides an option and solid default.

However it also has a number of significant shortcomings:

* The "high level ease of use" comes at the cost of significant implementation complexity, performance overhead, and invented jargon (RenderResources / RenderResourcesNode / AssetRenderResourcesNode / RenderResourceBindings are notable offenders). Users are often overwhelmed when trying to operate at any level but "high level".

The automatic bindings was a nice idea but as noted previously, debugging it is very difficult. Also, @mtsr added a GlobalRenderResourcesNode in a PR to be able to bind ECS Resources but trying to understand how to add texture support to it by looking at the implementations of the other resource nodes was really difficult.

Bindings must be flexible and simple to implement in order to support any custom shader work, whether it's custom materials, main renderer techniques, post-processing, whatever. Getting data in/out and routing it to the right place needs to be simple for good developer UX. If we don't have a good 'automatic' way of binding, then there should exist a simple, explicit, manual way of doing it to enable the exceptions.

* Features like "sprite rendering" are built on the high level abstractions mentioned above. Performance is _not good_ when compared to other options in the ecosystem. A high-end machine running BevyMark (our implementation of the popular "bunnymark" sprite rendering benchmark) can render ~8,000 sprites before dropping below 60fps. My [godot-bunnymark](https://github.com/cart/godot3-bunnymark) implementation (now a bit out of date) renders ~20,000 godot Sprites with GDScript and ~40,000 Sprites with C++. I hear rafx can currently hit ~53,000.

I asked on Discord but I think it's useful context to the discussion of the prototype later - why did you choose to implement drawing as many sprites on screen at 60fps as possible as the prototype to focus on? Is it because this is a good basic test of renderer overhead?

Bevy is now being used at a scale where these shortcomings are no longer acceptable. Its time to rework our rendering abstractions. My goals for this rework:

* **Modular**: Most general purpose engines struggle with this one. They often just provide "fixed" render logic with a few limited hooks for custom behaviors (ex: Godot ShaderMaterials). Engines that provide the option for "custom render paths" often require those paths to be defined by a top-level "render path main function" (see unity and armory). I believe a middle ground exists where logic can be composed in a modular way.

I like the sound of this goal, and I appreciate prioritising UX for whoever the users are. I think perhaps the render resources stuff was 'rushing' into a higher-level solution before the fundamental pieces were in place. I think if we can find the right fundamentals, we will be able to support good convenience APIs as well as providing still-simple-but-probably-verbose fallbacks for when those don't work. If a convenience API doesn't work for someone's use case, we don't want to leave them hanging.

3. Fully adopt rafx-framework
   
   * Embrace rafx's mid/high level abstractions

I'm trying out rafx by implementing the SSAO stuff I've been doing in a fork of bevy, in the rafx demo. I wanted to get a feel for its APIs to help me learn how things are in rafx, other ways of doing these things, generally get more knowledge about renderers to be able to have more informed / less naive opinions.

SubApps

To enable a separate "app world", "app schedule", "render world", and "render schedule", I added SubApps. SubApps have their own World and Schedule. They are owned by the main App. Currently they are identified by an integer index for simplicity of implementation, but a final implementation would probably use ZSTs for identifiers.

What are ZSTs?

This is a neat idea and looks clean from the PoC code. This solution allows us to get the data from the app world, put it into data structures that help prepare the data for rendering, do that preparation, and render.

impl Plugin for SpritePlugin {
    fn build(&self, app: &mut App) {
        // this is registered on the "main app"
        app.register_type::<Sprite>();

        // this adds systems and resources to the "render app"
        let render_app = app.sub_app_mut(0);
        render_app
            .add_system_to_stage(RenderStage::Extract, render::extract_sprites.system())
            .add_system_to_stage(RenderStage::Prepare, render::prepare_sprites.system())
            .add_system_to_stage(RenderStage::Queue, queue_sprites.system())
            .init_resource::<SpriteShaders>()
            .init_resource::<SpriteBuffers>();
    }
}

This looks clean and should encapsulate getting the data from the app, to the render systems, bound and ready for rendering. I like it.

Currently I don't actually parallel-pipeline subapp execution because it will involve some thought on how we interact with winit. But the pieces are all there / the dataflow is defined in the right way.

When reading, it was clear to see how having the renderer as a sub-app with its own schedule would enable pipelining.

New Abstractions

* `BufferVec<T: Pod>`, `UniformVec<T: AsStd140>`, and `DynamicUniformVec<T: AsStd140>`
  
  * Simple ways to mirror vectors of gpu-compatible data to gpu buffers.

I feel like this is a stupid question because I just don't know enough but, how do texture bindings fit into this?

  * Reuses buffers across frames, provided they have enough space for each frame's data. Handles resizes "automatically"

How do you imagine we would handle updating them? It seems, similar to having many draw calls, making lots of buffer copy calls is slow. At the same time bandwidth is limited. I feel like we need some way of efficiently updating parts of such uniform vectors that tries to both minimise copy calls and amount of data copied.

  * Fills a similar niche as RenderResourcesNode and AssetRenderResourcesNode, just way simpler

I like this. Presumably these *Vecs containing only one item is also just fine?

  * Uses Crevice and bytemuck types instead of custom bevy Bytes/Byteable and RenderResources types. This improves safety and Crevice ensures that the data is padded for compatibility with glsl

Why did you 'vendor' Crevice? Did you have to make changes to it?

  * Not necessarily the final abstraction we'll use (ex: they each have their own staging buffer instead of using something like a staging belt), but they are very simple, decently efficient, and easy to use.

I think for bindings in general we need to consider the different binding rates and data sources. Rafx suggests per view/material/object bindings, as well as per-pass configuration. And from what I've been doing so far, it's seemed odd to me that I haven't been able to bind an ECS resource. I feel one should be able to bind components or resources, and if they contain handles to assets then that needs to be handled (ha! ;p ) too.

* `Draw` and `DrawFunctions`
  
  * Draw is a trait that draws a specific "thing"
    ```rust
    pub trait Draw: Send + Sync + 'static {
        fn draw(
            &mut self,
            world: &World,
            pass: &mut TrackedRenderPass,
            draw_key: usize,
            sort_key: usize,
        );
    }
    ```
  * DrawFunctions is a collection of boxed `Draw` traits

While working on SSAO with the current bevy_render, I had to use another of @mtsr's PRs to make Draw and RenderPipelines generic on the pass component, so that I could run passes over the mesh entities in a depth/normal pre-pass and in the main pass. How do you intend to handle that?

* `ParamState<(Res<A>, ResMut<B>)>`
  
  * A way to use system params without all the overhead of systems. ParamState stores cached system param state (to avoid doing that more than once). You can call  `ParamState::get(world)` to return system param values.

Where is ParamState meant to be used?

Removals and Tweaks

* Removed old RenderResources / RenderResource traits
  
  * For now, converting Components and Assets to gpu-compatible types is a manual process. We can consider alternative high level apis later, but I'd prefer to focus on making the "explicit" path nice first.

Don't forget Res (ECS resources)! :)

* Removed RenderResourcesNode and AssetRenderResourcesNode
  
  * These were extremely complicated and relied on data living on "app entities". Similar functionality is now possible using the new _much_ simpler BufferVec, UniformVec, and DynamicUniformVec

I saw the sprite and camera extraction and preparation code and it looks very clean. I like it!

* Removed RenderResourceBindings

I am happy that I won't have to debug this again. :) I'm sure I will have to debug new problems though. Hopefully the new problems need debugging less and are simpler to debug.

* Removed ShaderDefs

If it's clean to do, I'm happy this is becoming explicit too.

Drawing Sprites

* **Prepare**: produce a combined "sprite vertex buffer" and "sprite index buffer" by pre-transforming each sprite vertex by its extracted GlobalTransform + size. This cuts down on the amount of data sent to the gpu / the amount of work the shader does.

This made me pause - is it really better to reconstruct all this data on the CPU every single frame? Is the amount of data just using quads and model matrices really that much? 80k * 4*4 * 4 bytes per float is just over 5MB which doesn't seem like much data to me. Still, this isn't relevant to the discussion as this structure allows you to do whatever you like. It's a detail. :) That it is a detail is a great strength.

Pipelined BevyMark

Results are currently quite good relative to other options. The PoC can currently render ~89,000 sprites at 60fps on my machine, which is better than all of the other results listed at the beginning of this post. Theres also plenty of room for improvement. We can try drawing everything with a single draw call, moving the mesh data into the shader, replacing Asset hashing with generational indexing, etc.

This flexibility is important and great!

Next Steps for PoC

1. "Sub Graphs": define "named" render graphs with inputs and outputs, which can be programmatically executed zero or more times from within Render Graph Nodes.

I like this for defining contained units of processing. It would probably be good if we try to implement them so they can also output intermediate results for reuse in other sub-graphs.

3. Implement some render features that benefit from (1) and (2) (ex: shadow maps, viewports, etc)

I am of course interested in trying to implement SSAO within this setup. :) It doesn't exercise different views, but it does exercise different bindings.

Next Steps for Productionizing PoC

If we decide to take the "extend bevy_render" path, this is some work we'd need to do to make it "production ready"
* Ergonomic Custom Shaders
  
  * How will users easily bind component and resource data to "custom shader materials"?

YES. I care about this.

  * Will this data be re-usable across shaders and contexts? If so, how?

Good question. It makes me think about the per-view/-material/-object/-pass data again.

  * How will custom shaders bind bevy-provided uniforms like Camera data?

It should be generally available within a view, in my opinion.

  * Some built in shaders like "sprite.vert" use custom vertex data instead of uniforms. How will users write custom "sprite shaders"? Godot accomplishes this by have specialized "custom shader types". Ex: a "custom sky shader" is slotted into a "custom sky template" and has different bound data and restrictions than a custom sprite shader.

What do these templates look like in practice? They define what data is available to you and what you have to give back and then your shader can slot into the hole?

Discussion Kickoff

* wgpu vs rafx-api
  
  * I really like how simple and flat rafx-api is. Adding new features feels way less overwhelming and the dependency tree is smaller than wgpu. Adding a feature to wgpu involves: (1) exposing it in each gfx-backend (2) exposing it in gfx-hal (3) exposing it in wgpu-core (4) exposing it in wgpu-rs. Adding a feature to rafx is just (1) expose it in each rafx-backend (2) expose it in rafx-api.

I like flat APIs. I don't like deep APIs. Deep APIs too often add a lot of structure that is difficult to rework, adding new features is slow, and it's difficult to understand because of the many layers and that it exists in that one framework and is non-transferable knowledge.

I don't have an opinion on wgpu vs rafx-api yet though.

    * current: bevy_render -> wgpu-rs backend impl -> wgpu-rs-> wgpu-core-> gfx-hal-> gfx-backend-vulkan -> ash (vulkan)
    * new: rafx-api-> rafx-vulkan-backend -> ash (vulkan)

I like the flatness. I do like the simplicity of wgpu's APIs compared to raw Vulkan though. I don't know what rafx-api looks like yet.

* High Level bevy_render vs rafx-framework
  
  * rafx-framework introduces _a lot_ of jargon. I have plenty of context when it comes to both modern rendering and the rafx codebase and its still intimidating to scroll through render features and try to understand whats happening and how the pieces fit together. I think the [bevy PoC Sprite impl](https://github.com/cart/bevy/blob/pipelined-rendering/pipelined/bevy_sprite2/src) (which is very comparable to the [rafx impl](https://github.com/aclysma/rafx/tree/master/demo/src/features/sprite)) has less type complexity, is much smaller (even accounting for things that haven't been abstracted out yet, such as shader construction), and is more legible. However we also aren't accounting for multiple views yet, which will certainly add some code and complexity (although I don't expect it will be much). Rafx also provides more "structure", whereas the PoC is more "ad-hoc" at the moment.

I do think your implementation is simple. I need to look at the rafx sprite feature to compare but I'll have to do that another time.

  * rafx-framework has baked in ways to enable / disable "render features" like "mesh rendering" (including hooks to automatically disable features when relevant). In bevy, it is currently the implementor's job to detect when work shouldn't be done and not do the work. Theres no way to "turn off a render feature" as a high level thing. Baking this in seems nice / valuable, but I'm not yet convinced we need it.

This think this is good an necessary to be able to do in some way to be able to configure renderer settings. Many games require some kind of restarting of the game or so to change settings. They don't necessarily have to be something you can toggle from one frame to the next, but if it's not complicated to do, I think it's nice. If nothing else, it's nice for development to be able to toggle things on and off to compare. That makes for nice engine demo videos. :)

  * I have concerns about the "modularity" of rafx-framework. The render graph is constructed on the fly "immediate mode style", so theres no way to build up a graph statically across plugins. Additionally, render features are very "scoped". Its hard to share data between them because the data is stored in the extract job / prepare job / etc. By storing data directly in the bevy_ecs Word (like the PoC currently does), we enable plugin developers to export data that is consumable by other plugin developers. Additionally "render entities" can be used to aggregate data for "render objects" across plugins. This should make it easier to, say, draw a sprite with a custom render plugin that reuses data bindings generated by the main SpritePlugin.

I like where you're going with this.

  * My current take is that we _shouldn't_ use rafx-framework (at least, not entirely). I do think that long term we should consider adopting rafx-api, and we might be able to use some higher level rafx stuff too. But I think by optimizing for tight integration with bevy apis, we can have a better system (for bevy specifically).

My main concern about the tight integration in bevy approach is that it is a layer of abstraction on top of other things and that app developers will need to deal with the wait for features in graphics APIs to bubble up through the layers so they can start to use them. And if the layers are opinionated and want to be done well, that always takes time. Again, I recognise that if this is too much of a problem for people, then they can bypass bevy_render and related and make their own renderer that uses APIs directly, as well as being on their own with porting to/from that approach.

2 replies

bjorn3 May 28, 2021
Collaborator

What are ZSTs?

Zero sized types like () or struct Foo;.

cart May 29, 2021
Maintainer Author

The modularity is going in a good direction, but I'm not sure full render graphs can be implemented in a way that does not require writing the render graph code to glue the pieces together in a way that is efficient and makes sense.

Yup theres plenty of uncertainty here. We'll just need to experiment and see how it plays out

I'm a bit double on abstracting away external dependencies. In the long term it's useful to be able to swap out backends without having to change user code. But it also adds another layer (or multiple layers) on top where features have to be added to expose underlying functionality in the external dependency. It risks lowest common denominator abstraction that limits what can be done. That said, the existence of bevy_render, bevy_wgpu doesn't require their use, so this abstraction approach doesn't have to limit developers, rather provides an option and solid default.

I covered this on discord, but for the peanut gallery: we can try our best to make our "app logic" decoupled from bevy_render to make fully custom renderer apis as easy as "disabling bevy_render and consuming app data", but making bevy + bevy_render meet the needs of as many people as possible is my priority. And that means using an api that allows targeting multiple backends. Most people don't want to write both a Metal implementation of a feature and a Vulkan implementation. Its hard to build an ecosystem around those kinds of fractures.

Bindings must be flexible and simple to implement in order to support any custom shader work, whether it's custom materials, main renderer techniques, post-processing, whatever. Getting data in/out and routing it to the right place needs to be simple for good developer UX. If we don't have a good 'automatic' way of binding, then there should exist a simple, explicit, manual way of doing it to enable the exceptions.

Agreed

I asked on Discord but I think it's useful context to the discussion of the prototype later - why did you choose to implement drawing as many sprites on screen at 60fps as possible as the prototype to focus on? Is it because this is a good basic test of renderer overhead?

Yup!

its a basic test of rendering overhead (because drawing sprites is cheap)
sprite rendering is a known bottleneck in bevy
its easy to compare to other engines because bunnymark is so popular

If a convenience API doesn't work for someone's use case, we don't want to leave them hanging.

agreed

I feel like this is a stupid question because I just don't know enough but, how do texture bindings fit into this?

They don't. Currently nothing fills the old RenderResources niche. Organizing multiple bindings is currently an exercise for the user. BufferVec / UniformVec / DynamicUniformVec exist solely to make "cpu-> gpu list buffer" patterns easier.

How do you imagine we would handle updating them? It seems, similar to having many draw calls, making lots of buffer copy calls is slow. At the same time bandwidth is limited. I feel like we need some way of efficiently updating parts of such uniform vectors that tries to both minimise copy calls and amount of data copied.

Updates happen whole-sale right now (the entire list is copied every frame). This is "fast enough" and equivalent to doing updates where "everything is moving". Static things could be updated less frequently, but the abstractions actually allow for this because syncs are manual. You could put things that don't move into a separate list.

I like this. Presumably these *Vecs containing only one item is also just fine?

Yup!

Why did you 'vendor' Crevice? Did you have to make changes to it?

I added glam support. We will want to upstream this

think for bindings in general we need to consider the different binding rates and data sources. Rafx suggests per view/material/object bindings, as well as per-pass configuration

Yup I plan to cover this in the next stage of my experiments.

And from what I've been doing so far, it's seemed odd to me that I haven't been able to bind an ECS resource. I feel one should be able to bind components or resources, and if they contain handles to assets then that needs to be handled (ha! ;p ) too.

Yup you're touching on "high level" abstractions. We'll need to build new ones.

While working on SSAO with the current bevy_render, I had to use another of @mtsr's PRs to make Draw and RenderPipelines generic on the pass component, so that I could run passes over the mesh entities in a depth/normal pre-pass and in the main pass. How do you intend to handle that?

Draw has no relationship to the previous impl. It isn't directly tied to an entity as a component, but rather passes have "drawn things" queued up into them. There can already be a 1:many relationship between entities and passes/views.

Where is ParamState meant to be used?

Currently its used in Draw implementations to ergonomically access ECS data.

Don't forget Res (ECS resources)! :)

I haven't built high level sync abstractions yet. Currently its equally as easy to extract information from resources as it is components.

This made me pause - is it really better to reconstruct all this data on the CPU every single frame? Is the amount of data just using quads and model matrices really that much? 80k * 4*4 * 4 bytes per float is just over 5MB which doesn't seem like much data to me. Still, this isn't relevant to the discussion as this structure allows you to do whatever you like. It's a detail. :) That it is a detail is a great strength.

Yup by doing it this way is actually faster. I implemented the "old" approach too and it was more than twice as slow.

What do these templates look like in practice? They define what data is available to you and what you have to give back and then your shader can slot into the hole?

A predefined set of bindings / assumptions about vertex data, as well as additional functions made available (this is in Godot's case). Of course we can make it whatever we want it to be.

superdump · 2021-05-28T10:16:12Z

superdump
May 28, 2021
Collaborator

I think it is important to introduce views as a clear concept in the renderer. They nicely scope the execution of a graph from the perspective of a camera and should serve as a sensible point of collection for code relating to that.

On plugins as subgraphs - if the plugins are low enough level then I can imagine implementing a depth/normal prepass plugin which has depth and view space normal textures as output, and gets run for whatever views you configure it to be run for, obtaining its camera bindings from the view and culling/identifying visibility for the view, sorting from the perspective of the view, etc. Then another plugin for SSAO or other AO implementations that takes depth and normals from the depth/normal prepass, a noise texture from the app’s world (following @cart’s model), and outputs an AO texture. Again, the camera bindings are provided from the view. A blur plugin could/would run an X pass and a Y pass on the input texture and provide an output texture (maybe you provide both input and output textures so you can swap them over for a second pass to save space and/or see the intermediate state or something) and that could then be used both for SSAO and for bloom, just depending on what texture you give it. So it would need to handle different texture formats and blur the components appropriately. I just realised that the blurring for SSAO needs to have depth as well to avoid blurring AO across significant depth differences or around corners, so maybe in practice it’s a different pass or a different shader or so but if it didn’t, that kind of plugin structure sounds like a nice unit.

Render features in the Bungie Destiny architecture are more end to end though. I don’t know how they share things but it feels like a layer on top of this that says ‘I need to run depth/normal prepass, SSAO, blur, hooked up and configured in this way’ and something else says ‘I need depth/normal prepass, opaque pass, hooked up and configured in this way, and then something that hooks the SSAO into the main pass. These are fuzzy thoughts but I’m seeing hierarchical groupings.

I feel like these are really hard problems to solve up front though and it would likely be better to take this in a couple of stages where we try to pin down the foundations, then build some stuff and as we build, all that information feeds into what we need for convenience layers on top. To me it feels way too complicated to try to design up front. What do you think?

1 reply

cart May 29, 2021
Maintainer Author

This is basically the plan. I've started with the fundamentals of pipelined rendering (and slimming down bevy_render). Now we move on to more complicated things like view managment / programmatic triggering of pre-defined render logic. This design will evolve over time as we try building things on it and discover new patterns.

superdump · 2021-05-28T14:44:57Z

superdump
May 28, 2021
Collaborator

More thoughts on wgpu and rafx - I think the division of crates, if I have understood the purpose of each of them, is good:

bevy_wgpu - the adapter between bevy_render and wgpu. Similarly one could imagine a bevy_rafx-api at this level that probably uses rafx-api, right?
bevy_render - core bevy renderer infrastructure that should take things from the app ECS and be able to transfer data to the GPU, configure bindings and generate draw calls. This would be competing with the mid-level rafx-framework and high-level pipelined renderer rafx-renderer?
bevy_pbr / bevy_sprite / bevy_ui ... - specific render targets with focused goals. This approach could also be used for implementing two render targets where one is intended to be highly interoperable, and another that leverages the latest features available and aims for high fidelity, as these are commonly incompatible targets. One could imagine similar crates that implement the render graphs on top of rafx-renderer

Given this understanding of bevy and rafx's render infrastructures is correct, I see two very separate concerns as two possibly common questions and decision points:

what graphics API / graphics API abstraction do we want to use?
what renderer infrastructure do we want to use?

If we want to be able to change out renderer infrastructure without affecting app code (too much?) then we need to have types that can be used when developing apps that will be easy to map to bevy_render and/or rafx-framework. I feel like these types should/must live outside of bevy_render so that one can build things without bevy_render. Do you agree?

If we want to be able to change out the graphics API / graphics API abstraction then we will need a well-defined interface between bevy_render (or rafx-framework would need this) and said API. These seem to be bevy_wgpu, and rafx-api.

I personally am not so concerned about the rafx-api versus wgpu question. Perhaps others are and I don't mean to say this should not be discussed now if it is something that others think is important to decide at this point. However, from what I have seen of the activity or desired activity in the community, being able to build renderer features is the focus and priority need. That is the question about the high-level API, so rafx-framework or bevy_render. If we had the common types that apps use, we could build both in parallel to test that we can swap out the renderer without needing to change app code, if that is an interesting and desirable goal. Is it?

The proposed design is similar to the Bungie Destiny architecture and similar to rafx. I think that's a good thing, it seems to be a good renderer architecture for performance and flexibility. As for whether to do one, the other, or both... let the discussion continue. :D

1 reply

cart May 29, 2021
Maintainer Author

I feel like these types should/must live outside of bevy_render so that one can build things without bevy_render. Do you agree?
If we want to be able to change out the graphics API / graphics API abstraction then we will need a well-defined interface between bevy_render (or rafx-framework would need this) and said API. These seem to be bevy_wgpu, and rafx-api.

Not really. I think bevy_render should be the interface that people use to write "bevy render logic". bevy_render will build on top of a low level abstraction (wgpu, rafx-api, or the current low-level bevy_render RenderContext/RenderResourceContext), and it will add high level abstractions on top.

Ideally "bevy app logic" is separate enough from "bevy render logic" that people can just drop bevy_render and write something fully custom, but my priority is to make bevy_render generically usable, and build an ecosystem around it.

alice-i-cecile · 2021-05-28T18:40:58Z

alice-i-cecile
May 28, 2021
Maintainer

Remove AppBuilder in favor of adding methods directly to App (which made SubApps nicer to work with). This might be controversial. I'm happy to discuss alternatives.

Why would this be controversial? From my perspective working with end users, this slightly reduces boilerplate and is unlikely to have any other consequences in the common case.

6 replies

bjorn3 May 28, 2021
Collaborator

Merging AppBuilder into App would allow for adding plugins after the app has already started, which could be useful for the editor. Care has to be taken though to ensure that the startup systems run. (maybe clear the startup stage after running all startup systems and run the startup stage again after adding plugins?)

alice-i-cecile May 28, 2021
Maintainer

Only in that merging a "builder style" wrapper struct into the "built type" might rub people the wrong way. It doesn't bother me personally (and we already do this a lot in bevy).

Yeah I don't mind this at all.

Merging AppBuilder into App would allow for adding plugins after the app has already started

I was suspicious that might be the case, and it's a nice perk.

Care has to be taken though to ensure that the startup systems run. (maybe clear the startup stage after running all startup systems and run the startup stage again after adding plugins?)

This sounds like a use case for States :) TBH, I've been toying with the possibility that startup systems may generally be better modeled as states in anything but the simplest cases.

mtsr May 28, 2021

I have one minor nit, that is that we currently have some order of plugins issues due to plugins adding stuff to App(Builder) directly. One solution could be to have AppBuilder handle this correctly when building App. But it can also be done by making the Plugin trait more comprehensive (fn insert_resources(), fn register_assets(), etc).

alice-i-cecile May 29, 2021
Maintainer

Yep. See #1255 for related thoughts.

cart May 29, 2021
Maintainer Author

Yeah I think deferred plugin init is still worth considering. I think that implies that a consolidated App type would need to have some internal machinery to collect "uninitialized plugins" and convert them to "initialized plugins", but that doesn't seem terrible relative to the benefits.

alice-i-cecile · 2021-05-28T20:58:08Z

alice-i-cecile
May 28, 2021
Maintainer

ParamState<(Res, ResMut)>
A way to use system params without all the overhead of systems. ParamState stores cached system param state (to avoid doing that more than once). You can call ParamState::get(world) to return system param values.

I like this quite a bit; I think this should get its own PR independent of the rendering work.

1 reply

cart May 28, 2021
Maintainer Author

Definitely. A small amount of work is required to make it safe (ex: adding a ReadOnlySystemParam trait), but other than that its close to being in a good spot. I think we should consider adapting FunctionSystem to use ParamState too. (we could also then constrain systems to "read only", which is cool)

cart · 2021-05-29T00:24:18Z

cart
May 29, 2021
Maintainer Author

This conversation continued on Discord

To summarize:

In the immediate short term, we will continue using bevy_render + wgpu to experiment with building a custom Bevy "Render App Model". Making progress here is a higher priority than selecting a low level api (which is largely superficial in the grand scheme of things).
Wgpu meets our needs currently, with the exception of solid WebGL2 support. Continuing to use bevy_render means that we can also use bevy_webgl2.
We will try to assist with adding WebGL2 support to wgpu. As soon as wgpu's WebGL2 backend is a suitable replacement for bevy_webgl2, we will attempt to remove the low-level bevy_render abstractions in favor of using wgpu directly.
To make the direct-wgpu transition easier, we will continue to make the current low-level bevy_render abstractions a closer map to wgpu. Ideally this includes adding RAII support (but only if we can keep the implementation scoped + quick).
We can revisit wgpu vs rafx-api once rafx-api has matured a bit. I still think rafx-api is closer to the "ideal low level rendering abstraction" by nature of being simpler/flatter, but provided we can get solid web support in wgpu, there is no real pressure to make the move now. Migrating to rafx-api would also currently involve building a lot of infrastructure, because wgpu handles a number of hard problems for us (managing descriptor sets across swap chains, handling memory sync points, feature detection, multiple render backends with fallback). rafx-framework handles some of these concerns and we might be able to borrow some of that code, but it would still be "a major endeavor".
Rafx-framework is still under construction. I'm relatively convinced that Bevy should be defining its own render app model, but theres no reason to make a call at this stage. By the time we have completed our "Bevy Render App Model" experiments and explored the design space, rafx-framework will be more mature, and we can make a final call on "the ideal render app model for bevy".

3 replies

superdump May 29, 2021
Collaborator

Roughly how long do you think we’ll need to build enough of this new bevy_render architecture to be able to build render features?

superdump May 31, 2021
Collaborator

How do you intend to proceed from here with your own efforts and how would it be good for the community to contribute? At first from the discord discussion it felt like wgpu must get good WebGL2 support for the bevy_render rework to be able to be merged, and that should then be parallelisable work. But now rereading what you wrote here, it sounds like you mean that we can continue to use bevy_webgl2 for now and deprecate it when WebGL2 support in wgpu is working reliably, so we could instead put efforts into the rework. So what do you think?

cart Jun 1, 2021
Maintainer Author

Roughly how long do you think we’ll need to build enough of this new bevy_render architecture to be able to build render features?

Hard to say at this point. Ideally over the course of a couple of weeks we'll have some foundations laid and people can start attempting to build features on top / start providing feedback.

How do you intend to proceed from here with your own efforts and how would it be good for the community to contribute? At first from the discord discussion it felt like wgpu must get good WebGL2 support for the bevy_render rework to be able to be merged, and that should then be parallelisable work. But now rereading what you wrote here, it sounds like you mean that we can continue to use bevy_webgl2 for now and deprecate it when WebGL2 support in wgpu is working reliably, so we could instead put efforts into the rework. So what do you think?

I think ideally we remove "low level" bevy_render in the near future, but we don't need to do that before merging. And we can't remove it until direct wgpu is a suitable replacement for bevy_webgl2. Therefore making wgpu's webgl backend usable for bevy is a great (and uncontroversial) way for the community to help improve the graphics stack / prepare for the future.

I think the community can also start experimenting on top of the PoC / start playing with building new abstractions. The more things we try, the better. People don't need to wait for me to finish up my next round of experiments. But I also can't really provide more guidance than I already have because I need to get my hands dirty first to develop opinions.

Attempting to build render features on the experimental abstractions is another great way to contribute, but that work will need to wait until we have laid a bit more infrastructure (ex: probably a couple more weeks).

aclysma · 2021-05-29T19:52:24Z

aclysma
May 29, 2021
Collaborator

Benchmark

First, I want to make a few comments about this benchmark. As a rough litmus test for “can my system draw >10k things”, it might be ok, but it’s not a realistic workload.

Most games do not have a large number of moving entities on screen (particle systems are generally treated as a single entity). Assuming you have a large enough number of sprite entities for performance to be of concern:
- Most entities are not visible
- Most entities are not moving
Most sprites should be on some finite number of distinct Z levels (which allows them to be batched, even if they are transparent)

If you implement this benchmark naively in rafx, it will run slower. As an example lets say we draw 30k sprites/frame.. those 30k sprites result in 30k visibility structure updates per frame (because everything is moving), a visibility query across all 30k sprites that culls nothing (because everything is on screen), and unsuccessfully trying to batch the 30k sprites. (Because none of the sprites are on the same Z level).

Also keep in mind that having a visibility system results in extraction being random-access to all visible entities instead of a linear-access query across all entities. So that’s slower too.

We tried an experiment where we stripped out visibility, sprite batching, and much of the frame packet plumbing (that would enable us to split heavy jobs across threads), and exceeded the prototype’s performance by 20% (measured by number of entities before frame rate went below 60fps). It was a worthwhile learning exercise, but we believe removing these systems will be harmful for real workloads.

Keep in mind, if you have a bunch of static sprites (the common case), you can batch them together offline and treat chunks of them as single entities. This is exactly what we do in our LDTK (tile map editor) render feature. The processing happens in distill when the asset is imported. At runtime, rendering the largest LDTK example map requires no visibility updates, a query across 20 visible objects, and no vertex/index buffer allocation or sprite batching logic. This is certainly apples/oranges comparison with the benchmark. But I think just in general, backing up and asking “why do I need to render this many sprites” produces a better solution in the end. So I would be careful with this benchmark as it may lead to optimizing the wrong things.

Responding to a few comments

rafx-api doesn't have this runtime platform detection and only supports one backend per binary

You can enable multiple crate features (I.e. —features = “rafx-vulkan,rafx-metal”) to produce a binary with as many backends as you like. Then, you can attempt to initialize them in your preferred priority order, falling back to a different choice until you run out of choices.

I do think this is something that should be improved on in rafx-api. “Which backend should be preferred if multiple are available” is an opinionated choice though. I think rafx-api should be extended to provide a bit more data about the GPUs/APIs available on the system, and some other high level code should choose (possibly referencing a config file that bans certain GPUs from certain APIs due to known bugs.) I don’t see much technical risk here, just a matter of prioritization. We’re generally tacking the highest-technical-risk issues first.

rafx-framework introduces a lot of jargon

I think any abstraction that introduces new concepts (such as the ones in the destiny talk) will need names for those concepts. (For example, anything to do with visibility, or the plumbing that merges/sorts draw calls.) This is exacerbated by some concepts (like descriptor sets) needing multiple levels of abstraction (i.e. an API-specific descriptor set vs. something higher level and ref counted).

Some systems also become more complicated in order to support multi-threaded usage without excessive locking. For example in rafx-framework, you might first acquire an allocator, and then use that to create N of something else. This allows us to have a single critical section that isn’t beholden to a graphics API call returning quickly. This adds new types: the allocator, the “allocator allocator”, plumbing for some sort of chunking (which iallows locking granularity to be somewhere between “global” and “one per instance”.)

I’m not sure how bevy will avoid the same problem if it has something similar to rafx-framework in it (aside from cutting features.) While I'm sure there are some improvements that could be made, the solution is rarely simpler than the problem itself, and I think it’s easy to underestimate the complexity of this problem (assuming you want to scale - both in terms of performance and supporting a wide variety of use-cases.)

However we also aren't accounting for multiple views yet, which will certainly add some code and complexity (although I don't expect it will be much). Rafx also provides more "structure", whereas the PoC is more "ad-hoc" at the moment.

The reason rafx has more structure is because it uses the “frame packet” approach described in the destiny talk. This architecture reduces duplicated work and allows for parallel processing. (Both processing multiple features in parallel, and splitting a single feature’s workload across multiple threads)

Work is deduplicated by processing per-frame data once, and then making that available to per-view logic. This is available to use during extract and prepare phases.
The naive solution to this will likely involve significant dynamic allocation and will be difficult for the borrow checker to analyze if you try to split the work across threads (imagine a cloth simulation feature). We actually had a less rigid design before, but made it more rigid for these reasons specifically.

Visibility/culling also needs to be considered here, and the frame packet approach fits this well. I think the prototype will need a significant amount of changes and API redesign to support this.

rafx-framework uses a lot of macros

There are three macros, but they are slightly different flavors of the same thing. Here’s the code for them:
https://github.com/aclysma/rafx/blob/master/rafx-framework/src/render_features/macro_render_phase.rs
https://github.com/aclysma/rafx/blob/master/rafx-framework/src/render_features/macro_render_feature.rs
https://github.com/aclysma/rafx/blob/master/rafx-framework/src/render_features/macro_render_feature_flag.rs

The main reason they exist is to allow all phases/features to be registered at runtime with an integer that’s 0..N (friendly to array indexing and bitfields), which can be accessed from anywhere with something like UiRenderPhase::render_phase_index().

I have strong distaste for macros too! I tried very hard to avoid them, but I didn’t find a better solution that allowed BOTH non-intrusive registration of new features, but easy and cheap access to the registered index from anywhere.

The render graph is constructed on the fly "immediate mode style", so theres no way to build up a graph statically across plugins.

This is something I’d like to try to solve in the future. It will stay “immediate mode” because render graphs could change significantly from frame to frame depending on many factors. There’s no reason an immediate mode API could not also allow for “patching” the graph that has been built so far by other plugins.

Additionally, render features are very "scoped". Its hard to share data between them because the data is stored in the extract job / prepare job / etc. By storing data directly in the bevy_ecs Word (like the PoC currently does), we enable plugin developers to export data that is consumable by other plugin developers.

This is by design to permit jobs to run concurrently. Anything that needs to be shared across jobs of different types can be registered as a render resource. There’s no reason an ECS world couldn’t be a render resource, but I haven't personally found a case where sharing per-entity data across features is useful.

Miscellaneous

Implementing shadows and visibility in evenstubbed form will potentially help with scaling up the design to support the intended use cases.
Some of the wgpu stack is MPL-licensed. This has additional restrictions over bevy’s license that can include disclosure of modified source code. Some users under NDA with certain platforms might not be comfortable with this in their rendering stack.

Closing thoughts

My response unfortunately reads like a “rebuttal” but I actually think a “wait and see/stick with what worked so far” approach is best. This is a huge space to explore, and it will take a good amount of time to do it.
I think rafx is on the right path, but it’s not production ready. Ideally we would have better documentation and some community around it (i.e. discord or GitHub discussions). We’ve been holding off on this because we think we are still a bit too early in development to do either at a satisfyingly high quality level. :) More time also allows interested members in the bevy community to try rafx out in more limited ways.
I think an interested person could get pretty far today treating bevy as an ECS with “extras” and bolting on rafx for rendering/asset handling. Ideally, bevy and rafx would both be modular enough to support this without requiring massive structural changes :)

3 replies

cart May 31, 2021
Maintainer Author

Thanks for the write up. These are all good things to point out and I generally agree with your points.

My response unfortunately reads like a “rebuttal”

To me it reads less as a "rebuttal" and more as "providing context and clarity that I didn't include in my post". Given that I uncharitably threw an "rafx bevymark number" in there without this context (which I will now apologize for ... sorry 😄), I think your message helps establish the nuance of the actual situation and contextualizes many of the "required next steps" for this PoC.

but [bevymark] is not a realistic workload.

Yeah I would never pretend "bunnymark-style" benchmarks are a test of realistic workloads. They just happen to be a good (and common) way to stress test "per-moving-entity render overhead for a single material". Completely artificial: only a single material, everything is moving, etc. The reason I wanted to focus on it first is "per-moving-entity render overhead" is a major bottleneck in bevy_render. Many of the "optimizations for real world use cases" you called out would hide that overhead. Minimizing that first ensures we don't make that mistake again. Its a baseline number that will go up and down in the future based on "real world scenarios" we choose to test and the optimizations we choose to implement.

[there are no visibility / checks yet and those affect perf]

Yeah this is a big one. I assumed rafx was doing the "no visibility for sprites, just draw everything and let the gpu just crunch quads" that you discussed awhile back for the "~53k on bunnymark" number Dvd shared with me. I outlined my methodology and it seemed like they tried to reproduce a similar setup to bevymark. On my beefy-ish machine I get ~30k on the "public" rafx benchmark (which includes all of the "expensive" features that optimize for real world scenarios), which also fed in to that assumption. Maybe @DavidVonDerau can provide some context here / clarify what went in to the "~53k" number specifically / how it differs from the experiment you mentioned that stripped out "visibility, sprite batching, and some frame packet plumbing".

You can enable multiple crate features (I.e. —features = “rafx-vulkan,rafx-metal”) to produce a binary with as many backends as you like. Then, you can attempt to initialize them in your preferred priority order, falling back to a different choice until you run out of choices.

Ah yeah that makes sense. The DMs we had prior to this github discussion about "initializing multiple backends" lead me to believe that it still wasn't possible yet. We discussed how the "returns" in this constructor prevent initializing anything but the first supported backend. But yeah it definitely looks like trying one-by-one is possible now.

Also as we discussed, I agree that rafx-api shouldn't be "opinionated". It should just provide the hooks necessary to detect backend and feature support. And it sounds like it does already support backend detection. It also has per-backend feature detection, so the only major missing piece is a "unified cross-backend feature" api, which I agree isn't a technical risk (and also doesn't need to be a priority). It could also easily be implemented in userspace.

I’m not sure how bevy will avoid the same problem [jargon] if it has something similar to rafx-framework in it (aside from cutting features.) While I'm sure there are some improvements that could be made, the solution is rarely simpler than the problem itself, and I think it’s easy to underestimate the complexity of this problem (assuming you want to scale - both in terms of performance and supporting a wide variety of use-cases.)

Yeah I agree that there are still a ton of abstractions to implement and that they will need names (aka "jargon"). However the "structure" of Rafx forces a lot of this jargon into every context. I think a slightly less structured approach (ex: making "views" and "phases" an ecs component in the render world instead of a part of the "render feature") might allow users to opt-in to these things when they are needed instead of putting them there from the get-go. I have no clue how that will play out. It might end up being worse, but I feel inclined to try. Its very possible that the rafx framework approach is the best design for us! I just want to explore the space a bit.

The reason rafx has more structure is because it uses the “frame packet” approach described in the destiny talk. This architecture reduces duplicated work and allows for parallel processing. (Both processing multiple features in parallel, and splitting a single feature’s workload across multiple threads)

I think this can still work in an unstructured way. Multiple features run in parallel because features are "ecs systems" and a single feature's workloads can also be split up into multiple systems (or use parallel iterators) where appropriate.

I think the prototype will need a significant amount of changes and API redesign to support this.

Totally agreed. I think most of the hard work is still ahead of us. This initial prototype was more about setting up pipelining and stripping out old abstractions.

This is something I’d like to try to solve in the future. It will stay “immediate mode” because render graphs could change significantly from frame to frame depending on many factors. There’s no reason an immediate mode API could not also allow for “patching” the graph that has been built so far by other plugins.

Agreed. I think "immediate mode" is definitely viable and could be made modular. Static graphs just happen to align better with the general "accumulate app state as each plugin initializes" pattern we use in bevy apps.

This is by design to permit jobs to run concurrently. Anything that needs to be shared across jobs of different types can be registered as a render resource. There’s no reason an ECS world couldn’t be a render resource, but I haven't personally found a case where sharing per-entity data across features is useful.

I would argue that storing data in the ecs world also permits jobs to run concurrently (and with more granularity). The difference is that jobs are the "unit of parallelism granularity" in rafx, whereas in Bevy data dependencies come first (and are the unit of parallelism granularity).

I think sharing data across render features will become useful in the context of "accumulating per-view state". RenderPlugins will be able to attach arbitrary per-view data to view entities. Ex: a camera might have a View, ExtractedCamera, and CameraBuffers component, which downstream plugins could consume.

It might also help if we decide to treat entities as "render objects" and allow attaching arbitrary data to them. Ex: if we extract data / create bindings for all Mesh objects (encoded as entities), then a user defining custom render logic for that Mesh object could consume that binding by querying for it.

Implementing shadows and visibility in evenstubbed form will potentially help with scaling up the design to support the intended use cases.

Yup this is a good call. I'm definitely planning on building out "more interesting / challenging" abstractions like this next.

I think rafx is on the right path, but it’s not production ready. Ideally we would have better documentation and some community around it (i.e. discord or GitHub discussions). We’ve been holding off on this because we think we are still a bit too early in development to do either at a satisfyingly high quality level. :) More time also allows interested members in the bevy community to try rafx out in more limited ways.

Totally agreed. I think rafx-api is pretty much the ideal low-level rendering abstraction for us (especially once it matures). Down the road I think we should seriously consider adopting it.

I also think rafx-framework is fantastic. It already solves a bunch of hard problems, the api is nice, i trust you and your experience, and it has the benefit of drawing inspiration from the battle tested destiny renderer). Theres still a very solid chance that we will adopt it, but I've gotta do "due diligence" bevy-native designs first before just accepting it wholesale.

rafx is being developed with your goals in mind / optimizes for what you think is best for a generic rendering framework. Bevy needs to optimize for Bevy. These constraints might align, but the only way to guarantee that would have been to start with that intent / to work with us from the beginning.

DavidVonDerau May 31, 2021

@cart 53k is what I was able to get by 1.) swapping in a simpler Sprite feature that doesn't try to combine draw calls, and 2.) disabling the updates to visibility positions (but not the visibility culling itself) in the rafxmark scene.

This was the easiest test that I could do to try and more closely approximate what you said bevymark was doing on your branch without touching the actual framework code -- or in other words, this is perf that a user could achieve by writing their own Sprite feature and registering the sprites with the visibility system as-if they are static (since we know they can't move outside of the main camera frustum).

The changes to remove "visibility, sprite batching, and some frame packet plumbing" that we did over the weekend was much more destructive.

We removed framework code for handling deduplicating entities visible in multiple views into the single, unique per-frame entity.
We removed framework code for visibility more so than just turning off position updates, and made the feature pull directly from the world instead of driving it from the visibility results
We removed code related to supporting multiple views. (not all of it obviously, but code that was slowing down the feature)
We elided atomics that are used purely to uphold safety invariants. rafx can multi-thread features at a sub-feature granularity (e.g. chunks of entities), so there are collections in rafx to support that. If features only have a single entry point, those collections can be simplified to non-thread-safe variants (like a simple Vec) and you can gain that perf back.

We also made smaller optimizations that we'll port back:

Check to see if the descriptor set for the texture has changed, and if it hasn't, don't bother rebinding it.
Support writing to buffers in-place so users don't need to first push to a Vec and then copy it.
Support an optimization for when a feature doesn't need deduplication across views & skip that.

DavidVonDerau May 31, 2021

Multiple features run in parallel because features are "ecs systems" and a single feature's workloads can also be split up into multiple systems (or use parallel iterators) where appropriate.

rafx prior to PR aclysma/rafx#135 looked like this Bevy proof-of-concept, if you make the following table

rafx	bevy
extract	extract
prepare	prepare + queue
write	draw

When we tested parallelizing rafx at this level, we found issues with scaling it. Features could have drastically different execution times, e.g. 100x. Some features take so little time to run that they shouldn't be put onto their own thread. Other features took enough time that they should be broken up into chunks of entities for best performance. The changes in PR 135 made rafx more closely mirror Destiny's design specifically because we saw the need to support the type of dynamic load balancing discussed in https://advances.realtimerendering.com/destiny/gdc_2015/Tatarchuk_GDC_2015__Destiny_Renderer_web.pdf on page ~214.

The difference is that jobs are the "unit of parallelism granularity" in rafx, whereas in Bevy data dependencies come first (and are the unit of parallelism granularity).

After PR 135, the smallest unit of parallelism in rafx is not a job. Render features can be parallelized across each other, across views, and down to individual entities if desired. This doc comparing rafx to Destiny provides an overview: https://github.com/aclysma/rafx/blob/master/docs/renderer/renderer_architecture.md

When you write a render feature in rafx and it implements the extract_render_object_instance endpoint...

    fn extract_render_object_instance(
        &self,
        job_context: &mut RenderObjectsJobContext<'extract, MeshRenderObject>,
        context: &ExtractRenderObjectInstanceContext<'extract, '_, Self>,
    ) {
        let render_object_static_data = job_context
            .render_objects
            .get_id(context.render_object_id());

        let mesh_asset = self
            .asset_manager
            .committed_asset(&render_object_static_data.mesh);

        context.set_render_object_instance_data(mesh_asset.and_then(|mesh_asset| {
            let entry = self.world.entry_ref(context.object_id().into()).unwrap();
            let transform_component = entry.get_component::<TransformComponent>().unwrap();
            Some(MeshRenderObjectInstanceData {
                mesh_asset: mesh_asset.clone(),
                translation: transform_component.translation,
                rotation: transform_component.rotation,
                scale: transform_component.scale,
            })
        }));
    }

...that endpoint could be called at any granularity. It could be called in a single loop, without any parallelization, across all entities. Or every entity can be treated as a work item in a task pool. Or they could be chunked. This can be done in parallel with other features, or back-to-back synchronously.

We opted for those more structured end points because generally dropping parallel for-each loops into leaf code isn't going to be a good way to get the best performance out of hardware. The fastest parallelization will be workload dependent and hardware dependent and require a holistic "big picture" view of exactly what features are needed each frame, the relative complexity of each feature on the hardware, and the actual number of entities in view.

Unstructured parallelism also didn't support one of the goals of rafx: render-features-as-libraries. Writing the parallelization code directly into each feature makes the features harder to port between platforms (hardware) or applications (workload). Forcing features into the structured end points like Destiny means that a feature's degree of parallelism is controlled by the user.

Nilirad · 2021-06-02T17:12:11Z

Nilirad
Jun 2, 2021
Collaborator

How does rafx-api compare to gfx-hal? They both seem to have the same goal, but in this discussion gfx-hal has not been taken in consideration, which is currently the rust community standard for low-level backend-agnostic graphics API.

What's the best choice for Bevy, considering that rafx-api will improve over time?

2 replies

kvark Jun 2, 2021

gfx-hal is lower level and less opinionated, also has wider scope: Vulkan API is rich, and we target a lot of backends. Calling it a community standard is not entirely correct, since there is only a few direct users of gfx-hal. For the most part, it serves as a backend for wgpu-core and gfx-portability. Read more at gfx-rs/gfx#3768

aclysma Jun 3, 2021
Collaborator

I agree with everything @kvark said, and the linked discussion is a good one.

As a specific example of scope, gfx-hal has targeted implementing almost everything that vulkan can do, and rafx-api targets just the things that are broadly supported across many graphics APIs. (See rafx-api in pseudocode to get a feel for what is and isn't abstracted.) These approaches yield different tradeoffs. gfx-hal has a more strongly defined specification (Vulkan's spec, basically) and a broader feature set. However, this does add complexity to some backend implementations, and I imagine some features won't be on the "happy path" of the abstracted backend. With rafx-api you will certainly know if you are not on a happy path, because the abstraction won't exist and you will have to write natively against the underlying backend. I expect some people will like that, and some people won't. :)

Another difference is that rafx-api expects you to load shaders in the current backend's native format. (rafx provides tools/workflow/asset pipeline integration to automate this). gfx-hal does shader translation at runtime.

Finally, while gfx-hal and rafx-api are both somewhat similar in that they provide graphics API abstraction, they are both a single piece of their respective larger projects. The high-level goal of the rafx crates is to provide a modular renderer with a rendering pipeline appropriate for typical games, and an asset pipeline and workflow to load content into it. It is likely at some point we will bypass rafx-api for "hot" paths where a difficult to abstract feature in some API provides a major uplift in performance (subpasses/tile shaders) or savings in memory usage (render target aliasing).

Neo-Zhixing · 2021-07-13T04:48:04Z

Neo-Zhixing
Jul 13, 2021

Just adding my 2 cents: No matter what low-level API we're going to use, I would like to maintain the ability to retrieve the (dx/gl/ash) instance and directly call the APIs in my render graph.

My work involves custom voxel raytracing pipelines that are highly specialized, use many API-specific features, and differ significantly from a traditional rasterization pipeline. For example, my current implementation uses Sparse Binding and Sparse Residency for manual memory management, and I use a compute shader to directly render onto the framebuffer. Right now, I completely disabled bevy_wgpu and bevy_render and render directly onto the winit window each frame.

Ideally, I still want to take advantage of the bevy rendering pipeline for a small number of rasterized items and UI. The custom raytracing pipeline would render to an image which gets blended to the framebuffer during rasterization. Unfortunately, this is not easy to do, primarily due to the fact that wgpu hides the wgpu-core instance which hides the gfx instance which hides the ash instance. If we use rafx instead, I would imagine that it would be much easier to fully expose the underlying APIs.

1 reply

cart Jul 13, 2021
Maintainer Author

There is definitely interest in adding this type of functionality (both on our end and on the wgpu end). The current wgpu re-architecture work is making the stack simpler (and therefore easier to expose) and there is already an open pr to directly expose raw vulkan functionality: gfx-rs/wgpu#1415

Bevy Render Rework: Initial Framing and Proof of Concept #2265

cart May 28, 2021 Maintainer

bevy_render: The Current State of main

Potential Paths Forward

bevy_render Rework: Initial Proof of Concept

Render App Model

SubApps

New Abstractions

Removals and Tweaks

Drawing Sprites

Pipelined BevyMark

Next Steps for PoC

Next Steps for Productionizing PoC

Discussion Kickoff

Replies: 10 comments · 23 replies

aevyrie May 28, 2021 Collaborator

cart May 28, 2021 Maintainer Author

superdump May 28, 2021 Collaborator

cart May 31, 2021 Maintainer Author

superdump May 28, 2021 Collaborator

SubApps

New Abstractions

Removals and Tweaks

Drawing Sprites

Pipelined BevyMark

Next Steps for PoC

Next Steps for Productionizing PoC

Discussion Kickoff

bjorn3 May 28, 2021 Collaborator

cart May 29, 2021 Maintainer Author

superdump May 28, 2021 Collaborator

cart May 29, 2021 Maintainer Author

superdump May 28, 2021 Collaborator

cart May 29, 2021 Maintainer Author

alice-i-cecile May 28, 2021 Maintainer

bjorn3 May 28, 2021 Collaborator

alice-i-cecile May 28, 2021 Maintainer

mtsr May 28, 2021

alice-i-cecile May 29, 2021 Maintainer

cart May 29, 2021 Maintainer Author

alice-i-cecile May 28, 2021 Maintainer

cart May 28, 2021 Maintainer Author

cart May 29, 2021 Maintainer Author

superdump May 29, 2021 Collaborator

superdump May 31, 2021 Collaborator

cart Jun 1, 2021 Maintainer Author

aclysma May 29, 2021 Collaborator

Benchmark

Responding to a few comments

Miscellaneous

Closing thoughts

cart May 31, 2021 Maintainer Author

DavidVonDerau May 31, 2021

DavidVonDerau May 31, 2021

Nilirad Jun 2, 2021 Collaborator

kvark Jun 2, 2021

aclysma Jun 3, 2021 Collaborator

Neo-Zhixing Jul 13, 2021

cart Jul 13, 2021 Maintainer Author

cart
May 28, 2021
Maintainer

bevy_render: The Current State of `main`

`bevy_render` Rework: Initial Proof of Concept

Replies: 10 comments 23 replies

aevyrie
May 28, 2021
Collaborator

cart May 28, 2021
Maintainer Author

superdump May 28, 2021
Collaborator

cart May 31, 2021
Maintainer Author

superdump
May 28, 2021
Collaborator

bjorn3 May 28, 2021
Collaborator

cart May 29, 2021
Maintainer Author

superdump
May 28, 2021
Collaborator

cart May 29, 2021
Maintainer Author

superdump
May 28, 2021
Collaborator

cart May 29, 2021
Maintainer Author

alice-i-cecile
May 28, 2021
Maintainer

bjorn3 May 28, 2021
Collaborator

alice-i-cecile May 28, 2021
Maintainer

alice-i-cecile May 29, 2021
Maintainer

cart May 29, 2021
Maintainer Author

alice-i-cecile
May 28, 2021
Maintainer

cart May 28, 2021
Maintainer Author

cart
May 29, 2021
Maintainer Author

superdump May 29, 2021
Collaborator

superdump May 31, 2021
Collaborator

cart Jun 1, 2021
Maintainer Author

aclysma
May 29, 2021
Collaborator

cart May 31, 2021
Maintainer Author

Nilirad
Jun 2, 2021
Collaborator

aclysma Jun 3, 2021
Collaborator

Neo-Zhixing
Jul 13, 2021

cart Jul 13, 2021
Maintainer Author