2019 Toronto Monday
(gw, kvark)
- we intern sizes but not positions of clips and primitives
- Gecko bakes the scroll offsets
- new API now allows us to ask for the scroll offsets and “unbake” them (sketched below)
- however, hit testing still is a problem (TODO: clarify)
- need get_relative_transform to be used consistently - blocked on flattening rework (TODO: resolve)
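A minimal sketch of the “unbake” idea, assuming Gecko can report the scroll offset it baked into a display-list position; the type, function name, and sign convention below are illustrative, not the actual API:

```rust
// Minimal sketch of "unbaking" a scroll offset from a display-list
// position so the scroll-independent value can be interned/cached.
// Illustrative only: names and sign convention are assumptions.
#[derive(Clone, Copy, Debug, PartialEq)]
struct LayoutPoint { x: f32, y: f32 }

fn unbake(baked_position: LayoutPoint, scroll_offset: LayoutPoint) -> LayoutPoint {
    LayoutPoint {
        x: baked_position.x + scroll_offset.x,
        y: baked_position.y + scroll_offset.y,
    }
}
```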
(nical, gw, kvark)
Problem 1: a task is dependent on multiple other tasks
Case: multiple drop shadows on the same text item
Opportunity: only downscale the text once, use it for all shadow tasks (see the sketch below)
Note: render task cache can't be used, since it doesn't handle dependencies well (scheduling issue).
- solution: schedule the RT cache as late as possible
- when dependencies are in the texture cache, we'd need to render in a pass and blit back to the texture cache
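A minimal sketch of the dependency situation from Problem 1: several drop-shadow tasks share one downscale task, so the text is only downscaled once. All types, fields, and names are illustrative, not WebRender's actual render task types:

```rust
// Illustrative render-task graph: multiple shadow tasks depend on the
// same downscaled-text task, so the downscale is rendered once and
// reused. Everything here is made up for the sketch.
#[derive(Clone, Copy)]
struct TaskId(usize);

enum TaskKind {
    DownscaleText,
    DropShadow { blur_radius: f32 },
    Composite,
}

struct Task {
    kind: TaskKind,
    deps: Vec<TaskId>,
}

fn build_shadow_graph(blur_radii: &[f32]) -> Vec<Task> {
    let mut tasks = Vec::new();
    // One shared downscale of the text item.
    tasks.push(Task { kind: TaskKind::DownscaleText, deps: vec![] });
    let downscale = TaskId(0);
    // Each drop shadow reads from the same downscale task.
    let mut shadow_ids = Vec::new();
    for &r in blur_radii {
        shadow_ids.push(TaskId(tasks.len()));
        tasks.push(Task {
            kind: TaskKind::DropShadow { blur_radius: r },
            deps: vec![downscale],
        });
    }
    // Final composite depends on all the shadows.
    tasks.push(Task { kind: TaskKind::Composite, deps: shadow_ids });
    tasks
}
```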
Concern: render task rect allocation assumes that the source is in the previous pass.
- solution: blit contents of a task across passes
- schedule as late as possible
- retain some of the render task slices/rects as opposed to clearing
Blits are expensive:
- bounds are not tight
- the perf on Intel scales with the number of pixels we touch
TODO: check ARM/Mali for when the tiles are resolved:
- does it happen if the tile is unchanged?
- what if it was just cleared?
Motivation:
- reduce redundant shadow tasks
- remove "mark for saving"
- SVG filters are expressed as graphs
Q: retain across frames? A: currently, not retaining any shadow tasks
Q: debugging tools for the RT graph resolver? A: a fun thing to write, given that the resolver is fairly standalone
Note: need tooling to find the best scheduling off-line and compare it with the run-time result by the number of pixels
Current "best" strategy:
- ping-pong as current WR
- schedule late
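A rough sketch of the "schedule late" idea: every task ends up in the pass just before its earliest consumer. The graph representation, task indexing, and pass numbering are illustrative assumptions:

```rust
// Sketch of "schedule as late as possible": each task goes in the pass
// just before the earliest pass of any task that consumes it.
// Assumes task indices are already topologically sorted (dependencies
// before dependents); all of this is illustrative.
fn assign_passes_late(deps: &[Vec<usize>]) -> Vec<usize> {
    // deps[i] lists the tasks that task i reads from.
    let n = deps.len();
    // Pass counted from the end: 0 = the final pass.
    let mut from_end = vec![0usize; n];
    for i in (0..n).rev() {
        for &d in &deps[i] {
            // A dependency must be rendered at least one pass earlier
            // than its latest-scheduled consumer.
            from_end[d] = from_end[d].max(from_end[i] + 1);
        }
    }
    // Convert "passes from the end" into forward pass indices.
    let max = from_end.iter().copied().max().unwrap_or(0);
    from_end.iter().map(|p| max - p).collect()
}
```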
Q: incremental deployment of the new allocator?
- first, integrate with existing behavior
- enable for shadows and other things
- play with strategies
Future:
- Since the targets are texture arrays, sub-manage the slices. Try to work with slices, not rects.
- Render pass as the whole render target, select the slice in VS.
- Try identifying the 1:1 pass work, use sub-passes.
- tile cache memory limits? need to know how many mask slices there are
- can provide the whole frame as one giant render pass
Q: can we exploit the axes and auto-rotate things?
- would be good!
- segmentation solves the problem to some extent
- could also exploit the symmetry
Rounded corners optimizations:
- only render the corners into the mask (see the sketch after this list)
- exploit the symmetry
- more precise bounding/geometry to reduce fill rate
- can't apply the local clip rect in this case!
- quad tree subdivision (or a regular grid) - still draw rects
- can't multiply the clip mask (TODO: discuss)
- can't apply the local clip rect in this case!
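As a concrete illustration of "only render the corners into the mask": the four corner regions are the only places that need mask pixels, and by symmetry a single rendered corner could be mirrored into the other three. Types and layout below are illustrative, not WebRender's:

```rust
// Sketch: for a rounded rect, only the four corner regions need a clip
// mask; the rest of the shape can be covered by plain rects.
// Simple stand-in type, uniform radius for brevity.
#[derive(Clone, Copy)]
struct Rect { x: f32, y: f32, w: f32, h: f32 }

fn corner_mask_rects(r: Rect, radius: f32) -> [Rect; 4] {
    [
        Rect { x: r.x,                y: r.y,                w: radius, h: radius }, // top-left
        Rect { x: r.x + r.w - radius, y: r.y,                w: radius, h: radius }, // top-right
        Rect { x: r.x,                y: r.y + r.h - radius, w: radius, h: radius }, // bottom-left
        Rect { x: r.x + r.w - radius, y: r.y + r.h - radius, w: radius, h: radius }, // bottom-right
    ]
}
```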
(gw, kvark, Gankro)
Ideas:
- use a test case that doesn't rely on tile elimination
- avoid unorm <-> f32 conversions
- bind as write-only more often (requires 32-bit chunks to be written)
- don't multiply clip values early, only do in the combination pass
TODO: compile a list of questions for ARM
- how exactly can we take advantage of tile elimination?
- does it work for off-screen targets?
- what are the supported formats?
- what are the states that affect it?
- is anything happening to a tile we don't touch by geometry?
- TODO
(kats, kvark, jgilbert)
Process of vendoring wgpu-rs:
- move into the tree
- improve the remote layer
- establish scripts to/from GitHub
Gecko will have 2 implementations as well, in the shape of different structs with the same virtual interface: local and remote. Differences are:
- Client parameter in all functions of the remote layer
- Swapchain integration (unknown on Gecko side)
- Pass dependencies collection in the remote layer
Q: how do we reduce the JS calls in client apps?
Moving into Gecko:
- copy into tree as "gfx/webgpu"
- connect it into libgkrust's Cargo.toml and lib.rs
- Run ./mach vendor rust, check for complaints about licensing. This will add dependencies to third_party/rust; make sure it looks sane
- add a build option "--enable-webgpu" similar to WR here. Put the libgkrust integration stuff from step 2 behind a feature flag controlled by the build option (see the sketch after this list)
  - JG: Why allow non-webgpu builds? We don't let you build without webgl.
  - DM: switch it ON when ready at least in some form? No need to slow down everyone for now
  - JG: This is done with a pref, not a build option, usually.
  - DM: WR was a build option at the beginning, before it was able to consider shipping anywhere
  - JG: We know we're going to be shipping it, and that we want a prototype to play with sooner rather than later. To that end, it seems like all downside to have this be a build option. Just leave it as a pref, if that's acceptable. (which I think it is!)
  - DM: OK, sounds reasonable
  - DM: A tricky part is selecting which backend to build with. If it's optional, we can straightforwardly enable the Vulkan build on Linux CI. If it's mandatory, we'll need to resolve the backend selection logic right here, which complicates the integration a bit.
- To add a taskcluster job, first decide if you want a full Firefox build with webgpu enabled, or a standalone webgpu build. I think the former might make more sense if you just need to catch build regressions. For the latter, copy and modify the existing webrender standalone jobs such as this one - copy it into a new taskcluster/ci/webgpu/kind.yml file. You won't need the wrench-deps stuff, just run cargo build in the gfx/webgpu folder
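A hypothetical sketch of the feature-flag hookup in libgkrust's lib.rs; the crate name and re-export are made up for illustration, not the real integration:

```rust
// Hypothetical sketch of gating the webgpu bindings in gkrust behind a
// Cargo feature. `wgpu_bindings` is an illustrative crate name.
//
// In Cargo.toml (illustrative):
// [features]
// webgpu = ["wgpu_bindings"]

#[cfg(feature = "webgpu")]
extern crate wgpu_bindings;

#[cfg(feature = "webgpu")]
pub use wgpu_bindings::*;

// With the feature off, the rest of gkrust builds exactly as before,
// so non-webgpu builds are unaffected.
```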
(gw)
Display Lists:
- Items
  - text
  - box shadow
  - image
- Stacking contexts
  - filters
- Clip chains
- Reference frames
Scene ("model"):
- picture tree
- spatial tree
- clip chains
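Roughly, the scene ("model") side can be pictured like this; the types and fields are illustrative placeholders, not WebRender's actual structures:

```rust
// Rough shape of the built scene: the trees that frame building walks
// every frame. All names and fields are placeholders for the sketch.
struct BuiltScene {
    picture_tree: Vec<Picture>,     // pictures referencing interned prims
    spatial_tree: Vec<SpatialNode>, // reference frames, scroll frames, etc.
    clip_chains: Vec<ClipChain>,    // shared clip state
}

struct Picture { prim_indices: Vec<usize> }
struct SpatialNode { parent: Option<usize> }
struct ClipChain { parent: Option<usize>, clip_index: usize }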
Q: "scene" term vs WR capturing Q: internation? (see picture caching) Q: tile caching?
can be scrolled around
Frame ("view"):
- update spatial tree
- update picture tree
- update visibility
- update primitives
- generate render tasks
- assign passes
- batch
Submit:
- apply resource updates
- for each pass (see GPU work topic)
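The frame ("view") and submit steps above, written out as a sketch; every type and method here is a placeholder standing in for the corresponding stage, not WebRender's real API:

```rust
// Placeholder types so the sketch is self-contained.
struct Scene;
struct FrameBuilder;
struct Renderer;
struct RenderTask;
struct Pass;

impl Scene {
    fn update_spatial_tree(&mut self) {}
    fn update_picture_tree(&mut self) {}
}

impl FrameBuilder {
    fn update_visibility(&mut self, _scene: &Scene) {}
    fn update_primitives(&mut self, _scene: &Scene) {}
    fn generate_render_tasks(&mut self, _scene: &Scene) -> Vec<RenderTask> { Vec::new() }
    fn assign_passes(&mut self, _tasks: Vec<RenderTask>) -> Vec<Pass> { Vec::new() }
    fn batch(&mut self, passes: Vec<Pass>) -> Vec<Pass> { passes }
}

impl Renderer {
    fn apply_resource_updates(&mut self) {}
    fn draw_pass(&mut self, _pass: &Pass) {}
}

fn build_and_submit_frame(scene: &mut Scene, frame: &mut FrameBuilder, renderer: &mut Renderer) {
    // Frame ("view") stages.
    scene.update_spatial_tree();
    scene.update_picture_tree();
    frame.update_visibility(scene);
    frame.update_primitives(scene);
    let tasks = frame.generate_render_tasks(scene);
    let passes = frame.assign_passes(tasks);
    let batched_passes = frame.batch(passes);

    // Submit stages: apply resource updates, then draw each pass.
    renderer.apply_resource_updates();
    for pass in &batched_passes {
        renderer.draw_pass(pass);
    }
}
```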
Q: stuff moved but not marked as changed by the debug overlay? A: could be a fixed-position element that isn't cached, drawn on top
Interning key:
- item itself
- clipping
- transform
- animated properties (e.g. opacity)
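A minimal sketch of an interning key built from the pieces above; field names and types are illustrative:

```rust
// Sketch of the interning idea: the key combines the item data with
// clipping, transform, and animated-property references, and maps to a
// stable index ("uuid"). All names are illustrative.
use std::collections::HashMap;

#[derive(Clone, PartialEq, Eq, Hash)]
enum ItemKey { TextRun(u64), Image(u64), Rect(u64) }

#[derive(Clone, PartialEq, Eq, Hash)]
struct PrimKey {
    item: ItemKey,                  // the item itself
    clip_chain: usize,              // clipping
    spatial_node: usize,            // transform
    animated_binding: Option<u64>,  // animated properties (e.g. an opacity binding)
}

#[derive(Default)]
struct Interner {
    map: HashMap<PrimKey, usize>, // key -> stable index
}

impl Interner {
    fn intern(&mut self, key: PrimKey) -> usize {
        let next_index = self.map.len();
        *self.map.entry(key).or_insert(next_index)
    }
}
```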
Picture = (prim uuid, uuid, uuid, ..)
Tiles are 1024x256. Identify the dirty regions and update them with a scissor rect.
Q: what happens with complex regions? A: draw the whole thing
- should set the Z on tiles to reject the pixels over the valid tiles (TODO: verify)
- blog-post-like pages are still the bad case
Q: tile coordinate space? A: world. If stuff is scrolled, the positions are adjusted, so we get the same world results.
Q: how does the valid content get into the new frame? A: copied through the texture cache
New API in development to expose the scroll offsets to WR from Gecko, allows removing hacks in WR and caching more surfaces.
Clusters are built during flattening:
- bounds
- spatial node
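A small sketch of what a cluster built during flattening might hold, per the two bullets above; the types are illustrative:

```rust
// Illustrative primitive cluster: prims that share a spatial node,
// with their local bounds accumulated as the cluster is built.
#[derive(Clone, Copy)]
struct LayoutRect { x: f32, y: f32, w: f32, h: f32 }

struct PrimitiveCluster {
    spatial_node: usize,      // the spatial node all prims in the cluster share
    bounds: LayoutRect,       // union of the prims' local rects
    prim_indices: Vec<usize>,
}

impl PrimitiveCluster {
    fn add(&mut self, prim_index: usize, rect: LayoutRect) {
        // Grow the cluster bounds to include the new primitive.
        let x0 = self.bounds.x.min(rect.x);
        let y0 = self.bounds.y.min(rect.y);
        let x1 = (self.bounds.x + self.bounds.w).max(rect.x + rect.w);
        let y1 = (self.bounds.y + self.bounds.h).max(rect.y + rect.h);
        self.bounds = LayoutRect { x: x0, y: y0, w: x1 - x0, h: y1 - y0 };
        self.prim_indices.push(prim_index);
    }
}
```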
draw_tile_frame:
- for each pass
  - bind pass n-1 as input
  - for each A8 target
    - draw clips
    - draw blurs
  - for each RGBA8 target
    - draw borders
    - draw alpha batches
    - draw blurs
    - draw scalings
Most drawing looks like:
- bind textures
- bind shader
- update VAO/instances
- draw instanced
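A sketch of the two lists above put together: the pass loop of draw_tile_frame, with each draw reduced to bind textures / bind shader / update instances / draw instanced. All types and methods are placeholders, not the real renderer API:

```rust
// Placeholder structures for the sketch.
struct Pass { a8_targets: Vec<Target>, rgba8_targets: Vec<Target> }
struct Target { batches: Vec<Batch> }
struct Batch;
struct Device;

impl Device {
    fn bind_input_from(&mut self, _previous: Option<&Pass>) {}
    fn bind_textures(&mut self, _b: &Batch) {}
    fn bind_shader(&mut self, _b: &Batch) {}
    fn update_instances(&mut self, _b: &Batch) {}
    fn draw_instanced(&mut self, _b: &Batch) {}
}

fn draw_tile_frame(device: &mut Device, passes: &[Pass]) {
    let mut previous: Option<&Pass> = None;
    for pass in passes {
        // Bind pass n-1 as input to pass n.
        device.bind_input_from(previous);
        for target in pass.a8_targets.iter().chain(pass.rgba8_targets.iter()) {
            // Clips, blurs, borders, alpha batches, scalings all reduce to:
            for batch in &target.batches {
                device.bind_textures(batch);
                device.bind_shader(batch);
                device.update_instances(batch);
                device.draw_instanced(batch);
            }
        }
        previous = Some(pass);
    }
}
```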
How the shader looks: brush_solid -> brush.glsl
main() { // VS
fetch_brush();
brush_vs();
}
main() { // FS
brush_fs();
do_clip();
}
Data is passed to shaders:
- PrimitiveInstance written to the instance buffer - 16 bytes with prim address, clip address, flags
- brush common and specific data is written to the GPU cache
- read by fetch_brush
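A sketch of the 16-byte per-instance data; the exact field split is an assumption, but the total size matches the notes:

```rust
// Illustrative 16-byte instance data: prim address, clip address,
// flags, and brush-specific data packed as four i32s.
#[repr(C)]
#[derive(Clone, Copy)]
struct PrimitiveInstanceData {
    prim_header_address: i32, // where the prim header lives in the GPU cache
    clip_task_address: i32,   // clip mask task (or an "invalid" sentinel)
    flags: i32,               // edge/segment flags, etc.
    user_data: i32,           // brush-specific data
}

// 4 x i32 = 16 bytes per instance.
const _: () = assert!(std::mem::size_of::<PrimitiveInstanceData>() == 16);
```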
List of all brush and scene shaders:
Rows are associated with a block count per element (16, 64, 512, etc).
Simple slab allocator to find the next entry after the user provided all the data via request().
TODO: validation could be more comprehensive
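A rough sketch of the slab-allocation idea: rows are dedicated to one block count per element, and request() returns the next free slot in a matching row, opening a new row when needed. The structure and constants are illustrative:

```rust
// Illustrative row-based slab allocator for the GPU cache.
use std::collections::HashMap;

#[derive(Clone, Copy)]
struct CacheLocation { row: usize, slot: usize }

struct Row {
    blocks_per_element: usize, // size class of this row (16, 64, 512, etc.)
    next_free_slot: usize,
    slots_per_row: usize,
}

#[derive(Default)]
struct GpuCache {
    rows: Vec<Row>,
    rows_by_size: HashMap<usize, Vec<usize>>, // size class -> row indices
}

impl GpuCache {
    const BLOCKS_PER_ROW: usize = 1024;

    // Called once the user has written all the data for one element.
    fn request(&mut self, blocks_per_element: usize) -> CacheLocation {
        // Try to find a row of this size class with a free slot.
        if let Some(indices) = self.rows_by_size.get(&blocks_per_element) {
            for &row_index in indices {
                let row = &mut self.rows[row_index];
                debug_assert_eq!(row.blocks_per_element, blocks_per_element);
                if row.next_free_slot < row.slots_per_row {
                    let slot = row.next_free_slot;
                    row.next_free_slot += 1;
                    return CacheLocation { row: row_index, slot };
                }
            }
        }
        // Otherwise start a new row for this size class.
        let row_index = self.rows.len();
        self.rows.push(Row {
            blocks_per_element,
            next_free_slot: 1,
            slots_per_row: Self::BLOCKS_PER_ROW / blocks_per_element,
        });
        self.rows_by_size.entry(blocks_per_element).or_default().push(row_index);
        CacheLocation { row: row_index, slot: 0 }
    }
}
```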
Q: do we have i32 textures? A: segmentation! primitive header: color, UVs, and GPU cache address are written into the f32 prim header
(mstange, gw, kvark, jrmuizel, dan, ..)
Client storage on Mac:
- alignment
- don't use texture storage
- don't use it with texture in flight
- don't use it with texture data
Use as an upload vector only, not as direct texture storage.
Problems:
- stalls! no proper PBO renames
- forced format conversion: no BGRA8 internal format on Mac
Potential path to fight stalls:
- switch to Scatter
- re-initialize GPU cache texture
Q: remove GPU cache texture in favor of vertex data only?
Idea: small test suite to figure out what works well and what doesn't on a platform (texture uploads, UBOs, depth testing, etc)