-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
A hybrid, safe approach #10
Comments
Couple of immediate questions:
|
Aside: I was initially really liking the zerocopy idea (it's just like HarfBuzz and we know that works) but after the discussion today I began to think I undervalued the safety (no reinterpret_cast) of the view approach. Both still seem like good options to me. Do we have any insight into whether https://github.com/rust-lang/project-safe-transmute is likely to come through? |
yes, in this case if you need one scalar you would pay for all scalars on that structure. What I'm curious about is how often, in practice, this is a problem. Most structures are small: a handful of scalars. We only pay this cost once, and we can reuse the result. Perhaps at the leaves we end up needing to hit most fields anyway, and doing the reads all at once keeps us in L1 more? Maybe if we want to have a 100% safe API, doing the reads all at once lets us do less bounds checking? I don't know if any of these are good arguments, I'm just thinking out loud.
That last fvar field is particularly hairy, and maybe wasn't the best example. It will need to be special-cased somehow. I think my preferred implementation would be something where you just provide a custom function that calculates the parameters that describe the array (start offset, n_items, item size) and then it's a common array impl that handles calculating offsets for individual members.
I agree that there's a lot to like about the zerocopy struct model, but the various little paper cuts are pushing me back towards the view approach. It's feasible to have a no-unsafe version of the API, and we should end up with essentially the same ASM as we'd get from current HarfBuzz; modulo bounds checking, (and that is a dial we can turn) it's all going to end up being array access and BE/LE conversions.
I don't have any special insight, but I've poked around the discussion and it seems like there is some interest and a basic design, and some initial PRs have landed. It doesn't look like there are any huge blockers, but it's definitely a ways away from stabilizing. |
IMO it makes sense to prototype both view and zerocopy style. Hopefully a macro (or script or whatever generator ... just exploration for now) can be made to spit out both so we can play with them a little? One could even do a hybrid with zerocopy struct access and methods to access the fields in friendly types. IMHO the appeal of the zerocopy style goes up if a blessed implementation seems likely to come.
Isn't that only true if we hang onto the result? - for example, iiuc if I read ... idk, upem, ... repeatedly in the most naive way I might be surprised how many bytes I'm accessing |
With my perf cap on, I have no real objection to this. The largest offender here is the OS/2 table at 100 bytes for version 5. In reality, any high performance rasterizer or shaper is going to read and cache the pertinent values up front. Even if you don't persist the values across rasterization/shaping operations, the cost of reading these tables will be dominated by that of decoding outlines or applying layout subtables. I should also say that fast shaping absolutely requires auxiliary data structures such as the accelerators in HarfBuzz (swash does something different but similar; I perhaps should have described some of this during the meeting and I'm happy to do so next time).
I feel okay classifying something like |
I feel like writing is the big open question here. HarfBuzz and pinot (+swash) are functional examples of the zerocopy and view methods of reading, respectively, and I think both provide nice APIs and good performance. How does writing look with this hybrid method? I assume the |
I agree re reading upem in a tight loop. If my font root object bsearches for head every access ... maybe it needs an accelerator :D IIUC the implications are:
Is it fair to say that implies a starting preference of zerocopy > view > hybrid, and while we can argue the cost isn't enough to matter it's not fair to claim it's nothing at all? I could see choosing view anyway for safety but I'm failing to grasp an advantage that would lead us to favor this hybrid? |
Definitely one advantage of the 'just structs' model that I've been underselling (and this applies in theory both to the idea sketched out here as well as the general HarfBuzz approach) is that you have a single set of types that you can (hopefully) use when both reading and writing. Although there are still parts I'm not at all sure about (what is an
The only slight complication here is that there are two different types of macro we might use:
Agreed, although it's an unknown how that core implementation will relate to existing implementations, or what the timeframe would be.
the struct is on the stack, so no allocation in the sense of no
Slices are generally a pointer + a length, so two words. In practice that may get optimized away, but holding on to a particular view into some part of the font bytes should not be more expensive than holding on to the font bytes directly, I don't think?
This is true, and maybe I just need to actually do some measuring here. My hunch is that this cost is going to be low relative to overall runtime costs, but how about I try to get some real data and get back to you. :)
definitely don't want to make that claim! I'm just trying to tune my intuition for exactly how much weight this cost should be given.
The main advantage is it gives us the simplest API, the most idiomatic code (declaring a struct using native types and slapping a #[derive(..)]` on it) and also that it opens the possibility to reusing most types for reading and writing. Similarly the types that we generate this way are a 1:1 representation of the types in the spec, hopefully. There's one issue with zerocopy/bytemuck that we haven't discussed that much, and that's that they won't play nicely with our array types. Traits like One partial solution here is what chad did in https://github.com/dfrg/pinot/blob/eff5239018ca50290fb890a84da3dd51505da364/src/fvar1.rs, and split this up into zerocopy structs (like the header type) and then having more view-like methods for accessing collections. This means you don't have the full "this struct maps exactly to the spec" benefit, but it's closer than the view-only approach? |
It would be interesting to see in godbolt. My impression is you'd get quite a few more instructions in some models vs others. I do agree it's likely to be low relative to total ... but we're trying to beat a C++ codebase that was tuned for 10+ years so I suspect - but have not yet shown - this could prove to matter.
I was imagining this to pretty much be true for all 3 options, is that not true? - for example, I would think I can allocate a vec of some zerocopy type (perhaps when compiling) or a vec of pointers to a zerocopy type (perhaps when subsetting) and memcpy their bytes into an output buffer. |
+1
FWIW, swash is competitive with HarfBuzz here, with big wins in some cases. See: harfbuzz/harfbuzz#3221 It might be useful to enumerate priorities to make sure we’re on the same page? As I see it:
I specifically put ergonomics last because I don’t think the additional accessor calls on the endian swapping types should discredit the zerocopy model. I assume most developers will be interacting with this library through a higher level interface: compiler, subsetter, sanitizer, shaper, etc. |
zerocopy is the only one that gives us this for free for aggregates with purely blittable elements but the others can be handled by code generation. |
Wrt priorities I don't think it's quite that clean, we're balancing tradeoffs and they don't get a crisp absolute priority like that. Here's a first attempt to sketch how I see it, higher priority items listed first:
I will note that my hope is that Rust will prove to offer us a path where we don't have to make all that much of a tradeoff, we can have it all! Both zerocopy and views strike me as capable of delivering all of this .
Shouldn't we penalize view for this too? - calling .somefield() is an accessor. A zerocopy struct with pinot-style accessors that get inlined kind of appeals to me. |
Your more nuanced take looks good to me. Although I was defining hackability as being able to modify the code and ergonomics as a sort of fluency when consuming the code. In my mind, I'm drawing a sharp distinction between "I want the (possibly variable) advance width for character 'x'" and "I want to pick at the data in the hmtx/HVAR tables." IMO the former should be provided by a higher level library that presents a more comprehensive model of the font and the scope of this library should be limited to the latter. I'm not sure if we're all in agreement on this.
Agreed! I think we can get there.
My thought is that we shouldn't penalize either for this. Coincidentally, Rust permits having a field and method with the same name, so we can have our cake and eat it too. |
edit: didn't see the later replies. Glad to see the priorities expressed clearly, this is helpful. The take away for me at this point is that I need to write some code. :) Raph also mentioned potentially wanting the ability to run in 'safe mode' which I'd think of as basically 'zero unsafe code'. Do we want this to be a design goal? It does handcuff us somewhat: it rules out zerocopy, and it also means, in the view case, that each field access would likely need to be bounds checked. In any case I'm going to write up some code that generates a few different implementations, and we can compare them. One QQ: anyone have some nice ideas for some access patterns / toy programs that i can try with various implementations, to get a sense for actual perf/codegen? preferably something that is realistic, part of the hot path, and doesn't involve implementing all of GPOS/GSUB. I was thinking of maybe something like walking a string and getting the bbox for each glyph? so hitting
In terms of generated code, I think that the zerocopy struct model and the view model are very nearly equivalent, so how we weigh them against each other is going to come down to other differences like readability, reusability (for compilation, say) safety, and other secondary things like that. |
Makes sense to me, barring some compelling reason to do otherwise. |
Filed #11 to track finding out how safe transmute is going. https://github.com/jswrenn/typic, a proof of concept implementation that led to https://github.com/jswrenn/project-safe-transmute/blob/rfc/rfcs/0000-safe-transmute.md#safe-transmute-rfc, looks kind of promising. |
If we’re using macros, we should be able to feature gate on this and have the macros emit the appropriate code. This also gives us the ability to measure the performance difference between the two and potentially drop the unsafe entirely.
This sounds good to me. The format switching in |
the important question part
How much, exactly, do we dislike copying? Below I am going to outline an approach that copies/converts scalar fields greedily, as objects are accessed. With this approach, we would still avoid a lot of copying (we would only ever copy bytes on tables that we accessed, and there still quite minimally) and in exchange for spending a few cycles up front, we end up with a much more natural API.
the explanation part
So far, we have been discussing two different take on a zerocopy API: first is the HarfBuzz model, where we take some chunk of bytes and directly reinterpret them as some type
T
that has fields we can access; and second is the pinot model, where instead of a struct with fields we have a 'view' type that generates methods for each field, and those methods reach into the bytes and interpret them as a given scalar value.I like different parts of each of these approaches. For the HarfBuzz approach, I like that we end up with a struct that acts like any other (modulo the fact that you can't directly read the fields; you need to call getters) and which also is a more-or-less literal transcription of the items in the spec. For the pinot version, I like that there is no
transmute
(which is more heavily restricted in Rust than in cpp, as I understand it) and perhaps also that it lets the overall API (perhaps) be more consistent, because some things will probably make more sense as methods anyway.I've been thinking about an approach I'll call 'copy leaves'. In this approach, in a given object, we distinguish between fields that are scalars and those that are collections/references. We always copy scalars, and we always create views into references.
In this world, a declaration of fvar would look something like,
maybe this wasn't the best example; I'll include a bit more code below, but I mainly want to highlight the basics:
axes
andinstances
are 'array' types; they wrap a blob of data, and provide an array-like API for random access. Most of the time, there can be shared types to do this (theArray
, type, here) but in the specific case of the instances field on fvar we bail and use a custom type; we specify in the macro some arguments that should be passed to that type's constructor.Is this a non-starter?
For the interested, this is just me sketching out how this might work implementation-wise:
The text was updated successfully, but these errors were encountered: