Succinct Tree support? #15
I haven't made much progress since then, instead focusing on improving the overall API and adding WaveletMatrices. There is an unpushed branch where I began a BP implementation, but that hasn't gone very far yet. There is also the dev_dfuds branch, which contains another attempt at a Depth-First Unary Degree Sequence (DFUDS) encoding of a succinct tree, but I am not happy with that codebase and its performance, which is why I switched to BP (it's simpler and sufficient for most use cases). Maybe I will continue with it in the future. Technically the dfuds branch already implements fwd_search and bwd_search, but the implementation contains bugs, iirc. Thanks for the paper references, I will have a look, especially at the last one, because I am very unhappy with my current min-max tree attempts.
Thanks for your reply! I'll take a look at the branches. I'm out of my depth here, as I only learned about the existence of the whole field of succinct data structures two days ago. :) I ended up here because I was looking at optimizing XPath queries over XML, of all things. BP trees seem to offer good trade-offs in that domain. Note that the code referenced by the last paper is unfortunately not available. I asked Navarro about that yesterday; when you know nothing about succinct data structures, you don't yet realize he's super important in the field, so you just drop him an email.
I'm going to give some feedback based on various papers I read. This is not criticism of the code, but a bit of analysis to see whether we can learn from these papers for future improvements, or, if we deviate from them, do so consciously. I'm sure you're already aware of much of this, but it is also for my own greater understanding.

First, the "Succinct Trees in Practice" paper. In section 2.3, it mentions that the block size is "b = w / 2", where "w" is the bit length of a machine pointer (w is defined in section 1). On a 64-bit machine this would mean a block size of 32 bits. It also mentions a "superblock": a group of k blocks is a superblock. I'm having trouble figuring out what
So is b 32 or is b 512? I'm confused. I'm curious to explore the motivations for shorter and longer block sizes with this algorithm. Now on to the paper "Simple and Efficient Fully-Functional Succinct Trees": in section 2.3, "range min-max trees", it describes the simple binary tree over a base block as the practical solution implemented for this structure (referring to "Succinct Trees in Practice" by Arroyuelo et al.), so that, I think, is what the current vers implementation does as well, right? In section 2.2 it mentions primitives:
We currently implement fwdsearch and bwdsearch, but not rmq, rMQ, mincount and minselect.
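For my own understanding, here is what the missing primitives mean, written as naive O(n) reference implementations over a plain bool slice (true = '('). Names follow the paper, not any vers API, and the linear scans here are exactly what the range min-max tree is meant to replace:

```rust
/// Excess after reading positions 0..=i (a 1-bit is '(' = +1, a 0-bit ')' = -1).
fn excess_at(bp: &[bool], i: usize) -> i64 {
    bp[..=i].iter().map(|&b| if b { 1 } else { -1 }).sum()
}

/// rmq(i, j): leftmost position in [i, j] with minimal excess.
fn rmq(bp: &[bool], i: usize, j: usize) -> usize {
    // min_by_key returns the first of several equal minima, giving "leftmost".
    (i..=j).min_by_key(|&k| excess_at(bp, k)).unwrap()
}

/// mincount(i, j): how many positions in [i, j] attain the minimum excess.
fn mincount(bp: &[bool], i: usize, j: usize) -> usize {
    let m = excess_at(bp, rmq(bp, i, j));
    (i..=j).filter(|&k| excess_at(bp, k) == m).count()
}

/// minselect(i, j, q): position of the q-th (1-based) minimum in [i, j].
fn minselect(bp: &[bool], i: usize, j: usize, q: usize) -> Option<usize> {
    let m = excess_at(bp, rmq(bp, i, j));
    (i..=j).filter(|&k| excess_at(bp, k) == m).nth(q - 1)
}

fn main() {
    // BP for a root with two leaf children: "(()())" -> 1 1 0 1 0 0
    let bp = [true, true, false, true, false, false];
    // excess per position: 1 2 1 2 1 0
    assert_eq!(rmq(&bp, 1, 4), 2); // min excess 1, first reached at position 2
    assert_eq!(mincount(&bp, 1, 4), 2); // positions 2 and 4
    assert_eq!(minselect(&bp, 1, 4, 2), Some(4));
}
```

rMQ (maximum instead of minimum) is symmetric; swap `min_by_key` for `max_by_key` and `==` the maximum.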
It also mentions that required are:
I don't think vers has these. It's used to implement […]. I'll note that, at the present time, I don't think any of these additional operations vers doesn't have yet are particularly useful for my XML purposes, and they may be more trouble to implement than they're worth. The possible exceptions are […]. One interesting section in 2.3 is:
It doesn't specify in detail how it could be done, and at first sight it would imply quite small block sizes in order for the precomputed tables to stay small. But it suggests a possibly more compact and/or efficient way to arrive at this information. An 8-bit lookup table looks doable. I'm also curious whether we couldn't calculate this information using bit twiddling: count_zeros() and count_ones() would seem useful for calculating the excess, for instance, combined with rsvector's get_bits(). In section 2.3 it also has:
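As a sanity check on the bit-twiddling idea, a minimal sketch (function names are mine, not the vers API): the total excess of a block falls out of count_ones() alone, while the minimum prefix excess genuinely needs a scan or a precomputed table.

```rust
/// Total excess of the lowest `width` bits of `bits`, with a 1-bit as an
/// opening parenthesis (+1) and a 0-bit as a closing one (-1).
/// excess = ones - zeros = 2 * ones - width, so count_ones() suffices.
fn block_excess(bits: u64, width: u32) -> i64 {
    let masked = if width == 64 { bits } else { bits & ((1u64 << width) - 1) };
    2 * i64::from(masked.count_ones()) - i64::from(width)
}

/// Minimum prefix excess of the block; this one needs a scan
/// (or the lookup table discussed above).
fn block_min_excess(bits: u64, width: u32) -> i64 {
    let (mut excess, mut min) = (0i64, i64::MAX);
    for i in 0..width {
        excess += if (bits >> i) & 1 == 1 { 1 } else { -1 };
        min = min.min(excess);
    }
    min
}

fn main() {
    // Block "(()(", stored LSB-first as 0b1011: excess runs 1, 2, 1, 2.
    assert_eq!(block_excess(0b1011, 4), 2);
    assert_eq!(block_min_excess(0b1011, 4), 1);
}
```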
In section 4.1 it has:
I see. In section 3 it then goes into an O(lg lg n) time solution. It immediately says in section 4 that:
My understanding of section 3 is that multiple mM-trees are maintained, one per bucket. The block size b is different again; now it's
which by my calculations comes down to very small block sizes compared to 512, like 36 bits for a 1-million-node tree. Then in 3.1 it describes how to do inter-bucket forward and backward searching, the details of which I haven't digested yet. In 4.1 it says the bucket size is 4 kilobytes, which allows 16-bit integers to store the excess in the nodes of each tree. v.e (total excess) is not stored; instead, v.m and v.M (min and max excess) are stored in absolute form, relative to their current bucket. So this would allow the tree to shrink a bit. Ah, more information about b: 512 and 1024 are practical, and finally another hint about lookup table size.
There are also interesting details on how to implement section 3.1. The measurement sections demonstrate that the bucket approach is usually faster than the simple block-based approach, but that also depends a lot on the specifics of the implementation, I think. What do you think? The lowest-hanging fruit, I think, is in the lookup tables or other methods to determine the excess, maximum excess, and minimum excess for a block.
Not sure what they mean by that; it doesn't make a lot of sense to make blocks this small in practice (unless you're building a k-ary tree without optimization of the excess search in leaf nodes, I suppose?).
The current implementation allows the user to choose that value anyway, so I can postpone this decision until I can benchmark.
I need to think about that idea, because currently it isn't worth it to store values in the tree with fewer than 64 bits: the first two nodes of the min-max tree span half the tree, and so in theory could have an excess of 2^63 and -2^63 respectively. If I use a multilevel approach, I may be able to save bits in lower blocks, where the block spans fewer bits in the bitvector and thus the excess must be smaller.
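The bit-width arithmetic behind that can be sketched as follows (a hypothetical helper, not vers code): a node whose subtree spans `span` bits of the parenthesis vector has |excess| <= span, so the two's-complement width needed shrinks with the span.

```rust
/// Two's-complement bits needed to store a signed excess for a node whose
/// subtree spans `span` >= 1 bits of the parenthesis vector (|excess| <= span).
/// We need 2^(b-1) > span, i.e. b = floor(log2(span)) + 2.
fn excess_bits(span: u64) -> u32 {
    span.ilog2() + 2
}

fn main() {
    assert_eq!(excess_bits(1), 2);    // values in {-1, 0, 1}
    assert_eq!(excess_bits(511), 10); // 2^9 = 512 > 511
    assert_eq!(excess_bits(512), 11); // a 512-bit leaf block
    // A top-level node spanning 2^63 bits would need 65 bits, which is why
    // a flat layout ends up storing full 64-bit values everywhere.
}
```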
The missing primitives are what I mentioned above with "There are still a bunch of operations that I need to implement". It specifically needs the rmq values in the min-max tree, and I have to see whether it's worth storing them, and how.
That is true; however, support for this is planned for a future 2.0 release for other reasons anyway. But I am not actively working on that right now.
I have an optimization planned that sounds a bit like this, so maybe this resolves itself once I get to that optimization. When I tried to implement DFUDS, I already implemented that optimization, but it got me into a lot of trouble with off-by-one errors, so I skipped it for now and plan to add it once I have a working implementation.
If that isn't a misunderstanding, I think this gets into theoretical territory and is probably not worth it. Ideally the benchmarks find a fixed value for the block size that is a good tradeoff for all tree sizes, or we leave it to the user to decide, with some good defaults or heuristics as a fallback. I don't think something this involved is worth implementing: the bitvector has a block size of 512 for its rank/select structures because in practice that is far more worthwhile than adjusting it by the vector length (even though the vector is technically no longer succinct with a fixed block size).
Yes, that is also already planned to be the first optimization that I will implement.
As to implementing additional operations, it depends on whether they are of much practical use. I can see practical use for […]. The note in "Succinct Trees in Practice" concerning rmq is interesting:
It could be that simpler ways to implement rmq exist now, but it may still not be worth it. I found "Practical Range Minimum Queries Revisited" and "Faster range minimum queries", but I haven't gone into this stuff.
Yes, in the implementation section it immediately goes back to using 512 and 1024 anyway! So while the bucket approach may potentially be worthwhile, we can ignore the theoretical block sizes, which are really all over the place.
A terminology note: what the paper calls a "superblock" is not entirely clear to me; I think they're the same as a block in practice. What "Simple and Efficient Fully-Functional Succinct Trees" calls a bucket is something else, though: trees are stored in separate buckets, which then allows the representation of the min-max tree to be smaller. They skip the excess altogether and store an absolute min and max in an i16. If you're interested in cache locality, this may be worthwhile then?
The latter approach seems viable; the superblocks are just the higher tree levels, yes.
Plan of action: currently, I am quite sure I can support the standard navigation primitives (except […]). So here is what I am planning to do: I will begin benchmarking the current implementation, especially normal navigation queries. I don't think I will focus on the level-order operations, since using BP and then traversing the tree in level order sounds dumb; just use LOUDS at that point. So I am fine if level-order traversal isn't a focus for optimization. I will then implement the lookup-table optimization, because that should be relatively independent of any other optimizations and provide me with an idea of how big the block size can be. Next, I will try to implement the bucket approach of the "Simple and Efficient Fully-Functional Succinct Trees" paper. They explain how they chose bucket sizes, as you mentioned earlier, but then later fix the bucket size to 2^15 in practice, which is obviously the best choice because it lets you fix the variable sizes in the mM-tree to 16 bits, so I will be doing that as well. After I have done that, I will see how I can support […].
On level order: it may make sense to support this in a general DOM library; I had a need for it and added it to my Xot XML library, for instance, even though its main axes are preorder. But it's definitely not a performance concern, and I don't use it for XPath, so that's fine. On getting access to the nth child of a node and getting the total number of children of a node: sounds good to defer this to later. I need to change my XPath bytecode engine, where I go through tree axes, to use iterators for other reasons too. There is a use in XPath for getting a count of descendants or children quickly, but since filters are in play anyway (queries like "all nodes with this node name"), the tree-level support is of limited utility, I think. All this is a long way of saying: your plan sounds great! Focusing on optimizing what you have over additional features sounds like the right way forward. PS: fm-index is now released using vers, so I will retire my fork. I made progress in my thinking about how to best use it, too. But I am only going to be active in a limited way over the holidays, and will pick it all up again in January. Happy to chat here, though!
I am reading through the commits, enjoying seeing the progress made. One question I have is about the benchmarks: you mention benchmarking all the important operations, and I see parent, previous sibling, and next sibling, but to my surprise I saw last child rather than first child. I am used to implementing forward axes (children, descendants, following) using first child and next sibling, and backward axes (ancestors, preceding) using parent and previous sibling. Maybe I read it wrong, but it made me curious: why is last child important rather than first child?
First child is trivial, because in BP the nodes are ordered depth-first, so the first child of node v is simply the node opened at position v + 1.
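A minimal illustration of why first child is free while sibling-style operations need the matching-parenthesis search (helper names are illustrative, not the vers API; the linear scan stands in for what the mM-tree accelerates):

```rust
/// Toy BP sequence stored as a Vec<bool> (true = '('), nodes in depth-first order.
struct Bp(Vec<bool>);

impl Bp {
    /// Depth-first layout: the first child of the node opened at `v`,
    /// if it exists, is opened at `v + 1`.
    fn first_child(&self, v: usize) -> Option<usize> {
        (self.0.get(v + 1) == Some(&true)).then_some(v + 1)
    }

    /// Position of the ')' matching the '(' at `v` (naive linear scan).
    fn find_close(&self, v: usize) -> usize {
        let mut excess = 0i64;
        for i in v..self.0.len() {
            excess += if self.0[i] { 1 } else { -1 };
            if excess == 0 {
                return i;
            }
        }
        unreachable!("unbalanced parenthesis sequence")
    }

    /// Next sibling: the node opened right after our matching ')'.
    fn next_sibling(&self, v: usize) -> Option<usize> {
        let after = self.find_close(v) + 1;
        (self.0.get(after) == Some(&true)).then_some(after)
    }
}

fn main() {
    // Root with two children; the first child has one child of its own:
    // "((())())" -> 1 1 1 0 0 1 0 0
    let bp = Bp(vec![true, true, true, false, false, true, false, false]);
    assert_eq!(bp.first_child(0), Some(1)); // O(1), no search needed
    assert_eq!(bp.find_close(0), 7); // needs the excess search
    assert_eq!(bp.next_sibling(1), Some(5));
}
```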
I implemented lookup tables of 8 bits, which yields a 20-40% speedup depending on tree size and operation. Larger trees get less of a speedup because the query runtime depends less on the scanning of blocks in the vector, and more on the search in the mM-tree.
Well, so, I have implemented an 8-bit lookup scheme that can look up all possible excess queries in an 8-bit block, as well as a 16-bit scheme that can only look up whether a 16-bit block contains the query result, but then has to search the block. The 8-bit scheme requires 4 KiB for the table; the 16-bit scheme requires 128 KiB. The 16-bit lookup is faster, and may allow for larger block sizes, which will reduce the size of the mM-trees; however, this reduction doesn't seem to affect runtime itself, so this is a consideration of memory footprint. I have gated the 16-bit scheme behind a crate feature and intend to keep it this way, to reduce binary size if the feature isn't required. At the moment there is also code to support the 16-bit scheme for 8-bit blocks, but that is slower than the current 8-bit scheme and can't be activated with crate features; the code just exists (it's just a few constants that are currently disabled by the 16-bit crate feature).
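For readers following along, a sketch of what such an 8-bit table could look like (illustrative only; vers's actual tables differ): for each of the 256 possible blocks, precompute the total and minimum prefix excess, so that a per-block scan collapses into table reads.

```rust
const WIDTH: u32 = 8;

/// Build per-block tables: total excess and minimum prefix excess for every
/// possible 8-bit block (1-bit = '(' = +1, 0-bit = ')' = -1, LSB first).
fn build_tables() -> ([i8; 256], [i8; 256]) {
    let mut total = [0i8; 256];
    let mut min = [0i8; 256];
    for block in 0..256usize {
        let (mut e, mut m) = (0i8, i8::MAX);
        for i in 0..WIDTH {
            e += if (block >> i) & 1 == 1 { 1 } else { -1 };
            m = m.min(e);
        }
        total[block] = e;
        min[block] = m;
    }
    (total, min)
}

fn main() {
    let (total, min) = build_tables();
    assert_eq!(total[0b1111_1111], 8); // all opens
    assert_eq!(min[0b1111_1111], 1);
    assert_eq!(total[0b0000_0000], -8); // all closes
    assert_eq!(min[0b0000_0000], -8);
    // These two i8 tables take only 512 bytes; storing answers for every
    // query offset per block gets closer to the 4 KiB figure mentioned above.
}
```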
Cool! I saw various commits in this direction. I am going to sit down and try to build the XML tree library with it next week; I have all the pieces to do so now, I think. The XPath accelerations which brought me here are going to be more involved and require a significant rework of the XPath engine, but using the new tree implementation for read-only requirements should be a smaller step. I am really curious about the implications of that: compared to the non-succinct approach, where each node has a previous sibling, next sibling, parent, and first child reference (plus a hashmap for preorder info), some core operations will be slower. But more will fit in CPU cache, plus I can get preorder without additional storage. The real benefits have to do with the additional indexes I can provide on node name and text, but as I said, using those is more involved.
I will work on a bucket approach for the MMTs next, maybe optimize the MMT struct size, and then look into rmq/mincount.
I recently became interested in succinct trees, in particular the balanced parentheses (BP) version, with an eye on fully-functional succinct trees. I've been staring at various ways to implement fwd_search, bwd_search, and rmq, which appear fundamental to supporting various operations efficiently. I am daunted, as I cannot find Rust implementations of these.
I saw an old reddit discussion where you mentioned wanting to look into succinct trees. This is basically just an inquiry about where your thinking about this is now, in the hopes that there's been secret progress on this. :) Thank you!
A brief literature reference: