-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow new allocation of atoms by overriding "previously referenced but now unreferenced" atoms #205
Comments
Narsese_AtomicTermIndex can only support a max # (ex: ~2^16 for the unsigned short) of unique "atoms" in the runtime of a NAR. you can argue that this is unlikely to be exhausted for whatever use case you imagine, but: it is an incomplete imitation of AIKR. actual AIKR implementations do exist, ex: in Java and likewise entirely possible with C/C++. the use of these interned int's in ONA's "Compound" Term seems limited to 1-level deep. will include a fast and efficient alternative solution here soon. |
basically: a "radix trie bag" which is a radix trie with priority accumulators for each leaf link in a node. these would get added and subtracted when leaves in the radix trie are added or removed. the priority table in a radix trie node enables roulette selection, like sampling a markov chain. then just implement a serializer for any Term, Concept, or Task instance. there may be an existing C/C++ concurrent radix trie implementation to support safe multithreaded access. |
I agree! I didn't bother to resolve this yet since, as you said, 65K atoms are a lot so doesn't appear when the input alphabet of atoms is fixed and within that number. But I want to fix it in the long-term, but with a simple solution if possible and without performance impact. |
how were negated subterms supposed to work if compounds can only be 1-level deep? 💀 probably an immutable Term*[] array of pointers to subterms, like in Java, is inevitable. none of that 128 pre-allocated unsigned-short stuff. just an array of exact size as the number of subterms. interning terms (not just Atoms) is good for memory and can accelerate equality tests. i use a 'lossy' bag-based best-effort term interning strategy in narchy which offers > 95% hitrate while still being fully AIKR by size limit and can run indefinitely, even as new terms are introduced. for whatever reason this project avoids use of C++, there are still ways to develop OOP-like API's with C structs. this is where methods for Term classes would be helpful in accessing Compound subterms. |
Compound terms only 1 level deep? No idea where you got that from. |
So if this is supposed to be a Compound, and atoms[] are the subterms, how can a Compound have a Compound subterm? I double checked to see if atoms actually meant atoms here and yeah it is that unsigned short repr. |
how can a Compound have a Compound subterm? By having the tree structure of the compound being encoded in the array. Of course no example would work without ability to have nested compounds. Even the most simple behavior representation, |
i knew there must be some encoding that wasn't immediately obvious from the code. maybe add some comments to the Term structure explaining this 'atoms' array includes (non-atom) operators, and maybe rename the field to 'subs'. as for efficiency comparison, i don't know if you have actually measured it but given proper deduplication i think there's a chance for a pointer representation to use less memory than these 256-byte+ compounds. cache line size is typically 64 bytes, so that's already 4 cache lines in an attempt to remain cache efficient. instead the equivalent pointer graph representation could use less memory and thus less cache lines, and though the access order is not as sequential, it may cache miss less... a real performance comparison of these two representations would be interesting to see and it would be nice if ONA somehow supported both. |
No description provided.
The text was updated successfully, but these errors were encountered: