Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow new allocation of atoms by overriding "previously referenced but now unreferenced" atoms #205

Open
automenta opened this issue Jun 8, 2022 · 8 comments
Milestone

Comments

@automenta
Copy link

No description provided.

@automenta automenta changed the title Term Representation is not ully AIKR Term Representation is not Fully AIKR Jun 8, 2022
@automenta
Copy link
Author

Narsese_AtomicTermIndex can only support a max # (ex: ~2^16 for the unsigned short) of unique "atoms" in the runtime of a NAR. you can argue that this is unlikely to be exhausted for whatever use case you imagine, but: it is an incomplete imitation of AIKR. actual AIKR implementations do exist, ex: in Java and likewise entirely possible with C/C++.

the use of these interned int's in ONA's "Compound" Term seems limited to 1-level deep.

will include a fast and efficient alternative solution here soon.

@automenta
Copy link
Author

basically: a "radix trie bag" which is a radix trie with priority accumulators for each leaf link in a node. these would get added and subtracted when leaves in the radix trie are added or removed. the priority table in a radix trie node enables roulette selection, like sampling a markov chain. then just implement a serializer for any Term, Concept, or Task instance. there may be an existing C/C++ concurrent radix trie implementation to support safe multithreaded access.

@patham9
Copy link
Member

patham9 commented Jun 8, 2022

I agree! I didn't bother to resolve this yet since, as you said, 65K atoms are a lot so doesn't appear when the input alphabet of atoms is fixed and within that number.

But I want to fix it in the long-term, but with a simple solution if possible and without performance impact.
I think the easiest fix is to see if an atom isn't referenced anymore (for instance via ref counter), which allows it to be overridden with a new one. I have an implementation of this in a branch actually but it wasn't totally complete yet.
Before allowing merging I would have needed to extend it to also capture atoms currently referenced by temporal implication links, which took me some time and then I was sidetracked.

@patham9 patham9 changed the title Term Representation is not Fully AIKR Allow new allocation of atoms by overriding "previously referenced but now unreferenced" atoms Jun 8, 2022
@patham9 patham9 added this to the v0.9.2 milestone Jun 8, 2022
@automenta
Copy link
Author

how were negated subterms supposed to work if compounds can only be 1-level deep? 💀

probably an immutable Term*[] array of pointers to subterms, like in Java, is inevitable. none of that 128 pre-allocated unsigned-short stuff. just an array of exact size as the number of subterms.

interning terms (not just Atoms) is good for memory and can accelerate equality tests. i use a 'lossy' bag-based best-effort term interning strategy in narchy which offers > 95% hitrate while still being fully AIKR by size limit and can run indefinitely, even as new terms are introduced.

for whatever reason this project avoids use of C++, there are still ways to develop OOP-like API's with C structs. this is where methods for Term classes would be helpful in accessing Compound subterms.

@patham9
Copy link
Member

patham9 commented Jun 9, 2022

Compound terms only 1 level deep? No idea where you got that from.
Regarding the other: not it's not inevitable, I have an alternative solution in my branch.
Regarding C++: don't get me started on that, please stay with your Java impl if you have an issue with language choice.

@automenta
Copy link
Author

typedef struct
{
    bool hashed;
    HASH_TYPE hash;
    Atom atoms[COMPOUND_TERM_SIZE_MAX];
}Term;

So if this is supposed to be a Compound, and atoms[] are the subterms, how can a Compound have a Compound subterm? I double checked to see if atoms actually meant atoms here and yeah it is that unsigned short repr.

@patham9
Copy link
Member

patham9 commented Jun 11, 2022

how can a Compound have a Compound subterm?

By having the tree structure of the compound being encoded in the array.
This is why term hashing, comparison and unification is 100% cache friendly and an order of magnitude faster in that regard than an implementation which relies on pointer/reference indirections to handle subterms.

Of course no example would work without ability to have nested compounds. Even the most simple behavior representation, <(a &/ ^opname) =/> b> is already a nested compound of a sequence within a temporal implication.
Its array representation is [=/> &/ b a ^opname]

@automenta
Copy link
Author

automenta commented Jun 11, 2022

i knew there must be some encoding that wasn't immediately obvious from the code. maybe add some comments to the Term structure explaining this 'atoms' array includes (non-atom) operators, and maybe rename the field to 'subs'.

as for efficiency comparison, i don't know if you have actually measured it but given proper deduplication i think there's a chance for a pointer representation to use less memory than these 256-byte+ compounds. cache line size is typically 64 bytes, so that's already 4 cache lines in an attempt to remain cache efficient. instead the equivalent pointer graph representation could use less memory and thus less cache lines, and though the access order is not as sequential, it may cache miss less... a real performance comparison of these two representations would be interesting to see and it would be nice if ONA somehow supported both.

@patham9 patham9 modified the milestones: v0.9.2, v0.9.3 Sep 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants