-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Embed nesting information in Keys #3
Comments
Not sure this is a good idea.
Marc
… Le 6 sept. 2017 à 13:29, Paolo ***@***.***> a écrit :
Currently, a Key is mainly a typed label without information about the position of the referenced object within a hierarchy of nested objects.
It would be useful to have Keys embed this information, so that, for instance, when writing to register("B") which was retrieved from map_aw("A") it doesn't write to a top-level register B but to its original, nested one.
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub, or mute the thread.
|
My point is: if Antidote exposes an API that allows embedding objects into other objects thus forming a hierarchical data structure, it'd be better to have a way of traversing this hierarchical data structure. public void visit(Key<?> mapkey) {
MapReadResult res = bucket.read(antidote.noTransaction(), mapkey);
for (Key<?> k : res.keySet()) {
if (k.getType().equals(CRDT_type.AWMAP)) {
System.out.println("Map " + k.getKey().toStringUtf8());
visit(k);
} else if (k.getType().equals(CRDT_type.LWWREG)) {
System.out.println("Register " + k.getKey().toStringUtf8());
}
}
} |
Probably this is not the best example, since reading recursively is easier to do on the public void visit(MapKey mapkey) {
MapReadResult res = bucket.read(antidote.noTransaction(), mapkey);
visit2(res);
}
public void visit2(MapReadResult res) {
for (Key<?> k : res.keySet()) {
if (k.getType().equals(CRDT_type.AWMAP)) {
System.out.println("Map " + k.getKey().toStringUtf8());
visit2(res.get((MapKey) k));
} else if (k.getType().equals(CRDT_type.LWWREG)) {
System.out.println("Register " + k.getKey().toStringUtf8());
}
}
} The example we discussed on Slack about updating a nested value makes more sense. I would not change the concept of |
Right. |
I am dubious of this proposal.
If keys are hierachical, this raises a lot of questions.
how deep is the hierarchy? Unlimited?
this has an overhead on every key lookup, is this reasonable?
what is the semantics of deleting a key, does it delete all its descendants?
Assuming we can agree on reasonable answers to these questions today, do we really want to bake such complex semantics into what should be a basic storage system?
Furthermore, this proposal conflicts with Dimitri's recent idea to use (fixed-size) sub-keys for the attributes or metadata of an object.
Marc
… Le 6 sept. 2017 à 17:19, Paolo ***@***.***> a écrit :
My point is: if Antidote exposes an API that allows embedding objects into other objects thus forming a hierarchical data structure, it'd be better to have a way of traversing this hierarchical data structure.
The following code doesn't work because at every recursion it keeps reading the mapkey from the top level.
public void visit(Key<?> mapkey) {
MapReadResult res = bucket.read(antidote.noTransaction(), mapkey);
for (Key<?> k : res.keySet()) {
if (k.getType().equals(CRDT_type.AWMAP)) {
System.out.println("Map " + k.getKey().toStringUtf8());
visit(k);
} else if (k.getType().equals(CRDT_type.LWWREG)) {
System.out.println("Register " + k.getKey().toStringUtf8());
}
}
}
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub <#3 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AGWtn2bai4vMI6RqzoST2Kw04KSVF6GXks5sfrgWgaJpZM4POOtD>.
|
This is not a proposal: the hierarchy is there in the data model, since objects can be nested. So the keys are already "hierarchical".
Currently: yes.
Well, log(n) for nested lookup and constant for shallow lookup. Sounds reasonable to me.
Currently: yes.
It's already there. So, IMO, either we remove this embedding capabilities from the API (except, maybe, for managing metadata), or we implement a suitable way of tracking nested objects.
mh no, we could always reserve special sub-keys for metadata and make it inaccessible to the data API. |
Paolo,
I must be misunderstanding something. AFAIK currently there is a one-to-one mapping from key to object. If a map object contains an embedded object, first you look up the map key, then the map entry; if that object itself contains a key, it's up to the application to iterate. Thus, store-level keys are flat; any hierarchy is at the object/application level.
1) Is my interpretation correct?
2) If so, is this what you want to change?
Le 6 sept. 2017 à 19:18, Paolo ***@***.***> a écrit :
This is not a proposal: the hierarchy is there in the data model, since objects can be nested. So the keys are already "hierarchical".
The proposal is about acknowledging and managing this by changing the API accordingly in the most convenient way.
how deep is the hierarchy? Unlimited?
Currently: yes.
If my interpretation is correct, then no, the current store is flat, there is only one level.
this has an overhead on every key lookup, is this reasonable?
Well, log(n) for nested lookup and constant for shallow lookup. Sounds reasonable to me.
Again, if my interpretation is correct, currently looking up a top-level object does not pay an overhead for the existence of a hierarchy somewhere. If I understand your proposal correctly, on every lookup the system will have to parse a hierarchical key, even if the app has no use for it.
what is the semantics of deleting a key, does it delete all its descendants?
Currently: yes.
I don't get it: currently there is no hierarchy.
do we really want to bake such complex semantics into what should be a basic storage system?
It's already there.
Please explain this to me. Where does it appear in the API or in the lookup algorithm?
So, IMO, either we remove this embedding capabilities from the API (except, maybe, for managing metadata), or we implement a suitable way of tracking nested objects.
this proposal conflicts with Dimitri's recent idea to use (fixed-size) sub-keys for the attributes
mh no, we could always reserve special sub-keys for metadata and make it inaccessible to the data API.
Hmmm.... A hierarchical subkey must be distinguishable from an attribute subkey, but we don't want to introduce any artificial limitations for either kind. I guess we could say that a hierarchy subkey starts with 0, and 1 for attribute... maybe... I just hope we won't regret this kind of decision later...
Marc
|
You are both right.
I like @peterzeller's suggestion of providing paths for navigation in nested maps. |
Yet there are keys that are not accessible from the root level, so I wouldn't say it's a flat keyspace.
+1 |
Le 6 sept. 2017 à 21:27, Annette Bieniusa ***@***.***> a écrit :
You are both right.
We currently have a flat key space.
OK
We have nested CRDTs because of the Map CRDTs where the keys in the Map refer to CRDTs as values.
Is this mandatory? Can't the map contain a CRDT object directly? in which case the word "nested" is correct. Then it's a programmer choice to either use nesting, or to use indirection as you describe.
[I probably wouldn't call them embedded, but rather nested.] In essence, Map CRDTs are recursive data structures.
nesting vs. indirection
IMHO the question boils down to how much CRDT logic we want to have on the server side. Currently, the data store already needs to know about the type to materialise a version. Do we want to add more functionality there to reduce the complexity on programmer side? As an alternative, we could try to hide this complexity in the client library.
I think the current design has a fundamental flaw in that (AFAIU) an object does not identify its type. It's up to the application accessing an object to know its type. If the app passes the wrong type parameter, who knows what happens. This is calling for bugs.
I like @peterzeller <https://github.com/peterzeller>'s suggestion of providing paths for navigation in nested maps.
I guess this is in the map interface, not in the store API, correct?
Marc
|
Does B have its own key? If so, then it's an indirection, and I don't see why you couldn't access B by its key directly if you happen to know it.
If B does not have a key independent of A, then B is embedded in A, and the store knows nothing about it. So this does not contradict the flat keyspace design.
Of course I could be misinterpreting the design since I don't have any personal experience with Antidote. Please correct me if I'm wrong.
Marc
… Le 6 sept. 2017 à 22:07, Paolo ***@***.***> a écrit :
store-level keys are flat; any hierarchy is at the object/application level.
We currently have a flat key space.
Yet there are keys that are not accessible from the root level, so I wouldn't say it's a flat keyspace.
For instance if I create A, and nest B into A, there's no way I can access B without passing by A. And if I delete A, B is removed too.
Take a look at this simple test case https://gist.github.com/pviotti/048db37dbef3fb70e6c01905675735dc <https://gist.github.com/pviotti/048db37dbef3fb70e6c01905675735dc>
But please do correct me if I'm wrong.
Do we want to add more functionality there to reduce the complexity on programmer side?
+1
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub <#3 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AGWtn6haZXKlpN7lNbWpaeSfJIGcyBNLks5sfvtqgaJpZM4POOtD>.
|
On Wed, Sep 6, 2017 at 9:07 PM, Paolo ***@***.***> wrote:
store-level keys are flat; any hierarchy is at the object/application
level.
We currently have a flat key space.
Yet there are keys that are not accessible from the root level, so I
wouldn't say it's a flat keyspace.
For instance if I create A, and nest B into A, there's no way I can
access B without passing by A. And if I delete A, B is removed too.
Take a look at this simple test case https://gist.github.com/pviotti/
048db37dbef3fb70e6c01905675735dc
But please do correct me if I'm wrong.
We have a flat space. A key may point to a map... what goes inside a map
stays inside a map :-)
Still, I agree that it might be interesting to have an optimized way to
read or update an object inside a map. A path seems to solve the problem.
A quick question: what are you using nested objects for? Why do you need to
access an inner object independently?
… Do we want to add more functionality there to reduce the complexity on
programmer side?
+1
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#3 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AJZR8h_kf7aK5wwzK8JUcTIHj-1LBx2-ks5sfvtqgaJpZM4POOtD>
.
|
Am 06.09.2017 um 23:45 schrieb Marc Shapiro ***@***.***>:
> Le 6 sept. 2017 à 21:27, Annette Bieniusa ***@***.***> a écrit :
>
> You are both right.
>
> We currently have a flat key space.
OK
> We have nested CRDTs because of the Map CRDTs where the keys in the Map refer to CRDTs as values.
Is this mandatory? Can't the map contain a CRDT object directly? in which case the word "nested" is correct. Then it's a programmer choice to either use nesting, or to use indirection as you describe.
I have no clue what you mean.
We have nested CRDTs as I described; you can access this object by accessing the top-level parent in the nesting hierarchy and then following the keys down in the hierarchy again.
> [I probably wouldn't call them embedded, but rather nested.] In essence, Map CRDTs are recursive data structures.
nesting vs. indirection
> IMHO the question boils down to how much CRDT logic we want to have on the server side. Currently, the data store already needs to know about the type to materialise a version. Do we want to add more functionality there to reduce the complexity on programmer side? As an alternative, we could try to hide this complexity in the client library.
I think the current design has a fundamental flaw in that (AFAIU) an object does not identify its type. It's up to the application accessing an object to know its type. If the app passes the wrong type parameter, who knows what happens. This is calling for bugs.
If you try to apply the wrong type of update, you get a type error.
The semantics is pretty deterministic.
The client can always make locally some key alias to ensure that it always uses the same type.
We had a lengthy discussion in the beginning of Antidote that we need to have the keys include the type.
Otherwise, two keys with the same name but referring to CRDTs of different types could be inserted at different DCs.
What should we do in such a situation? They cannot be merged; should we keep one, none, both?
Now, this is calling for bugs.
> I like @peterzeller <https://github.com/peterzeller>'s suggestion of providing paths for navigation in nested maps.
>
I guess this is in the map interface, not in the store API, correct?
Both designs are possible. If we have it in the store API, we can probably optimise some operations, e.g. direct access to an embedded CRDT and as such transmitting less data to the client.
I haven’t thought this through yet, whether this is a safe scheme under concurrent modifications, but my gut feeling is that it should work.
|
One thing that might be confusing the issue is that the store itself is like a map but does not have the same interface as a map. We should unify that.
Marc
… Le 6 sept. 2017 à 23:32, preguica ***@***.***> a écrit :
On Wed, Sep 6, 2017 at 9:07 PM, Paolo ***@***.***> wrote:
> store-level keys are flat; any hierarchy is at the object/application
> level.
>
> We currently have a flat key space.
>
> Yet there are keys that are not accessible from the root level, so I
> wouldn't say it's a flat keyspace.
> For instance if I create A, and nest B into A, there's no way I can
> access B without passing by A. And if I delete A, B is removed too.
> Take a look at this simple test case https://gist.github.com/pviotti/
> 048db37dbef3fb70e6c01905675735dc
> But please do correct me if I'm wrong.
>
We have a flat space. A key may point to a map... what goes inside a map
stays inside a map :-)
Still, I agree that it might be interesting to have an optimized way to
read or update an object inside a map. A path seems to solve the problem.
A quick question: what are you using nested objects for? Why do you need to
access an inner object independently?
> Do we want to add more functionality there to reduce the complexity on
> programmer side?
>
> +1
>
> —
> You are receiving this because you are subscribed to this thread.
> Reply to this email directly, view it on GitHub
> <#3 (comment)>,
> or mute the thread
> <https://github.com/notifications/unsubscribe-auth/AJZR8h_kf7aK5wwzK8JUcTIHj-1LBx2-ks5sfvtqgaJpZM4POOtD>
> .
>
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub <#3 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AGWtn-ZLpGhvM24fTXOt_YYq4M4ULNtWks5sfw-LgaJpZM4POOtD>.
|
For a file system. As a first, naive implementation, given the similarities with files and directories, I thought of using registers and maps to build a tree. And it makes some sense, as long as the API reflects and fully supports the nesting in the data model. Hence this issue.
AFAIK, there is no way of accessing a nested key from the top level keyspace. So no, it's not an indirection: it's nesting. But I'd welcome a confirmation of this from @bieniusa or other core developers.
But this is no good: a database should know about, track and manage each and every independent data item. |
On 7. Sep 2017, at 10:33, Paolo ***@***.***> wrote:
what are you using nested objects for?
For a file system <https://github.com/SyncFree/antidote-fs>. As a first, naive implementation, given the similarities with files and directories, I thought of using registers and maps to build a tree. And it makes some sense, as long as the API reflects and fully supports the nesting in the data model. Hence this issue.
Does B have its own key? If so, then it's an indirection, and I don't see why you couldn't access B by its key directly if you happen to know it.
AFAIK, there is no way of accessing a nested key from the top level keyspace. So no, it's not an indirection: it's nesting. But I'd welcome a confirmation of this from @bieniusa <https://github.com/bieniusa> or other core developers.
(Here <https://gist.github.com/pviotti/8af4a49c17061672290afb7a76cbae3e> is a simple nesting and its result on the ops_cache ets table. )
At the moment, you can only access the top-level keys via the API, which gives you the (top-level) CRDT value and then it is up to you to deal with.
If B does not have a key independent of A, then B is embedded in A, and the store knows nothing about it.
But this is no good: a database should know about, track and manage each and every independent data item.
And yes, as I see it, B is an independent data item, so it deserves a proper way of being addressed.
(I may start a union or an embedded keys liberation front movement... 😀
I disagree: If you store into a sql database a binary representing a table, there is no way that you can update this table via SQL.
Further, it is not only a question of what the API supports; it must also be implementable in a performant way on the data store.
I have some experience with paths languages and let’s say: it is tricky, though not impossible, unless we need to deal with aliasing which is probably avoidable by design.
When implementing the SwiftCloudFS, I was implementing in essence an inode data structure. This worked pretty well with non-recursive CRDTs.
|
So there's no indirection. At least we got this clear.
This is a straw man: an opaque binary representing a table embedded in a relational db is not as a non-opaque CRDT nested in Antidote.
Ok, right.
I'm sure there are better designs for a file system, yet this looks like the simplest so I thought of giving it a try. |
On 7. Sep 2017, at 11:17, Paolo ***@***.***> wrote:
At the moment, you can only access the top-level keys via the API, which gives you the (top-level) CRDT value and then it is up to you to deal with.
So there's no indirection. At least we got this clear.
I disagree: If you store into a sql database a binary representing a table, there is no way that you can update this table via SQL.
This is a straw man: an opaque binary representing a table embedded in a relational db is not as a non-opaque CRDT nested in Antidote.
Damn, got me ;)
However, it is necessary to make this very strong distinction between CRDTs and non-CRDT objects.
Further, it is not only a question of what the API supports; it must also be implementable in a performant way on the data store. I have some experience with paths languages and let’s say: it is tricky, though not impossible, unless we need to deal with aliasing which is probably avoidable by design.
Ok, right.
When implementing the SwiftCloudFS, I was implementing in essence an inode data structure. This worked pretty well with non-recursive CRDTs.
I'm sure there are better designs for a file system, yet this looks like the simplest so I thought of giving it a try.
The matter of acknowledging and supporting nesting stands regardless of my file system project.
Well, the idea of Antidote is to have a research data store that can be modified as needed.
You go ahead make a design and if people agree, we can talk about the tradeoffs and if somebody is volunteering to implement, we get it into the codebase.
|
The interface is already similar, but there are some differences:
I agree with this. The cost of an operation should be transparent in the API. |
Paolo,
I suspect you misunderstand the store model. The problem boils down to the fact that we use the term "key" with several meanings. To clarify, hereafter I use key only as primary index into the Antidote store, only.
Antidote is a key-CRDT store. Basically it is a flat memory that stores a single value (a typed CRDT) at a given address (a key -- a bitstring). IN other words it is a map, mapping a key to a CRDT object.
That CRDT may have different types.
(1) A register type can contain anything, to be interpreted by the application, say a sound file or a JPG image. It might also contain something that the application chooses to interpret as a Antidote key. This is what I call indirection. The application looks up a first key, which returns the content of the corresponding register, and the application now interprets its contents as a second key, which it looks up, etc.
(2) That CRDT might also be a container type. Let's say its a set. Every element in the set has its own ID, relative to that set. The ID is also a bitstring of course, but it's not an Antidote key, because it is relative to the set, not to the Antidote store.
Similarly, the container might be a map; similarly to Antidote itself, a map stores a single CRDT object at a given map ID. Again, this ID is relative to the map, so it shouldn't be confused with an Antidote key.
(3) This second-level mapped CRDT itself can be a register, and contain anything, possibly an ID in this map (indirection within the map), an ID in another map (indirection in another map), or an Antidote key (indirection in Antidote). This is the indirection case I mentioned in my previous message.
The upshot is that you can either *embed* a CRDT object in a map entry, or you can embed a register containing an Antidote key, in which case you have indirection.
(4) The standard design of a file system is as a key-inode store. Instead of arbitrary CRDTs a file system stores inodes. Each inode can either be a file, i.e. a register containing an arbitrary application-interpreted bitstring, or a directory, i.e. a map of (string) IDs to (indirect) inode numbers.
(5) The initial design of SwiftFS was similar to the one you describe: files and directories were directly embedded into their parent directories. We quickly realised this was too inflexible, and we moved to the indirection approach discussed in the previous paragraph. Vinh's system also uses the same design (using CRDT maps for directories in order to support concurrent updates); his thesis shows that the "embedded" design is not desirable.
Marc
… Le 7 sept. 2017 à 10:33, Paolo ***@***.***> a écrit :
what are you using nested objects for?
For a file system <https://github.com/SyncFree/antidote-fs>. As a first, naive implementation, given the similarities with files and directories, I thought of using registers and maps to build a tree. And it makes some sense, as long as the API reflects and fully supports the nesting in the data model. Hence this issue.
Does B have its own key? If so, then it's an indirection, and I don't see why you couldn't access B by its key directly if you happen to know it.
AFAIK, there is no way of accessing a nested key from the top level keyspace. So no, it's not an indirection: it's nesting. But I'd welcome a confirmation of this from @bieniusa <https://github.com/bieniusa> or other core developers.
(Here <https://gist.github.com/pviotti/8af4a49c17061672290afb7a76cbae3e> is a simple nesting and its result on the ops_cache ets table. )
If B does not have a key independent of A, then B is embedded in A, and the store knows nothing about it.
But this is no good: a database should know about, track and manage each and every independent data item.
And yes, as I see it, B is an independent data item, so it deserves a proper way of being addressed.
(I may start a union or an embedded keys liberation front movement... 😀)
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub <#3 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AGWtn9q26QAMZfbnhn0KxGHDuxvoJ1t6ks5sf6pSgaJpZM4POOtD>.
|
Sure, ok. Although this thread was about what you call embedding, it was indeed useful to mention and discuss indirection. Thanks for the clarification.
Yes, I'm pretty sure I'm coding myself into a corner, yet trial and error is one's best chance when previous trials and errors are not documented. 😉 So, to recap:
You can close this issue. |
We've converged :-)
Marc
… Le 8 sept. 2017 à 10:12, Paolo ***@***.***> a écrit :
The upshot is that you can either embed a CRDT object in a map entry, or you can embed a register containing an Antidote key, in which case you have indirection.
Sure, ok. Although this thread was about what you call embedding, it was indeed useful to mention and discuss indirection. Thanks for the clarification.
(5) The initial design of SwiftFS was similar to the one you describe: files and directories were directly embedded into their parent directories. We quickly realised this was too inflexible, and we moved to the indirection approach discussed in the previous paragraph. Vinh's system also uses the same design (using CRDT maps for directories in order to support concurrent updates); his thesis shows that the "embedded" design is not desirable.
Yes, I'm pretty sure I'm coding myself into a corner, yet trial and error is one's best chance when previous trials and errors are not documented. 😉
So, to recap:
one can either embed/nest a CRDT into container CRDT or create an indirection by storing the key of another top-level CRDT
indirection is based on arbitrary use of registers by the application. Therefore it does not entail support by the store
embedding is supported, yet there is no support for hierarchical naming and lookup. This goes beyond the initial design of Antidote and entails rather profound changes in both client and server code. Although one may argue that those changes would differentiate Antidote from the competitors, it is not clear which and how many other applications would benefit from this feature.
You can close this issue.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub <#3 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AGWtn7c8fMK-HP3giMKXcZ5BJ-z9iXi1ks5sgPbogaJpZM4POOtD>.
|
Currently, a
Key
is mainly a typed label without information about the position of the referenced object within a hierarchy of nested objects.It would be useful to have
Key
s embed this information, so that, for instance, when writing toregister("B")
which was retrieved frommap_aw("A")
it doesn't write to a top-level registerB
but to its original, nested one.The text was updated successfully, but these errors were encountered: