Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Embed nesting information in Keys #3

Open
pviotti opened this issue Sep 6, 2017 · 22 comments
Open

Embed nesting information in Keys #3

pviotti opened this issue Sep 6, 2017 · 22 comments

Comments

@pviotti
Copy link
Member

pviotti commented Sep 6, 2017

Currently, a Key is mainly a typed label without information about the position of the referenced object within a hierarchy of nested objects.
It would be useful to have Keys embed this information, so that, for instance, when writing to register("B") which was retrieved from map_aw("A") it doesn't write to a top-level register B but to its original, nested one.

@marc-shapiro
Copy link

marc-shapiro commented Sep 6, 2017 via email

@pviotti
Copy link
Member Author

pviotti commented Sep 6, 2017

My point is: if Antidote exposes an API that allows embedding objects into other objects thus forming a hierarchical data structure, it'd be better to have a way of traversing this hierarchical data structure.
The following code doesn't work because at every recursion it keeps reading the mapkey from the top level.

public void visit(Key<?> mapkey) {
    MapReadResult res = bucket.read(antidote.noTransaction(), mapkey);
    for (Key<?> k : res.keySet()) {
        if (k.getType().equals(CRDT_type.AWMAP)) {
            System.out.println("Map " + k.getKey().toStringUtf8());
            visit(k);        
        } else if (k.getType().equals(CRDT_type.LWWREG)) {
            System.out.println("Register " + k.getKey().toStringUtf8());
        }
    }
}

@peterzeller
Copy link
Member

peterzeller commented Sep 6, 2017

Probably this is not the best example, since reading recursively is easier to do on the MapReadResult, which already includes all the data. So it is not necessary to contact the database a second time:

    public void visit(MapKey mapkey) {
        MapReadResult res = bucket.read(antidote.noTransaction(), mapkey);
        visit2(res);
    }
    public void visit2(MapReadResult res) {
        for (Key<?> k : res.keySet()) {
            if (k.getType().equals(CRDT_type.AWMAP)) {
                System.out.println("Map " + k.getKey().toStringUtf8());
                visit2(res.get((MapKey) k));
            } else if (k.getType().equals(CRDT_type.LWWREG)) {
                System.out.println("Register " + k.getKey().toStringUtf8());
            }
        }
    }

The example we discussed on Slack about updating a nested value makes more sense.

I would not change the concept of Keys, but maybe think about adding a concept like a Path to a nested entry.

@pviotti
Copy link
Member Author

pviotti commented Sep 6, 2017

Right.
(As a side note, as you were mentioning on Slack, a shallow/one-level read API for maps instead of reading everything would be great - but I guess that requires changes all the way down to the data model)

@marc-shapiro
Copy link

marc-shapiro commented Sep 6, 2017 via email

@pviotti
Copy link
Member Author

pviotti commented Sep 6, 2017

This is not a proposal: the hierarchy is there in the data model, since objects can be nested. So the keys are already "hierarchical".
The proposal is about acknowledging and managing this by changing the API accordingly in the most convenient way.

how deep is the hierarchy? Unlimited?

Currently: yes.

this has an overhead on every key lookup, is this reasonable?

Well, log(n) for nested lookup and constant for shallow lookup. Sounds reasonable to me.

what is the semantics of deleting a key, does it delete all its descendants?

Currently: yes.

do we really want to bake such complex semantics into what should be a basic storage system?

It's already there. So, IMO, either we remove this embedding capabilities from the API (except, maybe, for managing metadata), or we implement a suitable way of tracking nested objects.

this proposal conflicts with Dimitri's recent idea to use (fixed-size) sub-keys for the attributes

mh no, we could always reserve special sub-keys for metadata and make it inaccessible to the data API.

@marc-shapiro
Copy link

marc-shapiro commented Sep 6, 2017 via email

@bieniusa
Copy link

bieniusa commented Sep 6, 2017

You are both right.

  1. We currently have a flat key space.
  2. We have nested CRDTs because of the Map CRDTs where the keys in the Map refer to CRDTs as values. [I probably wouldn't call them embedded, but rather nested.] In essence, Map CRDTs are recursive data structures.
  3. IMHO the question boils down to how much CRDT logic we want to have on the server side. Currently, the data store already needs to know about the type to materialise a version. Do we want to add more functionality there to reduce the complexity on programmer side? As an alternative, we could try to hide this complexity in the client library.

I like @peterzeller's suggestion of providing paths for navigation in nested maps.

@pviotti
Copy link
Member Author

pviotti commented Sep 6, 2017

store-level keys are flat; any hierarchy is at the object/application level.

We currently have a flat key space.

Yet there are keys that are not accessible from the root level, so I wouldn't say it's a flat keyspace.
For instance if I create A, and nest B into A, there's no way I can access B without passing by A. And if I delete A, B is removed too.
Take a look at this simple test case https://gist.github.com/pviotti/048db37dbef3fb70e6c01905675735dc
But please do correct me if I'm wrong.

Do we want to add more functionality there to reduce the complexity on programmer side?

+1

@marc-shapiro
Copy link

marc-shapiro commented Sep 6, 2017 via email

@marc-shapiro
Copy link

marc-shapiro commented Sep 6, 2017 via email

@preguica
Copy link

preguica commented Sep 6, 2017 via email

@bieniusa
Copy link

bieniusa commented Sep 6, 2017 via email

@marc-shapiro
Copy link

marc-shapiro commented Sep 7, 2017 via email

@pviotti
Copy link
Member Author

pviotti commented Sep 7, 2017

what are you using nested objects for?

For a file system. As a first, naive implementation, given the similarities with files and directories, I thought of using registers and maps to build a tree. And it makes some sense, as long as the API reflects and fully supports the nesting in the data model. Hence this issue.

Does B have its own key? If so, then it's an indirection, and I don't see why you couldn't access B by its key directly if you happen to know it.

AFAIK, there is no way of accessing a nested key from the top level keyspace. So no, it's not an indirection: it's nesting. But I'd welcome a confirmation of this from @bieniusa or other core developers.
(Here is a simple nesting and its result on the ops_cache ets table. )

If B does not have a key independent of A, then B is embedded in A, and the store knows nothing about it.

But this is no good: a database should know about, track and manage each and every independent data item.
And yes, as I see it, B is an independent data item, so it deserves a proper way of being addressed.
(I may start a union or an embedded keys liberation front movement... 😀)

@bieniusa
Copy link

bieniusa commented Sep 7, 2017 via email

@pviotti
Copy link
Member Author

pviotti commented Sep 7, 2017

At the moment, you can only access the top-level keys via the API, which gives you the (top-level) CRDT value and then it is up to you to deal with.

So there's no indirection. At least we got this clear.

I disagree: If you store into a sql database a binary representing a table, there is no way that you can update this table via SQL.

This is a straw man: an opaque binary representing a table embedded in a relational db is not as a non-opaque CRDT nested in Antidote.

Further, it is not only a question of what the API supports; it must also be implementable in a performant way on the data store. I have some experience with paths languages and let’s say: it is tricky, though not impossible, unless we need to deal with aliasing which is probably avoidable by design.

Ok, right.

When implementing the SwiftCloudFS, I was implementing in essence an inode data structure. This worked pretty well with non-recursive CRDTs.

I'm sure there are better designs for a file system, yet this looks like the simplest so I thought of giving it a try.
The matter of acknowledging and supporting nesting stands regardless of my file system project.

@bieniusa
Copy link

bieniusa commented Sep 7, 2017 via email

@peterzeller
Copy link
Member

One thing that might be confusing the issue is that the store itself is like a map but does not have the same interface as a map. We should unify that.

The interface is already similar, but there are some differences:

  1. The store is organized into buckets. Maybe the bucket can be seen as a map.
  2. The store/bucket cannot list all keys, remove a key, or check if a key exists. It is also not possible to read a whole bucket at once as a single value.

Further, it is not only a question of what the API supports; it must also be implementable in a performant way on the data store.

I agree with this. The cost of an operation should be transparent in the API.
We could add a method to the Java client to read/update a CRDT nested in a map, but with the current implementation in Antidote the cost would be at least linear in the size of the complete map.

@marc-shapiro
Copy link

marc-shapiro commented Sep 7, 2017 via email

@pviotti
Copy link
Member Author

pviotti commented Sep 8, 2017

The upshot is that you can either embed a CRDT object in a map entry, or you can embed a register containing an Antidote key, in which case you have indirection.

Sure, ok. Although this thread was about what you call embedding, it was indeed useful to mention and discuss indirection. Thanks for the clarification.

(5) The initial design of SwiftFS was similar to the one you describe: files and directories were directly embedded into their parent directories. We quickly realised this was too inflexible, and we moved to the indirection approach discussed in the previous paragraph. Vinh's system also uses the same design (using CRDT maps for directories in order to support concurrent updates); his thesis shows that the "embedded" design is not desirable.

Yes, I'm pretty sure I'm coding myself into a corner, yet trial and error is one's best chance when previous trials and errors are not documented. 😉

So, to recap:

  • one can either embed/nest a CRDT into container CRDT or create an indirection by storing the key of another top-level CRDT
  • indirection is based on arbitrary use of registers by the application. Therefore it does not entail support by the store
  • embedding is supported, yet there is no support for hierarchical naming and lookup. This goes beyond the initial design of Antidote and entails rather profound changes in both client and server code. Although one may argue that those changes would differentiate Antidote from the competitors, it is not clear which and how many other applications would benefit from this feature.

You can close this issue.

@marc-shapiro
Copy link

marc-shapiro commented Sep 8, 2017 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants