Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove header from document #36

Open
inzanez opened this issue Sep 21, 2020 · 31 comments
Open

Remove header from document #36

inzanez opened this issue Sep 21, 2020 · 31 comments

Comments

@inzanez
Copy link
Contributor

inzanez commented Sep 21, 2020

Hi

does it really make sense to have the Header included in the Document<T> structure? Wouldn't it make more sense to let the user of the driver decide whether or not that part of the response is going to be deserialized? Simply by including the header properties (merged or not) in the struct T?

I currently use AQL queries all over the place, and this works quite well as it just serializes/deserializes T. So I get my headers automatically, and I can send them along to the client.

If I were to use the document level driver functions instead, I would have to copy the deserialized header parts over to my T if I needed them. And as far as I can tell, the Arango driver never actually uses the Header structure. So it should be safe to remove the Header fields and just deserialize to T.

@ManevilleF
Copy link
Contributor

You can put the _id , _key and _rev in your T structure and it would work, but it means that you can forget the automatic creation of these values and they become unreliable. The state of your application can be unsafe.

for aragog I use the header to allow the user to wrap its document in a DatabaseRecord<T> base element of the ODM/OGM and this way the user doesn't have to declare and use the vales in a weird way.

@inzanez
Copy link
Contributor Author

inzanez commented Oct 8, 2020

@EpiFouloux I think it's a difference talking from the perspective of aragog as an ODM and the driver itself. Removing the _id, _key and _rev from the default Document in arangors would only mean that users of the driver would need to specify them. So for instance, aragog would specify them (as it already does; currently it just seems to move them over to the own structure by implementation of From). So that wouldn't really make a difference to aragog as far as I'm concerned.

As I personally don't use any ODM, I'm handling with the driver directly. Which means that my structures require the definition of _id, _key and _rev as I use different AQL queries and need to deserialize these values. And now there's an inconsistency between using the AQL query directly (which doesn't deserialize any of the three attributes into a driver provided structure) and the direct document access on collections (which does deserialize the three attributes into the Header defined by the driver).
So I could also declare another From implementation for all my structures, which would work; or one could argue whether or not the driver itself should show this different behavior, as I think from a driver perspective, the values of _id, _key and _rev are not really relevant. Which means that they could be omitted and the left to the user.

Does that make sense?

@ManevilleF
Copy link
Contributor

Yeah, I guess it does and it doesn't change anything for me.
But since these values are mandatory that means that you need to set them as Option<String> in your structure and set them as None when you create a document for the first time ? Or do you need two structs? I think arangors would become weird to use

@inzanez
Copy link
Contributor Author

inzanez commented Oct 8, 2020

@EpiFouloux I would leave that as an exercise to the user of arangors. I think it is not weirder in any way than what you would do using Diesel for instance. I found many examples that actually do use two separate structures for new instances of a type (omitting the ID) and existing instances. Using an Option would be an option too ;-) And not serializing on None.

If you were to use AQL queries (instead of direct document access / insert), you were required to handle this case anyway, as arangors doesn't deserialize to Document<T> but to T.

Just assuming that I have:

struct User {
  first_name: String,
  last_name: String,
}
...
...
let u = db.aql_str("FOR u in users RETURN u");
...

that would not deserialize _id, _key or _rev. So I would need to specify them myself, or be aware of the difference between direct access (collection.document(...) which does deserialize the fields into Header) and aql_str (which doesn't deserialize them) and maybe wrap my own T (User in the example above) in the Document struct provided by arangors. Which is an option too, but that will lead to many type conversions in my application code, as I do not want to work on the Document<T> but on T usually.

So what it comes down to:

  • AQL queries as well as aql_str do not wrap the requested type in Document, so the header values are not deserialized automatically
  • Direct document access does wrap the requested type in Document

That's inconsistent in my opinion. I've been working with AQL queries mostly, and it does the job, feels nice, no conversion required (except the handling of _id, _key and _rev). So my assumption was that direct document access should work the same. But it doesn't. And that feels kind of weird really.

@ManevilleF
Copy link
Contributor

I did not notice that there was this inconstency while I did implement aql_str for my custom query system.
I think you are right, arangors is a ArangoDB driver, not a ODM and maybe 'Document' should not exist.
What does @fMeow think?

@inzanez
Copy link
Contributor Author

inzanez commented Oct 8, 2020

@EpiFouloux Just as a quick update. Assuming I created a struct like this:

#[derive(Debug, Default, Serialize, Deserialize)]
#[serde(rename_all = "camelCase")]
struct User {
    #[serde(rename="_id")]
    id: String,
    #[serde(rename="_key")]
    key: String,
    #[serde(rename="_rev")]
    rev: String,
    first_name: String,
    last_name: String,
    email_address: String,
}

I could use that structure (and I currently do use it in a similar form) with aql_str and AqlQuery in general. But using it with collection.document(_key) results in a deserialization error, as arangors is doing the header part, which leaves my own header fields empty. And making them optional is not an option, as it's a pain to work with and I know that existing instances will have these fields (they actually must). So no need for option there.

I think I can perfectly well live without the restructuring of the direct document access and removal of header. But as I realized the above 'issue', I thought I'd bring it up here, as it might result in a cleaner concept in the driver itself.

@ManevilleF
Copy link
Contributor

ManevilleF commented Oct 8, 2020

When arangors builds the Document::header it takes the values and doesn't leave them in the serde_json::Value ? Are you sure about that?

Edit: Also why not use a wrapper for that? why don't you use a ODM?

@inzanez
Copy link
Contributor Author

inzanez commented Oct 8, 2020

@EpiFouloux I will paste a sample, just to make sure we talk about the same thing:

#[macro_use]
extern crate serde_derive;

use arangors::Document;

#[derive(Debug, Default, Serialize, Deserialize)]
#[serde(rename_all = "camelCase")]
struct User {
    #[serde(rename="_id")]
    id: String,
    #[serde(rename="_key")]
    key: String,
    #[serde(rename="_rev")]
    rev: String,
    first_name: String,
    last_name: String,
    email_address: String,
}

fn main() {
    let conn: arangors::Connection = arangors::Connection::establish_basic_auth("http://localhost:8529", "root", "password").unwrap();

    let db = conn.db("Demo").unwrap();
    let col = db.collection("users").unwrap();
    // I know that user with _key 100000 exists!
    let x: Document<User> = col.document("100000").unwrap();

    println!("{:?}", x);
}

This will lead to:
thread 'main' panicked at 'called Result::unwrap()on anErrvalue: Serde(Error("missing field_id", line: 0, column: 0))', src/main.rs:30:52

So no. They are deserialized into the header.

If I change the struct User to the following:

#[derive(Debug, Default, Serialize, Deserialize)]
#[serde(rename_all = "camelCase")]
struct User {
    first_name: String,
    last_name: String,
    email_address: String,
}

everything works fine.

@ManevilleF
Copy link
Contributor

Yeah that's what I was talking about and indeed it's a bit weird, the Document struct should be removed

@ManevilleF
Copy link
Contributor

Hey @inzanez I can make a pull request for this if you are not already doing one

@arn-the-long-beard
Copy link
Contributor

Hey guys !

Thank you for your messages 😃 . I think it is me who wrote the Header and the Document. I saw your message yesterday but I was so busy.

I wrote the API by using the official documentation and source code in the Python & Typescript version of the driver.

Can you explain me shortly what is wrong with what we have ?

@inzanez
Copy link
Contributor Author

inzanez commented Oct 9, 2020

@arn-the-long-beard Well, I don't really want to talk about 'right' and 'wrong' here, as it's not that simple. I see that also the Golang driver does the same thing we have here. Still, I ask whether or not that's good choice.

Basically we have the following situation:

  1. Using an AqlQuery will not deserialize the document header fields separately, which allows one to specify the header fields in one's own struct and retrieving them that way. This is inconvenient on one hand as one has to define the header parts, but convenient as one receives a result of T which is not wrapped in Document.
  2. Using an AqlQuery deserialzing to Document<T> will allow one not to specify the headers, as they are specified by the document. But one doesn't get T as a return value, instead one get's Document<T>. And one cannot specify the header values as part of one's own struct, because this will lead to deserialization errors.
  3. If one defines the header structure to be part of one's own struct as written 1., the direct access functions on a collection do not work, as the header information is parsed into the wrapping Document structure.

So either one is supposed to always deserialize to Document<T> when using the AQL query functions, or the API delivers inconsistent behavior.

@EpiFouloux No, I haven't started on anything yet.

@inzanez
Copy link
Contributor Author

inzanez commented Oct 9, 2020

Quick addition: Another option would be to implement Deref and DerefMut for Document. This would be convenient as it would allow for things like this:

use std::ops::Deref;

struct Document<T> {
    data: T
}

impl<T> Deref for Document<T> {
    type Target = T;

    fn deref(&self) -> &Self::Target {
        &self.data
    }
}

struct User {
    name: String,
}

impl User {
    fn greet(&self) -> String {
        format!("Hello, I'm {}", self.name)
    }
}


fn main() {
    let x = Document { data: User { name: "Peter Pan".to_string() } };
    println!("{}", x.greet());
}

Without the Deref, one could not work directly on T that is encapsulated in Document. And DerefMut would be required to work with mutable reference. But then again, I'm no expert on the whole Deref topic,...but that way, we would possibly be more compatible with other implementations.

And if this is the way to go, it might make sense to update the documentation to always use the Document structure provided by arangors? This ensures that users of the driver don't create their own Header fields, which will lead to problems.

@arn-the-long-beard
Copy link
Contributor

arn-the-long-beard commented Oct 9, 2020

Thank you for your message 👍

Okay, well my brain is not fresh on arangors since I am working on other stuff now. That is why I do not see the issue right now.

But I remember that @fMeow talked about the possibility to have something similar like this :

Wouldn't it make more sense to let the user of the driver decide whether or not that part of the response is going to be deserialized?

Simply by including the header properties (merged or not) in the struct T?

I did not know how to make this happen that is why it is missing also. Can you explain me this part ?

Anyway, I think you are very welcomed make it happen 👍

PS : you posted a new message . I do not have the knowledge for DerefMut bit I feel your concern regarding consitensy

@inzanez
Copy link
Contributor Author

inzanez commented Oct 9, 2020

@arn-the-long-beard Well, I guess I would wait for @EpiFouloux as he's working on a proposal. But in general, it would be similar to what the current AQL query interface does. It would not deserialize the response to Document<T>, but deserialize to T directly. So if T contains the Header values, it will automatically be deserialized. If it doesn't contain them, they will just be dropped. But either way, you will receive T.

As an example:

struct User {
  name: String,
}
...
let u: User = collection.document("10000");

This would return a User, but as the struct doesn't contain any of the header fields, they will not be deserialized. So if one were to use the header fields, one would need to specify them in the struct.

Regarding Deref/DerefMut: This would allow for automatic dereferencing so that working on a Document<T> instance would feel the same as working on T directly, without dereferencing everytime. As it's done automatically. As far as I understand.

@ManevilleF
Copy link
Contributor

@inzanez That is exactly what my proposal is, I'll make a pull request soon.

@inzanez
Copy link
Contributor Author

inzanez commented Oct 9, 2020

@EpiFouloux Which one? Deref/DerefMut or removing the document struct?

@ManevilleF
Copy link
Contributor

Removing the document struct: https://github.com/EpiFouloux/arangors/commit/f03492bde18d0e6e135c0aa83b8c3a30bb355f3b

Though I don't remove DocumentResponse since it can be silent I do remove its Header. I still have to work on the tests before making a pull request.

I don't know if @inzanez proposition for Deref/DerefMut is ideal since I never worked with this

@arn-the-long-beard
Copy link
Contributor

arn-the-long-beard commented Oct 9, 2020 via email

@fMeow
Copy link
Owner

fMeow commented Oct 23, 2020

Sorry for the long wait.

Personally speaking, I think Document type is a wrapper that differentiate a user data structure and a arango Document.

When creating data structure, I don't want to mingle it with database stuff like _id or _rev, which can made my data structure dirty. And with Docuemnt I don't need to have write the id, key, rev field for every struct. When I do need them, I can just wrap it in Docuemnt<T>.

When come into serialization and deserialization, Document<T> is the same as T. That means you can desrialize into T if you don't care about _id and etc, but when you do want to have a control over _id, _rev or so, you can desrialize it into Document<T>.

Let me give an example:

struct User{
    field1: bool
    //more useful field ...
}
// let's pretend we got a working User
let u =User::default(); 

// You can create a new document with doc and do not need to care about id, key and rev. They are skipped when serializing cause they are empty
let doc =Document::new(u);
// If you don't want to move, you can use a reference to create doc
let doc = Document::new(&u);
// When you need header
let x: Document<User> = col.document("100000").unwrap()
let u:Document<User> = db.aql_str("FOR u in users RETURN u");

// When you don't care about header
let u: User = db.aql_str("FOR u in users RETURN u");

The point is that, NEVER embed _id, _key and _rev in your own data structure, and use 'Document' when you need it.

@inzanez
Copy link
Contributor Author

inzanez commented Oct 23, 2020

@fMeow Ok, that's definitely a valid choice, as I already pointed out above. But I think it would make sense to emphasize that in the doucmentation / Readme as a core concept? So that it's clear that the user is supposed to use the Document wrapper when working with AQL queries, in case the _id, _key and _ref are important to them.

On the other hand, I still see some inconveniences floating around. Assuming you retrieved something as follows:

...
pub fn user_magic(users: Vec<User>) {
  // Whatever happens in here
  ...
}

let users: Vec<Document<User>> = db.aql_str("FOR u in users RETURN u");

// Now what? I do not want to convert that into Vec<User> to be honest. Seems like a waste of resources.
...

Of course, that function could be accepting Document<User>, but somehow I get the feeling that this database implementation detail will leak into pretty much every part of my application code.

@fMeow
Copy link
Owner

fMeow commented Oct 24, 2020

Yes, I should emphasize this in readme/doc since it is a hidden breaking change. So bad I didn't come up with Document type when implementing AQL. Plus, in my memory, an AQL query can return anything. By anything, I mean it does not need necessarily to be a doc. So the AQL query function according use T instead of Document<T>.

It is possible to change the signature of Collection::document from Document<T> to T, but in this way the users are divided into two group, ones include header fields in their own struct, and the other use Document<T> to handle header. I don't think it a good idea.

As for the user function to handle Vec<User>, if we don't need header anymore, just convert it to Vec<User>.

But that's not enough. Some times we do something complicated with the data structure, and finally need to update the doc.

In this case, a not so good solution is to use Vec<Document<User>> , and implement Deref<T> and AsRef<T> for Docuemnt<T> in arangors, which minimize modification of existing codes. But I do agree that using Vec<Document<User>> is a bad idea when we don't need header, since it give an illusion that we are working with an arango document object instead of a plain rust struct.

Another solution is to keep data in Document<T> and use reference Vec<&T>.

let users: Vec<Document<User>> = db.aql_str("FOR u in users RETURN u");
user_magic(users.iter().map(|u| &u.document).collect::<Vec<&T>>());

The conversion can be facilitate with a handy function implemented in Vec<Document<T>> so that we only need to use something like users.as_inner() to get a Vec<&T>. Also an additional mutable version handy method should be provided, maybe as_mut_inner().

To summarize, I am reluctant to change the signature of document retrieval related function from Docuemnt<T> to T, since I would like user not to include _id, _rev and _key in their own struct and use Document instead. But I would make these changes to tackle this issue(#36):

  1. add AsRef<T> and Deref<T> for Document so that we can access field in T easier.
  2. Add function in Vec<Document<T>> to facilitate conversion to Vec<&T> and Vec<T>.
    An Into<Vec<T>> aw well as as_inner and as_mut_inner should be enough.
  3. An update to doc to emphasize Document and Header, and warn the user not to include _id, _key and _rev in their own struct.

Is this solution sound acceptable to you?

@inzanez
Copy link
Contributor Author

inzanez commented Oct 24, 2020

Just as an idea: couldn‘t you just leave the Document<T> structure in arangors, and change serialization of all collection methods to T? That way, every user of arangors could choose to continue as you outlined (using Document provided by arangors) or creating their own structs with _id, _key and _rev.

Or is there a drawback I dont see? I think this would not force any of the two concepts on anyone and give users of arangors the most freedom of choice. And existing code would not need to be adapted very much.

Would that be a viable option too?

@ManevilleF
Copy link
Contributor

One addition to @fMeow proposition would be to stop deleting _id, _rev and _key from the returned json to put them in the header, so people have these in their structure if they want

@inzanez
Copy link
Contributor Author

inzanez commented Oct 26, 2020

@EpiFouloux I think they (_id, _rev and _key) are being consumed as part of the deserialization process through serde. That's why I proposed to leave Document<T> in the arangors crate, but let the user decide about deserialization (don't force Document on the users of the library).

@ManevilleF
Copy link
Contributor

@inzanez From what I see in the source

            let _id = json.remove("_id").ok_or(DeError::missing_field("_id"))?;
            let _key = json.remove("_key").ok_or(DeError::missing_field("_key"))?;
            let _rev = json.remove("_rev").ok_or(DeError::missing_field("_rev"))?;
            let header: Header = Header {
                _id: serde_json::from_value(_id).map_err(DeError::custom)?,
                _key: serde_json::from_value(_key).map_err(DeError::custom)?,
                _rev: serde_json::from_value(_rev).map_err(DeError::custom)?,
            };

It seems they are explicitely deleted from the json

@inzanez
Copy link
Contributor Author

inzanez commented Oct 26, 2020

Oh, that‘s interesting. Well, but we would still have a deserialization to Document. Which would need to be unpacked...

@fMeow
Copy link
Owner

fMeow commented Nov 15, 2020

Now I manually deserialize Document, and it should be fine when you have _id, _rev, _key on user struct when deserialize into Document<T>. Both the header struct doc.header and user field doc.document.key can work.

@inzanez
Copy link
Contributor Author

inzanez commented Nov 15, 2020

But that still forces a user to unpack that Document,...couldn‘t that be optional? Or maybe add a second kind of functions on the collection that allows to use the functionality without using the Document struct?

@fMeow
Copy link
Owner

fMeow commented Nov 15, 2020

I am considering a more OOP liked design for document, like db.update_doc(&doc). In that case a standard way to store _key is a must.

@inzanez
Copy link
Contributor Author

inzanez commented Nov 15, 2020

Could you not define a trait that exposed key, accept impl trait instead of Document struct, and just implement the trait for the internal Document type?
That way, everyone that wished could implement the ‚Doc‘ trait on whatever struct they want and it should work out...? Using something like typetag. Just an idea...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants