feat(structured data): query DB v0 #2499

Merged
merged 22 commits into from
Nov 16, 2023

Conversation

fontanierh
Contributor

@fontanierh fontanierh commented Nov 13, 2023

#2211

Not particularly optimized: no caching, etc.

@fontanierh fontanierh changed the base branch from main to feat/get-db-schema November 13, 2023 11:21
@fontanierh fontanierh changed the title WIP Feat/query db feat(structured data): query DB v0 Nov 13, 2023
Contributor

@spolu spolu left a comment

Made a first pass, will need to review more once addressed 👍

Base automatically changed from feat/get-db-schema to main November 13, 2023 14:55
@fontanierh fontanierh requested a review from spolu November 13, 2023 16:47
spolu
spolu previously approved these changes Nov 13, 2023
Contributor

@spolu spolu left a comment

LGTM

Not a big fan of the ok_or_else; I generally go with match cases for that. It's a bit more pedantic, but I find it slightly easier to read. No action required, though: this is valid Rust 👍

@fontanierh
Contributor Author

Not a big fan of the ok_or_else; I generally go with match cases for that. It's a bit more pedantic, but I find it slightly easier to read

Ah, I was starting to enjoy them!
I can remove them.

@spolu
Contributor

spolu commented Nov 13, 2023

Really feel free to keep them! 👍
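
For readers comparing the two styles discussed above, here is a minimal sketch of the same lookup written both ways. The function names and the `HashMap`-based table registry are hypothetical and are not taken from this PR; only `Option::ok_or_else` and `match` are standard Rust.

```rust
use std::collections::HashMap;

// Hypothetical lookup, for illustration only: maps a table name to its columns.
// Style 1: convert the Option into a Result with `ok_or_else`.
fn find_table_with_ok_or_else<'a>(
    tables: &'a HashMap<String, Vec<String>>,
    name: &str,
) -> Result<&'a Vec<String>, String> {
    tables
        .get(name)
        // The closure only runs (and only allocates the message) on `None`.
        .ok_or_else(|| format!("table `{}` not found", name))
}

// Style 2: the same logic spelled out with an explicit `match`.
fn find_table_with_match<'a>(
    tables: &'a HashMap<String, Vec<String>>,
    name: &str,
) -> Result<&'a Vec<String>, String> {
    match tables.get(name) {
        Some(columns) => Ok(columns),
        None => Err(format!("table `{}` not found", name)),
    }
}

fn main() {
    let mut tables = HashMap::new();
    tables.insert("users".to_string(), vec!["id".to_string(), "name".to_string()]);

    // Both styles return the same value for a hit and for a miss.
    assert_eq!(
        find_table_with_ok_or_else(&tables, "users"),
        find_table_with_match(&tables, "users")
    );
    assert_eq!(
        find_table_with_ok_or_else(&tables, "orders"),
        find_table_with_match(&tables, "orders")
    );
}
```

The two functions behave identically, so the choice is purely one of readability, which matches the "no action required" conclusion of the exchange.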

@fontanierh fontanierh force-pushed the feat/query-db branch 2 times, most recently from e6f6646 to 2a97361 Compare November 14, 2023 16:43
Contributor

@spolu spolu left a comment

Looks overall good. Happy to make a pass on it as well once ready. I think we can optimize a bunch of things a bit more to align with the experiment we did?

@fontanierh fontanierh requested a review from spolu November 15, 2023 17:42
Contributor

@spolu spolu left a comment

Starting to look good. I left some comments but overall this looks great.

I'm quite bearish on the untyped jump through Value on your way to SQL. I would much rather have the kind of types we had in the experiment. You could then probably handle the Boolean <-> 1/0 conversion in the ToSql / FromSql implementation of that type, which would be much cleaner?

@fontanierh
Contributor Author

I'm quite bearish on the untyped jump through Value on your way to SQL. I would much rather have the kind of types we had in the experiment. You could then probably handle the Boolean <-> 1/0 conversion in the ToSql / FromSql implementation of that type, which would be much cleaner?

What we had in the experiment is just as typed as a Value, AFAICT? I'm not sure I understand what would be cleaner about having an extra enum that is basically just a custom version of Value.

@spolu
Contributor

spolu commented Nov 16, 2023

Not really, this would be much smaller, right? The idea would be to go from JSON (in the DB) to that much more restricted type as soon as you get out of the DB, and have typed interactions everywhere, instead of Value, which from a typing perspective could be nested, etc.

@fontanierh
Contributor Author

OK, so you are suggesting we add an intermediary type that is a kind of "narrowed" Value and that implements ToSql? I'm not necessarily against it, but IMO we don't gain much from it: even if it can't be nested and can't hold a type that we don't support in databases, it can still be of the wrong type for a given query, so there is no type safety per se.

@fontanierh
Contributor Author

Having a custom type will let us handle the true/false <-> 1/0 conversion in ToSql / FromSql, which is exactly where you would expect the translation

@spolu I will go with the narrowed enum. I feel this is a much stronger argument than the type-safety one.
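
A rough sketch of the "narrowed enum" idea, under stated assumptions: every name below (`CellValue`, `Stored`, `ColumnType`, `to_stored`, `from_stored`) is hypothetical, and the conversion is written as plain functions so the example depends only on the standard library. In the PR itself this logic would presumably sit in the `ToSql` / `FromSql` trait implementations mentioned in the thread.

```rust
// Hypothetical narrowed value: flat and restricted to supported types,
// unlike a generic JSON value, which can nest arbitrarily.
#[derive(Debug, Clone, PartialEq)]
enum CellValue {
    Null,
    Integer(i64),
    Real(f64),
    Text(String),
    Bool(bool),
}

// Stand-in for what the SQL layer stores. There is no boolean variant:
// booleans are persisted as the integers 1 and 0.
#[derive(Debug, Clone, PartialEq)]
enum Stored {
    Null,
    Integer(i64),
    Real(f64),
    Text(String),
}

// Declared column type, needed to tell a stored 0/1 apart from a real integer.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ColumnType {
    Integer,
    Real,
    Text,
    Bool,
}

impl CellValue {
    // Outbound direction (the role ToSql would play): true/false -> 1/0.
    fn to_stored(&self) -> Stored {
        match self {
            CellValue::Null => Stored::Null,
            CellValue::Integer(n) => Stored::Integer(*n),
            CellValue::Real(x) => Stored::Real(*x),
            CellValue::Text(s) => Stored::Text(s.clone()),
            CellValue::Bool(b) => Stored::Integer(i64::from(*b)),
        }
    }

    // Inbound direction (the role FromSql would play): 1/0 -> true/false,
    // but only for columns declared as Bool.
    fn from_stored(stored: Stored, column_type: ColumnType) -> Result<CellValue, String> {
        match (stored, column_type) {
            (Stored::Null, _) => Ok(CellValue::Null),
            (Stored::Integer(0), ColumnType::Bool) => Ok(CellValue::Bool(false)),
            (Stored::Integer(1), ColumnType::Bool) => Ok(CellValue::Bool(true)),
            (Stored::Integer(n), ColumnType::Integer) => Ok(CellValue::Integer(n)),
            (Stored::Real(x), ColumnType::Real) => Ok(CellValue::Real(x)),
            (Stored::Text(s), ColumnType::Text) => Ok(CellValue::Text(s)),
            (other, expected) => Err(format!("cannot read {:?} as {:?}", other, expected)),
        }
    }
}

fn main() {
    let stored = CellValue::Bool(true).to_stored();
    assert_eq!(stored, Stored::Integer(1));
    let back = CellValue::from_stored(stored, ColumnType::Bool);
    assert_eq!(back, Ok(CellValue::Bool(true)));
}
```

As noted in the thread, this does not make a query type-safe (a value can still have the wrong type for a given column), but it does keep the boolean translation in one place at the storage boundary.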

@fontanierh fontanierh requested a review from spolu November 16, 2023 11:41
Contributor

@spolu spolu left a comment

Looks great.

One final thing that I think is important to figure out now rather than later.

core/src/databases/database.rs (outdated)
.map(|r| table_schema.schema.get_insert_params(&field_names, r))
.collect::<Result<Vec<_>>>()?
.iter()
.map(|values| match stmt.execute(params_from_iter(values)) {
Contributor

I think you want to run that at the end, outside of the par_iter. Requires benchmarking.

I suspect this inherently prevents parallelization because the pool will get filled with rows waiting for execution on the lock that protects that stmt.execute.

I would instead create all params in parallel and then run the stmt.execute sequentially, which is locked anyway.

Can we benchmark the two approaches to convince ourselves which one is best?
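
The two-phase shape suggested here can be sketched as follows. This is an illustration under assumptions, not the PR's code: `std::thread::scope` is swapped in for the `par_iter` used in the PR so the example needs no external crates, and `build_params`, `FakeStatement`, and `insert_rows` are hypothetical stand-ins for the schema's parameter builder and the prepared statement.

```rust
use std::thread;

// Hypothetical stand-in for building one row's insert parameters (the parallel part).
fn build_params(row: &(i64, String)) -> Result<Vec<String>, String> {
    if row.1.is_empty() {
        return Err(format!("row {} has an empty name", row.0));
    }
    Ok(vec![row.0.to_string(), row.1.clone()])
}

// Hypothetical stand-in for a prepared statement that must stay on one thread.
struct FakeStatement {
    executed: Vec<Vec<String>>,
}

impl FakeStatement {
    fn execute(&mut self, params: Vec<String>) -> Result<usize, String> {
        self.executed.push(params);
        Ok(1)
    }
}

fn insert_rows(
    rows: &[(i64, String)],
    stmt: &mut FakeStatement,
    workers: usize,
) -> Result<usize, String> {
    let workers = workers.max(1);
    let chunk_size = ((rows.len() + workers - 1) / workers).max(1);

    // Phase 1: build all params in parallel, one scoped thread per chunk.
    // No thread ever touches `stmt`, so nothing waits on it here.
    let built: Vec<Result<Vec<Vec<String>>, String>> = thread::scope(|s| {
        let handles: Vec<_> = rows
            .chunks(chunk_size)
            .map(|chunk| {
                s.spawn(move || chunk.iter().map(build_params).collect::<Result<Vec<_>, _>>())
            })
            .collect();
        handles
            .into_iter()
            .map(|handle| handle.join().expect("worker thread panicked"))
            .collect()
    });

    // Phase 2: execute sequentially on the calling thread, in row order.
    let mut inserted = 0;
    for chunk in built {
        for params in chunk? {
            inserted += stmt.execute(params)?;
        }
    }
    Ok(inserted)
}

fn main() {
    let rows = vec![(1i64, "alice".to_string()), (2i64, "bob".to_string())];
    let mut stmt = FakeStatement { executed: Vec::new() };
    let inserted = insert_rows(&rows, &mut stmt, 2).expect("insert failed");
    println!("inserted {} rows", inserted);
}
```

Whether this beats executing inside the parallel iterator is exactly what the requested benchmark would have to show; the sketch only makes the structure of the alternative concrete.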

Contributor

This is worth the time since this code is likely to be final

Contributor Author

Already tried that: params_from_iter complains when it is in the par_iter. I can try again, but I think it's not possible because it needs to own the value (so we'd have to copy, which I believe is worse?).

Creating and executing the stmt cannot be done in the par_iter because conn cannot be sent across threads.

I am happy to benchmark, but I don't see how copying every single row's data can be faster than calling params_from_iter sequentially?

Contributor

No copying involved

Contributor Author

To clarify:
[Screenshot 2023-11-16 at 13 08 21]

Contributor Author

so I'd have to clone() the params (sequentially).

Contributor Author

It works with into_iter instead of iter 🤔

Contributor Author

Is the former copying the data? Or merely saying that, from this point on, only the new iterator can reference it?

Contributor Author

Ok that actually makes sense
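
The `iter` versus `into_iter` question above can be illustrated with a small standalone sketch. The names `execute_owned` and `run_all` are hypothetical, and the example is independent of the database library used in the PR; it only shows the general Rust behavior: `into_iter()` on a `Vec` moves each element out by value, transferring ownership without copying the heap data.

```rust
// Hypothetical stand-in for an API that needs to own each parameter.
fn execute_owned<I: IntoIterator<Item = String>>(params: I) -> Vec<String> {
    params.into_iter().collect()
}

// Consumes `rows`: each inner Vec is moved into `execute_owned`, not cloned.
// Using `rows.iter()` here would yield `&Vec<String>` (items of type `&String`),
// which `execute_owned` rejects unless every value is cloned first.
fn run_all(rows: Vec<Vec<String>>) -> Vec<Vec<String>> {
    rows.into_iter().map(execute_owned).collect()
}

fn main() {
    let rows = vec![vec!["1".to_string(), "alice".to_string()]];

    // Remember where the string's bytes live on the heap before the move.
    let before = rows[0][1].as_ptr();

    let out = run_all(rows);
    // `rows` can no longer be used here: ownership moved into `run_all`.

    // Same heap buffer as before: the value changed owner, the bytes were not copied.
    assert_eq!(out[0][1].as_ptr(), before);
    println!("moved without copying: {}", out[0][1]);
}
```

So the second reading in the question is the right one: after `into_iter()`, only the new iterator (and whoever receives its items) can reach the data, and no per-row copy takes place.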

Contributor

@spolu spolu left a comment

LGTM \o/

Thanks for bearing with my feedback

@fontanierh
Contributor Author

🙌 stoked to merge

@fontanierh fontanierh merged commit 82c3fe2 into main Nov 16, 2023
1 check passed
@fontanierh fontanierh deleted the feat/query-db branch November 16, 2023 12:50
@spolu
Contributor

spolu commented Nov 16, 2023

💯

2 participants