Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New deserialization api #112

Closed
wants to merge 13 commits into from
Closed

Conversation

Gor027
Copy link
Contributor

@Gor027 Gor027 commented May 26, 2023

Pre-review checklist

Fixes #94

This PR Integrates the new deserialization API from the Rust driver which is a WIP.
It is dependent upon the execution profiles' PR #107 and can be merged only after the latter is already done.
The lazy deserialization allows to also implement binder makers for decimal and duration types. The test_cassandra_types tests for these types are also enabled.

  • I have split my patch into logically separate commits.
  • All commit messages clearly explain what they change and why.
  • PR description sums up the changes and reasons why they should be introduced.
  • I have enabled appropriate tests in .github/workflows/build.yml in gtest_filter.
  • I have enabled appropriate tests in .github/workflows/cassandra.yml in gtest_filter.

@Gor027 Gor027 force-pushed the new_deserialization_api branch 3 times, most recently from 89dd2be to c63a70c Compare May 28, 2023 14:50
@Gor027 Gor027 requested review from Lorak-mmk, piodul and wprzytula May 28, 2023 15:13
scylla-rust-wrapper/src/cass_error.rs Outdated Show resolved Hide resolved
scylla-rust-wrapper/src/cass_error.rs Show resolved Hide resolved
scylla-rust-wrapper/src/future.rs Outdated Show resolved Hide resolved
scylla-rust-wrapper/src/query_result.rs Outdated Show resolved Hide resolved
scylla-rust-wrapper/src/query_result.rs Outdated Show resolved Hide resolved
scylla-rust-wrapper/src/query_result.rs Outdated Show resolved Hide resolved
scylla-rust-wrapper/src/query_result.rs Outdated Show resolved Hide resolved
scylla-rust-wrapper/src/session.rs Outdated Show resolved Hide resolved
scylla-rust-wrapper/src/query_result.rs Outdated Show resolved Hide resolved
scylla-rust-wrapper/src/binding.rs Outdated Show resolved Hide resolved
Copy link
Collaborator

@wprzytula wprzytula left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A general idea to make the logic more type-safe (which Rust promotes):
replace groups of Options with custom types that encode the logic inside themselves (their names, variants, etc.). E.g. in case of CassResultIterator, position is None iff row is None. This dependency could be represented as an enum:

pub struct CassResultIterator {
    result: Arc<CassResult>,
    curr_row_info: CassResultIteratorCurrentRowInfo,
}

enum CassResultIteratorCurrentRowInfo {
    NoRows,
    Rows {
        current_row: CassRow,
        position: usize,
    }
}

scylla-rust-wrapper/src/query_result.rs Outdated Show resolved Hide resolved
@Gor027 Gor027 force-pushed the new_deserialization_api branch 3 times, most recently from 2b4e74f to 6d6c117 Compare June 11, 2023 15:07
@Gor027 Gor027 force-pushed the new_deserialization_api branch from 6d6c117 to e842229 Compare June 15, 2023 14:03
@Gor027
Copy link
Contributor Author

Gor027 commented Jun 20, 2023

@wprzytula @Lorak-mmk Ping for review.

Comment on lines +241 to +247
pub fn get_tuple_types(&self) -> &Vec<Arc<CassDataType>> {
match self {
CassDataType::Tuple(type_defs) => type_defs,
_ => panic!("Cannot get tuple out of non-tuple data type"),
}
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scylla-rust-wrapper/src/query_result.rs Outdated Show resolved Hide resolved
Comment on lines +241 to +247
pub fn get_tuple_types(&self) -> &Vec<Arc<CassDataType>> {
match self {
CassDataType::Tuple(type_defs) => type_defs,
_ => panic!("Cannot get tuple out of non-tuple data type"),
}
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it the case that we should use panic='abort' ?

@Gor027
Copy link
Contributor Author

Gor027 commented Jun 22, 2023

A general idea to make the logic more type-safe (which Rust promotes): replace groups of Options with custom types that encode the logic inside themselves (their names, variants, etc.). E.g. in case of CassResultIterator, position is None iff row is None. This dependency could be represented as an enum:

pub struct CassResultIterator {
    result: Arc<CassResult>,
    curr_row_info: CassResultIteratorCurrentRowInfo,
}

enum CassResultIteratorCurrentRowInfo {
    NoRows,
    Rows {
        current_row: CassRow,
        position: usize,
    }
}

I like the idea to express the state of the fields of iterators through enums, but that will make this PR a lot more complicated. Do you mind doing this refactor in another PR? I will open an issue to not forget it.

@wprzytula
Copy link
Collaborator

A general idea to make the logic more type-safe (which Rust promotes): replace groups of Options with custom types that encode the logic inside themselves (their names, variants, etc.). E.g. in case of CassResultIterator, position is None iff row is None. This dependency could be represented as an enum:

pub struct CassResultIterator {
    result: Arc<CassResult>,
    curr_row_info: CassResultIteratorCurrentRowInfo,
}

enum CassResultIteratorCurrentRowInfo {
    NoRows,
    Rows {
        current_row: CassRow,
        position: usize,
    }
}

I like the idea to express the state of the fields of iterators through enums, but that will make this PR a lot more complicated. Do you mind doing this refactor in another PR? I will open an issue to not forget it.

I'd prefer to have the refactor done in this PR. You can put it in a separate commit.

@wprzytula wprzytula mentioned this pull request Jun 22, 2023
5 tasks
@Gor027 Gor027 force-pushed the new_deserialization_api branch 5 times, most recently from b218b20 to 4e9a138 Compare June 25, 2023 22:05
@Gor027
Copy link
Contributor Author

Gor027 commented Jun 25, 2023

A general idea to make the logic more type-safe (which Rust promotes): replace groups of Options with custom types that encode the logic inside themselves (their names, variants, etc.). E.g. in case of CassResultIterator, position is None iff row is None. This dependency could be represented as an enum:

pub struct CassResultIterator {
    result: Arc<CassResult>,
    curr_row_info: CassResultIteratorCurrentRowInfo,
}

enum CassResultIteratorCurrentRowInfo {
    NoRows,
    Rows {
        current_row: CassRow,
        position: usize,
    }
}

I have applied the idea in separate commits. Perhaps this approach is more expressive and makes it easy to understand the iterator's state, although it looks overhead to me.

@Gor027 Gor027 requested a review from wprzytula June 27, 2023 10:38
Copy link
Collaborator

@wprzytula wprzytula left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm very happy to see the enum-based iterator states. This is much more readable than combinations of Options.

scylla-rust-wrapper/src/query_result.rs Show resolved Hide resolved
scylla-rust-wrapper/src/query_result.rs Outdated Show resolved Hide resolved
@Gor027 Gor027 force-pushed the new_deserialization_api branch 2 times, most recently from 9dff452 to 00a8657 Compare June 27, 2023 16:53
position: position + 1,
},
CassIteratorStateInfo::PositionNoValue { .. } => {
panic!("Cannot advance the iterator. Value is empty!")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we even allowed to panic in bindings? I still haven't got any answer from @piodul and @Lorak-mmk, and I believe that letting an unwinding panic cross the C/Rust boundary is awful UB. IIUC, we are only allowed to panic if we call catch_unwind() on any path from bindings' entrypoint to any panic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically, we use aborting panics in bindings, those are not unwinding panics and hence cannot be even caught by catch_unwind if I am not mistaken.

Copy link
Collaborator

@wprzytula wprzytula Jun 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't find any code in our codebase that would turn panics into aborting. Can you?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, cass_types:244 where for a wrong match arm Rust will stop the execution and refuse to continue.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, how can I know that such behaviour occur?
Moreover, aborting on unrecognized pattern is something completely different that aborting on explicit panic. Isn't it?

@Gor027 Gor027 force-pushed the new_deserialization_api branch from 00a8657 to a4e6e61 Compare June 27, 2023 17:20
@Gor027 Gor027 force-pushed the new_deserialization_api branch from a4e6e61 to c6c8c1d Compare June 27, 2023 17:54
Copy link
Collaborator

@wprzytula wprzytula left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the maze of ifs, I'm convinced that a set of unit tests for deserialization is absolutely required.

@Gor027 Gor027 force-pushed the new_deserialization_api branch from c6c8c1d to eb8fc59 Compare June 27, 2023 19:43
Gor027 added 11 commits June 28, 2023 00:52
The new deserialization api in the Rust driver is still a WIP, so the
version for the moment references to my fork. The `scylla-proxy` crate
version also relies on my repo to avoid version conflicts.
Added error conversion for `ParseError` into `CassError`, albeit we will
need to have correct and more proper mappings.
For quick migration, legacy query result type is used.
The new deserialization API allows to deserialize rows and values on
fly, so the query result consuming structures will be more similar to
C++ driver behavior.

`CassResult` stores bare `QueryResult` received from the Rust driver to
support metadata related queries, and `first_row: CassRow` to make it
easier to return the result of a single-row query.

`CassRow` has a reference on `CassResult` to get raw bytes on request
and store them in `CassValue` objects. This will be added in future
commits.

`CassValue` stores the raw bytes and optionally deseriialized column
value. A `cass_value_get_*` call will ensure the deserialization of the
raw bytes. This will also allow to return raw bytes in case one of
`cass_value_get_decimal`, `cass_value_get_duration` or
`cass_value_get_bytes` is called.
As `cass_iterator_next` returns bool, `decode_next_row` ignores all
errors through `unwrap_or_return` macro. In the future, logging those
errors may be useful. Similar to the C++ driver, decoding next row means
to store raw bytes in vector collection of `CassValue` objects.

As query result storing structs have lifetime bounded by the lifetime
of the `CassResult` or `CassIterator`, references are stored with
'static lifetime to avoid parametrizing structs and functions. This
implied 'static lifetime parameter for the `QueryResult::rows` and
`QueryResult::first_row` functions. So, `decode_first_row` function
gets `cass_result` as `*mut CassResult` to create a reference with
'static lifetime.
Updating edition in Cargo.toml to 2021 was necessary so that the `:pat`
fragment specifier can match `A | B` patterns.
Fixed `cass_value_get_bytes` to return bytes for all types.
Make `cass_value_is_collection` independent from value field of
`CassValue` as the field is redundant after the addition of lazy
deserialization.
The same is done for `cass_value_item_count` and all collection
iterators which are dependent on that field.
Prepared collection iterator structs to add lazy deserialization of
collection values.
Introduced `RawValue` struct so that collection iterators' `next` instead
of deserialization will return raw bytes.
Added `column_type` field to `CassValue` to create sequence, map, and
udt iterators without converting `CassDataType` to `ColumnType`.
C++ driver enables to iterate over a map sequentially like lists or sets.
The `CassCollectionIterator` is modified to enum to contain either
`SequenceIterator` or `MapIterator` for supporting this feature.
Added implementation for `cass_value_get_decimal` and
`cass_value_get_duration`.
Fixed CassandraTypesTests.Map for Duration type.
Modify CassResult and CassResultIterator row fields to avoid option and
unnecessary checkings while operating on them.
Refactored result and row iterators to wrap value and position fields in
state info enum.
@Gor027 Gor027 force-pushed the new_deserialization_api branch from eb8fc59 to 4829464 Compare June 29, 2023 15:23
@Gor027
Copy link
Contributor Author

Gor027 commented Jun 29, 2023

Looking at the maze of ifs, I'm convinced that a set of unit tests for deserialization is absolutely required.

I added some relevant tests for collection iterators as Cassandra tests may not cover everything.

Comment on lines +1567 to +1603
unsafe {
let mut bytes_mut = BytesMut::new();
bytes_mut.put_i32(1); // Number of values
let bytes: *const Bytes = &Bytes::from(bytes_mut);
let slice_frame = FrameSlice::new(bytes.as_ref().unwrap());
let column_type: *const ColumnType = &ColumnType::List(Box::new(ColumnType::Text));
let sequence_iterator =
SequenceIterator::deserialize(column_type.as_ref().unwrap(), Some(slice_frame))
.unwrap();
let mut collection_iterator = CassIterator::CassCollectionIterator(
CassCollectionIterator::SequenceIterator(CassSequenceIterator {
sequence_iterator,
count: 1,
state_info: CassIteratorStateInfo::NoValue,
}),
);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code shows the problem we discussed before: as CassSequenceIterator requires 'static lifetime out the held references, the lifetime of the &Bytes reference must by extended to 'static in a hacky way (by casting it to pointer and back again).
The solution is to make the iterators parametrised by a lifetime.

Copy link
Contributor Author

@Gor027 Gor027 Jun 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's create an issue out of it to not forget #119

scylla-rust-wrapper/src/query_result.rs Outdated Show resolved Hide resolved
Comment on lines 1661 to 1683
// Put serialized (string, bool) pair in bytes
bytes_mut.put_i32(text.len() as i32);
bytes_mut.put_slice(text.as_bytes());
bytes_mut.put_i32(1);
bytes_mut.put(true_bytes);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is such by-hand crafting necessary? Couldn't we use our serialisation framework to create deserialisation tests by asserting identity of the types fed and got back?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would require adding scylla-cql in dependencies, I do not want to add it for a couple of serialization purposes.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rust driver already has scylla-cql as dependency, so I believe this shouldn't hurt us. @piodul Can you confirm that? Will Cargo be intelligent enough to compile scylla-cql only once?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, provided that we use the same version of scylla-cql as the driver uses.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then I insist on using scylla-cql for aid here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@piodul Are we sure that the following way of serializing and deserializing should always work for all types:

// Serialize
let mut bytes = Vec::new();
let mut map = HashMap::new()
map.insert("key", "value");
map.serialize(&mut bytes);
let frame_slice = FrameSlice::new(bytes);
// Deserialize
let column_type = ColumnType::Map(Box::new(ColumnType::Text), Box::new(ColumnType::Text));
let (key, value) = MapIterator::deserialize(column_type, frame_slice).unwrap().next().unwrap();

assert_eq!(key, "key");
assert_eq!(value, "value");

This is a pseudocode example so it may not compile.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think it should work. By the way, the impls for the Value trait are in scylla-cql/src/frame/value.rs, looks like they do the same thing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, what was missing is that before the serialized value the buffer contains also the number of bytes of it, but the deserialization API expects a frame slice without the number of bytes in the beginning.

scylla-rust-wrapper/src/query_result.rs Outdated Show resolved Hide resolved
Wrapped value and position fields in `CassIteratorStateInfo` to convey
more information about the collection iterators' state.
@Gor027 Gor027 force-pushed the new_deserialization_api branch 2 times, most recently from 5a88dce to 4be63a8 Compare June 29, 2023 17:09
Added unit tests to test the behavior of collection iterators for thoses
cases that cassandra tests may not cover.
@Gor027 Gor027 force-pushed the new_deserialization_api branch from 4be63a8 to 77f37fd Compare June 30, 2023 10:41
@Lorak-mmk
Copy link
Collaborator

I think we can safely close this.

@Lorak-mmk Lorak-mmk closed this Jan 2, 2024
@Lorak-mmk
Copy link
Collaborator

Oops, wrong PR

@Lorak-mmk Lorak-mmk reopened this Jan 12, 2024
@Lorak-mmk
Copy link
Collaborator

After all I think we can close this - applying new deserialization to cpp-rust-driver will require at least some changes to this PR, so it will have to be done by someone else.

@Lorak-mmk Lorak-mmk closed this Apr 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for cass_value_get_decimal and cass_value_get_bytes
4 participants