-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New deserialization api #112
Conversation
89dd2be
to
c63a70c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A general idea to make the logic more type-safe (which Rust promotes):
replace groups of Option
s with custom types that encode the logic inside themselves (their names, variants, etc.). E.g. in case of CassResultIterator
, position
is None
iff row
is None
. This dependency could be represented as an enum:
pub struct CassResultIterator {
result: Arc<CassResult>,
curr_row_info: CassResultIteratorCurrentRowInfo,
}
enum CassResultIteratorCurrentRowInfo {
NoRows,
Rows {
current_row: CassRow,
position: usize,
}
}
2b4e74f
to
6d6c117
Compare
6d6c117
to
e842229
Compare
@wprzytula @Lorak-mmk Ping for review. |
pub fn get_tuple_types(&self) -> &Vec<Arc<CassDataType>> { | ||
match self { | ||
CassDataType::Tuple(type_defs) => type_defs, | ||
_ => panic!("Cannot get tuple out of non-tuple data type"), | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ping @Lorak-mmk @piodul
pub fn get_tuple_types(&self) -> &Vec<Arc<CassDataType>> { | ||
match self { | ||
CassDataType::Tuple(type_defs) => type_defs, | ||
_ => panic!("Cannot get tuple out of non-tuple data type"), | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't it the case that we should use panic='abort'
?
I like the idea to express the state of the fields of iterators through enums, but that will make this PR a lot more complicated. Do you mind doing this refactor in another PR? I will open an issue to not forget it. |
I'd prefer to have the refactor done in this PR. You can put it in a separate commit. |
b218b20
to
4e9a138
Compare
I have applied the idea in separate commits. Perhaps this approach is more expressive and makes it easy to understand the iterator's state, although it looks overhead to me. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm very happy to see the enum-based iterator states. This is much more readable than combinations of Option
s.
9dff452
to
00a8657
Compare
position: position + 1, | ||
}, | ||
CassIteratorStateInfo::PositionNoValue { .. } => { | ||
panic!("Cannot advance the iterator. Value is empty!") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we even allowed to panic in bindings? I still haven't got any answer from @piodul and @Lorak-mmk, and I believe that letting an unwinding panic cross the C/Rust boundary is awful UB. IIUC, we are only allowed to panic if we call catch_unwind()
on any path from bindings' entrypoint to any panic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically, we use aborting panics in bindings, those are not unwinding panics and hence cannot be even caught by catch_unwind
if I am not mistaken.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't find any code in our codebase that would turn panics into aborting. Can you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example, cass_types:244
where for a wrong match arm Rust will stop the execution and refuse to continue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, how can I know that such behaviour occur?
Moreover, aborting on unrecognized pattern is something completely different that aborting on explicit panic. Isn't it?
00a8657
to
a4e6e61
Compare
a4e6e61
to
c6c8c1d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the maze of if
s, I'm convinced that a set of unit tests for deserialization is absolutely required.
c6c8c1d
to
eb8fc59
Compare
The new deserialization api in the Rust driver is still a WIP, so the version for the moment references to my fork. The `scylla-proxy` crate version also relies on my repo to avoid version conflicts. Added error conversion for `ParseError` into `CassError`, albeit we will need to have correct and more proper mappings. For quick migration, legacy query result type is used.
The new deserialization API allows to deserialize rows and values on fly, so the query result consuming structures will be more similar to C++ driver behavior. `CassResult` stores bare `QueryResult` received from the Rust driver to support metadata related queries, and `first_row: CassRow` to make it easier to return the result of a single-row query. `CassRow` has a reference on `CassResult` to get raw bytes on request and store them in `CassValue` objects. This will be added in future commits. `CassValue` stores the raw bytes and optionally deseriialized column value. A `cass_value_get_*` call will ensure the deserialization of the raw bytes. This will also allow to return raw bytes in case one of `cass_value_get_decimal`, `cass_value_get_duration` or `cass_value_get_bytes` is called.
As `cass_iterator_next` returns bool, `decode_next_row` ignores all errors through `unwrap_or_return` macro. In the future, logging those errors may be useful. Similar to the C++ driver, decoding next row means to store raw bytes in vector collection of `CassValue` objects. As query result storing structs have lifetime bounded by the lifetime of the `CassResult` or `CassIterator`, references are stored with 'static lifetime to avoid parametrizing structs and functions. This implied 'static lifetime parameter for the `QueryResult::rows` and `QueryResult::first_row` functions. So, `decode_first_row` function gets `cass_result` as `*mut CassResult` to create a reference with 'static lifetime.
Updating edition in Cargo.toml to 2021 was necessary so that the `:pat` fragment specifier can match `A | B` patterns. Fixed `cass_value_get_bytes` to return bytes for all types.
Make `cass_value_is_collection` independent from value field of `CassValue` as the field is redundant after the addition of lazy deserialization. The same is done for `cass_value_item_count` and all collection iterators which are dependent on that field. Prepared collection iterator structs to add lazy deserialization of collection values.
Introduced `RawValue` struct so that collection iterators' `next` instead of deserialization will return raw bytes. Added `column_type` field to `CassValue` to create sequence, map, and udt iterators without converting `CassDataType` to `ColumnType`.
C++ driver enables to iterate over a map sequentially like lists or sets. The `CassCollectionIterator` is modified to enum to contain either `SequenceIterator` or `MapIterator` for supporting this feature.
Added implementation for `cass_value_get_decimal` and `cass_value_get_duration`.
Fixed CassandraTypesTests.Map for Duration type.
Modify CassResult and CassResultIterator row fields to avoid option and unnecessary checkings while operating on them.
Refactored result and row iterators to wrap value and position fields in state info enum.
eb8fc59
to
4829464
Compare
I added some relevant tests for collection iterators as Cassandra tests may not cover everything. |
unsafe { | ||
let mut bytes_mut = BytesMut::new(); | ||
bytes_mut.put_i32(1); // Number of values | ||
let bytes: *const Bytes = &Bytes::from(bytes_mut); | ||
let slice_frame = FrameSlice::new(bytes.as_ref().unwrap()); | ||
let column_type: *const ColumnType = &ColumnType::List(Box::new(ColumnType::Text)); | ||
let sequence_iterator = | ||
SequenceIterator::deserialize(column_type.as_ref().unwrap(), Some(slice_frame)) | ||
.unwrap(); | ||
let mut collection_iterator = CassIterator::CassCollectionIterator( | ||
CassCollectionIterator::SequenceIterator(CassSequenceIterator { | ||
sequence_iterator, | ||
count: 1, | ||
state_info: CassIteratorStateInfo::NoValue, | ||
}), | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code shows the problem we discussed before: as CassSequenceIterator
requires 'static
lifetime out the held references, the lifetime of the &Bytes
reference must by extended to 'static
in a hacky way (by casting it to pointer and back again).
The solution is to make the iterators parametrised by a lifetime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's create an issue out of it to not forget #119
// Put serialized (string, bool) pair in bytes | ||
bytes_mut.put_i32(text.len() as i32); | ||
bytes_mut.put_slice(text.as_bytes()); | ||
bytes_mut.put_i32(1); | ||
bytes_mut.put(true_bytes); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is such by-hand crafting necessary? Couldn't we use our serialisation framework to create deserialisation tests by asserting identity of the types fed and got back?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would require adding scylla-cql
in dependencies, I do not want to add it for a couple of serialization purposes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rust driver already has scylla-cql
as dependency, so I believe this shouldn't hurt us. @piodul Can you confirm that? Will Cargo be intelligent enough to compile scylla-cql
only once?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, provided that we use the same version of scylla-cql
as the driver uses.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then I insist on using scylla-cql for aid here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@piodul Are we sure that the following way of serializing and deserializing should always work for all types:
// Serialize
let mut bytes = Vec::new();
let mut map = HashMap::new()
map.insert("key", "value");
map.serialize(&mut bytes);
let frame_slice = FrameSlice::new(bytes);
// Deserialize
let column_type = ColumnType::Map(Box::new(ColumnType::Text), Box::new(ColumnType::Text));
let (key, value) = MapIterator::deserialize(column_type, frame_slice).unwrap().next().unwrap();
assert_eq!(key, "key");
assert_eq!(value, "value");
This is a pseudocode example so it may not compile.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think it should work. By the way, the impls for the Value
trait are in scylla-cql/src/frame/value.rs
, looks like they do the same thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, what was missing is that before the serialized value the buffer contains also the number of bytes of it, but the deserialization API expects a frame slice without the number of bytes in the beginning.
Wrapped value and position fields in `CassIteratorStateInfo` to convey more information about the collection iterators' state.
5a88dce
to
4be63a8
Compare
Added unit tests to test the behavior of collection iterators for thoses cases that cassandra tests may not cover.
4be63a8
to
77f37fd
Compare
I think we can safely close this. |
Oops, wrong PR |
After all I think we can close this - applying new deserialization to cpp-rust-driver will require at least some changes to this PR, so it will have to be done by someone else. |
Pre-review checklist
Fixes #94
This PR Integrates the new deserialization API from the Rust driver which is a WIP.
It is dependent upon the execution profiles' PR #107 and can be merged only after the latter is already done.
The lazy deserialization allows to also implement binder makers for decimal and duration types. The
test_cassandra_types
tests for these types are also enabled..github/workflows/build.yml
ingtest_filter
..github/workflows/cassandra.yml
ingtest_filter
.