Implement vector<> data type #2
Link to the protocol specification, which explains how vector values should be serialized: https://github.com/apache/cassandra/blob/15ed18e9d49f48e88f40b90c156248b8b697c7e2/doc/native_protocol_v5.spec#L1210-L1215
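As a rough illustration of the wire format, here is a small Python sketch that encodes and decodes a vector of 4-byte ints. It assumes that fixed-size elements are written back to back in big-endian order with no per-element length prefix; that is my reading of the linked spec section, so please verify it against the spec before relying on it. The function names are hypothetical, and variable-length element types are deliberately not covered.

```python
import struct


def serialize_int_vector(values, dimension):
    """Encode a vector<int, dimension> value.

    Assumed layout (check the linked spec): the elements are simply
    concatenated, each as a big-endian signed 32-bit integer.
    """
    if len(values) != dimension:
        raise ValueError(f"expected {dimension} elements, got {len(values)}")
    # '>' = big-endian, 'i' = signed 32-bit int, repeated `dimension` times.
    return struct.pack(f">{dimension}i", *values)


def deserialize_int_vector(data, dimension):
    """Decode bytes produced by serialize_int_vector."""
    if len(data) != 4 * dimension:
        raise ValueError(f"expected {4 * dimension} bytes, got {len(data)}")
    return list(struct.unpack(f">{dimension}i", data))
```

A round trip such as `deserialize_int_vector(serialize_int_vector([1, 2, 3], 3), 3)` should give back the original list, which is also a handy property to assert in unit tests.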
I propose structuring the implementation in the following way. In my opinion, the safest approach is to go over the phases and implement them in order, perhaps revisiting earlier ones if some adjustments need to be made. The stuff here is bound tightly enough that I don't think splitting it into smaller PRs makes sense.
Introduce support for representing the vector datatype internally
This will be a rather big part which requires a thorough reading of the code in the
The goal is to be able to represent vector types internally in Scylla. The most important components are
First, I recommend getting familiar with what this support looks like for some "native" type (i.e. a type that is not a collection) and then looking at how lists and sets are supported. Look at the definition and implementation of the following:
At this point, you can implement a
Perhaps you will have to implement more stuff after all, but I'm not sure what will be needed, and the above are required for certain. I recommend proceeding with the later steps and adding more stuff in the
Extend CQL grammar to be able to express the
Now that you have an internal representation of the vector type, you can implement the necessary syntax so that you can create a table with a column of the vector type. Start by adding the syntax and work your way down the abstractions, implementing what is needed. After this point, you should be able to create a table with a vector datatype and, most likely, be able to write to / read from the table (by using the bind markers, i.e.
Extend CQL grammar to be able to express vector literals
This will require delving into the
The first thing that should be done there is changing the name
Then, go over all occurrences of
Tests
Some tests that use the Python driver would be appreciated. For now, you can use the upstream driver in place of the Scylla fork if the fork does not support vector types. These tests could actually be developed in parallel with the other steps and, for now, run only against Cassandra; running them against a working implementation will make sure that the tests make sense. There is also the option of writing Boost unit tests. There are some tests of this kind for
Boost tests are a good way to check the vector type implementation, especially in the first stage, when the CQL layer doesn't support the vector type yet. The types module is independent of the rest of the database system, so you can validate the implementation without spinning up the whole system (for instance, test cases in
Add support for vector type.
The vector is a fixed-length collection with a specified element type, for example VECTOR<INT, 5>.
The implementation should:
As a result, a user should be able to use vector type in the same way as any other data type.
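To make the VECTOR<INT, 5> notation concrete, here is a toy Python sketch that splits such a type name into its element type and dimension. It is only an illustration, not how Scylla's CQL parser works; the function name is hypothetical, and the requirement that the dimension be positive is an assumption.

```python
import re

# Matches "vector<ELEMENT, N>" case-insensitively. The greedy element group
# backtracks to the last comma, so a nested element type stays intact.
_VECTOR_RE = re.compile(r"^\s*vector\s*<\s*(.+)\s*,\s*(\d+)\s*>\s*$", re.IGNORECASE)


def parse_vector_type(type_name):
    """Return (element_type, dimension) for a vector type name."""
    match = _VECTOR_RE.match(type_name)
    if match is None:
        raise ValueError(f"not a vector type: {type_name!r}")
    element_type = match.group(1).strip().lower()
    dimension = int(match.group(2))
    if dimension <= 0:
        # Assumption: a zero-length vector is not a valid type.
        raise ValueError("vector dimension must be positive")
    return element_type, dimension
```

For example, `parse_vector_type("VECTOR<INT, 5>")` yields the element type `int` and the dimension `5`; a value of that type must then hold exactly five ints.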
Note:
None of Scylla's drivers supports the vector type. Until the driver team (or we) add this functionality, we will probably have to use Cassandra's driver.
Apache Cassandra issue: https://issues.apache.org/jira/browse/CASSANDRA-18504
Patch adding another data type to Scylla (some code might be outdated): scylladb@509626f