Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added handling of the USING clause in CREATE INDEX for dummy-vector-backend #3

Conversation

smoczy123
Copy link

We haven't added support for CREATE CUSTOM INDEX yest, but it shouldn't be much work.

@smoczy123 smoczy123 requested a review from Jadw1 November 15, 2024 14:22
@smoczy123 smoczy123 requested a review from piodul as a code owner November 15, 2024 14:22
@smoczy123 smoczy123 linked an issue Nov 15, 2024 that may be closed by this pull request
Copy link

@Jadw1 Jadw1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we don't have a CI in this repo, please make sure the tests are passing. Until the PR reaches the final version, you can check only index-related tests.
Once you consider the PR to be ready to merge, you should run full test.py

@piodul
Perhaps we should setup CI for the fork? It can be simpler (maybe it'll be enough to run only unit tests and skip dtests for now?) than Scylla's one but it will make merging the fork to OSS easier in the future for us.

@@ -23,8 +23,8 @@ void cql3::statements::index_prop_defs::validate() {
throw exceptions::invalid_request_exception("CUSTOM index requires specifying the index class");
}

if (!is_custom && custom_class) {
throw exceptions::invalid_request_exception("Cannot specify index class for a non-CUSTOM index");
if (!is_custom && custom_class && *custom_class != "dummy-vector-backend") {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I imagine this validation should looks like this:

if (!is_custom && custom_class) {
  throw "cannot specify index class for a non-custom index"
}

if (*custom_class in _supported_custom_classes) {
  throw "unsupported custom index class {*custom_class}"
}

This way the code will be easier to maintain when we'd like to add a new custom class.

Also, this gives us a better control over supported custom index classes. Most likely we want to protect a newly added index class with a cluster feature (see gms/feature_service.hh), so the node starts using it once all of the nodes are aware of the feature.

@@ -99,6 +99,7 @@ public:
bool is_index(view_ptr) const;
bool is_index(const schema& s) const;
bool is_global_index(const schema& s) const;
std::optional<sstring> vector_index_class(const schema& s) const;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's return general custom index class and not distinguish vector index from other custom indexes (even though they don't exist) at this level.

schema/schema.hh Outdated
@@ -541,6 +541,7 @@ class schema_describe_helper {
public:
virtual bool is_global_index(const table_id& base_id, const schema& view_s) const = 0;
virtual bool is_index(const table_id& base_id, const schema& view_s) const = 0;
virtual std::optional<sstring> vector_index_class(const table_id& base_id, const schema& view_s) const = 0;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same here: vector_index_class -> custom_index_class

@smoczy123 smoczy123 force-pushed the 1-implement-schema-management-layer-for-vector-search-indexes branch from 997e78b to a5113d1 Compare November 21, 2024 21:31
Copy link

@piodul piodul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of general comments:

  • The PR lacks a description of the changes and the motivation. Please add it.
  • None of the commits have descriptions. Titles of some commits are not obvious at all, for example:
    cql3/statements: also set index_options in create_index_statement when custom class is provided
    Why "also"? Why do we need to set it? I, as a reviewer who does not have a good overview of how you intend to implement what you were asked to do (partially because you didn't describe it in the PR description), have no idea whether this change is needed, whether it is implemented well, etc. The commits need to provide necessary context.
  • A submodule update has snuck into the PR. Is it needed for anything?

@@ -70,7 +70,8 @@ create_index_statement::validate(query_processor& qp, const service::client_stat
throw exceptions::invalid_request_exception("Only CUSTOM indexes can be created without specifying a target column");
}

_properties->validate();
gms::feature_service fs = gms::feature_service(gms::feature_config_from_db_config(qp.db().get_config()));
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is absolutely not the right way to use feature_service. There should be only one instance of the feature service on each shard, and this instance is initialized in main.cc. You should not create your own instances, except during initialization code.

I suggest doing the validation in validate_while_executing instead - there, you can get access to the feature service via the data_dictionary::database object.

@@ -396,4 +396,12 @@ future<> feature_service::enable(std::set<std::string_view> list) {
});
}

std::set<std::string_view> feature_service::supported_custom_classes() const {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

feature_service is not the right place to put this. This module is used for tracking features which need to be supported by all nodes in a Scylla cluster before they can be used by any node. This is not the right way to use this class.

If you want to have a bool which is only be set to true after all nodes support the feature (which is the right approach for making support conditional on this), then you should just add a new gms::feature field in the definition of the gms::feature_service class and it will work automatically.

@@ -347,7 +347,7 @@ bool secondary_index_manager::is_global_index(const schema& s) const {
});
}

std::optional<sstring> secondary_index_manager::vector_index_class(const schema& s) const {
std::optional<sstring> secondary_index_manager::custom_index_class(const schema& s) const {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not name this function custom_index_class in the commit where you add this function in the first place? In general, you should avoid changes like this, i.e. doing something in one commit and then "fixing" it in another place - if you can introduce the thing "correctly" in the first place.

@piodul
Copy link

piodul commented Nov 22, 2024

Ah, I would have forgot - there are no tests! Please add tests which demonstrate that your changes work and do what is needed for the task.

@Balwancia Balwancia force-pushed the 1-implement-schema-management-layer-for-vector-search-indexes branch from c087ffe to 821697e Compare November 25, 2024 19:57
@smoczy123 smoczy123 marked this pull request as draft November 25, 2024 20:00
@Balwancia Balwancia force-pushed the 1-implement-schema-management-layer-for-vector-search-indexes branch 4 times, most recently from fc83537 to c3f91e5 Compare November 25, 2024 21:19
@smoczy123 smoczy123 force-pushed the 1-implement-schema-management-layer-for-vector-search-indexes branch from c3f91e5 to c087ffe Compare November 25, 2024 21:38
@Balwancia Balwancia force-pushed the 1-implement-schema-management-layer-for-vector-search-indexes branch 7 times, most recently from a584a26 to 373d3df Compare November 25, 2024 23:08
syuu1228 and others added 9 commits November 26, 2024 00:11
After merged 5a470b2, we found that scylla_raid_setup fails on offline mode
installation.
This is because pkg_install() just print error and exit script on offline mode, instead of installing packages since offline mode not supposed able to connect
internet.
Seems like it occur because of missing "policycoreutils-python-utils"
package, which is the package for "semange" command.
So we need to implement the relabeling patch without using the command.

Fixes scylladb#21441
Since Amazon Linux 2 has different package name for semange, we need to
adjust package name.

Fixes scylladb#21351
…fix on merge PRs

In cc71077, i have added check for the
last line in pr body looking for `closes` prefix.

It seems that this is wrong, since in a merge PR, the `closes` prefix is
not the last line

Instead, changing the search for the last line contains `closes` prefix

Closes scylladb#21545
this change was created in the same spirit of aebb532, which
included the fmt/iostream.h and iostream when appropriate so that
the tree can build with seastar submodule including e96932b0.
in the seastar change, we stopped including unused `fmt/ostream.h`
in a public header in seastar, so the parent projects relying on
the header to indirectly include fmt/ostream.h and iostream would
have to include these headers explicitly.

Signed-off-by: Kefu Chai <[email protected]>

Closes scylladb#21525
The test is only sending a subset of the running servers for the rolling
restart. The rolling restart is checking the visibility of the restarted
node agains the other nodes, but if that set is incomplete some of the
running servers might not have seen the restarted node yet.

Improved the manager client rolling restart method to consider all the
running nodes for checking the restarted node visibility.

Fixes: scylladb#19959

Closes scylladb#21477
* seastar fba36a3d...1b0a3087 (9):
  > program-options: add missing include <memory>
  > reactor: Always retry waitpid
  > treewide: include core/format.hh when appropriate
  > print: remove unused fmt/ostream.h
  > print: extract format() into format.hh
  > net: route error messages to logger instead of to stderr
  > net: stop printing when reaching unreachable branch
  > reactor: Mark drain() private
  > rpc: optimize tuple deserialization when the types are default-constructible

Closes scylladb#21520
We recently added a "--release <version>" option to test/cql-pytest/run
to run a cql-pytest test against a released version of Scylla, downloaded
automatically from ScyllaDB's precompiled binary repository. This patch
adds the same capability also to test/alternator/run - allowing to run
a current test/alternator test on older releases of Scylla. The
implementation in this patch reuses the same implementation from the
cql-pytest patch.

Here is an example use case: the pull request scylladb#19941 claimed that
a certain bug fix was backported to release 6.0. Was it? Let's run
the test reproducing that bug on two releases:

test/alternator/run --release 6.0 test_streams.py::test_stream_list_tables
test/alternator/run --release 6.1 test_streams.py::test_stream_list_tables

It shows that the test passes on 6.1 (so the bug is fixed there) but the
test fails 6.0. It turns out that although the fix was backported to
branch-6.0, this happened shortly after 6.0.4 was released and no later
6.0 minor release came afterwards! So the bug wasn't actually fixed
on any official release of 6.0.

Signed-off-by: Nadav Har'El <[email protected]>

Closes scylladb#21343
…hardcoding

cql3/statements

cql3/statements: allow indexes to be created with dummy custom class

cql3/statements: also set index_options in create_index_statement when custom class is provided
index/secondary_index_manager: change

index secondary_index_manager
@Balwancia Balwancia force-pushed the 1-implement-schema-management-layer-for-vector-search-indexes branch from 373d3df to 6228e3b Compare November 25, 2024 23:11
@Balwancia Balwancia closed this Nov 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement schema management layer for vector search indexes
9 participants