-
Notifications
You must be signed in to change notification settings - Fork 366
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add list_fields api #4242
add list_fields api #4242
Conversation
13d3418
to
cbed13d
Compare
// The field name | ||
string field_name = 1; | ||
// The tantivy field type | ||
int32 field_type = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you comment on how it is mapped? Why signed int?
int32 field_type = 2; | |
int32 field_type = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a comment (Type::to_code()
) and switched to u32
// False means the field is not searchable in any indices. | ||
bool searchable = 4; | ||
// True means the field is aggregatable (fast) in at least some indices. | ||
// False means the field is not aggregatable in any indices. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you really mean indices in all of this proto, or did you mean split ids?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes indices. There is no use for split ids in the API currently
index_ids: vec![index_id.to_string()], | ||
searchable: metadata.indexed, | ||
aggregatable: metadata.fast, | ||
non_searchable_index_ids: Vec::new(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this contract.
So you return io::Result and lose the info of the missing split ids?
What are non_searchable_index_ids ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When do we return an ioError as an item.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I'm not sure what to do with the error, currently we abort and return. This is happening in the merge function. I guess we want to return the failed split_ids for a retry. There's no retry currently.
let entry = entry.map_err(|err| crate::error::SearchError::Internal(format!("{:?}", err)))?; // TODO: No early return on error
non_searchable_index_ids
are exemption list of indexes if searchable: true
. I took this from the elastic search API.
quickwit/quickwit-serve/src/elastic_search_api/model/field_capability.rs
Show resolved
Hide resolved
c7b5f2c
to
e33a614
Compare
quickwit/quickwit-search/Cargo.toml
Outdated
@@ -50,6 +50,7 @@ quickwit-opentelemetry = { workspace = true } | |||
quickwit-proto = { workspace = true } | |||
quickwit-query = { workspace = true } | |||
quickwit-storage = { workspace = true } | |||
quickwit-indexing = { workspace = true } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we really need this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
quickwit-indexing contains the code to write and read the split fields. I could move it, but not sure where to
let aggregatable = current_group.iter().any(|entry| entry.aggregatable); | ||
let field_name = metadata.field_name.to_string(); | ||
let field_type = metadata.field_type; | ||
let mut non_searchable_index_ids = if searchable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add the type here?
let mut non_searchable_index_ids = if searchable { | |
let mut non_searchable_index_ids = if searchable { |
Also if perf do not matter, BTreeSet<&str>
is a simple way to inform users and maintain the contract that this is sorted and unique.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is building the protobuf API response, which requires Vec
229484d
to
c2ba338
Compare
# Test date _field_caps API | ||
method: [GET] | ||
engines: | ||
- quickwit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how far are we from elastic's api?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The most important features are supported, some minor parameters are not implemented yet
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we drop
engines:
- quickwit
which part breaks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it works without, seems to pick quickwit as default
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean running it against elasticsearch without it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main differences are:
- Mixed types are not supported in elastic, so this test doesn't work in elastic. We get multiple entries for mixed types. Support elasticsearch field capabilities endpoint #3527 (comment)
- We have different coercion rules (e.g. i64, u64, f64 preference order)
- Mapping of text types. Currently we map text fields with fast to
keyword
and indexed totext
.keyword
is not on the same fieldname in elastic but a new one. - We map our floats to
double
, but elastic seems to usefloat
when the type is detected
// The field name | ||
string field_name = 1; | ||
// The tantivy field type mapped via `to_code`. | ||
uint32 field_type = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make it a proto enum to remove the dependency in quickwit-serve.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rather reuse the existing code, tantivy is in the dependency tree anyway
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code goes
- serialize list of fields as a file artifact with code living in quickwit-indexing, and using tantivy's types.
- convert artifact to protobuf
Would using protobuf to begin lead to better code?
We could remove code, and drop the quickwit-search -> quickwit-indexing dependency
I don't think this would be better code, since this would be creating duplicated code. The existing type does exactly what we want and tantivy is already in the dependency tree. |
a38df39
to
29b4f51
Compare
}; | ||
let list_fields_iter = list_fields_iter.filter(|field| { | ||
if let Ok(field) = field { | ||
// We don't want to leak the _dynamic hack to the user API. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we want to remove the _dynamic
from the field names as we don't want to leak that to the users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think so too. I wasn't completely sure if there's a case where we need it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
96b349c
to
4aab7d9
Compare
handle wildcards handle _dynamic special case
0f18dcb
to
f163317
Compare
{index_id}/_field_caps
URL/_field_caps
URLfields
filter (my-index-000001/_field_caps?fields=rating
)