Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add list_fields api #4242

Merged
merged 17 commits into from
Dec 18, 2023
Merged

add list_fields api #4242

merged 17 commits into from
Dec 18, 2023

Conversation

PSeitz
Copy link
Contributor

@PSeitz PSeitz commented Dec 6, 2023

  • Retry
  • List fields from tantivy
  • Fallback to schema if field list does not exist
  • {index_id}/_field_caps URL
  • /_field_caps URL
  • fields filter (my-index-000001/_field_caps?fields=rating)
  • API tests
  • Caching (very simple list field cache)
  • Parallel fetch of field list file
➜  quickwit-indices curl -XGET http://localhost:7280/api/v1/_elastic/hdfs/_field_caps -H 'Content-Type: application/json' -d ''
{
  "indices": [
    "hdfs"
  ],
  "fields": {
    "_dynamic.attributes.class": {
      "keyword": {
        "metadata_field": false,
        "searchable": true,
        "aggregatable": true,
        "indices": [
          "hdfs"
        ]
      },
      "text": {
        "metadata_field": false,
        "searchable": true,
        "aggregatable": true,
        "indices": [
          "hdfs"
        ]
      }
    },
    "_field_presence": {
      "long": {
        "metadata_field": false,
        "searchable": true,
        "aggregatable": false,
        "indices": [
          "hdfs"
        ]
      }
    },
    "body": {
      "text": {
        "metadata_field": false,
        "searchable": true,
        "aggregatable": false,
        "indices": [
          "hdfs"
        ]
      }
    },
    "timestamp": {
      "date_nanos": {
        "metadata_field": false,
        "searchable": true,
        "aggregatable": true,
        "indices": [
          "hdfs"
        ]
      }
    },
    "tenant_id": {
      "long": {
        "metadata_field": false,
        "searchable": true,
        "aggregatable": false,
        "indices": [
          "hdfs"
        ]
      }
    },
    "resource.$facet:service": {
      "text": {
        "metadata_field": false,
        "searchable": true,
        "aggregatable": false,
        "indices": [
          "hdfs"
        ]
      }
    },
    "severity_text": {
      "keyword": {
        "metadata_field": false,
        "searchable": true,
        "aggregatable": true,
        "indices": [
          "hdfs"
        ]
      },
      "text": {
        "metadata_field": false,
        "searchable": true,
        "aggregatable": true,
        "indices": [
          "hdfs"
        ]
      }
    }
  }
}⏎                                                                                                                                                                                                                               

// The field name
string field_name = 1;
// The tantivy field type
int32 field_type = 2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you comment on how it is mapped? Why signed int?

Suggested change
int32 field_type = 2;
int32 field_type = 2;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a comment (Type::to_code()) and switched to u32

// False means the field is not searchable in any indices.
bool searchable = 4;
// True means the field is aggregatable (fast) in at least some indices.
// False means the field is not aggregatable in any indices.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you really mean indices in all of this proto, or did you mean split ids?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes indices. There is no use for split ids in the API currently

index_ids: vec![index_id.to_string()],
searchable: metadata.indexed,
aggregatable: metadata.fast,
non_searchable_index_ids: Vec::new(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this contract.
So you return io::Result and lose the info of the missing split ids?
What are non_searchable_index_ids ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When do we return an ioError as an item.

Copy link
Contributor Author

@PSeitz PSeitz Dec 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I'm not sure what to do with the error, currently we abort and return. This is happening in the merge function. I guess we want to return the failed split_ids for a retry. There's no retry currently.

let entry = entry.map_err(|err| crate::error::SearchError::Internal(format!("{:?}", err)))?; // TODO: No early return on error

non_searchable_index_ids are exemption list of indexes if searchable: true. I took this from the elastic search API.

@PSeitz PSeitz force-pushed the list_fields_api branch 2 times, most recently from c7b5f2c to e33a614 Compare December 7, 2023 15:10
@PSeitz PSeitz requested a review from fulmicoton December 11, 2023 07:58
@@ -50,6 +50,7 @@ quickwit-opentelemetry = { workspace = true }
quickwit-proto = { workspace = true }
quickwit-query = { workspace = true }
quickwit-storage = { workspace = true }
quickwit-indexing = { workspace = true }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we really need this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

quickwit-indexing contains the code to write and read the split fields. I could move it, but not sure where to

let aggregatable = current_group.iter().any(|entry| entry.aggregatable);
let field_name = metadata.field_name.to_string();
let field_type = metadata.field_type;
let mut non_searchable_index_ids = if searchable {
Copy link
Contributor

@fulmicoton fulmicoton Dec 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add the type here?

Suggested change
let mut non_searchable_index_ids = if searchable {
let mut non_searchable_index_ids = if searchable {

Also if perf do not matter, BTreeSet<&str> is a simple way to inform users and maintain the contract that this is sorted and unique.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is building the protobuf API response, which requires Vec

@PSeitz PSeitz force-pushed the list_fields_api branch 3 times, most recently from 229484d to c2ba338 Compare December 12, 2023 06:07
# Test date _field_caps API
method: [GET]
engines:
- quickwit
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how far are we from elastic's api?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The most important features are supported, some minor parameters are not implemented yet

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we drop

engines:
- quickwit

which part breaks?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it works without, seems to pick quickwit as default

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean running it against elasticsearch without it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main differences are:

  • Mixed types are not supported in elastic, so this test doesn't work in elastic. We get multiple entries for mixed types. Support elasticsearch field capabilities endpoint #3527 (comment)
  • We have different coercion rules (e.g. i64, u64, f64 preference order)
  • Mapping of text types. Currently we map text fields with fast to keyword and indexed to text. keyword is not on the same fieldname in elastic but a new one.
  • We map our floats to double, but elastic seems to use float when the type is detected

// The field name
string field_name = 1;
// The tantivy field type mapped via `to_code`.
uint32 field_type = 2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make it a proto enum to remove the dependency in quickwit-serve.

Copy link
Contributor Author

@PSeitz PSeitz Dec 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather reuse the existing code, tantivy is in the dependency tree anyway

Copy link
Contributor

@fulmicoton fulmicoton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code goes

  • serialize list of fields as a file artifact with code living in quickwit-indexing, and using tantivy's types.
  • convert artifact to protobuf

Would using protobuf to begin lead to better code?
We could remove code, and drop the quickwit-search -> quickwit-indexing dependency

@PSeitz
Copy link
Contributor Author

PSeitz commented Dec 12, 2023

The code goes

* serialize list of fields as a file artifact with code living in quickwit-indexing, and using tantivy's types.

* convert artifact to protobuf

Would using protobuf to begin lead to better code? We could remove code, and drop the quickwit-search -> quickwit-indexing dependency

I don't think this would be better code, since this would be creating duplicated code. The existing type does exactly what we want and tantivy is already in the dependency tree.
The quickwit-search -> quickwit-indexing dependency has nothing to do with the tantivy types, this is because field de/serialziation is in models/list_fields.rs

quickwit/quickwit-proto/protos/quickwit/search.proto Outdated Show resolved Hide resolved
};
let list_fields_iter = list_fields_iter.filter(|field| {
if let Ok(field) = field {
// We don't want to leak the _dynamic hack to the user API.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we want to remove the _dynamic from the field names as we don't want to leak that to the users.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think so too. I wasn't completely sure if there's a case where we need it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

quickwit/quickwit-search/src/list_fields.rs Outdated Show resolved Hide resolved
@PSeitz PSeitz force-pushed the list_fields_api branch 4 times, most recently from 96b349c to 4aab7d9 Compare December 15, 2023 09:29
@PSeitz PSeitz enabled auto-merge (squash) December 18, 2023 10:25
@PSeitz PSeitz merged commit 13f34b5 into main Dec 18, 2023
4 checks passed
@PSeitz PSeitz deleted the list_fields_api branch December 18, 2023 10:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants