Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(es): add java date format support #5248

Conversation

etolbakov
Copy link
Collaborator

@etolbakov etolbakov commented Jul 23, 2024

Description

add conversion from elasticsearch/opensearch date format to quickwit expected strptime

How was this PR tested?

Provide curl output below

Related Issue

#5228

The code is take from #5237

@etolbakov
Copy link
Collaborator Author

Testing results

Request 1

curl -X GET "http://localhost:7280/api/v1/_elastic/stackoverflow-schemaless/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "range": {
      "age": {
        "gte": "20230101",
        "lte": "20231231",
        "boost": 2.0,
        "format": "yyyyMMdd"
      }
    }
  }
}
'
{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 0,
      "relation": "eq"
    },
    "hits": []
  }
}%

Request 2

curl -X GET "http://localhost:7280/api/v1/_elastic/stackoverflow-schemaless/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "range": {
      "age": {
        "gte": "20230101",
        "lte": "20231231",
        "boost": 2.0,
        "format": "basic_date"
      }
    }
  }
}
'
{
  "took": 11,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 0,
      "relation": "eq"
    },
    "hits": []
  }
}%

Request 3 (failure case)

curl -X GET "http://localhost:7280/api/v1/_elastic/stackoverflow-schemaless/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "range": {
      "age": {
        "gte": "2023-01-01",
        "lte": "2023-12-31",
        "boost": 2.0,
        "format": "yyyy-MM-dd'T'HH:mm:ss.SSSZ"
      }
    }
  }
}
'
{
  "status": 400,
  "error": {
    "caused_by": null,
    "reason": "failed to parse range query date format. failed to parse date format `yyyy-MM-ddTHH:mm:ss.SSSZ`. Pattern at pos 10 is not recognized.",
    "stack_trace": null,
    "type": null
  }
}%

@etolbakov
Copy link
Collaborator Author

@fulmicoton just a ping about this change.
Could you please take a look when you have time?

Comment on lines 180 to 186
let format = if java_datetime_format.contains('_') {
java_date_format_aliases()
.get(java_datetime_format)
.unwrap_or(&java_datetime_format)
} else {
java_datetime_format
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let format = if java_datetime_format.contains('_') {
java_date_format_aliases()
.get(java_datetime_format)
.unwrap_or(&java_datetime_format)
} else {
java_datetime_format
};
// We check if the date format is present in our alias dictionary.
let java_date_format_resolved: &str = java_date_format_aliases()
.get(java_datetime_format)
.unwrap_or(&java_datetime_format);

@@ -18,10 +18,12 @@ lindera-core = { workspace = true, optional = true }
lindera-dictionary = { workspace = true, optional = true }
lindera-tokenizer = { workspace = true, optional = true }
once_cell = { workspace = true }
regex = { workspace = true }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
regex = { workspace = true }

serde = { workspace = true }
serde_json = { workspace = true }
serde_with = { workspace = true }
tantivy = { workspace = true }
time = { workspace = true }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
time = { workspace = true }

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is required by range_query.rs

use time::format_description::well_known::Rfc3339;

(r#"yy(yy)?"#, build_year_item),
(r#"MM?"#, build_month_item),
(r#"dd?"#, build_day_item),
(r#"''"#, |_| Some(literal(b"'"))),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we are still missing a lot of the patterns... Hours etc.

@fulmicoton
Copy link
Contributor

We need rest unit tests and to handle more patterns.

JAVA_DATE_FORMAT_ALIASES.get_or_init(|| {
let mut m = HashMap::new();
m.insert("date_optional_time", "yyyy-MM-dd'T'HH:mm:ss.SSSZ");
m.insert("strict_date_optional_time", "yyyy-MM-dd'T'HH:mm:ss.SSSZ");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we handle this format right now.

Copy link

github-actions bot commented Aug 1, 2024

On SSD:

Average search latency is 0.875x that of the reference (lower is better).
Ref run id: 2433, ref commit: be20923
Link

On GCS:

Average search latency is 0.896x that of the reference (lower is better).
Ref run id: 2435, ref commit: be20923
Link

@fulmicoton fulmicoton force-pushed the fix-incompatible-es-java-date-format branch from d7adf92 to c22c245 Compare August 2, 2024 03:33
@kuzaxak
Copy link
Collaborator

kuzaxak commented Sep 28, 2024

@etolbakov Can I reopen it from local branch in QW repo?

@kuzaxak
Copy link
Collaborator

kuzaxak commented Sep 28, 2024

@etolbakov I think we can close this one and continue in a local branch here I refactored you tokenizer to support nested optional and uncommented test cases (they are passing on my local).

@kuzaxak kuzaxak closed this Sep 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants