Accept (but ignore) doc timestamps with nanosecond precision
Python's `datetime` module from the stdlib does not support nanosecond
precision in `strptime` [1], but several projects do generate such
timestamps and Elasticsearch supports them. (E.g. Vault audit logs
contain such timestamps.)
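A quick way to see the limitation: `%f` in `strptime` consumes at most six digits, so a nine-digit fraction leaves `789Z` unmatched and parsing fails with `ValueError`. A minimal sketch; the helper `try_parse` and the sample timestamps are illustrative only, not part of this repository:

```python
from datetime import datetime

# Same format string as parse_doc_timestamp in es_stream_logs.py.
FMT = "%Y-%m-%dT%H:%M:%S.%fZ"


def try_parse(timestamp: str):
    """Return the parsed datetime, or None if strptime rejects the string."""
    try:
        return datetime.strptime(timestamp, FMT)
    except ValueError:
        return None


# Six fractional digits (microseconds) parse fine.
print(try_parse("2024-03-07T12:00:00.123456Z"))     # -> 2024-03-07 12:00:00.123456
# Nine fractional digits (nanoseconds) are rejected.
print(try_parse("2024-03-07T12:00:00.123456789Z"))  # -> None
```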

We could use a different library to parse these timestamps, but such
libraries may be slower than the stdlib, and we don't really need the
extra precision.

So this drops any digits (or really any characters) after the sixth
sub-second digit of a doc timestamp, keeping only its final character
(the 'Z' timezone suffix).

[1]: https://stackoverflow.com/questions/10611328/parsing-datetime-strings-containing-nanoseconds
heyLu committed Mar 7, 2024
1 parent e7268e7 commit 71a9693
Showing 2 changed files with 14 additions and 0 deletions.
8 changes: 8 additions & 0 deletions es_stream_logs.py
@@ -643,6 +643,14 @@ def to_raw_es_query(query):

def parse_doc_timestamp(timestamp: str):
    """ Parse the timestamp of an elasticsearch document. """

    sub_second_split = timestamp.split(sep=".", maxsplit=1)
    if len(sub_second_split) > 1 and len(sub_second_split[1]) > 7:
        # sub second part too long, e.g. .1234567Z and strptime supports only
        # up to 6 places (plus 'Z' timezone part)
        sub_second_shortened = sub_second_split[1][:6] + sub_second_split[1][-1]
        timestamp = sub_second_split[0] + "." + sub_second_shortened

    try:
        parsed = datetime.strptime(timestamp, '%Y-%m-%dT%H:%M:%S.%fZ')
    except ValueError:
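The hunk ends at `except ValueError:`, so the rest of the function is not shown. Below is a self-contained sketch of the whole parser for experimenting with the truncation; the body of the `except` branch (retrying with a hypothetical no-fraction format) is an assumption, not the code in `es_stream_logs.py`:

```python
from datetime import datetime


def parse_doc_timestamp(timestamp: str) -> datetime:
    """Parse the timestamp of an Elasticsearch document (sketch)."""
    sub_second_split = timestamp.split(sep=".", maxsplit=1)
    if len(sub_second_split) > 1 and len(sub_second_split[1]) > 7:
        # Keep the first 6 sub-second digits (microseconds) plus the last
        # character (the 'Z' timezone suffix); drop everything in between.
        fraction = sub_second_split[1]
        timestamp = sub_second_split[0] + "." + fraction[:6] + fraction[-1]

    try:
        return datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S.%fZ")
    except ValueError:
        # Assumption: the real except branch is not part of the diff. This
        # sketch retries without a fractional part; if that also fails, the
        # ValueError propagates to the caller.
        return datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")
```

With this sketch, `parse_doc_timestamp('1970-01-01T00:00:00.123456999Z')` returns `datetime(1970, 1, 1, 0, 0, 0, 123456)`: extra digits are truncated, not rounded, which is also what `test_too_long` expects. Because only the last character is kept, a numeric offset such as `+00:00` would not survive the truncation, but the format string only accepts a literal `Z` anyway.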
6 changes: 6 additions & 0 deletions test_es_stream_logs.py
@@ -58,6 +58,12 @@ def test_full(self):
        self.assertEqual(datetime.datetime(1970, 1, 1, 0, 0, 0, 123456),
                         parse_doc_timestamp('1970-01-01T00:00:00.123456Z'))

    def test_too_long(self):
        self.assertEqual(datetime.datetime(1970, 1, 1, 0, 0, 0, 123456),
                         parse_doc_timestamp('1970-01-01T00:00:00.123456999Z'))
        self.assertEqual(datetime.datetime(1970, 1, 1, 0, 0, 0, 123456),
                         parse_doc_timestamp('1970-01-01T00:00:00.1234569999999999999999Z'))

    def test_invalid(self):
        self.assertRaises(ValueError, lambda: parse_doc_timestamp("not a timestamp"))

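The tests cover exact-microsecond and overlong fractions. Fractions shorter than six digits need no truncation, since `%f` accepts one to six digits and pads on the right. A sketch of a test pinning that down; it is hypothetical, not part of this commit, and exercises `strptime` directly rather than `parse_doc_timestamp`:

```python
import datetime
import unittest

# Same format string as parse_doc_timestamp uses.
FMT = '%Y-%m-%dT%H:%M:%S.%fZ'


class TestShortFraction(unittest.TestCase):
    def test_short_fraction_is_right_padded(self):
        # '.5' is half a second, i.e. 500000 microseconds.
        self.assertEqual(datetime.datetime(1970, 1, 1, 0, 0, 0, 500000),
                         datetime.datetime.strptime('1970-01-01T00:00:00.5Z', FMT))
        # '.123' is 123 milliseconds, i.e. 123000 microseconds.
        self.assertEqual(datetime.datetime(1970, 1, 1, 0, 0, 0, 123000),
                         datetime.datetime.strptime('1970-01-01T00:00:00.123Z', FMT))
```

Run it with `python -m unittest`.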
