Is your feature request related to a problem? Please describe. #9996 allows us to test the full "valid" range of timestamps (0001-01-01 00:00:00.000000 to 9999-12-31 23:59:59.999999) in Spark. However, Spark can also accept timestamps outside this range (negative years and 6-digit years). We should allow this full range of inputs to Spark with CPU and GPU support.
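For context on why the extended range is hard to exercise from the Python side: Python's `datetime` type can only represent the "valid" range described above, so out-of-range values (negative years, 6-digit years) cannot even be constructed as test inputs in Python. A minimal illustration:

```python
from datetime import datetime

# Python's datetime range matches the "valid" ANSI timestamp range;
# values outside it (negative years, 6-digit years) cannot be
# represented as Python datetime objects at all.
print(datetime.min)  # 0001-01-01 00:00:00
print(datetime.max)  # 9999-12-31 23:59:59.999999

# Attempting to construct a year outside [1, 9999] raises ValueError.
try:
    datetime(10000, 1, 1)
except ValueError as e:
    print(e)
```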
Another aspect to consider: when you pass a Python datetime object with timezone information, it will be converted to UTC before being sent to Spark. This can produce a `date value out of range` Python exception.
However, this also means that the effective range for testing timestamps in timezones with a positive offset from UTC is restricted. Python datetime values can only start at 0001-01-01 00:00:00.000000, so a value of 0001-01-01 00:00:00.000000 in a local timezone ahead of UTC cannot actually be sent to Spark from Python: converting it to UTC before sending yields a year 0 timestamp, which is out of range for Python. That value is still valid for ANSI purposes (0001-01-01 00:00:00.000000 to 9999-12-31 23:59:59.999999 is the valid range), and Spark is okay with those values.
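The restriction above can be reproduced directly in Python. This sketch uses a hypothetical fixed +05:00 zone to stand in for any timezone ahead of UTC:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical example zone: 5 hours ahead of UTC.
tz_plus5 = timezone(timedelta(hours=5))

# The earliest valid ANSI timestamp, interpreted as local time in
# that zone. This value itself is perfectly valid for Spark.
local_min = datetime(1, 1, 1, 0, 0, 0, tzinfo=tz_plus5)

# Converting to UTC subtracts the offset, which would land in
# year 0 -- below Python's datetime.min -- so Python raises
# OverflowError before the value can ever reach Spark.
try:
    local_min.astimezone(timezone.utc)
except OverflowError as e:
    print(e)  # date value out of range
```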
I think the best course of action here is to write tests in Scala that cover multiple non-UTC timezones and timestamps in the invalid range (non-positive years and years > 9999), instead of trying to do this in Python, because it's difficult to change how Python handles things in the PythonRunner on the executor.