
[FEA] Support testing beyond the range of Python datetime range #10040

Open · Tracked by #6839
NVnavkumar opened this issue Dec 13, 2023 · 2 comments
Labels: test (Only impacts tests)

@NVnavkumar (Collaborator)

Is your feature request related to a problem? Please describe.
#9996 allows us to test the full "valid" range of timestamps (0001-01-01 00:00:00.000000 to 9999-12-31 23:59:59.999999) in Spark. However, Spark also supports several timestamps outside this range (negative years and 6-digit years). We should allow this full range of inputs to Spark, with both CPU and GPU support.
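For context, Python's `datetime` itself cannot represent those out-of-range values, which is what makes generating such inputs from the Python integration tests awkward. A minimal sketch of the boundaries (plain CPython, no Spark involved):

```python
from datetime import datetime, MINYEAR, MAXYEAR

# Python's datetime is hard-limited to years 1..9999.
print(MINYEAR, MAXYEAR)   # 1 9999
print(datetime.min)       # 0001-01-01 00:00:00
print(datetime.max)       # 9999-12-31 23:59:59.999999

# Negative and 6-digit years, which Spark can represent, are rejected outright.
for year in (-1, 0, 123456):
    try:
        datetime(year, 1, 1)
    except ValueError as err:
        print(f"year {year}: {err}")  # e.g. "year -1 is out of range"
```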

@NVnavkumar added the feature request (New feature or request), ? - Needs Triage (Need team to review and classify), and test (Only impacts tests) labels on Dec 13, 2023
@mattahrens removed the feature request (New feature or request) and ? - Needs Triage (Need team to review and classify) labels on Dec 19, 2023
@NVnavkumar (Collaborator, Author)

Another aspect to consider: when you pass a Python datetime object with timezone information, it is converted to UTC before being sent to Spark. This can produce a `date value out of range` Python exception.

However, this also means that the effective range for testing timestamps with a positive offset from UTC is restricted. Datetime values can only start at 0001-01-01 00:00:00.000000 in UTC, so a time of 0001-01-01 00:00:00.000000 in a local timezone ahead of UTC cannot be sent to Spark from Python: the conversion to UTC would yield a year-0 timestamp, which is out of range. That value is still valid for ANSI purposes (the valid range being 0001-01-01 00:00:00.000000 to 9999-12-31 23:59:59.999999), and Spark accepts such values.
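A quick way to see the failure without Spark in the loop (a sketch; the UTC+8 offset is an arbitrary example, not something specific to this issue): converting the earliest representable local time to UTC overflows `datetime`'s range.

```python
from datetime import datetime, timedelta, timezone

# Midnight on 0001-01-01 in a timezone 8 hours ahead of UTC.
local = datetime(1, 1, 1, 0, 0, 0, tzinfo=timezone(timedelta(hours=8)))

# Normalizing to UTC subtracts the offset, landing before year 1.
try:
    local.astimezone(timezone.utc)
except OverflowError as err:
    print(err)  # date value out of range
```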

@NVnavkumar self-assigned this on Jan 5, 2024
@NVnavkumar (Collaborator, Author)

I think the best course of action here is to write tests in Scala that cover multiple non-UTC timezones and dates in the invalid ranges (non-positive years and years greater than 9999), rather than trying to do this in Python, because it's difficult to change how Python handles these values in the PythonRunner on the executor.
