-
-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug: Can't handle "date" format from json schema #77
Comments
hi @balmasi , do you mind sending an example of what the tap is exporting that is causing this to fail? I mean, not just the schema but also the data. Basically a complete example I can pipe into this target. I'll fix that and add it to the tests |
This should be easy to fix if #86 is merged. |
Sure, here's some minimal stream output for you to {"type": "SCHEMA", "stream": "my_stream", "schema": {"properties": {"id": {"type": ["integer"]},"some_date": {"format": "date", "type": ["string", "null"]}}, "type": "object"}, "key_properties": ["id"], "bookmark_properties": ["id"]}
{"type": "RECORD", "stream": "my_stream", "record": {"id": 1, "some_date": null}, "time_extracted": "2022-02-22T18:25:53.171486Z"}
{"type": "RECORD", "stream": "my_stream", "record": {"id": 2, "some_date": "2021-03-24"}, "time_extracted": "2022-02-22T18:25:53.186510Z"}
{"type": "STATE", "value": {"bookmarks": {"my_stream": {"replication_key": "id", "replication_key_value": 2}}}} |
thanks for that @balmasi ! This is almost ready and I'm just doing another round of testing to see if there's something I missed. |
I ran into this issue today when developing a custom tap for quickbase. Plenty of date-only fields there. |
sorry about that. I just merged the fix for this. do you mind checking the newer code and letting me know how it goes for you? thanks! |
@jmriego - Looks like that commit is mapping date fields to BQ timestamp fields. Can you fix it so it maps them to regular date fields so that it better aligns with the source data? |
hi @gnilrets ! do you mind testing with the |
@jmriego . I tried it, and it is loading dates, but it's got some strange behavior. Dates that are missing in the source system are being parsed as the maximum allowable date? Here's the warning I'm getting:
|
my guess is that the tap you are developing outputs an empty string instead of a date in the case of nulls. Copying a message from above, this is what a null value in a date field looks like and can be processed correctly
As you have found out anyway, if a date is not possible to process it defaults to the max value. This should appear very rarely. I've seen it mostly in Postgres that accepts dates way over 9999-12-31 which is the maximum BigQuery can read |
@jmriego - Yeah, that's definitely it. I'm developing a tap and trying to feed it into BigQuery. I was able to correct it in my tap so that dates get output as null and when I do that it works as expected. However, I'm new to using singer taps/targets and I know that I'm not always going to have control over what the taps do, particularly when it comes to these kinds of edge cases. I wouldn't ever expect a blank string to be interpreted as the largest date available in the database (why not the smallest?). Is there some reason you would want to handle a null vs an empty string differently? Or is this just something I should file as an issue to be resolved? |
basically what is happening is that I'm using the standard Python libraries for dates. Those libraries don't accept the same range of dates as some databases. MySQL for example can have things like So we end up with two questions:
In my opinion it's the job of the tap, as I don't think adding new taps to the Singer ecosystem should mean all targets need to account for quirks in the new database/API we are reading from. But this sounds to me like a great discussion to have around Singer standards. Are you in the Meltano slack community? It would be great if we had this discussion either there or in their repo so we can get more input from them as they are working on defining the Singer standard |
I guess I'm just of the opinion that if you're going to fallback to some value when a date string can't be parsed, using NULL would be more appropriate than the largest date a db can support. I'm on Meltano slack and would be happy to continue that conversation there. |
the fix for the date type is already in prod. when a tap sends a date, it will no longer appear as a timestamp. the other point about wrong dates... we could make the parsing for wrong dates configurable. So far I'm going to leave it as defaulting to max date the same way they are doing for other targets in Meltano. please let me know if the date parsing is fixed for you after this. thanks! |
Issue
it seems like the target can't handle a field with the following schema
it does. however, handle the
date-time
format.Here's the trace-back:
Expected Outcome
Well, the writer should just convert that to the DATE type and happily write it to BQ
Workaround
using the
date-time
orstring
types in the schema message for a streamThe text was updated successfully, but these errors were encountered: