Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

aatams acoustic qc time zone handling tos7 #744

Closed
jonescc opened this issue May 27, 2019 · 12 comments
Closed

aatams acoustic qc time zone handling tos7 #744

jonescc opened this issue May 27, 2019 · 12 comments

Comments

@jonescc
Copy link
Contributor

jonescc commented May 27, 2019

Running aatams_acoustic_qc on 9-nec-hob gives different results to 6-nec-hob for atf_acoustic_qc_detections_map:

--- /home/craigj/git-repos/aodn/utilities/pipeline/talend_test/harvest_results/6-nec-hob.emii.org.au/aatams_acoustic_qc/action_0_ADD /aatams_acoustic_qc/atf_acoustic_qc_detections_map.csv
+++ /home/craigj/git-repos/aodn/utilities/pipeline/talend_test/harvest_results/9-nec-hob.emii.org.au/aatams_acoustic_qc/action_0_ADD /aatams_acoustic_qc/atf_acoustic_qc_detections_map.csv
@@ -1,2 +1,2 @@
 id,file_id,url,transmitter_id,tag_id,release_id,tag_project_name,installation_name,station_name,no_detections,Detection_QC,common_name,sex,first_detection_date,last_detection_date,geom
-1,1,IMOS/AATAMS/acoustic_tagging/A69-1105-102_5354094_7579005.csv,A69-1105-102,5354094,7579005,MQ Bronte-Coogee Aquatic reserve (BCAR),BCAR,BCAR 8,2,2,Eastern Blue Groper,NA,2009-08-26 14:00:00+00,2009-09-03 14:00:00+00,0101000020E61000007A8D5DA27AE86240F37684D382F740C0
+1,1,IMOS/AATAMS/acoustic_tagging/A69-1105-102_5354094_7579005.csv,A69-1105-102,5354094,7579005,MQ Bronte-Coogee Aquatic reserve (BCAR),BCAR,BCAR 8,2,2,Eastern Blue Groper,NA,2009-08-25 14:00:00+00,2009-09-02 14:00:00+00,0101000020E61000007A8D5DA27AE86240F37684D382F740C0
@lbesnard
Copy link
Contributor

@jonescc I don't know if it's a time zone handling issue. The difference is a full 24 hours. Also, the definition of both tables and views seems to be good to me in regards to time handling

@jonescc
Copy link
Contributor Author

jonescc commented May 27, 2019

I suspect its related to this:

date(min(detection_timestamp)) AS first_detection_date,
date(max(detection_timestamp)) AS last_detection_date

"PostgreSQL assumes your local time zone for any type containing only date or time. All timezone-aware dates and times are stored internally in UTC . They are converted to local time in the zone specified by the timezone configuration parameter before being displayed to the client." - https://www.postgresql.org/docs/9.1/datatype-datetime.html

i.e. date's don't have an associated time zone in postgres (they are relative to server time). This is then being passed through a java process which is time zone sensitive. I suspect the jdbc driver behaviour has changed in the upgrade. For Bene she got a better result (there was actually a problem in production).

First thing to check is whether the processing is actually correct on 6-nec-hob/prod or 9-nec-hob.

Either way, processing like this (dates/timestamps relative to server time) are dangerous so I'd look at another way of truncating the timestamp to just a date e.g. date_trunc (they're actually being stored as timestamps with time zones)

@lbesnard
Copy link
Contributor

@jonescc , xavier initially wrote this harvester. I'm quite confused about all these date issues:

atf_acoustic_qc_detections_map table

The atf_acoustic_qc_detections_map table definition is
first_detection_date timestamp with time zone
which seems right.

The query to populate it is:

date(min(detection_timestamp)) AS first_detection_date

As an example (ran on RC)

select first_detection_date from aatams_acoustic_qc.atf_acoustic_qc_detections_map limit 1;
┌────────────────────────┐
│  first_detection_date  │
├────────────────────────┤
│ 2009-08-29 14:00:00+00 │
└────────────────────────┘
(1 row)

In that case, it's not exactly what we would call a date since there is an hour in it. I'm not sure if it' meant to be this way when Xavier wrote this.

However if we run the following query in postgres, we actually get a proper date back:

SELECT date('2008-08-18 03:16:03+00');
┌────────────┐
│    date    │
├────────────┤
│ 2008-08-18 │
└────────────┘

So first of all I'm already quite surprised of the difference of behaviour between the same query giving different results ran within talend and psql directly.

If we replace
date(min(detection_timestamp)) AS first_detection_date

with

date_trunc('day', TIMESTAMP min(detection_timestamp)) AS first_detection_date

I'm not sure this is the intendent result.

since for example:

SELECT date_trunc('day', TIMESTAMP '2008-08-18 03:16:03+00');
┌─────────────────────┐
│     date_trunc      │
├─────────────────────┤
│ 2008-08-18 00:00:00 │
└─────────────────────┘
(1 row)

@jonescc
Copy link
Contributor Author

jonescc commented May 28, 2019

Yes, I guess I'm assuming that the first date is the date of the first detection relative to the UTC time zone. What's currently happening in RC and 6-nec-hob is that the first date is the date of the first detection relative to the AEST time zone? Is this wrong? I thought all our dates/timestamps etc were meant to be UTC. Either way - yes we need to be clear on what the correct result is and ensure that's what's calculated.

@jonescc
Copy link
Contributor Author

jonescc commented Jun 5, 2019

So looking at this further on 6-nec-hob:

In the file (A69-1105-102_5354094_7579005.txt):

transmitter_id installation_name station_name receiver_name detection_timestamp longitude latitude sensor_value sensor_unit FDA_QC Velocity_QC Distance_QC DetectionDistribution_QC DistanceRelease_QC ReleaseDate_QC ReleaseLocation_QC Detection_QC
A69-1105-102 BCAR BCAR 8 VR2W-105861 2009-08-27 07:49:34 151.26497 -33.93368 7.91 m 2 1 1 1 1 1 1 2
A69-1105-102 BCAR BCAR 8 VR2W-105861 2009-09-04 02:40:56 151.26497 -33.93368 5.28 m 2 1 1 1 1 1 1 2

In qc_detections_data:

transmitter_id installation_name station_name receiver_name detection_timestamp longitude latitude sensor_value sensor_unit FDA_QC Velocity_QC Distance_QC DetectionDistribution_QC DistanceRelease_QC ReleaseDate_QC ReleaseLocation_QC Detection_QC geom
A69-1105-102 BCAR BCAR 8 VR2W-105861 2009-08-26 21:49:34+00 151.26497 -33.93368 7.91 m 2 1 1 1 1 1 1 2 0101000020E61000007A8D5DA27AE86240F37684D382F740C0
A69-1105-102 BCAR BCAR 8 VR2W-105861 2009-09-03 16:40:56+00 151.26497 -33.93368 5.28 m 2 1 1 1 1 1 1 2 0101000020E61000007A8D5DA27AE86240F37684D382F740C0

So the dates in the csv file have been assumed to be in the local time zone (i.e. AEST)

And the result when creating the view (min/max dates from this) on 6-nec-hob is:

transmitter_id installation_name station_name common_name first_detection_date last_detection_date
A69-1105-102 BCAR BCAR 8 Eastern Blue Groper 2009-08-26 14:00:00+00 2009-09-03 14:00:00+00

On 9-nec-hob it is:

transmitter_id installation_name station_name common_name first_detection_date last_detection_date
A69-1105-102 BCAR BCAR 8 Eastern Blue Groper 2009-08-25 14:00:00+00 2009-09-02 14:00:00+00

So looks to me like there are some time zone issues to sort out for TOS5/7

@lbesnard
Copy link
Contributor

lbesnard commented Jun 5, 2019

@jonescc I'll definitely need your help.
I don't understand what talend is doing with the times while harvesting text files. It just doesn't make any sense to me.

This harvester harvests 2 files. A static metadata file, as well as a detection file. I just can't get the right time values output via talend.

@jonescc
Copy link
Contributor Author

jonescc commented Jun 5, 2019

No worries. Looks like its coming in as a string and Xavier was using TalendDate to parse as a date.
Probably just need to append the timezone to the string and specify to parse using the supplied time zone. e.g.

row4.embargo_date.equals("NA")?null:TalendDate.parseDate("yyyy-MM-ddZ",row4.embargo_date+"+00:00")

Means any use of TalendDate.parseDate without doing this is probably suspect.

@lbesnard
Copy link
Contributor

lbesnard commented Jun 6, 2019

should be fixed by #757

@jonescc
Copy link
Contributor Author

jonescc commented Jun 6, 2019

My bad, TalendDate.parseDate("yyyy-MM-ddZ",row4.embargo_date+"+00:00") doesn't work as using Z only allows +0000. Needs to be X instead of Z or remove the ':'. See https://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html

@ggalibert
Copy link
Contributor

Means any use of TalendDate.parseDate without doing this is probably suspect.

Can we please list all the harvesters that might be impacted?

@lbesnard
Copy link
Contributor

lbesnard commented Jun 6, 2019

Can we please list all the harvesters that might be impacted?

done here
#758

@jonescc
Copy link
Contributor Author

jonescc commented Jun 6, 2019

Resolved by #757

@jonescc jonescc closed this as completed Jun 6, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants