TRT-1338: WIP: Potential replacement for the remaining 35 minute matview: Test analysis by job #2074

Open

dgoodwin wants to merge 15 commits into master
Conversation

dgoodwin (Contributor) commented Oct 31, 2024

In an attempt to ultimately allow the test details page to show more history than 2 weeks, I went after the last slow matview. The theory was that we were wasting time recalculating past days that no longer change. I wanted to replace it with a daily summary, calculated by BigQuery and stored in a permanent PostgreSQL table.

That is implemented here; the problem is that the insert for a single day takes 25 minutes (about 1.5 million rows per day), versus 35 minutes for the matview covering the prior 14 days. Inserting is very slow.

On a day-by-day basis we could probably live with that; there would just be a many-hour initial load (after which we could immediately go back as far as we want). We should then be able to chart much further back.

However, it is slow even one day at a time.

Alternatively, with this query implemented, we could just live-query BigQuery for that API and begin mixing BigQuery into Sippy classic. We'd just need to filter down to the jobs Sippy knows about. (A variant could actually be helpful there.)
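
For reference, a sketch of the kind of daily rollup described above, in BigQuery SQL. The source table and its columns are assumptions (the PR's actual query isn't shown); only the output columns come from the postgres schema later in this thread:

-- Hypothetical daily summary; `junit.test_results` and its columns are
-- placeholders, not the real dataset.
SELECT
  DATE(modified_time) AS date,
  test_name,
  job_name,
  COUNT(*) AS runs,
  COUNTIF(status = 'pass') AS passes,
  COUNTIF(status = 'flake') AS flakes,
  COUNTIF(status = 'fail') AS failures
FROM `junit.test_results`
WHERE DATE(modified_time) = '2024-10-29'
GROUP BY 1, 2, 3;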

dgoodwin changed the title from "append test analysis by job" to "WIP: Potential replacement for the remaining 35 minute matview: Test analysis by job" on Oct 31, 2024
openshift-ci bot added the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) on Oct 31, 2024
openshift-ci bot (Contributor) commented Oct 31, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dgoodwin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci bot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) on Oct 31, 2024
stbenjam (Member) commented:

Did you try any bigger batch sizes? These rows are small; I bet you could insert 50K at once.

dgoodwin (Contributor, Author) commented Nov 1, 2024

> Did you try any bigger batch sizes? These rows are small; I bet you could insert 50K at once.

I tried 10k and quickly ran into that parameter size problem we always used to see on postgres. I'll test 5k or so.
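
For context (an aside, not from the PR): PostgreSQL's extended query protocol caps a single statement at 65,535 bind parameters. Assuming the nine-column schema shown later in this thread, a multi-row insert can carry at most floor(65535 / 9) = 7,281 rows, which is why 10k-row batches blow up while ~5k fits:

-- Hypothetical multi-row insert shape; nine bind parameters per row,
-- so the batch size must stay at or below floor(65535 / 9) = 7281 rows.
INSERT INTO test_analysis_by_job_by_dates
    (date, test_id, release, job_name, test_name, runs, passes, flakes, failures)
VALUES
    ($1,  $2,  $3,  $4,  $5,  $6,  $7,  $8,  $9),
    ($10, $11, $12, $13, $14, $15, $16, $17, $18);
    -- ...and so on, up to the batch size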

stbenjam (Member) commented Nov 1, 2024

Oh OK, probably not a huge deal; we can just let it run over a weekend for the initial seeding.

dgoodwin (Contributor, Author) commented Nov 1, 2024

Down to 17m per day by wrapping the whole creation in one transaction instead of many smaller ones. This would only run for one fetchdata per day, once we cross the 8am UTC threshold.
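
A minimal sketch of the shape of that change, assuming the table name and schema from later in the thread (the sample values are made up); committing once per day means a single WAL flush instead of one per batch:

BEGIN;

-- One multi-row INSERT per ~5k-row batch, all inside a single transaction.
INSERT INTO test_analysis_by_job_by_dates
    (date, test_id, release, job_name, test_name, runs, passes, flakes, failures)
VALUES
    ('2024-10-29', 42, '4.18', 'periodic-ci-example-job', 'example test name', 10, 9, 1, 0);
-- ...remaining batches for the day...

COMMIT;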

dgoodwin (Contributor, Author) commented Nov 1, 2024

How do we feel about the general approach? I've got it loading up a week's worth of data to prod now; if we like it, I can try to seed 2 months manually, letting it run for approx. 18 hours.

dgoodwin (Contributor, Author) commented Nov 1, 2024

I've loaded up a week of data in the prod db in 2 hours. I then set up testview to do the variant query against this new table, and it still returns immediately. I suspect this approach will let us chart much longer ranges.

dgoodwin changed the title from "WIP: Potential replacement for the remaining 35 minute matview: Test analysis by job" to "TRT-1338: WIP: Potential replacement for the remaining 35 minute matview: Test analysis by job" on Nov 14, 2024
openshift-ci-robot added the jira/valid-reference label (indicates that this PR references a valid Jira ticket of any type) on Nov 14, 2024
openshift-ci-robot commented Nov 14, 2024

@dgoodwin: This pull request references TRT-1338 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.18.0" version, but no target version was set.

In response to this:

(the PR description, quoted above)

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci bot removed the do-not-merge/work-in-progress label on Nov 14, 2024
dgoodwin (Contributor, Author) commented:

Added a couple of new commits that import one day at a time transactionally and autocreate a partition for each day. Next I will test how queries perform with this in place; hopefully we can do our standard 2-week query and allow the user to select longer ranges and wait if desired, as we discussed.

Creation of the parent partitioned table is still manual; I'll have to figure that out before this can go in:

CREATE TABLE test_analysis_by_job_by_dates (
    date timestamp with time zone,
    test_id bigint,
    release text,
    job_name text,
    runs bigint,
    passes bigint,
    flakes bigint,
    failures bigint,
    test_name text
) PARTITION BY RANGE (date);

CREATE UNIQUE INDEX test_release_date
ON test_analysis_by_job_by_dates (date, test_id, release, job_name);
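
The per-day partition DDL visible in the logs below could be generated on the fly. Here is a pure-SQL sketch of the equivalent statement using a DO block (the PR presumably builds this DDL in Go; this block is only an illustration):

DO $$
DECLARE
    day date := '2024-10-29';  -- hypothetical target day
BEGIN
    EXECUTE format(
        'CREATE TABLE IF NOT EXISTS test_analysis_by_job_by_dates_%s
             PARTITION OF test_analysis_by_job_by_dates
             FOR VALUES FROM (%L) TO (%L)',
        to_char(day, 'YYYY_MM_DD'), day, day + 1);
END $$;
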
INFO[2024-11-13T13:54:28.094-04:00] loading variants from bigquery...
INFO[2024-11-13T13:54:35.234-04:00] variants loaded from bigquery in 7.140694503s  jobs=15069
INFO[2024-11-13T13:54:46.678-04:00] job cache created with 7714 entries from database
INFO[2024-11-13T13:54:46.678-04:00] starting 1 loaders...
INFO[2024-11-13T13:54:46.678-04:00] starting loader "prow" with metrics wrapper
INFO[2024-11-13T13:54:46.753-04:00] importing test analysis by job for dates: [2024-10-29 2024-10-30 2024-10-31 2024-11-01 2024-11-02 2024-11-03 2024-11-04 2024-11-05 2024-11-06 2024-11-07 2024-11-08 2024-11-09 2024-11-10 2024-11-11 2024-11-12]
INFO[2024-11-13T13:54:47.398-04:00] job cache created with 7714 entries from database
INFO[2024-11-13T13:54:52.402-04:00] test cache created with 125532 entries from database
INFO[2024-11-13T13:54:52.402-04:00] Loading test analysis by job daily summaries  date=2024-10-29
INFO[2024-11-13T13:54:52.402-04:00] CREATE TABLE IF NOT EXISTS test_analysis_by_job_by_dates_2024_10_29 PARTITION OF test_analysis_by_job_by_dates FOR VALUES FROM ('2024-10-29') TO ('2024-10-30');  date=2024-10-29
INFO[2024-11-13T13:54:52.433-04:00] partition created  date=2024-10-29
INFO[2024-11-13T13:55:15.442-04:00] inserting 1406285 rows  date=2024-10-29
INFO[2024-11-13T14:27:41.439-04:00] insert complete after 32m25.997654534s  date=2024-10-29
INFO[2024-11-13T14:27:41.439-04:00] Loading test analysis by job daily summaries  date=2024-10-30
INFO[2024-11-13T14:27:41.439-04:00] CREATE TABLE IF NOT EXISTS test_analysis_by_job_by_dates_2024_10_30 PARTITION OF test_analysis_by_job_by_dates FOR VALUES FROM ('2024-10-30') TO ('2024-10-31');  date=2024-10-30
INFO[2024-11-13T14:27:41.521-04:00] partition created  date=2024-10-30
INFO[2024-11-13T14:28:06.475-04:00] inserting 1542754 rows  date=2024-10-30
INFO[2024-11-13T14:58:18.866-04:00] insert complete after 30m12.390888535s  date=2024-10-30
INFO[2024-11-13T14:58:18.866-04:00] Loading test analysis by job daily summaries  date=2024-10-31
INFO[2024-11-13T14:58:18.866-04:00] CREATE TABLE IF NOT EXISTS test_analysis_by_job_by_dates_2024_10_31 PARTITION OF test_analysis_by_job_by_dates FOR VALUES FROM ('2024-10-31') TO ('2024-11-01');  date=2024-10-31
INFO[2024-11-13T14:58:18.914-04:00] partition created  date=2024-10-31
INFO[2024-11-13T14:58:39.345-04:00] inserting 1427864 rows  date=2024-10-31
INFO[2024-11-13T15:23:49.438-04:00] insert complete after 25m10.093027879s  date=2024-10-31

dgoodwin (Contributor, Author) commented:

First test: querying Oct 29 - Nov 14 returns in 11s with no caching.
Daily imports take 20-30 minutes; I'm manually triggering them each day. Soon we'll have a good bit more than 2 weeks of data and can see how it would work if someone wanted to extend the date range.

openshift-ci bot (Contributor) commented Nov 18, 2024

@dgoodwin: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name     Commit   Details  Required  Rerun command
ci/prow/lint  12c6d5b  link     true      /test lint
ci/prow/e2e   12c6d5b  link     true      /test e2e

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
