Upstream contributions from Union.ai #5769
Commits on Sep 30, 2024
-
Overlap create execution blob store reads/writes
This change modifies launch paths stemming from `launchExecutionAndPrepareModel` to overlap blob store write and read calls, which dominate end-to-end latency (as seen in the traces below). Signed-off-by: Andrew Dye <[email protected]>
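A minimal sketch of the overlapping idea, not the code from this commit: independent blob store writes are started in separate goroutines so that total latency approaches that of the slowest call rather than the sum of all calls. The `blobStore` type, `writeAll`, `demo`, and the object keys are hypothetical stand-ins for illustration.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// blobStore is a hypothetical in-memory stand-in for a remote blob store.
type blobStore struct {
	mu      sync.Mutex
	objects map[string][]byte
	latency time.Duration // simulated round-trip time per call
}

// Write stores data under key after the simulated latency elapses.
func (s *blobStore) Write(ctx context.Context, key string, data []byte) error {
	select {
	case <-time.After(s.latency):
	case <-ctx.Done():
		return ctx.Err()
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	s.objects[key] = data
	return nil
}

// writeAll starts every write in its own goroutine, waits for all of them,
// and returns one of the errors if any write failed.
func writeAll(ctx context.Context, s *blobStore, blobs map[string][]byte) error {
	var wg sync.WaitGroup
	errs := make(chan error, len(blobs)) // buffered so no goroutine blocks
	for key, data := range blobs {
		wg.Add(1)
		go func(key string, data []byte) {
			defer wg.Done()
			if err := s.Write(ctx, key, data); err != nil {
				errs <- err
			}
		}(key, data)
	}
	wg.Wait()
	close(errs)
	return <-errs // nil when no write failed
}

// demo writes two blobs concurrently and reports how many were stored.
func demo() (int, error) {
	s := &blobStore{objects: map[string][]byte{}, latency: 20 * time.Millisecond}
	err := writeAll(context.Background(), s, map[string][]byte{
		"inputs.pb":      []byte("inputs"),
		"user-inputs.pb": []byte("user inputs"),
	})
	return len(s.objects), err
}

func main() {
	start := time.Now()
	n, err := demo()
	fmt.Println(n, err, time.Since(start))
}
```

With sequential writes the elapsed time would be roughly the sum of the per-call latencies; here the two simulated calls run side by side.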
Commit 8437865
-
Overlap FutureFileReader blob store writes/reads
This change updates `FutureFileReader.Cache` and `FutureFileReader.RetrieveCache` to use overlapped writes and reads, respectively, to reduce end-to-end latency. The read path is a common operation on each iteration of the propeller `Handle` loop for dynamic nodes. Signed-off-by: Andrew Dye <[email protected]>
Commit 9a874fb
-
I didn't chase down why the assumptions changed here or why these tests broke, but this fixes them with more explicit checks. Signed-off-by: Andrew Dye <[email protected]>
Commit ea56a96
-
Overlap fetching input and output data
This change updates `GetExecutionData`, `GetNodeExecutionData`, and `GetTaskExecutionData` to use overlapped reads when fetching input and output data. Signed-off-by: Andrew Dye <[email protected]>
Commit 3e1f249
-
Add configuration for launchplan cache resync duration
Currently, the launchplan cache resync duration uses the DownstreamEval duration configuration, which is also used for the sync period on the k8s client. This means that if we want a more aggressive launchplan cache resync, we also incur the overhead of syncing all k8s resources (e.g., Pods from `PodPlugin`). Adding a separate configuration value lets us update the two independently. Signed-off-by: Andrew Dye <[email protected]>
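As a rough illustration of decoupling the two periods, the sketch below keeps a dedicated resync field next to the shared one. The field name `LaunchPlanCacheResync`, the method `launchPlanResync`, the constant, and the fall-back-when-unset behavior are all assumptions made for this example; the commit may name and default the setting differently.

```go
package main

import (
	"fmt"
	"time"
)

// defaultDownstreamEval is an illustrative value, not a documented default.
const defaultDownstreamEval = 30 * time.Second

// Config is a hypothetical slice of the propeller configuration.
type Config struct {
	// DownstreamEval also drives the k8s client sync period.
	DownstreamEval time.Duration
	// LaunchPlanCacheResync (hypothetical name) controls only the
	// launchplan auto refresh cache.
	LaunchPlanCacheResync time.Duration
}

// launchPlanResync returns the dedicated resync period, falling back to
// DownstreamEval when the dedicated value is unset (an assumed behavior).
func (c Config) launchPlanResync() time.Duration {
	if c.LaunchPlanCacheResync > 0 {
		return c.LaunchPlanCacheResync
	}
	return c.DownstreamEval
}

func main() {
	c := Config{DownstreamEval: defaultDownstreamEval}
	fmt.Println(c.launchPlanResync()) // falls back to the shared value

	c.LaunchPlanCacheResync = 5 * time.Second
	fmt.Println(c.launchPlanResync()) // the k8s sync period is unaffected
}
```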
Commit 6f2b2fe
-
Enqueue owner on launchplan terminal state
This PR enqueues the owner workflow for evaluation when the launchplan auto refresh cache detects a launchplan in a terminal state. Signed-off-by: Andrew Dye <[email protected]>
Commit 85ac321
-
Register a few metric callbacks with the client-go metrics interface so that we can monitor request latencies and rate limiting of kubeclient.
```
❯ curl http://localhost:10254/metrics | rg k8s_client
k8s_client_rate_limiter_latency_bucket{verb="GET",le="0.005"} 84
k8s_client_rate_limiter_latency_bucket{verb="GET",le="0.01"} 87
k8s_client_rate_limiter_latency_bucket{verb="GET",le="0.025"} 89
k8s_client_rate_limiter_latency_bucket{verb="GET",le="0.05"} 99
k8s_client_rate_limiter_latency_bucket{verb="GET",le="0.1"} 114
k8s_client_rate_limiter_latency_bucket{verb="GET",le="0.25"} 117
k8s_client_rate_limiter_latency_bucket{verb="GET",le="0.5"} 117
k8s_client_rate_limiter_latency_bucket{verb="GET",le="1"} 117
k8s_client_rate_limiter_latency_bucket{verb="GET",le="2.5"} 117
k8s_client_rate_limiter_latency_bucket{verb="GET",le="5"} 117
k8s_client_rate_limiter_latency_bucket{verb="GET",le="10"} 117
k8s_client_rate_limiter_latency_bucket{verb="GET",le="+Inf"} 117
k8s_client_rate_limiter_latency_sum{verb="GET"} 1.9358371670000003
k8s_client_rate_limiter_latency_count{verb="GET"} 117
k8s_client_rate_limiter_latency_bucket{verb="POST",le="0.005"} 6
k8s_client_rate_limiter_latency_bucket{verb="POST",le="0.01"} 6
k8s_client_rate_limiter_latency_bucket{verb="POST",le="0.025"} 6
k8s_client_rate_limiter_latency_bucket{verb="POST",le="0.05"} 6
k8s_client_rate_limiter_latency_bucket{verb="POST",le="0.1"} 6
k8s_client_rate_limiter_latency_bucket{verb="POST",le="0.25"} 6
k8s_client_rate_limiter_latency_bucket{verb="POST",le="0.5"} 6
k8s_client_rate_limiter_latency_bucket{verb="POST",le="1"} 6
k8s_client_rate_limiter_latency_bucket{verb="POST",le="2.5"} 6
k8s_client_rate_limiter_latency_bucket{verb="POST",le="5"} 6
k8s_client_rate_limiter_latency_bucket{verb="POST",le="10"} 6
k8s_client_rate_limiter_latency_bucket{verb="POST",le="+Inf"} 6
k8s_client_rate_limiter_latency_sum{verb="POST"} 1.0542e-05
k8s_client_rate_limiter_latency_count{verb="POST"} 6
k8s_client_rate_limiter_latency_bucket{verb="PUT",le="0.005"} 1
k8s_client_rate_limiter_latency_bucket{verb="PUT",le="0.01"} 1
k8s_client_rate_limiter_latency_bucket{verb="PUT",le="0.025"} 1
k8s_client_rate_limiter_latency_bucket{verb="PUT",le="0.05"} 1
k8s_client_rate_limiter_latency_bucket{verb="PUT",le="0.1"} 1
k8s_client_rate_limiter_latency_bucket{verb="PUT",le="0.25"} 1
k8s_client_rate_limiter_latency_bucket{verb="PUT",le="0.5"} 1
k8s_client_rate_limiter_latency_bucket{verb="PUT",le="1"} 1
k8s_client_rate_limiter_latency_bucket{verb="PUT",le="2.5"} 1
k8s_client_rate_limiter_latency_bucket{verb="PUT",le="5"} 1
k8s_client_rate_limiter_latency_bucket{verb="PUT",le="10"} 1
k8s_client_rate_limiter_latency_bucket{verb="PUT",le="+Inf"} 1
k8s_client_rate_limiter_latency_sum{verb="PUT"} 5e-07
k8s_client_rate_limiter_latency_count{verb="PUT"} 1
k8s_client_request_latency_bucket{verb="GET",le="0.005"} 84
k8s_client_request_latency_bucket{verb="GET",le="0.01"} 86
k8s_client_request_latency_bucket{verb="GET",le="0.025"} 89
k8s_client_request_latency_bucket{verb="GET",le="0.05"} 99
k8s_client_request_latency_bucket{verb="GET",le="0.1"} 112
k8s_client_request_latency_bucket{verb="GET",le="0.25"} 117
k8s_client_request_latency_bucket{verb="GET",le="0.5"} 117
k8s_client_request_latency_bucket{verb="GET",le="1"} 117
k8s_client_request_latency_bucket{verb="GET",le="2.5"} 117
k8s_client_request_latency_bucket{verb="GET",le="5"} 117
k8s_client_request_latency_bucket{verb="GET",le="10"} 117
k8s_client_request_latency_bucket{verb="GET",le="+Inf"} 117
k8s_client_request_latency_sum{verb="GET"} 2.1254330859999997
k8s_client_request_latency_count{verb="GET"} 117
k8s_client_request_latency_bucket{verb="POST",le="0.005"} 5
k8s_client_request_latency_bucket{verb="POST",le="0.01"} 5
k8s_client_request_latency_bucket{verb="POST",le="0.025"} 5
k8s_client_request_latency_bucket{verb="POST",le="0.05"} 6
k8s_client_request_latency_bucket{verb="POST",le="0.1"} 6
k8s_client_request_latency_bucket{verb="POST",le="0.25"} 6
k8s_client_request_latency_bucket{verb="POST",le="0.5"} 6
k8s_client_request_latency_bucket{verb="POST",le="1"} 6
k8s_client_request_latency_bucket{verb="POST",le="2.5"} 6
k8s_client_request_latency_bucket{verb="POST",le="5"} 6
k8s_client_request_latency_bucket{verb="POST",le="10"} 6
k8s_client_request_latency_bucket{verb="POST",le="+Inf"} 6
k8s_client_request_latency_sum{verb="POST"} 0.048558582
k8s_client_request_latency_count{verb="POST"} 6
k8s_client_request_latency_bucket{verb="PUT",le="0.005"} 1
k8s_client_request_latency_bucket{verb="PUT",le="0.01"} 1
k8s_client_request_latency_bucket{verb="PUT",le="0.025"} 1
k8s_client_request_latency_bucket{verb="PUT",le="0.05"} 1
k8s_client_request_latency_bucket{verb="PUT",le="0.1"} 1
k8s_client_request_latency_bucket{verb="PUT",le="0.25"} 1
k8s_client_request_latency_bucket{verb="PUT",le="0.5"} 1
k8s_client_request_latency_bucket{verb="PUT",le="1"} 1
k8s_client_request_latency_bucket{verb="PUT",le="2.5"} 1
k8s_client_request_latency_bucket{verb="PUT",le="5"} 1
k8s_client_request_latency_bucket{verb="PUT",le="10"} 1
k8s_client_request_latency_bucket{verb="PUT",le="+Inf"} 1
k8s_client_request_latency_sum{verb="PUT"} 0.002381375
k8s_client_request_latency_count{verb="PUT"} 1
k8s_client_request_total{code="200",method="GET"} 120
k8s_client_request_total{code="200",method="PUT"} 1
k8s_client_request_total{code="409",method="POST"} 6
```
Signed-off-by: Andrew Dye <[email protected]>
Commit a72c538
-
Add an abstraction for passing custom-defined buckets to histogram vectors. Signed-off-by: Andrew Dye <[email protected]>
Commit 39b249f
-
Add org to CreateUploadLocation
Signed-off-by: Andrew Dye <[email protected]>
Commit 4752c1e
-
Add config for grpc MaxMessageSizeBytes
We need to make the grpc max recv message size in propeller's admin client configurable to match the server-side configuration we support in admin. Signed-off-by: Andrew Dye <[email protected]>
Commit 4996328
-
Move storage cache settings to correct location
Signed-off-by: Andrew Dye <[email protected]>
Commit 0b935a6
-
Add lock to memstore to make it thread-safe
Signed-off-by: Andrew Dye <[email protected]>
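A sketch of the general pattern, assuming a map-backed in-memory store: a `sync.RWMutex` guards the map so that concurrent readers and writers do not race. The `memStore` type and its methods here are hypothetical and are not the actual memstore implementation touched by this commit.

```go
package main

import (
	"fmt"
	"sync"
)

// memStore is a hypothetical map-backed store guarded by a read/write lock.
type memStore struct {
	mu   sync.RWMutex
	data map[string][]byte
}

func newMemStore() *memStore {
	return &memStore{data: map[string][]byte{}}
}

// Write takes the exclusive lock and stores a copy of raw under key.
func (s *memStore) Write(key string, raw []byte) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.data[key] = append([]byte(nil), raw...)
}

// Read takes the shared lock, so many readers can proceed at once.
func (s *memStore) Read(key string) ([]byte, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	raw, ok := s.data[key]
	return raw, ok
}

// Len reports the number of stored keys.
func (s *memStore) Len() int {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return len(s.data)
}

// concurrentWrites writes n distinct keys from n goroutines and returns
// how many keys ended up in the store.
func concurrentWrites(n int) int {
	s := newMemStore()
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			s.Write(fmt.Sprintf("key-%d", i), []byte("value"))
		}(i)
	}
	wg.Wait()
	return s.Len()
}

func main() {
	fmt.Println(concurrentWrites(100))
}
```

Without the lock, concurrent map writes like these are a data race in Go and can crash the process; running the unlocked variant under `go run -race` makes the problem visible.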
Commit 69f47dd
-
Add read replica host config and connection
- Add a new field to the postgres db config struct, `readReplicaHost`.
- Add a new endpoint in the `database` package to enable establishing a connection with a db without creating it if it doesn't exist
Signed-off-by: Andrew Dye <[email protected]>
Commit 071c137
-
Fix type assertion when an event is missed while connection to apiserver was severed
Signed-off-by: Andrew Dye <[email protected]>
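The commit message does not say which assertion was fixed. One common cause of this class of bug, offered here only as an assumption, is that client-go informers deliver a `cache.DeletedFinalStateUnknown` tombstone to the delete handler when a delete event was missed, so a direct assertion to the resource type fails. The sketch below uses local stand-in types (`pod`, `deletedFinalStateUnknown`) that mirror the shape of the client-go tombstone so it runs without external dependencies.

```go
package main

import "fmt"

// pod is a stand-in for a Kubernetes resource type.
type pod struct{ Name string }

// deletedFinalStateUnknown mirrors the fields of client-go's
// cache.DeletedFinalStateUnknown (Key and Obj); it is a local stand-in.
type deletedFinalStateUnknown struct {
	Key string
	Obj interface{}
}

// podFromDeleteEvent unwraps the object passed to a delete handler.
// It accepts either the resource itself or a tombstone wrapping it, and
// reports false instead of panicking on anything else.
func podFromDeleteEvent(obj interface{}) (*pod, bool) {
	switch t := obj.(type) {
	case *pod:
		return t, true
	case deletedFinalStateUnknown:
		p, ok := t.Obj.(*pod)
		return p, ok
	default:
		return nil, false
	}
}

func main() {
	direct, _ := podFromDeleteEvent(&pod{Name: "a"})
	tombstone, _ := podFromDeleteEvent(deletedFinalStateUnknown{
		Key: "default/b",
		Obj: &pod{Name: "b"},
	})
	fmt.Println(direct.Name, tombstone.Name)
}
```

Using the two-value form of the type assertion everywhere keeps a missed event from turning into a panic in the handler.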
Commit 05dfd9d
-
Log and monitor failures to validate access tokens
Signed-off-by: Andrew Dye <[email protected]>
Commit 7ce2ca8
-
Dask dashboard should have a separate log config
Signed-off-by: Andrew Dye <[email protected]>
Commit c68d2db
-
adjust Dask LogName to (Dask Runner Logs)
Signed-off-by: Andrew Dye <[email protected]>
Commit d00a159
-
I was trying to use `setup_local_dev.sh`, and it wasn't working out of the box. It looks like it expects a `k3d-` prefix for the kubecontext. Tested by running `setup_local_dev.sh`. Signed-off-by: Andrew Dye <[email protected]>
Commit 6db5458
-
Override ArrayNode log links with map plugin
This PR adds a configuration option to override ArrayNode log links with those defined in the map plugin. The map plugin contains its own configuration for log links, which may differ from those defined on the PodPlugin. Because ArrayNode executes subNodes as regular tasks (i.e., using the PodPlugin), it uses the default PodPlugin log templates. Signed-off-by: Andrew Dye <[email protected]>
Commit 0221c54
-
Add histogram stopwatch to stow storage
This change
* Adds a new `HistogramStopWatch` to promutils. This [allows for aggregating latencies](https://prometheus.io/docs/practices/histograms/#quantiles) across pods and computing quantiles at query time
* Adds `HistogramStopWatch` latency metrics for stow so that we can reason about storage latencies in aggregate. Existing latency metrics remain.
- [x] Added unittests
Signed-off-by: Andrew Dye <[email protected]>
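To illustrate why histograms aggregate across pods, the sketch below sums cumulative bucket counts from several pods and then estimates a quantile by linear interpolation within the matching bucket, which is the general approach Prometheus documents for `histogram_quantile`. The `bucket` type, `mergeBuckets`, and `quantile` are hypothetical helpers written for this example; they are not part of promutils, and the sketch assumes every pod reports identical bucket bounds ending in `+Inf`.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// posInf is the upper bound of the final catch-all bucket.
var posInf = math.Inf(1)

// bucket holds a cumulative count of observations <= upperBound.
type bucket struct {
	upperBound float64
	count      float64
}

// mergeBuckets sums cumulative counts bucket by bucket. It assumes at least
// one pod and identical bucket bounds across pods.
func mergeBuckets(pods ...[]bucket) []bucket {
	merged := append([]bucket(nil), pods[0]...)
	for _, p := range pods[1:] {
		for i := range merged {
			merged[i].count += p[i].count
		}
	}
	return merged
}

// quantile estimates the q-quantile (0 < q <= 1) from cumulative buckets by
// interpolating linearly inside the first bucket that reaches the rank.
func quantile(q float64, buckets []bucket) float64 {
	if len(buckets) < 2 {
		return math.NaN()
	}
	total := buckets[len(buckets)-1].count
	if total == 0 {
		return math.NaN()
	}
	rank := q * total
	last := len(buckets) - 1
	// Search only the finite buckets; a result of `last` means +Inf.
	i := sort.Search(last, func(i int) bool { return buckets[i].count >= rank })
	if i == last {
		// The rank falls in the +Inf bucket: report the largest finite bound.
		return buckets[last-1].upperBound
	}
	lower, prev := 0.0, 0.0
	if i > 0 {
		lower, prev = buckets[i-1].upperBound, buckets[i-1].count
	}
	return lower + (buckets[i].upperBound-lower)*(rank-prev)/(buckets[i].count-prev)
}

func main() {
	podA := []bucket{{1, 1}, {2, 2}, {posInf, 2}}
	podB := []bucket{{1, 1}, {2, 2}, {posInf, 2}}
	merged := mergeBuckets(podA, podB)
	fmt.Println(quantile(0.5, merged), quantile(0.75, merged))
}
```

Precomputed per-pod quantiles (as in a summary) cannot be combined this way, which is the motivation given in the linked Prometheus documentation.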
Commit c763345
-
Fix metrics scale division in timer
* Fix metrics scale division in timer Signed-off-by: Iaroslav Ciupin <[email protected]> Signed-off-by: Andrew Dye <[email protected]>
Commit b699642
-
CreateDownloadLink: Head before signing
Signed-off-by: Andrew Dye <[email protected]>
Commit 04c4a04
-
Unexpectedly deleted pod metrics
* Count when we see unexpectedly terminated pods Signed-off-by: Andrew Dye <[email protected]>
Commit 765ce2e
-
Don't send inputURI for start-node
* Send an empty `inputUri` for `start-node` in the node execution event to flyteadmin, so that GetNodeExecutionData no longer attempts to download a nonexistent inputUri as it did before this change.
* Add a DB migration to clear `input_uri` in the existing `node_executions` table for start nodes.
Signed-off-by: Andrew Dye <[email protected]>
Commit 1846764
-
Fix cluster pool assignment validation
Signed-off-by: Andrew Dye <[email protected]>
Commit 895344d
Commits on Oct 22, 2024
-
Merge remote-tracking branch 'origin' into union/upstream
Signed-off-by: Eduardo Apolinario <[email protected]>
Commit 12de9a6