[Meta][GCS] - Improvements and addition of new features to the GCS input #41107

Open · 7 of 14 tasks
ShourieG opened this issue Oct 3, 2024 · 9 comments

ShourieG (Contributor) commented Oct 3, 2024

Describe the enhancement: This issue is to track various improvements and addition of new features to the GCS input.

Describe a specific use case for the enhancement or feature: With our GCS input slowly being adopted across various integrations, we need to make it more robust and optimal in terms of performance, failure tracking, and scalability. This meta issue exists to bring in the necessary changes over time.

Improvements:

  • Add metrics to the GCS input. Separate issue here.
  • Segregate batch_size from the worker count (currently the worker count doubles as the batch_size used to distribute jobs evenly; see the sketch after this list).
  • Segregate the cursor save operation from event publishing, and add support for detecting the Elasticsearch acknowledgement signal and using it to update the cursor state.
  • Improve the documentation and explain the impact the polling operation has on scalability.
  • Remove bucket_timeout and pass the parent program context to bucket operations.
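
As a rough illustration of the batch_size/worker-count split, here is a minimal Go sketch; all the names (listPage, process, the parameters) are hypothetical and not the input's actual API:

```go
package sketch

import (
	"context"
	"sync"
)

type object struct{ name string }

// run decouples the listing page size (batchSize) from the number of
// concurrent workers, so tuning one no longer forces the other.
func run(ctx context.Context, batchSize, numWorkers int,
	listPage func(context.Context, int) []object,
	process func(context.Context, object)) {

	jobs := make(chan object)
	var wg sync.WaitGroup
	for i := 0; i < numWorkers; i++ { // worker count is sized independently
		wg.Add(1)
		go func() {
			defer wg.Done()
			for o := range jobs {
				process(ctx, o)
			}
		}()
	}
	go func() {
		defer close(jobs)
		for {
			page := listPage(ctx, batchSize) // batch_size only sizes the listing page
			if len(page) == 0 {
				return
			}
			for _, o := range page {
				select {
				case jobs <- o:
				case <-ctx.Done():
					return
				}
			}
		}
	}()
	wg.Wait()
}
```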

New Features

  • Add support for an SDK-level retry mechanism and make it user configurable.
  • Add support for filtering by prefix and glob expressions (see the sketch after this list).
  • Add support for GCS Pub/Sub, enabling horizontal scalability of the input.
  • Add support for more content types in the GCS input via content decoders:
    • JSON/NDJSON
    • CSV
    • PARQUET
    • TEXT
  • Add support for state tracking via an optional startOffset (user configurable, with certain ordering limitations).
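
For the prefix/glob item, the GCS Go SDK already exposes server-side filters on storage.Query; this is a minimal sketch (the bucket name and prefix are placeholders):

```go
package main

import (
	"context"
	"fmt"
	"log"

	"cloud.google.com/go/storage"
	"google.golang.org/api/iterator"
)

func main() {
	ctx := context.Background()
	client, err := storage.NewClient(ctx)
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// Prefix filters server-side; recent SDK versions also accept a
	// glob pattern via Query.MatchGlob (e.g. "logs/**/*.ndjson").
	it := client.Bucket("my-bucket").Objects(ctx, &storage.Query{
		Prefix: "logs/2024/",
	})
	for {
		attrs, err := it.Next()
		if err == iterator.Done {
			break
		}
		if err != nil {
			log.Fatal(err)
		}
		fmt.Println(attrs.Name)
	}
}
```
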
elasticmachine (Collaborator) commented

Pinging @elastic/security-service-integrations (Team:Security-Service Integrations)

ShourieG added the Epic label on Oct 3, 2024
ShourieG (Contributor, Author) commented Oct 4, 2024

cc: @narph, @andrewkroh

ShourieG changed the title from "[filebeat][Meta] - Improvements and addition of new features to the GCS input" to "[Meta][GCS] - Improvements and addition of new features to the GCS input" on Oct 4, 2024
ShourieG (Contributor, Author) commented Oct 4, 2024

@andrewkroh, @efd6 please feel free to expand this issue by suggesting improvements/additions that you would like to see in the input moving forward.

andrewkroh (Member) commented

> Add support for state tracking via an optional startOffset (user configurable, with certain ordering limitations).

Can you describe this feature in a bit more detail, please? I don't understand what the feature does or what the use case for it is.

andrewkroh (Member) commented

> Add support for an SDK-level retry mechanism and make it user configurable.

By "SDK level" does this mean API calls into GCS? Those should retry, but I don't see a reason to make this user-configurable. What would a user do with these options? Basically I think it should continuously retry failed list_objects calls with some back-off. And any failed get_object calls will naturally be retried on the next loop over the bucket's contents (assuming the input state does not positively indicate that the object was ingested).

andrewkroh (Member) commented

Another item, possibly to be taken up in a separate issue, is that the input should not time out get_object calls where we are downloading and processing the stream of bytes. This operation can be slowed down by a number of factors (e.g. back-pressure), and having an arbitrary maximum operation timeout is not helpful.

ShourieG (Contributor, Author) commented Dec 10, 2024

> Add support for state tracking via an optional startOffset (user configurable, with certain ordering limitations).

@andrewkroh, if you look at this doc, state tracking via a start offset will allow us to list pages of objects in lexicographic order. This will be more efficient and precise for state history if users also keep their bucket objects lexicographically ordered. Users will then have two ways to track state, which simply gives them more options. State switching won't be allowed, so if a user picks one state-tracking option, the other will be disabled. We will have a config option to enable this, with the necessary warnings about compatibility; it will then be up to the users to choose. A sketch follows below.
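
A minimal sketch of what this looks like with the GCS Go SDK's Query.StartOffset, assuming the saved cursor is the last processed object name (the function and parameter names are illustrative):

```go
package sketch

import (
	"context"

	"cloud.google.com/go/storage"
)

// listFrom resumes a listing at the saved offset: GCS returns object
// names in lexicographic order, and only names >= startOffset are listed.
func listFrom(ctx context.Context, client *storage.Client, bucket, startOffset string) *storage.ObjectIterator {
	return client.Bucket(bucket).Objects(ctx, &storage.Query{
		StartOffset: startOffset,
	})
}
```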

ShourieG (Contributor, Author) commented Dec 10, 2024

> Another item, possibly to be taken up in a separate issue, is that the input should not time out get_object calls where we are downloading and processing the stream of bytes. This operation can be slowed down by a number of factors (e.g. back-pressure), and having an arbitrary maximum operation timeout is not helpful.

I agree with this, hence I will be removing the concept of bucket_timeout and letting the program context pass through the job object, which directly solves this. The initial philosophy was to give users more control, but in this scenario that has proven not to be ideal.

Already have a PR up: #41970
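
Conceptually the change looks like this (a sketch, not the PR's actual code): the object reader inherits the input's parent context instead of a context.WithTimeout wrapper, so only input shutdown cancels a slow download:

```go
package sketch

import (
	"context"

	"cloud.google.com/go/storage"
)

// openObject opens the object with the input's parent context. With no
// WithTimeout wrapper, a download slowed by back-pressure is no longer
// aborted by an arbitrary bucket_timeout; only cancelling the parent
// context (i.e. input shutdown) interrupts it.
func openObject(parent context.Context, obj *storage.ObjectHandle) (*storage.Reader, error) {
	return obj.NewReader(parent)
}
```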

ShourieG (Contributor, Author) commented Dec 10, 2024

> Add support for an SDK-level retry mechanism and make it user configurable.
>
> By "SDK level" does this mean API calls into GCS? Those should retry, but I don't see a reason to make this user-configurable. What would a user do with these options? Basically I think it should continuously retry failed list_objects calls with some back-off. And any failed get_object calls will naturally be retried on the next loop over the bucket's contents (assuming the input state does not positively indicate that the object was ingested).

@andrewkroh, this was a direct request from users and product (elastic/integrations#11580): they want configurable retry policies and options. The current retry only covered failed jobs that we capture on our end and store in state. With a new PR, we are now giving users full control of the SDK retry options. I feel this will let users control how the API-level retries behave, rather than relying solely on our default values. A sketch follows below.
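
For reference, the GCS Go SDK exposes these knobs via BucketHandle.Retryer; here is a sketch of wiring user config into them (the config field names in the comments are illustrative, not the input's actual options):

```go
package sketch

import (
	"time"

	"cloud.google.com/go/storage"
	"github.com/googleapis/gax-go/v2"
)

// withUserRetries applies user-configured retry settings to a bucket
// handle. The parameters would come from hypothetical input config such
// as retry.initial_backoff_duration, retry.max_backoff_duration, and
// retry.backoff_multiplier.
func withUserRetries(bkt *storage.BucketHandle, initial, max time.Duration, multiplier float64) *storage.BucketHandle {
	return bkt.Retryer(
		storage.WithBackoff(gax.Backoff{
			Initial:    initial,
			Max:        max,
			Multiplier: multiplier,
		}),
		storage.WithPolicy(storage.RetryAlways), // also retry non-idempotent operations
	)
}
```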
