[Enhancement] Scaling options for each integration/data stream #842
Comments
It seems like the scaling model is generally tied to the inputs (Beats) in use by a given integration, and not so much the integration itself. For instance, the S3 input is mentioned as not being horizontally scalable when the I'm sure there are other cases of this with "pull" based inputs like
While I think this makes sense as a path forward, getting broad adoption across integrations in order to source data like this is usually a long-lived challenge. It would require each integration maintainer to provide this information, produce a new version of their integration including an updated This seems like a lot of churn to get something like this done, so I wonder if we should consider a less involved approach, such as adding specific detection in the Fleet codebase where we detect these scalability concerns based on a "hardcoded" mapping of metadata around specific input types or variables. cc @nimarezainia I am going to assign this to you as it's in "Needs PM Prio" as well. |
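The "hardcoded mapping" idea above could look roughly like the following. This is only a sketch to make the discussion concrete: the table contents, the model labels, and the variable names used for detection are assumptions for illustration, not verified classifications and not existing Fleet code.

```python
from typing import Dict, Mapping, Optional, Tuple

# Hypothetical table: input type -> default scaling model.
# The labels are placeholders chosen for this sketch.
SCALING_BY_INPUT: Dict[str, str] = {
    "aws-s3": "depends-on-config",
    "gcp-pubsub": "horizontal",
}

# Hypothetical refinements for inputs whose scaling model depends on
# which variables the user has set (assumed variable names).
SCALING_BY_VARIABLE: Dict[Tuple[str, str], str] = {
    ("aws-s3", "queue_url"): "horizontal",
    ("aws-s3", "bucket_arn"): "single-instance",
}


def detect_scaling_model(
    input_type: str, variables: Mapping[str, object]
) -> Optional[str]:
    """Return the scaling model for an input, or None if it is not in the table."""
    # A variable-specific entry wins over the per-input default.
    for (known_input, var_name), model in SCALING_BY_VARIABLE.items():
        if known_input == input_type and variables.get(var_name):
            return model
    return SCALING_BY_INPUT.get(input_type)
```

The upside of this approach is that no integration has to be re-released; the downside is that the table lives away from the inputs it describes and has to be kept in sync by hand.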
I agree. It would be simplest if we could identify the scaling model based solely on the input (without other caveats or special cases). I think the configuration options we present to users, the agent handlebar config templates, and identifying the scaling model would all be easier if we could treat the two aws-s3 input use cases as independent inputs. Perhaps we add two alias names to the aws-s3 input in the spec like
|
Does the package spec need to be modified at all? There are only a few integrations/inputs that we would need to consider here, mainly pub/sub ones, where we are faced with a conduit that feeds us the events and/or we read directly via polling. @lucabelluccini Could we not just document the scaling model for the majority of these integrations? I think separating |
Hello @nimarezainia My manifest proposal was aimed at letting integration developers take a declarative approach. For declaring the scalability at input level or integration level, I am OK with both options. My suggestion of doing it at integration/data stream level was to "hide" the implementation detail (example: in the future an integration/data stream might change), but the final user rarely knows what input is used for each one. If we're able to expose the scaling model based on the input used, then it is fine for me. |
Discussed with @nimarezainia yesterday
As this subject / topic is related to integrations, I'm putting in the loop also @daniela-elastic for the O11y-owned inputs. |
I think we should try to lean into automation so that these classifications for each integration don’t require much work to maintain. I would like to see attributes like horizontal/vertical scaling, stateful/stateless, and e2e acknowledgement support being tracked as metadata about the inputs we have (and kept near the input source). Then the reference docs for the inputs (e.g. Filebeat docs) and the integrations docs can derive from this metadata. As an example, the simple tags that Vector adds to their input docs convey a lot of useful information.
gcp-pubsub has the same scaling characteristics as |
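To make the "metadata kept near the input source" idea more tangible, here is a minimal sketch of what such a record and the doc tags derived from it might look like. The class, field names, and tag wording are all hypothetical; nothing here reflects an existing Beats or package-spec structure.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class InputMetadata:
    """Hypothetical per-input metadata record, maintained next to the input."""

    name: str
    horizontally_scalable: bool
    stateful: bool
    e2e_acknowledgement: bool


def doc_tags(meta: InputMetadata) -> List[str]:
    """Render short tags that input docs and integration docs could both reuse."""
    return [
        "scaling: horizontal" if meta.horizontally_scalable else "scaling: vertical",
        "stateful" if meta.stateful else "stateless",
        "e2e acknowledgement: yes" if meta.e2e_acknowledgement else "e2e acknowledgement: no",
    ]


# Placeholder values for an imaginary input, not a real classification.
example = InputMetadata(
    name="example-input",
    horizontally_scalable=True,
    stateful=False,
    e2e_acknowledgement=True,
)
```

Because both doc sets would render from the same record, a change to an input's characteristics would only need to be made in one place.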
As a starter let's modify the package spec to allow for this information to be set by the package owner. And for it to be included in the auto-generated integrations docs/integrations plugin. |
Should we transfer this to package-spec and start creating a meta issue for the docs team on #842? |
Makes sense, yes, moved. |
Side note: the |
Problem
Users are interested in knowing the scaling model of integrations / data streams.
Examples:
Possible proposal (mitigation)
We manually specify the scaling model of the integration in the manifest.
We expose the scaling model in the docs and, if possible, in the Fleet Integrations UI.
Possible proposal (long term)
Each input should have a metadata/spec in which it declares its scaling model.
The integration package manifest checks the inputs in order to automatically generate the scaling model of the integration / data stream.
Example:
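One hypothetical way the derivation step could work, as a sketch only: each input declares a model, and the data stream takes the most restrictive one among its inputs. The input names, the model labels, and the "most restrictive wins" rule are assumptions made for this illustration, not part of the package spec.

```python
from typing import Iterable, Mapping

# Hypothetical declarations, as if collected from each input's metadata/spec.
INPUT_SCALING = {
    "input-a": "horizontal",
    "input-b": "vertical",
}


def data_stream_scaling(inputs: Iterable[str], declared: Mapping[str, str]) -> str:
    """Derive a data stream's scaling model from the inputs it uses.

    Assumed rule: the data stream is "horizontal" only if every input is;
    any input without a declaration makes the result "unknown".
    """
    models = {declared.get(name, "unknown") for name in inputs}
    if not models or "unknown" in models:
        return "unknown"
    if models == {"horizontal"}:
        return "horizontal"
    return "vertical"
```

Returning "unknown" rather than guessing keeps the generated docs from overstating what an integration supports until every input it uses has declared a model.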
FYI @lalit-satapathy @jsoriano @zmoog