[Standalone] Simplify the required config for inputs #2416

Closed
2 tasks
joshdover opened this issue Mar 29, 2023 · 16 comments · Fixed by elastic/beats#36390 or elastic/ingest-docs#473
Assignees
Labels
Team:Elastic-Agent Label for the Agent team

Comments

@joshdover
Contributor

Using Elastic Agent with basic inputs has some rough edges that could be smoothed out to make simple “tail a file” and related use cases easier to get right. There are just too many papercuts that make standalone Agent hard to use for these simple use cases.

A very basic “tail a file” configuration for standalone agent looks like this today:

inputs:
  - id: my-file
    name: my-file
    type: logfile
    use_output: default
    data_stream:
      namespace: default
    streams:
      - id: my-file-1
        paths:
          - /var/log/my-file/my.log*

We could simplify this by:

  • Making the streams block optional, required only if multiple streams are used on the same input
  • Support “dot-notation” on all fields - currently data_stream.dataset does not work
  • Adding validation to the configuration to throw errors early on unrecognized keys or invalid values
  • Making use_output optional (maybe already the case?)
  • Beats 8.6 does not require data_stream.* fields and instead will use reasonable defaults based on the input type (e.g. logs-generic-default or metrics-generic-default). We should update our example configuration and the agent policy generated by Fleet to take advantage of this change.

An ideal minimal configuration for tailing a log should look like this:

inputs:
  - type: logfile
    paths:
      - /var/log/my-file/my.log*
  • data_stream.* fields are filled with defaults for the input type
  • use_output defaults to default or, if there is only 1 output, whatever output that is. Otherwise throws an error to force the user to specify an output.
  • name is not required
  • id is not required (unless you have multiple inputs)
  • streams block is not required
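
The use_output rule in the list above could be sketched roughly as follows. This is only an illustration of the proposed behaviour, not Elastic Agent code: the function name `resolve_output` and the plain-dict representation of the policy are assumptions made for the example.

```python
def resolve_output(input_cfg: dict, outputs: dict) -> str:
    """Pick the output name for one input (hypothetical helper, not Agent code).

    Rule sketched from the proposal: an explicit use_output wins; otherwise
    use the output named "default"; otherwise, if exactly one output is
    configured, use that one; otherwise fail so the user has to choose.
    """
    explicit = input_cfg.get("use_output")
    if explicit is not None:
        if explicit not in outputs:
            raise ValueError(f"input references unknown output {explicit!r}")
        return explicit
    if "default" in outputs:
        return "default"
    if len(outputs) == 1:
        # Only one output exists, so there is nothing ambiguous to resolve.
        return next(iter(outputs))
    raise ValueError(
        "use_output must be set: there is no 'default' output and "
        f"{len(outputs)} outputs are configured"
    )
```

With this rule, an input without use_output keeps working when a second output is added, as long as one of the outputs is still named default.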

Open questions

  • How should we handle name conflicts between input options and the base structure for the inputs array?
  • What validation can we reasonably add to input options? What could we accomplish just for the logfile and filestream inputs?
@jlind23
Contributor

jlind23 commented Jul 7, 2023

@belimawr For this issue, it's worth setting up a kickoff call with the o11y folks to agree on the scope and outcomes.

@belimawr
Contributor

belimawr commented Aug 1, 2023

After talking with the Observability folks, @cmacknz, and @joshdover, we arrived at the following minimal config and considerations:

inputs:
  - type: filestream
    id: unique-id-per-input
    paths:
      - /var/log/my-file/my.log*
  • This simpler input configuration will be handled as a special case
  • It will not support the streams key.
  • The id needs to be unique per policy and will act as both:
    • The Elastic-Agent input ID
    • Filestream input ID
  • Additional configurations like parsers and processors will also be supported.
  • data_stream.* can already be omitted and the Elastic-Agent will use the default ones
  • use_output will be omitted (needs implementing); the default or the only output available will be used. If more than one output is configured, it will cause an error preventing the Elastic-Agent from starting. This could be extended to all inputs.
  • If the streams key is present, it will cause an error preventing the Elastic-Agent from starting.
  • We need to be very clear when writing documentation and highlight when to use this simpler case.
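
As a rough sketch of the checks described in this list, the snippet below validates inputs that are already known to use the simplified form. It is not the Elastic-Agent implementation: the function name `validate_simplified_inputs` and the list-of-dicts representation are assumptions for illustration, and raising an exception stands in for "preventing the Elastic-Agent from starting".

```python
from typing import Dict, List


def validate_simplified_inputs(inputs: List[Dict]) -> None:
    """Check simplified inputs (hypothetical helper, not Agent code).

    Enforces two rules from the list above: the streams key is rejected,
    and every id must be present and unique within the policy.
    """
    seen_ids = set()
    for index, cfg in enumerate(inputs):
        if "streams" in cfg:
            raise ValueError(
                f"inputs[{index}]: 'streams' is not supported in the simplified form"
            )
        input_id = cfg.get("id")
        if not input_id:
            raise ValueError(f"inputs[{index}]: 'id' is required")
        if input_id in seen_ids:
            raise ValueError(f"inputs[{index}]: duplicate id {input_id!r}")
        seen_ids.add(input_id)
```

Failing on the first problem keeps the sketch short; a real validator would probably collect all errors before reporting them.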

@belimawr
Contributor

belimawr commented Aug 1, 2023

I did some testing and the following configuration already works:

outputs:
  default:
    type: elasticsearch
    hosts:
      - http://localhost:9200
    username: "elastic"
    password: "changeme"

inputs:
  - id: elastic-agent-input-id
    type: filestream
    streams:
      - id: filestream-input-id
        paths:
          - /tmp/log.log

@ruflin
Contributor

ruflin commented Aug 3, 2023

One key bit that I think is missing is support for the data_stream.* settings at this level, which, if I remember correctly, currently doesn't work:

inputs:
  - type: filestream
    id: unique-id-per-input
    data_stream.dataset: mystuff
    paths:
      - /var/log/my-file/my.log*

Another thing that should be tested, which I stumbled over in the past: this only worked as an object and not flattened. What I mean by that is that the above would not work but the one below does. Both should just work.

inputs:
  - type: filestream
    id: unique-id-per-input
    data_stream:
      dataset: mystuff
    paths:
      - /var/log/my-file/my.log*
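
To make "both should just work" concrete, here is a minimal sketch of normalising dotted keys into nested objects so that the flattened and the object form compare equal. It only illustrates the expected equivalence; it says nothing about how Elastic Agent or Beats parse configuration, and `expand_dotted_keys` is a hypothetical helper.

```python
def expand_dotted_keys(cfg: dict) -> dict:
    """Turn keys like "data_stream.dataset" into nested dicts (illustrative only)."""
    expanded: dict = {}
    for key, value in cfg.items():
        if isinstance(value, dict):
            value = expand_dotted_keys(value)
        parts = key.split(".")
        node = expanded
        # Walk (and create) the intermediate objects for every part but the last.
        for part in parts[:-1]:
            node = node.setdefault(part, {})
            if not isinstance(node, dict):
                raise ValueError(f"key {key!r} conflicts with a non-object value")
        leaf = parts[-1]
        if isinstance(value, dict) and isinstance(node.get(leaf), dict):
            # Shallow merge when both notations are mixed for the same object.
            node[leaf].update(value)
        else:
            node[leaf] = value
    return expanded


flattened = {
    "type": "filestream",
    "id": "unique-id-per-input",
    "data_stream.dataset": "mystuff",
    "paths": ["/var/log/my-file/my.log*"],
}
nested = {
    "type": "filestream",
    "id": "unique-id-per-input",
    "data_stream": {"dataset": "mystuff"},
    "paths": ["/var/log/my-file/my.log*"],
}
# After normalisation the two notations describe the same input.
same = expand_dotted_keys(flattened) == expand_dotted_keys(nested)
```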

Some questions on the above points:

  • "This simpler input configuration will be handled as a special case": I'm ok with this if it simplifies things but ideally other inputs like syslog, tcp, udp would work exactly the same.
  • "It will not support the streams key.": Also supportive of this but if this requires extra checks to be added this could come later.

@cmacknz
Member

cmacknz commented Aug 3, 2023

"This simpler input configuration will be handled as a special case": I'm ok with this if it simplifies things but ideally other inputs like syslog, tcp, udp would work exactly the same.
"It will not support the streams key.": Also supportive of this but if this requires extra checks to be added this could come later.

I think creating a variant of the log/filestream inputs that does not use a streams section is an important simplification, because it eliminates the need to specify both unique input IDs and unique stream IDs.

Both users and developers struggle to understand the difference and the consequences of not choosing IDs correctly. If we can just remove the need to do this twice, it will eliminate bugs and support cases. We have a support case right now where someone hasn't done this correctly when installing multiple instances of the k8s integration.

@belimawr
Contributor

belimawr commented Aug 4, 2023

"This simpler input configuration will be handled as a special case": I'm ok with this if it simplifies things but ideally other inputs like syslog, tcp, udp would work exactly the same.

If the type entry is the same for the Elastic-Agent input and the Beat input, it's easy to have this working for any input. Effectively, I'd just take everything and send it to the Beat as its input configuration, together with the processors and other stuff the Elastic-Agent adds. If type is not the same, then we will need an explicit dictionary to translate them.
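
A minimal sketch of that idea, under loud assumptions: the `TYPE_TRANSLATION` table, its single entry, the `to_beat_input` name and the processor ordering are all hypothetical and only illustrate the "pass through, translate type when needed" approach. This is not the code in the draft branch.

```python
# Hypothetical translation table: Agent input type -> Beat input type.
# The entry below is an illustrative assumption, not a verified mapping.
TYPE_TRANSLATION = {
    "logfile": "log",
}


def to_beat_input(agent_input: dict, extra_processors: list) -> dict:
    """Forward an Agent input to the Beat almost unchanged (illustrative only)."""
    beat_input = dict(agent_input)  # shallow copy, the caller's dict is untouched
    agent_type = agent_input["type"]
    # Types missing from the table are assumed to be identical on both sides.
    beat_input["type"] = TYPE_TRANSLATION.get(agent_type, agent_type)
    # Assumption: user processors run first, Agent-added processors after them.
    beat_input["processors"] = list(agent_input.get("processors", [])) + list(
        extra_processors
    )
    return beat_input
```

When the types match, the table stays empty and the function degenerates to "copy the input and append the extra processors", which is the easy case described above.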

I know that for logs there is a range of valid types, I'm not sure about the other inputs.

One "problem" with having this generic is that type: log uses the log input and type: filestream uses the filestream input, which can cause a lot of confusion due to the different configuration keys and the state not being shared.

One solution I see is to limit the simplified log ingestion to type: filestream, to spare our users the burden of migrating to filestream in the future.

@belimawr
Contributor

belimawr commented Aug 4, 2023

Another thing that should be tested, which I stumbled over in the past: this only worked as an object and not flattened.

I've just tested it and indeed the flattened version (data_stream.dataset) does not work. I did not find an issue about it, so I created one: #3191

@belimawr
Contributor

belimawr commented Aug 4, 2023

My draft code: https://github.com/belimawr/beats/tree/simple-log-ingest-config (mostly for me to remember :P)

@ruflin
Contributor

ruflin commented Aug 7, 2023

My understanding is that, with the approach you are currently taking, parts of the transformation layer change. An alternative could be that the config is just forwarded to Filebeat and Filebeat needs to support exactly this format. I always hoped to eventually get rid of any transformations and have the binaries support all the configs directly.

@blakerouse
Contributor

@ruflin With V2 architecture that was done in the Elastic Agent for 8.6+ the Elastic Agent no longer does any transformation of the configurations. @belimawr's change is in the Beat because the Elastic Agent no longer performs this role.

@ruflin
Contributor

ruflin commented Aug 9, 2023

With V2 architecture that was done in the Elastic Agent for 8.6+ the Elastic Agent no longer does any transformation of the configurations.

I missed this change, great to hear. Does this mean I could take the current Elastic Agent input config with the streams and Filebeat supports it directly? Meaning we start to converge on Agent configs also working in Beats.

@cmacknz
Member

cmacknz commented Aug 9, 2023

I missed this change, great to hear. Does this mean I could take the current Elastic Agent input config with the streams and Filebeat supports it directly? Meaning we start to converge on Agent configs also working in Beats.

We still transform the agent policy into a Beat configuration; the logic to do this just lives in Filebeat instead and doesn't use the transpiler at all. https://github.com/elastic/beats/blob/103869cb3d5f312fb4aed3910ea059a7d5147055/x-pack/libbeat/management/generate.go#L71

This code is only hit when the configuration comes in via the agent control protocol, so you can't just use it directly with Filebeat today.

@ruflin
Contributor

ruflin commented Aug 10, 2023

Having the transformation in Beats is what I hoped for. Having it there means that in "theory" it could also make its way into the actual config itself (not a feature request). Thx for the details.

@cmacknz
Member

cmacknz commented Aug 28, 2023

Reopening this to make sure we document this in the agent docs at https://github.com/elastic/ingest-docs/tree/main/docs/en/ingest-management

@cmacknz cmacknz reopened this Aug 28, 2023
@ruflin
Contributor

ruflin commented Aug 29, 2023

I suggest also making #3191 a dependency of this before closing. Not having it can be pretty confusing for users.

@belimawr
Contributor

Here is the PR for the last missing piece, the docs: elastic/ingest-docs#473. I added it next to the standalone configuration section. Any suggestions on how to better document it are greatly appreciated.

7 participants