[Standalone] Simplify the required config for inputs #2416

joshdover · 2023-03-29T15:24:01Z

Using Elastic Agent with basic inputs has some rough edges that could be smoothed out to make simple “tail a file” and related use cases easier to get right. There are just too many papercuts that make standalone Agent hard to use for these simple cases.

A very basic “tail a file” configuration for standalone agent looks like this today:

inputs:
  - id: my-file
    name: my-file
    type: logfile
    use_output: default
    data_stream:
      namespace: default
    streams:
      - id: my-file-1
        paths:
          - /var/log/my-file/my.log*

We could simplify this by:

Making the streams block optional, required only if multiple streams on the same input are used
Supporting “dot-notation” on all fields - currently data_stream.dataset does not work
Adding validation to the configuration to throw errors early on unrecognized keys or invalid values
Making use_output optional (maybe already the case?)
Beats 8.6 does not require data_stream.* fields and instead will use reasonable defaults based on the input type (eg. logs-generic-default or metrics-generic-default). We should update our example configuration and agent policy generated by Fleet to take advantage of this change.

An ideal minimal configuration for tailing a log should look like this:

inputs:
  - type: logfile
    paths:
      - /var/log/my-file/my.log*

data_stream.* fields are filled with defaults for the input type
use_output defaults to default or, if there is only 1 output, whatever output that is. Otherwise an error is thrown to force the user to specify an output.
name is not required
id is not required (unless you have multiple inputs)
streams block is not required
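
For illustration only, the use_output rule in the list above could be sketched in Python as follows. The function name resolve_output and the dict shapes are hypothetical and are not taken from the Elastic Agent code base; the sketch only mirrors the rule as written here.

```python
def resolve_output(input_cfg: dict, outputs: dict) -> str:
    """Return the name of the output an input should use (hypothetical helper).

    input_cfg: one entry of the `inputs` list, parsed into a dict.
    outputs:   the `outputs` section, mapping output name -> settings.
    """
    explicit = input_cfg.get("use_output")
    if explicit is not None:
        # An explicit choice always wins, but it must exist.
        if explicit not in outputs:
            raise ValueError(f"unknown output {explicit!r}")
        return explicit
    if "default" in outputs:
        # Rule 1: fall back to the output named "default".
        return "default"
    if len(outputs) == 1:
        # Rule 2: with exactly one output, use it whatever its name is.
        return next(iter(outputs))
    # Otherwise force the user to be explicit.
    raise ValueError(
        "use_output must be set: there is no 'default' output "
        "and no single output to fall back on"
    )
```

With this sketch, an input without use_output resolves to "default" when that output exists, to the only output when exactly one is configured, and fails early in every other case.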

Open questions

How should we handle name conflicts between input options and the base structure for the inputs array?
What validation can we reasonably add to input options? What could we accomplish just for the logfile and filestream inputs?


jlind23 · 2023-07-07T12:46:58Z

@belimawr For this issue, it is worth setting up a kickoff call with the o11y folks to agree on the scope and outcomes.

belimawr · 2023-08-01T15:14:21Z

After talking with the Observability folks, @cmacknz and @joshdover, we arrived at the following minimal config and considerations:

inputs:
  - type: filestream
    id: unique-id-per-input
    paths:
      - /var/log/my-file/my.log*

This simpler input configuration will be handled as a special case
It will not support the streams key.
The id needs to be unique per policy and will act as both:
- The Elastic-Agent input ID
- Filestream input ID
Additional configurations like parsers and processors will also be supported.
data_stream.* can already be omitted and the Elastic-Agent will use the default ones
use_output will be omitted (needs implementing); the default or the only available output will be used. If more than one output is configured, it will cause an error preventing the Elastic-Agent from starting. This could be extended to all inputs.
If the streams key is present, it will cause an error preventing the Elastic-Agent from starting.
We need to be very clear when writing documentation and highlight when to use this simpler case.
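
As a rough sketch of the considerations above (not the actual Beats implementation), the simplified form could be expanded into the existing streams-based form like this. The helper name expand_simple_input and the INPUT_LEVEL_KEYS set are assumptions made for the example; in particular, which keys stay at the input level is a guess.

```python
# Assumption: keys that belong to the Agent-level input rather than to the
# Filestream input itself. The real split may differ.
INPUT_LEVEL_KEYS = {"id", "type", "name", "use_output", "data_stream"}


def expand_simple_input(cfg: dict) -> dict:
    """Turn a simplified input into the form that uses a `streams` list."""
    if "streams" in cfg:
        # The simplified form and `streams` are mutually exclusive.
        raise ValueError("the simplified form does not accept a 'streams' key")
    if "id" not in cfg:
        raise ValueError("the simplified form requires an 'id'")

    expanded = {k: v for k, v in cfg.items() if k in INPUT_LEVEL_KEYS}
    # Everything else (paths, parsers, processors, ...) goes to the stream.
    stream = {k: v for k, v in cfg.items() if k not in INPUT_LEVEL_KEYS}
    # The single id doubles as the Filestream input ID.
    stream["id"] = cfg["id"]
    expanded["streams"] = [stream]
    return expanded
```

Reusing the one id for both levels is what removes the need to pick two unique identifiers per input.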

belimawr · 2023-08-01T16:25:26Z

I did some testing and the following configuration already works:

outputs:
  default:
    type: elasticsearch
    hosts:
      - http://localhost:9200
    username: "elastic"
    password: "changeme"

inputs:
  - id: elastic-agent-input-id
    type: filestream
    streams:
      - id: filestream-input-id
        paths:
          - /tmp/log.log

ruflin · 2023-08-03T12:05:26Z

One key bit that I think is missing is the support for the data_stream.* settings on this level which if I remember correctly currently doesn't work:

inputs:
  - type: filestream
    id: unique-id-per-input
    data_stream.dataset: mystuff
    paths:
      - /var/log/my-file/my.log*

Another thing that should be tested, which I stumbled over in the past: this only worked as an object and not flattened. What I mean by that is that the above would not work but the one below does. Both should just work.

inputs:
  - type: filestream
    id: unique-id-per-input
    data_stream:
      dataset: mystuff
    paths:
      - /var/log/my-file/my.log*

Some questions on the above points:

"This simpler input configuration will be handled as a special case": I'm ok with this if it simplifies things but ideally other inputs like syslog, tcp, udp would work exactly the same.
"It will not support the streams key.": Also supportive of this but if this requires extra checks to be added this could come later.

cmacknz · 2023-08-03T13:22:32Z

"This simpler input configuration will be handled as a special case": I'm ok with this if it simplifies things but ideally other inputs like syslog, tcp, udp would work exactly the same.
"It will not support the streams key.": Also supportive of this but if this requires extra checks to be added this could come later.

I think creating a variant of the log/filestream inputs that do not use a streams section is an important simplification, because it eliminates the need to specify both unique input IDs and unique stream IDs.

Both users and developers struggle to understand the difference and the consequences of not choosing IDs correctly. If we can just remove the need to do this twice, it will eliminate bugs and support cases. We have a support case right now where someone hasn't done this correctly when installing multiple instances of the k8s integration.

belimawr · 2023-08-04T08:48:18Z

"This simpler input configuration will be handled as a special case": I'm ok with this if it simplifies things but ideally other inputs like syslog, tcp, udp would work exactly the same.

If the type entry is the same for the Elastic-Agent input and the Beat input, it's easy to have this working for any input. Effectively, I'd just take everything and send it to the Beat as its input configuration, along with the processors and other settings the Elastic-Agent adds. If type is not the same, then we will need an explicit dictionary to translate them.

I know that for logs there is a range of valid types, I'm not sure about the other inputs.

One "problem" with having this generic is that type: log uses the log input and type: filestream uses the filestream input, which can cause a lot of confusion due to the different configuration keys and the state not being shared.

One solution I see is to limit the simplified log ingestion to type: filestream, to avoid burdening our users with a migration to filestream in the future.

belimawr · 2023-08-04T09:48:21Z

Another thing that should be tested, which I stumbled over in the past: this only worked as an object and not flattened.

I've just tested it and indeed the flattened version (data_stream.dataset) does not work. I did not find an issue about it, so I created one: #3191
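
To make the expected behaviour concrete, here is a minimal Python sketch of normalising flattened keys such as data_stream.dataset into the nested object form, so that both spellings yield the same structure. The function expand_dotted_keys is hypothetical and only illustrates the idea; it is not the fix tracked in #3191, and it does not descend into lists.

```python
def expand_dotted_keys(cfg: dict) -> dict:
    """Return a copy of cfg where 'a.b: v' becomes {'a': {'b': v}}."""
    out: dict = {}
    for key, value in cfg.items():
        if isinstance(value, dict):
            # Normalise nested objects first.
            value = expand_dotted_keys(value)
        parts = key.split(".")
        node = out
        for part in parts[:-1]:
            child = node.setdefault(part, {})
            if not isinstance(child, dict):
                raise ValueError(f"{key!r} conflicts with scalar key {part!r}")
            node = child
        last = parts[-1]
        if isinstance(value, dict) and isinstance(node.get(last), dict):
            # Same object given in both styles: merge (shallow).
            node[last].update(value)
        else:
            node[last] = value
    return out


flat = {"type": "filestream", "data_stream.dataset": "mystuff"}
nested = {"type": "filestream", "data_stream": {"dataset": "mystuff"}}
# Both spellings normalise to the same structure.
assert expand_dotted_keys(flat) == expand_dotted_keys(nested)
```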

belimawr · 2023-08-04T11:07:04Z

My draft code: https://github.com/belimawr/beats/tree/simple-log-ingest-config (mostly for me to remember :P)

ruflin · 2023-08-07T06:46:51Z

My understanding is that the current approach you are taking changes parts of the transformation layer. An alternative could be that the config is just forwarded to Filebeat and Filebeat needs to support exactly this format. I always hoped to eventually get rid of any transformations and have the binaries support all the configs directly.

blakerouse · 2023-08-07T19:55:04Z

@ruflin With V2 architecture that was done in the Elastic Agent for 8.6+ the Elastic Agent no longer does any transformation of the configurations. @belimawr's change is in the Beat because the Elastic Agent no longer performs this role.

ruflin · 2023-08-09T08:08:59Z

With V2 architecture that was done in the Elastic Agent for 8.6+ the Elastic Agent no longer does any transformation of the configurations.

I missed this change, great to hear. Does this mean I could take the current Elastic Agent input config with the streams and filebeat supports it directly? Meaning we start to converge on Agent configs work also in Beats.

cmacknz · 2023-08-09T14:34:46Z

I missed this change, great to hear. Does this mean I could take the current Elastic Agent input config with the streams and filebeat supports it directly? Meaning we start to converge on Agent configs work also in Beats.

We still transform the agent policy into a Beat configuration, the logic to do this just lives in Filebeat instead and doesn't use the transpiler at all. https://github.com/elastic/beats/blob/103869cb3d5f312fb4aed3910ea059a7d5147055/x-pack/libbeat/management/generate.go#L71

This code is only hit when the configuration comes in via the agent control protocol, so you can't just use it directly with Filebeat today.

ruflin · 2023-08-10T07:12:06Z

Having the transformation in Beats is what I hoped for. Having it there means in "theory" it could also make its way into the actual config itself (not a feature request). Thx for the details.

cmacknz · 2023-08-28T14:56:20Z

Reopening this to make sure we document this in the agent docs at https://github.com/elastic/ingest-docs/tree/main/docs/en/ingest-management

ruflin · 2023-08-29T06:19:45Z

I suggest also making #3191 a dependency of this before closing. Not having it can be pretty confusing for users.

belimawr · 2023-09-11T15:54:39Z

Here is the PR for the last missing piece, the docs: elastic/ingest-docs#473. I added it next to the standalone configuration section. Any suggestions on how to better document it are greatly appreciated.

joshdover added the Team:Elastic-Agent label Mar 29, 2023

joshdover mentioned this issue Mar 29, 2023

[Standalone] Add better copy/paste examples for standalone Agent elastic/ingest-docs#65

Open

pierrehilbert assigned rdner May 24, 2023

ruflin mentioned this issue May 25, 2023

Support service.name as first class citizen #2724

Open

pierrehilbert assigned belimawr and unassigned rdner Jun 19, 2023

belimawr mentioned this issue Aug 22, 2023

Simplify input configuration under Elastic-Agent elastic/beats#36390

Merged

4 tasks

belimawr closed this as completed in elastic/beats#36390 Aug 23, 2023

cmacknz reopened this Aug 28, 2023

belimawr mentioned this issue Sep 11, 2023

Simplified log ingestion elastic/ingest-docs#473

Merged

belimawr closed this as completed in elastic/ingest-docs#473 Sep 13, 2023
