Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add otelcol filelogreceiver #2711

Open
wants to merge 13 commits into
base: main
Choose a base branch
from
Open

Conversation

dehaansa
Copy link
Contributor

PR Description

Add a wrapper around the otelcol filelogreciever to Alloy.

Which issue(s) this PR fixes

#2694

Notes to the Reviewer

PR Checklist

  • CHANGELOG.md updated
  • Documentation added
  • Tests updated
  • Config converters updated

@dehaansa dehaansa requested review from clayton-cornell and a team as code owners February 13, 2025 05:46
Copy link
Contributor

github-actions bot commented Feb 13, 2025

Copy link
Contributor

@wildum wildum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

solid work, well done!

type OrderingCriteria struct {
Regex string `alloy:"regex,attr,optional"`
TopN int `alloy:"top_n,attr,optional"`
SortBy []Sort `alloy:"sort_by,block"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the sort_by block is documented as optional

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was unsure about this, as it's "optional" in that the ordering_criteria parent block is optional, but then the sort_by is required if you configure ordering_criteria. Do we have a good pattern for this in docs @clayton-cornell ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that we should follow the same pattern as when you have a required arg in an optional block. One example here: https://grafana.com/docs/alloy/latest/reference/components/loki/loki.source.gcplog/
The block pull is optional and in this block you have required args.
I think that's ok to understand from a user perspective

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. We have more than one instance of this in the component docs where an optional block has required arguments.

The arguments become required as part of the optional block configuration only if the optional block is added to the overall component definition. At least that's how I understand it when reading the docs.


require.NoError(t, ctrl.WaitRunning(3*time.Second))

// TODO(@dehaansa) - test if this is removeable after https://github.com/grafana/alloy/pull/2262
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the PR was merged, is the sleep still needed?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if yes, can this be a problem in real scenarios?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The otel component scheduler runs Start() on the otel components, but that is not required to wait for components to be fully ready. I am not currently aware of a better way to wait for readiness, will take a look later. Updating comment for now.

var args filelog.Arguments
err := syntax.Unmarshal([]byte(alloyCfg), &args)
require.NoError(t, err)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be nice to have a test that covers the errors from the validate

"github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/trim"
"go.opentelemetry.io/collector/confmap"
)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you add a comment about what this file is about and why it is used in several otel receivers? (what are stanza receivers?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do! Stanza receivers are logs receivers built on the stanza log agent that was donated to opentelemetry by observIQ.

diag.SeverityLevelInfo,
fmt.Sprintf("Header cannot currently be translated for %s", StringifyInstanceID(id)),
)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should the above diags be "warn" instead of "info"? Users might convert and run the alloy file without looking and it's a big deal if the config is not the same.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, they should. I made them info to pass the test and I must have stashed the changes somewhere to put them back and add the expected diags.

Copy link
Contributor

@clayton-cornell clayton-cornell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doc suggestions are mainly on formatting/style. Added a missing link to Wikipedia.

Comment on lines 37 to 64
| Name | Type | Description | Default | Required |
|---------------------------------|---------------------|--------------------------------------------------------------------------------------------|-------------|----------|
| `include` | `list(string)` | A list of glob patterns to include files. | `[]` | yes |
| `exclude` | `list(string)` | A list of glob patterns to exclude files that would be included by the `include` patterns. | `[]` | no |
| `poll_interval` | `time.Duration` | The interval at which the file is polled for new entries. | `1s` | no |
| `max_concurrent_files` | `int` | The maximum number of files to read concurrently. | `10` | no |
| `max_batches` | `int` | The maximum number of batches to process concurrently. | `10` | no |
| `start_at` | `string` | The position to start reading the file from. | `beginning` | no |
| `fingerprint_size` | `units.Base2Bytes` | The size of the fingerprint used to detect file changes. | `1KiB` | no |
| `max_log_size` | `units.Base2Bytes` | The maximum size of a log entry. | `1MiB` | no |
| `encoding` | `string` | The encoding of the log file. | `utf-8` | no |
| `force_flush_period` | `time.Duration` | The period after which logs are flushed even if the buffer is not full. | `500ms` | no |
| `delete_after_read` | `bool` | Whether to delete the file after reading. | `false` | no |
| `compression` | `string` | The compression type used for the log file. | `none` | no |
| `acquire_fs_lock` | `bool` | Whether to acquire a file system lock while reading the file. | `false` | no |
| `attributes` | `map(string)` | A map of attributes to add to each log entry. | `{}` | no |
| `resource` | `map(string)` | A map of resource attributes to associate with each log entry. | `{}` | no |
| `exclude_older_than` | `time.Duration` | Exclude files with a modification time older than the specified duration. | `0s` | no |
| `include_file_record_number` | `bool` | Whether to include the file record number in the log entry. | `false` | no |
| `include_file_name` | `bool` | Whether to include the file name in the log entry. | `true` | no |
| `include_file_path` | `bool` | Whether to include the file path in the log entry. | `false` | no |
| `include_file_name_resolved` | `bool` | Whether to include the resolved file name in the log entry. | `false` | no |
| `include_file_path_resolved` | `bool` | Whether to include the resolved file path in the log entry. | `false` | no |
| `include_file_owner_name` | `bool` | Whether to include the file owner's name in the log entry. | `false` | no |
| `include_file_owner_group_name` | `bool` | Whether to include the file owner's group name in the log entry. | `false` | no |
| `preserve_leading_whitespaces` | `bool` | Preserves leading whitespace in messages when set to `true`. | `false` | no |
| `preserve_trailing_whitespaces` | `bool` | Preserves trailing whitespace in messages when set to `true`. | `false` | no |
| `operators` | `lists(map(string)` | A list of operators used to parse the log entries. | `[]` | no |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
| Name | Type | Description | Default | Required |
|---------------------------------|---------------------|--------------------------------------------------------------------------------------------|-------------|----------|
| `include` | `list(string)` | A list of glob patterns to include files. | `[]` | yes |
| `exclude` | `list(string)` | A list of glob patterns to exclude files that would be included by the `include` patterns. | `[]` | no |
| `poll_interval` | `time.Duration` | The interval at which the file is polled for new entries. | `1s` | no |
| `max_concurrent_files` | `int` | The maximum number of files to read concurrently. | `10` | no |
| `max_batches` | `int` | The maximum number of batches to process concurrently. | `10` | no |
| `start_at` | `string` | The position to start reading the file from. | `beginning` | no |
| `fingerprint_size` | `units.Base2Bytes` | The size of the fingerprint used to detect file changes. | `1KiB` | no |
| `max_log_size` | `units.Base2Bytes` | The maximum size of a log entry. | `1MiB` | no |
| `encoding` | `string` | The encoding of the log file. | `utf-8` | no |
| `force_flush_period` | `time.Duration` | The period after which logs are flushed even if the buffer is not full. | `500ms` | no |
| `delete_after_read` | `bool` | Whether to delete the file after reading. | `false` | no |
| `compression` | `string` | The compression type used for the log file. | `none` | no |
| `acquire_fs_lock` | `bool` | Whether to acquire a file system lock while reading the file. | `false` | no |
| `attributes` | `map(string)` | A map of attributes to add to each log entry. | `{}` | no |
| `resource` | `map(string)` | A map of resource attributes to associate with each log entry. | `{}` | no |
| `exclude_older_than` | `time.Duration` | Exclude files with a modification time older than the specified duration. | `0s` | no |
| `include_file_record_number` | `bool` | Whether to include the file record number in the log entry. | `false` | no |
| `include_file_name` | `bool` | Whether to include the file name in the log entry. | `true` | no |
| `include_file_path` | `bool` | Whether to include the file path in the log entry. | `false` | no |
| `include_file_name_resolved` | `bool` | Whether to include the resolved file name in the log entry. | `false` | no |
| `include_file_path_resolved` | `bool` | Whether to include the resolved file path in the log entry. | `false` | no |
| `include_file_owner_name` | `bool` | Whether to include the file owner's name in the log entry. | `false` | no |
| `include_file_owner_group_name` | `bool` | Whether to include the file owner's group name in the log entry. | `false` | no |
| `preserve_leading_whitespaces` | `bool` | Preserves leading whitespace in messages when set to `true`. | `false` | no |
| `preserve_trailing_whitespaces` | `bool` | Preserves trailing whitespace in messages when set to `true`. | `false` | no |
| `operators` | `lists(map(string)` | A list of operators used to parse the log entries. | `[]` | no |
| Name | Type | Description | Default | Required |
| ------------------------------- | ------------------- | ------------------------------------------------------------------------------------------ | ----------- | -------- |
| `include` | `list(string)` | A list of glob patterns to include files. | `[]` | yes |
| `acquire_fs_lock` | `bool` | Whether to acquire a file system lock while reading the file. | `false` | no |
| `attributes` | `map(string)` | A map of attributes to add to each log entry. | `{}` | no |
| `compression` | `string` | The compression type used for the log file. | `none` | no |
| `delete_after_read` | `bool` | Whether to delete the file after reading. | `false` | no |
| `encoding` | `string` | The encoding of the log file. | `utf-8` | no |
| `exclude_older_than` | `time.Duration` | Exclude files with a modification time older than the specified duration. | `0s` | no |
| `exclude` | `list(string)` | A list of glob patterns to exclude files that would be included by the `include` patterns. | `[]` | no |
| `fingerprint_size` | `units.Base2Bytes` | The size of the fingerprint used to detect file changes. | `1KiB` | no |
| `force_flush_period` | `time.Duration` | The period after which logs are flushed even if the buffer isn't full. | `500ms` | no |
| `include_file_name_resolved` | `bool` | Whether to include the resolved filename in the log entry. | `false` | no |
| `include_file_name` | `bool` | Whether to include the filename in the log entry. | `true` | no |
| `include_file_owner_group_name` | `bool` | Whether to include the file owner's group name in the log entry. | `false` | no |
| `include_file_owner_name` | `bool` | Whether to include the file owner's name in the log entry. | `false` | no |
| `include_file_path_resolved` | `bool` | Whether to include the resolved file path in the log entry. | `false` | no |
| `include_file_path` | `bool` | Whether to include the file path in the log entry. | `false` | no |
| `include_file_record_number` | `bool` | Whether to include the file record number in the log entry. | `false` | no |
| `max_batches` | `int` | The maximum number of batches to process concurrently. | `10` | no |
| `max_concurrent_files` | `int` | The maximum number of files to read concurrently. | `10` | no |
| `max_log_size` | `units.Base2Bytes` | The maximum size of a log entry. | `1MiB` | no |
| `operators` | `lists(map(string)` | A list of operators used to parse the log entries. | `[]` | no |
| `poll_interval` | `time.Duration` | The interval at which the file is polled for new entries. | `1s` | no |
| `preserve_leading_whitespaces` | `bool` | Preserves leading whitespace in messages when set to `true`. | `false` | no |
| `preserve_trailing_whitespaces` | `bool` | Preserves trailing whitespace in messages when set to `true`. | `false` | no |
| `resource` | `map(string)` | A map of resource attributes to associate with each log entry. | `{}` | no |
| `start_at` | `string` | The position to start reading the file from. | `beginning` | no |

Formatting, alphabetize rows

Comment on lines 99 to 107
| Hierarchy | Block | Description | Required |
|-----------------------------|-----------------------|-------------------------------------------------------------------------------------------------|----------|
| output | [output][] | Configures where to send received telemetry data. | yes |
| multiline | [multiline][] | Configures rules for multiline parsing of log messages | no |
| header | [header][] | Configures rules for parsing a log header line | no |
| retry_on_failure | [retry_on_failure][] | Configures the retry behavior when the receiver encounters an error downstream in the pipeline. | no |
| debug_metrics | [debug_metrics][] | Configures the metrics that this component generates to monitor its state. | no |
| ordering_criteria | [ordering_criteria][] | Configures the order in which log files are processed. | no |
| ordering_criteria > sort_by | [sort_by][] | Configures the fields to sort by within the ordering critera. | no |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
| Hierarchy | Block | Description | Required |
|-----------------------------|-----------------------|-------------------------------------------------------------------------------------------------|----------|
| output | [output][] | Configures where to send received telemetry data. | yes |
| multiline | [multiline][] | Configures rules for multiline parsing of log messages | no |
| header | [header][] | Configures rules for parsing a log header line | no |
| retry_on_failure | [retry_on_failure][] | Configures the retry behavior when the receiver encounters an error downstream in the pipeline. | no |
| debug_metrics | [debug_metrics][] | Configures the metrics that this component generates to monitor its state. | no |
| ordering_criteria | [ordering_criteria][] | Configures the order in which log files are processed. | no |
| ordering_criteria > sort_by | [sort_by][] | Configures the fields to sort by within the ordering critera. | no |
| Block | Description | Required |
| ------------------------------------------ | ----------------------------------------------------------------------------------------------- | -------- |
| [`output`][output] | Configures where to send received telemetry data. | yes |
| [`debug_metrics`][debug_metrics] | Configures the metrics that this component generates to monitor its state. | no |
| [`header`][header] | Configures rules for parsing a log header line | no |
| [`multiline`][multiline] | Configures rules for multiline parsing of log messages | no |
| [`ordering_criteria`][ordering_criteria] | Configures the order in which log files are processed. | no |
| `ordering_criteria` > [`sort_by`][sort_by] | Configures the fields to sort by within the ordering criteria. | no |
| [`retry_on_failure`][retry_on_failure] | Configures the retry behavior when the receiver encounters an error downstream in the pipeline. | no |

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants