-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LogStash configuration is compiled for every sample #33
Comments
Each time let(:results) do
puts "do results"
results = []
start = Time.now
pipeline.instance_eval { @filters.each(&:register) }
puts "pipeline.instance_eval in #{Time.now - start} s"
start = Time.now
event.each do |e|
# filter call the block on all filtered events, included new events added by the filter
pipeline.filter(e) { |filtered_event| results << filtered_event }
end
puts "pipeline.filter in #{Time.now - start} s"
start = Time.now
# flush makes sure to empty any buffered events in the filter
pipeline.flush_filters(:final => true) { |flushed_event| results << flushed_event }
puts "pipeline.flush_filters in #{Time.now - start} s"
start = Time.now
results.select { |e| !e.cancelled? }
puts "results.select in #{Time.now - start} s"
end And here is the output:
|
Actually the value is cached across multiple calls in the same sample but not across samples. |
@Mogztter - would it help if the sample method allowed you to pass in a pipeline object? def sample(sample_event, user_pipeline = nil , &block)
name = sample_event.is_a?(String) ? sample_event : LogStash::Json.dump(sample_event)
name = name[0..50] + "..." if name.length > 50
describe "\"#{name}\"" do
let(:pipeline) { user_pipeline || LogStash::Pipeline.new(config) }
...
....
end
end However I would be worried about the pipeline having some in-flight events internally. The pipeline object does not have a reset method. Obviously you want a clean setup for RSpec. |
@guyboertje Yes this is one way of solving the problem but I think "end user" should not be aware of LogStash internal (pipeline). It would be nice if we could be able to execute multiple samples against the same configuration using Rspec If I want to execute sample A with another configuration, I can declare another context with another configuration. |
@Mogztter - in this case the "end user" has the role of test case developer and with this role comes some degree of responsibility for the objects under test. Presently, the sample method hides the fact that 'object under test' is really a (new) pipeline instance in a RSpec I know what you want to achieve, but I think there is a strong chance of 'not clean slate' tests. Also, FWIW, this issue is the correct place to generally discuss this (not your PR, as it is only one possible solution to this issue). |
I am okay with having a way to execute multiples test with a single pipeline instance, I understand the cost of bringing one up on every tests. But the problem with the By requirements, I mean they expect a clean state on every calls of sample which is the current behavior and this is usually what you want when you when are developping the system, you want to make sure that every call is independent of the other, one way to do it is to have a new pipeline under evaluation. On the other hand, You as the devops/user, you want want fast feedback against the same configuration and the sample method doesn't provide it out of the box. |
@guyboertje @Mogztter maybe having a subset of specialized helpers targeted at config developers would be better? |
@Mogztter all this you're fighting are rspec mechanisms to ensure isolation in tests. The reason you're fighting it is because logstash is slow when setting up the filters :( I wonder if we can profile the filter register calls and see which ones are making this delay so big. |
@Mogztter the problem in changing |
@ph - Note: @Mogztter is also (in the PR) proposing to change the I was thinking of this: let(:config) { }
samples do |p|
# or let(:config) { }
sample("apache log in debug", p) do
#expectation
end
sample("other", p) do
# expectation
end
end where the There is a way to hide the (reused) pipeline provided by All of the above is dependent on a way to reset or light-weight clone a pipeline instance when receiving it in |
@ph - yes perhaps a tailored helpers is a good idea. |
I agree but I would prefer to use an utility method rather than instantiating and configuring manually a Pipeline object.
The idea is not to merge this pull request but to highlight what part of the code need to be enhanced.
I understand
Sounds great.
The most costly part of the code is
That would be nice and will also make LogStash starts faster. A simple enhancement could be to create a new method The behavior of |
This is actually a pain, the way sample works and used is very slow, but for now is a way used in tons of places in logstash, but I see here several open questions.
And at the end we've full pipeline test, what I think people who aim to test a configuration would really need. This would certainly also benefit from a sample alike method, although I would most probably change the implementation of it to a direction closer to how the input helper works What do you think? I wonder @Mogztter, what is the main use case why do you write test? are you willing to test plugins? a given configuration for logstash? [1] input helper usage example: https://github.com/logstash-plugins/logstash-input-relp/blob/master/spec/inputs/relp_spec.rb#L28-L56 |
Our main use case is to catch regressions in our custom patterns and/or in our LogStash filters configuration. We also use our test suites to add new messages in order to validate that LogStash will properly extract and transform the data. So basically we are doing integration tests to make sure that everything works. Our filters configuration is split by message type but we also have common filters. So we could create smaller configuration files for each message type but it's faster to load/compile the full configuration and start JRuby only once. |
Hi, In https://github.com/purbon/logstash-helpers I started creating a gem that basically aims to be a collection of helpful helpers and matchers for logstash configuration, you could see for example an input helper or a grok/patterns matcher. This aims to easy and organize your experience with test in the project. In https://github.com/purbon/logstash-input-syslog/blob/feature/use-logstash-helpers/spec/inputs/syslog_spec.rb, I did an small example, experiment on having a pipeline helper compile the configuration only once (still could be improved a bit). This is useful, until the configuration change when is necessary to redo this process. some numbers run in my laptop with the syslog input.
This means approx an speed up of 47%, worth persuading ! I will keep you up to date with the evolution of this, it would be super nice to keep in touch to now how you are using the test more in detail. From your last comment I see:
Thanks for your feedback @Mogztter ! |
@Mogztter I was wondering if you are able to share your config so I might be able to do some work on improving the expectations of people like you doing this kind of test. Makes sense to you? /cheers |
I really just want to assert that when a message is received, data are correctly extracted/transformed. I'm testing the From an API point of view, the method |
Yes but I can validate both the custom patterns and the filter configuration at once. But I do agree that this is not unit testing.
I really just want to test the filter part of the configuration, I don't care about the From an API point of view, the |
We have a quite complex configuration file (approximately 1500 lines) and I believe that this line is costly: https://github.com/elastic/logstash/blob/master/lib/logstash/pipeline.rb#L26
If I understand correctly this code, a new pipeline is created for every
sample()
method, meaning that the configuration is compiled every time.Meaning that the following test will load and compile 3 times the same configuration:
Would it be possible to configure this behavior ? We have 300+ samples and running our tests suite is really slow.
Having said that I think that this behavior should be the default. If I do want to use a different configuration I can call
config
again with the new configuration, no ?The text was updated successfully, but these errors were encountered: