
Expose More Performance-Related Configurations #630

Conversation

coltmcnealy-lh (Member)

No description provided.

@coltmcnealy-lh (Member Author) commented Jan 28, 2024

Doing testing with the defaults:

  • Core Stream Threads: 2
  • Timer Stream Threads: 2
  • Core Stream Commit Interval: 5 seconds
  • Timer Stream Commit Interval: 30 seconds
  • Core Memtable Size: 64MB
  • Timer Memtable Size: 32MB
  • Core State Store Cache Size: 32MB
  • Timer State Store Cache Size: 64MB
  • Shared RocksDB Block Cache Size: 64MB
  • Write Buffer Manager Size: 256MB
12:54:22 -> for i in $(seq 1 1000); do lhctl run hundred-tasks; done

It completed in 10 minutes 3 seconds. Yikes. It looked like there were a lot of RocksDB write stalls, most likely caused by the Write Buffer Manager size being too small.
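
For reference, a minimal sketch of the bounded-memory pattern this points at, using Kafka Streams' RocksDBConfigSetter; the class name and the hard-coded sizes are illustrative, not the code in this PR:

import java.util.Map;
import org.apache.kafka.streams.state.RocksDBConfigSetter;
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.Cache;
import org.rocksdb.LRUCache;
import org.rocksdb.Options;
import org.rocksdb.WriteBufferManager;

public class BoundedMemoryConfigSetter implements RocksDBConfigSetter {

    // One shared block cache and write-buffer manager for every store on this
    // instance; sizes mirror the defaults listed above (64MB cache, 256MB WBM).
    private static final Cache SHARED_CACHE = new LRUCache(64 * 1024 * 1024L);
    private static final WriteBufferManager WRITE_BUFFER_MANAGER =
            new WriteBufferManager(256 * 1024 * 1024L, SHARED_CACHE);

    @Override
    public void setConfig(final String storeName, final Options options, final Map<String, Object> configs) {
        // All memtables draw from the shared write-buffer budget; when it is
        // exhausted, RocksDB flushes (or stalls writes) instead of allocating more.
        options.setWriteBufferManager(WRITE_BUFFER_MANAGER);

        final BlockBasedTableConfig tableConfig = (BlockBasedTableConfig) options.tableFormatConfig();
        tableConfig.setBlockCache(SHARED_CACHE);
        options.setTableFormatConfig(tableConfig);
    }

    @Override
    public void close(final String storeName, final Options options) {
        // The shared cache and WBM are intentionally not closed per store.
    }
}

Because every store on an instance shares one write-buffer budget, an undersized WBM makes stalls like the ones above more likely as partition counts grow.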

@coltmcnealy-lh (Member Author)

  • Core Stream Threads: 2
  • Timer Stream Threads: 2
  • Core Stream Commit Interval: 5 seconds
  • Timer Stream Commit Interval: 30 seconds
  • Core Memtable Size: 64MB
  • Timer Memtable Size: 32MB
  • Core State Store Cache Size: 32MB
  • Timer State Store Cache Size: 64MB
  • Shared RocksDB Block Cache Size: 64MB
  • Write Buffer Manager Size: 4GB

This also took ~10 minutes. It turns out that this ticket is blocked by #576.

@coltmcnealy-lh marked this pull request as ready for review January 28, 2024 22:14
@coltmcnealy-lh (Member Author):

We might want to open a tech-debt ticket to make this class smaller.

@coltmcnealy-lh (Member Author)

I disabled metrics aggregation (MetricsUpdater#listen()) and then ran the benchmark again. Prior to the changes in this PR, the benchmark took 119 seconds. After the changes, running with larger-memory.config took 113 seconds.

I then disabled the changes related to the WriteBufferManager and Block Cache size (no shared WBM or Block Cache) while keeping the changes to the commit interval and state store cache; the small performance improvement remained (113 seconds). This confirmed my thoughts going into this PR:

  • The changes to the WriteBufferManager and Block Cache are intended to reduce the risk of OOM errors.
  • Using a larger commit interval should result in better performance. This performance change should be more drastic when running in a replicated environment with network latency (I am using local dev right now). See the sketch after this list for where these knobs live.
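
A rough sketch of where these knobs live in plain Kafka Streams properties, assuming Kafka Streams 3.4+ (where statestore.cache.max.bytes replaced cache.max.bytes.buffering); the class and the values are illustrative, not LH's actual config wiring:

import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

public class TunedStreamsProps {
    public static Properties build() {
        Properties props = new Properties();
        // Longer commit interval: fewer commit/flush cycles and better throughput,
        // at the cost of more reprocessing after a failure.
        props.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 30_000);
        // Total on-heap record cache shared by all stream threads on this instance.
        props.put(StreamsConfig.STATESTORE_CACHE_MAX_BYTES_CONFIG, 64 * 1024 * 1024L);
        // Number of processing threads for this Streams instance.
        props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 2);
        return props;
    }
}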

I think this PR is ready to review on its own. We should define follow-up issues to:

  • Enable jemalloc on our docker image
  • Determine recommended settings for the various cache sizes based on a given amount of memory.

My profiling has confirmed that this ticket accomplishes what it needed to do—power users can mostly limit the total on-heap and off-heap memory by tuning configurations, and we also improved performance slightly in the case where more resources are provided to the server.

The only danger of this PR is that the new default configurations could result in poorer performance than the old defaults, because for certain types of memory the old configurations did not specify any limit at all (which was dangerous, as it made OOMs more likely to occur).

@coltmcnealy-lh force-pushed the 478-intelligently-allow-user-to-set-streams-configurations-with-smart-guardrails branch from 07880ec to ffcfc8d on January 29, 2024 19:01
// Need to inject the LHServerConfig, but Kafka Streams requires we pass in a Class with
// a default constructor. So the only way to do that is to have a static singleton which
// we set elsewhere in the code.
public static LHServerConfig serverConfig;
@coltmcnealy-lh (Member Author):

@mjsax this appears to be the only way to "inject" configuration into the RocksDBConfigSetter implementation. Would it be worth a KIP to add a method:

KafkaStreams#setRocksDBConfigurator(RocksDBConfigSetter otterSetter)

The method should throw IllegalStateException if it is called after KafkaStreams#start(), or if the rocksdb.config.setter parameter is set on the properties.

@mjsax:

Don't think it's the only way to inject configs.

There is RocksDBConfigSetter.setConfig(..., configs), which takes the Kafka Streams config map as its third parameter. When you create StreamsConfig you pass in Map<?, ?> props, and all of its entries will be contained in configs.

Thus, you can take LHServerConfig and add all of it to props before creating StreamsConfig, and you should have access to all of it in setConfig(...).
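
A minimal sketch of that pattern; the key "lh.server.config" and the class InjectedConfigSetter are hypothetical names, and LHServerConfig is stood in for by Object to keep the sketch self-contained:

import java.util.Map;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.state.RocksDBConfigSetter;
import org.rocksdb.Options;

public class InjectedConfigSetter implements RocksDBConfigSetter {

    // Hypothetical key: any extra entry in the props map survives into
    // setConfig()'s `configs` argument untouched.
    public static final String LH_CONFIG_KEY = "lh.server.config";

    // Caller side: stash the server config in props before building StreamsConfig.
    public static StreamsConfig buildStreamsConfig(Object lhServerConfig, Map<String, Object> props) {
        props.put(StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG, InjectedConfigSetter.class);
        props.put(LH_CONFIG_KEY, lhServerConfig);
        return new StreamsConfig(props);
    }

    @Override
    public void setConfig(final String storeName, final Options options, final Map<String, Object> configs) {
        // The same props map arrives here, so the server config can be pulled back out.
        Object serverConfig = configs.get(LH_CONFIG_KEY);
        // ...apply serverConfig-driven RocksDB tuning to `options`...
    }

    @Override
    public void close(final String storeName, final Options options) {
        // No per-store resources to release in this sketch.
    }
}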

@coltmcnealy-lh (Member Author):

@mjsax Thanks so much! This works great. Much better than submitting a KIP to the Apache Kafka Debate Club 😄

CC @eduwercamacaro let's refactor this tomorrow.

@coltmcnealy-lh merged commit c79af27 into master Jan 31, 2024
12 checks passed
@coltmcnealy-lh deleted the 478-intelligently-allow-user-to-set-streams-configurations-with-smart-guardrails branch January 31, 2024 04:26
Successfully merging this pull request may close these issues.

Intelligently allow user to set Streams configurations with smart guardrails.