Is it a problem to cache latest? #154
Hey,

We are using [...] We hadn't noticed that the [...]

```ruby
# No caching (latest)
10.times.map { with_execution_time_ms { avro.encode(data, subject: subject, version: 'latest') } }
# => [23.79, 15.32, 7.06, 6.36, 5.22, 4.66, 5.12, 5.44, 4.96, 5.34]

# With cache (version 1)
10.times.map { with_execution_time_ms { avro.encode(data, subject: subject, version: '1') } }
# => [7.46, 0.33, 0.21, 0.21, 0.19, 0.16, 0.2, 0.16, 0.22, 0.16]
```

There are different approaches to solving this problem (for example, pre-loading all "latest" schemas to get their IDs and using those in subsequent calls), but we are currently testing a monkey patch, since that lets us control whether or not "latest" is cached. Here is the current implementation we are working on:

```ruby
module AvroTurfPatcher
  class << self
    # This allows us to avoid sending an HTTP request to get the "latest" version of an Avro schema
    DEFAULT_SKIP_LATEST_CACHE = false

    def initialize!
      patch_avro_turf!
    end

    private

    def patch_avro_turf!
      patch_registry
      patch_messaging
    end

    def patch_messaging
      ::AvroTurf::Messaging.class_eval do
        def fetch_schema(subject:, version: 'latest', skip_latest_cache: DEFAULT_SKIP_LATEST_CACHE)
          schema_data = @registry.subject_version(subject, version, skip_latest_cache: skip_latest_cache)
          schema_id = schema_data.fetch('id')
          schema = Avro::Schema.parse(schema_data.fetch('schema'))
          [schema, schema_id]
        end
      end
    end

    def patch_registry
      ::AvroTurf::CachedConfluentSchemaRegistry.class_eval do
        def subject_version(subject, version = 'latest', skip_latest_cache: true)
          # Bypass the cache for "latest" only when explicitly asked to
          return @upstream.subject_version(subject, version) if version == 'latest' && skip_latest_cache

          @cache.lookup_by_version(subject, version) ||
            @cache.store_by_version(subject, version, @upstream.subject_version(subject, version))
        end
      end
    end
  end
end
```

Just wondering if something like this would be accepted in a PR, or at least some notes in README.md, so that people using the library in high-load environments are aware that every Kafka message produced is accompanied by an HTTP request when "latest" is used as the version of the subject.

Thanks for your attention!
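For comparison, the pre-loading approach mentioned above could look roughly like the sketch below. It is only an illustration: `FakeRegistry` and `PinnedVersions` are hypothetical names, and the fake registry merely mimics the `subject_version(subject, version)` call shape used in the patch, assuming the response hash carries a `'version'` key alongside `'id'`.

```ruby
# Hypothetical stand-in for a schema registry client. It mimics the call shape
# subject_version(subject, version) and counts how often it is hit.
class FakeRegistry
  attr_reader :requests

  def initialize(latest_versions)
    @latest_versions = latest_versions # e.g. { 'users-value' => 3 }
    @requests = 0
  end

  def subject_version(subject, version = 'latest')
    @requests += 1
    resolved = version == 'latest' ? @latest_versions.fetch(subject) : Integer(version)
    # Assumed response shape: a hash with 'subject', 'version' and 'id' keys.
    { 'subject' => subject, 'version' => resolved, 'id' => 100 + resolved }
  end
end

# Resolves "latest" once per subject (for example at boot) and remembers the
# concrete version, so later calls can pass a cacheable version number.
class PinnedVersions
  def initialize(registry, subjects)
    @versions = subjects.to_h do |subject|
      [subject, registry.subject_version(subject, 'latest').fetch('version').to_s]
    end
  end

  def version_for(subject)
    @versions.fetch(subject)
  end
end

registry = FakeRegistry.new('users-value' => 3)
pinned = PinnedVersions.new(registry, ['users-value'])
pinned.version_for('users-value') # => "3"
```

The pinned value would then be passed as `version:` in place of `'latest'`, so the existing by-version cache applies. The trade-off is that a newly registered schema version is only picked up after the process restarts.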
Replies: 1 comment
I don't think it makes sense to cache `latest` in the general case: your setup would require explanation for new hires, and isn't intuitive at all. If I were you I'd communicate the latest version explicitly, e.g. through an ENV variable, so that changing it requires a redeploy of your applications.
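
As a rough sketch of the ENV-variable suggestion, the version could be read once from the environment and passed explicitly. The helper name `schema_version_for` and the `SCHEMA_VERSION_<SUBJECT>` naming scheme are assumptions made for illustration, not part of the library.

```ruby
# Hypothetical helper: reads the pinned schema version for a subject from the
# environment. The SCHEMA_VERSION_<SUBJECT> key format is an assumed convention.
def schema_version_for(subject, env: ENV)
  key = "SCHEMA_VERSION_#{subject.upcase.gsub(/[^A-Z0-9]/, '_')}"
  version = env.fetch(key) { raise KeyError, "#{key} is not set" }
  # Fail fast on non-numeric values such as "latest".
  Integer(version, 10).to_s
end

env = { 'SCHEMA_VERSION_USERS_VALUE' => '3' }
schema_version_for('users-value', env: env) # => "3"

# Usage with the encoder from the question (not executed here):
# avro.encode(data, subject: 'users-value', version: schema_version_for('users-value'))
```

Because the value is a concrete version number, the by-version cache applies, and moving to a newer schema becomes an explicit configuration change.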