RFC: Append-only Indices #12886
Comments
Thanks @sarthakaggarwal97 for the proposal. +1 on all the optimizations which can be applied under the hood for append-only indices. Recently I found that the Security plugin provides a way to configure immutable indices, and the definition looks similar.
FYI, so we can ensure there isn't any conflict between the two in terms of implementation later. A couple of questions:
1. Without the definition of an alias (pointing to the index) from users, automated rollover can't work. With data streams it is possible, as the data stream itself provides that logical construct on which searches and indexing can be performed.
2. Though not a lot of users use custom doc ids with time series workloads, some may still use them, and they can be helpful in establishing consistency and debugging issues across their systems. Such users would not benefit from append-only semantics as such, since the version map would otherwise still be needed for request id (doc id) idempotency.
3. Auto tiering/migration should be supported irrespective of append-only or updates. It would definitely be more efficient for append-only indices, but it can work for indices which take updates/deletes as well, and other factors, like update frequency, would determine the efficiency.
@shwetathareja thank you for your comments!
Yes, I came across the immutable indices. I agree that we should keep the implementations in sync to avoid conflicts. Moreover, we can also look backwards and see if immutable indices can benefit from append-only semantics in OpenSearch core. Will have to dive into the Security plugin's implementation to do that.
Data streams allow all CRUD operations on their documents. Some of the suggested semantics/optimizations for append-only indices would, I believe, be expensive for data streams.
This is a valid point. If we want to support this, we should look to mandate an alias in case the user wants to perform automated rollovers with append-only indices.
Denying custom ids helps us avoid shard skews (see the routing sketch after this comment): with skewed custom ids we may run into one shard / one segment becoming a hotspot, and even with merge optimizations in place, I am not sure if we should allow this. I would like more input on this from the community.
Agreed, I am aligned on this. We should provide a ready-to-go template to allow for append-only indices. Tagging @mgodwan to give more insights.
Yes, this was added to highlight that append-only indices would be able to handle auto-tiering efficiently, since we won't be allowing updates, and can also create bigger segments after a point without worrying about updates.
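A minimal sketch of the routing behaviour behind the shard-skew point above, assuming default routing with no custom routing key and ignoring routing partitions (the wrapper class is hypothetical; Murmur3HashFunction is the hash used by OpenSearch routing):

```java
import org.opensearch.cluster.routing.Murmur3HashFunction;

// The shard is derived from the document id, so uniformly random
// auto-generated ids spread evenly across shards, while skewed custom
// ids can concentrate documents (and merge work) on a few shards.
final class RoutingSketch {
    static int shardFor(String docId, int numberOfShards) {
        // Simplified: real routing also accounts for routing partitions
        // and the number of routing shards.
        return Math.floorMod(Murmur3HashFunction.hash(docId), numberOfShards);
    }
}
```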
@sarthakaggarwal97 it looks to me that this is already possible with data streams?
@reta I just tried it, and it looks like we can update the documents in the backing indices of a data stream. So the backing indices are not truly append-only.
@sarthakaggarwal97 thank you, that's by design (and in accordance with the documentation). I am wondering if providing the capability to have truly append-only backing indices would be a natural improvement over data streams (in scope of this feature proposal)?
@sarthakaggarwal97 append-only semantics should be applicable to data streams too, as the DS abstraction is meant for time series workloads. In order not to have breaking changes, this should still be driven by configuration. Be it regular indices or data-stream-backed indices, append-only semantics should work for both.
Thanks @shwetathareja and @reta for the feedback. Append-only indices will be supported through a configurable setting. If this setting is enabled, all update and delete operations on the index (UPDATE, DELETE, UPSERT, UPDATE BY QUERY, DELETE BY QUERY, etc.) will be blocked. Initially, we will also block indexing a document with a custom id, to ensure an index operation will always create a new document rather than updating an existing one. Additionally, append-only indices will not be enabled for data streams in the initial phase, although they can be extended to these use cases in the future.
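A minimal sketch of what such a configuration could look like; the setting name, class, and wiring here are all hypothetical, not the final design:

```java
import org.opensearch.action.DocWriteRequest;
import org.opensearch.common.settings.Setting;
import org.opensearch.common.settings.Setting.Property;

// Hypothetical sketch: an index-scoped, final (fixed at creation) setting,
// plus a per-operation check applied while validating bulk request items.
final class AppendOnlySketch {

    static final Setting<Boolean> INDEX_APPEND_ONLY_ENABLED =
        Setting.boolSetting("index.append_only.enabled", false, Property.IndexScope, Property.Final);

    static void validate(boolean appendOnly, DocWriteRequest.OpType opType, String customId) {
        if (!appendOnly) {
            return;
        }
        if (opType == DocWriteRequest.OpType.UPDATE || opType == DocWriteRequest.OpType.DELETE) {
            // Direct updates/deletes; the by-query variants replay as
            // per-document writes and would hit the same checks (assumption).
            throw new IllegalArgumentException("update and delete operations are blocked on an append-only index");
        }
        if (customId != null) {
            // Custom ids blocked initially, so every index op creates a new document.
            throw new IllegalArgumentException("custom document ids are blocked on an append-only index");
        }
    }
}
```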
Thanks @RS146BIJAY , I would expect it to be per index upon creation, right? (aka an index setting).
How would retries work here?
Since the proposal is to have an index setting to configure this, do you see issues with users trying to configure it for the backing indices of data streams?
Yes.
Retries will be separate update requests for OpenSearch, right? So this will reject each of the update requests. This is similar to when a write block is present on the index and the user tries to index documents via retries: since writes are blocked, every write request is rejected. I am thinking of implementing this configuration so that it blocks all forms of updates on an index.
Thanks @RS146BIJAY ! It is reasonable to start with blocking all operations except creation of documents on regular indices. I am fine with solving data streams in the next phase, but wanted to know if there are specific challenges with data streams, as a DS is an abstraction over backing indices which are like regular indices except for certain restrictions. Regarding handling of failure scenarios including retries, it would be good to capture those explicitly as you start working on the low-level design. I agree with @mgodwan that internal retries should be handled properly, as opposed to explicit retries from the user. In the case of auto-generated document ids, any retry from the user will be treated as a new document (which is the same as today).
@RS146BIJAY Are you planning to introduce finer blocks at the operation level, similar to the read/write blocks at the index level?
On a high level, I do not see any specific issue with data streams, but I still need to validate whether this can break any specific use case.
Nope. Sorry for earlier confusion. There is a single
Thanks @mgodwan for pointing out this scenario. Need to deep dive a bit more on how to handle retry scenarios, as we are not allowing custom doc ids to be passed.
For coordinator retries

In order to handle retries from the coordinator, the following approaches were considered.

Approach 1: Handling retries at the coordinator layer itself

A straightforward method to handle retries from the coordinator for append-only indices is to filter out documents that have already been indexed in OpenSearch from subsequent retries. This can be achieved by making an additional transport call, before retrying, to search for all the documents passed in the bulk request using their respective document ids. If there is an entry for a document in the search response, we can exclude that document from the retry call. This ensures OpenSearch will try to index a single document (with one document id) only once. We can use the responses from the search API calls to reconstruct the final response of the bulk API call for these documents.

Issue with this approach: it requires an additional transport call on every retry.
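A rough sketch of Approach 1's filtering step, assuming a multi-get by document id stands in for the search call described above (class and method names are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

import org.opensearch.action.get.MultiGetItemResponse;
import org.opensearch.action.get.MultiGetRequest;
import org.opensearch.action.get.MultiGetResponse;
import org.opensearch.action.index.IndexRequest;
import org.opensearch.client.Client;

final class CoordinatorRetryFilter {

    // Returns only the requests whose ids are not yet present in the index.
    static List<IndexRequest> filterAlreadyIndexed(Client client, String index, List<IndexRequest> toRetry) {
        MultiGetRequest mget = new MultiGetRequest();
        for (IndexRequest request : toRetry) {
            mget.add(index, request.id());
        }
        MultiGetResponse responses = client.multiGet(mget).actionGet(); // the extra transport call

        List<IndexRequest> remaining = new ArrayList<>();
        int i = 0;
        for (MultiGetItemResponse item : responses.getResponses()) {
            IndexRequest original = toRetry.get(i++);
            if (item.isFailed() || !item.getResponse().isExists()) {
                remaining.add(original); // not indexed yet, safe to retry
            }
            // else: already indexed; its bulk response is reconstructed from the get response
        }
        return remaining;
    }
}
```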
Approach 2: Handling retries at data node layer (at
Adding a POC link for append-only indices with the third approach for retry handling: RS146BIJAY@5334da3
@reta @shwetathareja @mgodwan @Bukhtawar Let me know what you think of the above approach for retries.
Thanks @RS146BIJAY for the thoughts on this one. I agree with the recommended approach of handling retries at the InternalEngine layer, as this avoids transport calls while ensuring we perform operations in a thread-safe manner.
Thanks @RS146BIJAY , I am wondering if the InternalEngine is the only place to make the retry decisions? For example, what if during the retry operation the node goes down or the primary shard changes? Shouldn't the coordinator step in and retry the operation then?
Coordinator retries still occur when a node goes down, the primary shard changes, or in other failure scenarios. The concern we are addressing above is that if such a retry happens today, an update is also performed for all documents that were already indexed in the first attempt (performing a create and an update operation even for a document create request). With the above change, what we are ensuring is that in such retry scenarios, for an append-only index, the InternalEngine will reconstruct and return the response using the already indexed document (indexed in the first attempt), without updating the document in Lucene or making a translog entry. This ensures that coordinator retries do not create a new version for already indexed documents (by not allowing the update) on an append-only index. Let me know if I have addressed the concern.
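To make the behaviour concrete, here is a self-contained toy model of that engine-level check; it simulates the version map with a plain map, and none of it is the actual InternalEngine code:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy model of the append-only retry path: if the doc id already resolves to
// an indexed document, echo back its existing seqNo/version instead of writing
// to Lucene or the translog, so retries never produce a second version.
final class AppendOnlyRetrySketch {

    record DocResult(long seqNo, long primaryTerm, long version, boolean created) {}

    private final Map<String, DocResult> versionMap = new ConcurrentHashMap<>();
    private long nextSeqNo = 0;

    synchronized DocResult index(String docId) {
        DocResult existing = versionMap.get(docId); // real engine: version map first, then Lucene
        if (existing != null) {
            // Retry of an already-applied operation: reconstruct the original response.
            return new DocResult(existing.seqNo(), existing.primaryTerm(), existing.version(), false);
        }
        DocResult fresh = new DocResult(nextSeqNo++, 1, 1, true);
        versionMap.put(docId, fresh);
        return fresh;
    }
}
```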
@RS146BIJAY Approach 3 would be preferred to avoid the extra transport call. Let's say a bulk insert for docId1 succeeded, then the user sent another bulk insert for docId1 whose first attempt failed due to some issue, resulting in a retry. How will this retry attempt be handled with Approach 3?
+1 on approach 3.
Should we update the version map with the success entry only after a successful indexing operation?
Note that the problem with coordinator/user-level retries goes away with pull-based ingestion, since the primary shard directly controls whether an update is applied or not. (Of course, during recovery the primary shard needs to check each update from in-flight batches to see if they were included in the commit point.)
Thanks @msfroh @shwetathareja @Bukhtawar @reta for the feedback.
Currently, for a retry of the second request, we will reconstruct the response using the document which got indexed by the first request. Ideally, the response for the second request should have been a validation exception indicating that updates are not allowed for an append-only index, regardless of whether it is a retry. The reason for maintaining this behaviour is that, at the InternalEngine layer, there is no way to differentiate between a retry of the first request (an index call) and a retry of the second request on docId1 (an update call). Due to this issue, for any retry call (be it for an index or an update request), I am reconstructing the response using the already indexed document.
I think this may fail in some scenarios. When retrieving the version in the Engine, we query both the version map and the actual Lucene index (using a searcher) in case the version map does not have an entry for the doc id. So while getting the version for the doc id, if indexing went through, we will still receive a valid doc version for this document. If we want to update the version map atomically (only when the coordinator receives a successful response), we would probably need to update the Lucene index atomically as well. Another drawback is that this approach would require a separate transport call from the coordinator to update the version map, since the version map is maintained in the InternalEngine. I can think of three possible ways to ensure a consistent response is returned during indexing, even if there is a retry:
I think for the initial release we can keep things simple and go with Approach 1, by not allowing custom doc ids during the indexing call. Let me know if this makes sense.
Thanks @RS146BIJAY , I believe for the first case,
it is captured by @shwetathareja here #12886 (comment)
Could we make it per-bulk-request policy (if it makes sense)? Something like
I think this could be a heavy hit (since we have to fetch the whole document and do a comparison); should we plan for it by adding something like
Thanks @RS146BIJAY for exploring different approaches for sending consistent responses during failure handling. I agree with @reta: we can look into generating an internal hash of the document for comparison, and generate it during the regular parsing of the document so that it doesn't result in double parsing. Comparing documents is an expensive operation. In order to make progress here
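A small sketch of that hashing idea, assuming the hash is computed once during regular document parsing and stored alongside the doc id (all names hypothetical):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

// Hypothetical: hash the raw source once while it is parsed for indexing, so a
// retry can be classified without fetching or re-parsing the stored document.
final class DocHashSketch {

    static String contentHash(String documentSource) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                .digest(documentSource.getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new AssertionError("SHA-256 is always available", e);
        }
    }

    // Same id + same hash: an idempotent replay whose response can be reconstructed.
    // Same id + different hash: a genuine update attempt, rejected on an append-only index.
    static boolean isIdempotentReplay(String storedHash, String retriedSource) {
        return storedHash.equals(contentHash(retriedSource));
    }
}
```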
Thanks @RS146BIJAY for listing down the possible approaches. For approach 1, I think it is a good way to start as it provides a two-way-door decision. If we start by blocking the custom _id field for append-only indices (I don't know if there are many use cases which require both append-only and a custom _id?), we essentially get a request-id to _id mapping which ensures that we know whether a retry is internal or external, based on checks against data available within the InternalEngine. Approach 2 requires integration changes for users, as they may now need to handle an exception or provide an additional parameter to ensure that the expected document has been ingested into OpenSearch, and it will lead to a breaking change if we decide to solve for it through such a mechanism.
Agreed, for append-only use cases, using a custom _id may not be the first choice.
Is your feature request related to a problem? Please describe
OpenSearch today caters to various use cases like log analytics, full text search, metrics, observability, security events, etc. By default, any index created in OpenSearch allows updates and deletes on the documents ingested. While this is good for catering to the various use cases mentioned above, there are time-series-based use cases such as logs, metrics, observability, and security events which do not require update or delete operations. It is well known that updates and deletes are expensive operations: they require OpenSearch to look up existing documents, add soft deletes, and consequently cause additional work during merges, which can hinder overall performance. This cost can be avoided by restricting those operations for the use cases which don't need them. Also, there are certain optimizations (listed below) that can be applied if we know the data will not be updated or deleted (at the document level).
Disabling updates/deletes on the index documents can allow us to handle multiple use cases efficiently:
index.merge.policy.max_merged_segment (defaults to 5gb) caps the maximum size of merged segments. We would be avoiding a chunk of merges by preventing deletes and updates, and thus these huge segments will come into contention to be merged, allowing us to increase the 5gb limit.
Describe the solution you'd like
We propose to introduce the concept of append-only indices in OpenSearch to support the aforementioned use cases. With support for the restriction keeping documents immutable, we would deny any updates and deletes of documents. This will help in reducing the memory footprint of such indices (e.g. the version map) and also unlock avenues to enable optimizations and features in the future based on this restriction; one example is sketched below.
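For illustration only, a toy model of the version-map saving this restriction unlocks (real engine internals differ):

```java
import java.util.HashMap;
import java.util.Map;

// Toy model: with updates/deletes impossible, every operation is a plain
// append, so there is no version lookup and no live-version entry to keep
// in memory for each document.
final class AppendOnlyIndexingSketch {

    private final Map<String, Long> versionMap = new HashMap<>(); // mutable-index bookkeeping only

    long index(String docId, boolean appendOnly) {
        if (appendOnly) {
            return 1L; // always a fresh document: no lookup, nothing to track
        }
        // Mutable index: resolve the current version to choose between add and update.
        long next = versionMap.getOrDefault(docId, 0L) + 1;
        versionMap.put(docId, next);
        return next;
    }
}
```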
Implementation details: TBU based on community feedback.
Additional context
FAQs:
Q: How would it be different from data streams?
A: While data streams optimize the automated rollover of time series data, they still support all CRUD operations on the backing indices. With append-only mode, we aim to provide specialized optimizations and security features for such indices as core functionality.
Q: What would be the APIs/features that we will not allow for append-only indices?
A: Some initial thoughts on the APIs/features we may not be able to support are:
The _doc API will be denied for indexing documents with a custom id, and for updates and deletes.
The _split and _shrink APIs will be denied, to avoid removal of documents from the underlying shards of the source index.