[RFC] [Resource Access Control] Finalizing the code design #5062

Open
DarshitChanpura opened this issue Jan 27, 2025 · 3 comments
Labels
resource-permissions (Label to track all items related to resource permissions), triaged (Issues labeled as 'Triaged' have been reviewed and are deemed actionable.)

Comments

DarshitChanpura (Member) commented Jan 27, 2025

Background

With respect to Resource Access Control, the Doc-Level Security (DLS) approach has been updated in PR #5016. Recently, the plan shifted from implementing abstract APIs in OpenSearch core to modifying the Security plugin so that it automatically invokes resource access control for relevant indices. However, this method also has drawbacks around thread exhaustion. To guide the final decision, three primary approaches have been considered, each described below with its advantages and limitations.


1. Terms Lookup Query

Description
Use a Terms Lookup Query (TLQ) to dynamically fetch resource-sharing information from a separate index, then match requested resource IDs against those entries.

Advantages

  • Native Query: Uses standard OpenSearch query features, so minimal custom logic is required.
  • Simple Integration: If resource-sharing data is already in a single document, TLQ is straightforward to set up.
  • Built-in Caching: TLQ benefits from the caching and query optimizations provided by OpenSearch.

Limitations

  • Single Document Constraint: TLQ requires all resource IDs for a user (or resource) to be in a single document, which is rarely feasible for real-world data scattered across multiple docs.
  • Scalability Issues: Merging large sets of resource IDs into one document can become unwieldy, leading to performance or storage problems.
  • Narrow Applicability: If the resource-sharing model is more complex, TLQ quickly becomes impractical as a generic solution.
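
To make the single-document constraint concrete, here is a minimal sketch of the search body a terms lookup would need. The index name, lookup document ID, and field path are hypothetical; only the `terms` lookup shape (`index`, `id`, `path`) comes from the OpenSearch query DSL. Note that the query can point at exactly one lookup document, which is why all visible resource IDs would have to live there.

```java
// Sketch only: the index, document ID, and path passed in are hypothetical names.
// Inputs are assumed to be trusted identifiers; real code would use a JSON library.
final class TermsLookupSketch {

    /**
     * Builds a search body that keeps only documents whose _id appears in the
     * array stored at `path` inside ONE lookup document.
     */
    static String visibleResourcesQuery(String sharingIndex, String lookupDocId, String path) {
        return "{\n"
            + "  \"query\": {\n"
            + "    \"terms\": {\n"
            + "      \"_id\": {\n"
            + "        \"index\": \"" + sharingIndex + "\",\n"
            + "        \"id\": \"" + lookupDocId + "\",\n"
            + "        \"path\": \"" + path + "\"\n"
            + "      }\n"
            + "    }\n"
            + "  }\n"
            + "}";
    }
}
```

For example, `TermsLookupSketch.visibleResourcesQuery("resource-sharing", "user-alice", "resource_ids")` assumes a document `user-alice` whose `resource_ids` array holds every resource ID visible to that user.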

2. In-Memory Map

Description
Load resource-sharing configuration into an in-memory map—similar to how the Security plugin loads its main security configuration. This map would be updated in near real-time whenever resource-sharing information changes (e.g., new resources or updated permissions).

Advantages

  • Fast Lookups: In-memory data structures can offer very quick read performance.
  • Direct Integration: Follows the same pattern as existing Security config, which is already well understood.
  • Low Runtime Query Overhead: No need to perform frequent index lookups if all sharing data is already in memory.

Limitations

  • Frequent Updates: Resource-sharing data can change often (user grants/revokes), leading to continuous map updates.
  • Scalability & Distribution: Synchronizing frequent changes across a cluster can become a bottleneck and risks denial of service (DoS) if updates spike.
  • Operational Complexity: Requires robust mechanisms to keep the in-memory map consistent across all nodes.
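
As a rough illustration of this option, the sketch below keeps sharing data in a concurrent map that is mutated on every grant and revoke. The class and method names are hypothetical, not Security plugin APIs, and the sketch deliberately leaves out the hard part: propagating each update to every node.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Sketch only: hypothetical names; cluster-wide propagation is not shown.
final class InMemorySharingMap {

    // resourceId -> users the resource is shared with
    private final Map<String, Set<String>> sharedWith = new ConcurrentHashMap<>();

    /** Would have to be applied on every node whenever a grant is written. */
    void share(String resourceId, String user) {
        sharedWith.compute(resourceId, (id, users) -> {
            Set<String> updated = (users == null) ? ConcurrentHashMap.<String>newKeySet() : users;
            updated.add(user);
            return updated;
        });
    }

    /** Would have to be applied on every node whenever a grant is revoked. */
    void revoke(String resourceId, String user) {
        sharedWith.computeIfPresent(resourceId, (id, users) -> {
            users.remove(user);
            return users.isEmpty() ? null : users; // returning null drops the empty entry
        });
    }

    /** Pure in-memory read: no index lookup on the request path. */
    boolean isSharedWith(String resourceId, String user) {
        Set<String> users = sharedWith.get(resourceId);
        return users != null && users.contains(user);
    }
}
```

Reads stay cheap, but each `share` or `revoke` is a write that all nodes must observe, which is where the update-frequency concern comes from.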

3. Plugins Make API Calls

Description
Expose new APIs that other plugins can call whenever they need to check if a user has access to a given resource. These APIs handle the logic for determining resource access, potentially using the DLS approach behind the scenes or another method.

Advantages

  • Flexibility: Can be applied to resources stored in an index or elsewhere (e.g., external systems).
  • Centralized Logic: Minimizes the risk of misconfiguration by consolidating access checks in one place.
  • Extensibility: Provides a uniform interface, making it easier to evolve or integrate new resource types in the future.

Limitations

  • Implementation Complexity: Requires designing and maintaining well-defined, backward-compatible APIs.
  • Human Error: Plugin developers must remember to call these APIs correctly and consistently.
  • Performance Overheads: Multiple API calls could introduce latency, especially under high load.
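
A minimal sketch of what this option could look like from a plugin's side, assuming a hypothetical `ResourceAccessChecker` interface exposed by security (no such interface is defined in this issue). It also shows where the "human error" limitation comes from: the check only protects code paths that actually call it.

```java
import java.util.Map;
import java.util.Optional;

// Sketch only: ResourceAccessChecker and SampleResourceHandler are hypothetical names.
interface ResourceAccessChecker {
    boolean hasAccess(String resourceId, String user);
}

final class SampleResourceHandler {
    private final Map<String, String> resources; // stand-in store: resourceId -> content
    private final ResourceAccessChecker checker;

    SampleResourceHandler(Map<String, String> resources, ResourceAccessChecker checker) {
        this.resources = resources;
        this.checker = checker;
    }

    /** The plugin must repeat this check on every path that returns a resource. */
    Optional<String> get(String resourceId, String user) {
        if (!checker.hasAccess(resourceId, user)) {
            return Optional.empty();
        }
        return Optional.ofNullable(resources.get(resourceId));
    }
}
```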

Conclusion

  1. Terms Lookup Query is too restrictive due to the single-document requirement.
  2. In-Memory Map could create scalability issues with frequent updates.
  3. Plugins Make API Calls is more flexible and extensible, albeit with higher implementation complexity and reliance on proper usage by plugin developers.

Feedback from plugin developers and end users will help guide the final choice. Each approach has trade-offs in terms of performance, maintainability, and extensibility.

DarshitChanpura added the resource-permissions label Jan 27, 2025
github-actions bot added the untriaged label Jan 27, 2025

DarshitChanpura (Member, Author) commented

@cwperks @nibix @reta We should pour in our thoughts and finalize the approach here.

DarshitChanpura changed the title from "[Resource Access Control] Finalizing the code design" to "[RFC] [Resource Access Control] Finalizing the code design" Jan 27, 2025
cwperks added the triaged label and removed the untriaged label Jan 27, 2025

cwperks (Member) commented Jan 27, 2025

@DarshitChanpura I think we should expand on each option in a decision doc with more detail and capture on GitHub some of what's been discussed in person.

As I see it, we have discussed 2 main approaches, but a 3rd one is now coming into view that takes into account https://github.com/opensearch-project/opensearch-remote-metadata-sdk/, where resource metadata could be stored outside of OpenSearch.

In the first 2 designs, we were operating under the assumption that resource metadata is stored in a system index. In these designs, a plugin developer only needs to tell the security plugin which index the sharable resources are stored in, and security handles everything else. Plugins would be unaware of whether the cluster is running with security, because the same code is written for both cases.

Those 2 options are:

  1. Store the resource_user and shared_with info with the resource metadata (similar to how it's done today, but standardized and with security being the one to write and control this info)
  2. Centrally store sharing information in an index for all resource types

Definitions:

  1. resource_user - The creator of the resource
  2. shared_with - A data structure that contains sharing info (and will be designed to support resource authorization as well, which allows the sharer to specify the level of access when sharing)

With shared_with, there are a few different ways that resources can be shared. @DarshitChanpura has been referring to this as Recipient Type:

  1. users - Direct share by username
  2. role - Sharing based on the mapped roles (Roles contained in the security index)
  3. backend_role - This is pertinent to SSO users and these are roles from the backend identity provider.

Each shared_with would also be associated with an action group to specify the level of access that the target group of recipients has to the sharable resource.

Conditions for sharing

With resource sharing, there are 2 conditions in which a resource is visible to the authenticated user.

  1. The authenticated user is the resource owner
  2. The resource has been shared with the authenticated user (either via username, role or backend_role) at any access level
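
The definitions and the two visibility conditions above can be sketched as a small data type. This is an illustrative model only: the class, field, and method names are hypothetical, and action groups (access levels) are left out to keep the check readable.

```java
import java.util.Collections;
import java.util.Set;

// Sketch only: hypothetical types; access levels (action groups) are omitted.
final class SharingInfo {
    final String resourceUser;      // creator of the resource
    final Set<String> users;        // shared_with: direct shares by username
    final Set<String> roles;        // shared_with: mapped roles from the security index
    final Set<String> backendRoles; // shared_with: roles from the backend identity provider

    SharingInfo(String resourceUser, Set<String> users, Set<String> roles, Set<String> backendRoles) {
        this.resourceUser = resourceUser;
        this.users = users;
        this.roles = roles;
        this.backendRoles = backendRoles;
    }

    boolean isVisibleTo(String username, Set<String> userRoles, Set<String> userBackendRoles) {
        // Condition 1: the authenticated user is the resource owner.
        if (resourceUser.equals(username)) {
            return true;
        }
        // Condition 2: shared via username, role, or backend_role.
        return users.contains(username)
            || !Collections.disjoint(roles, userRoles)
            || !Collections.disjoint(backendRoles, userBackendRoles);
    }
}
```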

1. Store resource owner and sharing info w/ the resource metadata

In this approach, there must be a way for security to write the resource_user and shared_with info to the resource metadata document. Ideally, this data is protected such that only the security plugin can make updates to these fields.

  1. Search Request - When a plugin makes a search request, security would perform DLS behind the scenes by adding a term-level query so that only documents are returned where either 1) the authenticated user is the owner or 2) the resource has been shared with the authenticated user
  2. Get Request - For a Get Request, we would need to ensure that the authenticated user can only get documents that meet at least one of the two conditions described above
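
A rough sketch of the filter security could attach to a search request in this approach. The field names (`resource_user`, `shared_with.users`, `shared_with.roles`, `shared_with.backend_roles`) are an assumed document layout, not a finalized schema; only the `bool`/`should`/`term`/`terms` query shapes are standard query DSL.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Sketch only: field names are an assumed layout; values are not JSON-escaped.
final class OwnerOrSharedFilter {

    /** Matches documents the user owns OR that are shared with the user. */
    static String build(String username, List<String> roles, List<String> backendRoles) {
        List<String> clauses = new ArrayList<>();
        clauses.add(term("resource_user", username));
        clauses.add(term("shared_with.users", username));
        if (!roles.isEmpty()) {
            clauses.add(terms("shared_with.roles", roles));
        }
        if (!backendRoles.isEmpty()) {
            clauses.add(terms("shared_with.backend_roles", backendRoles));
        }
        return "{\"bool\":{\"should\":[" + String.join(",", clauses)
            + "],\"minimum_should_match\":1}}";
    }

    private static String term(String field, String value) {
        return "{\"term\":{\"" + field + "\":\"" + value + "\"}}";
    }

    private static String terms(String field, List<String> values) {
        String joined = values.stream().map(v -> "\"" + v + "\"").collect(Collectors.joining(","));
        return "{\"terms\":{\"" + field + "\":[" + joined + "]}}";
    }
}
```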

2. Centrally storing sharing information in an index for all resource types (preferred)

Similar to 1, but centrally stored in a single index for all resource sharing info across the cluster. In this model, it can be assumed that this information is safe from being overridden because security owns the index, but it does introduce complications when plugins perform Search Requests and Get Requests on their resource indices.

  1. Search Request - When a plugin makes a search request, security first needs to obtain the docIDs of the resources visible to the authenticated user. After the docIDs are collected, security will ensure that the search request can only be performed on those docIDs
  2. Get Request - Similar to above, but if the docID is not contained in the list then security can fail the Get Request since the resource either doesn't exist or is not visible to the authenticated user
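
The two steps above can be sketched as follows, with an in-memory map standing in for the central sharing index. All names here are hypothetical. The `ids` query shape is standard query DSL, and the Get path returns the same empty result whether the document is missing or simply not visible.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Sketch only: a map stands in for the central sharing index (user -> visible docIDs).
final class CentralSharingGate {
    private final Map<String, Set<String>> visibleIdsByUser;

    CentralSharingGate(Map<String, Set<String>> visibleIdsByUser) {
        this.visibleIdsByUser = visibleIdsByUser;
    }

    Set<String> visibleIds(String user) {
        return visibleIdsByUser.getOrDefault(user, Set.of());
    }

    /** Search: restrict the plugin's request to the collected docIDs. */
    String idsFilter(String user) {
        String ids = new TreeSet<>(visibleIds(user)).stream() // sorted for stable output
            .map(id -> "\"" + id + "\"")
            .collect(Collectors.joining(","));
        return "{\"ids\":{\"values\":[" + ids + "]}}";
    }

    /** Get: fail the same way for "does not exist" and "not visible". */
    Optional<String> get(String user, String docId, Map<String, String> resourceIndex) {
        if (!visibleIds(user).contains(docId)) {
            return Optional.empty();
        }
        return Optional.ofNullable(resourceIndex.get(docId));
    }
}
```

The extra round trip to collect docIDs before every search is the complication this approach introduces compared with storing sharing info on the resource document itself.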

Considerations for resource metadata stored outside of OpenSearch

With https://github.com/opensearch-project/opensearch-remote-metadata-sdk/, there is an effort to abstract away metadata storage for plugins and allow metadata to be stored outside of OpenSearch. With that in mind, I think the design for resource sharing should work regardless of whether 1) resource metadata is stored in OpenSearch or 2) resource metadata is stored outside of OpenSearch.

I like the idea of extending the concept of DLS to remote stores, but I'm not sure how best to design that. One thing I was thinking about was whether to give plugin developers a mechanism for obtaining the IDs of resources visible to the authenticated user, leaving it up to the plugin developer to use them appropriately.

For example, a plugin could make a call similar to:

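// This is pseudo-code; ResourceSharingService is a proposed interface, not an existing API
ResourceSharingService<SampleResource> sharingService; // assigned only if the security plugin is installed

// null means security is not installed, so no filtering is applied
Set<String> visibleResourceIds = null;

if (sharingService != null) {
    // Supports pagination?
    visibleResourceIds = sharingService.getResourceIdsForCurrentUser();
}

SearchRequest searchReq = new SearchRequest(resourceIndex);
if (visibleResourceIds != null) {
    // plugin dev is responsible for adding the filter here, e.g. with an ids query
    searchReq.source(new SearchSourceBuilder().query(
        QueryBuilders.idsQuery().addIds(visibleResourceIds.toArray(new String[0]))));
}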

If the plugin uses a remote store for resource metadata, then the plugin developer can decide how to apply the resource IDs appropriately.

The security plugin will also need hooks for when sharable resources are created or deleted.
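
One possible shape for such hooks, sketched with hypothetical names (nothing like `ResourceLifecycleListener` is defined in this issue): the plugin notifies a listener on create and delete, and security keeps its ownership records in sync.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch only: hypothetical hook interface that a plugin would invoke.
interface ResourceLifecycleListener {
    void onResourceCreated(String resourceId, String creator);
    void onResourceDeleted(String resourceId);
}

// Hypothetical security-side implementation that tracks resource owners.
final class SharingRecordKeeper implements ResourceLifecycleListener {
    private final Map<String, String> ownerByResource = new ConcurrentHashMap<>();

    @Override
    public void onResourceCreated(String resourceId, String creator) {
        ownerByResource.put(resourceId, creator); // creator becomes resource_user
    }

    @Override
    public void onResourceDeleted(String resourceId) {
        ownerByResource.remove(resourceId); // avoid stale sharing records
    }

    /** Returns the recorded owner, or null if the resource is unknown. */
    String ownerOf(String resourceId) {
        return ownerByResource.get(resourceId);
    }
}
```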

nibix (Collaborator) commented Feb 3, 2025

A couple of additions:

Avoiding "human error"

The issue lists the following as a limitation of the last option, "3. Plugins Make API Calls":

"Human Error: Plugin developers must remember to call these APIs correctly and consistently."

Even though the DLS approaches reduce the risk of security issues caused by misusing the provided concepts, they do not eliminate it. The DLS approaches always assume that a resource corresponds to exactly one document in an index. There are many conceivable cases where this does not hold. For example, the alerting plugin has a concept called "alert comments" where the plugin implementation needs to do more checks to ensure authorized access: https://github.com/opensearch-project/alerting/blob/main/alerting/src/main/kotlin/org/opensearch/alerting/transport/TransportIndexAlertingCommentAction.kt ... Thus, this risk applies to a certain extent to all approaches.

DLS

It is already mentioned in the issue, but I'd like to emphasize this: DLS has quite a few limitations which make DLS-based approaches challenging and expensive to achieve.

  • Enforcing DLS rules that depend on cross-index information is not possible with the classic, Lucene-based DLS. It can be achieved with a terms lookup query and filter-level DLS. However, filter-level DLS is also subject to a couple of limitations.
  • DLS only applies to read operations. A resource access control mechanism, however, also needs to control write operations (so that a user cannot manipulate or overwrite resources they do not own). It might be possible to achieve a limited DLS for write operations, but this needs additional research, design and problem solving. For each write operation, we need to consider how access controls would be enforced.
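
To illustrate the write-side gap, here is a minimal sketch of an explicit check that a read-only filter does not provide. The `WriteGuard` class and its two access levels are hypothetical and much simpler than the action groups discussed earlier in the thread.

```java
import java.util.Map;

// Sketch only: hypothetical guard with a simplified READ/WRITE access model.
final class WriteGuard {
    enum Access { READ, WRITE }

    private final Map<String, String> ownerByResource;          // resourceId -> owner
    private final Map<String, Map<String, Access>> grantsByResource; // resourceId -> user -> level

    WriteGuard(Map<String, String> ownerByResource, Map<String, Map<String, Access>> grantsByResource) {
        this.ownerByResource = ownerByResource;
        this.grantsByResource = grantsByResource;
    }

    /** Being able to see a resource (READ) must not imply being able to modify it. */
    boolean canWrite(String resourceId, String user) {
        if (user.equals(ownerByResource.get(resourceId))) {
            return true;
        }
        Map<String, Access> grants = grantsByResource.getOrDefault(resourceId, Map.of());
        return grants.get(user) == Access.WRITE;
    }
}
```

A check like this would have to run before each index, update, and delete operation on a resource, which is the per-operation design work the bullet above refers to.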
