Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Convert-Index-To-Remote Action for issue #808 #1302

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

wntmddus
Copy link

@wntmddus wntmddus commented Nov 2, 2024

Description

Currently Searchable snapshot features are available from Opensearch 2.6 version. But this feature itself would not make the fully automated searchable snapshot possible without convert_index_to_remote action within ism actions. Currently ISM plugin does not provide any form of restore action as part of ism actions.
Solution.
Adding convert_index_to_remote action that performs remote restore on snapshot that was created in previous action or stages and that into search nodes as REMOTE_SNAPSHOT. Only thing that need to be passed in is repository name where customer need to assign their repository first then use that to take snapshot and restore.
"convert_index_to_remote": { "properties": { "repository": { "type": "keyword" } } },
Since actions in policy are initially going through schema validation through schema in opendistro-ism-config.json file, this need to be added in here so that we can submit policy with new action.
Then Added new AttemptRestoreStep and WaitForRestoreStep to perform restore.
Then Single ConvertIndexToRemoteAction will perform AttemptRestoreStep and WaitForRestoreStep to perform restore.
Adding this functionality to the user would benefit them to have fully automated searchable snapshot feature on all of their clusters.

Documentation PR: opensearch-project/documentation-website#8638

Related Issues

Related Issue #808

Check List

[V] New functionality includes testing.
[V] New functionality has been documented.
[v] API changes companion pull request created. not required
[V] Commits are signed per the DCO using --signoff.
[V] Public documentation issue/PR created.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Seung Yeon Joo <[email protected]>

Convert-Index-To-Remote action IT added

Signed-off-by: Seung Yeon Joo <[email protected]>

Fixing IT

Signed-off-by: Seung Yeon Joo <[email protected]>

Fixing IT

Signed-off-by: Seung Yeon Joo <[email protected]>
@bowenlan-amzn
Copy link
Member

Tried to look into this error, but cannot make sense of it

org.opensearch.indexmanagement.indexstatemanagement.action.ConvertIndexToRemoteActionIT > test basic conversion to remote index FAILED
    org.junit.ComparisonFailure: expected:<[Successfully started restore for [index=convertindextoremoteactionit_index_basic]]> but was:<[Failed to start restore for [index=convertindextoremoteactionit_index_basic], cause: no permissions for [] and associated roles [own_index, all_access]]>
        at __randomizedtesting.SeedInfo.seed([82D2EDFDB4C2F9AC:D379ECB[62](https://github.com/opensearch-project/index-management/actions/runs/12626094047/job/35212093229?pr=1302#step:6:63)C9A9419]:0)
        at org.junit.Assert.assertEquals(Assert.java:117)
        at org.junit.Assert.assertEquals(Assert.java:146)
        at org.opensearch.indexmanagement.indexstatemanagement.action.ConvertIndexToRemoteActionIT$test basic conversion to remote index$2.invoke(ConvertIndexToRemoteActionIT.kt:[64](https://github.com/opensearch-project/index-management/actions/runs/12626094047/job/35212093229?pr=1302#step:6:65))
        at org.opensearch.indexmanagement.indexstatemanagement.action.ConvertIndexToRemoteActionIT$test basic conversion to remote index$2.invoke(ConvertIndexToRemoteActionIT.kt:62)
        at org.opensearch.indexmanagement.TestHelpersKt.waitFor(TestHelpers.kt:126)
        at org.opensearch.indexmanagement.TestHelpersKt.waitFor$default(TestHelpers.kt:113)
        at org.opensearch.indexmanagement.indexstatemanagement.action.ConvertIndexToRemoteActionIT.test basic conversion to remote index(ConvertIndexToRemoteActionIT.kt:62)
REPRODUCE WITH: ./gradlew ':integTest' --tests "org.opensearch.indexmanagement.indexstatemanagement.action.ConvertIndexToRemoteActionIT.test basic conversion to remote index" -Dtests.seed=82D2EDFDB4C2F9AC -Dtests.security.manager=false -Dtests.locale=mzn-IR -Dtests.timezone=America/Resolute -Druntime.java=21

It's thrown from here
https://github.com/opensearch-project/security/blob/cd1dcbde45a6c2523505360f2fa15483b170e7f5/src/main/java/org/opensearch/security/filter/SecurityFilter.java#L462-L464

But it doesn't tell what permission is missing, and all_access is shown in roles which I suppose should be able to perform this restore action.

@wntmddus
Copy link
Author

wntmddus commented Jan 7, 2025

Can I at least have permission to run workflow? it gets very hard to debug this IT error as I need permission to run workflow @bowenlan-amzn

@@ -216,6 +216,13 @@
}
}
},
"convert_index_to_remote": {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer this to be a "restore" action which clearly indicate which API it calls behind the scene. With storage_type as a param here, this suits the searchable snapshot use case.

repository would be the required parameter as you have here

snapshot name, rename pattern.., can be optional params with default values

indices should be hard-coded to the managed index itself

https://opensearch.org/docs/latest/tuning-your-cluster/availability-and-recovery/snapshots/searchable_snapshot/#create-a-searchable-snapshot-index
https://opensearch.org/docs/latest/tuning-your-cluster/availability-and-recovery/snapshots/searchable_snapshot/#create-a-searchable-snapshot-index

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reason why I chose this name is that restore can mean different things here. as Restore could just mean regular restore. Since this action is specifically designed for searchable snapshot's remote restore, I named it like this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understand, in ISM I think we follow the naming around the wrapped API pretty well.
And a normal way to inform users is by mentioning what to do using ISM in the searchable snapshot documentation page so it won't be hard to find out.

@cwperks
Copy link
Member

cwperks commented Jan 7, 2025

An error message like the one in #1302 (comment) usually indicates that a regular user is trying to perform an admin operation on a system index. I'm not familiar with this PR, but I can see errors in the logs pertaining to the security index.

? WARN ][o.o.s.p.SnapshotRestoreEvaluator] [integTest-0] cluster:admin/snapshot/restore for '.opendistro_security' as source index is not allowed
? ERROR][o.o.i.i.s.r.AttemptRestoreStep] [integTest-0] Failed to start restore for [index=convertindextoremoteactionit_index_basic], cause: no permissions for [] and associated roles [own_index, all_access]
?  org.opensearch.OpenSearchSecurityException: no permissions for [] and associated roles [own_index, all_access]


// Proceed with the restore operation
val restoreSnapshotRequest = RestoreSnapshotRequest(repository, snapshotName)
.indices("*")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May not harm to hard code this to be $indexName

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

// List snapshots matching the pattern
val getSnapshotsRequest = GetSnapshotsRequest()
.repository(repository)
.snapshots(arrayOf("$indexName*"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems this will retrieve snapshots with name started with index name, what if the snapshot is for many indexes (User has snapshot management to take a snapshot of group of indexes) and the name would not start with specific index.

I think $indexName* can be a good default value, but we better provide a param for user to specify other snapshot prefix.

.storageType(RestoreSnapshotRequest.StorageType.REMOTE_SNAPSHOT)
.renamePattern("^(.*)\$")
.renameReplacement("$1_remote")
.waitForCompletion(false)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not an expert on searchable snapshot, I can see that it won't download the data but only some cluster state, I feel safe to just wait for completion true here as it probably won't take long. And this would simplify the workflow by about half 😜

@kotwanikunal to have a second opinion on this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was debating alot on this on my mind whether to wait for it so making the thread blocking and simplify or just make it false and wait for it to complete.

@bowenlan-amzn
Copy link
Member

bowenlan-amzn commented Jan 7, 2025

@wntmddus commented on Jan 6, 2025, 4:37 PM PST:

Can I at least have permission to run workflow? it gets very hard to debug this IT error as I need permission to run workflow @bowenlan-amzn

Originally posted by @wntmddus in #1302 (comment)

I suppose only maintainer can re-run, please feel free to ping me on slack

And seems the first contribution even needs maintainer to manually approve to run checks

@bowenlan-amzn
Copy link
Member

@cwperks commented on Jan 6, 2025, 4:58 PM PST:

An error message like the one in #1302 (comment) usually indicates that a regular user is trying to perform an admin operation on a system index. I'm not familiar with this PR, but I can see errors in the logs pertaining to the security index.

? WARN ][o.o.s.p.SnapshotRestoreEvaluator] [integTest-0] cluster:admin/snapshot/restore for '.opendistro_security' as source index is not allowed
? ERROR][o.o.i.i.s.r.AttemptRestoreStep] [integTest-0] Failed to start restore for [index=convertindextoremoteactionit_index_basic], cause: no permissions for [] and associated roles [own_index, all_access]
?  org.opensearch.OpenSearchSecurityException: no permissions for [] and associated roles [own_index, all_access]

Originally posted by @cwperks in #1302 (comment)

Ahh it seems the snapshot includes security system index, @wntmddus would you try create snapshot only including the test index and see.

@wntmddus
Copy link
Author

wntmddus commented Jan 7, 2025

@cwperks I see whats happening. I have taken snapshot of the whole cluster instead of using snapshotAction to wait for it. thats why its trying to restore that system index. let me try to fix that part on IT

@wntmddus
Copy link
Author

wntmddus commented Jan 7, 2025

@bowenlan-amzn where Can I join the slack community?

@shiv0408
Copy link
Member

shiv0408 commented Jan 7, 2025

Find the public slack link from: https://opensearch.org/slack.html

Signed-off-by: Seung Yeon Joo <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants