Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: implement deleter #470

Closed
wants to merge 4 commits into from
Closed

Conversation

carabasdaniel
Copy link

This should close: #454

@carabasdaniel carabasdaniel force-pushed the delete branch 4 times, most recently from dac5f62 to 008a185 Compare March 23, 2023 14:35
@carabasdaniel carabasdaniel marked this pull request as ready for review March 24, 2023 13:26
@codecov-commenter
Copy link

codecov-commenter commented Mar 27, 2023

Codecov Report

Attention: 17 lines in your changes are missing coverage. Please review.

Comparison is base (7dd0378) 73.22% compared to head (69a1851) 73.09%.
Report is 47 commits behind head on main.

Files Patch % Lines
content/memory/memory.go 33.33% 4 Missing and 2 partials ⚠️
content/oci/oci.go 60.00% 4 Missing and 2 partials ⚠️
content/oci/storage.go 57.14% 2 Missing and 1 partial ⚠️
internal/resolver/memory.go 0.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #470      +/-   ##
==========================================
- Coverage   73.22%   73.09%   -0.14%     
==========================================
  Files          49       49              
  Lines        4602     4642      +40     
==========================================
+ Hits         3370     3393      +23     
- Misses        923      935      +12     
- Partials      309      314       +5     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@shizhMSFT shizhMSFT changed the title feature: Implement deleter feat: implement deleter Mar 27, 2023
@carabasdaniel
Copy link
Author

Hello @shizhMSFT @Wwwsylvia,

This is the start for the delete implementation, I would definitely appreciate a bit of feedback on this PR.

@shizhMSFT
Copy link
Contributor

@carabasdaniel Thanks for your contribution. We recognize and are reviewing this PR. Since it is related to design, it takes more time to be reviewed but should finish in 1 day or two.

content/storage.go Show resolved Hide resolved
content/oci/storage.go Outdated Show resolved Hide resolved
@@ -192,6 +192,22 @@ func (s *Store) Resolve(ctx context.Context, reference string) (ocispec.Descript
return desc, nil
}

func (s *Store) Delete(ctx context.Context, target ocispec.Descriptor) error {
Copy link
Member

@Wwwsylvia Wwwsylvia Mar 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One tricky thing about deleting is that we need to maintain the predecessors relationship as well. See Store.Push and Store.Predecessors.
This also applies to other GraphStorage that implement PredecessorsFinder.

content/file/file.go Outdated Show resolved Hide resolved
content/storage.go Show resolved Hide resolved
internal/cas/proxy.go Outdated Show resolved Hide resolved
internal/resolver/memory.go Outdated Show resolved Hide resolved
internal/cas/memory.go Outdated Show resolved Hide resolved
internal/cas/memory.go Outdated Show resolved Hide resolved
content/file/file.go Outdated Show resolved Hide resolved
content/oci/storage.go Outdated Show resolved Hide resolved
content/oci/storage.go Outdated Show resolved Hide resolved
content/oci/storage.go Outdated Show resolved Hide resolved
Comment on lines 196 to 204
resolvers := s.tagResolver.Map()
for reference, desc := range resolvers {
if content.Equal(desc, target) {
s.tagResolver.Remove(reference)
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if some is pushing and deleting at the same time? There is a race condition.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The targetResolver.Map() represent an inconsistent snapshot of the storage content as far as I understand, therefore I'm not sure if a delete of a pushing content would be possible as the tagResolver is actually a syncMap.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@carabasdaniel Since it is possible that someone is trying to push and delete at the same time, we have to trade-off some performance for the deletion.

Similar to what I have mentioned in other comments, the current oci.Store is append-only and thus it has the benefits that multiple push and fetch can happen at the same time without race conditions.

However, adding the functionality of Delete() makes the store mutable, and we have to change the underlying implementation from sync maps to a RWMutex to ensure consistency. The drawback is that we no longer can push and delete simultaneously.

Overall, I suggest

  1. Drop Delete() for oci.Store.
  2. Develop oci.MutableStore with Delete() support.

/cc @Wwwsylvia

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I suggest
Drop Delete() for oci.Store.
Develop oci.MutableStore with Delete() support.

It makes sense.

@carabasdaniel I would also like to learn your use cases of deleting content in OCI layout. Is it for GC purpose? Could you share a little bit?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Wwwsylvia, @shizhMSFT,

We have a tool used for building OCI images from rego policies: policy CLI.

We have been using the previous major version of oras-go to build our containers and interact with upstream OCI compliant registries. In our case we want the user to be able to build and easily remove containers from the local OCI storage, thus the need to have a delete mechanism in place, keeping the same workflow feel as the docker CLI.

At the moment we use a fork that includes these changes in our v0.2.0 release candidate versions of the policy CLI and it seems to work with our current needs.

I think having a different store might make sense.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Thank you for sharing this!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Wwwsylvia for additional use case info... I am working on https://github.com/awslabs/soci-snapshotter which uses https://pkg.go.dev/oras.land/oras-go/content to keep content in the local filesystem. Our artifacts are conceptually similar to image and layer metadata; we store one artifact per image we handle, and one artifact per layer in those images, with one-way references (via a list of digests equivalent to image layers) from artifacts of the first type to artifacts of the second type.

@carabasdaniel
Copy link
Author

Hi @shizhMSFT @Wwwsylvia,

Thanks for the feedback, I'll start working on the required changes.

@carabasdaniel carabasdaniel force-pushed the delete branch 2 times, most recently from 8812296 to 8c25175 Compare March 29, 2023 13:57
internal/cas/memory.go Outdated Show resolved Hide resolved
Comment on lines 48 to 49
// DeleteStorage represents an extension of the Storage interface that includes the Deleter.
type DeleteStorage interface {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Storage interface abstracts append-only storages. Adding Deleter makes the storage a MutableStorage.

@Wwwsylvia Do you have a better name suggestion on DeleteStorage?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MutableStorage sounds like that the existing data can be modified or deleted. But in our case, the data can be deleted but cannot be modified.
How about StorageDeleter or DeletableStorage?
BTW, do we really need to define this interface?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type StorageWithDelete interface {
    Storage
    Deleter
}

This is from my current efforts in the same direction as this PR. If you don't define this interface, folks like me will just have to do it themselves, and our naming divergence will make it harder to find usage examples in the wild.

content/oci/storage.go Outdated Show resolved Hide resolved
content/oci/storage.go Outdated Show resolved Hide resolved
content/oci/oci.go Outdated Show resolved Hide resolved
content/oci/oci.go Show resolved Hide resolved
internal/cas/memory_test.go Show resolved Hide resolved
internal/cas/memory_test.go Show resolved Hide resolved
Comment on lines 48 to 49
// DeleteStorage represents an extension of the Storage interface that includes the Deleter.
type DeleteStorage interface {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MutableStorage sounds like that the existing data can be modified or deleted. But in our case, the data can be deleted but cannot be modified.
How about StorageDeleter or DeletableStorage?
BTW, do we really need to define this interface?

content/oci/storage_test.go Show resolved Hide resolved
@sparr
Copy link
Contributor

sparr commented May 26, 2023

I find myself preparing to implement similar functionality as part of https://github.com/awslabs/soci-snapshotter and want to check in on this PR. I would much prefer to contribute upstream and then consume that. Is this still under development? Would additional effort be welcome?

@@ -1993,3 +2042,114 @@ func equalDescriptorSet(actual []ocispec.Descriptor, expected []ocispec.Descript
}
return true
}

func TestStore_Delete(t *testing.T) {
Copy link
Contributor

@sparr sparr May 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can some/most of this be deduplicated with the memory_test.go TestStoreDelete with similar contents? Ditto in storage_test.go

content/oci/storage.go Outdated Show resolved Hide resolved
Copy link
Member

@TerryHowe TerryHowe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like maybe two other changes needed here as well as the deletes

content/oci/storage.go Outdated Show resolved Hide resolved
content/oci/oci_test.go Outdated Show resolved Hide resolved
content/oci/oci.go Outdated Show resolved Hide resolved
carabasdaniel and others added 4 commits June 2, 2023 10:26
Signed-off-by: carabasdaniel <[email protected]>
Co-authored-by: Terry Howe <[email protected]>
Signed-off-by: carabasdaniel <[email protected]>
Co-authored-by: Terry Howe <[email protected]>
Signed-off-by: carabasdaniel <[email protected]>
Co-authored-by: Terry Howe <[email protected]>
Signed-off-by: carabasdaniel <[email protected]>
@carabasdaniel
Copy link
Author

Merged suggested changes and rebased on top of latest upstream.
If we merge this I can open another PR that adds the untag feature.

Thanks for the help @TerryHowe.

@sparr
Copy link
Contributor

sparr commented Jun 2, 2023

@carabasdaniel There appear to be two outstanding comments referring to needed functionality for updating graph and resolver when content is deleted.

My limited understanding is that graph is used for predecessor/successor (aka parent/child) lookups and traversal of content and is populated in Store.Push where it calls graph.Index and during loadIndex when it loads index.json and then calls graph.IndexAll, and resolver is used for looking up content by tags and is populated by Tag (which may require something like your Untag to undo although I haven't looked far enough to determine if it would meet that need)

@Wwwsylvia
Copy link
Member

My limited understanding is that graph is used for predecessor/successor (aka parent/child) lookups and traversal of content and is populated in Store.Push where it calls graph.Index and during loadIndex when it loads index.json and then calls graph.IndexAll, and resolver is used for looking up content by tags and is populated by Tag (which may require something like your Untag to undo although I haven't looked far enough to determine if it would meet that need)

Yes it's right.

@carabasdaniel
Copy link
Author

Hi @sparr @Wwwsylvia,

Currently my bandwidth is very limited so I'm not sure when I'll be able to come back to this. For our current needs we are using a fork that has this change and the UnTag change merged.

In addition to the recent changes I made here I also prepared an untag branch that would build on top of this PR for the resolver.

I would appreciate any help to get this set implemented/merged.

Copy link
Contributor

@shizhMSFT shizhMSFT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As it is technically challenging to add Delete() to the existing *oci.Store, as mentioned in #470 (comment), I would suggest developing a new store named *oci.DeletableStore where the functionality of Predecessors() can be dropped so that the difficulty of the development should be dramatically reduced.

It should also meet the requirements of https://github.com/opcr-io/policy as it does not leverage the functionality of Predecessors().

@@ -72,6 +72,18 @@ func (s *Store) Resolve(ctx context.Context, reference string) (ocispec.Descript
return s.resolver.Resolve(ctx, reference)
}

// Delete a target descriptor for storage.
func (s *Store) Delete(ctx context.Context, target ocispec.Descriptor) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s.resolver is for the tagging service, i.e. for Resolve(), as Resolve() should not resolve a reference to a deleted descriptor. Similarly, s.graph is for Predecessors().

Since this PR mainly focuses on OCI store, we can keep the changes to the Memory store in another PR.

@@ -116,6 +116,25 @@ func (m *Memory) Predecessors(_ context.Context, node ocispec.Descriptor) ([]oci
return res, nil
}

// RemoveFromIndex removes the node and all its predecessors from the index and predecessor maps.
func (m *Memory) RemoveFromIndex(ctx context.Context, fetcher content.Fetcher, node ocispec.Descriptor) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems fetcher is not used anywhere.

Comment on lines +197 to +198
// Delete removed a target descriptor from index and storage.
func (s *Store) Delete(ctx context.Context, target ocispec.Descriptor) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in #470 (comment), the Delete() method might be called concurrently with other methods like Push() and Resolve(). Race condition is a concern here.

@sparr
Copy link
Contributor

sparr commented Jun 9, 2023

Speaking only for myself, an implementation of a new Store type with Delete functionality but without Predecessors would not fit my current use case. My current solution is to delete files then recreate index.json then recreate the content store so that the graph will be reindexed from scratch.

@shizhMSFT
Copy link
Contributor

The main technical challenge here is that the current *oci.Store implementation is optimized for the concurrent use of append-only store in terms of performance. It is non-trivial to modify it to have the delete functionality.

To unblock this PR attempt, @wangxiaoxuan273 could you prototype a *oci.DeletableStore with both the functionality of Delete() and Predecessors()? We can have performance penalty, but we have to make sure that all public methods of *oci.DeletableStore MUST be go-routine safe since it is by design for all Targets.

@TerryHowe
Copy link
Member

Where is this going. Any progress on the oci.DeletableStore ?

@wangxiaoxuan273
Copy link
Contributor

wangxiaoxuan273 commented Jul 8, 2023

Where is this going. Any progress on the oci.DeletableStore ?

No progress so far. I'm still in the stage of investigating. But it's likely that this PR will not be continued, and we will open a new one to fix this issue. @TerryHowe

@Wwwsylvia
Copy link
Member

Where is this going. Any progress on the oci.DeletableStore ?

FYI, here is the PR authored by @wangxiaoxuan273: #614

@shizhMSFT shizhMSFT added the stale Inactive issues or pull requests label Dec 29, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stale Inactive issues or pull requests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

oci.Store and oci.Storage should implement content.Deleter
8 participants