Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✨ Query endpoint #1643

Draft
wants to merge 12 commits into
base: main
Choose a base branch
from
Draft

Conversation

anik120
Copy link
Contributor

@anik120 anik120 commented Jan 21, 2025

Description

Implementation of Query Endpoint API RFC

Reviewer Checklist

  • API Go Documentation
  • Tests: Unit Tests (and E2E Tests, if appropriate)
  • Comprehensive Commit Messages
  • Links to related GitHub Issue(s)

@anik120 anik120 requested a review from a team as a code owner January 21, 2025 20:31
Copy link

netlify bot commented Jan 21, 2025

Deploy Preview for olmv1 ready!

Name Link
🔨 Latest commit 046c66c
🔍 Latest deploy log https://app.netlify.com/sites/olmv1/deploys/679d5455523465000898e963
😎 Deploy Preview https://deploy-preview-1643--olmv1.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

Copy link

codecov bot commented Jan 21, 2025

Codecov Report

Attention: Patch coverage is 52.63158% with 171 lines in your changes missing coverage. Please review.

Project coverage is 66.22%. Comparing base (a19f91a) to head (f3588fb).

Files with missing lines Patch % Lines
catalogd/internal/storage/multireadseeker.go 14.10% 64 Missing and 3 partials ⚠️
catalogd/internal/storage/localdir.go 67.58% 43 Missing and 16 partials ⚠️
catalogd/internal/storage/index.go 60.86% 33 Missing and 3 partials ⚠️
catalogd/cmd/catalogd/main.go 0.00% 6 Missing ⚠️
catalogd/internal/serverutil/serverutil.go 0.00% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1643      +/-   ##
==========================================
- Coverage   67.42%   66.22%   -1.21%     
==========================================
  Files          55       57       +2     
  Lines        4632     4950     +318     
==========================================
+ Hits         3123     3278     +155     
- Misses       1284     1427     +143     
- Partials      225      245      +20     
Flag Coverage Δ
e2e 53.29% <ø> (ø)
unit 54.08% <52.63%> (-0.37%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@anik120 anik120 force-pushed the query-enpoint branch 2 times, most recently from 17e0f9a to 729645c Compare January 21, 2025 21:57
@azych
Copy link
Contributor

azych commented Jan 22, 2025

What's the relation of this PR to #1642?
This one seems to include two additional commits, with the first 4 being the same.

Which PR is a duplicate and which should be reviewed?
If this is the one to review (and also not a Draft), can you add a description of the main changes this introduces?

@anik120 anik120 marked this pull request as draft January 22, 2025 21:30
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 22, 2025
@anik120 anik120 mentioned this pull request Jan 22, 2025
4 tasks
@anik120
Copy link
Contributor Author

anik120 commented Jan 22, 2025

What's the relation of this PR to #1642? This one seems to include two additional commits, with the first 4 being the same.

Which PR is a duplicate and which should be reviewed? If this is the one to review (and also not a Draft), can you add a description of the main changes this introduces?

@azych apologies for the confusion. You're just seeing the result of rapid experimentation. This is the one to review, but it's still a WIP. I've converted this PR to a draft.

@grokspawn grokspawn changed the title Query endpoint ✨ Query endpoint Jan 23, 2025
@anik120 anik120 force-pushed the query-enpoint branch 4 times, most recently from d8dcf61 to f94eb2e Compare January 24, 2025 20:16
Comment on lines 236 to 250
httpError(w, fs.ErrNotExist)
return
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should change to write a 200 status and write no content to the response body.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that index.Get() was only ever returning true (so this code path was never actually being hit). Something else solved the test case failure though, but I went ahead and cleaned up Get()'s signature 53692a9

storeMetaFuncs = append(storeMetaFuncs, storeIndexData)
}

var (
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Throughout these lines where we setup and run concurrent channel writers and readers, we should add some comments to explain what is happening.

Comment on lines 269 to 285
func serveJsonLines(w http.ResponseWriter, r *http.Request, modTime time.Time, rs io.ReadSeeker) {
w.Header().Add("Content-Type", "application/jsonl")
http.ServeContent(w, r, "", modTime, rs)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With my suggestion of backing away from the multi-read-seeker code, we'll have to have a different version of this function that the query handler can use (the reader returned by the index will no longer be a io.ReadSeeker, it'll just be a reader)

Unfortunately, if we don't use http.ServeContent for the query handler, we also do not get the Last-Modified/If-Modified-Since header handling. Does the standard library have a way for us to implement this caching logic so that we avoid implementing bugs in our own implementation? If we copy code from the Go stdlib, let's just make sure it's license allows that and we do whatever attribution is necessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was able to find a way to http.ServeContent with io.Reader 79da4de

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copied the checkPreconditions function and all the code associated with it from the net/http package in the http_preconditions_check.go file. I did have to make modifications to the copied code too. Imho it's smells funky to me, it's a lot of copied code that we'll have to hope won't carry any bugs, especially with the modifications I did. But, for now it's in there.


func (m *mockStorageInstance) StorageServerHandler() http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.Write([]byte(m.content))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we actually care what this content is, or do we just care about its length? If the latter, I'd suggest removing the testCompressableJSON variable and replacing with with something like strings.Rand(sizeThreshold+1)

"github.com/operator-framework/operator-registry/alpha/declcfg"
)

type index struct {
Copy link
Member

@joelanford joelanford Jan 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add some GoDoc to index to help our future selves understand how the indexing works.

Suggested change
type index struct {
// index is a simple index of sections of an FBC file used to lookup FBC blobs that
// match any combination of their schema, package, and name fields. A section is the
// byte offset and length of an FBC blob within the file.
//
// This index strikes a balance between space and performance. It indexes each field
// separately, and performs logical set intersections at lookup time in order to implement
// a multi-parameter query.
//
// Note: it is permissible to change the indexing algorithm later if it is necessary to
// tune the space / performance tradeoff. However care should be taken to ensure
// that the actual content returned by the index remains identical, as users of the index
// may be sensitive to differences introduced by index algorithm changes (e.g. if the
// order of the returned sections changes).
type index struct {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realized that we'd want robust test cases for the indexing algorithm too, but that too is probably a follow up in the interest of time.

return newMultiReadSeeker(srs...), true
}

func (i *index) getSectionSet(schema, packageName, name string) sets.Set[section] {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing that occurs to me is that the indexed sections are always guaranteed to be sorted by the offset. We could likely take advantage of that fact and implement a more performant algorithm that can step directly through each slice rather than building a set for each slice, performing a set intersection, the converting back into a slice and resorting.

But this is something we can capture as a follow-up.

@anik120 anik120 mentioned this pull request Jan 29, 2025
4 tasks
@anik120 anik120 force-pushed the query-enpoint branch 2 times, most recently from ee14624 to 9009472 Compare January 29, 2025 21:13
// in os.ErrNotExists
type filesOnlyFilesystem struct {
FS fs.FS
func serveJsonLines(w http.ResponseWriter, r *http.Request, modTime time.Time, rs io.ReadSeeker) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that we have a separate function to serve data for the query handler, I think we should have the all handler directly call http.ServeFile (after explicitly setting the "Content-Type")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about something like 046c66c. We get to retain the benefits of serverContent that way.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants