proposed new search lexicons #1594
Larger refactors in this branch:

- [x] local docker dev env documented
- [x] specify mappings (schemas) for post and profile indices
- [x] transform raw records in to the index schemas
- [x] different doc _id syntax
- [x] skip read+deserialization of records other than profile and post, for efficiency
- [x] don't store records in database; database only used for firehose cursor state
- [x] switch to informal /xrpc/app.bsky.unspecced.search*Skeleton endpoints
- [x] return only skeleton responses (eg, AT-URI or DID lists)
- [x] handle non-success OpenSearch responses as errors
- [x] auto-create indices with schema when in indexing mode (not READONLY) (with `go:embed` schemas)
- [x] switch logging to `log/slog`, including echo integration
- [x] use `atproto/identity` package for identity caching and handling, not `User` database record
- [x] merged in backfill worker code
- [x] use `analysis-icu` plugin for (hopefully) better internationalized search
- [x] special typeahead indexing and query parameter
- [x] basic/simple query string parsing, which should be safe, supports quoted phrases, and `from:` filtering

This branch includes a couple small commits to SDK code, which i've cherry-picked out as separate PRs for easier review.

See also Lexicon PR in atproto repo: bluesky-social/atproto#1594

This is not compatible with the previous version of `palomar` at the HTTP API, opensearch index, or database schema levels. The config vars should be backwards compatible.

The operational plan for staging and prod is to deploy this as an entirely new environment (eg, "prod2", "staging2"), get everything backfilled, and then flip over the AppView and then client app to use the lexicons/endpoints instead of the older version.

----

I think this is ready for review, merge, and deploy to staging. Some things to check before prod:

- [ ] compare index size and performance to existing version/schema
- [ ] real-world testing of profile typeahead (eg, do we need fuzzy?)
- [ ] real-world search relevancy checks
- [ ] real-world CJK text analysis checks (#302)

Out of scope for this PR:

- [ ] deal with `created_at` timestamp not being reliable, by adding a `sort_at` hybrid field, for future "sort by date"
- [ ] instrumentation and metrics (Jaz to implement on top of this branch)
- [ ] better bulk indexing performance, especially during backfill: disable refresh during backfill? longer refresh window? bulk (batch) indexing would be best
- [x] integrate a better identity service/cache; current is probably Ok in context of backfill. or perhaps just bump the cache size to ~50k or ~100k identities in prod?
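To make the "basic/simple query string parsing" item a bit more concrete, here is a rough sketch of what such a parser could look like. It is written in TypeScript for illustration only and is not the code from the branch; the `ParsedQuery` shape, the `parseQuery` name, and the choice to strip a leading `@` from the `from:` value are all assumptions.

```typescript
// Hypothetical result shape; none of these names come from the branch.
interface ParsedQuery {
  terms: string[];   // bare words
  phrases: string[]; // text that appeared inside double quotes
  from?: string;     // handle or DID supplied via `from:`
}

function parseQuery(raw: string): ParsedQuery {
  const parsed: ParsedQuery = { terms: [], phrases: [] };
  // Match either a double-quoted phrase or a run of non-whitespace characters.
  const tokenPattern = /"([^"]*)"|(\S+)/g;
  let match: RegExpExecArray | null;
  while ((match = tokenPattern.exec(raw)) !== null) {
    if (match[1] !== undefined) {
      const phrase = match[1].trim();
      if (phrase.length > 0) parsed.phrases.push(phrase);
      continue;
    }
    const token = match[2] ?? "";
    const prefix = "from:";
    if (token.indexOf(prefix) === 0 && token.length > prefix.length) {
      // Assumption: a leading "@" on the handle is dropped.
      parsed.from = token.slice(prefix.length).replace(/^@/, "");
      continue;
    }
    // Remove stray quote characters so an unbalanced quote never reaches the backend.
    const term = token.replace(/"/g, "");
    if (term.length > 0) parsed.terms.push(term);
  }
  return parsed;
}
```

Because only bare words, quoted phrases, and `from:` are recognized, everything else is treated as plain text, which is one way to keep the parsing "safe" in the sense the description mentions.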
"type": "string", | ||
"description": "search query string; syntax, phrase, boolean, and faceting is unspecified, but Lucene query syntax is recommended. For typeahead search, only simple term match is supported, not full syntax" | ||
}, | ||
"typeahead": { |
would it make sense to use a non-normative parameter for this? something like `simple` or `quick`?
hmmm, I feel like typeahead is pretty term-of-art for this query type (not just the term we are using). different sub-set of fields are queried, and only really works for prefix of a single token or two, character-by-character.
"simple" or "quick" would be confusing to me, i'd assume that they would return un-hydrated responses or something (responses are always hydrated).
cool cool - i'm game for "typeahead" 👍
"limit": { | ||
"type": "integer", | ||
"minimum": 1, | ||
"maximum": 100, | ||
"default": 50 | ||
"default": 10 |
nice i dig both of these new defaults 👍
"parameters": { | ||
"type": "params", | ||
"properties": { | ||
"term": { "type": "string" }, | ||
"term": { |
I would support phasing out `term` and phasing in `q`
I think "term" makes sense when it's simple search on a word & no additional syntax is allowed
`q` fits nicely when we allow for special query syntax (like Lucene syntax)
cool, I updated all the instances of 'term' as a search query with 'q', and marked the old 'term' fields as "DEPRECATED" in description. This impacted an admin route as well. Idea is that for a short transition we'll fall back to "term" if "q" is empty, and maybe eventually nuke them (if/when we are feeling comfortable breaking query params in Lexicons in a small way, like a v1.0 release)
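For reference, the transition behavior described above (prefer `q`, fall back to the deprecated `term` when `q` is empty) could be sketched roughly like this. The `SearchParams` type and `resolveQuery` helper are hypothetical names, and treating a whitespace-only `q` as empty is an assumption rather than something stated in this thread.

```typescript
// Hypothetical parameter shape covering both the new and the deprecated field.
interface SearchParams {
  q?: string;
  term?: string; // DEPRECATED: kept only for the transition period
}

// Returns the effective query string: `q` wins, `term` is the fallback.
function resolveQuery(params: SearchParams): string {
  // Assumption: whitespace-only values count as empty.
  const q = (params.q ?? "").trim();
  if (q.length > 0) return q;
  return (params.term ?? "").trim();
}
```

Keeping the fallback in one helper would make it easy to delete later, if and when the `term` fields are actually removed.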
yup that works 👍
These look good and make a lot of sense to me 👍 Just chatting about a few things from the PR description:
I like the idea of keeping these relatively similar - I'm down with moving to
One question is: will we ever want any extra metadata on an object for why it turned up in search? something like "trending down" or "you follow this user"? If so, we may not want to use
Makes sense to me 👍 And strong agree that we do not want to support full index pagination
w/r/t I'd lean towards keeping it simple (makes for simpler code), but don't feel super strongly and could add wrapper objects if you think we should be consistent.
Makes sense - I think it's fine as is then 👍
Ran codegen, and naively prefer new cc: @devinivy if you might have any last-minute Lexicon style vibes
"cursor": { | ||
"type": "string", | ||
"description": "optional pagination mechanism; may not necessarily allow scrolling through entire result set" | ||
} |
In the past it's proven useful to not support a cursor for typeahead search. Just wanted to note that in case it's relevant here, since both `typeahead` and `cursor` parameters can appear together.
Would the alternative be a second endpoint just for typeahead? That feels expansive to me. I think in typeahead mode we can safely ignore the `cursor` param and never populate that field in the output.
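A minimal sketch of that idea, assuming a single handler serves both modes: in typeahead mode the incoming `cursor` is ignored and no cursor is returned. The `SearchRequest`, `SearchPage`, and `Backend` types, the `runSearch` name, and the use of a stringified offset as the cursor are all hypothetical and chosen only to keep the example small.

```typescript
// Hypothetical request/response shapes, not the generated Lexicon types.
interface SearchRequest {
  q: string;
  limit: number;
  typeahead?: boolean;
  cursor?: string;
}

interface SearchPage {
  hits: string[];
  cursor?: string;
}

// Stand-in for the real search backend: returns up to `limit` hits from `offset`.
type Backend = (q: string, offset: number, limit: number) => string[];

function runSearch(req: SearchRequest, backend: Backend): SearchPage {
  if (req.typeahead) {
    // Typeahead: always the first page, and never emit a cursor.
    return { hits: backend(req.q, 0, req.limit) };
  }
  // Assumption: the cursor is just a stringified offset; bad values restart at 0.
  const parsedOffset = parseInt(req.cursor ?? "0", 10);
  const offset = isNaN(parsedOffset) || parsedOffset < 0 ? 0 : parsedOffset;
  const hits = backend(req.q, offset, req.limit);
  const page: SearchPage = { hits };
  // Only hand out a cursor when a full page came back.
  if (hits.length === req.limit) page.cursor = String(offset + hits.length);
  return page;
}
```

This keeps a single endpoint while still making the typeahead response cursor-free, which matches the concern raised above about the two parameters appearing together.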
"actors": { | ||
"type": "array", | ||
"items": { | ||
"type": "string", | ||
"format": "did" | ||
} | ||
} |
Not terribly worried about it, but there could be some value in wrapping the did in an object, just so that there's an upgrade path in case we want to provide any additional/meta info with each result in the future.
Dan had the same thoughts in earlier review, which is strong enough signal that i'll add that to these endpoints.
"posts": { | ||
"type": "array", | ||
"items": { | ||
"type": "string", | ||
"format": "at-uri" | ||
} | ||
} |
Same goes here, there could be some value in wrapping the uri in an object so that we leave open an upgrade path to add additional/meta info about each result in the future.
Looking good! Left a few misc thoughts.
updated with skeletal objects, and re-codegen
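To illustrate the upgrade path discussed in the review, here is a sketch of the wrapped ("skeleton object") response shape next to the bare-string one. The interface and function names are illustrative assumptions and are not copied from the generated code.

```typescript
// Hypothetical wrapper objects: one field today, room for more later.
interface SkeletonActor {
  did: string;
}

interface SkeletonPost {
  uri: string;
}

// Wrap bare DID strings so each result is an object rather than a string.
function toActorSkeleton(dids: string[]): { actors: SkeletonActor[] } {
  return { actors: dids.map((did) => ({ did })) };
}

// Same idea for posts, keyed by AT-URI.
function toPostSkeleton(uris: string[]): { posts: SkeletonPost[] } {
  return { posts: uris.map((uri) => ({ uri })) };
}
```

With objects in place, a later optional field (for example, some per-result metadata) could be added without changing the type of the array items, which is the point both reviewers raised.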
* proposed new search lexicons
* lexicons: lint
* lexicons: fix actors typo
* lexicons: camelCase bites again, ssssss
* lexicons: add 'q' and mark 'term' as deprecated for search endpoints
* codegen for search lexicon updates
* bsky: prefer 'q' over 'term' in existing search endpoints
* search: bugfix
* lexicons: make unspecced search endpoints return skeleton obj
* re-codegen for search skeleton obj
* lexicons * codegen * email templates * request routes * impl * migration * tidy * tests * tidy & bugfixes * format * fix api test * fix auth test * codegen * add unique constraint * Add email confirmed to AtpSessionData * interop test files (#1529) * initial interop-test-files * crypto: switch signature-fixtures.json to a symlink * syntax: test against interop files * prettier * Update interop-test-files/README.md Co-authored-by: Eric Bailey <[email protected]> * disable prettier on test vectors --------- Co-authored-by: Eric Bailey <[email protected]> Co-authored-by: dholms <[email protected]> * add getSuggestedFollowsByActor (#1553) * add getSuggestedFollowsByActor lex * remove pagination * codegen * add pds route * add app view route * first pass at likes-based suggested actors, plus tests * format * backfill with suggested_follow table * combine actors queries * fall back to popular follows, handle backfill differently * revert seed change, update test * lower likes threshold * cleanup * remove todo * format * optimize queries * cover mute lists * clean up into pipeline steps * add changeset * List feeds (#1557) * lexicons for block lists * reorg blockset functionality into graph service, impl block/mute filtering * apply filterBlocksAndMutes() throughout appview except feeds * update local feeds to pass through cleanFeedSkeleton(), offload block/mute application * impl for grabbing block/mute details by did pair * refactor getActorInfos away, use actor service * experiment with moving getFeedGenerators over to a pipeline * move getPostThread over to a pipeline * move feeds over to pipelines * move suggestions and likes over to pipelines * move reposted-by, follows, followers over to pipelines, tidy author feed and post thread * remove old block/mute checks * unify post presentation logic * move profiles endpoints over to pipelines * tidy * tidy * misc fixes * unify some profile hydration/presentation in appview * profile detail, split hydration and 
presentation, misc fixes * unify feed hydration w/ profile hydration * unify hydration step for embeds, tidy application of labels * setup indexing of list-blocks in bsky appview * apply list-blocks, impl getListBlocks, tidy getList, tests * tidy * update pds proxy snaps * update pds proxy snaps * fix snap * make algos return feed items, save work in getFeed * misc changes, tidy * tidy * fix aturi import * lex * list purpose * lex gen * add route * add proxy route * seed client helpers * tests * mutes and blocks * proxy test * snapshot * hoist actors out of composeThread() * tidy * tidy * run ci on all prs * format * format * fix snap name * fix snapsh --------- Co-authored-by: Devin Ivy <[email protected]> * Improve xrpc server error handling (#1597) improve xrpc server error handling * Remove appview proxy runtime flags (#1590) * remove appview proxy runtime flags * clean up proxy tests * getPopular hotfix (#1599) dont pass all params * Interaction Gating (#1561) * lexicons for block lists * reorg blockset functionality into graph service, impl block/mute filtering * apply filterBlocksAndMutes() throughout appview except feeds * update local feeds to pass through cleanFeedSkeleton(), offload block/mute application * impl for grabbing block/mute details by did pair * refactor getActorInfos away, use actor service * experiment with moving getFeedGenerators over to a pipeline * move getPostThread over to a pipeline * move feeds over to pipelines * move suggestions and likes over to pipelines * move reposted-by, follows, followers over to pipelines, tidy author feed and post thread * remove old block/mute checks * unify post presentation logic * move profiles endpoints over to pipelines * tidy * tidy * misc fixes * unify some profile hydration/presentation in appview * profile detail, split hydration and presentation, misc fixes * unify feed hydration w/ profile hydration * unify hydration step for embeds, tidy application of labels * setup indexing of list-blocks in 
bsky appview * apply list-blocks, impl getListBlocks, tidy getList, tests * tidy * update pds proxy snaps * update pds proxy snaps * fix snap * make algos return feed items, save work in getFeed * misc changes, tidy * tidy * fix aturi import * initial lexicons for interaction-gating * add interactions view to post views * codegen * model bad reply/interaction check state on posts * initial impl for checking bad reply or interaction on write * omit invalid interactions from post thread * support not-found list in interaction view * hydrate can-reply state on threads * present interaction views on posts * misc fixes, update snaps * tidy/reorg * tidy * split interaction gating into separate record in lexicon * switch interaction-gating impl to use separate record type * allow checking reply gate w/ root post deletion * fix * initial gating tests * tighten gated reply views, tests * reply-gating list rule tests * allow custom post rkeys within window * hoist actors out of composeThread() * tidy * update thread gate lexicons, codegen * lex fix * rename gate to threadgate in bsky, update views * lex fix * improve terminology around reply validation * fix down migration * remove thread gates on actor unindexing * add back .prettierignore * tidy * run ci on all prs * syntax * run ci on all prs * format * fix snap --------- Co-authored-by: Devin Ivy <[email protected]> * order by `like.indexedAt` in app view (#1592) * order by like.indexedAt * use keyset for ordering * simplify * ok ok ok I get it now * Update packages/bsky/src/api/app/bsky/feed/getActorLikes.ts Co-authored-by: Daniel Holmgren <[email protected]> --------- Co-authored-by: Daniel Holmgren <[email protected]> * Remove default value for post table invalid attrs (#1601) remove default value for post table attrs * Version packages (#1602) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> * update Bluesky PBLLC to PBC (Public Benefit Corporation) (#1600) * Temporarily disable 
filtering `invalidReplyRoot`s (#1609) temporarily disable invalidReplyRoot check * fix syntax docs (#1611) * Version packages (#1612) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> * Allow bypass on ratelimit ip (#1613) allow bypass on ratelimit ip * Write rate limits (#1578) * get rate limit ip correctly * add write rate-limits * Tweak createSession rate limit key (#1614) tweak create session rl key * Filter preferences for app passwords (#1626) filter preferences for app passwords * Tweak rate limit setup for multi rate limit routes (#1627) tweak rate limit setup for multi rate limit routes * Remove zod from xrpc-server error handling (#1631) remove zod from xrpc-server error handling check * Enforce properties field on lexicon object schemas (#1628) * add empty properites to thread gate schema fragments * tweak lexicon type * Add feed-vew and thread-view preferences (#1638) * Add feed and thread preference lexicons * Add feed-view and thread-view preference APIs * Add changeset for new preferences (#1639) Add changeset * Version packages (#1640) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> * Disable getAccountInviteCodes for app passwords (#1642) disable getAccountInviteCodes for app passwords * remove cruft packages (uri, nsid, identifier) (#1606) * remove @atproto/nsid (previously moved to syntax) * remove @atproto/uri (previously moved to syntax) * remove @atproto/identifier (previously moved to syntax) * bump lockfile to remove old packages --------- Co-authored-by: Eric Bailey <[email protected]> * api: update login/resumeSession examples in README (#1634) * api: update login/resumeSession examples in README * Update packages/api/README.md Co-authored-by: Daniel Holmgren <[email protected]> --------- Co-authored-by: Daniel Holmgren <[email protected]> * small syntax lints (#1646) * lint: remove unused imports and variables * lint: prefix unused args with '_' * eslint: skip 
no-explicit-any; ignore unused _var (prefix) * eslint: explicitly mark ignores for tricky cases * indicate that getPopular is deprecated (#1647) * indicate that getPopular is deprecated * codegen for deprecating getPopular * tidy up package.json and READMEs (#1649) * identity: README example and tidy * tidy up package metadata (package.json files) * updated README headers/stubs for several packages * crypto: longer README, with usage * syntax: tweak README * Apply suggestions from code review Co-authored-by: Eric Bailey <[email protected]> Co-authored-by: devin ivy <[email protected]> --------- Co-authored-by: Eric Bailey <[email protected]> Co-authored-by: devin ivy <[email protected]> * Improve the types of the thread and feed preferences APIs (#1653) * Improve the types of the thread and feed preferences APIs * Remove unused import * Add changeset * Version packages (#1654) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> * Disable pds appview routes (#1644) * wip * remove all canProxyReadc * finish cleanup * clean up tests * fix up tests * fix api tests * fix build * fix compression test * update image tests * fix dev envs * build branch * fix service file * re-enable getPopular * format * rm unused sharp code * dont build branch * auto-moderator tweaks: pass along record URI, create report for takedown action (#1643) * auto-moderator: include record URI in abyss requests * auto-moderator: log attempt at hard takedown; create report as well The motivation is to flag the event to mod team, and to make it easier to confirm that takedown took place. 
* auto-mod: typo fix * auto-mod: bugfixes * bsky: always create auto-mod report locally, not pushAgent (if possible) * bsky: fix auto-mod build * bsky: URL-encode scanBlob call * Clear follow viewer state when blocking (#1659) * clear follow viewer state when blocking * tidy * add `tags` to posts (#1637) * add tags to post lex * kiss * add richtext facet and validation attrs * add tag validation attrs to post * codegen * add maxLength for tags, add description * validate post tags on write * add test * handle tags in indexer * add tags to postView, codegen * return tags on post thread view * format * revert formatting change to docs * use establish validation pattern * add changeset (cherry picked from commit fcb6fe7) * remove tags from postView, codegen * remove tags from thread view * revert unused changes * Version packages (#1664) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> * merge * Reverse order of blocks from sync.getRepo (#1665) * reverse order of blocks from sync.getRepo * write to car while fetching next page * Add hashtag detection to richtext (#1651) * add tag detection to richtext * fix duplicate tag index error * add utils * fix leading space index failures, test for them * add changeset * Version packages (#1669) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> * proposed new search lexicons (#1594) * proposed new search lexicons * lexicons: lint * lexicons: fix actors typo * lexicons: camelCase bites again, ssssss * lexicons: add 'q' and mark 'term' as deprecated for search endpoints * codegen for search lexicon updates * bsky: prefer 'q' over 'term' in existing search endpoints * search: bugfix * lexicons: make unspecced search endpoints return skeleton obj * re-codegen for search skeleton obj * Disable pds appview indexing (#1645) * rm indexing service * remove message queue & refactor background queue * wip * remove all canProxyReadc * finish cleanup * clean up tests * fix 
up tests * fix api tests * fix build * fix compression test * update image tests * fix dev envs * build branch * wip - removing labeler * fix service file * remove kysely tables * re-enable getPopular * format * cleaning up tests * rm unused sharp code * rm pds build * clean up tests * fix build * fix build * migration * tidy * build branch * tidy * build branch * small tidy * dont build * Refactor PDS appview routes (#1673) move routes around * Strip leading `#` from from detected tag facets (#1674) ensure # is removed from facets * Version packages (#1675) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> * Proxy search queries (#1676) * proxy search * tweak profile resp * fix admin.searchRepos * add mock mailer * Fix to daniel's MOCKERY of a mock mailer * Don't allow non-verified email updates until app feature is out (#1682) stricter updating email until app feature is out * changesets --------- Co-authored-by: Paul Frazee <[email protected]> Co-authored-by: bnewbold <[email protected]> Co-authored-by: Eric Bailey <[email protected]> Co-authored-by: Devin Ivy <[email protected]> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
@bnewbold how are you running the search in the backend for posts? Are you using pgsql FTS or something else?
@viksit the backend search service is `palomar`. The actual index/engine is OpenSearch, the AWS fork of Elasticsearch.
First pass at post search and backend skeleton Lexicons to support search iteration.
The existing searchActors and searchActorsTypeahead are slightly tweaked (default result size, descriptions), in a way I think is safe and backwards compatible.
A bit of tension around how similar to keep searchActors and searchPosts. I think having the query param be `q` for search is really ingrained in API design, so I went with an inconsistent value there. Could do searchProfiles instead? Or maybe sticking with `term` (like searchActors) is the right move.
I didn't end up using the exact `app.bsky.feed.defs#skeletonFeedPost` from feed skeletons, just a list of AT-URIs, because we don't have a "reason" in search.
I'd like us to stick with limit/offset queries into opensearch (elasticsearch), and not do full scroll cursors. This can be done with the existing cursor setup, by having the search service stick an offset+limit number in the cursor string. I'm just mentioning it as a "what are we trying to provide with this searchPosts endpoint" question.
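To make the offset-in-cursor idea concrete, here is a minimal sketch, not the search service's actual code. It assumes the cursor is just the next offset rendered as a decimal string; `decodeCursor`, `nextCursor`, `searchPage`, and `DEFAULT_LIMIT` are hypothetical names, and an in-memory array of AT-URIs stands in for the index.

```typescript
// Hypothetical helpers: the names and the cursor format are assumptions for
// illustration, not taken from the lexicons or the search service.

const DEFAULT_LIMIT = 25 // assumed default page size

// Decode an opaque cursor string into a numeric offset (0 when absent).
function decodeCursor(cursor?: string): number {
  if (cursor === undefined || cursor === '') return 0
  if (!/^\d+$/.test(cursor)) {
    throw new Error(`invalid cursor: ${cursor}`)
  }
  return Number(cursor)
}

// Compute the cursor for the next page, or undefined once results run out.
function nextCursor(
  offset: number,
  limit: number,
  totalHits: number,
): string | undefined {
  const next = offset + limit
  return next < totalHits ? String(next) : undefined
}

// Page through an in-memory list of AT-URIs the way a client would see it:
// a list of URIs plus an optional cursor for the following page.
function searchPage(
  allHits: string[],
  cursor?: string,
  limit: number = DEFAULT_LIMIT,
): { posts: string[]; cursor: string | undefined } {
  const offset = decodeCursor(cursor)
  return {
    posts: allHits.slice(offset, offset + limit),
    cursor: nextCursor(offset, limit, allHits.length),
  }
}
```

From the client's point of view the cursor stays opaque: it passes back whatever string it received and stops when no cursor is returned.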
It is relatively cheap for opensearch to do limit/offset up to a result set of a few thousand hits. The cursor/scroll mode allows fast scrolling through the entire index (eg, billions of docs), but has per-cursor overhead in most situations and we don't want that. Basically, this API should support common search cases, and not be a de facto public API for enumerating our full index (folks who want to do deep investigation/research should run their own mirror cluster/index; we can access our opensearch cluster directly internally if we want that).
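One way to keep the endpoint from becoming an enumeration API is to refuse pagination past a fixed window. The sketch below is an assumption about how that could look, not the deployed behavior: `buildPageRequest` and `MAX_RESULT_WINDOW` are hypothetical names, and the cap of 10000 simply mirrors OpenSearch's default `index.max_result_window`. The returned `from` and `size` fields are the standard pagination fields of an OpenSearch search request body.

```typescript
// Assumption: 10000 mirrors OpenSearch's default `index.max_result_window`;
// a real deployment might pick a smaller cap.
const MAX_RESULT_WINDOW = 10000

// Pagination fields as they appear in an OpenSearch search request body.
interface PageRequest {
  from: number
  size: number
}

// Translate an offset/limit pair into `from`/`size`, rejecting requests
// that would page deeper than the configured window.
function buildPageRequest(offset: number, limit: number): PageRequest {
  if (Math.floor(offset) !== offset || offset < 0) {
    throw new Error('offset must be a non-negative integer')
  }
  if (Math.floor(limit) !== limit || limit < 1) {
    throw new Error('limit must be a positive integer')
  }
  if (offset + limit > MAX_RESULT_WINDOW) {
    throw new Error(
      `result window exceeded: ${offset + limit} > ${MAX_RESULT_WINDOW}`,
    )
  }
  return { from: offset, size: limit }
}
```

Rejecting deep pages outright, rather than silently truncating, tells callers clearly that bulk enumeration belongs on a separate mirror index.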
I'm guessing that we will add additional params to these over time, but want to start relatively simple.