
proposed new search lexicons #1594

Merged
merged 11 commits into from
Sep 25, 2023

bnewbold
Collaborator

First pass at post search and backend skeleton Lexicons to support search iteration.

The existing searchActors and searchActorsTypeahead are slightly tweaked (default result size, descriptions), in a way I think is safe and backwards compatible.

A bit of tension around how similar to keep searchActors and searchPosts. I think having the query param be q for search is really ingrained in API design, so I went with q for searchPosts, even though that is inconsistent with term in searchActors. Could do searchProfiles instead? Or maybe sticking with term (like searchActors) is the right move.

I didn't end up using the exact app.bsky.feed.defs#skeletonFeedPost from feed skeletons, just a list of AT-URIs, because we don't have a "reason" in search.

I'd like us to stick with limit/offset queries into OpenSearch (Elasticsearch), and not do full scroll cursors. This can be done with the existing cursor setup, by having the search service stick an offset+limit number in the cursor string; I'm just mentioning it as a "what are we trying to provide with this searchPosts endpoint" question.

It is relatively cheap for OpenSearch to do limit/offset up to a result set of a few thousand hits. The cursor/scroll mode allows fast scrolling through the entire index (e.g., billions of docs), but has per-cursor overhead in most situations, and we don't want that. Basically, this API should support common search cases, and not be a de facto public API for enumerating our full index (folks who want to do deep investigation/research should run their own mirror cluster/index; we can access our OpenSearch cluster directly internally if we want that).
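A rough TypeScript sketch of the offset-in-cursor idea: the cursor is just a decimal offset that gets translated into OpenSearch `from`/`size`, with a hard cap so nobody can page through the whole index. The default limit of 25, the 10,000-hit cap (OpenSearch's default `index.max_result_window`), and all names here are assumptions for illustration, not what palomar actually does.

```typescript
// Sketch only: constants and function names are hypothetical.
const DEFAULT_LIMIT = 25 // assumed default page size
const MAX_LIMIT = 100 // matches the lexicon's "maximum": 100
const MAX_WINDOW = 10000 // assumed cap; OpenSearch's default index.max_result_window

interface PageParams {
  from: number // OpenSearch "from" (offset into the hit list)
  size: number // OpenSearch "size" (number of hits to return)
}

// Turn the XRPC `limit` and opaque `cursor` params into OpenSearch from/size.
function pageParams(limit?: number, cursor?: string): PageParams {
  let size = DEFAULT_LIMIT
  if (limit !== undefined && Number.isFinite(limit)) {
    size = Math.min(Math.max(Math.trunc(limit), 1), MAX_LIMIT)
  }
  let from = 0
  if (cursor !== undefined && cursor !== '') {
    // The cursor is just a decimal offset; reject anything else.
    if (!/^\d+$/.test(cursor)) throw new Error('invalid cursor')
    from = Number(cursor)
  }
  if (from >= MAX_WINDOW) throw new Error('cursor past end of searchable window')
  // Never ask OpenSearch for hits beyond the window.
  size = Math.min(size, MAX_WINDOW - from)
  return { from, size }
}

// Cursor for the next page, or undefined when there is nothing more to page to.
function nextCursor(page: PageParams, hitsReturned: number, totalHits: number): string | undefined {
  const next = page.from + hitsReturned
  if (hitsReturned < page.size || next >= totalHits || next >= MAX_WINDOW) {
    return undefined
  }
  return String(next)
}
```

Keeping the cursor opaque to clients means the search service is free to change this encoding later without a Lexicon change.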

I'm guessing that we will add additional params to these over time, but want to start relatively simple.

bnewbold requested review from devinivy and dholms September 13, 2023 22:28
bnewbold force-pushed the bnewbold/search-lexicons branch from db826ff to 12f3711 September 13, 2023 22:47
bnewbold added a commit to bluesky-social/indigo that referenced this pull request Sep 15, 2023
Larger refactors in this branch:

- [x] local docker dev env documented
- [x] specify mappings (schemas) for post and profile indices
- [x] transform raw records into the index schemas
- [x] different doc _id syntax
- [x] skip read+deserialization of records other than profile and post,
for efficiency
- [x] don't store records in database; database only used for firehose
cursor state
- [x] switch to informal /xrpc/app.bsky.unspecced.search*Skeleton
endpoints
- [x] return only skeleton responses (eg, AT-URI or DID lists)
- [x] handle non-success OpenSearch responses as errors
- [x] auto-create indices with schema when in indexing mode (not
READONLY) (with `go:embed` schemas)
- [x] switch logging to `log/slog`, including echo integration
- [x] use `atproto/identity` package for identity caching and handling,
not `User` database record
- [x] merged in backfill worker code
- [x] use `analysis-icu` plugin for (hopefully) better internationalized
search
- [x] special typeahead indexing and query parameter
- [x] basic/simple query string parsing, which should be safe, supports
quoted phrases, and `from:` filtering
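The quoted-phrase and `from:` support in the checklist above might be parsed along these lines. This is a TypeScript sketch (palomar itself is Go), and the token rules, including accepting an optional `@` before the handle, are assumptions. Sticking to a small fixed grammar like this, rather than passing user input straight into a full Lucene `query_string` query, is one way to get the "should be safe" property.

```typescript
// Hypothetical parser; not palomar's actual implementation.
interface ParsedQuery {
  terms: string[] // bare words
  phrases: string[] // contents of "quoted phrases"
  fromHandles: string[] // handles from from:<handle> filters
}

function parseQuery(raw: string): ParsedQuery {
  const out: ParsedQuery = { terms: [], phrases: [], fromHandles: [] }
  // Match either a complete "double-quoted phrase" or a run of non-space chars.
  const tokenRe = /"([^"]*)"|(\S+)/g
  let m: RegExpExecArray | null
  while ((m = tokenRe.exec(raw)) !== null) {
    if (m[1] !== undefined) {
      const phrase = m[1].trim()
      if (phrase) out.phrases.push(phrase)
      continue
    }
    // Drop stray quote characters (e.g. from an unbalanced quote).
    const tok = m[2].replace(/"/g, '')
    if (!tok) continue
    if (tok.startsWith('from:')) {
      // Accept both from:alice.test and from:@alice.test.
      const handle = tok.slice('from:'.length).replace(/^@/, '')
      if (handle) {
        out.fromHandles.push(handle)
        continue
      }
    }
    out.terms.push(tok)
  }
  return out
}
```

Each parsed piece would then map onto a separate, structured OpenSearch clause (e.g. a phrase match per phrase, a filter per handle).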

This branch includes a couple of small commits to SDK code, which I've
cherry-picked out as separate PRs for easier review.

See also Lexicon PR in atproto repo:
bluesky-social/atproto#1594

This is not compatible with the previous version of `palomar` at the
HTTP API, opensearch index, or database schema levels. The config vars
should be backwards compatible. The operational plan for staging and
prod is to deploy this as an entirely new environment (eg, "prod2",
"staging2"), get everything backfilled, and then flip the AppView, and
then the client app, over to the new lexicons/endpoints instead of the
older version.

----

I think this is ready for review, merge, and deploy to staging. Some
things to check before prod:

- [ ] compare index size and performance to existing version/schema
- [ ] real-world testing of profile typeahead (eg, do we need fuzzy?)
- [ ] real-world search relevancy checks
- [ ] real-world CJK text analysis checks
(#302)

Out of scope for this PR:

- [ ] deal with `created_at` timestamp not being reliable, by adding a
`sort_at` hybrid field, for future "sort by date"
- [ ] instrumentation and metrics (Jaz to implement on top of this
branch)
- [ ] better bulk indexing performance, especially during backfill:
disable refresh during backfill? longer refresh window? bulk (batch)
indexing would be best
- [x] integrate a better identity service/cache; current is probably Ok
in context of backfill. or perhaps just bump the cache size to ~50k or
~100k identities in prod?
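On "bulk (batch) indexing would be best": OpenSearch's `_bulk` API takes newline-delimited JSON, one action line followed by one source line per document, and the body must end with a newline. A TypeScript sketch of building that body; the index name, document shape, and function name are hypothetical.

```typescript
// Hypothetical helper; index name and document shape are assumptions.
interface IndexDoc {
  id: string // becomes the OpenSearch _id
  body: Record<string, unknown> // the document source
}

function bulkIndexBody(index: string, docs: IndexDoc[]): string {
  const lines: string[] = []
  for (const doc of docs) {
    // Action/metadata line, then the document source line.
    lines.push(JSON.stringify({ index: { _index: index, _id: doc.id } }))
    lines.push(JSON.stringify(doc.body))
  }
  // The bulk API requires the body to end with a trailing newline.
  return lines.length ? lines.join('\n') + '\n' : ''
}
```

The result would be POSTed to `/_bulk` with `Content-Type: application/x-ndjson`. Disabling refresh during backfill would correspond to setting the index's `refresh_interval` to `-1` and restoring it afterwards.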
"type": "string",
"description": "search query string; syntax, phrase, boolean, and faceting is unspecified, but Lucene query syntax is recommended. For typeahead search, only simple term match is supported, not full syntax"
},
"typeahead": {
Collaborator

would it make sense to use a non-normative parameter for this? something like simple or quick?

Collaborator Author

hmmm, I feel like typeahead is pretty term-of-art for this query type (not just the term we are using). A different subset of fields is queried, and it only really works for the prefix of a single token or two, typed character-by-character.

"simple" or "quick" would be confusing to me; I'd assume that they would return un-hydrated responses or something (responses are always hydrated).
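To make the "term-of-art" point concrete: a typeahead query differs from full search mainly in which fields it targets and in treating the last token as a prefix. A hedged sketch of an OpenSearch request body for that, using the `multi_match` query with `type: "bool_prefix"` (a real OpenSearch/Elasticsearch query type); whether palomar uses this query, and the field names, are assumptions.

```typescript
// Hypothetical request-body builder for profile typeahead.
interface TypeaheadBody {
  size: number
  query: {
    multi_match: {
      query: string
      type: 'bool_prefix'
      fields: string[]
    }
  }
}

// Field names are placeholders, not palomar's actual mapping.
const TYPEAHEAD_FIELDS = ['handle', 'display_name']

function typeaheadQuery(q: string, limit: number): TypeaheadBody {
  return {
    size: limit,
    query: {
      multi_match: {
        query: q,
        // bool_prefix: every analyzed term except the last is matched as a
        // whole term; the last one is matched as a prefix, which suits
        // input that is still being typed character by character.
        type: 'bool_prefix',
        fields: TYPEAHEAD_FIELDS,
      },
    },
  }
}
```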

Collaborator

cool cool - i'm game for "typeahead" 👍

"limit": {
"type": "integer",
"minimum": 1,
"maximum": 100,
- "default": 50
+ "default": 10
Collaborator

nice i dig both of these new defaults 👍

"parameters": {
"type": "params",
"properties": {
- "term": { "type": "string" },
+ "term": {
Collaborator

I would support phasing out term and phasing in q

I think "term" makes sense when it's simple search on a word & no additional syntax is allowed

q fits nicely when we allow for special query syntax (like Lucene syntax)

Collaborator Author

cool, I updated all the instances of 'term' as a search query with 'q', and marked the old 'term' fields as "DEPRECATED" in description. This impacted an admin route as well. Idea is that for a short transition we'll fall back to "term" if "q" is empty, and maybe eventually nuke them (if/when we are feeling comfortable breaking query params in Lexicons in a small way, like a v1.0 release)

Collaborator

dholms Sep 19, 2023

yup that works 👍

@dholms
Collaborator

dholms commented Sep 18, 2023

These look good and make a lot of sense to me 👍

Just chatting about a few things from the PR description:

> A bit of tension around how similar to keep searchActors and searchPosts

I like the idea of keeping these relatively similar - I'm down with moving to q over term.

> I didn't end up using the exact app.bsky.feed.defs#skeletonFeedPost from feed skeletons, just a list of ATURIs, because we don't have a "reason" in search.

One question is: will we ever want any extra metadata on an object for why it turned up in search? something like "trending down" or "you follow this user"? If so, we may not want to use #skeletonFeedPost but we could define a new container schema like #skeletonSearchPost that would give us the leeway to add additional metadata

> I'd like us to stick with limit/offset queries in to opensearch (elasticsearch), and not do full scroll cursors. This can be done with the existing cursor setup, by having the search service stick an offset+limit number in the cursor string,

Makes sense to me 👍 And strong agree that we do not want to support full index pagination

@bnewbold
Collaborator Author

w/r/t #skeletonSearchPost or similar: that does make sense as a good Lexicon design pattern, but I can't really think of any reason we would want that for search specifically in the skeleton stage. The views that get returned will have stuff like "do I follow this account" or "have I liked post" hydrated just like a feed, and also things like blocks and mutes will get enforced. The AppView has all that extra context, the search index is pretty dumb.

I'd lean towards keeping it simple (makes for simpler code), but don't feel super strongly and could add wrapper objects if you think we should be consistent.

bnewbold force-pushed the bnewbold/search-lexicons branch from b2319fd to 288d113 September 19, 2023 01:05
@dholms
Collaborator

dholms commented Sep 19, 2023

Makes sense - I think it's fine as is then 👍

@bnewbold
Collaborator Author

Ran codegen, and naively prefer new q over term for older endpoints (but fallback to term if q is not set).
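The transition rule (prefer `q`, fall back to the deprecated `term`) amounts to a few lines in a handler. A TypeScript sketch; `SearchParams` and `effectiveQuery` are hypothetical names, not the generated atproto types.

```typescript
// Hypothetical param shape for an older search endpoint during the transition.
interface SearchParams {
  q?: string
  /** @deprecated use `q`; kept for a short transition period */
  term?: string
}

// Prefer `q`; fall back to `term` only when `q` is unset or empty.
function effectiveQuery(params: SearchParams): string | undefined {
  if (params.q) return params.q
  if (params.term) return params.term
  return undefined
}
```

An empty string counts as unset here, matching "fall back to term if q is empty".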

cc: @devinivy if you might have any last-minute Lexicon style vibes

Comment on lines +26 to +29
"cursor": {
"type": "string",
"description": "optional pagination mechanism; may not necessarily allow scrolling through entire result set"
}
Collaborator

In the past it's proven useful to not support a cursor for typeahead search. Just wanted to note that in case it's relevant here, since both typeahead and cursor parameters can appear together.

Collaborator Author

Would the alternative be a second endpoint just for typeahead? That feels expansive to me. I think in typeahead mode we can safely ignore the cursor param and never populate that field in the output.
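The "ignore the cursor in typeahead mode" approach, sketched in TypeScript. `Backend`, `searchActorsSkeleton`, and the decimal-offset cursor format are hypothetical, and the `{ did }` result objects are an assumed shape for the skeleton objects this PR ended up returning.

```typescript
interface ActorSearchParams {
  q: string
  typeahead?: boolean
  limit: number
  cursor?: string
}

interface ActorSearchOutput {
  cursor?: string
  actors: { did: string }[]
}

// Stand-in for the index call; hypothetical signature.
type Backend = (q: string, offset: number, limit: number, typeahead: boolean) => {
  dids: string[]
  hasMore: boolean
}

function searchActorsSkeleton(params: ActorSearchParams, backend: Backend): ActorSearchOutput {
  const typeahead = params.typeahead === true
  let offset = 0
  // Typeahead ignores any cursor the client sends and always starts at 0.
  if (!typeahead && params.cursor) {
    if (!/^\d+$/.test(params.cursor)) throw new Error('invalid cursor')
    offset = Number(params.cursor)
  }
  const res = backend(params.q, offset, params.limit, typeahead)
  const out: ActorSearchOutput = { actors: res.dids.map((did) => ({ did })) }
  // Only non-typeahead responses ever carry a cursor.
  if (!typeahead && res.hasMore) {
    out.cursor = String(offset + res.dids.length)
  }
  return out
}
```

This keeps a single endpoint while guaranteeing that typeahead clients never see (or depend on) pagination.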

Comment on lines 43 to 49
"actors": {
"type": "array",
"items": {
"type": "string",
"format": "did"
}
}
Collaborator

Not terribly worried about it, but there could be some value in wrapping the did in an object, just so that there's an upgrade path in case we want to provide any additional/meta info with each result in the future.

Collaborator Author

Dan had the same thoughts in an earlier review, which is a strong enough signal that I'll add that to these endpoints.

Comment on lines 39 to 45
"posts": {
"type": "array",
"items": {
"type": "string",
"format": "at-uri"
}
}
Collaborator

Same goes here, there could be some value in wrapping the uri in an object so that we leave open an upgrade path to add additional/meta info about each result in the future.

Collaborator

devinivy left a comment

Looking good! Left a few misc thoughts.

@bnewbold
Collaborator Author

updated with skeletal objects, and re-codegen
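A sketch of why the skeleton objects help: each hit is wrapped in an object instead of being a bare DID or AT-URI string. The interface and function names, and the `did`/`uri` field names, are assumptions for illustration rather than the generated Lexicon types.

```typescript
// Hypothetical result shapes for the unspecced search skeleton endpoints.
interface SkeletonSearchActor {
  did: string
}

interface SkeletonSearchPost {
  uri: string
}

interface ActorsSkeletonOutput {
  cursor?: string
  actors: SkeletonSearchActor[]
}

interface PostsSkeletonOutput {
  cursor?: string
  posts: SkeletonSearchPost[]
}

// Wrap bare DIDs from the index in result objects.
function toActorsSkeleton(dids: string[], cursor?: string): ActorsSkeletonOutput {
  const out: ActorsSkeletonOutput = { actors: dids.map((did) => ({ did })) }
  if (cursor !== undefined) out.cursor = cursor
  return out
}

// Wrap bare AT-URIs from the index in result objects.
function toPostsSkeleton(uris: string[], cursor?: string): PostsSkeletonOutput {
  const out: PostsSkeletonOutput = { posts: uris.map((uri) => ({ uri })) }
  if (cursor !== undefined) out.cursor = cursor
  return out
}
```

Adding an optional field to these objects later (say, some per-hit metadata) would be an additive change, whereas turning a bare `string[]` into objects after the fact would break clients; that is the upgrade path the reviewers were asking for.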

bnewbold merged commit 233a132 into main Sep 25, 2023
10 checks passed
bnewbold deleted the bnewbold/search-lexicons branch September 25, 2023 23:56
dholms pushed a commit that referenced this pull request Sep 26, 2023
* proposed new search lexicons

* lexicons: lint

* lexicons: fix actors typo

* lexicons: camelCase bites again, ssssss

* lexicons: add 'q' and mark 'term' as deprecated for search endpoints

* codegen for search lexicon updates

* bsky: prefer 'q' over 'term' in existing search endpoints

* search: bugfix

* lexicons: make unspecced search endpoints return skeleton obj

* re-codegen for search skeleton obj
dholms added a commit that referenced this pull request Sep 27, 2023
* lexicons

* codegen

* email templates

* request routes

* impl

* migration

* tidy

* tests

* tidy & bugfixes

* format

* fix api test

* fix auth test

* codegen

* add unique constraint

* Add email confirmed to AtpSessionData

* interop test files (#1529)

* initial interop-test-files

* crypto: switch signature-fixtures.json to a symlink

* syntax: test against interop files

* prettier

* Update interop-test-files/README.md

Co-authored-by: Eric Bailey <[email protected]>

* disable prettier on test vectors

---------

Co-authored-by: Eric Bailey <[email protected]>
Co-authored-by: dholms <[email protected]>

* add getSuggestedFollowsByActor (#1553)

* add getSuggestedFollowsByActor lex

* remove pagination

* codegen

* add pds route

* add app view route

* first pass at likes-based suggested actors, plus tests

* format

* backfill with suggested_follow table

* combine actors queries

* fall back to popular follows, handle backfill differently

* revert seed change, update test

* lower likes threshold

* cleanup

* remove todo

* format

* optimize queries

* cover mute lists

* clean up into pipeline steps

* add changeset

* List feeds (#1557)

* lexicons for block lists

* reorg blockset functionality into graph service, impl block/mute filtering

* apply filterBlocksAndMutes() throughout appview except feeds

* update local feeds to pass through cleanFeedSkeleton(), offload block/mute application

* impl for grabbing block/mute details by did pair

* refactor getActorInfos away, use actor service

* experiment with moving getFeedGenerators over to a pipeline

* move getPostThread over to a pipeline

* move feeds over to pipelines

* move suggestions and likes over to pipelines

* move reposted-by, follows, followers over to pipelines, tidy author feed and post thread

* remove old block/mute checks

* unify post presentation logic

* move profiles endpoints over to pipelines

* tidy

* tidy

* misc fixes

* unify some profile hydration/presentation in appview

* profile detail, split hydration and presentation, misc fixes

* unify feed hydration w/ profile hydration

* unify hydration step for embeds, tidy application of labels

* setup indexing of list-blocks in bsky appview

* apply list-blocks, impl getListBlocks, tidy getList, tests

* tidy

* update pds proxy snaps

* update pds proxy snaps

* fix snap

* make algos return feed items, save work in getFeed

* misc changes, tidy

* tidy

* fix aturi import

* lex

* list purpose

* lex gen

* add route

* add proxy route

* seed client helpers

* tests

* mutes and blocks

* proxy test

* snapshot

* hoist actors out of composeThread()

* tidy

* tidy

* run ci on all prs

* format

* format

* fix snap name

* fix snapsh

---------

Co-authored-by: Devin Ivy <[email protected]>

* Improve xrpc server error handling (#1597)

improve xrpc server error handling

* Remove appview proxy runtime flags (#1590)

* remove appview proxy runtime flags

* clean up proxy tests

* getPopular hotfix (#1599)

dont pass all params

* Interaction Gating (#1561)

* lexicons for block lists

* reorg blockset functionality into graph service, impl block/mute filtering

* apply filterBlocksAndMutes() throughout appview except feeds

* update local feeds to pass through cleanFeedSkeleton(), offload block/mute application

* impl for grabbing block/mute details by did pair

* refactor getActorInfos away, use actor service

* experiment with moving getFeedGenerators over to a pipeline

* move getPostThread over to a pipeline

* move feeds over to pipelines

* move suggestions and likes over to pipelines

* move reposted-by, follows, followers over to pipelines, tidy author feed and post thread

* remove old block/mute checks

* unify post presentation logic

* move profiles endpoints over to pipelines

* tidy

* tidy

* misc fixes

* unify some profile hydration/presentation in appview

* profile detail, split hydration and presentation, misc fixes

* unify feed hydration w/ profile hydration

* unify hydration step for embeds, tidy application of labels

* setup indexing of list-blocks in bsky appview

* apply list-blocks, impl getListBlocks, tidy getList, tests

* tidy

* update pds proxy snaps

* update pds proxy snaps

* fix snap

* make algos return feed items, save work in getFeed

* misc changes, tidy

* tidy

* fix aturi import

* initial lexicons for interaction-gating

* add interactions view to post views

* codegen

* model bad reply/interaction check state on posts

* initial impl for checking bad reply or interaction on write

* omit invalid interactions from post thread

* support not-found list in interaction view

* hydrate can-reply state on threads

* present interaction views on posts

* misc fixes, update snaps

* tidy/reorg

* tidy

* split interaction gating into separate record in lexicon

* switch interaction-gating impl to use separate record type

* allow checking reply gate w/ root post deletion

* fix

* initial gating tests

* tighten gated reply views, tests

* reply-gating list rule tests

* allow custom post rkeys within window

* hoist actors out of composeThread()

* tidy

* update thread gate lexicons, codegen

* lex fix

* rename gate to threadgate in bsky, update views

* lex fix

* improve terminology around reply validation

* fix down migration

* remove thread gates on actor unindexing

* add back .prettierignore

* tidy

* run ci on all prs

* syntax

* run ci on all prs

* format

* fix snap

---------

Co-authored-by: Devin Ivy <[email protected]>

* order by `like.indexedAt` in app view (#1592)

* order by like.indexedAt

* use keyset for ordering

* simplify

* ok ok ok I get it now

* Update packages/bsky/src/api/app/bsky/feed/getActorLikes.ts

Co-authored-by: Daniel Holmgren <[email protected]>

---------

Co-authored-by: Daniel Holmgren <[email protected]>

* Remove default value for post table invalid attrs (#1601)

remove default value for post table attrs

* Version packages (#1602)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* update Bluesky PBLLC to PBC (Public Benefit Corporation) (#1600)

* Temporarily disable filtering `invalidReplyRoot`s (#1609)

temporarily disable invalidReplyRoot check

* fix syntax docs (#1611)

* Version packages (#1612)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Allow bypass on ratelimit ip (#1613)

allow bypass on ratelimit ip

* Write rate limits (#1578)

* get rate limit ip correctly

* add write rate-limits

* Tweak createSession rate limit key (#1614)

tweak create session rl key

* Filter preferences for app passwords (#1626)

filter preferences for app passwords

* Tweak rate limit setup for multi rate limit routes (#1627)

tweak rate limit setup for multi rate limit routes

* Remove zod from xrpc-server error handling (#1631)

remove zod from xrpc-server error handling check

* Enforce properties field on lexicon object schemas (#1628)

* add empty properties to thread gate schema fragments

* tweak lexicon type

* Add feed-view and thread-view preferences (#1638)

* Add feed and thread preference lexicons

* Add feed-view and thread-view preference APIs

* Add changeset for new preferences  (#1639)

Add changeset

* Version packages (#1640)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Disable getAccountInviteCodes for app passwords (#1642)

disable getAccountInviteCodes for app passwords

* remove cruft packages (uri, nsid, identifier) (#1606)

* remove @atproto/nsid (previously moved to syntax)

* remove @atproto/uri (previously moved to syntax)

* remove @atproto/identifier (previously moved to syntax)

* bump lockfile to remove old packages

---------

Co-authored-by: Eric Bailey <[email protected]>

* api: update login/resumeSession examples in README (#1634)

* api: update login/resumeSession examples in README

* Update packages/api/README.md

Co-authored-by: Daniel Holmgren <[email protected]>

---------

Co-authored-by: Daniel Holmgren <[email protected]>

* small syntax lints (#1646)

* lint: remove unused imports and variables

* lint: prefix unused args with '_'

* eslint: skip no-explicit-any; ignore unused _var (prefix)

* eslint: explicitly mark ignores for tricky cases

* indicate that getPopular is deprecated (#1647)

* indicate that getPopular is deprecated

* codegen for deprecating getPopular

* tidy up package.json and READMEs (#1649)

* identity: README example and tidy

* tidy up package metadata (package.json files)

* updated README headers/stubs for several packages

* crypto: longer README, with usage

* syntax: tweak README

* Apply suggestions from code review

Co-authored-by: Eric Bailey <[email protected]>
Co-authored-by: devin ivy <[email protected]>

---------

Co-authored-by: Eric Bailey <[email protected]>
Co-authored-by: devin ivy <[email protected]>

* Improve the types of the thread and feed preferences APIs (#1653)

* Improve the types of the thread and feed preferences APIs

* Remove unused import

* Add changeset

* Version packages (#1654)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Disable pds appview routes (#1644)

* wip

* remove all canProxyReadc

* finish cleanup

* clean up tests

* fix up tests

* fix api tests

* fix build

* fix compression test

* update image tests

* fix dev envs

* build branch

* fix service file

* re-enable getPopular

* format

* rm unused sharp code

* dont build branch

* auto-moderator tweaks: pass along record URI, create report for takedown action (#1643)

* auto-moderator: include record URI in abyss requests

* auto-moderator: log attempt at hard takedown; create report as well

The motivation is to flag the event to mod team, and to make it easier
to confirm that takedown took place.

* auto-mod: typo fix

* auto-mod: bugfixes

* bsky: always create auto-mod report locally, not pushAgent (if possible)

* bsky: fix auto-mod build

* bsky: URL-encode scanBlob call

* Clear follow viewer state when blocking (#1659)

* clear follow viewer state when blocking

* tidy

* add `tags` to posts (#1637)

* add tags to post lex

* kiss

* add richtext facet and validation attrs

* add tag validation attrs to post

* codegen

* add maxLength for tags, add description

* validate post tags on write

* add test

* handle tags in indexer

* add tags to postView, codegen

* return tags on post thread view

* format

* revert formatting change to docs

* use established validation pattern

* add changeset

(cherry picked from commit fcb6fe7)

* remove tags from postView, codegen

* remove tags from thread view

* revert unused changes

* Version packages (#1664)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* merge

* Reverse order of blocks from sync.getRepo (#1665)

* reverse order of blocks from sync.getRepo

* write to car while fetching next page

* Add hashtag detection to richtext (#1651)

* add tag detection to richtext

* fix duplicate tag index error

* add utils

* fix leading space index failures, test for them

* add changeset

* Version packages (#1669)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* proposed new search lexicons (#1594)

* proposed new search lexicons

* lexicons: lint

* lexicons: fix actors typo

* lexicons: camelCase bites again, ssssss

* lexicons: add 'q' and mark 'term' as deprecated for search endpoints

* codegen for search lexicon updates

* bsky: prefer 'q' over 'term' in existing search endpoints

* search: bugfix

* lexicons: make unspecced search endpoints return skeleton obj

* re-codegen for search skeleton obj

* Disable pds appview indexing (#1645)

* rm indexing service

* remove message queue & refactor background queue

* wip

* remove all canProxyReadc

* finish cleanup

* clean up tests

* fix up tests

* fix api tests

* fix build

* fix compression test

* update image tests

* fix dev envs

* build branch

* wip - removing labeler

* fix service file

* remove kysely tables

* re-enable getPopular

* format

* cleaning up tests

* rm unused sharp code

* rm pds build

* clean up tests

* fix build

* fix build

* migration

* tidy

* build branch

* tidy

* build branch

* small tidy

* dont build

* Refactor PDS appview routes (#1673)

move routes around

* Strip leading `#` from detected tag facets (#1674)

ensure # is removed from facets

* Version packages (#1675)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Proxy search queries (#1676)

* proxy search

* tweak profile resp

* fix admin.searchRepos

* add mock mailer

* Fix to daniel's MOCKERY of a mock mailer

* Don't allow non-verified email updates until app feature is out (#1682)

stricter updating email until app feature is out

* changesets

---------

Co-authored-by: Paul Frazee <[email protected]>
Co-authored-by: bnewbold <[email protected]>
Co-authored-by: Eric Bailey <[email protected]>
Co-authored-by: Devin Ivy <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
mloar pushed a commit to mloar/atproto that referenced this pull request Nov 15, 2023
mloar pushed a commit to mloar/atproto that referenced this pull request Nov 15, 2023
* lexicons

* codegen

* email templates

* request routes

* impl

* migration

* tidy

* tests

* tidy & bugfixes

* format

* fix api test

* fix auth test

* codegen

* add unique constraint

* Add email confirmed to AtpSessionData

* interop test files (bluesky-social#1529)

* initial interop-test-files

* crypto: switch signature-fixtures.json to a symlink

* syntax: test against interop files

* prettier

* Update interop-test-files/README.md

Co-authored-by: Eric Bailey <[email protected]>

* disable prettier on test vectors

---------

Co-authored-by: Eric Bailey <[email protected]>
Co-authored-by: dholms <[email protected]>

* add getSuggestedFollowsByActor (bluesky-social#1553)

* add getSuggestedFollowsByActor lex

* remove pagination

* codegen

* add pds route

* add app view route

* first pass at likes-based suggested actors, plus tests

* format

* backfill with suggested_follow table

* combine actors queries

* fall back to popular follows, handle backfill differently

* revert seed change, update test

* lower likes threshold

* cleanup

* remove todo

* format

* optimize queries

* cover mute lists

* clean up into pipeline steps

* add changeset

* List feeds (bluesky-social#1557)

* lexicons for block lists

* reorg blockset functionality into graph service, impl block/mute filtering

* apply filterBlocksAndMutes() throughout appview except feeds

* update local feeds to pass through cleanFeedSkeleton(), offload block/mute application

* impl for grabbing block/mute details by did pair

* refactor getActorInfos away, use actor service

* experiment with moving getFeedGenerators over to a pipeline

* move getPostThread over to a pipeline

* move feeds over to pipelines

* move suggestions and likes over to pipelines

* move reposted-by, follows, followers over to pipelines, tidy author feed and post thread

* remove old block/mute checks

* unify post presentation logic

* move profiles endpoints over to pipelines

* tidy

* tidy

* misc fixes

* unify some profile hydration/presentation in appview

* profile detail, split hydration and presentation, misc fixes

* unify feed hydration w/ profile hydration

* unify hydration step for embeds, tidy application of labels

* setup indexing of list-blocks in bsky appview

* apply list-blocks, impl getListBlocks, tidy getList, tests

* tidy

* update pds proxy snaps

* update pds proxy snaps

* fix snap

* make algos return feed items, save work in getFeed

* misc changes, tidy

* tidy

* fix aturi import

* lex

* list purpose

* lex gen

* add route

* add proxy route

* seed client helpers

* tests

* mutes and blocks

* proxy test

* snapshot

* hoist actors out of composeThread()

* tidy

* tidy

* run ci on all prs

* format

* format

* fix snap name

* fix snapsh

---------

Co-authored-by: Devin Ivy <[email protected]>

* Improve xrpc server error handling (bluesky-social#1597)

improve xrpc server error handling

* Remove appview proxy runtime flags (bluesky-social#1590)

* remove appview proxy runtime flags

* clean up proxy tests

* getPopular hotfix (bluesky-social#1599)

dont pass all params

* Interaction Gating (bluesky-social#1561)

* lexicons for block lists

* reorg blockset functionality into graph service, impl block/mute filtering

* apply filterBlocksAndMutes() throughout appview except feeds

* update local feeds to pass through cleanFeedSkeleton(), offload block/mute application

* impl for grabbing block/mute details by did pair

* refactor getActorInfos away, use actor service

* experiment with moving getFeedGenerators over to a pipeline

* move getPostThread over to a pipeline

* move feeds over to pipelines

* move suggestions and likes over to pipelines

* move reposted-by, follows, followers over to pipelines, tidy author feed and post thread

* remove old block/mute checks

* unify post presentation logic

* move profiles endpoints over to pipelines

* tidy

* tidy

* misc fixes

* unify some profile hydration/presentation in appview

* profile detail, split hydration and presentation, misc fixes

* unify feed hydration w/ profile hydration

* unify hydration step for embeds, tidy application of labels

* setup indexing of list-blocks in bsky appview

* apply list-blocks, impl getListBlocks, tidy getList, tests

* tidy

* update pds proxy snaps

* update pds proxy snaps

* fix snap

* make algos return feed items, save work in getFeed

* misc changes, tidy

* tidy

* fix aturi import

* initial lexicons for interaction-gating

* add interactions view to post views

* codegen

* model bad reply/interaction check state on posts

* initial impl for checking bad reply or interaction on write

* omit invalid interactions from post thread

* support not-found list in interaction view

* hydrate can-reply state on threads

* present interaction views on posts

* misc fixes, update snaps

* tidy/reorg

* tidy

* split interaction gating into separate record in lexicon

* switch interaction-gating impl to use separate record type

* allow checking reply gate w/ root post deletion

* fix

* initial gating tests

* tighten gated reply views, tests

* reply-gating list rule tests

* allow custom post rkeys within window

* hoist actors out of composeThread()

* tidy

* update thread gate lexicons, codegen

* lex fix

* rename gate to threadgate in bsky, update views

* lex fix

* improve terminology around reply validation

* fix down migration

* remove thread gates on actor unindexing

* add back .prettierignore

* tidy

* run ci on all prs

* syntax

* run ci on all prs

* format

* fix snap

---------

Co-authored-by: Devin Ivy <[email protected]>

* order by `like.indexedAt` in app view (bluesky-social#1592)

* order by like.indexedAt

* use keyset for ordering

* simplify

* ok ok ok I get it now

* Update packages/bsky/src/api/app/bsky/feed/getActorLikes.ts

Co-authored-by: Daniel Holmgren <[email protected]>

---------

Co-authored-by: Daniel Holmgren <[email protected]>
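
The "use keyset for ordering" step in bluesky-social#1592 pages through likes by `like.indexedAt` with a keyset (seek) cursor instead of an offset. Below is a minimal sketch of that technique. It assumes a hypothetical `<indexedAt>::<cid>` cursor format and uses an in-memory array in place of the database. `LikeRow`, `paginateLikes` and the cursor helpers are illustrative names, not the code in `getActorLikes.ts`.

```typescript
// Hypothetical row shape; the real like table has more columns.
interface LikeRow {
  uri: string
  cid: string
  indexedAt: string // ISO-8601 timestamp, so string order == time order
}

// Hypothetical cursor format: "<indexedAt>::<cid>".
function encodeCursor(row: LikeRow): string {
  return `${row.indexedAt}::${row.cid}`
}

function decodeCursor(cursor: string): { indexedAt: string; cid: string } {
  const [indexedAt = '', cid = ''] = cursor.split('::')
  if (!indexedAt || !cid) throw new Error(`malformed cursor: ${cursor}`)
  return { indexedAt, cid }
}

// Newest first by (indexedAt, cid); the cid tiebreaker makes the key unique.
function compareDesc(a: LikeRow, b: LikeRow): number {
  if (a.indexedAt !== b.indexedAt) return a.indexedAt < b.indexedAt ? 1 : -1
  if (a.cid !== b.cid) return a.cid < b.cid ? 1 : -1
  return 0
}

// Next page = rows strictly "older" than the cursor key. In SQL this is roughly
// WHERE (indexedAt, cid) < (:indexedAt, :cid) ORDER BY indexedAt DESC, cid DESC LIMIT :limit
function paginateLikes(
  rows: LikeRow[],
  limit: number,
  cursor?: string,
): { likes: LikeRow[]; cursor?: string } {
  const sorted = [...rows].sort(compareDesc)
  let candidates = sorted
  if (cursor) {
    const key = decodeCursor(cursor)
    candidates = sorted.filter(
      (r) =>
        r.indexedAt < key.indexedAt ||
        (r.indexedAt === key.indexedAt && r.cid < key.cid),
    )
  }
  const likes = candidates.slice(0, limit)
  const last = likes[likes.length - 1]
  return { likes, cursor: last ? encodeCursor(last) : undefined }
}
```

Each page starts strictly after the last `(indexedAt, cid)` key the client saw. Likes indexed while a client is paging therefore do not shift later pages. The cid tiebreaker keeps likes that share a timestamp from being skipped or repeated.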

* Remove default value for post table invalid attrs (bluesky-social#1601)

remove default value for post table attrs

* Version packages (bluesky-social#1602)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* update Bluesky PBLLC to PBC (Public Benefit Corporation) (bluesky-social#1600)

* Temporarily disable filtering `invalidReplyRoot`s (bluesky-social#1609)

temporarily disable invalidReplyRoot check

* fix syntax docs (bluesky-social#1611)

* Version packages (bluesky-social#1612)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Allow bypass on ratelimit ip (bluesky-social#1613)

allow bypass on ratelimit ip

* Write rate limits (bluesky-social#1578)

* get rate limit ip correctly

* add write rate-limits

* Tweak createSession rate limit key (bluesky-social#1614)

tweak create session rl key

* Filter preferences for app passwords (bluesky-social#1626)

filter preferences for app passwords

* Tweak rate limit setup for multi rate limit routes (bluesky-social#1627)

tweak rate limit setup for multi rate limit routes

* Remove zod from xrpc-server error handling (bluesky-social#1631)

remove zod from xrpc-server error handling check

* Enforce properties field on lexicon object schemas (bluesky-social#1628)

* add empty properties to thread gate schema fragments

* tweak lexicon type

* Add feed-view and thread-view preferences (bluesky-social#1638)

* Add feed and thread preference lexicons

* Add feed-view and thread-view preference APIs

* Add changeset for new preferences (bluesky-social#1639)

Add changeset

* Version packages (bluesky-social#1640)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Disable getAccountInviteCodes for app passwords (bluesky-social#1642)

disable getAccountInviteCodes for app passwords

* remove cruft packages (uri, nsid, identifier) (bluesky-social#1606)

* remove @atproto/nsid (previously moved to syntax)

* remove @atproto/uri (previously moved to syntax)

* remove @atproto/identifier (previously moved to syntax)

* bump lockfile to remove old packages

---------

Co-authored-by: Eric Bailey <[email protected]>

* api: update login/resumeSession examples in README (bluesky-social#1634)

* api: update login/resumeSession examples in README

* Update packages/api/README.md

Co-authored-by: Daniel Holmgren <[email protected]>

---------

Co-authored-by: Daniel Holmgren <[email protected]>

* small syntax lints (bluesky-social#1646)

* lint: remove unused imports and variables

* lint: prefix unused args with '_'

* eslint: skip no-explicit-any; ignore unused _var (prefix)

* eslint: explicitly mark ignores for tricky cases

* indicate that getPopular is deprecated (bluesky-social#1647)

* indicate that getPopular is deprecated

* codegen for deprecating getPopular

* tidy up package.json and READMEs (bluesky-social#1649)

* identity: README example and tidy

* tidy up package metadata (package.json files)

* updated README headers/stubs for several packages

* crypto: longer README, with usage

* syntax: tweak README

* Apply suggestions from code review

Co-authored-by: Eric Bailey <[email protected]>
Co-authored-by: devin ivy <[email protected]>

---------

Co-authored-by: Eric Bailey <[email protected]>
Co-authored-by: devin ivy <[email protected]>

* Improve the types of the thread and feed preferences APIs (bluesky-social#1653)

* Improve the types of the thread and feed preferences APIs

* Remove unused import

* Add changeset

* Version packages (bluesky-social#1654)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Disable pds appview routes (bluesky-social#1644)

* wip

* remove all canProxyRead

* finish cleanup

* clean up tests

* fix up tests

* fix api tests

* fix build

* fix compression test

* update image tests

* fix dev envs

* build branch

* fix service file

* re-enable getPopular

* format

* rm unused sharp code

* don't build branch

* auto-moderator tweaks: pass along record URI, create report for takedown action (bluesky-social#1643)

* auto-moderator: include record URI in abyss requests

* auto-moderator: log attempt at hard takedown; create report as well

The motivation is to flag the event to mod team, and to make it easier
to confirm that takedown took place.

* auto-mod: typo fix

* auto-mod: bugfixes

* bsky: always create auto-mod report locally, not pushAgent (if possible)

* bsky: fix auto-mod build

* bsky: URL-encode scanBlob call

* Clear follow viewer state when blocking (bluesky-social#1659)

* clear follow viewer state when blocking

* tidy

* add `tags` to posts (bluesky-social#1637)

* add tags to post lex

* kiss

* add richtext facet and validation attrs

* add tag validation attrs to post

* codegen

* add maxLength for tags, add description

* validate post tags on write

* add test

* handle tags in indexer

* add tags to postView, codegen

* return tags on post thread view

* format

* revert formatting change to docs

* use established validation pattern

* add changeset

(cherry picked from commit 464b8074f726fa12b0dc9887add3537ae85b8055)

* remove tags from postView, codegen

* remove tags from thread view

* revert unused changes

* Version packages (bluesky-social#1664)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* merge

* Reverse order of blocks from sync.getRepo (bluesky-social#1665)

* reverse order of blocks from sync.getRepo

* write to car while fetching next page

* Add hashtag detection to richtext (bluesky-social#1651)

* add tag detection to richtext

* fix duplicate tag index error

* add utils

* fix leading space index failures, test for them

* add changeset

* Version packages (bluesky-social#1669)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* proposed new search lexicons (bluesky-social#1594)

* proposed new search lexicons

* lexicons: lint

* lexicons: fix actors typo

* lexicons: camelCase bites again, ssssss

* lexicons: add 'q' and mark 'term' as deprecated for search endpoints

* codegen for search lexicon updates

* bsky: prefer 'q' over 'term' in existing search endpoints

* search: bugfix

* lexicons: make unspecced search endpoints return skeleton obj

* re-codegen for search skeleton obj
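
The search commits in bluesky-social#1594 add `q`, mark `term` as deprecated, and make the bsky appview prefer `q`. The PR description also proposes keeping plain limit/offset queries into OpenSearch by putting the offset in the `cursor` string. Below is a minimal sketch of both ideas. It assumes a hypothetical `SearchParams` shape and a hypothetical `MAX_OFFSET` cap, set here to 10,000 to match OpenSearch's default `index.max_result_window`. None of these names come from the actual lexicons or the palomar service.

```typescript
// Hypothetical params shape; `term` is kept only for backwards compatibility.
interface SearchParams {
  q?: string
  term?: string
  limit?: number
  cursor?: string
}

// Prefer `q`, fall back to the deprecated `term`; blank input means "no query".
function resolveQuery(params: SearchParams): string | undefined {
  for (const candidate of [params.q, params.term]) {
    const trimmed = candidate?.trim()
    if (trimmed) return trimmed
  }
  return undefined
}

// Hypothetical cap: deep pages are refused rather than served via scroll cursors.
const MAX_OFFSET = 10_000

// The cursor is opaque to clients; here it just carries the next offset.
function parseCursor(cursor?: string): number {
  if (!cursor) return 0
  if (!/^\d+$/.test(cursor)) throw new Error(`invalid search cursor: ${cursor}`)
  const offset = Number(cursor)
  if (offset >= MAX_OFFSET) throw new Error(`search cursor too deep: ${cursor}`)
  return offset
}

// Clamp the page so that offset + size never exceeds the cap.
function pageSize(offset: number, limit: number): number {
  return Math.max(0, Math.min(limit, MAX_OFFSET - offset))
}

// No cursor once results run out or the cap is reached.
function nextCursor(offset: number, size: number, hitsReturned: number): string | undefined {
  const next = offset + hitsReturned
  if (hitsReturned < size || next >= MAX_OFFSET) return undefined
  return String(next)
}
```

Capping the offset keeps `searchPosts` from doubling as a way to enumerate the whole index, which the PR description explicitly wants to avoid.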

* Disable pds appview indexing (bluesky-social#1645)

* rm indexing service

* remove message queue & refactor background queue

* wip

* remove all canProxyRead

* finish cleanup

* clean up tests

* fix up tests

* fix api tests

* fix build

* fix compression test

* update image tests

* fix dev envs

* build branch

* wip - removing labeler

* fix service file

* remove kysely tables

* re-enable getPopular

* format

* cleaning up tests

* rm unused sharp code

* rm pds build

* clean up tests

* fix build

* fix build

* migration

* tidy

* build branch

* tidy

* build branch

* small tidy

* don't build

* Refactor PDS appview routes (bluesky-social#1673)

move routes around

* Strip leading `#` from detected tag facets (bluesky-social#1674)

ensure # is removed from facets
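
bluesky-social#1651 adds hashtag detection to richtext, and bluesky-social#1674 strips the leading `#` from the detected tag facets. Below is a minimal sketch of that behaviour under a simplified rule. A tag is a `#` at the start of the text or after whitespace, followed by non-whitespace, non-`#` characters. Trailing punctuation is not trimmed. The sketch also assumes facet ranges are UTF-8 byte offsets that cover the `#`, while the `tag` value omits it. `TagFacet` and `detectTags` are hypothetical names, not the `@atproto/api` RichText API.

```typescript
// Hypothetical facet shape; offsets index into the UTF-8 encoding of the text.
interface TagFacet {
  byteStart: number // offset of the "#"
  byteEnd: number // offset just past the last tag character
  tag: string // tag text with the leading "#" stripped
}

const encoder = new TextEncoder()

function utf8Length(s: string): number {
  return encoder.encode(s).length
}

function detectTags(text: string): TagFacet[] {
  // Group 1: start-of-text or a single whitespace char; group 2: the tag body.
  const re = /(^|\s)#([^\s#]+)/g
  const facets: TagFacet[] = []
  let match: RegExpExecArray | null
  while ((match = re.exec(text)) !== null) {
    const leading = match[1] ?? ''
    const tag = match[2] ?? ''
    // String index of the "#" (skip the leading whitespace, if any).
    const hashIndex = match.index + leading.length
    const byteStart = utf8Length(text.slice(0, hashIndex))
    facets.push({ byteStart, byteEnd: byteStart + utf8Length('#' + tag), tag })
  }
  return facets
}
```

UTF-8 byte offsets and JavaScript string indices diverge as soon as the text contains non-ASCII characters. In `"\u00e9 #tag"`, the `#` is at string index 2 but at byte offset 3.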

* Version packages (bluesky-social#1675)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Proxy search queries (bluesky-social#1676)

* proxy search

* tweak profile resp

* fix admin.searchRepos

* add mock mailer

* Fix to Daniel's MOCKERY of a mock mailer

* Don't allow non-verified email updates until app feature is out (bluesky-social#1682)

stricter updating email until app feature is out

* changesets

---------

Co-authored-by: Paul Frazee <[email protected]>
Co-authored-by: bnewbold <[email protected]>
Co-authored-by: Eric Bailey <[email protected]>
Co-authored-by: Devin Ivy <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

viksit commented Dec 14, 2023

@bnewbold how are you running the search in the backend for posts? Are you using pgsql FTS or something else?

bnewbold (Collaborator, Author) commented

@viksit the backend search service is palomar, implemented in golang: https://github.com/bluesky-social/indigo/tree/main/cmd/palomar

The actual index/engine is OpenSearch, the AWS fork of Elasticsearch.
