Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Discover] Filter out globally seen posts client-side #7145

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

gaearon
Copy link
Collaborator

@gaearon gaearon commented Dec 17, 2024

We have a tension between two wants:

  • We want Discover to avoid showing posts you've already seen
  • Actually sending every post you see (e.g. in Following) to Discover would be both too expensive on the server side to store and would have privacy implications if we ever wanted to extend this to third party feeds
    • But special-casing Discover forever, even though it is the primary feed in the app, isn't great either

The proposed solution here, which we arrived at after a discussion with @pfrazee and @whyrusleeping, is client-side deduplication. It's not perfect but it hits these points:

  • It'll avoid showing posts you've already seen on the mobile app during the same session
    • Could be extended to multiple sessions if we serialize the buffer (it doesn't need to survive long)
    • Could be shared between browsers tabs in the future as well (for now it's not)
  • It doesn't require any server computation
  • In the longer run, we could imagine third party feeds declaring they want this capability on. Then Discover wouldn't need to be special-cased. Most third party feeds probably don't want this so it would be opt-in.

Test Plan

Add a log, verify posts get added to the buffer. Currently they get added:

  • In any feeds, on linger
  • In post thread, immediately for highlighted post + parent chain (we don't count linger there)

You can verify that with a log.

yea.mov

Then you can verify that it does the filtering. For that, add a log to where the filtering happens and log the post being filtered out. Then spend some time scrolling Following. Then go to Discover and repeatedly request it — if you're lucky, you'll trigger the behavior. Observe that the filtered post is not being displayed.

Verify logged-out Discover still works.

Verify the filtering is not being applied to other feeds.

The Mechanism

I used a Bloom filter because it's cool. (?) No seriously, it's relatively light in memory (see the linked formula), otherwise a Set would be a lot larger. If we want to maintain continuity and serialize it or sync it between tabs, it'll get more complicated, but I think this is an OK start for now.

I'm doing this in the response handler rather than in the query or the feed tuner because it's not deterministic. The filter is mutable so we need to make sure we apply it once and consistently. Right before handing off the response seems like the perfect time.

@@ -1,6 +1,8 @@
import React from 'react'
import {AppState, AppStateStatus} from 'react-native'
import {AppBskyFeedDefs} from '@atproto/api'
// @ts-ignore
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe:

// store in a .d.ts file
declare module 'bloomfilter' {
  export class BloomFilter {
    buckets: Int32Array;

    /**
     * Create a new empty bloom filter of size m with hashes k or
     * provide buckets as a number[] or Int32Array to deserialize a bloom filter
     * @param mOrBucketsArray number of bits (will be rounded up to nearest 32), or buckets
     * to deserialize into a filled bloomfilter
     * @param k number of hashes
     */
    constructor(m: number | number[] | Int32Array, k: number);

    /**
     * Add a value to a bloom filter
     * @param value
     */
    add(value: any): void;

    /**
     * Test whether a value exists in a bloom filter. (False positives are
     * possible, false negatives are not.)
     */
    test(value: any): boolean;

    /**
     * Estimated cardinality.
     */
    size(): number;
  }
}

Based on https://www.npmjs.com/package/@types/bloomfilter#additional-details and https://github.com/jasondavies/bloomfilter.js/blob/master/bloomfilter.js#L72-L77

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't we just use @types/bloomfilter?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can, but then you miss size(): number.

Copy link

@SleeplessByte SleeplessByte Dec 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I think #7145 (comment) is right, and instead you'll want to "just" create a local typescript file, even, porting the 7 year old code. Maintain it locally.

Copy link

github-actions bot commented Dec 17, 2024

Old size New size Diff
6.82 MB 6.83 MB 2.49 KB (0.04%)

@nevercast
Copy link

This looks really good! Nice one!

I did have one thought: the bloomfilter library (file) is only 100 lines long, has no dependencies, and hasn't been updated in 7 years. Likely abandoned. I wonder if this would be a good candiate for vendoring rather than an npm dependency. Only other requirement would be including the BSD-3 licence (MIT compatible) perhaps at the top of the file.

Hope you don't mind my drive by comment, saw the mention of bloom filter+ discover feed and definitely had to have a look.

Cheers,
Josh

@gaearon gaearon force-pushed the hide-seen-elsewhere branch from b5995f5 to 1461add Compare December 18, 2024 20:29
@arcalinea arcalinea temporarily deployed to hide-seen-elsewhere - social-app PR #7145 December 18, 2024 20:29 — with Render Destroyed
@PapayaJackal
Copy link

I'm frustrated that the home button refreshes the timeline instead of adding new posts to the top. I'm just used to using Twitter that way, and on BlueSky when I click it, all the posts disappear for good. This change will only exaggerate the issue since I’ll never get a chance to see them again. 😞

@gaearon
Copy link
Collaborator Author

gaearon commented Dec 19, 2024

This is only relevant for the Discover feed. It was already filtering out already seen posts on Discover so this change has no effect on the issue you're describing.

I agree re: it would be nice to maintain continuity; we want to do this too but it's a bigger project.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants