scraper, project: Implement automated greylisting (tracking PR for WIP) #2778

Draft
wants to merge 4 commits into
base: development

jamescowens
Member

@jamescowens jamescowens commented Oct 2, 2024

This PR is for tracking the progress of implementing automated greylisting.

Please see the following set of notes for design considerations that need to be discussed.

  • Basic manual greylisting and scraper machinery for determining automatic greylisting complete
      • Manual greylisting is an administrative contract type that rides normal transactions
        • Scrapers will now collect statistics on projects that have a greylisted status of either AUTO_GREYLISTED or MANUAL_GREYLISTED. Credits and average credits will be recorded in the project payloads, but the project magnitude will be zero, and they will not contribute to CPID magnitude
        • The Scraper code does not deal DIRECTLY with greylisting rules as this is an individual node responsibility
        • The convergence rules in terms of required number of projects use ACTIVE projects and do not include greylisted projects, even though stats are being collected for greylisted projects. This is because a greylisted project may be available or not available at all, so convergence cannot always be expected at the project level for greylisted projects.
  • TODO
    • Wire up automatic greylisting
      • Automatic greylist status changes occur on superblock boundaries, like superblocks themselves (claims with a valid superblock contract) and beacon activation
      • Need to compute ZCD (zero credit days) and WAS (work availability score)
      • ZCD rule is <= 7 zero credit days out of 40; WAS rule is (last 7 days average project credits) / (40 days average project credits) >= 0.1
      • Since this needs to be on superblock boundaries, we can slightly change the rules for implementation to be in superblocks rather than days. Since almost all of the time superblocks are very close to one day, this is almost the same.
      • Implies an algorithm that operates over 40 days of superblock history
      • A superblock with no stats for a whitelisted project (i.e. because the project is hard down) needs to be counted as a zero credit day in ZCD, with a zero entry for WAS averaging, even though the last project convergence may come from the 48 hour stats carryover. This may require a tweak to the scraper convergence code
      • Choice of
        • stateless methods that repeatedly iterate over 40 superblock history to apply rules
        • methods over a cache structure that stores a subset of information from up to 40 superblocks relevant to the rule computation
        • Advantage of stateless is simplicity
        • Disadvantage is that it is fairly expensive, as the superblock registry has to be iterated over and processed – this involves disk I/O.
        • Advantage of caching is speed
        • Disadvantage of caching is complexity
        • Once the client is synced, this is only called when a superblock is staked and processed by the client, ~1 per 24 hours.
        • During sync it will be called approximately once per 960 blocks
      • Need to define the order of precedence of manual versus automatic greylisting. Manual greylist status must always override automatic, because the whole point of manual greylisting, once this is in place, is to deal with corner cases that are not handled by the ZCD and WAS rules.
        • Manual greylisting is granular to each block (i.e. an administrative contract of type project)
        • Automatic greylisting is granular to the superblock (valid staked superblock claim)
        • Walking this through…
          • Example 1
            block → MAN_GREYLISTED
            superblock → AUTO_GREYLISTED → status still MAN_GREYLISTED
            superblock → removed from AUTO_GREYLISTED → status still MAN_GREYLISTED
            block → removal from MAN_GREYLISTED → ACTIVE
          • Example 2
            block → MAN_GREYLISTED
            superblock → AUTO_GREYLISTED → status still MAN_GREYLISTED
            block → removed from MAN_GREYLISTED → AUTO_GREYLISTED
            superblock → removed from AUTO_GREYLISTED → ACTIVE
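
The two walkthroughs above can be reproduced with a small sketch that tracks the manual and automatic sources separately and derives the visible status from both. This is only an illustration of the precedence rule, not code from the PR; `ProjectStatus`, `ProjectGreylistState`, and `Effective` are hypothetical names.

```cpp
// Hypothetical status values mirroring the names used in the walkthroughs.
enum class ProjectStatus { ACTIVE, AUTO_GREYLISTED, MAN_GREYLISTED };

// Hypothetical per-project state. The two flags are updated independently:
// `manual` by administrative contracts (any block), `auto_qual` at
// superblock boundaries when the ZCD/WAS rules are evaluated.
struct ProjectGreylistState {
    bool manual = false;
    bool auto_qual = false;

    // Manual greylisting always overrides the automatic result.
    ProjectStatus Effective() const
    {
        if (manual) return ProjectStatus::MAN_GREYLISTED;
        if (auto_qual) return ProjectStatus::AUTO_GREYLISTED;
        return ProjectStatus::ACTIVE;
    }
};
```

Because the automatic result is retained while the manual flag is set, clearing the manual flag yields ACTIVE in Example 1 and AUTO_GREYLISTED in Example 2 without any extra bookkeeping.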

        • I think this means we have to do the cache. The most convenient way to deal with this order of precedence is to store the underlying AUTO_GREYLIST status in the cache and have methods to utilize this information
          - Have status in cache of something like AUTO_GREYLIST_QUAL which means project has met the conditions for AUTO_GREYLIST by the rules, but was already MAN_GREYLISTED
          - This would be checked for each contract injection to change MAN_GREYLIST status, to decide whether new status is either AUTO_GREYLIST or ACTIVE
          - The AUTO_GREYLIST_QUAL is a flag on the project at the current (head) state
          - Maybe this really belongs in the in-memory superblock structure? It does not need to be in the on-disk (chain) superblock structure, at the cost of some extra computation.
          - There is an existing superblock cache (SuperblockIndex g_superblock_index) that currently stores the last three superblocks and could be expanded to 40 superblocks as an easy way to do the cache. This means more computation on top of the cache, but it is much faster because it operates on in-memory structures rather than reading from the superblock registry (disk I/O). It also means more memory usage.
          - Maybe best to modify the cache to be a hybrid and store more limited information for superblocks 4 – 40. But this makes the cache updating more complicated.
          - The memory usage of the additional superblocks is minimal compared to the current size of other data structures with the current active beacon count and chain length; however, when the benefactor contracts are implemented, this will no longer be true.
      • Create more detailed automated greylist reporting
          • A simple status listing on the project is not sufficient, because users will want to know the details of why a project is greylisted (this is ZCD/WAS reporting)
            • Probably should be something that operates on the project grid in the GUI and allows “clicking” the project whitelisting status and then having a pop-up window that displays the details of ZCD/WAS.
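
The ZCD and WAS rules from the TODO list above, evaluated over a cache of up to 40 superblocks, could look roughly like the following. This is a minimal sketch under several assumptions: each superblock contributes one credit figure per project (zero when the project had no stats), averages are taken over however many entries exist while the window is still filling, and WAS is treated as 0 when the 40-superblock average is zero. `GreylistWindow` and the constant names are hypothetical and do not come from the PR.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>

// Rule parameters as stated in the notes, with superblocks standing in for days.
constexpr std::size_t kWindow = 40;  // history length
constexpr std::size_t kRecent = 7;   // numerator window for WAS
constexpr std::size_t kMaxZcd = 7;   // ZCD rule: <= 7 zero credit entries
constexpr double kMinWas = 0.1;      // WAS rule: ratio >= 0.1

// Hypothetical per-project rolling window, newest entry at the back.
class GreylistWindow {
public:
    // Record the project's credit for one superblock (0 if no stats at all).
    void Push(std::uint64_t credit)
    {
        m_credits.push_back(credit);
        if (m_credits.size() > kWindow) m_credits.pop_front();
    }

    // Number of zero credit entries in the window.
    std::size_t Zcd() const
    {
        std::size_t zeros = 0;
        for (std::uint64_t c : m_credits) {
            if (c == 0) ++zeros;
        }
        return zeros;
    }

    // Average of the most recent entries divided by the average of the window.
    double Was() const
    {
        const std::size_t n = m_credits.size();
        if (n == 0) return 0.0;
        const std::size_t recent_n = n < kRecent ? n : kRecent;
        double sum_all = 0.0;
        double sum_recent = 0.0;
        for (std::size_t i = 0; i < n; ++i) {
            const double c = static_cast<double>(m_credits[i]);
            sum_all += c;
            if (i >= n - recent_n) sum_recent += c;
        }
        if (sum_all == 0.0) return 0.0;  // assumption: no credit at all fails WAS
        return (sum_recent / recent_n) / (sum_all / n);
    }

    // True when both rules are satisfied as stated in the notes.
    bool MeetsRules() const { return Zcd() <= kMaxZcd && Was() >= kMinWas; }

private:
    std::deque<std::uint64_t> m_credits;
};
```

A project for which `MeetsRules()` returns false at a superblock boundary would presumably qualify for automatic greylisting. The same functions could serve the ZCD/WAS detail pop-up, since they expose the two figures separately.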

@jamescowens jamescowens self-assigned this Oct 2, 2024
@jamescowens jamescowens added this to the Natasha milestone Oct 2, 2024
@jamescowens jamescowens force-pushed the implement_greylist branch 2 times, most recently from 715dba6 to 11cbd3b Compare October 6, 2024 20:19
@div72
Member

div72 commented Oct 6, 2024

> Scrapers will now collect statistics on projects that have a greylisted status of either AUTO_GREYLISTED or MANUAL_GREYLISTED.

This makes sense for automatically greylisted projects, since the statistics are needed for de-greylisting, but why are statistics collected for manually greylisted projects? The greylister can then operate on projects with either ACTIVE or AUTO_GREYLISTED status.

Also, considering that the WAS & ZCD calculation is done once per day, I am not sure caching is worth the bother. It might be worthwhile to write a naive implementation and benchmark it.

Could adding a separate -projectnotify parameter be useful? I've been thinking about making a mailing list for new polls, and adding project state changes to it doesn't sound bad.

@jamescowens
Member Author

jamescowens commented Oct 7, 2024

Excellent question. Depending on the reasons for the manual greylist, statistics may still be available for a project. If so they should continue to be collected, because the ZCD and WAS rules would then apply if the manual greylist status was removed.

What are you thinking in terms of functionality for the -projectnotify parameter?

@div72
Member

div72 commented Oct 8, 2024

> If so they should continue to be collected, because the ZCD and WAS rules would then apply if the manual greylist status was removed.

Good point.

> block → removal from MAN_GREYLISTED → ACTIVE

Manual greylisting should instantly take effect, but should ungreylisting do so too? It should be simpler to make manual ungreylisting put the project in an auto greylist state. It'll take until the next superblock for the project to become active but that's ok imo. I'm imagining a FSM like this:

                              ⣀⡤⠤⠒⠒⠒⠉⠉⠉⠉⠉⠉⠙⠒⠒⠢⠤⣄⡀                         
                            ⡴⠋⠁                    ⠉⠳⡄                       
                            ⢣⣀    AUTO_GREYLIST    ⢀⣠⠇                       
                             ⠈⠙⠒⠤⠤⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⡠⠤⠴⠚⠉                         
                          ⠠⠤⣔⠶⢢               ⡠⣤⢄⣀⡀                       
                        ⢀⡠⠔⠉   ⠁             ⠈  ⠑⠢⡀                       
                     ⠘⠴⠮⠥⠤                         ⠈⠑⠤⡀                      
   ⣀⣠⠤⠤⠖⠒⠒⠒⠒⠒⠒⠒⠒⠒⠲⠤⠤⣄⣀                               ⣈⣑⢄⡀⡔                          
⣠⠖⠋⠁                   ⠈⠙⠲⣄                              ⣉⡭⠤⠖⠒⠒⠒⠒⠦⠤⣄⡀                          
⡇      MANUAL_GREYLIST     ⢸   ⠒⠾⣛⠒⠒⠒⠒⠒⠒⠒⠒⠒⠒⠒⠒⠒⠒⠒⠒⠒⠒  ⢸⡁  ACTIVE ⢀⡹                           
⠙⠲⢤⣀                   ⣀⡤⠖⠋                              ⠈⠙⠒⠒⠒⠒⠒⠒⠒⠚⠉                           
   ⠈⠉⠙⠒⠒⠲⠤⠤⠤⠤⠤⠤⠤⠖⠒⠒⠋⠉⠁                                               
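
One possible reading of this state machine, written as a transition function. This is a sketch of the simplification described in the comment, not of the diagram's exact arrows: manual greylisting applies immediately from any state, manual ungreylisting lands in AUTO_GREYLIST, and only a superblock evaluation moves a project between AUTO_GREYLIST and ACTIVE. `FsmState`, `FsmEvent`, and `NextState` are hypothetical names.

```cpp
// Hypothetical states, named as in the diagram.
enum class FsmState { ACTIVE, AUTO_GREYLIST, MANUAL_GREYLIST };

// Hypothetical events: two from administrative contracts (any block),
// two from rule evaluation at a superblock boundary.
enum class FsmEvent {
    MANUAL_GREYLIST,
    MANUAL_UNGREYLIST,
    SUPERBLOCK_RULES_FAILED,
    SUPERBLOCK_RULES_PASSED
};

FsmState NextState(FsmState s, FsmEvent e)
{
    switch (e) {
    case FsmEvent::MANUAL_GREYLIST:
        // Takes effect instantly, regardless of the current state.
        return FsmState::MANUAL_GREYLIST;
    case FsmEvent::MANUAL_UNGREYLIST:
        // Never goes straight to ACTIVE; waits for the next superblock.
        return s == FsmState::MANUAL_GREYLIST ? FsmState::AUTO_GREYLIST : s;
    case FsmEvent::SUPERBLOCK_RULES_FAILED:
        return s == FsmState::ACTIVE ? FsmState::AUTO_GREYLIST : s;
    case FsmEvent::SUPERBLOCK_RULES_PASSED:
        return s == FsmState::AUTO_GREYLIST ? FsmState::ACTIVE : s;
    }
    return s;
}
```

With this shape no separate record of the underlying automatic status is needed while a project is manually greylisted, at the cost of a delay of up to one superblock before an ungreylisted project becomes active.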

> What are you thinking in terms of functionality for the -projectnotify parameter?

Similar to other notify commands. It should be triggered on project status changes (added to whitelist, removed, greylisted, etc.) and should call a script with the contract hash (or the superblock hash in the case of an automatic greylist).
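
As a rough illustration, Bitcoin-style notify options such as -blocknotify replace `%s` in the configured command with a hash before running it. Assuming -projectnotify follows the same convention (the thread does not confirm this), the substitution step could be sketched as below. `BuildProjectNotifyCommand` is a hypothetical helper, and actually launching the command is left out.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: replace every "%s" in the configured command with the
// hex hash of the contract (or of the superblock for an automatic greylist).
std::string BuildProjectNotifyCommand(std::string cmd, const std::string& hash_hex)
{
    const std::string placeholder = "%s";
    std::size_t pos = 0;
    while ((pos = cmd.find(placeholder, pos)) != std::string::npos) {
        cmd.replace(pos, placeholder.size(), hash_hex);
        pos += hash_hex.size();  // continue after the inserted text
    }
    return cmd;
}
```

For example, a configured command of `notify.sh %s` and a hash of `abc123` would produce `notify.sh abc123`.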

@jamescowens
Member Author

That is a good simplification actually.

@jamescowens jamescowens force-pushed the implement_greylist branch 2 times, most recently from 6dbd145 to 23e6b12 Compare November 25, 2024 01:00
Also implement corresponding -projectnotify startup parameter that
provides the ability to run a cmd based on a change to the whitelist.
This adds a boolean argument to the listprojects RPC function. This
boolean defaults to false, which only lists active projects. When
true, it will list projects of every status.
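
The listing behaviour described in this commit message amounts to a status filter with a default. A minimal sketch, with hypothetical types and names (`ListedStatus`, `ListedProject`, `SelectProjects`) standing in for the real RPC code:

```cpp
#include <string>
#include <vector>

// Hypothetical status values and project record.
enum class ListedStatus { ACTIVE, AUTO_GREYLISTED, MAN_GREYLISTED };

struct ListedProject {
    std::string name;
    ListedStatus status;
};

// With the default (false), only active projects are returned; with true,
// projects of every status are returned.
std::vector<ListedProject> SelectProjects(const std::vector<ListedProject>& all,
                                          bool include_all = false)
{
    std::vector<ListedProject> out;
    for (const ListedProject& p : all) {
        if (include_all || p.status == ListedStatus::ACTIVE) out.push_back(p);
    }
    return out;
}
```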