“Libin diagram” Contribution Flows #837

Closed
ryscheng-mobile opened this issue Feb 9, 2024 · 11 comments · Fixed by #2541

@ryscheng-mobile
Contributor

What is it?

The ability for a project’s ecosystem to understand, in detail, how new users enter and exit its open source community / user dependency graph.

  • Over-time graph visualizations for the high and low value user flows into a project (+ growth rates); see Appendix for examples of visualizations.
  • The number of new users (counted for both developers and dependent repos) + growth rate
  • A “user contribution / value” metric to understand the overall value of an individual’s contribution to the project
  • The number of bounced users (devs or dependent repos, i.e. those who contribute for one week, then stop)
  • An exportable list of new users and bounced users, sorted and scored by overall contribution
  • Information about activity around major events (e.g. hackathons), both during the event and in the weeks following
  • The number of new issues or PRs + growth rate
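A minimal Python sketch of the new-user, bounced-user and growth-rate counts listed above, assuming contributions have already been bucketed into `(user, week)` pairs. The input shape, the function names, and the rule that a one-week user only counts as bounced once a later week has been observed are assumptions for illustration, not OSO's implementation.

```python
from collections import defaultdict


def weekly_user_flows(activity, current_week):
    """Count new and bounced users per week (hypothetical helper).

    `activity` is an iterable of (user, week_index) pairs, one per week in
    which the user contributed. A user is "new" in the first week they appear.
    A user is "bounced" if that first week is their only active week AND it is
    earlier than `current_week`, so at least one silent week has been observed.
    """
    weeks_by_user = defaultdict(set)
    for user, week in activity:
        weeks_by_user[user].add(week)

    new_users = defaultdict(int)
    bounced_users = defaultdict(int)
    for weeks in weeks_by_user.values():
        first = min(weeks)
        new_users[first] += 1
        if len(weeks) == 1 and first < current_week:
            bounced_users[first] += 1
    return dict(new_users), dict(bounced_users)


def growth_rate(this_week, last_week):
    """Week-over-week growth rate; None when last week's count was zero."""
    if last_week == 0:
        return None
    return (this_week - last_week) / last_week
```

For example, with activity `[("alice", 0), ("alice", 1), ("bob", 0), ("erin", 2)]` and `current_week=2`, new users are `{0: 2, 2: 1}` and bounced users are `{0: 1}`; erin is not yet counted as bounced because week 2 is the current week.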
@github-project-automation github-project-automation bot moved this to Backlog in OSO Feb 9, 2024
@ryscheng-mobile
Contributor Author

(Three screenshots attached, dated 2024-02-09.)

@ryscheng-mobile
Contributor Author

Suggested steps:

  • Start by accessing our data on BigQuery: create a Jupyter notebook that does this as a one-off for a particular repo.
  • Create a dbt pipeline in OSO, leveraging the OSO schema to gather data for all repos in a project.
  • Plumb this into the front end, so that you can see a Libin diagram for any OSO project.

@ccerv1 ccerv1 modified the milestones: (f) Collection/Project/Artifact Pages, (f) PLN Milestone 2 Apr 9, 2024
@ryscheng ryscheng modified the milestones: (f) PLN Milestone 2, (f) PLN Milestone 3 Jun 12, 2024
@ryscheng ryscheng modified the milestones: (f) PLN Milestone 3, (c) PLN Milestone 2/3, (c) PLN Milestone 1 Jul 26, 2024
@ryscheng ryscheng assigned ryscheng and unassigned innoobijr Aug 27, 2024
@ryscheng ryscheng assigned ravenac95 and unassigned ryscheng Sep 8, 2024
@ryscheng
Member

ryscheng commented Sep 8, 2024

Question for @ravenac95

@ccerv1 and I were just talking about this one, and I think we need some help with the metrics rolling window factory to support it.
I think there are actually 3 rolling windows at play here:

  1. The classification rolling window (e.g. a developer needs to have events on 10 of 30 days to be considered full-time)
  2. The counting rolling window (e.g. we want to know how many active developers there were in the last 6 months)
  3. The comparison rolling window (e.g. across the last two 6-month periods, how many users went from part-time to full-time, or from part-time to churned, etc.)

I think right now we only assume a single rolling window, is that correct?
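The first two windows can be composed as a rolling window over a rolling window. The sketch below assumes each developer's activity is a set of distinct active `date`s; `classify_on`, `count_in_period`, the 10-of-30-days full-time threshold (taken from the example above) and the "highest label reached" rule are illustrative assumptions, not the OSO metrics factory.

```python
from collections import Counter
from datetime import date, timedelta

# Ordering used to keep the "highest" label a developer reaches.
RANK = {"inactive": 0, "part_time": 1, "full_time": 2}


def classify_on(active_dates, day, window_days=30, full_time_min_days=10):
    """Window 1 (classification): label a developer from the number of distinct
    active days in the trailing `window_days` ending on `day` (inclusive).
    `active_dates` is assumed to be a set, so each day counts once."""
    start = day - timedelta(days=window_days)
    n = sum(1 for d in active_dates if start < d <= day)
    if n >= full_time_min_days:
        return "full_time"
    return "part_time" if n > 0 else "inactive"


def count_in_period(activity, end, counting_days=180):
    """Window 2 (counting): slide window 1 across every day of the trailing
    `counting_days` and count each developer once, at the highest label reached.
    Developers who never leave "inactive" are not counted."""
    counts = Counter()
    for dates in activity.values():
        best = "inactive"
        for offset in range(counting_days):
            label = classify_on(dates, end - timedelta(days=offset))
            if RANK[label] > RANK[best]:
                best = label
        if best != "inactive":
            counts[best] += 1
    return counts
```

Calling `count_in_period` twice, with `end` and with `end - timedelta(days=180)`, yields the two consecutive periods that the third (comparison) window would diff.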

@ravenac95
Member

Ohhh ya, interesting. We do currently assume 1, but I'll need to think about how we can combine things so we can depend on some of these other rolling windows. This seems to be rolling window queries on rolling windows.

@ravenac95
Member

This changes how I was thinking of things because I was trying to constrain the collection/project automatic creation a bit. Let me think on this!

@ravenac95
Member

ravenac95 commented Sep 8, 2024

Actually, what I was thinking in terms of changes was something like this:

timeseries_metrics(
    model_prefix="timeseries",
    metric_queries={
        # This will automatically generate star counts for the given roll up periods. 
        # A rollup is just a simple addition of the aggregation. So basically we 
        # calculate the daily rollup every day by getting the count of the day. 
        # Then the weekly every week by getting the count of the week and
        # monthly by getting the count of the month. 
        # Additionally this will also create this along the dimensions (entity_types) of 
        # project/collection so the resulting models will be named as follows
        # `metrics.timeseries_stars_to_{entity_type}_{rollup}`
        "stars": MetricQueryDef(
            ref="stars.sql",
            rollups=["daily", "weekly", "monthly"],
            entity_types=["artifact", "project", "collection"], # This is the default value
        ),
        # This defines something with a rolling option that allows you to look back 
        # to some arbitrary window. So you specify the window and specify the unit. 
        # The unit and the window are used to pass in variables to the query. So it's 
        # up to the query to actually query the correct window. 
        # The resultant models are named as such
        # `metrics.timeseries_active_days_to_{entity_type}_over_{window}_{unit}`
        "active_days": MetricQueryDef(
            ref="active_days.sql",
            rolling={
                "windows": [30, 60, 90],
                "unit": "day",
                "cron": "0 0 1 */6 *", # This determines how often this is calculated
            }
        ), 
    },
    default_dialect="clickhouse",
)

I think this setup should give us the flexibility to do the window of windows without having to build much additional craziness?
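To make the naming scheme in the proposal's comments concrete, here is a small sketch that expands a config shaped like the proposal into the model names those comments describe. `MetricSpec` and `expand_model_names` are hypothetical stand-ins, not the real `MetricQueryDef` / `timeseries_metrics` API; they only reproduce the two quoted name templates.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class MetricSpec:
    """Hypothetical stand-in for the proposed MetricQueryDef fields."""
    ref: str
    rollups: List[str] = field(default_factory=list)
    rolling: Optional[dict] = None
    entity_types: List[str] = field(
        default_factory=lambda: ["artifact", "project", "collection"]
    )


def expand_model_names(model_prefix: str, metric_queries: Dict[str, MetricSpec]) -> List[str]:
    """List the model names each metric would generate under the proposal's templates."""
    names = []
    for metric, spec in metric_queries.items():
        for entity_type in spec.entity_types:
            # Rollups: metrics.{prefix}_{metric}_to_{entity_type}_{rollup}
            for rollup in spec.rollups:
                names.append(f"metrics.{model_prefix}_{metric}_to_{entity_type}_{rollup}")
            # Rolling: metrics.{prefix}_{metric}_to_{entity_type}_over_{window}_{unit}
            if spec.rolling:
                for window in spec.rolling["windows"]:
                    names.append(
                        f"metrics.{model_prefix}_{metric}_to_{entity_type}"
                        f"_over_{window}_{spec.rolling['unit']}"
                    )
    return names
```

With the two metrics from the proposal, this yields 18 names, e.g. `metrics.timeseries_stars_to_project_weekly` and `metrics.timeseries_active_days_to_collection_over_90_day`, which makes it easy to see how quickly the model count grows per added window or rollup.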

@ryscheng
Member

I will do some updating of the docs for this, but here is an example of a "third-order" rolling window: https://github.com/opensource-observer/oso/blob/main/warehouse/metrics_mesh/oso_metrics/change_in_developers.sql

It is derived originally from the active days rolling window: it uses the developer classifications (active, part-time, full-time) and then compares the last two intervals associated with that metric. It's ergonomically clunky, but it works at the moment.

@ryscheng
Member

I imagine 4 states to start:

  • first time
  • part time
  • full time
  • churned
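One way to assign these four states per period and count the comparison-window transitions is sketched below. It assumes active days have already been summed per developer per period (e.g. one 6-month bucket each); the extra `never_used` label for developers with no activity yet, the 10-active-day full-time threshold, the rule that a developer's first active period is always `first_time`, and all function names are assumptions for illustration.

```python
from collections import Counter


def state(active_days, seen_before, full_time_min_days=10):
    """Label one developer for one period (thresholds are assumptions)."""
    if active_days == 0:
        return "churned" if seen_before else "never_used"
    if not seen_before:
        return "first_time"
    return "full_time" if active_days >= full_time_min_days else "part_time"


def states_by_period(active_days_per_period, full_time_min_days=10):
    """Label every period in order, tracking whether the developer was active earlier."""
    labels, seen = [], False
    for n in active_days_per_period:
        labels.append(state(n, seen, full_time_min_days))
        seen = seen or n > 0
    return labels


def transitions(activity, full_time_min_days=10):
    """Comparison window: count (previous, current) state pairs over the last
    two periods, skipping developers who have never contributed."""
    counts = Counter()
    for per_period in activity.values():
        labels = states_by_period(per_period, full_time_min_days)
        if len(labels) >= 2 and labels[-1] != "never_used":
            counts[(labels[-2], labels[-1])] += 1
    return counts
```

A developer with active days `[0, 4, 0]` is labeled `never_used`, `first_time`, `churned`, so they contribute one `("first_time", "churned")` transition.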

@Jabolol Jabolol moved this from Up Next to In Progress in OSO Nov 27, 2024
@ryscheng
Member

ryscheng commented Dec 2, 2024

This is implemented as of #2541.
Closing this out; there are some QA issues that we need to follow up on.

@ryscheng ryscheng closed this as completed Dec 2, 2024
@github-project-automation github-project-automation bot moved this from In Progress to Done in OSO Dec 2, 2024
@ccerv1
Member

ccerv1 commented Dec 10, 2024

From Observable

GitHub Developers Building on Our Stack
To make these measurements precise, we can define each type of user and compute the change in each arrow on a weekly basis. There are a lot of ways to define these users for our protocols (by developer, by project, by node, etc.), but we can use GitHub metrics (which are readily available) on developer activity and repo dependence to give us a strong proxy:

Never used: anyone who has never contributed to a GitHub project/repo built on our stack (while this is a large population, we are clearly targeting sub-areas of this market first, i.e. web3 developers who are building apps, tools, etc. on our stack)

First time users: a first-time developer contributing to a GitHub project/repo built on our stack (note: this means first-time users need to do more than install a binary; they need to build something on it, and websites should count!)

High-value users: a frequent (~>5x/week?) contributor to projects/repos built on our stack (IPFS + Filecoin) (examples of high-value users should include: Textile devs, Audius devs, Fleek devs, ENS devs, Infinite Scroll devs, Anytype devs, Valist devs, Infura devs, etc.)

Low-value users: a developer occasionally contributing to projects/repos built tangentially on our stack (maybe they only depend on a small lib, or the dependence is very tangential to their core offering, etc.)

Inactive users: a lapsed developer who is no longer actively contributing to a project built on our stack (either because the project is defunct, they stopped contributing, etc.)

Top KPIs
(TODO: each project team to help populate these numbers)

High-Value Users:
New High-Value Users:
Lapsed High-Value Users:
First Time Users:
Bounced Users:

Weekly User Model (by arrow)
(TODO: each project team to help populate these numbers)

First Time Users: (this week), (last week), (growth rate)
Bounced Users: (this week), (last week), (growth rate)
New Low-Value Users: (this week), (last week), (growth rate)
New High-Value Users: (this week), (last week), (growth rate)
Reactivated Low-Value Users: (this week), (last week), (growth rate)
Upleveled High-Value Users: (this week), (last week), (growth rate)
Downleveled Low-Value Users: (this week), (last week), (growth rate)
Lapsed Low-Value Users: (this week), (last week), (growth rate)
Reactivated High-Value Users: (this week), (last week), (growth rate)
Lapsed High-Value Users: (this week), (last week), (growth rate)
Never Used: (this week), (last week), (growth rate)
High-Value Users: (this week), (last week), (growth rate)
Low-Value Users: (this week), (last week), (growth rate)
Inactive Users: (this week), (last week), (growth rate)
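A sketch of how the arrows in the weekly user model could be counted from per-developer weekly contribution counts. It uses contribution frequency only as the proxy (the dependency-depth part of the low-value definition is not modeled), treats the tentative "~>5x/week" cut-off as `high_value_min=5`, and `weekly_states` / `arrow_counts` are hypothetical names; only the arrow labels come from the list above.

```python
from collections import Counter

# Arrow names from the weekly user model; keys are (last week's state, this week's state).
ARROWS = {
    ("never_used", "first_time"): "First Time Users",
    ("first_time", "inactive"): "Bounced Users",
    ("first_time", "low_value"): "New Low-Value Users",
    ("first_time", "high_value"): "New High-Value Users",
    ("inactive", "low_value"): "Reactivated Low-Value Users",
    ("low_value", "high_value"): "Upleveled High-Value Users",
    ("high_value", "low_value"): "Downleveled Low-Value Users",
    ("low_value", "inactive"): "Lapsed Low-Value Users",
    ("inactive", "high_value"): "Reactivated High-Value Users",
    ("high_value", "inactive"): "Lapsed High-Value Users",
}


def weekly_states(contributions_per_week, high_value_min=5):
    """Label each week for one developer from their contribution counts."""
    labels, seen = [], False
    for n in contributions_per_week:
        if n == 0:
            labels.append("inactive" if seen else "never_used")
        elif not seen:
            labels.append("first_time")
        else:
            labels.append("high_value" if n >= high_value_min else "low_value")
        seen = seen or n > 0
    return labels


def arrow_counts(activity, week, high_value_min=5):
    """Count arrows between `week - 1` and `week` (week >= 1) across all developers.
    State pairs with no arrow (e.g. low_value -> low_value) are not counted."""
    counts = Counter()
    for per_week in activity.values():
        labels = weekly_states(per_week, high_value_min)
        arrow = ARROWS.get((labels[week - 1], labels[week]))
        if arrow:
            counts[arrow] += 1
    return counts
```

The growth-rate column then compares `arrow_counts(activity, w)` with `arrow_counts(activity, w - 1)` for each arrow.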


6 participants