[WIP] Ecosystem registry infrastructure #68

Open
wants to merge 11 commits into
base: main

Conversation

dwhswenson
Member

This PR will add in the registry page for tools in the OpenFE ecosystem. The main idea is that we want a centralized listing of plugins/packages in the OpenFE ecosystem, so that we can help users find contributions (both external and internal).

Users would browse the ecosystem page, and find the tool they want. In addition to a short summary on a collective ecosystem page, each tool will have its own full page, containing a box with details and a free-form documentation. Contributors can add their tool to the ecosystem page by adding a simple document (Markdown with YAML front matter) to this repository via pull request.

Screenshots of initial versions of each (first at 1920x1080, then at 414x736 (iPhone 8+), all generated with Safari's responsive design mode):
[Screenshots: main page (desktop), plugin page (desktop), main page (mobile), plugin page (mobile)]

Inspiration largely taken from PyPI, MDAKits, and the E-CAM software library. Target for this PR is a basic ecosystem setup, with a set of fields to include in the YAML front matter.

Questions to resolve (feedback desired):

  • Are pages per-package or per-plugin? Example: LOMAP contains plugins for both an atom mapper and a network planner. Do we add a page for each (current implementation) or just a single page for all things LOMAP? A page for each is more work for a contributor, but (IMO) a better experience for a website visitor (at least with a straightforward implementation), which is why I've started with that approach.
  • What fields do we want to include in the YAML front matter? I've added a few obvious ones, but there's definitely room for discussion on what should be required vs. optional.
  • How do we want to handle authors? I don't see a perfect solution here. It would be great to be able to have a page for each author, showing all their contributions, but that needs some kind of author registration (or else it is very error-prone). For now, contributor name is just a string, and we don't have a way to group by contributor.
  • Do we want to enable some kind of ordering? For example, we might want all tools maintained by OpenFE to come first.

Outside scope of this PR, but planned future work:

  • Script to generate a draft of the contribution for a given plugin based on package metadata (probably obtained from an installed package with importlib.metadata; other options are to parse pyproject.toml/setup.cfg or to obtain from PyPI JSON files.)
  • Fill in content for the current tools in the ecosystem.
  • (Possible future work) It would be cool if there were more ways to explore the existing ecosystem tools. For example, I can imagine wanting to see things sorted by newest additions, or to see packages that have recently been updated or released. External information like that could probably be scraped in a nightly cron job.
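A minimal sketch of the draft-generation script mentioned in the first item above, assuming the metadata comes from an installed package via importlib.metadata. The function names are hypothetical, and the output fields (title, plus the plugin sections) are an assumption based on the front-matter examples discussed in this PR, not a settled schema:

```python
import json
from importlib import metadata

# Assumed plugin sections; the real list would come from the site config.
PLUGIN_SECTIONS = ("atommappers", "networkplanners", "protocols")


def draft_front_matter(meta, plugin_sections=PLUGIN_SECTIONS):
    """Build a draft catalog entry (front matter plus body) from package metadata.

    `meta` is anything with a dict-style .get(), such as the object returned
    by importlib.metadata.metadata(). Placeholders are left for the contributor.
    """
    title = meta.get("Name") or "<PROJECT NAME>"
    summary = meta.get("Summary") or "<LONG DESCRIPTION>"
    # json.dumps yields a double-quoted string, which is also a valid YAML scalar.
    lines = ["---", f"title: {json.dumps(title)}"]
    for section in plugin_sections:
        lines += [
            f"{section}:",
            "  - name: <PLUGIN NAME>",
            "    description: <SHORT DESCRIPTION>",
        ]
    lines += ["---", summary, ""]
    return "\n".join(lines)


def draft_for_installed(dist_name):
    """Draft an entry for a distribution installed in the current environment."""
    return draft_front_matter(metadata.metadata(dist_name))
```

A contributor would run draft_for_installed on their own package name, delete the sections they do not provide, and fill in the placeholders by hand.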

So far, this isn't actually linked from the navigation or from any pages. However, the ecosystem landing page can be seen at the ecosystem/ path when you run locally. I think that linking it from the main site should be beyond the scope of this PR (wait until we have some real content).

@richardjgowers
Contributor

This looks really cool.

re: pages per package or per item (Mapper/Protocol/etc) - both? I can imagine that you might want to browse for projects, or you might want to browse for Mappers.

Could we have a single feedstock of data (one file per project, submitted by project authors to this repo) and have either view possible via jinja templating? So one page with per-project listing (which maybe has a quick blurb linking to a further full static page for the project) and one page with items grouped together like your current ecosystem view (which is populated by jinja walking all projects and grabbing relevant items)

@dwhswenson
Member Author

I think we at least want a separate short blurb to show on the cards for each plugin. Think, for example, how the cards will look for FEFlow, which may have several protocols only distinguished by their names (and which would all be listed side-by-side). With the same blurb for all, it would be pretty uninformative. However, it's fine if the long description (only shown on the project page) is shared by all.

This would also require separate plugin and project name fields. That is potentially confusing for a project that only contains one plugin. Example:

title: My Project
atommappers:
  - name: My Project   # why do I have to say this again?
    description: >-
      Plugin short description
---
Long text description.

This also adds some extra complexity since the list of plugins needs to actually be a proper list (easy YAML mistake to make, and most likely to be relevant to the less experienced plugin contributors, who are more likely to only have one plugin in their package).

None of these are necessarily sticking points (in the future, we should have a GitHub action that gives clear error messages for any non-conforming YAML), but it is worth pointing out the trade-offs before I put in the effort to refactor this code.
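As a rough illustration of the kind of check such an action could run, here is a sketch that takes front matter already parsed into a Python dict and reports the "not a proper list" mistake in plain language. The function name, the section names, and the required keys are assumptions for illustration, not part of this PR:

```python
# Assumed plugin sections; "scorers" is included because it comes up in review.
SECTIONS = ("atommappers", "networkplanners", "protocols", "scorers")


def check_plugin_sections(front_matter, sections=SECTIONS):
    """Return a list of human-readable problems found in the plugin sections."""
    problems = []
    for section in sections:
        if section not in front_matter:
            continue  # a project need not provide every plugin type
        entries = front_matter[section]
        if not isinstance(entries, list):
            # The easy YAML mistake: a bare mapping instead of a list item.
            problems.append(
                f"'{section}' must be a list: start each plugin with '- name: ...'"
            )
            continue
        for i, entry in enumerate(entries):
            if not isinstance(entry, dict):
                problems.append(
                    f"'{section}' item {i} must be a mapping with 'name' and 'description'"
                )
                continue
            for key in ("name", "description"):
                if not entry.get(key):
                    problems.append(f"'{section}' item {i} is missing '{key}'")
    return problems
```

An empty return value means the plugin sections look well-formed; anything else can be printed directly in the CI log.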

@richardjgowers
Contributor

Would something like this work:

project_title: Kartograf  # probably the conda-forge package name too
project_description: Long description of the package as a whole
atommappers:
  - name: KartografAtomMapper   # class name of the mapper
    description:   Plugin short description
scorers:
  - name: RMSDScorer
    description: About the RMSD scorer

Where we could probably scrape the names and descriptions from the exported plugins for a package and their __init__ lines

@@ -0,0 +1,36 @@
---
name: LOMAP Atom Mapper
Contributor

e.g. the Lomap2 package provides LomapAtomMapper, lomap_default_score and lomap_network_planner. These should all be visible on one page, but also in the categorised list.

@dwhswenson
Member Author

As far as I can tell, the only differences between your proposal and mine are:

  1. Put project_description in the YAML front matter instead of just using the available long-form Markdown. I disagree with this. Let's allow Jekyll to do what Jekyll does well. (This would default to long_description, which is typically the contents of README.md.)
  2. project_title instead of title. Again, the name title is a choice that makes other things easier: it is already used by Jekyll elsewhere in the theme. Otherwise we need to add a lot more HTML to duplicate the theme's _layouts/page.html instead of re-using it in our more minimal _layouts/ecosystem-entry.html.
  3. Requiring that the plugin name be the class name. Do we need to make this a requirement? It certainly could be the name, and possibly even that's what gets filled in by any script we write later to autogenerate these, but I don't see a need to require that.

Are there any other differences between what you're proposing and what I suggested in #68 (comment)?

@richardjgowers
Contributor

Yeah we're agreeing 99% of the way once it's one file per package and you have two views of the data from that file.

@richardjgowers
Contributor

Rereading this, re

  1. I didn't realise your stubs were already pages. Is jinja able to walk existing Project pages to create the per-category listings? Is this what this code is doing? https://github.com/OpenFreeEnergy/openfreeenergy.github.io/pull/68/files#diff-1a4b68348682f6c74ae84bd89a84ab1de1c38594681e6afe59060b7f83986262R8

  2. Sure, I'm just using project_title to disambiguate it from plugin_title, but they're just labels so it's slightly arbitrary

  3. Ideally the name listed on the plugin registry website is the thing they can write into the CLI that will pick up that particular component. This will probably be the class name, but we also have the possibility of having name aliases as required.

@dwhswenson
Member Author

I didn't realise your stubs were already pages. Is jinja able to walk existing Project pages to create the per-category listings?

Yeah, this is a full working website, not a mock-up. It is actually easier to do the work than to create a mock-up.

For "existing Project pages," do you mean the projects in _projects/? I think we should treat these as two different collections, because they have very different purposes (and carry fundamentally different data).

I spent yesterday putting the page-per-project approach together in the _ecosystem collection, though. See current implementation of ecosystem.html for looping over projects to get all the individual plugins. There's more to be done on the individual project pages now (at minimum some CSS improvements). However, the main ecosystem page looks identical except that now the file _ecosystem/lomap.md provides a card for both an atom mapper and a network planner.

It's a little annoying because we end up with a process that scales as n_projects * n_plugin_types, but since that happens once per site build (i.e., once per PR merge), I'm not too worried about performance optimization.
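To make the scaling concrete, here is a Python sketch of the same nested loop. The site itself does this in Liquid inside ecosystem.html; this is only an illustration with a hypothetical function name, assuming each project is a dict whose plugin sections are lists of name/description mappings:

```python
def plugins_by_category(projects, categories):
    """Group every plugin from every project under its category name."""
    grouped = {category: [] for category in categories}
    for project in projects:          # n_projects iterations
        for category in categories:   # n_plugin_types iterations for each project
            for plugin in project.get(category, []):
                # Remember which project the card should link back to.
                grouped[category].append({"project": project["title"], **plugin})
    return grouped
```

With a handful of plugin types and a modest number of projects, the product stays small, which matches the point that this cost is paid only once per site build.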

Side-note/reminder: You keep calling it jinja. It isn't jinja. If you google "how to do X with jinja," the answers you will find will not help you here. Jekyll and liquid are the names to look for. (Reminding because you've had this problem before.)

Ideally the name listed on the plugin registry website is the thing they can write into the CLI that will pick up that particular component. This will probably be the class name, but we also have the possibility of having name aliases as required.

This is the part that will be hard for us to enforce in any way. We can certainly practice it and recommend it, but we can only check that the name works if we download/import the plugin project as part of validation. I think that is out of scope for this tool (both in this PR and in the future).

Contributor

@richardjgowers left a comment

"existing projects" I meant the stubs inside _ecosystem.

This is looking good, I wouldn't worry about complexity.

I'm veering into things that could be future issues, but eventually Logos / graphical abstracts are something that would make individual ecosystem-project pages look more eye catching.

It sounds like we're settled on what the schema for the _ecosystem stubs is. Should I populate the Lomap.md stub? Can we get @RiesBen to fill out the kartograf.md stub?

Comment on lines 23 to 26
NOTE: Each ecosystem catalog entry should fit into exactly one category. If
you've created a package that involves tools from multiple categories, please
create multiple catalog entries. One pull request may contain as many catalog
entries as you would like to add.
Contributor

this isn't correct any more right? one entry can list multiple tools

_config.yml Outdated
Comment on lines 42 to 47
- name: "atommappers"
  label: "Atom Mappers"
- name: "networkplanners"
  label: "Network Planners"
- name: "protocols"
  label: "Protocols"
Contributor

scorers too

@@ -0,0 +1,42 @@
---
Contributor

This schema looks good to me. example.md and example2.md above now seem out of sync?

Member Author

I'm confused by this comment: can you double-check what you're seeing? The changes to those in d0165cc should have brought them into the new format. Am I missing something?

Contributor

example.md has a category: field?

Member Author

Missed that! Unused though. (This is part of why it'll be good to have an action to check that the YAML matches the schema -- I started playing with that a bit, but don't plan to include it in this PR).
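A sketch of how a schema check might catch a leftover field like this one. The function name and the set of allowed field names are assumptions for illustration; the real list would follow whatever schema is agreed on:

```python
# Assumed set of recognised top-level front-matter fields.
ALLOWED_FIELDS = {"title", "atommappers", "networkplanners", "protocols", "scorers"}


def unknown_fields(front_matter, allowed=ALLOWED_FIELDS):
    """Return the top-level keys that the schema does not recognise, sorted."""
    return sorted(set(front_matter) - set(allowed))
```

Run over a stale entry, this would report category as an unrecognised field instead of letting it sit unused.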

@@ -0,0 +1,29 @@
---
Contributor

this page creates the grouped-by-category listing of all plugins right?

could we also have a page that is the grouped-by-project list of all projects and their contents?

So in my non-jinja code:

Here's all the different projects that provide plugins within the OpenFE ecosystem:

{% for project in site.ecosystem %}
<h3><a href="{{ project.url }}">{{ project.title }}</a></h3>  <!-- links to the full page for this project? i.e. the full rendering of the .md page? -->

{{ project.short_description }}

{% for plugin in project.atommappers %}  <!-- and likewise for each other plugin category -->
  <!-- emit some basic info on this tool -->
{% endfor %}
{% endfor %}

Member Author

I had thought about this, and yes, it would be straightforward to add such a page. But I had a couple doubts about the usefulness of that:

  1. I suspect that "all plugins organized by project on one page" won't be a super common view people want (at least compared to "all plugins organized by type" and "everything about a specific project", which are already implemented).
  2. The project page gives a quick overview (currently in the summary box) of all plugins in that project. I'm still working on how to make that maximally useful, but that's just a matter of tweaking the page we already have. So this is the easy way to answer the question "what are all the plugins for a given project"
  3. There's a question of where in the website to present this information. If there's only one page for all plugins, it's clear that this is linked from an "Ecosystem" tab in the main nav. If there are two pages, then do we add a secondary nav tab bar within the page to switch between them? That may start to get a little cluttered.

So yes, making that page would be easy. But I'm not entirely sure what we'd do with it once we have it. My thought is to leave it out of this PR and see if we find it needed later.

The one point this brings up is that I currently don't have a per-project short description. Each plugin needs a short description, but I don't require that for each project. This is something that can be added to the schema if we want, though.

@@ -0,0 +1,19 @@
---
Contributor

this needs updating to the new schema I think. I'd also err on the side of more descriptive prompts inside each placeholder, e.g. <CATEGORY (one of Atom Mapper, Scorer, Planner, Protocol)>

@dwhswenson
Member Author

I'm veering into things that could be future issues, but eventually Logos / graphical abstracts are something that would make individual ecosystem-project pages look more eye catching.

Already, everything below the --- is free-form markdown, so you can add images as well (I wanted to play with that a bit to determine where we want people to store those assets, assuming we take them into the repo, but that's just about maximizing ease of learning/maintenance -- it's certainly technically feasible already).

I do have an as-yet-unused "thumbnail" in the _default, which I imagined as a small logo to put on the card. But that might be either too busy (different image styles) or too repetitive (how many times will we see the OpenFE logo?).

It sounds like we're settled on what the schema for the _ecosystem stubs is. Should I populate the Lomap.md stub? Can we get @RiesBen to fill out the kartograf.md stub?

I'm not sure that we actually have settled on the schema yet. I'd like a little more feedback on that -- I've updated the _default.md to represent the current schema (including some currently unused extensions that I think will be useful). I would definitely appreciate feedback from you and @RiesBen (and also @IAlibay, maybe?) on the fields we're currently using, and starting to fill these in (and seeing how they look when you build locally) would definitely be a start to that.

One schema question to consider: Currently we list plugins by category as:

atommappers:
  - name: LomapAtomMapper
    description: blah blah
networkplanners:
  - name: LomapNetworkPlanner
    description: blah blah

This could instead be represented as:

plugins:
  - name: LomapAtomMapper
    type: atommapper
    description: blah blah
  - name: LomapNetworkPlanner
    type: networkplanner
    description: blah blah

The second form is a little nicer computationally, but (1) not enough that it actually makes a significant difference, since I'm not terribly worried about build time; and (2) my intuition is that the current implementation is more straightforward for users, and less error-prone (in the alternative, you must spell each plugin type correctly every time; currently you only need to get it right once, for the section name).

But if you try it out and find that the other format seems better from a contributor point of view, it wouldn't be hard to change to that schema.
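The two layouts carry the same information, so switching later could be scripted. A sketch of the conversion in both directions, assuming front matter parsed into plain dicts; the function names and the section-to-type mapping are hypothetical and only cover the two types shown in the examples above:

```python
# Assumed mapping from section name (first form) to type value (second form).
SECTION_TO_TYPE = {"atommappers": "atommapper", "networkplanners": "networkplanner"}


def to_flat(front_matter):
    """Convert category-keyed sections into a single typed 'plugins' list."""
    plugins = []
    for section, type_name in SECTION_TO_TYPE.items():
        for plugin in front_matter.get(section, []):
            plugins.append({
                "name": plugin["name"],
                "type": type_name,
                "description": plugin["description"],
            })
    return {"plugins": plugins}


def to_sections(front_matter):
    """Convert a typed 'plugins' list back into category-keyed sections."""
    type_to_section = {t: s for s, t in SECTION_TO_TYPE.items()}
    sections = {}
    for plugin in front_matter.get("plugins", []):
        # A misspelled type raises KeyError here, the error mode noted above.
        section = type_to_section[plugin["type"]]
        sections.setdefault(section, []).append(
            {"name": plugin["name"], "description": plugin["description"]}
        )
    return sections
```

The KeyError in to_sections illustrates the trade-off: in the flat form, every plugin entry is a place where the type can be misspelled.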

@richardjgowers
Contributor

I'm fine with either plugin schema above. Both make enough sense, we're not worried about computational complexity, and we can catch typos in PRs since these plugin descriptions have to live centrally.

I'm happy to merge this as a bare-bones version, then we can see how it looks live and do a few passes of iterative tweaks. It's a little hard to completely review without building it locally which is a bit of a pain.

@dwhswenson
Member Author

Yeah, this is still a semi-stealth-mode page (not linked from anywhere on the main website), so if you can't get it to run locally, merging it as is doesn't put the dirty laundry in public. The schema will only be locked when we start inviting others to contribute here.

As you play with it, please do keep the two approaches for the plugin config in mind, and see if one feels like it would be much easier than the other. I don't have a sense for which is better for users.

BTW, I've extended the project overview box to include more of the included metadata, but I haven't tested most of that yet. So expect some bumps still.

@RiesBen
Contributor

RiesBen commented Dec 8, 2023

If you need an opinion on the plugins, :) I would prefer the second one, as I imagine you can be more flexible with those plugin settings.
