[WIP] Ecosystem registry infrastructure #68

dwhswenson · 2023-12-05T22:21:02Z

This PR will add in the registry page for tools in the OpenFE ecosystem. The main idea is that we want a centralized listing of plugins/packages in the OpenFE ecosystem, so that we can help users find contributions (both external and internal).

Users would browse the ecosystem page, and find the tool they want. In addition to a short summary on a collective ecosystem page, each tool will have its own full page, containing a box with details and a free-form documentation. Contributors can add their tool to the ecosystem page by adding a simple document (Markdown with YAML front matter) to this repository via pull request.

Screenshots of initial versions of each (first at 1920x1080, then at 414x736 (iPhone 8+), all generated with Safari's responsive design mode):

Inspiration largely taken from PyPI, MDAKits, and the E-CAM software library. Target for this PR is a basic ecosystem setup, with a set of fields to include in the YAML front matter.

Questions to resolve (feedback desired):

Are pages per-package or per-plugin? Example: LOMAP contains plugins for both an atom mapper and a network planner. Do we add a page for each (current implementation) or just a single page for all things LOMAP? A page for each is more work for a contributor, but (IMO) a better experience for a website visitor (at least, with straightforward implementation), which is why I've started with that approach.
What fields do we want to include in the YAML front matter? I've added a few obvious ones, but there's definitely room for discussion on what should be required vs. optional.
How do we want to handle authors? I don't see a perfect solution here. It would be great to be able to have a page for each author, showing all their contributions, but that needs some kind of author registration (or else it is very error prone). For now, contributor name is just a string, and we don't have a way to group by contributor.
Do we want to enable some kind of ordering? For example, we might want all tools maintained by OpenFE to come first.

Outside scope of this PR, but planned future work:

Script to generate a draft of the contribution for a given plugin based on package metadata (probably obtained from an installed package with importlib.metadata; other options are to parse pyproject.toml/setup.cfg or to obtain from PyPI JSON files.)
Fill in content for the current tools in the ecosystem.
(Possible future work) It would be cool if there were more ways to explore the existing ecosystem tools. For example, I can imagine wanting to see things sorted by newest additions, or to see packages that have recently updated or recently released. External information like that could probably be scraped in a nightly cron job.
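
To make the draft-generation idea above a bit more concrete, here is a minimal sketch of what such a script might look like. It is not part of this PR. The function names `draft_entry` and `draft_for_installed` are hypothetical, and the front-matter fields it emits (`title`, one placeholder `atommappers` item) are only assumptions based on the fields discussed in this thread. Strings are quoted with `json.dumps`, since a JSON string is also a valid YAML double-quoted scalar.

```python
import json
from importlib import metadata


def draft_entry(meta) -> str:
    """Build a draft ecosystem entry from a PackageMetadata-like object.

    `meta` only needs to support `in` and `[...]` lookups on core metadata
    field names such as "Name" and "Summary".
    """
    title = meta["Name"] if "Name" in meta else "<PROJECT NAME>"
    summary = meta["Summary"] if "Summary" in meta else "<SHORT DESCRIPTION>"
    lines = [
        "---",
        f"title: {json.dumps(title)}",
        # Assumption: the contributor moves this item to the right category.
        "atommappers:",
        "  - name: <PLUGIN NAME>",
        f"    description: {json.dumps(summary)}",
        "---",
        "<Long-form Markdown description goes here.>",
    ]
    return "\n".join(lines) + "\n"


def draft_for_installed(dist_name: str) -> str:
    """Draft an entry for a distribution installed in this environment."""
    return draft_entry(metadata.metadata(dist_name))
```

Calling `draft_for_installed` with the name of an installed distribution would print a starting point that the contributor still has to edit by hand. Parsing pyproject.toml or the PyPI JSON would need a different front end but could reuse `draft_entry`.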

So far, this isn't actually linked from the navigation or from any pages. However, the ecosystem landing page can be seen at the ecosystem/ path when you run locally. I think that linking it from the main site should be beyond the scope of this PR (wait until we have some real content).

richardjgowers · 2023-12-06T12:00:41Z

This looks really cool.

re: pages per package or per item (Mapper/Protocol/etc) - both? I can imagine that you might want to browse for projects, or you might want to browse for Mappers.

Could we have a single feedstock of data (one file per project, submitted by project authors to this repo) and have either view possible via jinja templating? So one page with per-project listing (which maybe has a quick blurb linking to a further full static page for the project) and one page with items grouped together like your current ecosystem view (which is populated by jinja walking all projects and grabbing relevant items)

dwhswenson · 2023-12-06T18:30:59Z

I think we at least want a separate short blurb to show on the cards for each plugin. Think, for example, how the cards will look for FEFlow, which may have several protocols only distinguished by their names (and which would all be listed side-by-side). With the same blurb for all, it would be pretty uninformative. However, it's fine if the long description (only shown on the project page) is shared by all.

This would also require separate plugin and project name fields. That is potentially confusing for a project that only contains one plugin. Example:

---
title: My Project
atommappers:
  - name: My Project   # why do I have to say this again?
    description: >-
      Plugin short description
---
Long text description.

This also adds some extra complexity since the list of plugins needs to actually be a proper list (easy YAML mistake to make, and most likely to be relevant to the less experienced plugin contributors, who are more likely to only have one plugin in their package).

None of these are necessarily sticking points (in the future, we should have a GitHub action that gives clear error messages for any non-conforming YAML), but it is worth pointing out the trade-offs before I put in the effort to refactor this code.
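
As a rough illustration of the kind of check such an action could run, here is a sketch that validates one already-parsed front-matter mapping. Everything in it is an assumption rather than code from this PR: the function name `validate_entry`, the error wording, and the set of category keys (taken from the categories mentioned elsewhere in this thread). Parsing the YAML itself is left to whatever loader the action would use.

```python
# Category keys as discussed in this thread; treat the exact set as an assumption.
PLUGIN_CATEGORIES = ("atommappers", "networkplanners", "protocols", "scorers")


def validate_entry(front_matter: dict) -> list[str]:
    """Return human-readable problems found in one parsed front-matter dict."""
    errors = []
    if not isinstance(front_matter.get("title"), str):
        errors.append("'title' is required and must be a string")
    for category in PLUGIN_CATEGORIES:
        if category not in front_matter:
            continue
        plugins = front_matter[category]
        # The easy mistake: writing a single mapping instead of a list.
        if not isinstance(plugins, list):
            errors.append(
                f"'{category}' must be a list; start each plugin with '- name: ...'"
            )
            continue
        for i, plugin in enumerate(plugins):
            if not isinstance(plugin, dict):
                errors.append(f"'{category}' item {i} must be a mapping")
                continue
            for field in ("name", "description"):
                if not isinstance(plugin.get(field), str):
                    errors.append(f"'{category}' item {i} is missing '{field}'")
    return errors
```

An empty return value means the entry passed these checks. A project with one plugin written as a bare mapping gets a single message pointing at the missing list dash.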

richardjgowers · 2023-12-06T20:20:15Z

Would something like this work:

project_title: Kartograf  # probably the conda-forge package name too
project_description: Long description of the package as a whole
atommappers:
  - name: KartografAtomMapper   # class name of the mapper
    description: Plugin short description
scorers:
  - name: RMSDScorer
    description: About the RMSD scorer

Where we could probably scrape the names and descriptions from the exported plugins for a package and their __init__ lines

richardjgowers · 2023-12-06T20:21:22Z

_ecosystem/lomap-mapper.md

@@ -0,0 +1,36 @@
+---
+name: LOMAP Atom Mapper


where e.g. the Lomap2 package is providing LomapAtomMapper, lomap_default_score and lomap_network_planner. This should all be visible on one page, but also in the categorised list.

dwhswenson · 2023-12-06T20:37:33Z

As far as I can tell, the only differences between your proposal and mine are:

Put project_description in the YAML front matter instead of just using the available long-form markdown. I disagree with this. Let's allow Jekyll to do what Jekyll does well. (This would default to long_description, which is typically the contents of README.md.)
project_title instead of title. Again, the name title is a choice that makes other things easier: it is already used by Jekyll elsewhere in the theme. Otherwise we need to add a lot more HTML, duplicating the theme's _layouts/page.html instead of re-using it in our more minimal _layouts/ecosystem-entry.html.
Requiring that the plugin name be the class name. Do we need to make this a requirement? It certainly could be the name, and possibly even that's what gets filled in by any script we write later to autogenerate these, but I don't see a need to require that.

Are there any other differences between what you're proposing and what I suggested in #68 (comment)?

richardjgowers · 2023-12-06T20:47:01Z

Yeah we're agreeing 99% of the way once it's one file per package and you have two views of the data from that file.

richardjgowers · 2023-12-07T13:22:48Z

Rereading this, re

I didn't realise your stubs were already pages. Is jinja able to walk existing Project pages to create the per-category listings? Is this what this code is doing? https://github.com/OpenFreeEnergy/openfreeenergy.github.io/pull/68/files#diff-1a4b68348682f6c74ae84bd89a84ab1de1c38594681e6afe59060b7f83986262R8
Sure, I'm just using project_title to disambiguate it from plugin_title, but they're just labels so it's slightly arbitrary
Ideally the name listed on the plugin registry website is the thing they can write into the CLI that will pick up that particular component. This will probably be the class name, but we also have the possibility of having name aliases as required.

dwhswenson · 2023-12-07T19:38:02Z

I didn't realise your stubs were already pages. Is jinja able to walk existing Project pages to create the per-category listings?

Yeah, this is a full working website, not a mock-up. It is actually easier to do the work than to create a mock-up.

For "existing Project pages," do you mean the projects in _projects/? I think we should treat these as two different collections, because they have very different purposes (and carry fundamentally different data).

I spent yesterday putting the page-per-project approach together in the _ecosystem collection, though. See current implementation of ecosystem.html for looping over projects to get all the individual plugins. There's more to be done on the individual project pages now (at minimum some CSS improvements). However, the main ecosystem page looks identical except that now the file _ecosystem/lomap.md provides a card for both an atom mapper and a network planner.

It's a little annoying because we end up with a process that scales as n_projects * n_plugin_types, but since that happens once per site build (i.e., once per PR merge), I'm not too worried about performance optimization.
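
For anyone who finds the nested loop easier to read outside Liquid, here is a Python analogue of the idea. It is a sketch, not a translation of ecosystem.html: the function name `plugins_by_type` and the shape of the input (one dict per project, keyed like the front matter) are assumptions made for illustration.

```python
# Plugin types as discussed in this thread; the exact set is an assumption.
PLUGIN_TYPES = ("atommappers", "networkplanners", "protocols", "scorers")


def plugins_by_type(projects: list[dict]) -> dict[str, list[dict]]:
    """Group every plugin from every project under its plugin type."""
    grouped = {ptype: [] for ptype in PLUGIN_TYPES}
    for ptype in PLUGIN_TYPES:        # n_plugin_types iterations
        for project in projects:      # n_projects iterations each
            for plugin in project.get(ptype, []):
                # Remember which project the card should link back to.
                grouped[ptype].append({"project": project["title"], **plugin})
    return grouped
```

Every project is visited once per plugin type, which is where the n_projects * n_plugin_types scaling comes from. The work done inside the inner loop is proportional to the number of plugins actually listed.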

Side-note/reminder: You keep calling it jinja. It isn't jinja. If you google "how to do X with jinja," the answers you will find will not help you here. Jekyll and liquid are the names to look for. (Reminding because you've had this problem before.)

Ideally the name listed on the plugin registry website is the thing they can write into the CLI that will pick up that particular component. This will probably be the class name, but we also have the possibility of having name aliases as required.

This is the part that will be hard for us to enforce in any way. We can certainly practice it and recommend it, but we can only check that the name works if we download/import the plugin project as part of validation. I think that is out of scope for this tool (both in this PR and in the future).

richardjgowers

"existing projects" I meant the stubs inside _ecosystem.

This is looking good, I wouldn't worry about complexity.

I'm veering into things that could be future issues, but eventually Logos / graphical abstracts are something that would make individual ecosystem-project pages look more eye catching.

It sounds like we're settled on what the schema for the _ecosystem stubs is. Should I populate the Lomap.md stub? Can we get @RiesBen to fill out the kartograf.md stub?

richardjgowers · 2023-12-08T14:18:40Z

README_ecosystem_plugins.md

+NOTE: Each ecosystem catalog entry should fit into exactly one category. If
+you've created a package that involves tools from multiple categories, please
+create multiple catalog entries. One pull request may contain as many catalog
+entries as you would like to add.


this isn't correct any more right? one entry can list multiple tools

richardjgowers · 2023-12-08T14:19:39Z

_config.yml

+    - name: "atommappers"
+      label: "Atom Mappers"
+    - name: "networkplanners"
+      label: "Network Planners"
+    - name: "protocols"
+      label: "Protocols"


scorers too

richardjgowers · 2023-12-08T14:31:59Z

_ecosystem/lomap.md

@@ -0,0 +1,42 @@
+---


This schema looks good to me. example.md and example2.md above now seem out of sync?

I'm confused by this comment: can you double-check what you're seeing? The changes to those in d0165cc should have brought them into the new format. Am I missing something?

example.md has a category: field?

Missed that! Unused though. (This is part of why it'll be good to have an action to check that the YAML matches the schema -- I started playing with that a bit, but don't plan to include it in this PR).

richardjgowers · 2023-12-08T14:35:41Z

ecosystem.html

@@ -0,0 +1,29 @@
+---


this page creates the grouped-by-category listing of all plugins right?

could we also have a page that is the grouped-by-project list of all projects and their contents?

So in my non-jinja code:

Here's all the different projects that provide plugins within the OpenFE ecosystem:
{% for project in site.ecosystem %}
  <h3> project.title </h3>  // links to the full page for this project? i.e. the full rendering of the .md page?
  project.short_description
  {% for plugin in project %}
    emit some basic info on this tool
  {% endfor %}
{% endfor %}

I had thought about this, and yes, it would be straightforward to add such a page. But I had a couple doubts about the usefulness of that:

I suspect that "all plugins organized by project on one page" won't be a super common view people want (at least compared to "all plugins organized by type" and "everything about a specific project", which are already implemented).

The project page gives a quick overview (currently in the summary box) of all plugins in that project. I'm still working on how to make that maximally useful, but that's just a matter of tweaking the page we already have. So this is the easy way to answer the question "what are all the plugins for a given project"

There's a question of where in the website to present this information. If there's only one page for all plugins, it's clear that this is linked from an "Ecosystem" tab in the main nav. If there are two pages, then do we add a secondary nav tab bar within the page to switch between them? That may start to get a little cluttered.

So yes, making that page would be easy. But I'm not entirely sure what we'd do with it once we have it. My thought is to leave it out of this PR and see if we find it needed later.

The one point this brings up is that I currently don't have a per-project short description. Each plugin needs a short description, but I don't require that for each project. This is something that can be added to the schema if we want, though.

richardjgowers · 2023-12-08T14:36:41Z

_ecosystem/_default.md

@@ -0,0 +1,19 @@
+---


this needs updating to the new schema I think. I'd also err on the side of more descriptive prompts inside each placeholder, e.g. <CATEGORY (one of Atom Mapper, Scorer, Planner, Protocol)>

dwhswenson · 2023-12-08T16:33:00Z

I'm veering into things that could be future issues, but eventually Logos / graphical abstracts are something that would make individual ecosystem-project pages look more eye catching.

Already, everything below the --- is free-form markdown, so you can add images as well (I wanted to play with that a bit to determine where we want people to store those assets, assuming we take them into the repo, but that's just about maximizing ease of learning/maintenance -- it's certainly technically feasible already).

I do have an as-yet-unused "thumbnail" in the _default, which I imagined as a small logo to put on the card. But that might be either too busy (different image styles) or too repetitive (how many times will we see the OpenFE logo?).

It sounds like we're settled on what the schema for the _ecosystem stubs is. Should I populate the Lomap.md stub? Can we get @RiesBen to fill out the kartograf.md stub?

I'm not sure that we actually have settled on the schema yet. I'd like a little more feedback on that -- I've updated the _default.md to represent the current schema (including some currently unused extensions that I think will be useful). I would definitely appreciate feedback from you and @RiesBen (and also @IAlibay, maybe?) on the fields we're currently using, and starting to fill these in (and seeing how they look when you build locally) would definitely be a start to that.

One schema question to consider: Currently we list plugins by category as:

atommappers:
  - name: LomapAtomMapper
    description: blah blah
networkplanners:
  - name: LomapNetworkPlanner
    description: blah blah

This could instead be represented as:

plugins:
  - name: LomapAtomMapper
    type: atommapper
    description: blah blah
  - name: LomapNetworkPlanner
    type: networkplanner
    description: blah blah

The second form is a little nicer computationally, but (1) not enough that it actually makes a significant difference, since I'm not terribly worried about build time; and (2) my intuition is that the current implementation is more straightforward for users, and less error-prone (in the alternative, you must spell each plugin type correctly each time; currently you only need to get it right once, for the section name).

But if you try it out and find that the other format seems better from a contributor point of view, it wouldn't be hard to change to that schema.
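
To show that the two forms carry the same information, here is a small sketch that converts between them. The function names and the `TYPE_FOR_SECTION` table are hypothetical. The type names `atommapper` and `networkplanner` come from the example above, while `protocol` and `scorer` are my assumption for the remaining categories.

```python
# Section name (current schema) -> type name (alternative schema).
# "protocol" and "scorer" are assumed by analogy with the two shown above.
TYPE_FOR_SECTION = {
    "atommappers": "atommapper",
    "networkplanners": "networkplanner",
    "protocols": "protocol",
    "scorers": "scorer",
}


def to_flat(front_matter: dict) -> list[dict]:
    """Convert category-keyed sections into a single flat `plugins` list."""
    plugins = []
    for section, type_name in TYPE_FOR_SECTION.items():
        for plugin in front_matter.get(section, []):
            plugins.append({
                "name": plugin["name"],
                "type": type_name,
                "description": plugin["description"],
            })
    return plugins


def to_sections(plugins: list[dict]) -> dict[str, list[dict]]:
    """Convert a flat `plugins` list back into category-keyed sections."""
    section_for_type = {t: s for s, t in TYPE_FOR_SECTION.items()}
    sections: dict[str, list[dict]] = {}
    for plugin in plugins:
        # A misspelled type raises KeyError here: the per-plugin typo risk.
        section = section_for_type[plugin["type"]]
        sections.setdefault(section, []).append(
            {"name": plugin["name"], "description": plugin["description"]}
        )
    return sections
```

The round trip is lossless for entries that only use `name` and `description`, so switching schemas later would mostly be a mechanical rewrite of the existing files.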

richardjgowers · 2023-12-08T19:11:58Z

I'm fine with either plugin schema above. Both make enough sense, we're not worried about computational complexity, and we can catch typos in PRs since these plugin descriptions have to live centrally.

I'm happy to merge this as a bare-bones version, then we can see how it looks live and do a few passes of iterative tweaks. It's a little hard to completely review without building it locally which is a bit of a pain.

dwhswenson · 2023-12-08T19:51:14Z

Yeah, this is still a semi-stealth-mode page (not linked from anywhere on the main website), so if you can't get it to run locally, merging it as is doesn't put the dirty laundry in public. The schema will only be locked when we start inviting others to contribute here.

As you play with it, please do keep the two approaches for the plugin config in mind, and see if one feels like it would be much easier than the other. I don't have a sense for which is the better for users.

BTW, I've extended the project overview box to include more of the included metadata, but I haven't tested most of that yet. So expect some bumps still.

RiesBen · 2023-12-08T21:59:05Z

If you need an opinion on the plugins, :) I would prefer the second one, as I imagine you can be more flexible with those plugin settings.

dwhswenson added 2 commits December 5, 2023 11:28

Start to ecosystem registry

1981c82

make ecosystem-entry responsive

1b9589d

richardjgowers reviewed Dec 6, 2023

Switch to one page per project

d0165cc

per-project cleanup

784fceb

richardjgowers reviewed Dec 8, 2023

update schema for new format

65eadbf

dwhswenson added 4 commits December 8, 2023 10:39

add support for scorers

12a6e91

update ecosystem readme

ec95ef0

improvements on the ecosystem entry

23aeade

mostly on the contents and CSS of the details box

small CSS and semantic HTML improvements

33c9afa

dwhswenson added 2 commits December 8, 2023 13:35

more tweaks around semantic html

2cb2cba

update ecosystem/_default with still-unused

4d439b6

richardjgowers approved these changes Dec 11, 2023
