Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added document detailing org roadmap to CLI 1.0.0 #159

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

nathan-weinberg
Copy link
Member

Resolves #157

@nathan-weinberg
Copy link
Member Author

@RobotSail I didn't include the UI piece in this doc since it's only for the CLI, but let me know if you wanna talk that out or have me mention something in the doc somewhere about a UI anyway, maybe under the Q&A

@RobotSail
Copy link
Member

@nathan-weinberg Ah gotcha, we can find a better place for it.

@cdoern cdoern self-requested a review November 19, 2024 20:27
Copy link
Contributor

@cdoern cdoern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just two small comments, I like the very general approach here.

docs/instructlab-cli-1.0.0.md Outdated Show resolved Hide resolved

**Q. What about the libraries? Will they 1.0.0 as well?**

A. It depends - we historically have not aligned the CLI and Library releases on a particular version numbering scheme, apart from matching Y-Streams to Y-Streams (e.g., InstructLab CLI 0.20 used SDG 0.4, Training 0.5, and Eval 0.3). At this stage, this document scopes only the prereqs we want for the CLI.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the libraries will inherently need a large version bump as we standardize around a REST API

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good thought! I will let others weigh-in before changing (particular folks from the Library Maintainer teams) but I'm definitely amicable to this

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll need to plan this but I agree, the libraries will have to catch up to the backgroundable/RESTy version of ilab.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the CLI would be able to meet any of its backwards compatibility goals stated here unless the libraries also committed to those goals? This is perhaps a more general issue outside of the scope of just a CLI 1.0.0 as to what level of maturity and backwards compatibility of our Python APIs we want to commit to across repos in the org, as that ultimately determines what level of backwards compatibility consumers of those APIs (like CLI) can commit to.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also a good point @bbrowning - what do you think might be a reasonable expectation, both for ourselves and the community?

- An official v1 REST API schema
- Integration of the InstructLab CLI with RAG
- An upgrade path to subsequent Y-Streams and an eventual 2.0
- Backwards compatibility across the 1.y stream

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO this is a really big commitment that we don't necessarily have to make. It's common for consumers of python projects to expect minimal backward compat and to take the most recent version as that one with fewest feature errors.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would a compromise be to commit to backward compatibility across all 1.0.y versions? I.e., that API breaking changes will not happen on third digit point releases.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, the vagueness of this makes it quite the lift - maybe we can commit to backwards compatibility for N-1 Y-Streams?

E.g., 1.4 is backwards-compatible with 1.3 but not necessarily 1.2

Or do you think even that might be more than worth committing to?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

N-1-Y sounds complicated to manage to me because breaking changes need to be staged out across multiple releases.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Def good with backwards compatibility across Z-Streams only - @JamesKunstle wdyt?

- Integration of the InstructLab CLI with RAG
- An upgrade path to subsequent Y-Streams and an eventual 2.0
- Backwards compatibility across the 1.y stream
- An official hardware support matrix

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kelbrown20 is actually working on this now! instructlab/instructlab#2670

@@ -0,0 +1,35 @@
# The Road to 1.0.0

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not anxious to push for a 1.0.0 release for the sake of having a >1.0.0 project. I don't personally believe that semver major versions are all that meaningful to the end user.

However, these are good objectives for us and would represent a major change in the implementation and design of the project, which I'm certainly on board with.


## Context and Goals

At the time of this writing, the InstructLab CLI has gone from a scrappy research project to an upstream community serving as the basis for multiple downstreams, with the goal to continuing to evolve the community to encourage more participation from additional stakeholders. To wit, it would behoove us to determine what exactly we should be roadmapping between now and a proper 1.0.0 release, which demonstrates the following to existing and potential community members:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing that is missing from this context section that came up in the discussion of the SDG refactoring draft dev doc is the fact that the repo used to be instructlab/cli and largely focused on just the CLI layer and now it is instructlab/instructlab and is more of a core integrating element that contains a lot of the glue functionality for all of InstructLab. I think this document should either state this or link to a document that states this.

I also have a proposal that I think might be more controversial. Can we stop calling this component the CLI and provide a new name for it? The repo could still be instructlab/instructlab but we would need to update the README, the upstream Slack channel name, start using the new name in dev docs and Jira issues, update the labels in meeting invites, etc.; so it is not a small undertaking, but I think it is one that will provide a lot of value in the long term because having accurate names for the major components makes it easier for new participants in the project to understand the system.

I think InstructLab core would be a good name, but I would be open to other names that convey a sense that the is the central orchestrator for all of the components of InstructLab .

Finally, I really think the name issue should be sorted out in this dev doc because settling on a name and providing consistent communication around the name should be a prereq for declaring the repo 1.0.0.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing that is missing from this context section that came up in the discussion of the SDG refactoring draft dev doc is the fact that the repo used to be instructlab/cli and largely focused on just the CLI layer and now it is instructlab/instructlab and is more of a core integrating element that contains a lot of the glue functionality for all of InstructLab. I think this document should either state this or link to a document that states this.

I agree with this sentiment, though candidly I don't know where the decision of record is for this - I'll need to do some digging.

I also have a proposal that I think might be more controversial. Can we stop calling this component the CLI and provide a new name for it? The repo could still be instructlab/instructlab but we would need to update the README, the upstream Slack channel name, start using the new name in dev docs and Jira issues, update the labels in meeting invites, etc.; so it is not a small undertaking, but I think it is one that will provide a lot of value in the long term because having accurate names for the major components makes it easier for new participants in the project to understand the system.

I can add a section about this and update the existing references - I agree that the "CLI" is really just a part of the overall instructlab package that serves as a means for controlling the orchestration, which will in the future have other entrypoints such as a GUI and direct API calls. That being said, I don't feel an entirely holistic renaming of things here is necessary - some things like parts of the README and the meeting name yes, but the CLI portion of this component is always going to exist and need a place for discussion/way to be referred to.

I think InstructLab core would be a good name, but I would be open to other names that convey a sense that the is the central orchestrator for all of the components of InstructLab .

Engine might also be a good term for it

Finally, I really think the name issue should be sorted out in this dev doc because settling on a name and providing consistent communication around the name should be a prereq for declaring the repo 1.0.0.

Let's definitely flesh this out in the document - I'll add a new section for it. That being said, I am wary this issue might bog down the rest of the document, and reserve the right to postpone a final decision to after merging this, while specifying that a decision must be made prior to 1.0.0.

Copy link

@jwm4 jwm4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There seem to be a few open issues here especially around naming and backward compatibility.

At a high-level, these are the items the Maintainer teams believe should serve as prereqs for releasing an InstructLab CLI 1.0.0:

- Removal of the need for a `config.yaml` with a fully-realized configuration scheme, centered around the usage of system profiles
- An official v1 REST API schema

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This REST API looks nice. I think this encourages people's contributions, and people with a less power laptop can train the dataset faster.

@anastasds
Copy link
Contributor

anastasds commented Nov 21, 2024

Thanks for putting this together!

I suggest considering architectural goals even if they are implied by things already listed, e.g. the REST API work will probably drive modularization. Do we want to stabilize our systems architecture before declaring a 1.0?

What about API guarantees? State of CI? Documentation? Testing? I think it can definitely be reasonable for these things to be out of scope for this doc - but if so, I think it would be prudent to say so and at least hint at how and where those will end up being addressed.

@nathan-weinberg
Copy link
Member Author

Thanks for putting this together!

I suggest considering architectural goals even if they are implied by things already listed, e.g. the REST API work will probably drive modularization. Do we want to stabilize our systems architecture before declaring a 1.0?

What about API guarantees? State of CI? Documentation? Testing? I think it can definitely be reasonable for these things to be out of scope for this doc - but if so, I think it would be prudent to say so and at least hint at how and where those will end up being addressed.

Of course! I definitely agree with the sentiment - I have some concrete items under the MVP for an InstructLab CLI 1.0.0 - I will make some additions there and we can further refine them

@anastasds
Copy link
Contributor

I'm happy to help discuss architectural goals if we want to scope those into this document.

I would also point out that what you've put together here is in the direction of defining a technical vision to guide us for somewhere between 6 and 12 months probably. More would need to be done to make it that, but this a first step on the way there. I think it would be appropriate to spend some time on this the next time the org gets together in a planning meeting or some such - it might deserve its own one-off meeting.

Next steps - which would not be in the scope of this document - would be to work backwards from whatever goals are defined here to identify milestones etc with buy-in from the whole org.

@@ -0,0 +1,70 @@
# The Road to 1.0.0

_Or: How I Learned to Stop Worrying and Love to GA_
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😆

refer to this repo and Python package.

Today, InstructLab has gone from a scrappy research project to an upstream community serving as the basis for multiple downstreams, with the goal
to continuing to evolve the community to encourage more participation from additional stakeholders. To wit, it would behoove us to determine what exactly we should be
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"behoove"
"to wit"

A++++ would buy again

docs/instructlab-cli-1.0.0.md Outdated Show resolved Hide resolved

1. An official goalpost for the community denoting the evolution of InstructLab from a pre-1.0 project lacking the stability and supportability typically seen from 1.0-and-beyond projects.
1. A dedicated set of V1 interfaces that can be counted on for continuous usage of InstructLab 1.0 with future provisions made for backwards compatibility for subsequent Y-Streams and Z-Streams.
1. A commitment from the Oversight Committee and Maintainer teams to continue to maintain InstructLab throughout a 1.y cycle and work towards an eventual 2.0.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this in doubt? What is being suggested here, a governance change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely not in doubt, I just think it more signifies our existing commitment - a 1.0+ piece of software is less likely to be abandoned and more likely to be maintained than a pre-1.0 piece of software (maybe that's my own personal bias 😄)


### A consistent naming scheme

As noted in the `Context and Goals` section, InstructLab started off as just as a CLI - however, we are planning for this package to serve as a more general "Engine" -
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some projects name this *-core. Eg was looking up jupyter packages in fedora today... they have a jupyter-core package. Eg. https://packages.fedoraproject.org/search?query=*core and https://pypi.org/search/?q=*-core

*-base also appears popular

https://packages.fedoraproject.org/search?query=*base and https://pypi.org/search/?q=*-base (first page skip on pypi)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe @jwm4 had suggested Core as well!


As noted in the `Context and Goals` section, InstructLab started off as just as a CLI - however, we are planning for this package to serve as a more general "Engine" -
being a place where a future REST API can be defined that is used by both the CLI aspect as well as an official GUI for orchestrating the entire LAB workflow. Despite
this, the repo is often still referred to as "the CLI". We as an organization need a better term to refer to this repo as, and should adopt the relevant documentation
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

We need to have a defined v1 REST API schema - while this does not preclude future updates, something mature enough to serve as a v1 API throughout subsequent Y-Streams
for an InstructLab 1.0 is a must for such a milestone.

### Integration of InstructLab with RAG
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably better define what this actually means.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would love for the folks working on RAG to chime in here and suggest more detail, I've been pretty far removed from that work

docs/instructlab-cli-1.0.0.md Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Create document detailing pre-reqs for an InstructLab CLI 1.0.0
9 participants