-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added document detailing org roadmap to CLI 1.0.0 #159
base: main
Are you sure you want to change the base?
Conversation
@RobotSail I didn't include the UI piece in this doc since it's only for the CLI, but let me know if you wanna talk that out or have me mention something in the doc somewhere about a UI anyway, maybe under the Q&A |
@nathan-weinberg Ah gotcha, we can find a better place for it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just two small comments, I like the very general approach here.
docs/instructlab-cli-1.0.0.md
Outdated
|
||
**Q. What about the libraries? Will they 1.0.0 as well?** | ||
|
||
A. It depends - we historically have not aligned the CLI and Library releases on a particular version numbering scheme, apart from matching Y-Streams to Y-Streams (e.g., InstructLab CLI 0.20 used SDG 0.4, Training 0.5, and Eval 0.3). At this stage, this document scopes only the prereqs we want for the CLI. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the libraries will inherently need a large version bump as we standardize around a REST API
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good thought! I will let others weigh-in before changing (particular folks from the Library Maintainer teams) but I'm definitely amicable to this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll need to plan this but I agree, the libraries will have to catch up to the backgroundable/RESTy version of ilab.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the CLI would be able to meet any of its backwards compatibility goals stated here unless the libraries also committed to those goals? This is perhaps a more general issue outside of the scope of just a CLI 1.0.0 as to what level of maturity and backwards compatibility of our Python APIs we want to commit to across repos in the org, as that ultimately determines what level of backwards compatibility consumers of those APIs (like CLI) can commit to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also a good point @bbrowning - what do you think might be a reasonable expectation, both for ourselves and the community?
docs/instructlab-cli-1.0.0.md
Outdated
- An official v1 REST API schema | ||
- Integration of the InstructLab CLI with RAG | ||
- An upgrade path to subsequent Y-Streams and an eventual 2.0 | ||
- Backwards compatibility across the 1.y stream |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO this is a really big commitment that we don't necessarily have to make. It's common for consumers of python projects to expect minimal backward compat and to take the most recent version as that one with fewest feature errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would a compromise be to commit to backward compatibility across all 1.0.y versions? I.e., that API breaking changes will not happen on third digit point releases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, the vagueness of this makes it quite the lift - maybe we can commit to backwards compatibility for N-1 Y-Streams?
E.g., 1.4
is backwards-compatible with 1.3
but not necessarily 1.2
Or do you think even that might be more than worth committing to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
N-1-Y sounds complicated to manage to me because breaking changes need to be staged out across multiple releases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Def good with backwards compatibility across Z-Streams only - @JamesKunstle wdyt?
docs/instructlab-cli-1.0.0.md
Outdated
- Integration of the InstructLab CLI with RAG | ||
- An upgrade path to subsequent Y-Streams and an eventual 2.0 | ||
- Backwards compatibility across the 1.y stream | ||
- An official hardware support matrix |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kelbrown20 is actually working on this now! instructlab/instructlab#2670
@@ -0,0 +1,35 @@ | |||
# The Road to 1.0.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not anxious to push for a 1.0.0 release for the sake of having a >1.0.0 project. I don't personally believe that semver major versions are all that meaningful to the end user.
However, these are good objectives for us and would represent a major change in the implementation and design of the project, which I'm certainly on board with.
docs/instructlab-cli-1.0.0.md
Outdated
|
||
## Context and Goals | ||
|
||
At the time of this writing, the InstructLab CLI has gone from a scrappy research project to an upstream community serving as the basis for multiple downstreams, with the goal to continuing to evolve the community to encourage more participation from additional stakeholders. To wit, it would behoove us to determine what exactly we should be roadmapping between now and a proper 1.0.0 release, which demonstrates the following to existing and potential community members: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing that is missing from this context section that came up in the discussion of the SDG refactoring draft dev doc is the fact that the repo used to be instructlab/cli
and largely focused on just the CLI layer and now it is instructlab/instructlab
and is more of a core integrating element that contains a lot of the glue functionality for all of InstructLab. I think this document should either state this or link to a document that states this.
I also have a proposal that I think might be more controversial. Can we stop calling this component the CLI and provide a new name for it? The repo could still be instructlab/instructlab
but we would need to update the README, the upstream Slack channel name, start using the new name in dev docs and Jira issues, update the labels in meeting invites, etc.; so it is not a small undertaking, but I think it is one that will provide a lot of value in the long term because having accurate names for the major components makes it easier for new participants in the project to understand the system.
I think InstructLab core would be a good name, but I would be open to other names that convey a sense that the is the central orchestrator for all of the components of InstructLab .
Finally, I really think the name issue should be sorted out in this dev doc because settling on a name and providing consistent communication around the name should be a prereq for declaring the repo 1.0.0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing that is missing from this context section that came up in the discussion of the SDG refactoring draft dev doc is the fact that the repo used to be instructlab/cli and largely focused on just the CLI layer and now it is instructlab/instructlab and is more of a core integrating element that contains a lot of the glue functionality for all of InstructLab. I think this document should either state this or link to a document that states this.
I agree with this sentiment, though candidly I don't know where the decision of record is for this - I'll need to do some digging.
I also have a proposal that I think might be more controversial. Can we stop calling this component the CLI and provide a new name for it? The repo could still be instructlab/instructlab but we would need to update the README, the upstream Slack channel name, start using the new name in dev docs and Jira issues, update the labels in meeting invites, etc.; so it is not a small undertaking, but I think it is one that will provide a lot of value in the long term because having accurate names for the major components makes it easier for new participants in the project to understand the system.
I can add a section about this and update the existing references - I agree that the "CLI" is really just a part of the overall instructlab
package that serves as a means for controlling the orchestration, which will in the future have other entrypoints such as a GUI and direct API calls. That being said, I don't feel an entirely holistic renaming of things here is necessary - some things like parts of the README and the meeting name yes, but the CLI portion of this component is always going to exist and need a place for discussion/way to be referred to.
I think InstructLab core would be a good name, but I would be open to other names that convey a sense that the is the central orchestrator for all of the components of InstructLab .
Engine might also be a good term for it
Finally, I really think the name issue should be sorted out in this dev doc because settling on a name and providing consistent communication around the name should be a prereq for declaring the repo 1.0.0.
Let's definitely flesh this out in the document - I'll add a new section for it. That being said, I am wary this issue might bog down the rest of the document, and reserve the right to postpone a final decision to after merging this, while specifying that a decision must be made prior to 1.0.0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There seem to be a few open issues here especially around naming and backward compatibility.
docs/instructlab-cli-1.0.0.md
Outdated
At a high-level, these are the items the Maintainer teams believe should serve as prereqs for releasing an InstructLab CLI 1.0.0: | ||
|
||
- Removal of the need for a `config.yaml` with a fully-realized configuration scheme, centered around the usage of system profiles | ||
- An official v1 REST API schema |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This REST API looks nice. I think this encourages people's contributions, and people with a less power laptop can train the dataset faster.
Thanks for putting this together! I suggest considering architectural goals even if they are implied by things already listed, e.g. the REST API work will probably drive modularization. Do we want to stabilize our systems architecture before declaring a 1.0? What about API guarantees? State of CI? Documentation? Testing? I think it can definitely be reasonable for these things to be out of scope for this doc - but if so, I think it would be prudent to say so and at least hint at how and where those will end up being addressed. |
Of course! I definitely agree with the sentiment - I have some concrete items under the |
86624bb
to
aead764
Compare
I'm happy to help discuss architectural goals if we want to scope those into this document. I would also point out that what you've put together here is in the direction of defining a technical vision to guide us for somewhere between 6 and 12 months probably. More would need to be done to make it that, but this a first step on the way there. I think it would be appropriate to spend some time on this the next time the org gets together in a planning meeting or some such - it might deserve its own one-off meeting. Next steps - which would not be in the scope of this document - would be to work backwards from whatever goals are defined here to identify milestones etc with buy-in from the whole org. |
@@ -0,0 +1,70 @@ | |||
# The Road to 1.0.0 | |||
|
|||
_Or: How I Learned to Stop Worrying and Love to GA_ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😆
refer to this repo and Python package. | ||
|
||
Today, InstructLab has gone from a scrappy research project to an upstream community serving as the basis for multiple downstreams, with the goal | ||
to continuing to evolve the community to encourage more participation from additional stakeholders. To wit, it would behoove us to determine what exactly we should be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"behoove"
"to wit"
A++++ would buy again
|
||
1. An official goalpost for the community denoting the evolution of InstructLab from a pre-1.0 project lacking the stability and supportability typically seen from 1.0-and-beyond projects. | ||
1. A dedicated set of V1 interfaces that can be counted on for continuous usage of InstructLab 1.0 with future provisions made for backwards compatibility for subsequent Y-Streams and Z-Streams. | ||
1. A commitment from the Oversight Committee and Maintainer teams to continue to maintain InstructLab throughout a 1.y cycle and work towards an eventual 2.0. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this in doubt? What is being suggested here, a governance change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely not in doubt, I just think it more signifies our existing commitment - a 1.0+ piece of software is less likely to be abandoned and more likely to be maintained than a pre-1.0 piece of software (maybe that's my own personal bias 😄)
|
||
### A consistent naming scheme | ||
|
||
As noted in the `Context and Goals` section, InstructLab started off as just as a CLI - however, we are planning for this package to serve as a more general "Engine" - |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some projects name this *-core. Eg was looking up jupyter packages in fedora today... they have a jupyter-core package. Eg. https://packages.fedoraproject.org/search?query=*core and https://pypi.org/search/?q=*-core
*-base also appears popular
https://packages.fedoraproject.org/search?query=*base and https://pypi.org/search/?q=*-base (first page skip on pypi)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe @jwm4 had suggested Core as well!
|
||
As noted in the `Context and Goals` section, InstructLab started off as just as a CLI - however, we are planning for this package to serve as a more general "Engine" - | ||
being a place where a future REST API can be defined that is used by both the CLI aspect as well as an official GUI for orchestrating the entire LAB workflow. Despite | ||
this, the repo is often still referred to as "the CLI". We as an organization need a better term to refer to this repo as, and should adopt the relevant documentation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
We need to have a defined v1 REST API schema - while this does not preclude future updates, something mature enough to serve as a v1 API throughout subsequent Y-Streams | ||
for an InstructLab 1.0 is a must for such a milestone. | ||
|
||
### Integration of InstructLab with RAG |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably better define what this actually means.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would love for the folks working on RAG to chime in here and suggest more detail, I've been pretty far removed from that work
aead764
to
8d5ad47
Compare
Signed-off-by: Nathan Weinberg <[email protected]>
8d5ad47
to
468b1f4
Compare
Resolves #157