Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create proposal for training repo #85

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

RobotSail
Copy link
Member

Creates a proposal for a new training repo.

Signed-off-by: Oleg S [email protected]

@mergify mergify bot added the backend InstructLab Backend Services label Jun 10, 2024
@nathan-weinberg nathan-weinberg self-requested a review June 10, 2024 15:34
Copy link
Member

@nathan-weinberg nathan-weinberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM, one comment

Comment on lines 41 to 43
The maintainers should be the folks who currently work on training.
Namely:

- [Aldo Pareja](https://github.com/aldopareja)
- [James Kunstle](https://github.com/orgs/instructlab/people/JamesKunstle)
- [Oleg Silkin](https://github.com/orgs/instructlab/people/RobotSail)
- [Mustafa Eyceoz](https://github.com/orgs/instructlab/people/Maxusmusti)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we create a new GitHub Team for this, something like Training Maintainers? And these folks can be defined as the initial members

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, that makes sense to me.

Copy link
Member

@nathan-weinberg nathan-weinberg Jun 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this made it into the doc?

Copy link
Member

@russellb russellb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This lgtm, though I'll hold off on approval until more folks weigh in.

Please take a look at the lining errors when you have some time.

I'd also suggest moving the file to a new docs/training/ directory, similar to the docs/cli/ directory. I want to create a similar docs/sdg/ directory. I think this organization will help.

@RobotSail RobotSail force-pushed the add-training-repo branch 3 times, most recently from 1bef61d to 03ec16b Compare June 10, 2024 17:54
@danmcp
Copy link
Member

danmcp commented Jun 10, 2024

LGTM

@nathan-weinberg
Copy link
Member

This lgtm, though I'll hold off on approval until more folks weigh in.

Please take a look at the lining errors when you have some time.

I'd also suggest moving the file to a new docs/training/ directory, similar to the docs/cli/ directory. I want to create a similar docs/sdg/ directory. I think this organization will help.

I made a docs/backend/ directory in #79 - should I change that to docs/evaluation/?

Copy link
Contributor

@leseb leseb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for submitting a proposal! The PR title/Commit title are a bit confusing at first. "create proposal for training repo" is too generic, please rephrase to highlight the proposal.

@@ -2,3 +2,5 @@

# Spelling
dictionary.dic

venv
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was merged in #90. Please rebase.

@@ -0,0 +1,77 @@
# Modify Repository Proposal: Training
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about simply docs/training/repo.md? We are in the training directory already so no need to repeat it.


Today, we have training implemented for different use-cases
existing in different repos. LoRA/QLoRA training currently lives
in the [`instructlab`](https://github.com/instructlab/instructlab)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, this logic will have to be extracted to instructlab/training once we agree on this proposal, right?

Comment on lines +41 to +43
- [James Kunstle](https://github.com/orgs/instructlab/people/JamesKunstle)
- [Oleg Silkin](https://github.com/orgs/instructlab/people/RobotSail)
- [Mustafa Eyceoz](https://github.com/orgs/instructlab/people/Maxusmusti)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These 3 links return 404 for me.

@lhawthorn
Copy link
Member

Modulo feedback from @leseb, this LGTM. When feedback addressed I will take another look.

Copy link
Contributor

@cdoern cdoern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just some initial thoughts

The maintainers should be the folks who currently work on training.
Namely:

- [Aldo Pareja](https://github.com/aldopareja)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to be added to this as the community train rep currently. Or @alimaredia. If we are planning on moving that code to the library, I would appreciate a part in that 🙏

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sdg and eval repos bootstrapped access by copying the existing Backend Maintainers team. That's an alternative that could be used here.

Rather than consolidating the logic, we can keep the existing
logic as-is.

This would mean that the [CLI repo](https://github.com/instructlab/instructlab) and the [training repo](https://github.com/instructlab/training) would both maintain their own implementations of training.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is more what I was thinking, and this overlaps with #52 alot. Something we need to reconcile

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this comment mean that this alternative is what you had in mind?

@cdoern
Copy link
Contributor

cdoern commented Jun 11, 2024

This shouldn't be merged until it is reconciled with #52 which pre-exists this

@cdoern cdoern added the hold label Jun 11, 2024
@russellb
Copy link
Member

@RobotSail would you mind rebasing this? The main edit I see a need for is just reflecting that "Backend Maintainers" was used to create the initial maintainers list.

Otherwise, I think we should just merge this. This work has already moved on and is well in progress. cc @cdoern since you felt this should block on #52 , though this doc doesn't really talk about CLI integration details, so I think it's OK to merge it. Let me know if there's some other conflicting info you'd like to comment on more specifically.

Copy link
Member

@hickeyma hickeyma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @RobotSail for pushing the PR. Its looks ok if it is just covering making the existing repo into a library.

I am bit confused by "Move everything into one repo but don't design an interface" section. Can you explain what you mean here.

As @russellb said, the PR should probably be merged as the library is now in situ.

@nathan-weinberg
Copy link
Member

This happened but the doc was never merged?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend InstructLab Backend Services hold
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants