Simplifying Open Source Contributions Through Operator Tiering from Dev aspect #510

Open
2 tasks done
yxdyc opened this issue Dec 11, 2024 · 1 comment
Labels: dj:op (issues/PRs about some specific OPs), enhancement (New feature or request), good first issue (Good for newcomers)

Comments

Collaborator

yxdyc commented Dec 11, 2024

Search before continuing

  • I have searched the Data-Juicer issues and found no similar feature requests.

Description

Current state

Developing a complete operator can be quite demanding. It involves:

  • adhering to coding styles
  • adding new StatsKeys
  • creating a new operator file, which means considering decorators, unified model management for HF, batched operations, and path management for Mapper operators
  • writing unit tests
  • documenting the operator
  • ensuring it is fusible

Although some of these steps are optional and can be completed automatically, they add cognitive and development burden for new contributors, especially compared with writing a simple demo operator from scratch.

A potential solution

To lower the barrier to entry, I propose a tiered labeling system for operators, such as alpha_op, beta_op, and stable_op.

  • Alpha Operators: Only need to showcase new functionality; they serve as simple demos contributed by the community.
  • Beta Operators: Must meet the mandatory requirements outlined in the DJ-developer guide.
  • Stable Operators: In addition to satisfying the beta_op criteria, they should follow the optional recommendations in the DJ-developer guide and include passing single-node and distributed unit tests.
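
To make the idea concrete, here is a minimal sketch of how a tier label could be attached to an operator class. Everything in it is hypothetical and is not part of the current Data-Juicer API: the `OpTier` enum, the `op_tier` decorator, the `OP_TIERS` registry, and the `DemoWordCountFilter` operator are names made up for illustration. Only the label strings come from the proposal above.

```python
from enum import Enum


class OpTier(Enum):
    """Hypothetical maturity tiers, named after the proposed labels."""
    ALPHA = "alpha_op"
    BETA = "beta_op"
    STABLE = "stable_op"


# Hypothetical registry: operator class name -> tier.
OP_TIERS = {}


def op_tier(tier):
    """Class decorator that records an operator's tier (illustrative only)."""
    def decorate(cls):
        cls.tier = tier
        OP_TIERS[cls.__name__] = tier
        return cls
    return decorate


@op_tier(OpTier.ALPHA)
class DemoWordCountFilter:
    """A made-up community demo operator: keeps samples with enough words."""

    def __init__(self, min_words=3):
        self.min_words = min_words

    def keep(self, sample):
        return len(sample["text"].split()) >= self.min_words


def ops_at_least(min_tier):
    """Return registered operator names whose tier is min_tier or higher."""
    order = list(OpTier)  # definition order: ALPHA < BETA < STABLE
    return sorted(
        name for name, tier in OP_TIERS.items()
        if order.index(tier) >= order.index(min_tier)
    )
```

With something like this, a user who only wants mature operators could filter on `ops_at_least(OpTier.BETA)`, while alpha operators would stay available for experimentation.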

Better suggestions and more detailed implementation plans to improve this proposal are welcome.

Use case

No response

Additional

No response

Are you willing to submit a PR for this feature?

  • Yes, I'd like to help by submitting a PR!
@yxdyc yxdyc added the enhancement New feature or request label Dec 11, 2024
@yxdyc yxdyc added good first issue Good for newcomers dj:op issues/PRs about some specific OPs labels Dec 11, 2024
Collaborator Author

yxdyc commented Dec 12, 2024

Cc: @Cathy0908 @SYSUzhouting @Qirui-jiao
If this feature is added, you can simply add OPs in the alpha state first, and we can then iterate on them until they reach beta and stable.

@HYLcool, to implement this, we may need to figure out how to assign these tier labels automatically through some CI/CD mechanism.
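
As a starting point for that discussion, a rough sketch of how a CI job might derive the label from check results is shown below. The `OpChecks` fields and the `infer_tier` function are assumptions for illustration only; how each check would actually be computed in CI is still an open question, and the mapping simply follows the tier criteria described in this issue.

```python
from dataclasses import dataclass


@dataclass
class OpChecks:
    """Hypothetical per-operator results that a CI job could collect."""
    meets_mandatory_guide: bool = False     # mandatory items in the developer guide
    meets_optional_guide: bool = False      # optional recommendations in the guide
    passes_single_node_tests: bool = False
    passes_distributed_tests: bool = False


def infer_tier(checks: OpChecks) -> str:
    """Map check results to a tier label, following the criteria in this issue."""
    if not checks.meets_mandatory_guide:
        return "alpha_op"
    is_stable = (
        checks.meets_optional_guide
        and checks.passes_single_node_tests
        and checks.passes_distributed_tests
    )
    return "stable_op" if is_stable else "beta_op"
```

An operator that satisfies only the mandatory requirements would be labeled `beta_op`; it would move to `stable_op` once the optional recommendations and both kinds of unit tests are covered.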

Please review this issue; any discussion is welcome!
