[Flink-27826] Support training very high dimensional logistic regression #237

Closed
wants to merge 18 commits into from

zhipeng93
Contributor

@zhipeng93 zhipeng93 commented May 5, 2023

What is the purpose of the change

This PR aims to (1) support training high-dimensional logistic regression models with a parameter-server-style infrastructure built on the flink-ml-iteration infra, and (2) abstract infrastructure that can be reused by other machine learning algorithms.
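
As a rough illustration of the parameter-server idea mentioned above, the following is a minimal single-process sketch, not the code in this PR. The class and method names (`ParameterServerSketch`, `Server`, `pull`, `push`, `trainOnSample`) are hypothetical, and a plain SGD step is assumed for the server-side update. The point is that a worker only touches the coordinates that appear in its sparse sample, so the full model never has to fit on the worker.

```java
/** Hypothetical sketch of a pull/compute/push step for sparse logistic regression. */
final class ParameterServerSketch {

    /** Holds the whole model; a real deployment would shard it across several servers. */
    static final class Server {
        private final double[] weights;

        Server(int dim) {
            this.weights = new double[dim];
        }

        /** Pull: return only the requested coordinates of the model. */
        double[] pull(int[] indices) {
            double[] values = new double[indices.length];
            for (int i = 0; i < indices.length; i++) {
                values[i] = weights[indices[i]];
            }
            return values;
        }

        /** Push: apply a sparse gradient with a plain SGD step (an assumption of this sketch). */
        void push(int[] indices, double[] gradients, double learningRate) {
            for (int i = 0; i < indices.length; i++) {
                weights[indices[i]] -= learningRate * gradients[i];
            }
        }
    }

    /** One worker step on a single sparse sample whose label is 0 or 1. */
    static void trainOnSample(
            Server server, int[] indices, double[] values, double label, double learningRate) {
        double[] w = server.pull(indices);
        double dot = 0.0;
        for (int i = 0; i < indices.length; i++) {
            dot += w[i] * values[i];
        }
        double prediction = 1.0 / (1.0 + Math.exp(-dot));
        // Gradient of the log loss with respect to each non-zero coordinate.
        double[] gradients = new double[indices.length];
        for (int i = 0; i < indices.length; i++) {
            gradients[i] = (prediction - label) * values[i];
        }
        server.push(indices, gradients, learningRate);
    }

    public static void main(String[] args) {
        Server server = new Server(1_000_000);
        int[] indices = {3, 999_999};
        double[] values = {1.0, 2.0};
        trainOnSample(server, indices, values, 1.0, 0.1);
        double[] updated = server.pull(indices);
        System.out.println("w[3]=" + updated[0] + ", w[999999]=" + updated[1]);
    }
}
```

In a distributed setting the `pull` and `push` calls would become messages between worker and server tasks; the sketch only shows the data flow.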

Brief change log

  • Added WorkerOperator and ServerOperator to support distributed communication among parallel tasks, following the idea of parameter servers.
  • Added IterationStages to model the iterative training process as a sequence of local computation and global communication steps, which makes flink-ml-iterations easier to program against.
  • Added MLSession to store the information that can be shared among different IterationStages.
  • Expanded the model data format of LogisticRegression from a single vector to several sliced vectors, by adding startIndex and endIndex to the model data format.
  • Used the third-party fastutil jar to speed up operations on primitive collections.
  • Added unit tests to verify the infra.
  • Left several TODOs for later PRs.
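
The sliced model format in the change log can be pictured with the sketch below. It is only an illustration under stated assumptions: the names `SlicedModelSketch`, `ModelSegment`, `split`, and `segmentOf` are hypothetical, the range is assumed to be `[startIndex, endIndex)`, and near-equal contiguous slices are assumed. The actual layout in the PR may differ.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: a coefficient vector stored as contiguous slices. */
final class SlicedModelSketch {

    /** One slice of the model; startIndex is inclusive and endIndex exclusive (assumed). */
    static final class ModelSegment {
        final long startIndex;
        final long endIndex;
        final double[] coefficients;

        ModelSegment(long startIndex, long endIndex) {
            this.startIndex = startIndex;
            this.endIndex = endIndex;
            this.coefficients = new double[(int) (endIndex - startIndex)];
        }

        boolean contains(long index) {
            return index >= startIndex && index < endIndex;
        }

        double get(long index) {
            return coefficients[(int) (index - startIndex)];
        }

        void add(long index, double delta) {
            coefficients[(int) (index - startIndex)] += delta;
        }
    }

    /** Splits [0, dim) into numSegments contiguous ranges of near-equal size. */
    static List<ModelSegment> split(long dim, int numSegments) {
        List<ModelSegment> segments = new ArrayList<>();
        long base = dim / numSegments;
        long remainder = dim % numSegments;
        long start = 0;
        for (int i = 0; i < numSegments; i++) {
            // The first `remainder` segments take one extra element each.
            long size = base + (i < remainder ? 1 : 0);
            segments.add(new ModelSegment(start, start + size));
            start += size;
        }
        return segments;
    }

    /** Returns the position of the segment that owns the given global index. */
    static int segmentOf(List<ModelSegment> segments, long index) {
        for (int i = 0; i < segments.size(); i++) {
            if (segments.get(i).contains(index)) {
                return i;
            }
        }
        throw new IllegalArgumentException("Index out of range: " + index);
    }

    public static void main(String[] args) {
        List<ModelSegment> segments = split(10, 3); // slice sizes 4, 3, 3
        int owner = segmentOf(segments, 7);
        segments.get(owner).add(7, 0.5);
        System.out.println("index 7 is owned by segment " + owner);
    }
}
```

Each slice stays small enough to hold in a plain `double[]` even when the global index needs a `long`, which is one possible motivation for carrying startIndex and endIndex with the model data.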

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (no)

Documentation

  • Does this pull request introduce a new feature? (yes)
  • If yes, how is the feature documented? (JavaDocs)

@zhipeng93 zhipeng93 marked this pull request as draft May 5, 2023 01:44
@zhipeng93 zhipeng93 force-pushed the FLINK-27826 branch 2 times, most recently from d57c3de to c8cf098 Compare May 8, 2023 10:05
@zhipeng93 zhipeng93 force-pushed the FLINK-27826 branch 2 times, most recently from 81b8391 to 3138661 Compare May 11, 2023 07:36
@zhipeng93 zhipeng93 changed the title [Flink-27826] Support machine learning training for very high dimesional models [Flink-27826] Support training very high dimensional logisticRegression May 11, 2023
@zhipeng93 zhipeng93 marked this pull request as ready for review May 11, 2023 07:50
@zhipeng93 zhipeng93 changed the title [Flink-27826] Support training very high dimensional logisticRegression [Flink-27826] Support training very high dimensional logistic regression May 11, 2023
Member

@lindong28 lindong28 left a comment

Thanks for the PR. Left some comments below.

@Fanoid
Contributor

Fanoid commented May 26, 2023

Hi @zhipeng93. Since you mentioned that the solution proposed in this PR could be an alternative way to implement GBDT in #210, could we make the target applications/scenarios clearer in the description of this PR?

Although the description states that the infrastructure can be reused by other ML algorithms, the infra code does not seem easy to adapt. Many APIs, enums, and code paths in the infra are designed or hard-coded specifically for gradient-based algorithms, such as MessageType, the model format (double[]), ModelUpdater, and the code in ServerOperator. It is very difficult to extend the current code to support POJO messages and POJO model types unless developers change the infra code itself.

If the PS infra targets more than gradient-based algorithms, the infra APIs may need to reflect those cases.
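
To make the concern concrete, here is one possible shape for an updater that is generic over the model type and the message type. This is purely a hypothetical sketch and not an API from this PR: `GenericModelUpdater`, `SgdUpdater`, `HistogramUpdate`, and `HistogramUpdater` are invented names. It shows a `double[]` instantiation next to a POJO-based one of the kind a tree-based algorithm might need.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of an updater that is generic over model and message types. */
final class GenericUpdaterSketch {

    /** M is the model held by a server; U is the update message sent by workers. */
    interface GenericModelUpdater<M, U> {
        M initialModel();

        M apply(M model, U update);
    }

    /** Gradient-style instantiation: dense double[] model and dense gradient messages. */
    static final class SgdUpdater implements GenericModelUpdater<double[], double[]> {
        private final int dim;
        private final double learningRate;

        SgdUpdater(int dim, double learningRate) {
            this.dim = dim;
            this.learningRate = learningRate;
        }

        @Override
        public double[] initialModel() {
            return new double[dim];
        }

        @Override
        public double[] apply(double[] model, double[] gradient) {
            for (int i = 0; i < model.length; i++) {
                model[i] -= learningRate * gradient[i];
            }
            return model;
        }
    }

    /** POJO message: a partial gradient sum for one histogram bin. */
    static final class HistogramUpdate {
        final int bin;
        final double gradientSum;

        HistogramUpdate(int bin, double gradientSum) {
            this.bin = bin;
            this.gradientSum = gradientSum;
        }
    }

    /** POJO-style instantiation: the model is a map from bin to accumulated sum. */
    static final class HistogramUpdater
            implements GenericModelUpdater<Map<Integer, Double>, HistogramUpdate> {
        @Override
        public Map<Integer, Double> initialModel() {
            return new HashMap<>();
        }

        @Override
        public Map<Integer, Double> apply(Map<Integer, Double> model, HistogramUpdate update) {
            model.merge(update.bin, update.gradientSum, Double::sum);
            return model;
        }
    }

    public static void main(String[] args) {
        HistogramUpdater updater = new HistogramUpdater();
        Map<Integer, Double> model = updater.initialModel();
        updater.apply(model, new HistogramUpdate(2, 1.5));
        updater.apply(model, new HistogramUpdate(2, 2.0));
        System.out.println("bin 2 sum = " + model.get(2));
    }
}
```

With type parameters in the interface, the server-side code would only depend on `GenericModelUpdater<M, U>` plus a serializer for `M` and `U`, rather than on `double[]` directly.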

@zhipeng93 zhipeng93 marked this pull request as draft June 9, 2023 10:42
@zhipeng93
Contributor Author

zhipeng93 commented Aug 15, 2023

This PR introduces too many changes (i.e., communication infra, vectors, algorithms) and it is hard to review. I will open another PR [1] to introduce the communication infra first and address the comments raised in this PR.

[1] #251

3 participants