[Flink-27826] Support training very high dimensional logistic regression #237

zhipeng93 · 2023-05-05T01:44:15Z

What is the purpose of the change

This PR aims to (1) support training high-dimensional logistic regression models with a parameter-server-style infrastructure built on flink-ml-iteration, and (2) abstract infrastructure that can be reused by other machine learning algorithms.
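To make the parameter-server idea concrete, here is a minimal single-JVM sketch of the data flow. It assumes the coefficient vector is range-partitioned across servers by feature index. Each worker pulls only the coefficients its sparse sample touches, computes the logistic-loss gradient locally, and pushes the sparse gradient back to the owning servers. The class and method names (`RangePartitionedServers`, `pull`, `push`, `SparseLogisticWorker.step`) are hypothetical and are not the `WorkerOperator`/`ServerOperator` API added in this PR. Plain SGD also stands in for whatever updater the PR uses (e.g. FTRL).

```java
/** Hypothetical sketch: a model of `dim` coefficients split into contiguous ranges, one per server. */
final class RangePartitionedServers {
    private final long chunk;        // entries per server; the last shard may be shorter (or empty)
    private final double[][] shards; // shards[s] holds coefficients [s * chunk, min((s + 1) * chunk, dim))

    RangePartitionedServers(long dim, int numServers) {
        // Assumes dim > 0 and numServers > 0, and that each shard fits in one Java array.
        this.chunk = (dim + numServers - 1) / numServers;
        this.shards = new double[numServers][];
        for (int s = 0; s < numServers; s++) {
            long start = Math.min(s * chunk, dim);
            long end = Math.min(start + chunk, dim);
            shards[s] = new double[(int) (end - start)];
        }
    }

    /** Server that owns a global feature index under range partitioning. */
    int serverOf(long index) {
        return (int) (index / chunk);
    }

    /** Worker-side pull: current coefficient values for the requested feature indices. */
    double[] pull(long[] indices) {
        double[] values = new double[indices.length];
        for (int i = 0; i < indices.length; i++) {
            int s = serverOf(indices[i]);
            values[i] = shards[s][(int) (indices[i] - s * chunk)];
        }
        return values;
    }

    /** Worker-side push: sparse gradients; the owning server applies a plain SGD update. */
    void push(long[] indices, double[] grads, double learningRate) {
        for (int i = 0; i < indices.length; i++) {
            int s = serverOf(indices[i]);
            shards[s][(int) (indices[i] - s * chunk)] -= learningRate * grads[i];
        }
    }
}

/** Hypothetical worker logic: one SGD step of logistic regression on a single sparse sample. */
final class SparseLogisticWorker {
    static double step(RangePartitionedServers ps, long[] indices, double[] values,
                       double label, double learningRate) {
        double[] w = ps.pull(indices);              // pull only the coefficients this sample uses
        double margin = 0.0;
        for (int i = 0; i < indices.length; i++) {
            margin += w[i] * values[i];
        }
        double p = 1.0 / (1.0 + Math.exp(-margin)); // predicted probability of label 1
        double[] grads = new double[indices.length];
        for (int i = 0; i < indices.length; i++) {
            grads[i] = (p - label) * values[i];     // gradient of log loss w.r.t. w_i
        }
        ps.push(indices, grads, learningRate);      // send the sparse gradient to the owning servers
        return p;
    }
}
```

According to the change log, the PR carries this exchange as communication between parallel worker and server tasks inside a Flink iteration rather than as in-process method calls; the sketch only illustrates why no single task ever needs to hold the full model.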

Brief change log

Added WorkerOperator and ServerOperator to support distributed communication among parallel tasks, following the parameter-server design.
Added IterationStages to model the iterative training process as a sequence of local computation and global communication stages, to simplify programming with flink-ml-iteration.
Added MLSession to store information that is shared among different IterationStages.
Expanded the model data format of LogisticRegression from a single vector to several sliced vectors, by adding startIndex and endIndex to the model data format.
Used the third-party library fastutil to accelerate operations on primitive collections.
Added unit tests to verify the infrastructure.
Left several TODOs for later PRs.
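The sliced model data format can be pictured as a list of segments, each carrying a `startIndex`, an `endIndex`, and the coefficients for that range, so that no single record has to hold the whole coefficient vector. This is a minimal sketch assuming half-open ranges `[startIndex, endIndex)`; `ModelSegment`, `split`, and `coefficientAt` are hypothetical names, not the PR's actual model data class or its serialization.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical model segment holding coefficients for global indices [startIndex, endIndex). */
final class ModelSegment {
    final long startIndex;
    final long endIndex;
    final double[] coefficients;

    ModelSegment(long startIndex, long endIndex, double[] coefficients) {
        if (coefficients.length != endIndex - startIndex) {
            throw new IllegalArgumentException("coefficients.length must equal endIndex - startIndex");
        }
        this.startIndex = startIndex;
        this.endIndex = endIndex;
        this.coefficients = coefficients;
    }

    boolean contains(long index) {
        return index >= startIndex && index < endIndex;
    }

    /** Splits a dense coefficient array into consecutive segments of at most segmentSize entries. */
    static List<ModelSegment> split(double[] coefficients, int segmentSize) {
        List<ModelSegment> segments = new ArrayList<>();
        for (int start = 0; start < coefficients.length; start += segmentSize) {
            int end = Math.min(start + segmentSize, coefficients.length);
            segments.add(new ModelSegment(start, end, Arrays.copyOfRange(coefficients, start, end)));
        }
        return segments;
    }

    /** Looks up the coefficient of a global feature index in whichever segment covers it. */
    static double coefficientAt(List<ModelSegment> segments, long index) {
        for (ModelSegment segment : segments) {
            if (segment.contains(index)) {
                return segment.coefficients[(int) (index - segment.startIndex)];
            }
        }
        throw new IllegalArgumentException("no segment covers index " + index);
    }
}
```

Splitting one dense array is only for illustration; for a model too large for one task, each server would presumably emit the segment for the range it owns.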

Does this pull request potentially affect one of the following parts:

Dependencies (does it add or upgrade a dependency): (no)
The public API, i.e., is any changed class annotated with @Public(Evolving): (no)

Documentation

Does this pull request introduce a new feature? (yes)
If yes, how is the feature documented? (JavaDocs)

lindong28

Thanks for the PR. Left some comments below.

flink-ml-lib/src/main/java/org/apache/flink/ml/common/updater/ModelUpdater.java

flink-ml-lib/src/main/java/org/apache/flink/ml/common/updater/FTRL.java

flink-ml-lib/src/main/java/org/apache/flink/ml/common/updater/ModelUpdater.java

...n/java/org/apache/flink/ml/classification/logisticregression/LogisticRegressionWithFtrl.java

flink-ml-lib/src/main/java/org/apache/flink/ml/common/ps/message/MessageType.java

flink-ml-lib/src/main/java/org/apache/flink/ml/common/ps/message/ZerosToPushM.java

flink-ml-lib/src/main/java/org/apache/flink/ml/common/ps/message/IndicesToPullM.java

flink-ml-lib/src/main/java/org/apache/flink/ml/common/ps/message/MessageUtils.java

flink-ml-lib/src/main/java/org/apache/flink/ml/common/ps/MirrorWorkerOperator.java

lindong28

Thanks for the PR. Left some comments below.

Fanoid · 2023-05-26T07:13:27Z

Hi @zhipeng93. As you mentioned, the solution proposed in this PR could be an alternative way to implement GBDT in #210. Could we make the target applications/scenarios clearer in the description of this PR?

Although the description states that the infrastructure can be reused by other ML algorithms, the infrastructure code does not seem easy to adapt. Many APIs, enums, and code paths in it are specifically designed for, or hard-coded to, gradient-based algorithms, such as MessageType, the model format (double[]), ModelUpdater, and the code in ServerOperator. It is very difficult to extend the current code to support POJO messages and POJO model types without changing the infrastructure code itself.

If the PS infrastructure is intended to support more than gradient-based algorithms, its APIs may need to reflect those use cases.

zhipeng93 · 2023-08-15T01:53:23Z

This PR introduces too many changes (communication infra, vectors, and algorithms), which makes it hard to review. I will open another PR [1] that introduces the communication infra first and addresses the comments raised in this PR.

[1] #251

zhipeng93 marked this pull request as draft May 5, 2023 01:44

Support SparseVector as input for LogisticRegression

ce947de

zhipeng93 force-pushed the FLINK-27826 branch 2 times, most recently from d57c3de to c8cf098 May 8, 2023 10:05

[hotfix] Fix TableUtils.getRowTypeInfo when the input contains Tuple

9d58fc7

zhipeng93 force-pushed the FLINK-27826 branch 2 times, most recently from 81b8391 to 3138661 May 11, 2023 07:36

zhipeng93 changed the title ~~[Flink-27826] Support machine learning training for very high dimesional models~~ [Flink-27826] Support training very high dimensional logisticRegression May 11, 2023

zhipeng93 marked this pull request as ready for review May 11, 2023 07:50

zhipeng93 changed the title ~~[Flink-27826] Support training very high dimensional logisticRegression~~ [Flink-27826] Support training very high dimensional logistic regression May 11, 2023

Expand LogisticRegressionModelData as many pieces

a92671c

zhipeng93 force-pushed the FLINK-27826 branch from f281f94 to dde302e May 12, 2023 09:09

[FLINK-27826] Support training very high dimensional logisticRegression

74b4b7c

zhipeng93 force-pushed the FLINK-27826 branch from dde302e to 74b4b7c May 12, 2023 09:24

lindong28 reviewed May 16, 2023


zhipeng93 mentioned this pull request May 25, 2023

[FLINK-31010] Add Transformer and Estimator for GBTClassifier and GBTRegressor #210

Open

zhipeng93 added 3 commits May 29, 2023 10:13

Average the gradient from workers

e70c3fe

Support pull/push value as array

3966321

resolve comments

b04cd84

zhipeng93 force-pushed the FLINK-27826 branch from 8a52d8c to b04cd84 May 30, 2023 09:06

zhipeng93 added 8 commits June 5, 2023 10:53

add allreduce stage impl

d5dd3a9

Reorganize Vectors and add SparseLongDoubleVector

5c1d7aa

support allreduce aggregator for double[]

7325477

support allreduce aggregator for double[]

e938e18

FTRL should not be aware of numWorkers

3f45880

Reorganize message infra

ea4e159

Support output from worker operator

6b4df2a

Add test for trainingUtils.java

dce8b9b

zhipeng93 added 3 commits June 8, 2023 14:36

Rename LogisticRegressionModelData as LogisticRegressionModelDataSegment

4e77b2c

add bench io stuff, should be deleted in the real PR

d853828

cp

b83939c

zhipeng93 marked this pull request as draft June 9, 2023 10:42

zhipeng93 closed this Aug 15, 2023

zhipeng93 mentioned this pull request Aug 15, 2023

[FLINK-27286] Add infra to support training high dimension models #251

Closed
