[FLINK-31010] Add Transformer and Estimator for GBTClassifier and GBTRegressor #210

Open: wants to merge 47 commits into base: master

47 commits
cde4b9c
Add preprocess for GBT algorithms
Fanoid Feb 8, 2023
7d561c3
Add training and prediction
Fanoid Feb 8, 2023
ee76f33
Support checkpoint for operator states
Fanoid Feb 10, 2023
69e0774
Add GBTClassifier
Fanoid Feb 10, 2023
2c43fa6
Add GBTRegressor
Fanoid Feb 7, 2023
2fe1f72
Fixing some missing Javadoc comment.
Fanoid Feb 13, 2023
dbe0009
[NO MERGE] Ad-hoc fix of KBinsDiscretizer
Fanoid Feb 7, 2023
638ebee
Fix tests according to ad-hoc fix of KBinsDiscretizer.
Fanoid Feb 13, 2023
34797f0
Change LocalState datastream to JVM static memory
Fanoid Feb 13, 2023
4a76eda
Change features storage in BinnedInstance
Fanoid Feb 16, 2023
4b97064
[NO MERGE] Ignore GBT operators to pass Python completeness tests.
Fanoid Feb 21, 2023
c7156a4
Improve feature splitter.
Fanoid Feb 17, 2023
5525b85
Improve hist builder when no feature subsampling.
Fanoid Feb 17, 2023
8932392
Add optimized serializer for double arrays.
Fanoid Feb 17, 2023
74fb657
Fixed checkpoint problem.
Fanoid Feb 20, 2023
99061bc
Rewrite shared storage.
Fanoid Feb 21, 2023
ad8d4a7
Replace inputCols and featuresCol with featuresCols.
Fanoid Feb 28, 2023
b0309e4
Fix cases when reading a shared item earlier than its initialization.
Fanoid Feb 28, 2023
9d3f10f
Improve GBT params.
Fanoid Feb 28, 2023
1f592eb
Remove unused HasLossType.
Fanoid Feb 28, 2023
bd79176
Improve loss func.
Fanoid Feb 28, 2023
fdaa732
Refactor params, BoostingStrategy, and Distributor.
Fanoid Mar 1, 2023
f236703
Remove GBTRunnerTest.
Fanoid Mar 1, 2023
b21a23b
Only call ListStateWithCache#update just before snapshot.
Fanoid Mar 1, 2023
768820a
Refine some TODOs.
Fanoid Mar 1, 2023
132cca8
Improve javadoc for GBTClassifier and GBTRegressor.
Fanoid Mar 1, 2023
546591e
Improve categorical feature splitter by ignoring less frequent catego…
Fanoid Mar 6, 2023
5562e80
Change PredGradHess to double[].
Fanoid Mar 7, 2023
01c4211
Improve Histogram to remove scattering.
Fanoid Mar 7, 2023
7f73aa1
Add eclipse collection jars to uber jar
Fanoid Mar 16, 2023
075262d
Support output feature importance
Fanoid Mar 20, 2023
28cdd28
Update setModelData to support featureImportanceTable.
Fanoid Mar 20, 2023
bc6465d
Remove duplicated model data
Fanoid Mar 20, 2023
d4f1dd5
Fix save/load for feature importance.
Fanoid Mar 21, 2023
d8c71fc
Fix get label when type is not double.
Fanoid Mar 24, 2023
adf3115
Reduce computation for nodes with max depth.
Fanoid Mar 28, 2023
a105e35
[NO MERGE] Ad-hoc fix for NaN values in KBinsDiscretizer
Fanoid Apr 3, 2023
1006812
Fix out-of-bound exception for nodeFeaturePairs
Fanoid Apr 3, 2023
ad646c4
Change to streaming processing from histogram building to global spli…
Fanoid Apr 4, 2023
513ed96
Simplify APIs in SharedStorageContext
Fanoid Mar 13, 2023
e709a35
Fix after merging master
Fanoid May 5, 2023
c109c76
Fix SharedStorageUtilsTest
Fanoid May 5, 2023
5313d51
Rename shared storage to shared objects and change according to comments
Fanoid May 5, 2023
e098994
Update codes according to comments.
Fanoid May 29, 2023
233b885
Refactor share objects infra to resolve challenges.
Fanoid May 15, 2023
4db52f3
Remove unused files.
Fanoid Aug 9, 2023
0109bb8
Fix imports.
Fanoid Aug 15, 2023
@@ -0,0 +1,31 @@
/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.flink.ml.common.sharedobjects;

import org.apache.flink.streaming.api.operators.OneInputStreamOperator;

import java.util.List;

/** The base class for {@link OneInputStreamOperator}s where shared objects are accessed. */
public abstract class AbstractSharedObjectsOneInputStreamOperator<IN, OUT>
        extends AbstractSharedObjectsStreamOperator<OUT>
        implements OneInputStreamOperator<IN, OUT> {

    public abstract List<ReadRequest<?>> readRequestsInProcessElement();
}
@@ -0,0 +1,56 @@
/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.flink.ml.common.sharedobjects;

import org.apache.flink.streaming.api.operators.AbstractStreamOperator;

import java.util.UUID;

/**
 * A base class of stream operators where shared objects are required.
 *
 * <p>Official subclasses, i.e., {@link AbstractSharedObjectsOneInputStreamOperator} and {@link
 * AbstractSharedObjectsTwoInputStreamOperator}, are strongly recommended.
 *
 * <p>If you are going to implement a subclass by yourself, you have to handle potential deadlocks.
 */
public abstract class AbstractSharedObjectsStreamOperator<OUT> extends AbstractStreamOperator<OUT> {

    /**
     * A unique identifier for the instance, which is kept unchanged between client side and
     * runtime.
     */
    private final String accessorID;

    /** The context for shared objects reads/writes. */
    protected transient SharedObjectsContext context;

    AbstractSharedObjectsStreamOperator() {
        super();
        accessorID = getClass().getSimpleName() + "-" + UUID.randomUUID();
    }

    void onSharedObjectsContextSet(SharedObjectsContext context) {
        this.context = context;
    }

    String getAccessorID() {
        return accessorID;
    }
}
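Reviewer note: the Javadoc on `accessorID` says the identifier is "kept unchanged between client side and runtime". The sketch below illustrates why that can hold, assuming the operator instance is shipped to the runtime through plain Java serialization. `AccessorIdSketch`, `DemoOperator`, and `roundTrip` are hypothetical names used only for this illustration; they are not part of the PR or of Flink. The same sketch shows why a `transient` field such as `context` has to be set again at runtime.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.UUID;

class AccessorIdSketch {

    /** Hypothetical stand-in for an operator; not a Flink class. */
    static class DemoOperator implements Serializable {
        private static final long serialVersionUID = 1L;

        // Assigned once in the constructor, so the value is serialized with the instance.
        private final String accessorID;

        // Transient: skipped by serialization, so it is null after deserialization.
        transient Object context;

        DemoOperator() {
            accessorID = getClass().getSimpleName() + "-" + UUID.randomUUID();
        }

        String getAccessorID() {
            return accessorID;
        }
    }

    /** Serializes and deserializes the operator, mimicking a client-to-runtime hand-off. */
    static DemoOperator roundTrip(DemoOperator op) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(op);
        }
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (DemoOperator) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        DemoOperator client = new DemoOperator();
        client.context = new Object();

        DemoOperator runtime = roundTrip(client);

        // The constructor is not re-run on deserialization, so the ID is preserved.
        System.out.println(client.getAccessorID().equals(runtime.getAccessorID())); // true
        // The transient field did not survive the round trip.
        System.out.println(runtime.context == null); // true
    }
}
```

Because the ID is generated in the constructor rather than in a runtime hook, every deserialized copy of one client-side instance reports the same ID, while two separately constructed instances get different ones.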
@@ -0,0 +1,33 @@
/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.flink.ml.common.sharedobjects;

import org.apache.flink.streaming.api.operators.TwoInputStreamOperator;

import java.util.List;

/** The base class for {@link TwoInputStreamOperator}s where shared objects are accessed. */
public abstract class AbstractSharedObjectsTwoInputStreamOperator<IN1, IN2, OUT>
        extends AbstractSharedObjectsStreamOperator<OUT>
        implements TwoInputStreamOperator<IN1, IN2, OUT> {

    public abstract List<ReadRequest<?>> readRequestsInProcessElement1();

    public abstract List<ReadRequest<?>> readRequestsInProcessElement2();
}