Enforce counter value to double type in rowwise_counter
Summary:
Enforce counter value to double type in rowwise_counter.

**Context:**
The existing implementation uses the float type for the counter value. A single-precision float has a 24-bit significand [1], so adding 1.0 to 16777216.0 (2^24) rounds back to 16777216.0. In our earlier experiments the counter therefore stopped incrementing at 16777216.0, which became its effective maximum. This change enforces the double type to avoid the issue.

[1] https://stackoverflow.com/questions/12596695/why-does-a-float-variable-stop-incrementing-at-16777216-in-c
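
The stall described above can be reproduced without Caffe2. The following is a minimal sketch, not the operator's code: `to_float32` and `increment_float32` are hypothetical helpers that emulate a float32 accumulator by rounding through `struct`, on the assumption that the counter is incremented by 1.0 per update.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (an IEEE 754 double) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def increment_float32(counter: float) -> float:
    """Add 1.0 as a float32 accumulator would: compute the sum, then round to float32."""
    return to_float32(counter + 1.0)


LIMIT = 16777216.0  # 2**24, the value at which the float counter stalled

# 16777215 + 1 = 16777216 is exactly representable in float32.
below = increment_float32(LIMIT - 1.0)

# 16777217 is not representable in float32; it rounds back to 16777216.
stuck = increment_float32(LIMIT)

# A double has a 53-bit significand, so the same increment is exact.
as_double = LIMIT + 1.0

print(below, stuck, as_double)
```

With a double counter the same stall would only appear at 2^53, which is far beyond the iteration counts observed here.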

Test Plan:
op test
```
ruixliu@devvm1997:~/fbsource/fbcode/caffe2/caffe2/python/operator_test(f0b0b48c)$ buck test :rowwise_counter_test
Trace available for this run at /tmp/testpilot.20200728-083200.729292.log
TestPilot test runner for Facebook. See https://fburl.com/testpilot for details.
Testpilot build revision cd2638f1f47250eac058b8c36561760027d16add fbpkg f88726c8ebde4ba288e1172a348c7f46 at Mon Jul 27 18:11:43 2020 by twsvcscm from /usr/local/fbprojects/packages/testinfra.testpilot/887/t.par
Discovering tests
Running 1 test
Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/7881299364977047
      ✓ caffe2/caffe2/python/operator_test:rowwise_counter_test - test_rowwise_counter (caffe2.caffe2.python.operator_test.rowwise_counter_test.TestRowWiseCounter) 0.265 1/1 (passed)
      ✓ caffe2/caffe2/python/operator_test:rowwise_counter_test - main 14.414 (passed)
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/7881299364977047
Summary (total time 18.51s):
  PASS: 2
  FAIL: 0
  SKIP: 0
  FATAL: 0
  TIMEOUT: 0
  OMIT: 0
```

optimizer test
```
ruixliu@devvm1997:~/fbsource/fbcode/caffe2/caffe2/python(7d66fbb9)$ buck test :optimizer_test
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/7036874434841896
Summary (total time 64.87s):
  PASS: 48
  FAIL: 0
  SKIP: 24
    caffe2/caffe2/python:optimizer_test - testGPUDense (caffe2.caffe2.python.optimizer_test.TestMomentumSgd)
    caffe2/caffe2/python:optimizer_test - testGPUDense (caffe2.caffe2.python.optimizer_test.TestGFtrl)
    caffe2/caffe2/python:optimizer_test - test_caffe2_cpu_vs_numpy (caffe2.caffe2.python.optimizer_test.TestYellowFin)
    caffe2/caffe2/python:optimizer_test - testGPUDense (caffe2.caffe2.python.optimizer_test.TestSparseRAdam)
    caffe2/caffe2/python:optimizer_test - testGPUDense (caffe2.caffe2.python.optimizer_test.TestRowWiseAdagradWithCounter)
    caffe2/caffe2/python:optimizer_test - testGPUDense (caffe2.caffe2.python.optimizer_test.TestAdagrad)
    caffe2/caffe2/python:optimizer_test - test_caffe2_gpu_vs_numpy (caffe2.caffe2.python.optimizer_test.TestYellowFin)
    caffe2/caffe2/python:optimizer_test - testDense (caffe2.caffe2.python.optimizer_test.TestRowWiseAdagrad)
    caffe2/caffe2/python:optimizer_test - testGPUDense (caffe2.caffe2.python.optimizer_test.TestFtrl)
    caffe2/caffe2/python:optimizer_test - testSparse (caffe2.caffe2.python.optimizer_test.TestRmsProp)
    ...and 14 more not shown...
  FATAL: 0
  TIMEOUT: 0
  OMIT: 0
```

param download test
```
ruixliu@devvm1997:~/fbsource/fbcode/caffe2/caffe2/fb/net_transforms/tests(7ef20a38)$ sudo buck test :param_download_test
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/6473924481526935
```

e2e flow:
f208394929
f207991149
f207967273

ANP notebook to check the counter value loaded from the flows
https://fburl.com/anp/5fdcbnoi

screenshot of the loaded counter (note that counter max is larger than 16777216.0)

{F250926501}

Reviewed By: ellie-wen

Differential Revision: D22711514

fbshipit-source-id: 426fed7415270aa3f276dda8141907534734337f
Rui Liu authored and facebook-github-bot committed Aug 6, 2020
1 parent c14fbc3 commit 92b7347
Showing 5 changed files with 10 additions and 5 deletions.
6 changes: 5 additions & 1 deletion caffe2/operators/utility_ops.h
@@ -643,6 +643,8 @@ class ScatterAssignOp : public Operator<Context> {
&ScatterAssignOp::DoRun<int32_t, int32_t>},
{{TensorProto_DataType_INT32, TensorProto_DataType_INT64},
&ScatterAssignOp::DoRun<int32_t, int64_t>},
+ {{TensorProto_DataType_INT32, TensorProto_DataType_DOUBLE},
+ &ScatterAssignOp::DoRun<int32_t, double>},
{{TensorProto_DataType_INT64, TensorProto_DataType_FLOAT},
&ScatterAssignOp::DoRun<int64_t, float>},
{{TensorProto_DataType_INT64, TensorProto_DataType_FLOAT16},
@@ -652,7 +654,9 @@ class ScatterAssignOp : public Operator<Context> {
{{TensorProto_DataType_INT64, TensorProto_DataType_INT32},
&ScatterAssignOp::DoRun<int64_t, int32_t>},
{{TensorProto_DataType_INT64, TensorProto_DataType_INT64},
- &ScatterAssignOp::DoRun<int64_t, int64_t>}}) {}
+ &ScatterAssignOp::DoRun<int64_t, int64_t>},
+ {{TensorProto_DataType_INT64, TensorProto_DataType_DOUBLE},
+ &ScatterAssignOp::DoRun<int64_t, double>}}) {}

bool RunOnDevice() override {
const auto& data = Input(DATA);
2 changes: 1 addition & 1 deletion caffe2/python/operator_test/rowwise_counter_test.py
@@ -27,7 +27,7 @@ def test_rowwise_counter(self):
n = 5
curr_iter = np.array([100], dtype=np.int64)

- update_counter = np.random.randint(99, size=h).astype(np.float32)
+ update_counter = np.random.randint(99, size=h).astype(np.float64)
prev_iter = np.random.rand(h, 1).astype(np.int64)
indices = np.unique(np.random.randint(0, h, size=n))
indices.sort(axis=0)
2 changes: 2 additions & 0 deletions caffe2/python/optimizer.py
@@ -750,6 +750,7 @@ def _run(self, net, param_init_net, param_info):
str(param) + "_update_counter",
input_as_shape=1,
value=0.0,
+ dtype=core.DataType.DOUBLE,
)
prev_update_iter = param_init_net.ConstantFill(
num_rows,
@@ -764,6 +765,7 @@ def _run(self, net, param_init_net, param_info):
str(param) + "_update_counter",
shape=[shapes[str(param)][0]],
value=0.0,
+ dtype=core.DataType.DOUBLE,
)
prev_update_iter = param_init_net.ConstantFill(
[],
2 changes: 1 addition & 1 deletion caffe2/sgd/rowwise_counter.cc
@@ -2,7 +2,7 @@

namespace caffe2 {

- REGISTER_CPU_OPERATOR(RowWiseCounter, RowWiseCounterOp<float>);
+ REGISTER_CPU_OPERATOR(RowWiseCounter, RowWiseCounterOp);
OPERATOR_SCHEMA(RowWiseCounter)
.NumInputs(4)
.NumOutputs(2)
3 changes: 1 addition & 2 deletions caffe2/sgd/rowwise_counter.h
@@ -4,7 +4,6 @@

namespace caffe2 {

- template <typename T>
class RowWiseCounterOp final : public Operator<CPUContext> {
public:
RowWiseCounterOp(const OperatorDef& operator_def, Workspace* ws)
@@ -28,7 +27,7 @@ class RowWiseCounterOp final : public Operator<CPUContext> {
bool DoRunWithType() {
auto* prev_iter =
Output(OUTPUT_PREV_ITER)->template mutable_data<int64_t>();
- auto* counter = Output(OUTPUT_COUNTER)->template mutable_data<T>();
+ auto* counter = Output(OUTPUT_COUNTER)->template mutable_data<double>();

const int64_t curr_iter = Input(ITER).template data<int64_t>()[0];
const auto* indices = Input(INDICES).template data<SIndex>();