feat: Synthetic difference in differences #2095

memoryz · 2023-10-12T09:08:20Z

What changes are proposed in this pull request?

This PR contains a Spark implementation of three causal estimation methods: difference in differences, synthetic control, and synthetic difference in differences.
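
For orientation, here is a minimal sketch of the plain two-group, two-period difference in differences arithmetic, the simplest of the three methods. It is not the Spark code from this PR: the function name `did_estimate`, the tuple layout, and the toy panel are all assumptions made up for illustration.

```python
from statistics import mean


def did_estimate(rows):
    """Plain 2x2 difference in differences.

    rows: list of (treated, post, outcome) tuples (hypothetical layout).
    """
    def cell_mean(treated, post):
        # Average outcome within one of the four group/period cells.
        return mean(y for (t, p, y) in rows if t == treated and p == post)

    treated_change = cell_mean(True, True) - cell_mean(True, False)
    control_change = cell_mean(False, True) - cell_mean(False, False)
    # The effect is the treated group's change net of the control group's change.
    return treated_change - control_change


# Toy panel, invented for this example.
panel = [
    (True, False, 10.0), (True, False, 12.0),
    (True, True, 18.0), (True, True, 20.0),
    (False, False, 8.0), (False, False, 10.0),
    (False, True, 11.0), (False, True, 13.0),
]
effect = did_estimate(panel)
```

Here the treated group moves from a mean of 11 to 19 and the control group from 9 to 12, so the estimate is 8 - 3 = 5. Synthetic control and synthetic difference in differences replace the uniform averages with learned unit and time weights, which this sketch does not attempt.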

Additional contributors to this PR:
@sarahshy @andrewnaber-msft

How is this patch tested?

I have written tests (not required for typo or doc fix) and confirmed the proposed feature/bug-fix/change works.

Does this PR change any dependencies?

No. You can skip this section.
Yes. Make sure the dependencies are resolved correctly, and list changes here.

Does this PR add a new feature? If so, have you added samples on website?

No. You can skip this section.
Yes. Make sure you have added samples following below steps.

Find the corresponding markdown file for your new feature in website/docs/documentation folder.
Make sure you choose the correct class estimators/transformers and namespace.
Follow the pattern in markdown file and add another section for your new API, including pyspark, scala (and .NET potentially) samples.
Make sure the DocTable points to correct API link.
Navigate to website folder, and run yarn run start to make sure the website renders correctly.
Don't forget to add  before each Python code block to enable auto-tests for Python samples.
Make sure the WebsiteSamplesTests job passes in the pipeline.

Signed-off-by: Jason Wang <[email protected]>

github-actions · 2023-10-12T09:08:34Z

Hey @memoryz 👋!
Thank you so much for contributing to our repository 🙌.
Someone from SynapseML Team will be reviewing this pull request soon.

We use semantic commit messages to streamline the release process.
Before your pull request can be merged, you should make sure your first commit and PR title start with a semantic prefix.
This helps us to create release messages and credit you for your hard work!

Examples of commit messages with semantic prefixes:

fix: Fix LightGBM crashes with empty partitions
feat: Make HTTP on Spark back-offs configurable
docs: Update Spark Serving usage
build: Add codecov support
perf: improve LightGBM memory usage
refactor: make python code generation rely on classes
style: Remove nulls from CNTKModel
test: Add test coverage for CNTKModel

To test your commit locally, please follow our guide on building from source.
Check out the developer guide for additional guidance on testing your change.

core/src/main/scala/com/microsoft/azure/synapse/ml/codegen/Wrappable.scala

memoryz · 2023-10-26T20:30:22Z

/azp run

azure-pipelines · 2023-10-26T20:30:32Z

Azure Pipelines successfully started running 1 pipeline(s).

memoryz · 2023-10-26T23:29:45Z

/azp run

azure-pipelines · 2023-10-26T23:29:55Z

Azure Pipelines successfully started running 1 pipeline(s).

memoryz · 2023-10-26T23:34:46Z

/azp run

mhamilton723 · 2023-12-11T18:07:55Z

core/src/main/scala/com/microsoft/azure/synapse/ml/causal/BaseDiffInDiffEstimator.scala

+  }
+
+  override def copy(extra: ParamMap): DiffInDiffModel = {
+    copyValues(new DiffInDiffModel(uid))


You don't seem to handle extra; you can use defaultCopy, I think.

Good catch! I'm now passing extra to copyValues. The effect is the same as defaultCopy.

mhamilton723 · 2023-12-11T18:10:23Z

core/src/main/scala/com/microsoft/azure/synapse/ml/causal/SharedParams.scala

+
+trait HasUnitCol extends Params {
+  final val unitCol = new Param[String](this, "unitCol",
+    "Column that identifies the units in panel data")


Could you expand on this doc?

mhamilton723 · 2023-12-11T18:12:20Z

core/src/main/scala/com/microsoft/azure/synapse/ml/causal/SyntheticDiffInDiffEstimator.scala

+      lossHistoryUnitWeights = Some(lossHistoryUnitWeights.toList)
+    )
+
+    copyValues(new DiffInDiffModel(this.uid))


Why do you wrap in a copy?

To copy the shared params. DiffInDiffModel and SyntheticDiffInDiffEstimator share two params: unitCol and timeCol. Wrapping in copyValues makes sure the values of these two params are copied over to the DiffInDiffModel instance.
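
The copyValues behaviour discussed in this review (carrying shared params from an estimator to a model, and honouring an extra map of overrides on copy) can be sketched with a toy model. This is a plain Python stand-in, not Spark's Params API: `ToyParams`, `ToyEstimator`, `ToyModel`, `copy_values`, and the `maxIter` param are hypothetical names used only for illustration.

```python
class ToyParams:
    """Toy stand-in for a Spark ML Params holder (not the real API)."""

    declared = ()  # names of the params this class declares

    def __init__(self, uid):
        self.uid = uid
        self.values = {}

    def set(self, name, value):
        if name not in self.declared:
            raise KeyError(name)
        self.values[name] = value
        return self

    def copy_values(self, to, extra=None):
        # Merge our own values with the overrides in `extra`,
        # then copy only the params that the target declares.
        merged = {**self.values, **(extra or {})}
        for name, value in merged.items():
            if name in to.declared:
                to.values[name] = value
        return to


class ToyEstimator(ToyParams):
    declared = ("unitCol", "timeCol", "maxIter")  # maxIter is hypothetical


class ToyModel(ToyParams):
    declared = ("unitCol", "timeCol")

    def copy(self, extra=None):
        # Passing `extra` through is what keeps overrides from being dropped.
        return self.copy_values(ToyModel(self.uid), extra)


estimator = (ToyEstimator("sdid_1")
             .set("unitCol", "state")
             .set("timeCol", "year")
             .set("maxIter", 100))
# Estimator -> model: only the shared params are carried over.
model = estimator.copy_values(ToyModel(estimator.uid))
# Model -> model with an override supplied through `extra`.
overridden = model.copy({"timeCol": "quarter"})
```

In this toy, `model` ends up with only `unitCol` and `timeCol`, and `overridden` differs from `model` only in `timeCol`. A `copy` that ignored `extra` would silently return the old `timeCol`, which is the gap the reviewer pointed out.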

mhamilton723 · 2023-12-11T18:15:40Z

...rc/test/scala/com/microsoft/azure/synapse/ml/causal/VerifySyntheticDiffInDiffEstimator.scala

+  // Spark mode and breeze mode should get same loss history and same solution
+  // .setLocalSolverThreshold(1)
+
+  test("SyntheticDiffInDiffEstimator can estimate the treatment effect") {


Do we test both the Spark and Breeze fitting branches?

I implemented unit tests for the linalg routines in both Breeze mode and Spark mode. Since the MirrorDescent code is abstracted on top of the linalg routines, I hadn't implemented unit tests for the Spark fitting branch specifically, but I've just added one in the latest commit.
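
The idea of one solver written against an abstract linalg layer, checked on two backends, can be sketched as follows. This is an assumption-laden illustration, not the PR's code: it uses the exponentiated-gradient form of mirror descent on the probability simplex for a least-squares loss, and `DenseOps`, `EntryOps`, and `mirror_descent` are hypothetical names standing in for the local and distributed branches.

```python
import math


class DenseOps:
    """Matrix stored as a list of rows (stand-in for a local backend)."""

    def __init__(self, rows):
        self.rows = rows

    def matvec(self, w):
        return [sum(a * x for a, x in zip(row, w)) for row in self.rows]

    def rmatvec(self, r):
        n_cols = len(self.rows[0])
        return [sum(row[j] * ri for row, ri in zip(self.rows, r))
                for j in range(n_cols)]


class EntryOps:
    """Same matrix as (i, j, value) entries (stand-in for a distributed backend)."""

    def __init__(self, rows):
        self.n_rows, self.n_cols = len(rows), len(rows[0])
        self.entries = [(i, j, v)
                        for i, row in enumerate(rows)
                        for j, v in enumerate(row)]

    def matvec(self, w):
        out = [0.0] * self.n_rows
        for i, j, v in self.entries:
            out[i] += v * w[j]
        return out

    def rmatvec(self, r):
        out = [0.0] * self.n_cols
        for i, j, v in self.entries:
            out[j] += v * r[i]
        return out


def mirror_descent(ops, b, n_weights, step=0.1, iterations=200):
    """Minimise ||A w - b||^2 over the simplex using only ops.matvec/rmatvec."""
    w = [1.0 / n_weights] * n_weights
    history = []
    for _ in range(iterations):
        residual = [p - t for p, t in zip(ops.matvec(w), b)]
        history.append(sum(r * r for r in residual))
        grad = [2.0 * g for g in ops.rmatvec(residual)]
        # Multiplicative update followed by renormalisation keeps w on the simplex.
        scaled = [wi * math.exp(-step * gi) for wi, gi in zip(w, grad)]
        total = sum(scaled)
        w = [s / total for s in scaled]
    return w, history


# Toy problem, invented for this example.
A = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0], [1.0, 1.0, 0.0]]
b = [0.8, 0.8, 0.7]
w_dense, loss_dense = mirror_descent(DenseOps(A), b, 3)
w_entry, loss_entry = mirror_descent(EntryOps(A), b, 3)
```

Because the solver only touches the matrix through `matvec` and `rmatvec`, a test can run it on both backends and compare the loss histories and the final weights, in the spirit of the "same loss history and same solution" comment in the test file.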

mhamilton723 · 2023-12-11T18:20:27Z

docs/Explore Algorithms/Causal Inference/Quickstart - Synthetic difference in differences.ipynb

+      "name": "python"
+    },
+    "save_output": true,
+    "synapse_widget": {


Can you remove the output so it doesn't blow up the size of the notebook?
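
One way to do this, sketched under the assumption that the notebook follows the standard nbformat 4 JSON layout, is to clear the `outputs` and `execution_count` fields of every code cell. The helper name `strip_outputs` and the sample notebook are made up for illustration. Widget state stored in notebook metadata, such as the `synapse_widget` block quoted above, is not touched by this sketch and would need separate handling.

```python
import json


def strip_outputs(notebook):
    """Clear code-cell outputs and execution counts in a notebook dict."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return notebook


# Tiny in-memory notebook, invented for this example.
raw = json.dumps({
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {"cell_type": "code", "metadata": {}, "source": ["1 + 1"],
         "execution_count": 3,
         "outputs": [{"output_type": "execute_result",
                      "data": {"text/plain": ["2"]},
                      "metadata": {}, "execution_count": 3}]},
    ],
    "metadata": {}, "nbformat": 4, "nbformat_minor": 5,
})
cleaned = strip_outputs(json.loads(raw))
```

For a real file, the same function would be applied to the result of `json.load` on the `.ipynb` and the result written back with `json.dump`.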

mhamilton723

A few minor comments, thank you so much for this lovely contribution :)

memoryz · 2023-12-18T21:22:22Z

/azp run

memoryz · 2023-12-18T21:23:03Z

@mhamilton723 I've addressed all the comments so far. Ready for review.

mhamilton723 · 2023-12-22T17:54:21Z

/azp run

mhamilton723 · 2024-01-03T18:15:09Z

/azp run

mhamilton723 · 2024-01-03T21:00:56Z

/azp run

mhamilton723 · 2024-01-03T21:06:56Z

/azp run

azure-pipelines · 2024-01-03T21:07:08Z

Azure Pipelines successfully started running 1 pipeline(s).

memoryz · 2024-01-04T02:23:46Z

/azp run

azure-pipelines · 2024-01-04T02:23:57Z

Azure Pipelines successfully started running 1 pipeline(s).

memoryz · 2024-01-08T19:58:12Z

@mhamilton723 can you merge this PR? The failing unit tests have nothing to do with this change.

mhamilton723 · 2024-01-12T05:25:08Z

/azp run

azure-pipelines · 2024-01-12T05:25:19Z

Azure Pipelines successfully started running 1 pipeline(s).

memoryz added 18 commits October 3, 2023 21:27

Estimators for diff-in-diff, synthetic control and synthetic diff-in-diff

80762ec

add more params

2dbd448

refactor

1de7aa1

adding unit tests for linalg

98a6659

more unit tests

29deccc

Unit test for DiffInDiffEstimator

b7293c2

more unit tests

0eb6b62

unit test for SyntheticControlEstimator

5e93636

unit test for SyntheticDiffInDiffEstimator

2c7a6a0

logClass

697b578

Python code gen

0aa9fc2

pyspark wrapper

5660ace

expose loss history

b2d9477

fix bugs for synthetic control

ca354cc

Merge branch 'microsoft:master' into master

37b138d

fix time effects for synthetic control estimator

b02aa6f

fix unit test

e55a8ac

add notebook

b76641b

memoryz commented Oct 12, 2023

core/src/main/scala/com/microsoft/azure/synapse/ml/codegen/Wrappable.scala (resolved)

fixing indexing logic

1a2e320

add file headers

03b8bbc

memoryz force-pushed the master branch from 30cc7df to 03b8bbc Compare October 26, 2023 22:57

Merge branch 'master' into master

3bee3d7

Add feature name to logClass call

5787e9e

mhamilton723 reviewed Dec 11, 2023

docs/Explore Algorithms/Causal Inference/Quickstart - Synthetic difference in differences.ipynb (outdated, resolved)

mhamilton723 reviewed Dec 11, 2023

mhamilton723 requested changes Dec 11, 2023

memoryz and others added 4 commits December 18, 2023 10:15

Merge branch 'master' into master

af8c88e

address code review comments

666d1ed

Update docs/Explore Algorithms/Causal Inference/Quickstart - Synthetic difference in differences.ipynb

5cc9027

Co-authored-by: Mark Hamilton <[email protected]>

clean synapse widget output state

f9bb83c

memoryz enabled auto-merge (squash) December 18, 2023 21:22

mhamilton723 previously approved these changes Dec 22, 2023

remove invalid image links

f1bf701

memoryz dismissed mhamilton723’s stale review via f1bf701 January 4, 2024 02:23

Merge branch 'master' into master

cb8fd47

mhamilton723 disabled auto-merge January 12, 2024 05:25

mhamilton723 merged commit cbc022c into microsoft:master Jan 12, 2024
6 of 7 checks passed
