Different ATE estimates on DoubleML from EconML vs. DoWhy #1278

Open
ankur-tutlani opened this issue Nov 8, 2024 · 2 comments

@ankur-tutlani

I tried using the same dataset with both the EconML and DoWhy APIs, and I am getting different ATE estimates: the difference is about 10-20% on average, sometimes more. All variables in the dataset are continuous, including the treatment. I kept the parameters consistent across both frameworks, along with the random seed. What could explain this divergence?

EconML code:

import numpy as np
import pandas as pd
import xgboost as xgb
from econml.dml import DML
from econml.sklearn_extensions.linear_model import StatsModelsRLM

# Define the nuisance models and the final model
model_y = xgb.XGBRegressor(random_state=578, max_depth=3, n_estimators=100)
model_t = xgb.XGBRegressor(random_state=578, max_depth=3, n_estimators=100)
model_final = StatsModelsRLM(fit_intercept=True)

# Instantiate the DML estimator
dml = DML(model_y=model_y, model_t=model_t, model_final=model_final,
          discrete_treatment=False, random_state=587921, cv=3)

# Fit the model and compute the ATE
dml.fit(Y=data1['Y'], T=data1['X'], X=data1['Z'])
dml.ate(X=data1['Z'], T0=19.32, T1=19.13)

DoWhy code:

from dowhy import CausalModel
import xgboost as xgb
from econml.sklearn_extensions.linear_model import StatsModelsRLM

# Build the causal model
model = CausalModel(
    data=data1,
    treatment='X',
    outcome='Y',
    common_causes='Z',
)

identified_estimand = model.identify_effect(proceed_when_unidentifiable=True)

# Estimate the effect through DoWhy's EconML wrapper
causal_estimate = model.estimate_effect(
    identified_estimand,
    method_name="backdoor.econml.dml.DML",
    confidence_intervals=False,
    control_value=19.32,
    treatment_value=19.13,
    method_params={
        "init_params": {
            'model_y': xgb.XGBRegressor(random_state=578, max_depth=3, n_estimators=100),
            'model_t': xgb.XGBRegressor(random_state=578, max_depth=3, n_estimators=100),
            'model_final': StatsModelsRLM(fit_intercept=True),
            'discrete_treatment': False,
            'random_state': 587921,
            'cv': 3,
        },
        "fit_params": {},
        'num_null_simulations': 399,
        'num_simulations': 399,
    },
)

print(causal_estimate.value)

Version information:

  • DoWhy version: 0.11.1
  • EconML version: 0.15.1
@drawlinson
Contributor

The order of samples drawn from the (potentially multiple) PRNGs could differ between the two code paths. Even if the algorithm is conceptually identical, that alone would produce different outputs, so a shared seed does not guarantee matching estimates.
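A minimal sketch of why a shared seed is not enough (the two "pipelines" and their draw order here are hypothetical, not taken from either library):

import numpy as np

# Two generators with the same seed...
rng_a = np.random.default_rng(587921)
rng_b = np.random.default_rng(587921)

# Pipeline A draws cross-fitting fold noise first, then model noise.
fold_noise_a = rng_a.random(5)
model_noise_a = rng_a.random(5)

# Pipeline B consumes the same stream in the opposite order.
model_noise_b = rng_b.random(5)
fold_noise_b = rng_b.random(5)

# Same seed, same number of draws, yet the model noise differs,
# so everything downstream of it differs too.
print(np.allclose(model_noise_a, model_noise_b))  # False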

I suggest you perform the experiment 100 times with each library and plot the resulting estimate distributions. If the estimate distributions are not significantly different, then there is no bug and your estimate simply has high variance.
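A rough sketch of that repeated experiment for the EconML side (assumes the data1 frame from the issue is in scope; reseeding every component per iteration is one reasonable choice, not something either library prescribes):

import numpy as np
import xgboost as xgb
from econml.dml import DML
from econml.sklearn_extensions.linear_model import StatsModelsRLM

estimates = []
for seed in range(100):
    # Reseed each component so every run is an independent draw.
    dml = DML(
        model_y=xgb.XGBRegressor(random_state=seed, max_depth=3, n_estimators=100),
        model_t=xgb.XGBRegressor(random_state=seed, max_depth=3, n_estimators=100),
        model_final=StatsModelsRLM(fit_intercept=True),
        discrete_treatment=False,
        random_state=seed,
        cv=3,
    )
    dml.fit(Y=data1['Y'], T=data1['X'], X=data1['Z'])
    estimates.append(dml.ate(X=data1['Z'], T0=19.32, T1=19.13))

print(np.mean(estimates), np.std(estimates))
# Run the analogous loop through the DoWhy wrapper and compare the
# two estimate distributions, e.g. with overlaid histograms.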

@github-actions (bot)

This issue is stale because it has been open for 30 days with no activity.
