I ran the same dataset through both EconML and DoWhy, and I am getting different ATE estimates. The two estimates differ by about 10-20% on average, sometimes more. All variables in the dataset, including the treatment, are continuous. I kept the parameters and the random seed consistent across both frameworks. What could explain this divergence?
EconML code:
import numpy as np
import pandas as pd
import xgboost as xgb
from econml.dml import DML
from econml.sklearn_extensions.linear_model import StatsModelsRLM
# First-stage (nuisance) models for the outcome and the treatment
model_y = xgb.XGBRegressor(random_state=578, max_depth=3, n_estimators=100)
model_t = xgb.XGBRegressor(random_state=578, max_depth=3, n_estimators=100)
# Final-stage model: robust linear model
model_final = StatsModelsRLM(fit_intercept=True)
# Instantiate the EconML DML estimator (continuous treatment, 3-fold cross-fitting)
dml = DML(model_y=model_y, model_t=model_t, model_final=model_final,
          discrete_treatment=False, random_state=587921, cv=3)
# Fit the model (data1 is assumed to be a DataFrame with columns Y, X and Z)
dml.fit(Y=data1['Y'], T=data1['X'], X=data1['Z'])
# ATE of moving the treatment from T0 to T1
dml.ate(X=data1['Z'], T0=19.32, T1=19.13)
The order of samples drawn from the (potentially multiple) PRNGs could be slightly different between the two versions of the code. Even if the algorithm is conceptually identical, there would then be differences in output.
I suggest you perform the experiment 100 times with each library and plot the resulting estimate distributions. If the estimate distributions are not significantly different, then there is no bug and your estimate simply has high variance.
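A minimal sketch of that experiment, using only the Python standard library. The helper names (`collect_estimates`, `permutation_p_value`) and the two `fake_*_ate` functions are hypothetical: the stand-ins just draw noisy numbers so the sketch runs on its own, and you would replace each with a function that fits EconML or DoWhy using the given seed and returns the ATE. The permutation test is one possible way to compare the two distributions, not something either library prescribes.

```python
import random
from typing import Callable, List


def collect_estimates(estimate: Callable[[int], float], seeds: List[int]) -> List[float]:
    """Run one estimator once per seed and collect the resulting ATE estimates."""
    return [estimate(seed) for seed in seeds]


def _mean(values: List[float]) -> float:
    return sum(values) / len(values)


def permutation_p_value(a: List[float], b: List[float],
                        n_perm: int = 2000, seed: int = 0) -> float:
    """Two-sided permutation test on the difference in means of a and b."""
    rng = random.Random(seed)
    observed = abs(_mean(a) - _mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        # Reassign the pooled estimates to the two groups at random
        rng.shuffle(pooled)
        diff = abs(_mean(pooled[:len(a)]) - _mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive
    return (hits + 1) / (n_perm + 1)


# Hypothetical stand-ins: replace with real EconML / DoWhy fits seeded by `seed`
def fake_econml_ate(seed: int) -> float:
    return random.Random(seed).gauss(-0.50, 0.08)


def fake_dowhy_ate(seed: int) -> float:
    return random.Random(seed + 10_000).gauss(-0.50, 0.08)


seeds = list(range(100))
econml_ates = collect_estimates(fake_econml_ate, seeds)
dowhy_ates = collect_estimates(fake_dowhy_ate, seeds)
p_value = permutation_p_value(econml_ates, dowhy_ates)

print(f"EconML mean ATE: {_mean(econml_ates):.4f}")
print(f"DoWhy  mean ATE: {_mean(dowhy_ates):.4f}")
print(f"permutation p-value: {p_value:.3f}")
```

A large p-value here would be consistent with the two libraries sampling from the same estimate distribution, in which case a 10-20% gap between single runs reflects variance rather than a bug. Plotting histograms of the two lists gives the visual comparison suggested above.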