Commit 8b94442

authored
Create test_addition.irnb
1 parent 62e77db commit 8b94442

1 file changed

+214
-0
lines changed

PM2/test_addition.irnb

@@ -0,0 +1,214 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "7HCJkA2ifjEk"
   },
   "source": [
    "# Simulation on Orthogonal Estimation\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "4sldk16nfXw9"
   },
   "source": [
    "We compare the naive and orthogonal estimators in a simulation experiment with\n",
    "$p=n=100$, $\\beta_j = 1/j^2$, $(\\gamma_{DW})_j = 1/j^2$, and $$Y = 1 \\cdot D + \\beta' W + \\epsilon_Y,$$\n",
    "\n",
    "where $W \\sim N(0,I)$, $\\epsilon_Y \\sim N(0,1)$, and $$D = \\gamma'_{DW} W + \\tilde{D},$$ where $\\tilde{D} \\sim N(0,1)/4$, that is, $\\tilde{D}$ is a standard normal draw divided by 4. In the code below, $W$ is the matrix `X`.\n",
    "\n",
    "The true treatment effect is 1. The plots produced in this notebook show the estimation error (estimate minus ground truth): the naive single-selection estimator is heavily biased because its estimation strategy lacks Neyman orthogonality, while the orthogonal estimator based on partialling out is approximately unbiased and Gaussian."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "dSvVz5Z6D14H",
    "vscode": {
     "languageId": "r"
    }
   },
   "outputs": [],
   "source": [
    "install.packages(\"hdm\")\n",
    "install.packages(\"ggplot2\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "vscode": {
     "languageId": "r"
    }
   },
   "outputs": [],
   "source": [
    "library(hdm)\n",
    "library(ggplot2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "_execution_state": "idle",
    "_uuid": "051d70d956493feee0c6d64651c6a088724dca2a",
    "id": "fAe2EP5VCFN_",
    "vscode": {
     "languageId": "r"
    }
   },
   "outputs": [],
   "source": [
    "# Initialize constants\n",
    "B <- 10000 # Number of iterations\n",
    "n <- 100 # Sample size\n",
    "p <- 100 # Number of features\n",
    "\n",
    "# Initialize arrays to store results\n",
    "Naive <- rep(0, B)\n",
    "Orthogonal <- rep(0, B)\n",
    "\n",
    "# Placeholders for lasso penalty levels (not filled in by the loop below)\n",
    "lambdaYs <- rep(0, B)\n",
    "lambdaDs <- rep(0, B)\n",
    "\n",
    "for (i in 1:B) {\n",
    "  # Generate parameters (the same in every replication)\n",
    "  beta <- 1 / (1:p)^2\n",
    "  gamma <- 1 / (1:p)^2\n",
    "\n",
    "  # Generate covariates / random data\n",
    "  X <- matrix(rnorm(n * p), n, p)\n",
    "  D <- X %*% gamma + rnorm(n) / 4\n",
    "\n",
    "  # Generate Y using DGP\n",
    "  Y <- D + X %*% beta + rnorm(n)\n",
    "\n",
    "  # Single selection method\n",
    "  rlasso_result <- hdm::rlasso(Y ~ D + X) # Fit lasso regression\n",
    "  sx_ids <- which(rlasso_result$coef[-c(1, 2)] != 0) # Selected covariates (intercept and D dropped)\n",
    "\n",
    "  # Check if any Xs are selected\n",
    "  if (length(sx_ids) == 0) {\n",
    "    Naive[i] <- lm(Y ~ D)$coef[2] # Fit linear regression with only D if no Xs are selected\n",
    "  } else {\n",
    "    Naive[i] <- lm(Y ~ D + X[, sx_ids])$coef[2] # Fit linear regression with selected X otherwise\n",
    "  }\n",
    "\n",
    "  # Partialling out / Double Lasso\n",
    "\n",
    "  fitY <- hdm::rlasso(Y ~ X, post = TRUE)\n",
    "  resY <- fitY$res\n",
    "\n",
    "  fitD <- hdm::rlasso(D ~ X, post = TRUE)\n",
    "  resD <- fitD$res\n",
    "\n",
    "  Orthogonal[i] <- lm(resY ~ resD)$coef[2] # Regress Y residuals on D residuals\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Bj174QuEaPb5"
   },
   "source": [
    "## Make a Nice Plot"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "MjB3qbGEaRnl",
    "vscode": {
     "languageId": "r"
    }
   },
   "outputs": [],
   "source": [
    "# Plot size with a 2:1 width-to-height ratio\n",
    "img_width <- 15\n",
    "img_height <- img_width / 2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "N7bdztt1CFOE",
    "vscode": {
     "languageId": "r"
    }
   },
   "outputs": [],
   "source": [
    "# Create a data frame of estimation errors (estimate minus the true effect of 1)\n",
    "df <- data.frame(Method = rep(c(\"Naive\", \"Orthogonal\"), each = B),\n",
    "                 Value = c(Naive - 1, Orthogonal - 1))\n",
    "\n",
    "# Create the histogram using ggplot2\n",
    "hist_plot <- ggplot(df, aes(x = Value, fill = Method)) +\n",
    "  geom_histogram(binwidth = 0.1, color = \"black\", alpha = 0.7) +\n",
    "  facet_wrap(~Method, scales = \"fixed\") +\n",
    "  labs(\n",
    "    title = \"Distribution of Estimates (Centered around Ground Truth)\",\n",
    "    x = \"Bias\",\n",
    "    y = \"Frequency\"\n",
    "  ) +\n",
    "  scale_x_continuous(breaks = seq(-2, 1.5, 0.5)) +\n",
    "  theme_minimal() +\n",
    "  theme(\n",
    "    plot.title = element_text(hjust = 0.5), # Center the plot title\n",
    "    strip.text = element_text(size = 10), # Set the text size of the facet labels\n",
    "    legend.position = \"none\", # Remove the legend\n",
    "    panel.grid.major = element_blank(), # Make major grid lines invisible\n",
    "    # panel.grid.minor = element_blank(), # Make minor grid lines invisible\n",
    "    strip.background = element_blank() # Make the strip background transparent\n",
    "  ) +\n",
    "  theme(panel.spacing = unit(2, \"lines\")) # Widen the spacing between the two facets\n",
    "\n",
    "# Set a wider plot size\n",
    "options(repr.plot.width = img_width, repr.plot.height = img_height)\n",
    "\n",
    "# Display the histogram\n",
    "print(hist_plot)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "8hrJ3M5mrD8_"
   },
   "source": [
    "As the bias plots above show (estimates minus the true effect of 1), the double lasso estimates concentrate around zero, whereas the naive estimates do not."
   ]
  }
 ],
 "metadata": {
  "colab": {
   "provenance": []
  },
  "kernelspec": {
   "display_name": "R",
   "language": "R",
   "name": "ir"
  },
  "language_info": {
   "codemirror_mode": "r",
   "file_extension": ".r",
   "mimetype": "text/x-r-source",
   "name": "R",
   "pygments_lexer": "r",
   "version": "3.6.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
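
The partialling-out step in the simulation cell (residualize `Y` and `D` on `X`, then regress residual on residual) may be easier to follow in a low-dimensional setting where plain OLS can stand in for `rlasso`. The sketch below is illustrative only and not part of the committed notebook: the seed, `n = 200`, and `p = 5` are assumptions chosen so that OLS is well defined. It checks numerically that the residual-on-residual slope equals the coefficient on `D` in the full regression (the Frisch-Waugh-Lovell result).

```r
# Illustrative sketch only: OLS replaces the notebook's rlasso fits.
set.seed(1)                 # assumed seed, for reproducibility
n <- 200                    # sample size (assumption: larger than p)
p <- 5                      # number of controls (assumption)

beta  <- 1 / (1:p)^2        # same decay pattern as in the notebook
gamma <- 1 / (1:p)^2

X <- matrix(rnorm(n * p), n, p)
D <- drop(X %*% gamma) + rnorm(n) / 4
Y <- D + drop(X %*% beta) + rnorm(n)

# Step 1: partial X out of both the outcome and the treatment
resY <- resid(lm(Y ~ X))
resD <- resid(lm(D ~ X))

# Step 2: regress the outcome residuals on the treatment residuals
partialled_out <- unname(coef(lm(resY ~ resD))[2])

# Reference: coefficient on D in the full regression of Y on D and X
full <- unname(coef(lm(Y ~ D + X))["D"])

print(c(partialled_out = partialled_out, full = full))
```

With OLS the two numbers agree up to floating-point error. The notebook replaces the two first-stage OLS fits with post-lasso fits so that the same two-step recipe remains usable when `p` is as large as `n`.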

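
The closing claim of the notebook (the double lasso estimates concentrate around zero, the naive ones do not) is read off the histogram. A numeric summary could complement it. The helper below, `summarize_errors`, is a hypothetical name introduced here for illustration and does not appear in the commit; the toy vector only shows the shape of the output and is not a simulation result.

```r
# Hypothetical helper (not in the notebook): bias, spread, and RMSE
# of a vector of estimates relative to a known true effect.
summarize_errors <- function(estimates, truth = 1) {
  err <- estimates - truth
  c(bias = mean(err), sd = sd(err), rmse = sqrt(mean(err^2)))
}

# Toy input for illustration only; these are not simulation results.
toy <- c(0.9, 1.1, 1.0, 1.2, 0.8)
toy_summary <- summarize_errors(toy)
print(round(toy_summary, 3))
```

After running the simulation cell, one could call `rbind(Naive = summarize_errors(Naive), Orthogonal = summarize_errors(Orthogonal))` to put the two estimators side by side.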