# Stratification {#stratification}
Stratification occurs when we are sampling units in the population of interest within some prespecified categories.
The main goal of stratification is to ensure that the surveyed sample reflects as much as possible the composition of the original population, at least in terms of the categories used to stratify.
In randomized experiments, stratification is used to ensure that the control and treatment groups are as similar as possible.
At least in theory, stratification helps to improve precision, since it reduces the variation in unobserved confounders that might be imbalanced between the treatment and control groups and that drive sampling noise.
Stratification is thus a very powerful tool to improve the precision of treatment effect estimates, without having to increase sample size.
There are several key issues around stratification.
The first is how to implement it in practice.
One key issue is which covariates to choose and how to combine them in order not to end up with too many small cells.
A second key issue is how to estimate sampling noise after stratification.
An easy mistake is to stratify and then forget about stratification when estimating sampling noise, thereby inflating sampling noise, and missing the improvements brought about by the stratification process.
One solution to these issues that I particularly like is to conduct a pairwise RCT, that is to have strata of size two.
The main advantage of this solution is that it theoretically solves the problem of combining covariates optimally: just find the pairs of observations that look most alike and randomize the treatment between them.
One difficulty with this approach is that finding the pairs that minimize the overall distance between them is a difficult task.
Until now, it was solved using undirected search algorithms which simply go through all the possible combinations to try to find the optimal one.
This poses two problems: the pairing you find is not ensured to be the best one and finding it might take a lot of time.
In this chapter, I introduce a pairing algorithm that finds the optimal pairs in polynomial (i.e. reasonable) time.
Another approach would be to generate the strata by k-means clustering.
Intuitively, the pairwise RCT approach should minimize residual errors, but a formal proof of that result does not exist.
A full treatment of optimal stratification is still missing, at least to my knowledge, in the literature.
I am first going to examine the analysis of classical stratified experiments (with more than 2 units per strata), then I'll turn to pairwise RCTs and finally I'll examine how to stratify in practice.
## Analysis of classical stratified experiments {#ClassicalStratification}
We are first going to look at an example to see how stratification can help with sampling noise, basically decreasing it by a starkly large factor.
We will then study the estimation of treatment effects under stratified randomization and then the estimation of sampling noise, which is a critical phase in order for our precision estimates to reflect the increase in precision brought about by stratification.
### An example
Let us first generate a stratified random sample in order to see what stratification can do for us, and what a naive approach to the analysis of stratified experiments misses.
We are going to study the case of a Brute Forced design, and we will vary the amount of stratification, from none to a lot of strata.
In practice, there are several ways to build strata and to draw observations within each of them.
Here, I am only going to stratify on one pre-treatment variable, pre-treatment outcome.
But, in practice, stratification is done on several variables that we need to interact.
One way to build groups with similar values of these variables is to use $k$-means clustering.
$k$-means clustering finds the optimal separation of points in the sample between a pre-specified number of $k$ clusters such that the within-cluster variance is minimized (so that the strata (i.e. *clusters*) are as homogeneous as possible, which is our goal when stratifying among similar units of observations).
Whether using $k$-means clustering, or simply by selecting sets of observations which have similar values for their covariates, we end up with a set $\mathcal{S}$ of strata.
Once the strata are built, there are several ways to select the treated units within each strata.
Section 3 in [Bugni, Canay and Shaikh (2018)](https://doi.org/10.1080/01621459.2017.1375934) gives several examples of stratified assignments.
The one we are going to use here is stratified block randomization.
In this approach we select, in each strata $s\in\mathcal{S}$ of size $N_s$, a number $N^1_s$ of observations for receiving the treatment, with $N^1_s=\lfloor \pi N_s \rfloor$, with $\pi$ the target proportion of treated units in the sample.
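To make the procedure concrete, here is a minimal base R sketch of stratified block randomization on a hypothetical toy sample (4 strata of 5 units each, with $\pi=0.5$; the strata labels and sample are made up for illustration):

```{r BlockRandSketch}
# minimal sketch of stratified block randomization on hypothetical strata
set.seed(1234)
s  <- rep(1:4, each = 5)   # 4 strata of 5 units each (illustrative labels)
pi <- 0.5                  # target proportion of treated units
D  <- unlist(lapply(split(seq_along(s), s), function(idx) {
  N1s <- floor(pi * length(idx))       # N^1_s = floor(pi*N_s) treated units
  Ds  <- rep(0, length(idx))
  Ds[sample(seq_along(idx), N1s)] <- 1 # draw N1s treated units at random
  Ds
}))
table(s, D)                # exactly floor(pi*N_s) = 2 treated per strata
```

Within each strata, which units are treated is random, but the number of treated units is fixed by design.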
```{remark}
Note that with stratified block randomization and $\pi$ constant with $s$, each strata will be represented in the sample in proportion to its weight in the population.
Other designs have $\pi$ vary with $s$, but we are not going to examine them here.
See [Imbens and Rubin (2015)](https://doi.org/10.1017/CBO9781139025751) for more details on how to analyze them.
```
```{remark}
Another point worthy of notice is that stratified block randomization is not what we have used when we have drawn random samples for RCTs in Chapter \@ref(RCT).
In that chapter, we have used a simple coin design, in which each unit is assigned either to the treatment or to the control group as a function of an independent coin toss.
As a result, the actual proportion of treated and control units in the sample is not exactly $\pi$.
But the Law of Large Numbers ensures that the actual proportion is not very far from the target $\pi$.
Since strata are small, the Law of Large Numbers does not apply as fast, and imbalances might arise in practice because of variations in the proportion of treated units across strata.
Hence, we prefer to use stratified block randomization.
```
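To get a sense of the imbalances that the coin design can generate in small strata, here is a small illustrative simulation (the stratum size of 10 and the number of replications are arbitrary choices) comparing the spread of the realized treated share under the coin design and under stratified block randomization:

```{r CoinVsBlockSketch}
# spread of the realized treated share in a strata of size 10, pi = 0.5
set.seed(1234)
Ns    <- 10
coin  <- replicate(10000, mean(rbinom(Ns, 1, 0.5)))         # simple coin design
block <- replicate(10000, mean(sample(rep(c(0, 1), Ns / 2)))) # block randomization
c(sd.coin = sd(coin), sd.block = sd(block))
```

Under block randomization the realized share is always exactly $\pi$, so its spread is zero, while the coin design generates sizable variation in strata of this size.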
```{example}
Let's see how this works in our usual example of a brute force design, with stratification on the pre-treatment outcomes $y^B_i$.
```
Let us first generate a random sample.
```{r paramStrata,eval=TRUE,echo=TRUE,results='hide'}
param <- c(8,.5,.28,1500,0.9,0.01,0.05,0.05,0.05,0.1)
names(param) <- c("barmu","sigma2mu","sigma2U","barY","rho","theta","sigma2epsilon","sigma2eta","delta","baralpha")
```
```{r simulStrata,eval=TRUE,echo=TRUE,results='hide'}
set.seed(1234)
N <- 1000
mu <- rnorm(N,param["barmu"],sqrt(param["sigma2mu"]))
UB <- rnorm(N,0,sqrt(param["sigma2U"]))
yB <- mu + UB
YB <- exp(yB)
epsilon <- rnorm(N,0,sqrt(param["sigma2epsilon"]))
eta <- rnorm(N,0,sqrt(param["sigma2eta"]))
U0 <- param["rho"]*UB + epsilon
y0 <- mu + U0 + param["delta"]
alpha <- param["baralpha"]+ param["theta"]*mu + eta
y1 <- y0+alpha
Y0 <- exp(y0)
Y1 <- exp(y1)
data.strata <- data.frame(yB=yB,y1=y1,y0=y0) %>%
mutate(
id = 1:N
)
```
Let us now generate several sets of strata using $k$-means clustering, with the command `stats::kmeans`.
We are going to vary the number of strata, from $1$ to $100$.
```{r BuildStrata}
# function generating $k$ strata based on a dataset of pre-treatment variables using k-means clustering
# k: number of strata
# data: the dataset used for stratification
# var: the list of variables used for stratification
stratification <- function(k,var,data){
data.strat <- data %>%
select(contains(var))
k.means <- stats::kmeans(data.strat,centers=k,nstart=k)
strata.k <- as.data.frame(k.means$cluster)
colnames(strata.k) <- c(paste("strata",k,sep='_'))
return(strata.k)
}
# test
strata.2 <- stratification(k=2,var="yB",data=data.strata)
strata.5 <- stratification(k=5,var="yB",data=data.strata)
#data <- cbind(data,strata=strata.2$cluster)
# let us now map across values of $k$
k.vec <- c(1,2,5,10,25,50,100)
test.map <- map(k.vec,stratification,var="yB",data=data.strata) %>%
bind_cols()
# final data
data.strata <- cbind(data.strata,test.map)
# counting strata size
#strata.size <- data.strata %>%
```
Let us finally visualize the grouping with 5 and 50 strata.
```{r StrataPlot,fig.cap=c('Grouping in strata using $k$-means clustering'),fig.subcap=c('5 strata','50 strata'),fig.align='center',out.width='50%',fig.pos='htbp',fig.height=8,fig.width=12}
ggplot(data.strata,aes(x=yB,y=y0,group=as.factor(strata_5),color=as.factor(strata_5),shape=as.factor(strata_5)))+
geom_point()+
theme_bw()
ggplot(data.strata,aes(x=yB,y=y0,group=as.factor(strata_50),color=as.factor(strata_50)))+
geom_point()+
theme_bw()
```
Each strata is defined by an upper and a lower bound on the value of $y_i^B$, so that strata are mutually exclusive.
The effectiveness of the strata at increasing precision and power stems from the fact that $y_i^B$ is strongly correlated with $y_i^0$, thereby decreasing the likelihood that treated and control units have different distributions of potential outcomes.
Because of that property, we would like the number of strata to be as big as possible.
But this generates one problem: we might end up with strata that only have one unit.
In that case, we cannot randomly allocate the treatment within the strata, and we have to exclude it from our sample, thereby losing representativeness.
This problem is even more serious if we have several treatments that we want to test against our control.
As a consequence, we cannot increase the number of strata indefinitely.
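As a rough illustration of this problem, the following sketch (on hypothetical normal draws, not on our simulated sample) counts how many singleton strata `stats::kmeans` produces as the number of strata grows:

```{r SingletonStrataSketch}
# count singleton strata produced by k-means as k increases (illustrative data)
set.seed(1234)
x <- rnorm(200)                          # hypothetical stratification variable
n.singletons <- sapply(c(5, 25, 100), function(k) {
  cl <- stats::kmeans(x, centers = k, nstart = k)$cluster
  sum(table(cl) == 1)                    # strata containing a single unit
})
n.singletons
```

Singleton strata tend to appear once the number of strata gets large relative to the sample size, and these strata cannot be randomized within.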
Let's see how the strata are composed in our example.
```{r StrataSize}
StrataSize <- data.strata %>%
pivot_longer(cols=contains('strata'),names_to = "strata_type",values_to="strata_value") %>%
group_by(strata_type,strata_value) %>%
summarise(
Ns = n()
) %>%
mutate(
strata_N = str_split_fixed(strata_type,'_',2)[,2],
strata_N = factor(strata_N,levels=c('1','2','5','10','25','50','100')),
strata_value = as.factor(strata_value)
)
StrataSizeMinMedMax <- StrataSize %>%
group_by(strata_type,strata_N) %>%
summarise(
Min = min(Ns),
Max = max(Ns),
Mean = mean(Ns)
) %>%
pivot_longer(cols= Min:Mean,names_to = "Type",values_to = "Stat") %>%
mutate(
Type = factor(Type,levels=c('Min','Mean','Max'))
)
```
Let us plot the resulting minimum value of strata size:
```{r StrataSizePlot,fig.cap=c('Number of strata and strata size'),fig.subcap=c('Size of each strata','Min, max and mean of strata size'),fig.align='center',out.width=c('95%','75%'),fig.pos='htbp',fig.height=8,fig.width=12}
ggplot(StrataSize %>% filter(strata_N!='1'),aes(x=strata_value,y=Ns,fill=strata_value)) +
geom_bar(stat='identity') +
geom_hline(aes(yintercept=1),color='red',linetype='dotted') +
#scale_y_continuous(trans=log_trans(),breaks=c(1,10,100))+
#expand_limits(y=0) +
scale_fill_discrete(name='Strata') +
xlab('Strata')+
ylab('Strata size')+
theme_bw() +
facet_grid(strata_N~.)
ggplot(StrataSizeMinMedMax,aes(x=strata_N,y=Stat,fill=Type,group=Type)) +
geom_bar(stat='identity',position=position_dodge()) +
geom_hline(aes(yintercept=1),color='red',linetype='dotted') +
#scale_y_continuous(trans=log_trans(),breaks=c(1,10,100))+
#expand_limits(y=0) +
#scale_fill_discrete(name='Strata') +
xlab('Number of strata')+
ylab('Strata size')+
theme_bw() #+
#facet_grid(Type~.)
```
Let us now randomly allocate the treatment among strata using block randomization.
The way I am going to do that is by drawing a uniform random variable for each unit, computing its median within each strata, and allocating everyone strictly below the median to the treatment group and everyone at or above the median to the control group.
Let's see how this works.
```{r StratifiedBlock}
# number of strata types
NStrataTypes <- data.strata %>% select(contains('strata')) %>% colnames(.) %>% length(.)
set.seed(1234)
data.strata <- data.strata %>%
pivot_longer(cols=contains('strata'),names_to = "strata_type",values_to="strata_value") %>%
mutate(
strata_type_value = str_c(strata_type,strata_value,sep='_'),
strata_value = as.factor(strata_value),
strata_type = as.factor(strata_type)
) %>%
mutate(
randU = runif(N*NStrataTypes)
) %>%
group_by(strata_type_value) %>%
mutate(
randUMedStrata = stats::median(randU),
Ds = case_when(
randU<randUMedStrata ~ 1,
TRUE ~ 0
),
y = y1*Ds+(1-Ds)*y0,
Ns = n(),
pis = mean(Ds)
) %>%
arrange(strata_type,Ns,strata_value,randU)
```
Let us now examine how to estimate the treatment effect in this sample.
One key question here is how to compute the treatment effect estimate.
There are at least four different ways:
1. Use the simple with/without estimator.
2. Use a weighted average of the treatment effects in each strata, with weights proportional to the inverse of their variance.
3. Use an OLS regression with the treatment dummy and a set of strata fixed effects.
4. Use an OLS regression with the treatment dummy and a set of strata fixed effects interacted with the treatment indicator (this is a fully saturated model).
We will examine later the rationale for each of these estimators.
For now, I am going to compute the distribution of the simple with/without estimator and of the strata fixed effects estimator.
```{r StrataReg}
# WW
regWW <- data.strata %>%
group_by(strata_type) %>%
nest(.) %>%
# rowwise(.) %>%
mutate(
reg = map(data,~lm(y~Ds,data=.)),
tidied = map(reg,tidy)
) %>%
unnest(tidied) %>%
filter(term == 'Ds') %>%
rename(
Coef = estimate,
Se = std.error
) %>%
select(strata_type,Coef,Se)%>%
mutate(Type='WW')
# strata fixed effects
regSFE <- data.strata %>%
filter(strata_type!='strata_1') %>%
group_by(strata_type) %>%
nest(.) %>%
# rowwise(.) %>%
mutate(
reg = map(data,~feols(y~Ds|strata_value,vcov="HC1",data=.)),
tidied = map(reg,tidy)
) %>%
unnest(tidied) %>%
filter(term == 'Ds') %>%
rename(
Coef = estimate,
Se = std.error
) %>%
select(strata_type,Coef,Se) %>%
mutate(Type='SFE')
# fully saturated model
# run a regression in each strata
regSAT <- data.strata %>%
# filter(strata_type!='strata_1') %>%
group_by(strata_type_value) %>%
# generating the number of treated observations in each strata
# and the proportion of observations in each strata
mutate(
Ns1 = sum(Ds),
qs = Ns/N
) %>%
filter(Ns1>1,N-Ns1>1) %>%
nest(.) %>%
# rowwise(.) %>%
mutate(
reg = map(data,~feols(y~Ds|1,vcov="HC1",data=.)),
tidied = map(reg,tidy)
) %>%
unnest(tidied) %>%
filter(term == 'Ds') %>%
rename(
Coef = estimate,
Se = std.error
) %>%
select(strata_type_value,Coef,Se)
# weighted mean function over a dataset with a formula for use in nest
# formula: formula with outcome as regressand and weights as regressor
# data: the dataset used
weighted.mean.nest <- function(form,data,...){
name.y <- as.character(form[[2]])
name.w <- as.character(form[[3]])
w.mean <- weighted.mean(x=data[[name.y]],w=data[[name.w]],...)
return(w.mean)
}
# testing
test.wmean <- weighted.mean.nest(Coef~Se,data=regSAT,na.rm=TRUE)
# averaging back using strata proportions
regSAT.inter <- data.strata %>%
group_by(strata_type,strata_value,strata_type_value) %>%
summarize(
qs = mean(Ns/N),
Ns=mean(Ns),
Ns1 = sum(Ds)
) %>%
# joining with regression estimates
left_join(regSAT,by='strata_type_value') %>%
# creating variance terms
mutate(
Var = Ns*Se^2
) %>%
group_by(strata_type) %>%
nest(.) %>%
mutate(
Coef = map_dbl(data,~weighted.mean.nest(Coef~qs,data=.,na.rm=TRUE)),
Var = map_dbl(data,~weighted.mean.nest(Var~qs,data=.,na.rm=TRUE))
) %>%
mutate(
Se = sqrt(Var/N)
) %>%
select(strata_type,Coef,Se) %>%
mutate(Type='SAT')
# Bugni, Canay and Shaikh correction (adding VH)
regSAT.BCS <- data.strata %>%
group_by(strata_type,strata_value,strata_type_value) %>%
summarize(
qs = mean(Ns/N),
Ns=mean(Ns),
Ns1 = sum(Ds)
) %>%
# joining with regression estimates
left_join(regSAT,by='strata_type_value') %>%
select(contains('strata'),qs,Coef) %>%
rename(WW=Coef) %>%
left_join(regWW,by='strata_type') %>%
select(contains('strata'),qs,Coef,WW) %>%
mutate(
diffCoefSq = (Coef-WW)^2
) %>%
group_by(strata_type) %>%
nest(.) %>%
mutate(
VH = map_dbl(data,~weighted.mean.nest(diffCoefSq~qs,data=.,na.rm=TRUE))
) %>%
select(-data) %>%
left_join(regSAT.inter,by='strata_type') %>%
mutate(
Se = sqrt(Se^2+VH/N),
Type = "SAT.BCS"
) %>%
select(-VH)
# regrouping
reg <- rbind(regWW,regSFE,regSAT.inter,regSAT.BCS)
```
Let us now write a function to parallelize these simulations.
```{r montecarlobruteforcewwyBStrata,dependson='paramStrata',cache=TRUE}
# function generating one set of simulated estimates of the WW and SFE estimator under stratification
# s: random seed
# N: sample size
# Nstrata: number of strata (can be a vector)
# param: parameter to generate sample
monte.carlo.brute.force.ww.yB.strata <- function(s,N,Nstrata,param){
# generating random sample
set.seed(s)
mu <- rnorm(N,param["barmu"],sqrt(param["sigma2mu"]))
UB <- rnorm(N,0,sqrt(param["sigma2U"]))
yB <- mu + UB
YB <- exp(yB)
epsilon <- rnorm(N,0,sqrt(param["sigma2epsilon"]))
eta<- rnorm(N,0,sqrt(param["sigma2eta"]))
U0 <- param["rho"]*UB + epsilon
y0 <- mu + U0 + param["delta"]
alpha <- param["baralpha"]+ param["theta"]*mu + eta
y1 <- y0+alpha
Y0 <- exp(y0)
Y1 <- exp(y1)
data.strata <- data.frame(yB=yB,y1=y1,y0=y0) %>%
mutate(
id = 1:N
)
# generating strata
strata <- map(Nstrata,stratification,var="yB",data=data.strata) %>%
bind_cols()
data.strata <- cbind(data.strata,strata)
# allocating the treatment
# number of strata types
NStrataTypes <- data.strata %>% select(contains('strata')) %>% colnames(.) %>% length(.)
set.seed(s)
data.strata <- data.strata %>%
pivot_longer(cols=contains('strata'),names_to = "strata_type",values_to="strata_value") %>%
mutate(
strata_type_value = str_c(strata_type,strata_value,sep='_'),
strata_value = as.factor(strata_value),
strata_type = as.factor(strata_type)
) %>%
mutate(
randU = runif(N*NStrataTypes)
) %>%
group_by(strata_type_value) %>%
mutate(
randUMedStrata = stats::median(randU),
Ds = case_when(
randU<randUMedStrata ~ 1,
TRUE ~ 0
),
y = y1*Ds+(1-Ds)*y0,
Ns = n(),
pis = mean(Ds)
) %>%
arrange(strata_type,Ns,strata_value,randU)
# estimating the treatment effects
# WW
regWW <- data.strata %>%
group_by(strata_type) %>%
nest(.) %>%
# rowwise(.) %>%
mutate(
reg = map(data,~feols(y~Ds|1,vcov="HC1",data=.)),
tidied = map(reg,tidy)
) %>%
unnest(tidied) %>%
filter(term == 'Ds') %>%
rename(
Coef = estimate,
Se = std.error
) %>%
select(strata_type,Coef,Se)%>%
mutate(Type='WW')
# strata fixed effects
regSFE <- data.strata %>%
filter(strata_type!='strata_1') %>%
group_by(strata_type) %>%
nest(.) %>%
# rowwise(.) %>%
mutate(
reg = map(data,~feols(y~Ds|strata_value,vcov="HC1",data=.)),
tidied = map(reg,tidy)
) %>%
unnest(tidied) %>%
filter(term == 'Ds') %>%
rename(
Coef = estimate,
Se = std.error
) %>%
select(strata_type,Coef,Se) %>%
mutate(Type='SFE')
# fully saturated model
# run a regression in each strata
regSAT <- data.strata %>%
# filter(strata_type!='strata_1') %>%
group_by(strata_type_value) %>%
# generating the number of treated observations in each strata
# and the proportion of observations in each strata
mutate(
Ns1 = sum(Ds),
qs = Ns/N
) %>%
filter(Ns1>1,N-Ns1>1) %>%
nest(.) %>%
# rowwise(.) %>%
mutate(
reg = map(data,~feols(y~Ds|1,vcov="HC1",data=.)),
tidied = map(reg,tidy)
) %>%
unnest(tidied) %>%
filter(term == 'Ds') %>%
rename(
Coef = estimate,
Se = std.error
) %>%
select(strata_type_value,Coef,Se)
# averaging back using strata proportions
regSAT.inter <- data.strata %>%
group_by(strata_type,strata_value,strata_type_value) %>%
summarize(
qs = mean(Ns/N),
Ns=mean(Ns),
Ns1 = sum(Ds)
) %>%
# joining with regression estimates
left_join(regSAT,by='strata_type_value') %>%
# creating variance
mutate(
Var = Se^2*Ns
) %>%
group_by(strata_type) %>%
nest(.) %>%
mutate(
Coef = map_dbl(data,~weighted.mean.nest(Coef~qs,data=.,na.rm=TRUE)),
Var = map_dbl(data,~weighted.mean.nest(Var~qs,data=.,na.rm=TRUE))
) %>%
mutate(
Se = sqrt(Var/N)
) %>%
select(strata_type,Coef,Se) %>%
mutate(Type='SAT')
# Bugni, Canay and Shaikh correction (adding VH)
regSAT.BCS <- data.strata %>%
group_by(strata_type,strata_value,strata_type_value) %>%
summarize(
qs = mean(Ns/N),
Ns=mean(Ns),
Ns1 = sum(Ds)
) %>%
# joining with regression estimates
left_join(regSAT,by='strata_type_value') %>%
select(contains('strata'),qs,Coef) %>%
rename(WW=Coef) %>%
left_join(regWW,by='strata_type') %>%
select(contains('strata'),qs,Coef,WW) %>%
mutate(
diffCoefSq = (Coef-WW)^2
) %>%
group_by(strata_type) %>%
nest(.) %>%
mutate(
VH = map_dbl(data,~weighted.mean.nest(diffCoefSq~qs,data=.,na.rm=TRUE))
) %>%
select(-data) %>%
left_join(regSAT.inter,by='strata_type') %>%
mutate(
Se = sqrt(Se^2+VH/N),
Type = "SAT.BCS"
) %>%
select(-VH)
# regrouping
reg <- rbind(regWW,regSFE,regSAT.inter,regSAT.BCS)
# adding the SFE estimator for strata 1
regStrata1 <- reg %>%
filter(strata_type=='strata_1') %>%
mutate(
Type="SFE"
)
reg <- rbind(reg,regStrata1) %>%
mutate(
IdSim=s
)
return(reg)
}
simuls.brute.force.ww.yB.N.strata <- function(N,Nsim,Nstrata,param){
# simuls.brute.force.ww.yB.N.strata <- as.data.frame(matrix(unlist(lapply(1:Nsim,monte.carlo.brute.force.ww.yB.strata,N=N,param=param,Nstrata=Nstrata)),nrow=2*length(Nstrata)*Nsim,ncol=5,byrow=TRUE))
simuls.brute.force.ww.yB.N.strata <- lapply(1:Nsim,monte.carlo.brute.force.ww.yB.strata,N=N,param=param,Nstrata=Nstrata) %>%
bind_rows(.)
# colnames(simuls.brute.force.ww.yB.N.strata) <- c('strata_type','Coef','Se','Type','IdSim')
return(simuls.brute.force.ww.yB.N.strata)
}
sf.simuls.brute.force.ww.yB.N.strata <- function(N,Nsim,Nstrata,param,ncpus){
sfInit(parallel=TRUE,cpus=ncpus)
sfExport('stratification')
sfExport('weighted.mean.nest')
sfLibrary(tidyverse)
sfLibrary(stats)
sfLibrary(fixest)
sfLibrary(broom)
# sim <- as.data.frame(matrix(unlist(sfLapply(1:Nsim,monte.carlo.brute.force.ww.yB.strata,N=N,param=param,Nstrata=Nstrata)),nrow=2*length(Nstrata)*Nsim,ncol=5,byrow=TRUE))
sim <- sfLapply(1:Nsim,monte.carlo.brute.force.ww.yB.strata,N=N,param=param,Nstrata=Nstrata) %>%
bind_rows(.)
sfStop()
return(sim)
}
Nsim <- 1000
#Nsim <- 10
#N.sample <- c(100,1000,10000,100000)
#N.sample <- c(100,1000,10000)
#N.sample <- c(100,1000)
N.sample <- c(1000)
Nstrata <- k.vec
#test <- simuls.brute.force.ww.yB.N.strata(N=N.sample,Nsim=Nsim,Nstrata=Nstrata,param=param)
simuls.brute.force.ww.yB.strata <- lapply(N.sample,sf.simuls.brute.force.ww.yB.N.strata,Nsim=Nsim,Nstrata=Nstrata,param=param,ncpus=2*ncpus)
names(simuls.brute.force.ww.yB.strata) <- N.sample
```
Let us now check the results.
```{r montecarlobruteforcewwyBStrataSummary}
SummaryStrata <- simuls.brute.force.ww.yB.strata[[1]] %>%
group_by(strata_type,Type) %>%
summarize(
MeanCoef = mean(Coef,na.rm=TRUE),
SdCoef = sd(Coef,na.rm=TRUE),
MeanSe = mean(Se,na.rm=TRUE)
) %>%
mutate(
strata_size = str_split_fixed(strata_type,'\\_',n=2)[,2],
strata_size = factor(strata_size,levels=k.vec),
strata_type = factor(strata_type,levels=paste('strata',k.vec,sep='_')),
ATE = delta.y.ate(param),
Type = factor(Type,levels=c('WW','SFE','SAT','SAT.BCS'))
)
```
```{r montecarlobruteforcewwyBStrataPlot,fig.cap=c('Distribution of estimators under various levels of stratification'),fig.align='center',out.width=cst,fig.pos='htbp',fig.height=8,fig.width=12}
ggplot(SummaryStrata %>% filter(Type!='SAT.BCS'),aes(x=strata_size,y=MeanCoef,color=Type,shape=Type))+
#geom_histogram(binwidth=.01, alpha=0.5,position='identity') +
#geom_density(alpha=0.3)+
#geom_bar(stat='identity',position=position_dodge())+
geom_pointrange(aes(ymin=MeanCoef-1.96*SdCoef,ymax=MeanCoef+1.96*SdCoef,color=Type,shape=Type),position=position_dodge(0.2))+
geom_hline(aes(yintercept=ATE),color='red',linetype='dotted') +
expand_limits(y=0)+
xlab('Number of strata')+
theme_bw()
```
Figure \@ref(fig:montecarlobruteforcewwyBStrataPlot) clearly shows that stratification can increase precision.
For the $WW$ estimator, 99\% sampling noise moves from `r round(2*qnorm((1+delta)/2)*(SummaryStrata %>% filter(strata_type=='strata_1',Type=='WW') %>% pull(SdCoef)),2)` with one strata to `r round(2*qnorm((1+delta)/2)*(SummaryStrata %>% filter(strata_type=='strata_50',Type=='WW') %>% pull(SdCoef)),2)` with 50 strata.
### Estimating treatment effects with stratified randomized controlled trials
There are several ways to estimate average treatment effects under stratification.
We have seen two in our example: the simple $WW$ estimator and the strata fixed effects estimator.
We are going to see that we could also use a fully saturated model, which is in general not necessary, except when the target proportion of treated individuals changes with strata.
Finally, we will study the precision weighted estimator, which is an instance of a meta-analytic estimator.
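As a preview of the precision-weighted estimator, here is a minimal sketch of inverse-variance weighting on hypothetical strata-level estimates and standard errors (the numbers are made up for illustration):

```{r PrecisionWeightSketch}
# precision-weighted average of hypothetical strata-level estimates
coef.s <- c(0.18, 0.22, 0.15)             # strata-level WW estimates (illustrative)
se.s   <- c(0.05, 0.10, 0.08)             # their standard errors (illustrative)
w.s    <- (1 / se.s^2) / sum(1 / se.s^2)  # inverse-variance weights, summing to one
c(
  estimate = sum(w.s * coef.s),
  se       = sqrt(1 / sum(1 / se.s^2))    # standard error of the weighted mean
)
```

More precise strata-level estimates receive larger weights, which is the defining feature of meta-analytic aggregation.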
#### Using the $WW$ estimator
The $WW$ estimator simply compares the mean outcome of the treated and untreated individuals.
As we have already seen in Chapter \@ref(RCT), this estimator can be computed using a simple OLS regression.
We know thanks to Theorem \@ref(thm:BFATE) that the $WW$ estimator is consistent under randomization, whatever the randomization device.
```{remark}
There is confusion as to whether this estimator will reflect the increase in precision brought about by stratification.
Figure \@ref(fig:montecarlobruteforcewwyBStrataPlot) shows that it is indeed the case: the precision of the $WW$ estimator increases with the number of strata (up to 25 to 50 strata at least).
Intuitively, this makes sense since the sample on which the $WW$ estimator is estimated is already balanced on the observed characteristics used for stratification, and therefore its distribution is less affected by imbalances in these observed factors.
```
```{remark}
A different question is whether the default estimators of sampling noise of the $WW$ estimator reflect the true extent of sampling noise, and especially are able to reflect the increase in precision due to stratification.
The answer is in general no, and we will study estimators of sampling noise in the next section.
```
#### Using the strata fixed effects estimator
The strata fixed effects estimator is defined as the OLS estimate of $\beta^{SFE}$ in the following regression:
\begin{align*}
Y_i & = \sum_{s\in\mathcal{S}}\alpha^{SFE}_s\uns{\mathbf{s}(i)=s}+\beta^{SFE}R_i + \epsilon^{SFE}_i,
\end{align*}
where $\mathbf{s}(i)$ associates to each observation $i$ in the sample the strata $s$ to which it belongs.
Conversely, and for later uses, $\mathcal{I}_s$ is the set of units that belong to strata $s$.
What does $\beta^{SFE}$ estimate?
We can first show that it is a weighted average of strata-level $WW$ comparisons.
For that, we need a slightly stronger assumption than the usual full rank assumption we have been using to analyze RCTs:
```{hypothesis,fullrankStrata,name='Full rank within strata'}
We assume that there is at least one observation in each strata that receives the treatment and one observation that does not receive it:
\begin{align*}
\forall s \in \mathcal{S}, \exists i,j\in \mathcal{I}_s \text{ such that } & D_i=1 \& D_j=0.
\end{align*}
```
```{theorem,SFEdecomp,name='Decomposing the strata fixed effects estimator'}
Under Assumption \@ref(hyp:fullrankStrata), the strata fixed effects estimator is a weighted average of strata-level $WW$ estimators:
\begin{align*}
\hat{\beta}^{SFE} & = \sum_{s\in\mathcal{S}}w^{SFE}_s\hat{\Delta}^Y_{WW_s}\\
\hat{\Delta}^Y_{WW_s} & =\frac{\sum_{i:\mathbf{s}(i)=s}R_iY_i}{\sum_{i:\mathbf{s}(i)=s}R_i}-\frac{\sum_{i:\mathbf{s}(i)=s}(1-R_i)Y_i}{\sum_{i:\mathbf{s}(i)=s}(1-R_i)}\\
w^{SFE}_s & = \frac{N_s\bar{R}_s(1-\bar{R}_s)}{\sum_{s\in\mathcal{S}}N_s\bar{R}_s(1-\bar{R}_s)}
\end{align*}
with $\bar{R}_s=\frac{1}{N_s}\sum_{i:\mathbf{s}(i)=s}R_i$ and $N_s$ is the size of strata $s$.
```
```{proof}
See in Appendix \@ref(proofSFEdecomp).
```
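Theorem \@ref(thm:SFEdecomp) can be checked numerically. The following sketch builds a small hypothetical stratified sample (the outcome equation and strata sizes are made up for illustration, and base `lm` is used to keep it self-contained) and verifies that the strata fixed effects coefficient coincides with the weighted average of strata-level $WW$ estimates with weights proportional to $N_s\bar{R}_s(1-\bar{R}_s)$:

```{r SFEdecompCheck}
# numerical check of the SFE decomposition on a hypothetical sample
set.seed(1234)
s <- rep(1:4, times = c(10, 20, 30, 40))          # 4 strata of unequal size
R <- unlist(lapply(table(s), function(n) sample(rep(c(0, 1), n / 2))))
Y <- 1 + 0.5 * s + 0.3 * R + rnorm(length(s))     # illustrative outcome equation
beta.sfe <- coef(lm(Y ~ R + factor(s)))["R"]      # strata fixed effects estimator
# strata-level WW estimates and the weights of Theorem SFEdecomp
ww.s   <- tapply(Y[R == 1], s[R == 1], mean) - tapply(Y[R == 0], s[R == 0], mean)
Rbar.s <- tapply(R, s, mean)
N.s    <- as.numeric(table(s))
w.s    <- N.s * Rbar.s * (1 - Rbar.s) / sum(N.s * Rbar.s * (1 - Rbar.s))
all.equal(as.numeric(beta.sfe), sum(w.s * ww.s))  # TRUE: the decomposition holds
```

The equality is exact (up to floating point error), since the decomposition is an algebraic property of OLS with strata dummies, not an asymptotic result.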
In order to prove the consistency of the strata fixed effects estimator, we need to slightly alter the usual independence assumption so that it reflects the fact that randomization occurs within each strata:
```{hypothesis,independenceStrata,name="Independence within strata"}
We assume that the allocation of the program is independent of potential outcomes within each strata:
\begin{align*}
R_i\Ind(Y_i^0,Y_i^1)|\mathbf{s}(i).
\end{align*}
```
We can now prove the consistency of the strata fixed effects estimator:
```{theorem,SFEconsistent,name='Consistency of the strata fixed effects estimator'}
Under Assumptions \@ref(hyp:independenceStrata), \@ref(def:BF) and \@ref(hyp:fullrankStrata), the strata fixed effects estimator is a consistent estimator of the Average Treatment Effect:
\begin{align*}
\plims\hat{\beta}^{SFE} & = \Delta^Y_{ATE}.
\end{align*}
```
```{proof}
See in Appendix \@ref(proofSFEconsistent).
```
```{remark}
Note that Theorem \@ref(thm:SFEconsistent) does not prove that the strata fixed effects estimator is unbiased.
I am very close to proving that using an approach similar to the one developed in the proof of Lemma \@ref(lem:unbiasww).
One thing that prevents me from completing the proof is that the weights $w_s$ are not independent of each other, and neither are the numerator and the denominator of each of them (because they share one strata in the denominator).
Assuming this away (because each strata is small relative to the others) makes it possible to prove unbiasedness.
I am not aware of a general proof of unbiasedness for the strata fixed effects estimator.
Let me know if you can find one.
```
#### Using the fully saturated estimator
A natural approach, similar to the one we used with DID with staggered designs in Section \@ref(DIDStaggered), is to estimate a treatment effect within each strata, and then to take a weighted average of these estimates with the appropriate weights in order to recover an adequate average treatment effect.
This approach is akin to estimating a fully saturated model, since each strata has its own separate set of intercept and slope.
The way we implement this estimator is by estimating the following model by OLS:
\begin{align*}
Y_i & = \sum_{s\in\mathcal{S}}\alpha^{SAT}_s\uns{\mathbf{s}(i)=s}+\sum_{s\in\mathcal{S}}\beta_s^{SAT}\uns{\mathbf{s}(i)=s}R_i + \epsilon^{SAT}_i,
\end{align*}
and then to reaggregate the individual strata estimates using the proportion of the observations that lie in each strata:
\begin{align*}
\Delta^Y_{SAT} & = \sum_{s\in\mathcal{S}}w^{SAT}_s\beta_s^{SAT}\\
w^{SAT}_s & = \frac{N_s}{N}.
\end{align*}
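The aggregation step can be made concrete with a small sketch (again Python with `numpy`; the design, sample sizes and treatment effects are invented, with a true ATE of $0.4\times 1+0.6\times 2=1.6$):

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented design: the effect is 1 in strata 0 (40 obs) and 2 in strata 1 (60 obs)
strata = np.repeat([0, 1], [40, 60])
R = np.concatenate([np.repeat([1, 0], [10, 30]), np.repeat([1, 0], [30, 30])])
Y = 1.0 + 0.5 * strata + (1.0 + strata) * R + rng.normal(scale=0.5, size=100)

# Fully saturated estimator: one WW comparison per strata, then N_s / N weights
N = len(Y)
delta_sat = 0.0
for s in (0, 1):
    m = strata == s
    ww_s = Y[m & (R == 1)].mean() - Y[m & (R == 0)].mean()
    delta_sat += (m.sum() / N) * ww_s
```

With this design, `delta_sat` lands close to the true ATE of 1.6, up to sampling noise.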
We can prove the following result:
```{theorem,SATUnbiasedConsistent,name='Unbiasedness and consistency of the fully saturated model'}
Under Assumptions \@ref(def:independence), \@ref(def:BF) and \@ref(hyp:fullrankStrata), the fully saturated model is an unbiased and consistent estimator of the Average Treatment Effect:
\begin{align*}
\esp{\hat{\Delta}^{Y}_{SAT}} & = \Delta^Y_{ATE}\\
\plims\hat{\Delta}^{Y}_{SAT} & = \Delta^Y_{ATE}.
\end{align*}
```
```{proof}
See in Appendix \@ref(proofSATUnbiasedConsistent).
```
```{remark}
[Imbens and Rubin (2015)](https://www.cambridge.org/core/books/causal-inference-for-statistics-social-and-biomedical-sciences/71126BE90C58F1A431FE9B2DD07938AB) define the fully saturated model as follows:
\begin{align*}
Y_i & = \sum_{s\in\mathcal{S}}\alpha^{SAT}_s\uns{\mathbf{s}(i)=s}+\sum_{s\in\mathcal{S}\setminus{S}}\gamma^{SAT}_sR_i\left(\uns{\mathbf{s}(i)=s}-\uns{\mathbf{s}(i)=S}\frac{N_s}{N_S}\right)\\
& \phantom{=} +\beta^{SAT}R_i\frac{\uns{\mathbf{s}(i)=S}}{\frac{N_S}{N}} + \epsilon^{SAT}_i.
\end{align*}
Their Theorem 9.2 shows that this definition is numerically equivalent to the estimator $\hat{\Delta}^Y_{SAT}$ defined above.
I prefer using the OLS estimator in each strata and then averaging rather than building the complex estimator that Imbens and Rubin use.
```
### Estimating sampling noise in stratified randomized controlled trials
We now have consistent (and even sometimes unbiased) estimators of the average treatment effect in a stratified experiment.
Can we now estimate the sampling noise around these estimators?
```{remark}
One especially important question is whether we can form estimates of sampling noise that reflect the fact that stratification decreases sampling noise (if the stratification variables are correlated with potential outcomes).
For example, the usual CLT-based estimator that we derived in Theorem \@ref(thm:asympnoiseWW) does not take into account that some stratification occurred.
It will give an estimate of sampling noise that is correct for the case of a single strata, but it fails to deliver estimates reflecting the increase in precision that stratification brings.
```
```{remark}
Note that stratification also improves the precision of the simple WW estimator, as Figure \@ref(fig:montecarlobruteforcewwyBStrataPlot) shows.
This means that even though the simple OLS estimator is consistent for the average treatment effect, its default standard error will not be consistent for its true sampling noise.
In general, it will be too big, and therefore confidence intervals based on it will be too large, which is a shame.
That is why using consistent estimators of sampling noise matters.
```
```{remark}
A second concern is whether the default (heteroskedasticity-robust) standard errors of the strata fixed effects estimator (SFE) provide an accurate estimate of the sampling noise of that estimator.
[Imbens and Rubin (2015)](https://www.cambridge.org/core/books/causal-inference-for-statistics-social-and-biomedical-sciences/71126BE90C58F1A431FE9B2DD07938AB) argue that they do, while [Bugni, Canay and Shaikh (2018)](https://doi.org/10.1080/01621459.2017.1375934) and [Bugni, Canay and Shaikh (2019)](https://doi.org/10.3982/QE1150) disagree.
As [Bugni, Canay and Shaikh (2019)](https://doi.org/10.3982/QE1150) explain in their Remark 5.1, the main difference between the two approaches is whether you assume that your sampling scheme can be characterized by $\frac{N_s}{N}$ being constant with $N$.
One way in which this might be true is that, each time we draw a new sample, we do not increase the number of strata with sample size, and we keep the same definition for each strata and the same proportion of observations in each strata (up to rounding errors).
Otherwise, if we either increase the number of strata with sample size, or if we cannot keep the relative size of each strata constant as sample size grows (for example, if we change the definition of the strata with each sample), we might have a problem.
In what follows, I will present estimators of sampling noise under the simpler assumption that $\frac{N_s}{N}$ is constant with $N$, and I will then explain what relaxing that assumption does to our estimates.
```
#### Estimating sampling noise under constant strata proportions
##### Estimating sampling noise of the fully saturated estimator
To estimate the distribution of the fully saturated estimator under stratification, we need to slightly alter our basic assumptions:
```{hypothesis,iidStrata,name='i.i.d. sampling with strata'}
We assume that the observations in the sample are identically and independently distributed within and across strata:
\begin{align*}
\forall i,j\leq N\text{, }i\neq j\text{, } & (Y_i,R_i,\mathbf{s}(i))\Ind(Y_j,R_j,\mathbf{s}(j)),\\
& (Y_i,R_i,\mathbf{s}(i))\&(Y_j,R_j,\mathbf{s}(j))\sim F_{Y,R,\mathbf{s}}.
\end{align*}
```
```{hypothesis,finitevarStrata,name='Finite variance of potential outcomes within strata'}
We assume that $\var{Y^1|R_i=1,\mathbf{s}(i)=s}$ and $\var{Y^0|R_i=0,\mathbf{s}(i)=s}$ are finite, $\forall s\in\mathcal{S}$.
```
Finally, we are going to make an additional strong assumption that the proportion of observations in each strata in the sample is equal to the proportion of observations in each strata in the population:
```{hypothesis,cstStrataProportion,name='Constant strata proportions'}
We assume that $\frac{N_s}{N}=p_s$, where $p_s$ is the proportion of units of strata $s$ in the population.
```
```{remark}
Assumption \@ref(hyp:cstStrataProportion) holds when you choose the proportion of each strata before collecting the data based on some additional information (for example, the full census of units you are going to survey).
If you know for example the actual proportions of men and women in your target population from administrative data counting all of these observations, Assumption \@ref(hyp:cstStrataProportion) is very close to holding.
The only caveat is that the total population is itself one sample from a super population where the proportion of each strata might be slightly different, but for all practical purposes this does not matter, as long as the strata proportions in your full census are the ones that are relevant.
Things are different when you build the strata once the sample has been collected and you use the sample data to compute the strata proportions.
In that case, your estimates of the strata proportions are not equal to the population ones, and Assumption \@ref(hyp:cstStrataProportion) does not hold.
Note that in our numerical example, Assumption \@ref(hyp:cstStrataProportion) does not hold.
```
Equipped with our assumptions, we can now prove the following result:
```{theorem,asympnoiseSATStrata,name="Asymptotic estimate of sampling noise of the fully saturated estimator under stratification"}
Under Assumptions \@ref(hyp:fullrankStrata), \@ref(hyp:independenceStrata), \@ref(hyp:iidStrata), \@ref(hyp:finitevarStrata) and \@ref(hyp:cstStrataProportion), the asymptotic distribution of the fully saturated estimator can be approximated as follows:
\begin{align*}
\sqrt{N}\left(\hat{\Delta}^{Y}_{SAT}-\Delta^{Y}_{ATE}\right) & \distr \mathcal{N}\left(0,\sum_{s\in\mathcal{S}}p_s\left(\frac{\var{Y_i^1|R_i=1,\mathbf{s}(i)=s}}{\Pr(R_i=1)}+\frac{\var{Y_i^0|R_i=0,\mathbf{s}(i)=s}}{1-\Pr(R_i=1)}\right)\right).
\end{align*}
```
```{proof}
See in Appendix \@ref(proofasympnoiseSATStrata).
```
```{remark}
The shape of the sampling noise estimate stemming from Theorem \@ref(thm:asympnoiseSATStrata) is very close to the one we have developed for observational estimators.
```
```{remark}
How can we form an estimator of sampling noise based on Theorem \@ref(thm:asympnoiseSATStrata)?
One way to do it is to recover the heteroskedasticity-robust estimate of sampling noise within each strata, to multiply it by the strata size, to average the resulting terms across strata using $p_s$ as weights, to divide the resulting estimate by the total sample size and then to take the square root to obtain the estimated standard error.
Note that the estimator of the standard error within each strata can only be formed when there are at least two treated and two untreated observations in each strata; otherwise, the variance of the outcomes for each group in each strata cannot be computed.
```
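The recipe described in the remark above can be sketched as follows (Python with `numpy`, with invented data); under constant strata proportions it boils down to weighting each within-strata variance of the $WW$ comparison by $p_s^2$:

```python
import numpy as np

rng = np.random.default_rng(2)

# Invented stratified sample
strata = np.repeat([0, 1], [40, 60])
R = np.concatenate([np.repeat([1, 0], [10, 30]), np.repeat([1, 0], [30, 30])])
Y = 1.0 + 0.5 * strata + 2.0 * R + rng.normal(size=100)

# Plug-in variance of the fully saturated estimator: within each strata,
# the usual robust variance of a difference in means, weighted by p_s^2
N = len(Y)
var_sat = 0.0
for s in (0, 1):
    m = strata == s
    y1, y0 = Y[m & (R == 1)], Y[m & (R == 0)]
    v_s = y1.var(ddof=1) / len(y1) + y0.var(ddof=1) / len(y0)
    var_sat += (m.sum() / N) ** 2 * v_s
se_sat = np.sqrt(var_sat)
```

Note that each strata needs at least two treated and two untreated units for the `ddof=1` variances to be defined, exactly as stated in the remark.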
```{example}
We have already run this estimator in our numerical example using the `feols` command with the `vcov='HC1'` option.
Here are the results:
```
```{r montecarlobruteforcewwyBStrataEstimSePlot,fig.cap=c('Distribution of estimators under various levels of stratification and estimated sampling noise (dashed lines)'),fig.align='center',out.width=cst,fig.pos='htbp',fig.height=8,fig.width=12}
ggplot(SummaryStrata,aes(x=strata_size,y=MeanCoef,color=Type,shape=Type))+
geom_pointrange(aes(ymin=MeanCoef-1.96*SdCoef,ymax=MeanCoef+1.96*SdCoef,color=Type,shape=Type),position=position_dodge(0.2))+
geom_errorbar(aes(ymin=MeanCoef-1.96*MeanSe,ymax=MeanCoef+1.96*MeanSe,color=Type,shape=Type),position=position_dodge(0.2),linetype='dashed',width=0.2)+
geom_hline(aes(yintercept=ATE),color='red',linetype='dotted') +
expand_limits(y=0)+
xlab('Number of strata')+
theme_bw()
```
As expected, the fully saturated model performs very well at estimating the actual level of sampling noise.
Note that the default OLS estimator of sampling noise completely fails to reflect the precision gains due to stratification.
##### Estimating sampling noise of the strata fixed effects estimator
The strata fixed effects estimator is very similar to the fully saturated estimator, as Theorem \@ref(thm:SFEdecomp) shows.
The former differs only in having the proportion of treated and untreated units in each strata appear in the formula for the weights.
There are two key questions that one can then ask:
1. What is the heteroskedasticity-consistent CLT-based approximation to the distribution of the strata fixed effects estimator?
2. Does this distribution differ from the one of the fully saturated estimator and if yes how?
The following theorem answers the first question:
```{theorem,asympnoiseSFEStrata,name="Asymptotic estimate of sampling noise of the strata fixed effects estimator"}
Under Assumptions \@ref(hyp:fullrankStrata), \@ref(hyp:independenceStrata), \@ref(hyp:iidStrata), \@ref(hyp:finitevarStrata) and \@ref(hyp:cstStrataProportion), the asymptotic distribution of the strata fixed effects estimator can be approximated as follows:
\begin{align*}
\sqrt{N}\left(\hat{\beta}^{SFE}-\Delta^{Y}_{ATE}\right) & \distr \mathcal{N}\left(0,\sum_{s\in\mathcal{S}}\frac{(w^{SFE}_s)^2}{p_s}\left(\frac{\var{Y_i^1|R_i=1,\mathbf{s}(i)=s}}{\Pr(R_i=1|\mathbf{s}(i)=s)}+\frac{\var{Y_i^0|R_i=0,\mathbf{s}(i)=s}}{1-\Pr(R_i=1|\mathbf{s}(i)=s)}\right)\right).
\end{align*}
```
```{proof}
See in Appendix \@ref(proofasympnoiseSFEStrata).
```
The following corollary answers the second question:
```{corollary,asympnoiseSFEStrataCor,name="Equality of sampling noise for the strata fixed effects estimator and the fully saturated estimator"}
Under Assumptions \@ref(hyp:fullrankStrata), \@ref(hyp:independenceStrata), \@ref(hyp:iidStrata), \@ref(hyp:finitevarStrata) and \@ref(hyp:cstStrataProportion), and assuming that the proportion of treated units is equal to $\pi$ in each strata, we have:
\begin{align*}
\sqrt{N}\left(\hat{\beta}^{SFE}-\Delta^{Y}_{ATE}\right) & \distr \mathcal{N}\left(0,\sum_{s\in\mathcal{S}}p_s\left(\frac{\var{Y_i^1|R_i=1,\mathbf{s}(i)=s}}{\Pr(R_i=1)}+\frac{\var{Y_i^0|R_i=0,\mathbf{s}(i)=s}}{1-\Pr(R_i=1)}\right)\right).
\end{align*}
```
```{proof}
If $\bar{R}_s=\pi$, we have $w^{SFE}_s=p_s$ and $\Pr(R_i=1|\mathbf{s}(i)=s)=\Pr(R_i=1)$, under Assumption \@ref(hyp:cstStrataProportion).
Using Theorem \@ref(thm:asympnoiseSFEStrata) proves the result.
```
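The weight algebra in this proof can be confirmed with made-up numbers: with a common treated share $\pi$, the SFE weights collapse to the strata proportions (a minimal Python sketch):

```python
import math

# Two made-up strata with the same treated share pi = 0.25
Ns = [40, 60]
pi = 0.25
raw = [n * pi * (1 - pi) for n in Ns]  # N_s * Rbar_s * (1 - Rbar_s)
w_sfe = [x / sum(raw) for x in raw]    # SFE weights
p_s = [n / sum(Ns) for n in Ns]        # strata proportions N_s / N

# The common factor pi * (1 - pi) cancels, leaving exactly p_s
assert all(math.isclose(a, b) for a, b in zip(w_sfe, p_s))
```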
```{remark}
As a consequence, if the proportion of treated in each strata is roughly constant, both the strata fixed effects estimator and the fully saturated estimator will have similar sampling noise.
```
```{remark}
How can we estimate the sampling noise of the strata fixed effects estimator?
Simply use the heteroskedasticity-robust estimator, for example the `vcov="HC1"` option in `feols`, as we did in the numerical example above.
The estimates in Figure \@ref(fig:montecarlobruteforcewwyBStrataEstimSePlot) suggest that this approach performs very well indeed.
```
##### Estimating sampling noise of the with/without estimator
What is the sampling noise of the WW estimator?
How can we estimate it in practice?
I leave the answers to these questions as an exercise.
The OLS estimator is probably some weighted average of the $\hat{\Delta}^Y_{WW_s}$ parameters, with weights that are going to drive the asymptotic distribution.
Under plausible assumptions (constant strata proportions and a constant proportion of treated), we probably get that its asymptotic distribution should be equal to that of the other two estimators.
This is what Figure \@ref(fig:montecarlobruteforcewwyBStrataEstimSePlot) suggests.
```{remark}
The default (even heteroskedasticity-robust) standard error of the WW estimator estimated by OLS is NOT an adequate measure of sampling noise under stratification, since it does not account for the fact that stratification has reduced sampling noise by capturing part of the variation in potential outcomes.
This appears clearly on Figure \@ref(fig:montecarlobruteforcewwyBStrataEstimSePlot), where the estimated sampling noise of the WW estimator is always equal to the one without stratification.
Ignoring stratification when estimating precision might leave a lot of precision on the table.
```
```{remark}
Figure \@ref(fig:montecarlobruteforcewwyBStrataEstimSePlot) also shows another problem with the WW estimator: its precision starts decreasing when we increase the number of strata above 25. What is happening is that some strata start having no treated or no untreated units.
The SAT and SFE estimators exclude these strata from the estimation (thereby biasing the final estimate), whereas the WW estimator does not exclude them and ends up using an unbalanced, noisier sample.
Note that the standard errors that we used so far do not reflect that risk.
```
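To get a sense of how fast degenerate strata appear, here is a back-of-the-envelope sketch (Python; I assume independent coin-flip assignment with $\pi=0.5$ for simplicity, which is not necessarily the randomization scheme used in the simulations above):

```python
# Probability that a strata of size n ends up with no treated or no untreated
# unit under independent coin-flip assignment with pi = 0.5: it is
# pi^n + (1 - pi)^n, and it explodes as strata get small
pi = 0.5
p_degenerate = {n: pi ** n + (1 - pi) ** n for n in (2, 4, 8, 20)}
# For instance, a strata of 2 units is degenerate half of the time,
# while a strata of 20 units almost never is
```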
#### Estimating sampling noise under varying strata proportions
What happens when we relax Assumption \@ref(hyp:cstStrataProportion), that is when we acknowledge that the proportion of observations in each strata might vary with $N$?
A first reason why this might happen is that, in small samples with small strata, rounding errors might push the actual strata proportions far away from their theoretical values, even with stratified block randomization.
Another reason is that we might use a stratification procedure that does not have the property of keeping strata proportions constant.
A final reason is that we might be delimiting strata in a first step from a given sample, and thus the strata are estimated objects whose true proportions in the population are unknown and around which we can at best form a noisy estimate.
This is for example the case when delimiting strata using $k-$means clustering as we did in our example.
```{example}
In order to illustrate the first source of failure of Assumption \@ref(hyp:cstStrataProportion), let us plot the proportion of each strata in our samples, along with their theoretical values.
```
```{r BuildStrataProportions}
data.strata.prop <- data.strata %>%
mutate(
qs = Ns/N
) %>%
group_by(strata_type,strata_value,strata_type_value) %>%
summarize(
qs = mean(qs)
) %>%
mutate(
strata_type=factor(strata_type,levels=c(paste("strata",c(1,2,5,10,25,50,100),sep='_'))),
Nstrata = as.numeric(str_split_fixed(strata_type,'\\_',2)[,2]),
qsstar = 1/Nstrata
)
```
Let us plot the result:
```{r StrataPropPlot,fig.cap=c('Strata proportions'),fig.align='center',out.width='50%',fig.pos='htbp',fig.height=8,fig.width=12}
ggplot(data.strata.prop %>% filter(strata_type!="strata_1",strata_type!="strata_2"),aes(x=qs))+
geom_histogram(binwidth=0.01,color='black',fill='white')+
geom_vline(aes(xintercept=qsstar),color='red',linetype='dashed')+
xlim(0,0.35)+
theme_bw() +
facet_wrap(.~strata_type)
```
As Figure \@ref(fig:StrataPropPlot) shows, there is substantial variation in a given sample around the theoretical strata proportion.
```{remark}
Note that it is not clear whether we would expect strata proportions to be constant at $\frac{1}{S}$ with strata defined by $k$-means clustering.
A more rigorous check would define strata based on a finite set of threshold values on the support of $\ln y_b$, derive the strata proportions in the population, and then check the actual distribution of the proportions of these strata in each sample.
We expect a substantial amount of variation.
This is left as an exercise.
```
```{remark}