Calculate two-stage difference-in-differences following Gardner (2021)

did2s(
  data,
  yname,
  first_stage,
  second_stage,
  treatment,
  cluster_var,
  weights = NULL,
  bootstrap = FALSE,
  n_bootstraps = 250,
  verbose = TRUE
)

Arguments

data

The dataframe containing all the variables

yname

Outcome variable

first_stage

Fixed effects and other covariates you want to residualize with in first stage. Formula following fixest::feols. Fixed effects specified after "|".

second_stage

Second stage, these should be the treatment indicator(s) (e.g. treatment variable or event-study leads/lags). Formula following fixest::feols. Use i() for factor variables, see fixest::i.

treatment

A variable that = 1 if treated, = 0 otherwise

cluster_var

What variable to cluster standard errors. This can be IDs or a higher aggregate level (state for example)

weights

Optional. Variable name for regression weights.

bootstrap

Optional. Should standard errors be calculated using bootstrap? Default is FALSE.

n_bootstraps

Optional. How many bootstraps to run. Default is 250.

verbose

Optional. Logical. Should information about the two-stage procedure be printed back to the user? Default is TRUE.

Value

fixest object with adjusted standard errors (either by formula or by bootstrap). All the methods from fixest package will work, including fixest::esttable and fixest::coefplot

Examples

Load example dataset which has two treatment groups and homogeneous treatment effects

# Load Example Dataset
data("df_hom")

Static TWFE

You can run a static TWFE fixed effect model for a simple treatment indicator

static <- did2s(df_hom,
    yname = "dep_var", treatment = "treat", cluster_var = "state",
    first_stage = ~ 0 | unit + year,
    second_stage = ~ i(treat, ref=FALSE))
#> 
#> ── Two-stage Difference-in-Differences ──────────────────────────────────────────────────────────────────────────────────────────────────
#> → Running with first stage formula `~ 0 | unit + year` and second stage formula `~ i(treat, ref = FALSE)`
#> → The indicator variable that denotes when treatment is on is `treat`
#> → Standard errors will be clustered by `state`

fixest::esttable(static)
#>                            static
#> Dependent Var.:           dep_var
#>                                  
#> treat = TRUE    2.025*** (0.0307)
#> _______________ _________________
#> S.E. type                  Custom
#> Observations               31,000
#> R2                        0.31846
#> Adj. R2                   0.31846

Event Study

Or you can use relative-treatment indicators to estimate an event study estimate

es <- did2s(df_hom,
    yname = "dep_var", treatment = "treat", cluster_var = "state",
    first_stage = ~ 0 | unit + year,
    second_stage = ~ i(rel_year, ref=c(-1, Inf)))
#> 
#> ── Two-stage Difference-in-Differences ──────────────────────────────────────────────────────────────────────────────────────────────────
#> → Running with first stage formula `~ 0 | unit + year` and second stage formula `~ i(rel_year, ref = c(-1, Inf))`
#> → The indicator variable that denotes when treatment is on is `treat`
#> → Standard errors will be clustered by `state`

fixest::esttable(es)
#>                                es
#> Dependent Var.:           dep_var
#>                                  
#> rel_year = -20   -0.0335 (0.0697)
#> rel_year = -19    0.0581 (0.0588)
#> rel_year = -18    0.0348 (0.0578)
#> rel_year = -17    0.0236 (0.0670)
#> rel_year = -16    0.0115 (0.0542)
#> rel_year = -15   -0.0148 (0.0769)
#> rel_year = -14   0.1150. (0.0613)
#> rel_year = -13   -0.0108 (0.0720)
#> rel_year = -12   -0.0727 (0.0635)
#> rel_year = -11    0.0666 (0.0559)
#> rel_year = -10    0.0396 (0.0382)
#> rel_year = -9    -0.0109 (0.0379)
#> rel_year = -8     0.0105 (0.0388)
#> rel_year = -7    -0.0001 (0.0445)
#> rel_year = -6   -0.0829* (0.0388)
#> rel_year = -5     0.0189 (0.0429)
#> rel_year = -4    -0.0664 (0.0437)
#> rel_year = -3    -0.0144 (0.0302)
#> rel_year = -2     0.0223 (0.0442)
#> rel_year = 0    2.117*** (0.0622)
#> rel_year = 1    1.857*** (0.0720)
#> rel_year = 2    1.986*** (0.0665)
#> rel_year = 3    2.005*** (0.0708)
#> rel_year = 4    1.950*** (0.0748)
#> rel_year = 5    2.038*** (0.0797)
#> rel_year = 6    2.032*** (0.0732)
#> rel_year = 7    2.025*** (0.0858)
#> rel_year = 8    1.976*** (0.0621)
#> rel_year = 9    2.121*** (0.0619)
#> rel_year = 10   2.088*** (0.0913)
#> rel_year = 11   1.943*** (0.1148)
#> rel_year = 12   1.941*** (0.1131)
#> rel_year = 13   1.965*** (0.1076)
#> rel_year = 14   2.023*** (0.0972)
#> rel_year = 15   2.235*** (0.1174)
#> rel_year = 16   2.178*** (0.1282)
#> rel_year = 17   1.936*** (0.1044)
#> rel_year = 18   2.135*** (0.1115)
#> rel_year = 19   2.112*** (0.1038)
#> rel_year = 20   1.925*** (0.1071)
#> _______________ _________________
#> S.E. type                  Custom
#> Observations               31,000
#> R2                        0.31957
#> Adj. R2                   0.31872

# plot rel_year coefficients and standard errors
fixest::coefplot(es, keep = "rel_year::(.*)")

Example from Cheng and Hoekstra (2013)

Here's an example using data from Cheng and Hoekstra (2013)

# Castle Data
castle <- haven::read_dta("https://github.com/scunning1975/mixtape/raw/master/castle.dta")

did2s(
  data = castle,
  yname = "l_homicide",
  first_stage = ~ 0 | sid + year,
  second_stage = ~ i(post, ref=0),
  treatment = "post",
  cluster_var = "state", weights = "popwt"
)
#> 
#> ── Two-stage Difference-in-Differences ──────────────────────────────────────────────────────────────────────────────────────────────────
#> → Running with first stage formula `~ 0 | sid + year` and second stage formula `~ i(post, ref = 0)`
#> → The indicator variable that denotes when treatment is on is `post`
#> → Standard errors will be clustered by `state`
#> OLS estimation, Dep. Var.: l_homicide
#> Observations: 550 
#> Standard-errors: Custom 
#>         Estimate Std. Error t value Pr(>|t|))    
#> post::1 0.075142    0.03538  2.1239  0.034127 *  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> RMSE: 263.4   Adj. R2: 0.052465