The goal of did2s is to estimate TWFE models without running into the problem of staggered treatment adoption.

Installation

You can install did2s from github with:

devtools::install_github("kylebutts/did2s")

Two-stage Difference-in-differences (Gardner 2021)

For details on the methodology, view this vignette

I have created an R package with the help of John Gardner to estimate the two-stage difference-in-differences estimator. To install the package, run the following:

devtools::install_github("kylebutts/did2s")

To view the documentation, type ?did2s into the console.

The main function is did2s which estimates the two-stage did procedure. This function requires the following options:

  • yname: the outcome variable
  • first_stage: formula for first stage, can include fixed effects and covariates, but do not include treatment variable(s)!
  • second_stage: This should be the treatment variable or in the case of event studies, treatment variables.
  • treatment: This has to be the 0/1 treatment variable that marks when treatment turns on for a unit. If you suspect anticipation, see note above for accounting for this.
  • cluster_var: Which variables to cluster on

Optional options:

  • weights: Optional variable to run a weighted first- and second-stage regressions
  • bootstrap: Should standard errors be bootstrapped instead? Default is False.
  • n_bootstraps: How many clustered bootstraps to perform for standard errors. Default is 250.

did2s returns a list with two objects:

  1. fixest estimate for the second stage with corrected standard errors.

TWFE vs. Two-Stage DID Example

I will load example data from the package and plot the average outcome among the groups.

# Automatically loads fixest
library(did2s)
#> Loading required package: fixest
#> From fixest 0.9.0 onward, BREAKING changes! (Permanently remove this message with fixest_startup_msg(FALSE).) 
#> - In i():
#>     + the first two arguments have been swapped! Now it's i(factor_var, continuous_var) for interactions. 
#>     + argument 'drop' has been removed (put everything in 'ref' now).
#> - In feglm(): 
#>     + the default family becomes 'gaussian' to be in line with glm(). Hence, for Poisson estimations, please use fepois() instead.
#> ℹ did2s (v0.4.0). For more information on the methodology, visit <https://www.kylebutts.com/did2s>
#> To cite did2s in publications use:
#> 
#>   Butts, Kyle (2021).  did2s: Two-Stage Difference-in-Differences
#>   Following Gardner (2021). R package version 0.4.0.
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Manual{,
#>     title = {did2s: Two-Stage Difference-in-Differences Following Gardner (2021)},
#>     author = {Kyle Butts},
#>     year = {2021},
#>     url = {https://www.github.com/kylebutts/did2s/},
#>   }

# Load Data from R package
data("df_het", package = "did2s")

Here is a plot of the average outcome variable for each of the groups:

# Mean for treatment group-year
agg <- aggregate(df_het$dep_var, by=list(g = df_het$g, year = df_het$year), FUN = mean)

agg$g <- as.character(agg$g)
agg$g <- ifelse(agg$g == "0", "Never Treated", agg$g)

never <- agg[agg$g == "Never Treated", ]
g1 <- agg[agg$g == "2000", ]
g2 <- agg[agg$g == "2010", ]


plot(0, 0, xlim = c(1990,2020), ylim = c(4,7.2), type = "n",
     main = "Data-generating Process", ylab = "Outcome", xlab = "Year")
abline(v = c(1999.5, 2009.5), lty = 2)
lines(never$year, never$x, col = "#8e549f", type = "b", pch = 15)
lines(g1$year, g1$x, col = "#497eb3", type = "b", pch = 17)
lines(g2$year, g2$x, col = "#d2382c", type = "b", pch = 16)
legend(x=1990, y=7.1, col = c("#8e549f", "#497eb3", "#d2382c"), 
       pch = c(15, 17, 16),
       legend = c("Never Treated", "2000", "2010"))
Example data with heterogeneous treatment effects

Example data with heterogeneous treatment effects

Estimate Two-stage Difference-in-Differences

First, lets estimate a static did. There are two things to note here. First, note that I can use fixest::feols formula including the | for specifying fixed effects and fixest::i for improved factor variable support. Second, note that did2s returns a fixest estimate object, so fixest::esttable, fixest::coefplot, and fixest::iplot all work as expected.

# Static
static <- did2s(df_het, 
                yname = "dep_var", first_stage = ~ 0 | state + year, 
                second_stage = ~i(treat, ref=FALSE), treatment = "treat", 
                cluster_var = "state")
#> 
#> ── Two-stage Difference-in-Differences ─────────────────────────────────────────
#> → Running with first stage formula `~ 0 | state + year` and second stage formula `~ i(treat, ref = FALSE)`
#> → The indicator variable that denotes when treatment is on is `treat`
#> → Standard errors will be clustered by `state`

fixest::esttable(static)
#>                            static
#> Dependent Var.:           dep_var
#>                                  
#> treat = TRUE    2.203*** (0.0559)
#> _______________ _________________
#> S.E. type                  Custom
#> Observations               31,000
#> R2                        0.26097
#> Adj. R2                   0.26097

This is very close to the true treatment effect of ~2.23.

Then, let’s estimate an event study did. Note that relative year has a value of Inf for never treated, so I put this as a reference in the second stage formula.

# Event Study
es <- did2s(df_het,
            yname = "dep_var", first_stage = ~ 0 | state + year, 
            second_stage = ~i(rel_year, ref=c(-1, Inf)), treatment = "treat", 
            cluster_var = "state")
#> 
#> ── Two-stage Difference-in-Differences ─────────────────────────────────────────
#> → Running with first stage formula `~ 0 | state + year` and second stage formula `~ i(rel_year, ref = c(-1, Inf))`
#> → The indicator variable that denotes when treatment is on is `treat`
#> → Standard errors will be clustered by `state`

And plot the results:

fixest::iplot(es, main = "Event study: Staggered treatment", xlab = "Relative time to treatment", col = "steelblue", ref.line = -0.5)

# Add the (mean) true effects
true_effects = head(tapply((df_het$te + df_het$te_dynamic), df_het$rel_year, mean), -1)
points(-20:20, true_effects, pch = 20, col = "black")

# Legend
legend(x=-20, y=3, col = c("steelblue", "black"), pch = c(20, 20), 
       legend = c("Two-stage estimate", "True effect"))
Event-study plot with example data

Event-study plot with example data

Comparison to TWFE

twfe = feols(dep_var ~ i(rel_year, ref=c(-1, Inf)) | unit + year, data = df_het) 

fixest::iplot(list(es, twfe), sep = 0.2, ref.line = -0.5,
      col = c("steelblue", "#82b446"), pt.pch = c(20, 18), 
      xlab = "Relative time to treatment", 
      main = "Event study: Staggered treatment (comparison)")


# Legend
legend(x=-20, y=3, col = c("steelblue", "#82b446"), pch = c(20, 18), 
       legend = c("Two-stage estimate", "TWFE"))

Citation

If you use this package to produce scientific or commercial publications, please cite according to:

citation(package = "did2s")
#> 
#> To cite did2s in publications use:
#> 
#>   Butts, Kyle (2021).  did2s: Two-Stage Difference-in-Differences
#>   Following Gardner (2021). R package version 0.4.0.
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Manual{,
#>     title = {did2s: Two-Stage Difference-in-Differences Following Gardner (2021)},
#>     author = {Kyle Butts},
#>     year = {2021},
#>     url = {https://www.github.com/kylebutts/did2s/},
#>   }

References

Gardner, John. 2021. “Two-Stage Difference-in-Differences.” Working Paper. https://jrgcmu.github.io/2sdd_current.pdf.