--- title: "rdhte: Empirical Illustration" output: rmarkdown::html_vignette: toc: true toc_depth: 3 vignette: > %\VignetteIndexEntry{rdhte-illustration} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 6, fig.height = 4, dpi = 90, message = FALSE, warning = FALSE ) ``` ## Overview `rdhte` estimates **heterogeneous treatment effects** in sharp Regression Discontinuity (RD) designs along one or more pretreatment covariates. The package exposes three main commands: - `rdhte()` -- point estimation and robust bias-corrected inference of group-conditional RD effects. - `rdbwhte()` -- automatic per-cell bandwidth selection. - `rdhte_lincom()` -- linear-combination tests over the estimated CATEs. This vignette illustrates each feature on a real-data extract from Granzier, Pons, and Tricaud (2023, *American Economic Journal: Applied*), bundled with the package as `rdhte_dataset`. ### The data and the design Granzier, Pons, and Tricaud study **coordination and bandwagon effects in French two-round elections**, where candidates must clear a qualifying-vote threshold in the first round to advance to the runoff. The institutional rule creates a sharp regression-discontinuity design on every candidate's first-round margin against that threshold: candidates just above the cutoff advance, those just below do not, and small differences in first-round support produce a discrete change in runoff participation. The authors use the design to ask whether being just-qualified causally affects subsequent voter and candidate behavior, and document substantive heterogeneity across ideology and candidate-strength dimensions. The bundled extract has 39,534 candidate-race observations with the following variables: - `y` -- 0/1 indicator for advancing to the runoff (the outcome). - `x` -- first-round margin against the qualifying threshold (the running variable; cutoff at zero). - `cluster_var` -- district identifier, used for cluster-robust inference. - `w_left` -- 0/1 indicator for left-of-center parties. - `w_ideology` -- unordered party-ideology bucket (4 categories). - `w_strength` -- continuous proxy for ex-ante candidate strength (average prior national-level vote share). - `w_strong` -- 0/1 indicator for above-median strength. - `w_strength_qrt` -- ordered quartile of `w_strength` (4 levels). These covariates support every covariate-incorporation pattern `rdhte` exposes: binary cells, multi-level (unordered and ordered) factor cells, factor-by-factor interactions, continuous slopes, and binary x continuous interactions. ```{r setup} library(rdhte) library(rdrobust) library(broom) data(rdhte_dataset) attach(rdhte_dataset) ``` ## Single binary variable When `covs.hte` is binary (or a factor with two levels), `rdhte` reports one CATE per cell. ```{r binary} summary(rd_left <- rdhte(y = y, x = x, covs.hte = factor(w_left), cluster = cluster_var)) ``` How large is the advantage for left-of-center candidates? Test the difference with `rdhte_lincom`: ```{r binary-lincom} rdhte_lincom(model = rd_left, linfct = c("`factor(w_left)1` - `factor(w_left)0` = 0")) ``` Forcing a common bandwidth across cells (`bw.joint = TRUE`) can distort the cell-specific inference -- here it makes the right-of-center effect (incorrectly) statistically significant: ```{r binary-bwjoint} summary(rdhte(y = y, x = x, covs.hte = w_left, cluster = cluster_var, bw.joint = TRUE)) ``` A 0/1 numeric variable is auto-promoted to a factor; coefficient names then carry the variable name as a prefix: ```{r binary-bare} summary(rd_left2 <- rdhte(y = y, x = x, covs.hte = w_left, cluster = cluster_var)) rdhte_lincom(model = rd_left2, linfct = c("`w_left1` - `w_left0` = 0")) ``` ## Single categorical variable -- unordered A multi-level factor produces one CATE per level. The reference category is the factor's first level. ```{r categorical-unordered} summary(rd_ideology <- rdhte(y = y, x = x, covs.hte = factor(w_ideology), cluster = cluster_var)) ``` Joint test that the three non-reference ideology cells are all indistinguishable from zero: ```{r categorical-lincom} rdhte_lincom(model = rd_ideology, linfct = c("`factor(w_ideology)2` = 0", "`factor(w_ideology)3` = 0", "`factor(w_ideology)4` = 0")) ``` ## Single categorical variable -- ordered For an ordered categorical variable, wrapping in `factor()` keeps the per-level CATE interpretation: ```{r categorical-ordered} summary(rdhte(y = y, x = x, covs.hte = factor(w_strength_qrt), cluster = cluster_var)) ``` The per-quartile CATE increases monotonically: candidates with higher average strength have larger treatment effects. A bare numeric variable (no `factor()`) is treated as continuous -- useful when the ordering is meaningful but you want the linear-in-W parametrization: ```{r categorical-as-continuous} summary(rdhte(y = y, x = x, covs.hte = w_strength_qrt, cluster = cluster_var)) ``` ## Two binary variables -- interaction Interactions of two factors produce one CATE per joint cell. The ordering of cells follows R's `interaction()` convention. ```{r factor-by-factor} summary(rdhte(y = y, x = x, covs.hte = factor(w_left):factor(w_strong), cluster = cluster_var)) ``` The same model can be expressed by pre-building the joint cell variable: ```{r factor-by-factor-alt} interactions <- 1*(w_left==0)*(w_strong==1) + 2*(w_left==0)*(w_strong==2) + 3*(w_left==1)*(w_strong==1) + 4*(w_left==1)*(w_strong==2) summary(rdhte(y = y, x = x, covs.hte = factor(interactions), cluster = cluster_var)) ``` ## Single continuous variable When `covs.hte` is continuous, `rdhte` switches to a linear-in-W parametrization of the CATE function: \[ \tau(w) = \beta_T + \beta_{T:W}\,w. \] ```{r continuous} summary(rd_continuous <- rdhte(y = y, x = x, covs.hte = w_strength, kernel = "uni", cluster = cluster_var)) ``` To aid interpretation, the slope coefficient is precisely a partial-linear regression slope. With the **uniform kernel** and a **fixed bandwidth**, the rdhte point estimates match plain `lm()` on the same in-bandwidth subset: ```{r continuous-vs-lm} trt <- (x > 0) new.data <- data.frame(y, x, w_strength, trt) using.lm <- coef(lm(y ~ trt * x * w_strength, data = new.data[abs(new.data$x) < rd_continuous$h[1], ])) rd_continuous$Estimate[1] - using.lm["trtTRUE"] rd_continuous$Estimate[2] - using.lm["trtTRUE:w_strength"] ``` (Inference, however, requires the robust bias correction in `rdhte` and cannot be obtained from the `lm()` fit alone.) ## Interaction effect: binary x continuous A fully interacted specification gives a separate intercept and slope for each level of the categorical covariate. ```{r binary-by-continuous} summary(rd_interaction <- rdhte(y = y, x = x, covs.hte = "w_left*w_strength", cluster = cluster_var)) ``` Wrapping the binary in `factor()` is also allowed but shifts the baseline category and therefore the reported intercepts: ```{r binary-by-continuous-factor} summary(rdhte(y = y, x = x, covs.hte = "factor(w_left)*w_strength", cluster = cluster_var)) ``` Each cell-specific coefficient may look insignificant individually while the joint test still rejects: ```{r binary-by-continuous-joint} rdhte_lincom(model = rd_interaction, linfct = c("`T` = 0", "`T:w_left` = 0", "`T:w_strength` = 0", "`T:w_left:w_strength` = 0")) ``` With the bandwidth fixed, the fully interacted model matches results from category-specific estimation. Use `rdhte_lincom` to read off the per-cell intercept and slope: ```{r binary-by-continuous-fixedbw} summary(rd_interaction <- rdhte(y = y, x = x, covs.hte = "w_left*w_strength", h = 0.1, cluster = cluster_var)) summary(rdhte(y = y[w_left == 0], x = x[w_left == 0], covs.hte = w_strength[w_left == 0], h = 0.1, cluster = cluster_var[w_left == 0])) summary(rdhte(y = y[w_left == 1], x = x[w_left == 1], covs.hte = w_strength[w_left == 1], h = 0.1, cluster = cluster_var[w_left == 1])) rdhte_lincom(model = rd_interaction, linfct = c("`T` + `T:w_left` = 0", "`T:w_strength` + `T:w_left:w_strength` = 0")) ``` ## Standalone bandwidth selection (`rdbwhte`) `rdbwhte` runs the same data-driven bandwidth selectors as `rdhte` but returns the bandwidths without estimating the CATEs. Useful when you want to inspect or fix `h` and then explore alternative variance estimators or compare specifications at a common bandwidth. ```{r rdbwhte} bw <- rdbwhte(y = y, x = x, covs.hte = factor(w_ideology), cluster = cluster_var) bw$h ``` `bw.joint = TRUE` forces a single shared bandwidth across cells: ```{r rdbwhte-joint} bw_joint <- rdbwhte(y = y, x = x, covs.hte = factor(w_ideology), cluster = cluster_var, bw.joint = TRUE) bw_joint$h ``` ## Efficiency-improving covariates (`covs.eff`) `covs.eff` adds covariates to the local-polynomial regression to **shrink standard errors** without changing the identification of the CATE. They enter additively (and as `covs.eff x W` interactions in the heterogeneity-aware paths) but never with the treatment indicator or the running-variable polynomial. Useful when you have strong pretreatment predictors of the outcome. ```{r covs-eff} m_no_eff <- rdhte(y = y, x = x, covs.hte = factor(w_ideology), cluster = cluster_var) m_eff <- rdhte(y = y, x = x, covs.hte = factor(w_ideology), covs.eff = w_strength, cluster = cluster_var) data.frame(cell = m_no_eff$W.lev, se_no_eff = round(m_no_eff$se.rb, 4), se_eff = round(m_eff$se.rb, 4), pct_change = round(100 * (m_eff$se.rb - m_no_eff$se.rb) / m_no_eff$se.rb, 1)) ``` ## Plotting `plot()` is a post-estimation method for categorical `covs.hte`: one point per cell at the conventional point estimate, with the robust bias-corrected CI as an error bar. ```{r plot-basic, fig.alt="rdhte forest plot, ideology buckets"} plot(rd_ideology) ``` `sort = TRUE` reorders cells along the x-axis by their point estimate -- often the more useful default for ranking-style narratives: ```{r plot-sort, fig.alt="sorted forest plot"} plot(rd_ideology, sort = TRUE) ``` The returned object is a `ggplot`, so additional layers compose naturally. For example, a custom title with horizontal orientation: ```{r plot-custom, fig.alt="custom-themed forest plot"} p <- plot(rd_ideology, sort = TRUE, title = "Heterogeneity by ideology bucket", ylab = "Sharp RD ITT") if (requireNamespace("ggplot2", quietly = TRUE)) { p + ggplot2::theme_minimal(base_size = 11) + ggplot2::coord_flip() } ``` ## Building publication-ready tables `rdhte` exposes the per-cell results through `tidy()` and a one-row summary through `glance()` (both broom-compatible). Combined with `knitr::kable()` / `xtable::xtable()` the same numbers can be turned into Markdown, HTML, or LaTeX tables for papers. ### (A) per-cell table from a single call `tidy()` returns one row per cell with the conventional point estimate, robust BC standard error, z / p-value, BC confidence interval, and per-side `h` and `Nh`. Renaming or formatting columns before `kable()` gives a clean publication panel: ```{r table-tidy} tab_A <- tidy(rd_ideology) tab_A ``` ```{r table-A-kable} knitr::kable( tab_A[, c("term", "estimate", "std.error", "conf.low", "conf.high", "n.eff.left", "n.eff.right", "h.left", "h.right")], digits = c(NA, 3, 3, 3, 3, 0, 0, 3, 3), col.names = c("Cell", "Estimate", "SE", "CI lo", "CI hi", "Nh-", "Nh+", "h-", "h+"), caption = "Per-cell RD effects by ideology bucket" ) ``` `glance()` complements `tidy()` with a one-row description of the fit (N, polynomial orders, kernel, VCE, bandwidth selector): ```{r table-glance} glance(rd_ideology) ``` ### (B) cell x specification comparison Fix the `covs.hte` argument and vary `vce` across columns to see how the choice of variance estimator moves the per-cell standard errors. ```{r table-spec-compare} specs <- list(HC0 = list(vce = "hc0"), HC2 = list(vce = "hc2"), HC3 = list(vce = "hc3"), CR1 = list(vce = "cr1", cluster = cluster_var)) fit_one <- function(args) { args <- c(list(y = y, x = x, covs.hte = factor(w_ideology)), args) do.call(rdhte, args) } cells <- sort(unique(w_ideology)) mat_pt <- sapply(specs, function(s) tidy(fit_one(s))$estimate) mat_se <- sapply(specs, function(s) tidy(fit_one(s))$std.error) rownames(mat_pt) <- rownames(mat_se) <- as.character(cells) mat_pt mat_se ``` A common publication layout interleaves the point estimate and its SE in parentheses below it: ```{r table-spec-stacked} n_cells <- nrow(mat_pt) out <- data.frame(matrix(NA_character_, nrow = 2 * n_cells, ncol = ncol(mat_pt))) for (i in seq_len(n_cells)) { out[2 * i - 1, ] <- sprintf("%.3f", mat_pt[i, ]) out[2 * i, ] <- sprintf("(%.3f)", mat_se[i, ]) } out <- cbind(Cell = rep(rownames(mat_pt), each = 2), out) out$Cell[seq(2, nrow(out), by = 2)] <- "" colnames(out)[-1] <- colnames(mat_pt) knitr::kable(out, caption = "Per-cell estimates by variance option (SE in parentheses)") ``` ### (C) LaTeX export `xtable` or `kableExtra` can render the same data frame as LaTeX. The snippet below produces a paper-ready table; uncomment the `writeLines()` call to write to a file: ```{r table-latex, eval = requireNamespace("xtable", quietly = TRUE)} tex <- xtable::xtable( tab_A[, c("term", "estimate", "std.error", "conf.low", "conf.high")], digits = c(0, 0, 3, 3, 3, 3), caption = "Per-cell RD effects by ideology bucket", label = "tab:rdhte-ideology" ) print(tex, include.rownames = FALSE, booktabs = TRUE, caption.placement = "top") # writeLines(capture.output(print(tex, include.rownames = FALSE, # booktabs = TRUE, # caption.placement = "top")), # con = "rdhte_table_A.tex") ``` ## Replicating rdrobust Average effects from `rdhte()` and `rdrobust()` will not match under default settings, because the two packages use different defaults for `rho` and `vce`: ```{r rdrobust-defaults} summary(rdhte(y = y, x = x)) summary(rdrobust(y = y, x = x)) ``` To replicate exactly with `rdrobust`, set `rho = 1`, choose a matching `vce`, and enforce the same bandwidth: ```{r rdrobust-match-ate} summary(rdhte(y = y, x = x, h = 0.1, vce = "hc3")) summary(rdrobust(y = y, x = x, h = 0.1, rho = 1, vce = "hc3")) ``` The same trick reproduces the per-cell `rdrobust` estimates from a single `rdhte` call with per-cell bandwidths: ```{r rdrobust-match-cells} summary(rdhte(y = y, x = x, covs.hte = w_left, h = c(0.078, 0.116), vce = "hc3")) summary(rdrobust(y = y[w_left == 1], x = x[w_left == 1], h = 0.116, rho = 1, vce = "hc3")) ``` ```{r cleanup, include = FALSE} detach(rdhte_dataset) ``` ## See also - `rdhte()`, `rdbwhte()`, `rdhte_lincom()`, `plot.rdhte()`. - The companion Stata package ships an `rdhte_plot` command that mirrors the R `plot()` method. ## Reference Calonico, S., Cattaneo, M.D., Farrell, M.H., Palomba, F., and Titiunik, R. (2025). *Treatment Effect Heterogeneity in Regression Discontinuity Designs.* [arXiv:2503.13696](https://arxiv.org/abs/2503.13696). Granzier, R., Pons, V., and Tricaud, C. (2023). *Coordination and Bandwagon Effects: How Past Rankings Shape the Behavior of Voters and Candidates.* **American Economic Journal: Applied Economics** 15(4): 177-217.