Visualization

Example 1

Let’s imagine an experiment that is modeled on the following data generating process:

\[ Y_i(0) = 8 * W1_i + 6 * W2_i + 4 * W3_i + \epsilon_i \]

\[ \epsilon_i \sim \mathcal{N}(0, 15) \]

\[ W1_i, W2_i, W3_i \sim \mathcal{N}(0, 1) \]

\[ Y_i(1) = Y_i(0) + 20 \]

So we have three covariates – \(w1\), \(w2\), and \(w3\) – that all predict the outcome \(y\) but which (thanks to randomization) are uncorrelated with the treatment \(tx\).

Let’s simulate this data generating process (N = 500) and run a single experiment:

library(tidyverse)   # tibble(), mutate(), the pipe, and ggplot2
library(randomizr)   # complete_ra() for random assignment

N <- 500
set.seed(1)

w1 <- rnorm(N)
w2 <- rnorm(N)
w3 <- rnorm(N)


dta <- tibble(w1,w2,w3) |> 
  mutate(y0 = 8*w1 + 6*w2 + 4*w3 + rnorm(N,0,15),
         y1 = y0 + 20)

exp <- dta |> 
  mutate(tx = complete_ra(N, prob=0.5),
         y = ifelse(tx == 1, y1, y0))
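
Because assignment is randomized, the covariates should be (approximately) balanced across the two arms. Here’s a quick, optional check comparing covariate means by treatment status:

# covariate means by treatment arm; with randomization these
# should be close to each other (and to zero)
exp |> 
  group_by(tx) |> 
  summarise(across(c(w1, w2, w3), mean))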

Now let’s estimate the model with and without covariates:

library(modelsummary)
library(broom)

m1 <- lm(y ~ tx, data=exp)
m2 <- lm(y ~ tx + w1 + w2 + w3, data=exp)

models <- list(m1,m2)

modelsummary(models, gof_map = c("nobs"))
                 (1)       (2)
(Intercept)    −0.544    −0.609
               (1.252)   (1.017)
tx             19.989    20.347
               (1.771)   (1.443)
w1                        8.916
                         (0.713)
w2                        6.608
                         (0.685)
w3                        3.973
                         (0.716)
Num.Obs.       500       500

Understanding check and discussion:

Does it surprise you that the coefficient on \(tx\) didn’t really change between models 1 and 2?

Does it surprise you that the standard error on \(tx\) shrank between models 1 and 2?

Why might you want to report the coefficients on covariates?

Exercise:
  1. Make a prettier table.

  2. Write up these results as you would in your BA thesis.

# modify the modelsummary code

My answer here.
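
For reference, here is one possible starting point (a sketch only, reusing the models list above; the coefficient labels and title are arbitrary choices):

# a sketch of a prettier table using modelsummary's
# coef_map, stars, and title arguments
modelsummary(models,
             coef_map = c('tx' = 'Treatment',
                          'w1' = 'W1', 'w2' = 'W2', 'w3' = 'W3',
                          '(Intercept)' = 'Intercept'),
             gof_map = c("nobs"),
             stars = TRUE,
             title = "Experimental results with and without covariate adjustment")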


Coefficient Plots

Rather than reporting tables, it’s now becoming more standard to report graphical results in the main text (and put the tables in the appendix).

Here’s how to make a coefficient plot:

# reordering the models in the list
models <- list(m2, m1)

# making the y-axis labels
# start with the lowest one first
ylabs <- c('(Intercept)' = 'Intercept',
           'w3' = 'W3',
           'w2' = 'W2',
           'w1' = 'W1',
           'tx' = 'Treatment'
           )

# plotting
modelplot(models, coef_map = ylabs) +
  geom_vline(xintercept = 0,
             color="black",
             lty="dashed") +
  labs(x = "Coefficient Estimates", 
       y = "",
       color = "") +
  scale_color_discrete(labels=c("Model2", "Model1"),
                       guide = guide_legend(reverse = TRUE)) +
  theme_minimal() +
  theme(legend.position = "inside",
        legend.position.inside = c(0.85, 0.15))
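
If you want to drop a figure like this into your thesis, you can save the most recent plot with ggsave(); the filename and dimensions below are arbitrary choices:

# save the last plot shown; adjust filename and size to taste
ggsave("coefplot.png", width = 7, height = 4, dpi = 300)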


Example 2

Suppose we run a simple experiment with a treatment and control group, but where we measure multiple outcome variables (\(A, B, C, D\)).

For simplicity, let’s draw each outcome from a normal distribution with mean = 0 and sd = 1.

The treatment effects are:

  • \(\tau_A = 0.5\)
  • \(\tau_B = 0\)
  • \(\tau_C = 0.9\)
  • \(\tau_D = -0.3\)

N <- 500
set.seed(1)

a0 <- rnorm(N)
b0 <- rnorm(N)
c0 <- rnorm(N)
d0 <- rnorm(N)


dta <- tibble(a0,b0,c0,d0) |> 
  mutate(a1 = a0 + 0.5,
         b1 = b0,
         c1 = c0 + 0.9,
         d1 = d0 - 0.3)

exp <- dta |> 
  mutate(tx = complete_ra(N, prob=0.5),
         a = ifelse(tx == 1, a1, a0),
         b = ifelse(tx == 1, b1, b0),
         c = ifelse(tx == 1, c1, c0),
         d = ifelse(tx == 1, d1, d0))
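
Before running any regressions, it can be useful to eyeball the raw means by arm; with this design, the treated-minus-control differences should land near the true effects. An optional check:

# outcome means by treatment arm
exp |> 
  group_by(tx) |> 
  summarise(across(c(a, b, c, d), mean))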

Now let’s estimate the models:

ma <- lm(a ~ tx, data=exp)
mb <- lm(b ~ tx, data=exp)
mc <- lm(c ~ tx, data=exp)
md <- lm(d ~ tx, data=exp)

#  modelsummary headings
models <- list(
  "Outcome A" = ma,
  "Outcome B" = mb,
  "Outcome C" = mc,
  "Outcome D" = md
)

# variable names
ylabs <- c(
  'tx' = 'Treatment Effect',
  '(Intercept)' = 'Mean in the Control Group'
)

# showing the regression output in a single table
modelsummary(models,
             coef_map = ylabs,
             gof_map = c("nobs"))
                             Outcome A   Outcome B   Outcome C   Outcome D
Treatment Effect              0.581      −0.147       0.872      −0.278
                             (0.091)     (0.094)     (0.090)     (0.096)
Mean in the Control Group    −0.018       0.027       0.011      −0.041
                             (0.064)     (0.067)     (0.064)     (0.068)
Num.Obs.                      500         500         500         500

We can again use modelplot(), but it looks quite ugly:

# note: this omits the intercept from the coefplot.
modelplot(models,
          coef_map = c(tx = "Treatment Effect"))

Instead, we can create something “by hand”: a plot with one point (and confidence interval) per outcome. The target figure is reproduced in the solution further down.

Exercise:

Can you write up some code to create that plot?

Hint: you’ll need to grab the coefficients and confidence intervals from each of the 4 models using broom::tidy(), store them together as a separate dataframe, and then use ggplot() on that.

For the errorbars, you can use geom_errorbar().


Coefplot “by hand”

So here’s how I would do it:

# grabbing the results I want to plot from the separate models
ma_tidy <- tidy(ma, conf.int = TRUE) |> 
  mutate(outcome = "A")

mb_tidy <- tidy(mb, conf.int = TRUE) |> 
  mutate(outcome = "B")

mc_tidy <- tidy(mc, conf.int = TRUE) |> 
  mutate(outcome = "C")

md_tidy <- tidy(md, conf.int = TRUE) |> 
  mutate(outcome = "D")

# appending all of the results and dropping the intercept rows
toplot <- rbind(ma_tidy, mb_tidy, mc_tidy, md_tidy) |> 
  filter(term == "tx")

# plotting
ggplot(data=toplot, aes(x=outcome, color = outcome)) +
  geom_point(aes(y = estimate),
             size = 3) +
  geom_errorbar(aes(ymax = conf.high,
                    ymin = conf.low),
                width = 0) +
  geom_hline(aes(yintercept = 0),
             color = "red",
             linetype = "dashed") +
  theme_minimal() +
  labs(y = "Treatment Effect Estimate",
       x = "Outcomes") +
  coord_cartesian(ylim = c(-0.5, 1.2)) +
  theme(legend.position="none")
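
As an aside, you can avoid the copy-paste by mapping over the named list of models. A sketch using purrr::map_dfr (this assumes the named models list built for the table above):

library(purrr)

# tidy all four models at once; .id stores the list names
toplot <- map_dfr(models, ~ tidy(.x, conf.int = TRUE), .id = "outcome") |> 
  filter(term == "tx") |> 
  mutate(outcome = sub("Outcome ", "", outcome))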


Example 3

Suppose we run an experiment with three different experimental arms.

And suppose our outcome variable is an agree/disagree question coded on a 5-point Likert scale.

To keep our lives simple, let’s just model the data generating process as a series of coin tosses.

More specifically, people in the first treatment arm (tx == 1) are given the first coin, people in the second treatment arm (tx == 2) are given the second coin, and people in the third treatment arm (tx == 3) are given the third coin.

Each person tosses “their” assigned coin 4 times and records the number of heads, giving a score from 0 to 4 (our 5-point scale).

Coin 1 is completely fair, coin 2 lands heads with probability 0.25, and coin 3 lands heads with probability 0.75.

Let’s generate the data:

N <- 300

y1 <- rbinom(N, 4, prob = 0.5)
y2 <- rbinom(N, 4, prob = 0.25)
y3 <- rbinom(N, 4, prob = 0.75)

dta <- tibble(y1,y2,y3) 

exp <- dta |> 
  mutate(tx = complete_ra(N, num_arms=3),
         y = case_when(
           tx == "T1" ~ y1,
           tx == "T2" ~ y2,
           tx == "T3" ~ y3
         ))

Let’s look at the cross tab of treatments vs. outcomes:

table(exp$tx, exp$y)
    
      0  1  2  3  4
  T1  4 17 45 28  6
  T2 39 35 22  3  1
  T3  0  4 24 51 21

So that kind of looks like data produced by a Likert-scale item!

More specifically, T2 seems to decrease “agreement” on the item, while T3 seems to increase “agreement.”
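
If shares are easier to read than counts, you can convert the cross tab to row proportions (an optional aside using base R’s prop.table):

# share of each response within each arm (rows sum to 1)
round(prop.table(table(exp$tx, exp$y), margin = 1), 2)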

How should we analyze this data?

We can just run a regression:

mod <- lm(y ~ tx, data=exp)

summary(mod)

Call:
lm(formula = y ~ tx, data = exp)

Residuals:
   Min     1Q Median     3Q    Max 
 -2.15  -0.89   0.08   0.85   3.08 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  2.15000    0.08683  24.762  < 2e-16 ***
txT2        -1.23000    0.12279 -10.017  < 2e-16 ***
txT3         0.74000    0.12279   6.027 4.95e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.8683 on 297 degrees of freedom
Multiple R-squared:  0.4694,    Adjusted R-squared:  0.4658 
F-statistic: 131.4 on 2 and 297 DF,  p-value: < 2.2e-16

Understanding check:

How would you interpret these coefficients?

Write up your “results” as you would in your BA thesis.
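
A hint for the interpretation: compare the coefficients to the raw means of y in each arm (the intercept and the arm means are closely related):

# mean of y within each treatment arm
exp |> 
  group_by(tx) |> 
  summarise(mean_y = mean(y))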

Exercise:

Now suppose you wanted to visualize these results in a graph.

Draw a “mock figure” using paper and pencil.


Plotting the raw data and conditional expectations

When you’re simply comparing treatment groups, it’s a good idea to let your reader see the raw data. Here’s what that looks like:

library(ggforce)

ggplot(exp, aes(x=tx, y=y, color=tx)) +
  geom_point(position = position_jitternormal(sd_x = 0.05, sd_y = 0.1),
             alpha=0.5)

We can also show the conditional expectation of Y within each experimental arm:

\[ \mathbb{E}[Y_i | T_i = t] \]

That’s just a fancy way of saying “the mean of Y in T1, T2 and T3”.

We also want to show the standard error of these (estimated) means.

To estimate the mean of Y (and its associated standard error) within a single sample, you just regress Y on a constant:

\[ Y_i = \beta_0 + \epsilon_i \]
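
For a single arm, that regression looks like this (a quick illustration for T1 only):

# mean of y (plus its standard error and CI) in arm T1 alone
tidy(lm(y ~ 1, data = filter(exp, tx == "T1")), conf.int = TRUE)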

So we’ll now just do that three times, once per treatment arm. We can do this using the group_by() function:

cond_means <- exp |> 
  group_by(tx) |> 
  # estimating the mean (and its standard error / confidence interval) in each arm
  # by regressing y on a constant; do() passes each group's data to lm() as .
  do(tidy(lm(y ~ 1, data = .), conf.int=TRUE)) 
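
Note that do() is superseded in recent versions of dplyr; an equivalent, more current idiom (a sketch that produces the same result) uses group_modify():

# the same conditional means via group_modify(); .x is each group's data
cond_means <- exp |> 
  group_by(tx) |> 
  group_modify(~ tidy(lm(y ~ 1, data = .x), conf.int = TRUE))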

Here’s what those conditional means look like:

ggplot(cond_means, aes(x=tx, y=estimate, color=tx)) +
  geom_point() +
  geom_errorbar(aes(ymin=conf.low,
                    ymax=conf.high),
                width = 0)

Now we just have to put both graphs together:

ggplot(mapping=aes(x=tx, color=tx)) +
  geom_point(data=exp,
             aes(y=y),
             position = position_jitternormal(sd_x = 0.05, sd_y = 0.1),
             alpha=0.3,
             size=0.5) +
  geom_point(data=cond_means,
             aes(y=estimate),
             size=3) +
  geom_errorbar(data=cond_means,
                aes(ymin=conf.low,
                    ymax=conf.high),
                width = 0) +
  theme_minimal() +
  scale_y_continuous(breaks = 0:4,  # pin the breaks so the five labels match
                     labels = c("Totally Disagree",
                                "Disagree",
                                "Neutral",
                                "Agree",
                                "Totally Agree")) +
  labs(x = "Treatment Arm",
       y = "") +
  theme(legend.position="none")

Here, although we didn’t plot the regression coefficients, it’s clear that the three groups are statistically different from each other (i.e. the confidence intervals don’t overlap).


Example 4

Finally, let’s try an example where the treatment effect differs across subgroups of the population.

Suppose we run an experiment in two neighborhoods. In neighborhood 1, the treatment effect is positive, but it is negative in neighborhood 2:

\[ Y_i(0) \sim \mathcal{N}(0,1) \]

\[ Y_i(1) = \begin{cases} Y_i(0) + 1 & \text{if neighborhood == 1} \\ Y_i(0) - 1 & \text{if neighborhood == 2} \\ \end{cases} \]

So let’s generate the data:

set.seed(1)
N <- 200

nhood <- sample(c("N1","N2"), N, replace=TRUE)
y0 <- rnorm(N,0,1)

dta <- tibble(nhood, y0) |> 
  mutate(y1 = ifelse(nhood == "N1", y0+1, y0-1))

exp <- dta |> 
  mutate(tx = complete_ra(N, prob=0.5),
         y = ifelse(tx==1, y1, y0))

Let’s estimate the following model:

\[ Y_i = \beta_0 + \beta_1 * Tx_i + \beta_2 * Nhood_i + \beta_3 * Tx_i * Nhood_i + \epsilon_i \]

summary(lm(y ~ tx*nhood, data=exp))

Call:
lm(formula = y ~ tx * nhood, data = exp)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.96250 -0.58512 -0.09142  0.65562  2.66306 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.07358    0.13818   0.533    0.595    
tx           0.91252    0.19736   4.624 6.83e-06 ***
nhoodN2     -0.25685    0.19944  -1.288    0.199    
tx:nhoodN2  -1.63222    0.28194  -5.789 2.77e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.9964 on 196 degrees of freedom
Multiple R-squared:  0.3193,    Adjusted R-squared:  0.3089 
F-statistic: 30.65 on 3 and 196 DF,  p-value: 2.731e-16
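
To see the neighborhood-specific effects directly, you can also estimate the treatment effect separately within each neighborhood; the tx coefficient in the N2 regression should equal the sum of tx and tx:nhoodN2 from the interaction model above:

# treatment effect estimated within each neighborhood separately
summary(lm(y ~ tx, data = filter(exp, nhood == "N1")))
summary(lm(y ~ tx, data = filter(exp, nhood == "N2")))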

Understanding check:

How do you interpret the coefficients?

What is the treatment effect in neighborhood 2?

Write up your “results” as you would in your BA thesis.

Exercise:

Now suppose you wanted to visualize these results in a graph.

Draw a “mock figure” using paper and pencil.


Facets

One way to think about this experiment is that it’s actually composed of two separate experiments: one per neighborhood.

So we actually want that design to be reflected in the visualization.

For that, we’ll use facets.

But first, we’ll have to again grab our conditional expectations.

# grabbing conditional expectations
# note that two variables are now in group_by()
cond_exp <- exp |> 
  group_by(nhood, tx) |> 
  do(tidy(lm(y ~ 1, data = .), conf.int=TRUE)) 


# neighborhood names
nhood_names <- as_labeller(
  c('N1' = 'Brookline', 
    'N2' = 'Southie'))

# visualization
ggplot(mapping=aes(x=factor(tx), color=factor(tx))) +
  geom_point(data=exp,
             aes(y=y),
             position = position_jitternormal(sd_x = 0.1),
             alpha=0.5,
             size=0.5) +
  geom_point(data=cond_exp,
             aes(y=estimate),
             size=3) +
  geom_errorbar(data=cond_exp,
                aes(ymin=conf.low,
                    ymax=conf.high),
                width = 0) +
  facet_wrap(~nhood, labeller = nhood_names) +
  theme_minimal() +
  labs(x = "",
       y = "") +
  scale_x_discrete(labels=c("Control", "Treatment")) +
  theme(legend.position="none")