Instrumental Variables
A Canvassing Experiment
In the US, political campaigns frequently send volunteers door-to-door to encourage people to vote. This is called “canvassing”.
In the US, you can also find out whether someone voted (but not who they voted for).
Suppose we ran an experiment to determine the effectiveness of door-to-door canvassing.
We randomly select 2000 households, and then randomly allocate 1000 of them (the treatment group) to receive a visit from a volunteer. The remaining 1000 households are our control group.
As it turns out, however, the majority of people in the treatment group were not home when the canvasser came knocking.
Suppose this is how the experiment actually turned out:
The Intent-to-Treat Effect (ITT)
The simplest strategy would be to completely ignore treatment non-compliance and simply compare those who were assigned to treatment vs. those assigned to control.
Let’s use the variable \(Z\) to denote treatment assignment:
\[ Z_i = \begin{cases} 1 & \text{if assigned to treatment} \\ 0 & \text{if assigned to control} \\ \end{cases} \]
Another name for \(Z\) is the instrument.
Since treatment assignment is randomly determined by the researcher, the instrument Z is uncorrelated with potential outcomes, as well as all possible confounders.
Thus, we can just compare the mean outcomes in these two groups determined by random treatment assignment:
\[ \mathbb{E}[Y_i | Z_i = 1] - \mathbb{E}[Y_i | Z_i = 0] = ITT \] The difference constitutes the Intent-to-Treat effect (ITT).
Sometimes, the ITT is what we are after. After all, “real life” canvassing programs must deal with the problem that some people won’t answer the door.
This “real world impact” of such programs is measured by the ITT.
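For instance, here is a minimal sketch of how you could estimate the ITT in R. The data are simulated purely for illustration (the household counts match the setup above, but the turnout rates are made up, not results from the actual experiment):

# hypothetical data: 2000 households, half assigned to receive a canvassing visit
set.seed(42)
canvass <- data.frame(z = rep(c(0, 1), each = 1000))
canvass$voted <- rbinom(2000, 1, 0.45 + 0.05 * canvass$z)  # made-up turnout rates

# ITT: difference in mean turnout between the assigned-to-treatment and control groups
mean(canvass$voted[canvass$z == 1]) - mean(canvass$voted[canvass$z == 0])

# equivalently, the coefficient on z from a regression of turnout on assignment
coef(lm(voted ~ z, data = canvass))["z"]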
On the other hand, the ITT doesn’t tell us about the causal effect of talking to a canvasser on turnout. But theoretically, maybe that’s what we want to learn about.
So what can we do?
Compliance Types
As a first step to answering this question, it’s helpful to split our sample into two “types” of people:
- compliers do what they should: they take the treatment if assigned to the treatment group, and they go untreated if assigned to the control group
- never-takers do not take the treatment, no matter what group they are assigned to
More formally, let’s define \(D_i\) as an individual’s treatment status (that is, whether or not they were actually treated).
We can think of \(D_i\) in terms of potential outcomes, just like \(Y_i\).
So \(D_i(Z_i = 0)\) denotes the potential treatment status of someone who is assigned to control, and \(D_i(Z_i = 1)\) denotes the potential treatment status of someone who is assigned to treatment.
Thus a complier is someone for whom \(D_i(Z_i = 1) = 1\) and \(D_i(Z_i = 0) = 0\).
By contrast, a never-taker is someone for whom \(D_i(Z_i = 1) = 0\) and \(D_i(Z_i = 0) = 0\).
Now let’s go back to our graph:
Complier Average Causal Effect (CACE)
While we cannot estimate the ATE for the entire population, we can estimate the treatment effect for the subgroup of compliers.
This is called the Complier Average Causal Effect (CACE), or sometimes the Local Average Treatment Effect (LATE).
Let’s just go with CACE.
We can calculate the CACE as:
\[ CACE = \frac{ITT}{\pi_c} \]
Why is that true?
Here’s a graphical illustration:
On the left side, we see the potential outcomes if everyone were assigned to control (Z = 0).
And on the right side, we see the potential outcomes if everyone were assigned to treatment (Z=1).
We can calculate \(\mathbb{E}(Y_i(Z_i =0))\) as a weighted average: the proportion of compliers times the average potential control outcome for compliers, plus the proportion of never-takers times the average potential control outcome for never-takers.
In other words:
\[ \mathbb{E}(Y_i(Z_i=0)) = \pi_c \times 0.5 + \pi_{NT} \times 0.4 \]
Geometrically, \(\mathbb{E}(Y_i(Z_i=0))\) is represented by the area of the left side of the graph.
Similarly, we can represent \(\mathbb{E}(Y_i(Z_i=1))\) as the area of the right side of the graph:
\[ \mathbb{E}(Y_i(Z_i=1)) = \pi_c \times 0.6 + \pi_{NT} \times 0.4 \]
Notice that the average outcome for never-takers doesn’t change as we go from the left to the right side, since never-takers don’t “take” the treatment!
By contrast, compliers do respond to treatment, and so their potential outcomes change from 0.5 to 0.6.
The difference in the two areas – represented by the blue box – represents the ITT.
\[ ITT = \mathbb{E}(Y_i(Z_i=1)) - \mathbb{E}(Y_i(Z_i=0)) = (0.6 - 0.5) \times \pi_c \]
The CACE is represented by the height of the blue box.
So we know (or can estimate) the area of the box (ITT), and we know (or can estimate) the width of the box (\(\pi_c\)).
Since \(\text{width} \times \text{height} = \text{area}\), it follows that \(ITT = CACE \times \pi_c\).
Re-arranging, we get the formula for \(CACE = \frac{ITT}{\pi_c}\).
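As a quick numerical check, here is the box arithmetic in R, using the 0.5, 0.6, and 0.4 values from the figure and a made-up complier share of 0.8:

# numbers from the figure: compliers move from 0.5 to 0.6; never-takers stay at 0.4
# pi_c = 0.8 is a hypothetical complier share, chosen only for illustration
pi_c  <- 0.8
pi_nt <- 1 - pi_c

ey_z0 <- pi_c * 0.5 + pi_nt * 0.4   # E[Y(Z=0)]: area of the left side
ey_z1 <- pi_c * 0.6 + pi_nt * 0.4   # E[Y(Z=1)]: area of the right side

itt  <- ey_z1 - ey_z0               # area of the blue box: (0.6 - 0.5) * pi_c = 0.08
cace <- itt / pi_c                  # height of the box: 0.1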
One-sided vs. Two-sided Non-compliance
Let’s do a slightly more complicated example:
Suppose we are interested in the causal effect of military service on an individual’s political attitudes (measured on a liberal-conservative dimension).
We might think that military service makes people more conservative.
On the other hand, maybe people who are more conservative in the first place volunteer to serve in the military.
Thus, the comparison between military veterans and non-veterans is likely to be biased by self-selection into treatment.
To get around this bias, scholars have exploited draft lotteries which randomize an individual’s chances of military service:
The logic is that people who have “bad” draft numbers are, on average, the same as people with “good” draft numbers.
However, just because you are drafted doesn’t mean that you will 100% go into the military. You might, for example, obtain an exemption based on physical health, education, family status, etc. You might also just “dodge” the draft.
And on the flip side, even people who are not drafted can still volunteer.
Here’s what that situation looks like:
Note that we now have more compliance “types” to deal with:
- compliers join the military if drafted, but stay out if not drafted
- always-takers join the military, regardless of their draft status
- never-takers stay out of the military, regardless of their draft status
- defiers volunteer for the military if not drafted, but stay out if drafted
With four types, we can no longer “back out” the proportion of compliers without an additional assumption: namely, that there exist NO DEFIERS. This is sometimes called the monotonicity assumption (I will explain why below).
For now, once we rule out the existence of defiers, we can figure out the proportions of always-takers (\(\pi_{AT}\)) and never-takers (\(\pi_{NT}\)), and thus “back out” the proportion of compliers (\(\pi_{C}\)).
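Concretely, with a randomly assigned instrument and no defiers, anyone who is treated despite being assigned to control must be an always-taker, and anyone who is untreated despite being assigned to treatment must be a never-taker. So:

\[ \pi_{AT} = \Pr(D_i = 1 \mid Z_i = 0), \qquad \pi_{NT} = \Pr(D_i = 0 \mid Z_i = 1), \qquad \pi_{C} = 1 - \pi_{AT} - \pi_{NT} \]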
From there, everything else is the same.
We know that the ITT is only influenced by the response of compliers:
And thus we estimate the CACE in the same way:
\[ CACE = \frac{ITT}{\pi_C} \]
CACE vs. ATE
At this point, it’s worth stressing something: the CACE is not the same thing as the ATE.
Let’s look at a hypothetical example:
| Name    | Type         | Effect of being drafted on military service | Effect of military service on conservatism |
|---------|--------------|---------------------------------------------|--------------------------------------------|
| Axel    | Complier     | 1                                           | 0.5                                        |
| Barbara | Always Taker | 0                                           | 0.1                                        |
| Chris   | Never Taker  | 0                                           | 0.3                                        |
Note that the last column shows the treatment effect that would theoretically obtain if it were possible to switch the treatment status for each individual (e.g. if we could somehow make Barbara stay out of the military, even though she is an “always taker”).
From this, it’s clear that the ATE = 0.3. However, the CACE = 0.5.
Of course, in the real world, we don’t get to observe the last column. We can estimate the CACE, but we don’t know anything about the average treatment effects for “never-takers” and “always-takers”.
As a result, we cannot make inferences about the ATE, and we cannot generalize from the CACE to the ATE.
Profiling Compliers
Given the above, it may be useful to figure out how the demographic profile of compliers (e.g. in terms of sex, age, etc.) differs from that of the “never-takers”, the “always-takers”, and the sample as a whole.
As it turns out, this is pretty straightforward.
Suppose we care about a characteristic like age. The average age in our sample (\(\mu_{sample}\)) is simply a weighted average of the mean ages amongst compliers, “never-takers”, and “always-takers”, with weights equal to the proportion of each group:
\[ \mu_{sample} = (\mu_{c} \times \pi_c) + (\mu_{at} \times \pi_{at}) + (\mu_{nt} \times \pi_{nt}) \]
The only “unknown” in this equation is \(\mu_{c}\). We can estimate everything else.
Consequently, we can just solve for:
\[ \mu_{c} = \frac{1}{\pi_c}\mu_{sample} - \frac{\pi_{nt}}{\pi_c}\mu_{nt} - \frac{\pi_{at}}{\pi_c}\mu_{at} \]
You can implement the procedure easily using the ivdesc
package from Marbach and Hangartner.
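Here is a rough “by hand” sketch of the same idea, assuming a data frame like the exp object we create in the simulation below, plus a hypothetical age column (which the simulation does not actually include); ivdesc handles the estimation details and standard errors for you:

# proportions of each type (assuming no defiers)
pi_at <- mean(exp$veteran[exp$draft == 0])        # treated despite not being drafted
pi_nt <- mean(1 - exp$veteran[exp$draft == 1])    # untreated despite being drafted
pi_c  <- 1 - pi_at - pi_nt

# mean age overall, and among the two groups we can identify directly
mu_sample <- mean(exp$age)
mu_at <- mean(exp$age[exp$draft == 0 & exp$veteran == 1])   # always-takers
mu_nt <- mean(exp$age[exp$draft == 1 & exp$veteran == 0])   # never-takers

# solving for the mean age among compliers
mu_c <- mu_sample / pi_c - (pi_nt / pi_c) * mu_nt - (pi_at / pi_c) * mu_at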
Simulate Some Data
To see this whole machinery in action, let’s simulate some data.
We’ll make a binary instrument (\(drafted_i\)) which denotes whether or not a person was drafted.
We can also make a binary treatment (\(veteran_i\)) which denotes whether a person actually served in the military.
We’ll have compliers, always-takers, and never-takers in the following proportions:
- \(\pi_c = 0.7\)
- \(\pi_{nt} = 0.2\)
- \(\pi_{at} = 0.1\)
Potential untreated outcomes (i.e. potential outcomes if the person doesn’t join the military) are drawn from the following normal distributions:
\[ Y_i(veteran_i=0) = \begin{cases} \mathcal{N}(5,1) & \text{if complier} \\ \mathcal{N}(3,1) & \text{if never-taker} \\ \mathcal{N}(7,1) & \text{if always-taker} \\ \end{cases} \] Finally, potential treated outcomes (i.e. the potential outcomes if the person joins the military) are:
\[ Y_i(veteran_i=1) = \begin{cases} Y_i(veteran_i=0) + 2 & \text{if complier} \\ Y_i(veteran_i=0) + 1 & \text{if never-taker} \\ Y_i(veteran_i=0) & \text{if always-taker} \\ \end{cases} \] Let’s simulate the data:
library(tidyverse)
set.seed(1)
# parameters
pi_c <- 0.7
pi_nt <- 0.2
pi_at <- 0.1
N <- 500

# making the dataset
type <- c(rep("c", pi_c*N), rep("nt", pi_nt*N), rep("at", pi_at*N))
dta <- tibble(type)
# adding treatment status as a function of type and draft status
# veteran_draft0 is the treatment status when undrafted
# veteran_draft1 is the treatment status when drafted
dta <- dta |>
  mutate(
    veteran_draft0 = case_when(
      type == "c" ~ 0,
      type == "nt" ~ 0,
      type == "at" ~ 1),
    veteran_draft1 = case_when(
      type == "c" ~ 1,
      type == "nt" ~ 0,
      type == "at" ~ 1)
  )
# adding potential outcomes as a function of type and veteran status
dta <- dta |>
  mutate(
    y_veteran0 = case_when(
      type == "c" ~ rnorm(N, mean=5, sd=1),
      type == "nt" ~ rnorm(N, mean=3, sd=1),
      type == "at" ~ rnorm(N, mean=7, sd=1)),
    y_veteran1 = case_when(
      type == "c" ~ y_veteran0 + 2,
      type == "nt" ~ y_veteran0 + 1,
      type == "at" ~ y_veteran0)
  )
So in this dataset, we know the “true” CACE is 2, and the “true” proportion of compliers is 0.7.
So the “true” ITT = 2 × 0.7 = 1.4.
We can also create the potential outcomes as a function of draft status:
dta <- dta |>
  mutate(
    y_draft0 = case_when(
      type == "c" ~ y_veteran0,
      type == "nt" ~ y_veteran0,
      type == "at" ~ y_veteran1),
    y_draft1 = case_when(
      type == "c" ~ y_veteran1,
      type == "nt" ~ y_veteran0,
      type == "at" ~ y_veteran1)
  )
# confirm that this equals the true ITT
mean(dta$y_draft1) - mean(dta$y_draft0)
[1] 1.4
Yay, our math works!
Estimation
Now let’s run a single experiment and estimate our quantities of interest.
For simplicity, we can set the probability of being drafted at 50%.
library(randomizr)
set.seed(1)
# assigning the instrument (draft status)
exp <- dta |>
  mutate(draft = complete_ra(N, prob=0.5))

# revealing treatment status and potential outcomes
exp <- exp |>
  mutate(veteran = ifelse(draft == 1, veteran_draft1, veteran_draft0),
         y = ifelse(veteran == 1, y_veteran1, y_veteran0)) |>
  select(y, draft, veteran)
OK now let’s do a couple of things.
First, let’s calculate the “naive” OLS estimate and store the result:
library(broom)
# ols
tidy(lm(y~veteran, data=exp), conf.int=TRUE)
# A tibble: 2 × 7
term estimate std.error statistic p.value conf.low conf.high
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 4.29 0.0730 58.7 4.97e-226 4.14 4.43
2 veteran 2.76 0.107 25.9 3.35e- 94 2.55 2.97
ols_est <- summary(lm(y~veteran, data=exp))$coefficients[2]
We see that, on average, veterans are 2.76 points further to the right than non-veterans. That’s much higher than the “truth” of 2, and the confidence interval does not contain the truth.
Maybe we just got unlucky?
Let’s simulate the data a bunch of times, and take the average of the “naive” estimates:
n_sims <- 1000

v_ols <- c()

for (i in 1:n_sims) {

  # assigning the instrument (draft status)
  exp <- dta |>
    mutate(draft = complete_ra(N, prob=0.5))

  # revealing treatment status and potential outcomes
  exp <- exp |>
    mutate(veteran = ifelse(draft == 1, veteran_draft1, veteran_draft0),
           y = ifelse(veteran == 1, y_veteran1, y_veteran0)) |>
    select(y, draft, veteran)

  # storing the estimates
  v_ols[i] <- summary(lm(y~veteran, data=exp))$coefficients[2]

}
# mean of the OLS estimates
mean(v_ols)
[1] 2.715546
So the mean across our simulations is 2.72. It looks like we are consistently getting the wrong answer. That’s because treatment status is self-selected: always-takers (who have the highest baseline outcomes) are always in the veteran group, and never-takers (who have the lowest baseline outcomes) are always in the non-veteran group, so the naive comparison mixes the treatment effect with these baseline differences.
Calculating the CACE “by hand”
Now let’s see if we can get the correct answer applying the IV formula.
We will again create a single experiment (in fact, the same one we used before):
set.seed(1)
# assigning the instrument (draft status)
exp <- dta |>
  mutate(draft = complete_ra(N, prob=0.5))

# revealing treatment status and potential outcomes
exp <- exp |>
  mutate(veteran = ifelse(draft == 1, veteran_draft1, veteran_draft0),
         y = ifelse(veteran == 1, y_veteran1, y_veteran0)) |>
  select(y, draft, veteran)
This time, let’s begin by estimating the ITT:
# ITT: reg y on draft status
tidy(lm(y ~ draft, data=exp), conf.int = TRUE)
# A tibble: 2 × 7
term estimate std.error statistic p.value conf.low conf.high
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 4.78 0.104 46.1 5.76e-182 4.58 4.99
2 draft 1.59 0.147 10.9 8.80e- 25 1.30 1.88
itt_est <- summary(lm(y~draft, data=exp))$coefficients[2]
In our case, the ITT is 1.59, and the confidence interval contains the “truth” (1.4).
What about the compliance rate?
It’s actually easy to estimate: just regress \(veteran_i\) on \(draft_i\):
# compliance rate: reg veteran on draft status
tidy(lm(veteran ~ draft, data=exp), conf.int = TRUE)
# A tibble: 2 × 7
term estimate std.error statistic p.value conf.low conf.high
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 0.0960 0.0211 4.56 6.59e- 6 0.0546 0.137
2 draft 0.744 0.0298 25.0 8.39e-90 0.685 0.803
pi_est <- summary(lm(veteran~draft, data=exp))$coefficients[2]
We get a compliance rate of 0.74, and again, the confidence interval contains the “truth” (0.7).
If we divide the estimated ITT by the estimated compliance rate (1.59 / 0.744), we get 2.14. That’s pretty close to the true CACE of 2.
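In code, using the itt_est and pi_est objects we stored above:

# IV (Wald) estimate of the CACE: ratio of the ITT to the compliance rate
itt_est / pi_est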
In fact, if we simulate it lots of times, the average across those simulations is almost exactly 2:
v_cace <- c()

for (i in 1:n_sims) {

  exp <- dta |>
    mutate(draft = complete_ra(N, prob=0.5),
           veteran = ifelse(draft == 1, veteran_draft1, veteran_draft0),
           y = ifelse(veteran == 1, y_veteran1, y_veteran0))

  # estimate and store the itt
  temp_itt <- summary(lm(y ~ draft, data=exp))$coefficients[2]

  # estimate and store the compliance rate
  temp_pi <- summary(lm(veteran ~ draft, data=exp))$coefficients[2]

  # store the cace
  v_cace[i] <- temp_itt / temp_pi

}
# average across our experiments
mean(v_cace)
[1] 1.994525
Estimation using ivreg()
There’s just one problem with the approach we just took. In real life, we only have one experiment.
We can separately estimate the ITT and the compliance rate, and put them together to get the CACE, but how do we get a standard error for the CACE estimate?
Actually, there are lots of different ways to do this. We’ll use the ivreg()
function from the AER
package.
library(AER)
# again, creating our single experiment
set.seed(1)
exp <- dta |>
  mutate(draft = complete_ra(N, prob=0.5),
         veteran = ifelse(draft == 1, veteran_draft1, veteran_draft0),
         y = ifelse(veteran == 1, y_veteran1, y_veteran0)) |>
  select(y, draft, veteran)
# ivreg
tidy(ivreg(y ~ veteran | draft, data=exp), conf.int=TRUE)
# A tibble: 2 × 7
term estimate std.error statistic p.value conf.low conf.high
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 4.58 0.0884 51.8 1.40e-202 4.40 4.75
2 veteran 2.14 0.148 14.5 7.96e- 40 1.85 2.43
OK so now we get an estimate of the CACE, as well as a standard error.
Two Stage Least Squares (2SLS) Explained
What did ivreg() actually do?
It implements a method called “two-stage least squares” (2SLS) estimation. The name comes from the fact that we are estimating two regression models:
\[ Veteran_i = \gamma_0 + \gamma_1 \; Draft_i + \omega_i \] \[ Y_i = \beta_0 + \beta_1 \; \widehat{Veteran_i} + \epsilon_i \]
The first-stage model is just the model for the compliance rate.
After estimating it, we then predict veteran status, and use our predictions as regressors in the second-stage model.
The basic logic is captured in the following DAG:
In this setup, part of the treatment (veteran status) is endogenously driven by self-selection, and part of it is exogenously (i.e. randomly) driven by draft status.
The 2sls procedure uses only the exogenously determined variation in veteran status to explain variation in the outcome.
To get a feel for this, let’s do it “by hand”:
# first stage regression
firststage <- lm(veteran ~ draft, data = exp)
tidy(firststage)
# A tibble: 2 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 0.0960 0.0211 4.56 6.59e- 6
2 draft 0.744 0.0298 25.0 8.39e-90
# grabbing the predicted values
exp <- exp |>
  mutate(vet_hat = predict(firststage))

# second stage regression
secondstage <- lm(y ~ vet_hat, data = exp)
tidy(secondstage)
# A tibble: 2 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 4.58 0.118 38.9 6.71e-153
2 vet_hat 2.14 0.197 10.9 8.80e- 25
Notice that the coefficients are the same!
The standard errors, though, are slightly different depending on whether you use ivreg()
or do it in two steps by hand.
And that’s because we didn’t actually measure \(vet\_hat\)…it’s an estimate, not data.
But the lm()
command in the second-stage regression doesn’t know that.
So we have to adjust the standard errors accordingly, which ivreg
and other packages automatically do for you.
Continuous Instruments and Treatments
So far, we have worked with binary instruments and binary treatments.
And that’s because it keeps the explanations simple.
But there’s no reason why the same logic cannot apply to continuous variables.
Think about instruments as random amounts of encouragement to take the treatment.
And think about the treatment as existing in different “dosages”.
So, in a continuous variable context, receiving more encouragement (higher values of \(Z\)) causes an increase in dosage (higher values of \(D\)) amongst compliers, but no increase amongst non-compliers.
Importantly, more encouragement can never cause a decrease in dosage. This shows why the “no defiers” assumption is also called “monotonicity”.
Notice that there are now different “degrees” of compliance. Some people may be strong compliers, and others may be weak compliers.
Conceptually, the CACE is now a weighted average of individual treatment effects, weighted by how responsive each individual is to the instrument.
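Here is a minimal simulated sketch of that continuous case (all names and numbers are made up; the true effect of the dosage d on the outcome y is 2):

library(AER)
set.seed(2)

n <- 1000
u <- rnorm(n)                    # unobserved confounder
z <- rnorm(n)                    # continuous instrument: random "encouragement"
d <- 0.8 * z + u + rnorm(n)      # dosage: responds to encouragement, but also to u
y <- 2 * d + 3 * u + rnorm(n)    # outcome: true effect of d is 2; u is a backdoor

coef(lm(y ~ d))["d"]             # naive OLS: pulled away from 2 by the confounder
coef(ivreg(y ~ d | z))["d"]      # IV: close to 2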
Is the Instrument (As-If) Randomly Assigned?
You may encounter IV in two different contexts:
- randomized experiments with non-compliance
- natural experiments, where an “exogenous” source of variation is used to instrument an “endogenous” treatment
A famous example of the latter is provided by the work of Acemoglu, Johnson, and Robinson (AJR), who won the Nobel Prize in Economics in 2024:
These authors are interested in the effects of institutions (e.g. protection of property rights, limited government intervention in free markets) on economic growth.
The problem is, richer places may be able to “afford” better institutions, so there’s endogeneity between X and Y.
To instrument for institutions, they look at a sample of ex-colonies and use the colonial disease environment. Here’s the argument in brief:
- Different colonization policies: “extractive states” (Belgian Congo) vs. “Neo-Europes” (colonial New England) \(\Rightarrow\) variation in institutions.
- The colonization strategy depended on the feasibility of European settlement, as measured by settler mortality.
- Early institutions persisted even after colonial independence, setting the stage for “modern” economic growth.
The question is, since settler mortality is not randomly assigned, are there any “backdoors” which can bias the IV estimates?
Here’s the DAG:
Of course, it’s possible to control for such backdoors (you just have to stick these covariates in both the first stage and second stage regressions).
The assumption is that, conditional upon these controls, the instrument is randomly assigned.
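With ivreg(), exogenous controls go on both sides of the | in the formula. Here is a sketch using our simulated experiment, where x is just random noise attached to the data purely to illustrate the syntax:

# x is a hypothetical covariate: it appears in the second stage (left of |)
# and in the instrument set (right of |)
exp$x <- rnorm(nrow(exp))
tidy(ivreg(y ~ veteran + x | draft + x, data = exp), conf.int = TRUE)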
But this is where IV starts to break down. How do we know that we have controlled for all possible sources of omitted variable bias?
We are on much safer ground when the instrument is truly random (e.g. in the military draft lottery case).
Exclusion Restriction
A second threat to validity comes from violations of something called the exclusion restriction.
The main idea is that the instrument can affect the outcome only through the treatment, and no other channel.
In our example, suppose being drafted also caused people a great deal of stress, and stress itself is related to political attitudes:
In this case, we would be (mis)attributing the part of the effect that runs through the stress pathway to military service.
If you read IV papers, you will find that the authors pay a lot of attention to defending the assumption that the exclusion restriction holds.
But in the end, it’s still an assumption.
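To see what a violation does to the estimate, here is a small simulated sketch (all numbers made up): the instrument raises the outcome directly through a stress channel, not just through the treatment, and the IV estimate absorbs that direct effect.

set.seed(3)
n <- 5000
z <- rbinom(n, 1, 0.5)               # randomly assigned instrument (e.g. the draft)
d <- rbinom(n, 1, 0.2 + 0.6 * z)     # treatment take-up responds to the instrument
stress <- 0.5 * z + rnorm(n)         # the instrument also raises stress...
y <- 1 * d + 1 * stress + rnorm(n)   # ...and stress shifts the outcome directly

coef(ivreg(y ~ d | z))["d"]          # well above the true treatment effect of 1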
Weak Instruments
With IV, we need to also worry about whether our instrument is “strong.” One intuitive way to think about this is to ask: “do we have enough compliers”?
Think about the CACE formula:
\[ CACE = \frac{ITT}{\pi_c} \]
At the extreme, if we have no compliers, the CACE does not exist!
But even if we just had a few compliers, our estimate for the CACE is going to be very noisy (large standard errors).
What’s worse, suppose we have a natural experiment, and our ITT is just slightly biased (maybe due to a small violation of the independence assumption). In this case, since we are dividing our biased ITT by a very small \(\pi_c\), the amount of bias is going to blow up!
Finally, and this is a bit subtle, we care about the actual number of compliers, not just the compliance rate. Even if the instrument is randomly assigned in the population, in a finite sample the relationship between the instrument and the non-instrumented parts of Y is going to be at least a little nonzero, just by random chance.
The smaller the sample, the more often this “nonzero by random chance” relationship will be fairly large, making the instrument not quite valid in that particular sample and giving you a biased estimate.
The solution, of course, is not only a higher compliance rate but also a larger sample, so that you literally have more compliers.
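Here is a small sketch of that blow-up (all numbers made up): the same slight violation of the exclusion restriction does far more damage when the compliance rate is tiny.

set.seed(4)
n <- 5000
z <- rbinom(n, 1, 0.5)
y_direct <- 0.05 * z                       # a small exclusion-restriction violation

iv_est <- function(pi_c) {
  d <- rbinom(n, 1, 0.1 + pi_c * z)        # compliance rate of roughly pi_c
  y <- 1 * d + y_direct + rnorm(n)         # true treatment effect is 1
  coef(ivreg(y ~ d | z))["d"]
}

iv_est(0.7)    # strong instrument: bias is roughly 0.05 / 0.7
iv_est(0.05)   # weak instrument: the same violation is divided by ~0.05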
You can test for weak instruments by looking at the first-stage regression:
\[ Veteran_i = \gamma_0 + \gamma_1 \; Draft_i + \omega_i \]
summary(lm(veteran ~ draft, data= exp))
Call:
lm(formula = veteran ~ draft, data = exp)
Residuals:
Min 1Q Median 3Q Max
-0.840 -0.096 -0.096 0.160 0.904
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.09600 0.02107 4.555 6.59e-06 ***
draft 0.74400 0.02980 24.963 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.3332 on 498 degrees of freedom
Multiple R-squared: 0.5558, Adjusted R-squared: 0.5549
F-statistic: 623.1 on 1 and 498 DF, p-value: < 2.2e-16
And look at the F-statistic. If this number is larger than 10, then you are probably OK.
Note that the F-stat increases both when (i) the compliance rate is higher and (ii) the sample size increases.