The Monty Hall Problem

To introduce you to simulations in R, we will study the famous Monty Hall problem.

Here’s the setup: there used to be an American game show called “Let’s Make a Deal”.

The contestant has the opportunity to win a new car, which is hidden behind one of three doors.

Behind the other two doors are goats.

The contestant begins by selecting one of the three doors.

The game show’s host (Monty) then opens one of the unchosen doors, revealing a goat. Importantly, the door that Monty opens always has a goat behind it.

Monty then asks the contestant whether they want to STAY with their chosen door, or SWITCH to the remaining closed door.

That’s the game!

Discuss:

Suppose you were a contestant in this game. Should you STAY or SWITCH?

Or does it not matter?

Simulation

library(tidyverse)

# Number of simulations to run
sims <- 1000

# Counter for the number of cars won if you ALWAYS choose STAY
WinStay <- 0

# Counter for the number of cars won if you ALWAYS choose SWITCH
WinSwitch <- 0


# Setting up the simulation
doors <- c(1,2,3)

# set random seed for reproducibility
set.seed(1)

# now run the loop
for (i in 1:sims) {
  
  
  #  one door at random has the car behind it
  WinDoor <- sample(doors, 1)
  
  # Participant selects a door at random
  choice <- sample(doors, 1)
  
  # if you picked the right door, you win by STAYING
  if (WinDoor == choice)
    WinStay <- WinStay + 1
  
  # if you picked the wrong door, Monty must open the other goat door,
  # so the remaining closed door has the car and you win by SWITCHING
  if (WinDoor != choice)
    WinSwitch <- WinSwitch + 1
}

print(paste("WinStay =", WinStay))

[1] "WinStay = 326"

print(paste("WinSwitch =", WinSwitch))

[1] "WinSwitch = 674"

If you always chose STAY, you would win 326 out of the 1000 simulated games.

But if you always chose SWITCH, you would win 674 out of the 1000 simulated games.
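
The loop above relies on a shortcut: because Monty always reveals a goat, switching wins exactly when the first choice was wrong, so Monty's door never has to be simulated. As a check on that logic, here is a minimal sketch that models Monty's door explicitly. The helper name play_game and the 10,000 repetitions are illustrative assumptions, not part of the original code, as is the assumption that Monty picks at random when he has two goat doors to choose from.

```r
# Hypothetical helper: play one game, modelling Monty's door explicitly
play_game <- function(doors = 1:3) {
  win_door <- sample(doors, 1)   # door hiding the car
  choice <- sample(doors, 1)     # contestant's first pick

  # Monty can open any door that is neither the contestant's pick nor the car
  monty_options <- setdiff(doors, c(choice, win_door))

  # Pick by index: sample(x, 1) would treat a single number x as 1:x
  monty_opens <- monty_options[sample.int(length(monty_options), 1)]

  # Switching means taking the one door that is neither picked nor opened
  switch_to <- setdiff(doors, c(choice, monty_opens))

  c(stay = choice == win_door, switch = switch_to == win_door)
}

set.seed(1)
results <- replicate(10000, play_game())  # 2 x 10000 logical matrix
win_rates <- rowMeans(results)            # share of games won by each strategy
print(win_rates)
```

If the shortcut is sound, the stay rate should come out near 1/3 and the switch rate near 2/3, in line with the counts above.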

Why did we do this?

One reason is that, in experiments (and statistics more generally), we are dealing with chance processes.

For instance, we will encounter many examples where people are randomly (or quasi-randomly) allocated into “treatment” and “control” groups.

We can compare the groups on some average outcomes, but we need to keep in mind that this is only one way that the experimental randomization could have turned out.

That fact generates uncertainty.

By using simulations, we can explore what would have happened in each of these “alternate universes”, and therefore see how our answers would have changed if our experiment had been run differently.
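
To make the “alternate universes” idea concrete, here is a minimal sketch that re-runs the random assignment many times on the same set of people. Everything in it is an assumption made for illustration: the 100 simulated outcomes, the absence of any real treatment effect, and the helper name one_randomization.

```r
set.seed(1)
n <- 100

# Hypothetical outcomes for 100 people, with no true treatment effect
outcome <- rnorm(n, mean = 50, sd = 10)

# One "alternate universe": re-randomize who is treated, then compare group means
one_randomization <- function() {
  treated <- sample(rep(c(TRUE, FALSE), each = n / 2))
  mean(outcome[treated]) - mean(outcome[!treated])
}

# Repeat the randomization 2000 times
diffs <- replicate(2000, one_randomization())

print(summary(diffs))
print(sd(diffs))  # spread caused by the random assignment alone
```

Even though the treatment does nothing here, the difference in means varies from one randomization to the next. That spread is the uncertainty generated by the assignment itself.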

More generally, simulations help us to clarify for ourselves the logic of the data generating process (i.e. our “model of the world”) that we are interested in learning about.

And since we programmed the data generating process ourselves – and we therefore know the “true model” – we can also study how well this “truth” is recovered by the estimates that we derive from our sample.
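
As a rough sketch of what studying the recovery of the “truth” can look like, the code below builds a data generating process with a known treatment effect and checks whether a simple difference in means recovers it on average. The effect size of 5, the sample size, the outcome distribution, and the helper name one_experiment are all hypothetical choices made for illustration.

```r
set.seed(1)
true_effect <- 5  # the "truth", chosen by us for this illustration
n <- 100

# Run one simulated experiment and return the estimated treatment effect
one_experiment <- function() {
  treated <- sample(rep(c(TRUE, FALSE), each = n / 2))
  # Outcome = baseline noise + true effect for treated units
  outcome <- rnorm(n, mean = 50, sd = 10) + true_effect * treated
  mean(outcome[treated]) - mean(outcome[!treated])
}

# Repeat the experiment 2000 times
estimates <- replicate(2000, one_experiment())

print(mean(estimates))  # average estimate across experiments
print(sd(estimates))    # how much a single experiment's estimate varies
```

Comparing the average estimate with true_effect shows whether the estimator is on target, while the standard deviation shows how far a single experiment can land from the truth.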