The Monty Hall Problem
To introduce you to simulations in R, we will study the famous Monty Hall problem.
Here’s the setup, based on the American game show “Let’s Make a Deal”, originally hosted by Monty Hall.
The contestant has the opportunity to win a new car, which is hidden behind one of three doors.
Behind the other two doors are goats.
The contestant begins by selecting one of the three doors.
The game show’s host (Monty) then opens one of the unchosen doors, revealing a goat. Importantly, Monty knows where the car is, so the door he opens always has a goat behind it.
Monty then asks the contestant whether they want to STAY with their chosen door, or SWITCH to the remaining closed door.
That’s the game!
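The rules above can be played out one round at a time. Below is a minimal sketch of a single round that models Monty's move explicitly. The function name play_round and its strategy argument are hypothetical, chosen only for illustration, and the sketch assumes that when both unchosen doors hide goats, Monty picks one of them at random.

```r
# Hypothetical helper: play one round and report whether the strategy wins
play_round <- function(strategy = c("stay", "switch")) {
  strategy <- match.arg(strategy)
  doors <- 1:3
  car <- sample(doors, 1)      # the car is placed behind a random door
  choice <- sample(doors, 1)   # the contestant picks a door at random
  # Monty opens a door that is neither the contestant's choice nor the car.
  # sample(x, 1) on a length-one numeric x would draw from 1:x, so the
  # single-candidate case is handled separately.
  goat_doors <- setdiff(doors, c(choice, car))
  opened <- if (length(goat_doors) == 1) goat_doors else sample(goat_doors, 1)
  # Switching means taking the one door that is neither chosen nor opened
  final <- if (strategy == "stay") choice else setdiff(doors, c(choice, opened))
  final == car                 # TRUE if the contestant wins the car
}

# Usage: estimate each strategy's win rate over many rounds
set.seed(1)
mean(replicate(10000, play_round("stay")))    # should be near 1/3
mean(replicate(10000, play_round("switch")))  # should be near 2/3
```

Because Monty's choice is simulated rather than assumed, this version lets you see for yourself that switching wins whenever the first pick was wrong.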
Simulation
library(tidyverse)
# Number of simulations to run
sims <- 1000
# Counter for the number of cars won if you ALWAYS choose STAY
WinStay <- 0
# Counter for the number of cars won if you ALWAYS choose SWITCH
WinSwitch <- 0
# Setting up the simulation: the three doors
doors <- c(1, 2, 3)
# set random seed for reproducibility
set.seed(1)
# now run the loop
for (i in 1:sims) {
  # one door at random has the car behind it
  WinDoor <- sample(doors, 1)
  # Participant selects a door at random
  choice <- sample(doors, 1)
  # if you picked the right door, you win by STAYING
  if (WinDoor == choice)
    WinStay <- WinStay + 1
  # if you picked the wrong door, Monty must open the other goat door,
  # so the remaining closed door hides the car: you win by SWITCHING
  if (WinDoor != choice)
    WinSwitch <- WinSwitch + 1
}
print(paste("WinStay =", WinStay))
[1] "WinStay = 326"
print(paste("WinSwitch =", WinSwitch))
[1] "WinSwitch = 674"
If you always chose STAY, you would win 326 out of the 1000 simulated games, roughly one third.
But if you always chose SWITCH, you would win 674 out of the 1000 simulated games, roughly two thirds.
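These simulated counts can be checked against the exact probabilities. Write C for the door hiding the car and X for the contestant's first pick, both uniform on {1, 2, 3} and independent (which is what the two sample() calls in the loop assume). STAY wins only when X = C. SWITCH wins exactly when X ≠ C, because Monty must then open the only other goat door, leaving the car behind the one door you can switch to. This is the same shortcut the loop uses.

$$
\begin{aligned}
P(\text{win} \mid \text{STAY}) &= P(X = C) = \sum_{d=1}^{3} P(C = d)\,P(X = d) = 3 \cdot \tfrac{1}{3} \cdot \tfrac{1}{3} = \tfrac{1}{3}, \\
P(\text{win} \mid \text{SWITCH}) &= P(X \neq C) = 1 - \tfrac{1}{3} = \tfrac{2}{3}.
\end{aligned}
$$

Over n = 1000 games the expected STAY count is 1000/3 ≈ 333.3, and the binomial standard deviation of that count is

$$
\sqrt{n\,p\,(1-p)} = \sqrt{1000 \cdot \tfrac{1}{3} \cdot \tfrac{2}{3}} \approx 14.9,
$$

so an observed 326 (about half a standard deviation away) is well within ordinary chance variation.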
Why did we do this?
One reason is that, in experiments (and statistics more generally), we are dealing with chance processes.
For instance, we will encounter many examples where people are randomly (or quasi-randomly) allocated into “treatment” and “control” groups.
We can compare the groups on some average outcomes, but we need to keep in mind that this is only one way that the experimental randomization could have turned out.
That fact generates uncertainty.
By using simulations, we can explore what would have happened in each of these “alternate universes”, and therefore see how our answers would have changed had our experiment been run differently.
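To make these “alternate universes” concrete, here is a minimal sketch that re-runs the random assignment many times on one fixed set of units. The outcomes are invented (drawn with rnorm()), treatment is assumed to have no effect at all, and the helper name diff_in_means is hypothetical. Any spread in the resulting estimates therefore comes purely from which units happened to land in each group.

```r
# Hypothetical example: 20 units with made-up outcomes and NO true treatment
# effect, so every difference in means below is produced by chance alone
set.seed(2)
n <- 20
y <- rnorm(n, mean = 10, sd = 2)   # outcome for each unit (invented data)

# One possible randomization: half the units to treatment, half to control,
# then compare the group averages
diff_in_means <- function(y) {
  treated <- sample(rep(c(TRUE, FALSE), each = length(y) / 2))
  mean(y[treated]) - mean(y[!treated])
}

# Re-run the random assignment many times ("alternate universes")
estimates <- replicate(5000, diff_in_means(y))

mean(estimates)  # centred near zero, the true effect in this example
sd(estimates)    # how much the estimate moves between randomizations
```

Calling hist(estimates) draws the whole distribution. The single difference in means from a real experiment is just one draw from a distribution like this one.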
More generally, simulations help us to clarify for ourselves the logic of the data generating process (i.e. our “model of the world”) that we are interested in learning about.
And since we programmed the data generating process ourselves – and we therefore know the “true model” – we can also study how well this “truth” is recovered by the estimates that we derive from our sample.
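A minimal sketch of this idea, under assumptions chosen purely for illustration: outcomes follow y = 10 + 1.5 * treated + noise with noise standard deviation 2, there are 50 units per arm, and the estimator is a simple difference in means. The names true_effect and simulate_experiment are hypothetical. Because we wrote the model ourselves, we can compare the estimates directly against the truth.

```r
# Hypothetical data generating process: we choose the "truth" ourselves
true_effect <- 1.5   # assumed treatment effect (invented for illustration)
n <- 100             # units per simulated experiment (50 per arm)

simulate_experiment <- function(n, true_effect) {
  treated <- sample(rep(c(TRUE, FALSE), each = n / 2))  # random assignment
  y <- 10 + true_effect * treated + rnorm(n, sd = 2)    # outcomes from the model
  mean(y[treated]) - mean(y[!treated])                  # difference-in-means estimate
}

set.seed(3)
estimates <- replicate(2000, simulate_experiment(n, true_effect))

mean(estimates) - true_effect  # estimated bias: should be close to zero
sd(estimates)                  # sampling variability of the estimator
```

Under this model the theoretical standard error of the difference in means is 2 * sqrt(1/50 + 1/50) = 0.4, so sd(estimates) should come out close to that value; a large gap would point to a bug in the simulation or a flaw in the estimator.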