I wanted to work through an example of applying Bayes' rule to update model parameters based on toy data. This example comes from Kruschke's Doing Bayesian Data Analysis, Section 5.3.
The model is that we have a coin and we're trying to estimate the bias in the coin, that is, the probability that it will come up heads when flipped. For simplicity we assume the bias, theta, is a multiple of 0.1. We take a triangular prior centred at 0.5.
# Simple model; parameter can only be a multiple of 0.1
theta <- seq(0, 1, by=0.1)
# Prior is a triangle function
prior <- c(0, 0.04, 0.08, 0.12, 0.16, 0.2, 0.16, 0.12, 0.08, 0.04, 0)
Impact of seeing a head
Let's consider how the model distribution changes, by Bayes' rule, when we flip the coin once and see a head.
The likelihood of getting a head given theta is just theta (because it's the head bias by definition):
likelihood <- theta
Then Bayes' rule says the posterior is proportional to the product of the prior and the likelihood. We can ignore the constant of proportionality by normalising the posterior to sum to 1.
# Posterior given the head
posterior <- prior * likelihood
posterior <- posterior / sum(posterior)
The posterior has moved markedly to the right: it is now more likely that the coin is biased towards heads. The expected value of the bias has changed from 0.5 under the prior distribution to 0.58 after incorporating the data of having seen one head. The most likely value for theta is still 0.5.
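These summary values can be checked directly. A minimal sketch in base R, rebuilding the vectors from above so it runs standalone:

```r
theta <- seq(0, 1, by = 0.1)
prior <- c(0, 0.04, 0.08, 0.12, 0.16, 0.2, 0.16, 0.12, 0.08, 0.04, 0)

likelihood <- theta                      # P(head | theta)
posterior <- prior * likelihood
posterior <- posterior / sum(posterior)  # normalise to sum to 1

sum(theta * posterior)                   # expected value: 0.58
theta[which.max(posterior)]              # most likely value: 0.5
```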
Second flip of the coin
After our first flip, our new prior is the posterior given that the first flip was heads.
prior <- posterior
Suppose we flipped another head; then we can apply Bayes' rule again to get the posterior given two heads.
posterior <- prior * likelihood
posterior <- posterior / sum(posterior)
The most likely value of theta is now 0.7, and the expected value is 0.637931.
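As a check, the two-heads posterior can be computed from scratch in one step, since the combined likelihood of two heads is theta squared:

```r
theta <- seq(0, 1, by = 0.1)
prior <- c(0, 0.04, 0.08, 0.12, 0.16, 0.2, 0.16, 0.12, 0.08, 0.04, 0)

# Posterior after two heads: proportional to prior * theta^2
posterior <- prior * theta^2
posterior <- posterior / sum(posterior)

theta[which.max(posterior)]  # 0.7
sum(theta * posterior)       # 0.637931...
```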
What if we had heads then tails?
Suppose the second flip gave tails instead of heads. Then the likelihood of tails is the likelihood of not flipping heads, 1 - theta. We can get the posterior by applying Bayes' rule as before:
likelihood <- 1 - theta
posterior <- prior * likelihood
posterior <- posterior / sum(posterior)
The expected value is 0.5, as you would expect by symmetry. Notice that the distribution is less spread out around 0.5 than our original prior.
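One way to make the "less spread out" claim concrete is to compare standard deviations. A sketch, again rebuilding the vectors so it runs standalone (`dist_sd` is a helper introduced here, not from the text):

```r
theta <- seq(0, 1, by = 0.1)
prior <- c(0, 0.04, 0.08, 0.12, 0.16, 0.2, 0.16, 0.12, 0.08, 0.04, 0)

# Posterior after one head then one tail:
# proportional to prior * theta * (1 - theta)
posterior <- prior * theta * (1 - theta)
posterior <- posterior / sum(posterior)

# Standard deviation of a discrete distribution over the theta grid
dist_sd <- function(p) sqrt(sum(theta^2 * p) - sum(theta * p)^2)

dist_sd(prior)      # spread of the original prior: 0.2
dist_sd(posterior)  # smaller: the head + tail data tightened the distribution
```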
After 8 coin flips
What would our posterior look like after 8 coin flips? Let’s look at each possible result of 8 coin flips, from 0 heads and 8 tails to 8 heads and 0 tails.
num_flips <- 8
heads <- seq(0, num_flips)
tails <- num_flips - heads
It would also be useful to contrast the posterior with what we would get if we had assumed a uniform prior.
uniform_prior <- rep(1/11, 11)
Let’s combine all the scenarios in a dataframe, calculating the posterior after the 8 flips.
library(tidyverse)

# Re-create the original triangular prior (the variable `prior` was
# overwritten with a posterior earlier)
triangle_prior <- c(0, 0.04, 0.08, 0.12, 0.16, 0.2, 0.16, 0.12, 0.08, 0.04, 0)

df <- tibble(heads = rep(heads, each = length(theta)),
             tails = rep(tails, each = length(theta)),
             theta = rep(theta, num_flips + 1),
             uniform_prior = rep(uniform_prior, num_flips + 1),
             triangle_prior = rep(triangle_prior, num_flips + 1)) %>%
  mutate(label = paste0(heads, " heads, ", tails, " tails"),
         likelihood = (theta ^ heads) * ((1 - theta) ^ tails)) %>%
  gather("prior_model", "prior", ends_with("_prior")) %>%
  group_by(heads, prior_model) %>%
  mutate(posterior = prior * likelihood / sum(prior * likelihood)) %>%
  ungroup()
The likelihood is theta^heads * (1-theta)^tails. Notice how the most likely value for theta depends on the outcomes.
After 8 flips the posterior distribution for theta varies much more with the outcome. Notice that the posterior from the uniform prior moves more quickly with the data, and the triangular prior is more confident when the numbers of heads and tails are close to equal.
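The "more confident" claim can be checked for the balanced 4 heads, 4 tails outcome by comparing the posterior mass each prior puts on theta = 0.5. A base-R sketch (`grid_posterior` is a helper introduced here, not from the text):

```r
theta <- seq(0, 1, by = 0.1)
triangle_prior <- c(0, 0.04, 0.08, 0.12, 0.16, 0.2, 0.16, 0.12, 0.08, 0.04, 0)
uniform_prior <- rep(1/11, 11)

# Posterior over the theta grid for a given number of heads out of 8 flips
grid_posterior <- function(prior, heads, flips = 8) {
  likelihood <- theta^heads * (1 - theta)^(flips - heads)
  unnorm <- prior * likelihood
  unnorm / sum(unnorm)
}

# With a balanced outcome the triangular prior puts more mass on theta = 0.5
grid_posterior(triangle_prior, 4)[theta == 0.5]  # roughly 0.32
grid_posterior(uniform_prior, 4)[theta == 0.5]   # roughly 0.25
```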
This is just a simple example, but I found that running through the calculations helped me understand the idea of estimating parameter distributions rather than just the most likely parameter.