A Reading Guide to Stein's Paradox
math

Stein's Paradox states that when estimating three or more means of normally distributed data together, it's always better (on average) to shrink the estimates. Specifically, if you've got \(p \geq 3\) independent normally distributed variables \(X_i \sim N(\theta_i, 1) ;\, i=1,\ldots,p\), the best estimate of the means for minimising the total mean squared error isn't the observed values \(X\) themselves; the James-Stein estimator is better (it has strictly lower risk).
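To make the claim concrete, here is a minimal simulation sketch. It assumes the standard form of the James-Stein estimator that shrinks towards the origin, \( \left(1 - \frac{p-2}{\lVert X \rVert^2}\right) X \); the choice of \(\theta\), the number of trials, and the function names are illustrative assumptions.

```python
import random


def james_stein(x):
    """Shrink the observations x towards the origin (unit variance assumed)."""
    p = len(x)
    norm_sq = sum(v * v for v in x)
    shrink = 1.0 - (p - 2) / norm_sq
    return [shrink * v for v in x]


def simulate_risk(theta, trials=2000, seed=0):
    """Estimate the total squared-error risk of X itself and of James-Stein."""
    rng = random.Random(seed)
    raw_loss = js_loss = 0.0
    for _ in range(trials):
        # One draw X_i ~ N(theta_i, 1) for each mean.
        x = [rng.gauss(t, 1.0) for t in theta]
        js = james_stein(x)
        raw_loss += sum((xi - t) ** 2 for xi, t in zip(x, theta))
        js_loss += sum((ji - t) ** 2 for ji, t in zip(js, theta))
    return raw_loss / trials, js_loss / trials


# Illustrative choice: p = 10 means, all equal to 1.
raw_risk, js_risk = simulate_risk([1.0] * 10)
print(f"risk of X: {raw_risk:.2f}, risk of James-Stein: {js_risk:.2f}")
```

The risk of using \(X\) directly should come out close to \(p\), and the James-Stein risk should be lower. The gap shrinks as the true means move further from the shrinkage target.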

Probability Jaccard
math

I don't like the Jaccard index for clustering because it doesn't work well on sets of different sizes. Instead I find the concepts from Association Rule Learning (a.k.a. market basket analysis) very useful. It turns out Jaccard Similarity can be written in terms of these concepts, so they really are more general. The main metrics in association rule mining are the confidence, which for pairs is just the conditional probability \( P(B \vert A) = \frac{P(A, B)}{P(A)} \), and the lift, which is how much more likely the two events are to occur together than they would be if they were independent with the same marginals: \( \frac{P(A, B)}{P(A)P(B)} \).
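As a rough sketch of how these quantities relate, the snippet below computes confidence, lift and Jaccard similarity from two sets of transaction ids. It also checks one way of writing Jaccard in terms of the two confidences, \( \frac{1}{J} = \frac{1}{P(B \vert A)} + \frac{1}{P(A \vert B)} - 1 \), which follows from \( J = \frac{P(A, B)}{P(A) + P(B) - P(A, B)} \). The function name and the toy data are made up for illustration.

```python
def pair_metrics(a, b, n_total):
    """a, b: sets of transaction ids containing items A and B.

    n_total: total number of transactions.
    """
    p_a = len(a) / n_total
    p_b = len(b) / n_total
    p_ab = len(a & b) / n_total
    return {
        "confidence_a_to_b": p_ab / p_a,  # P(B | A)
        "confidence_b_to_a": p_ab / p_b,  # P(A | B)
        "lift": p_ab / (p_a * p_b),
        "jaccard": len(a & b) / len(a | b),
    }


# Toy data: 10 transactions, sets of different sizes.
a = {1, 2, 3, 4}
b = {3, 4, 5, 6, 7, 8}
m = pair_metrics(a, b, n_total=10)

# Jaccard recovered from the two confidences alone.
jaccard_from_conf = 1.0 / (
    1.0 / m["confidence_a_to_b"] + 1.0 / m["confidence_b_to_a"] - 1.0
)
print(m, jaccard_from_conf)
```

Here the two confidences differ (1/2 and 1/3) because the sets have different sizes, while Jaccard collapses them into a single symmetric number.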

Beta Function
math

The Beta Function comes up in the likelihood of the binomial distribution, so understanding its properties is useful for understanding the binomial distribution. The beta function is given by \( B(a, b) = \int_0^1 p^{a-1}(1-p)^{b-1} \,\mathrm{d}p \) for a and b positive. If you have \(N\) flips of a coin, of which \(k\) come up heads, the likelihood is proportional to \( p^{k}(1-p)^{N-k} \) for the probability p between 0 and 1. So the beta function can be seen as the normaliser of the likelihood, with \( a = k + 1 \) and \( b = N - k + 1 \) (or inversely \( k = a - 1 \) and \( N = a + b - 2 \)).
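A quick numerical check of the normaliser, as a sketch: it integrates the likelihood with a simple midpoint rule and compares the result with the standard identity \( B(a, b) = \frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)} \). The example values \(k = 7\), \(N = 10\), the step count, and the function names are illustrative assumptions.

```python
import math


def beta_numeric(a, b, steps=10_000):
    """Approximate B(a, b) by the midpoint rule on [0, 1]."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        p = (i + 0.5) * h  # midpoint of the i-th subinterval
        total += p ** (a - 1) * (1 - p) ** (b - 1)
    return total * h


def beta_gamma(a, b):
    """B(a, b) via the gamma function identity."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)


# 7 heads in 10 flips: a = k + 1 = 8, b = N - k + 1 = 4.
k, n = 7, 10
numeric = beta_numeric(k + 1, n - k + 1)
exact = beta_gamma(k + 1, n - k + 1)
print(numeric, exact)
```

For integer \(k\) and \(N\) the normaliser works out to \( \frac{1}{(N+1)\binom{N}{k}} \), which for this example is \(1/1320\).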

From Bernoulli to Binomial Distributions
data

Suppose that you flip a fair coin 10 times. How many heads will you get? You'd expect it to be close to 5, but it might be a bit higher or lower. If you got 7 heads, would you reconsider your assumption that the coin is fair? What if you got 70 heads out of 100 flips? This might seem a bit abstract, but the inverse problem is often very important. Given that 7 out of 10 people convert on a new call to action, can we say it's more successful than the existing one that converts at 50%?
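One way to put numbers on these questions is to compute how likely a fair coin is to produce at least that many heads. The sketch below sums the binomial probabilities directly; the function name is an illustrative assumption, and looking at the upper tail is just one reasonable way to frame the comparison.

```python
from math import comb


def prob_at_least(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


seven_of_ten = prob_at_least(7, 10)
seventy_of_hundred = prob_at_least(70, 100)
print(f"P(at least 7 heads in 10 flips)   = {seven_of_ten:.4f}")
print(f"P(at least 70 heads in 100 flips) = {seventy_of_hundred:.2e}")
```

Seven or more heads in 10 flips happens about 17% of the time with a fair coin, so it is weak evidence on its own. The same 70% rate over 100 flips is far less likely under a fair coin, which is why sample size matters so much in the conversion example.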