Intro to Bayesian Statistics
I find the Bayesian view of statistics a helpful one for thinking about various issues, so this post will be an overview of the ideas I may be using from that point of view.
There are two ways to view probability. In the traditional (frequentist) view, the probability of an event is the relative frequency with which the event occurs as the experiment keeps being repeated. E.g. you toss a coin 10 times and perhaps it comes up heads 4 times (40% of the time); you keep tossing the coin up to a total of 1000 times and 49.2% of the time it comes up heads; you toss it 1,000,000 times and 50.01% of the time it comes up heads. If the relative frequency converges to 50% as the coin keeps being tossed, then the probability of heads is 0.5. Statistics based on this view of probability are called frequentist statistics.
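The coin example can be tried out with a small simulation. This is only a sketch: the function name relative_frequency_of_heads, the fixed seed, and the assumption that the simulated coin is fair are mine, not part of the argument above.

```python
import random


def relative_frequency_of_heads(num_tosses, seed=0):
    """Toss a simulated fair coin num_tosses times and return the share of heads.

    The fair coin (probability 0.5 per toss) and the seed are assumptions
    made for this illustration only.
    """
    if num_tosses <= 0:
        raise ValueError("num_tosses must be positive")
    rng = random.Random(seed)  # private generator so the run is repeatable
    heads = sum(1 for _ in range(num_tosses) if rng.random() < 0.5)
    return heads / num_tosses


if __name__ == "__main__":
    # The relative frequency should drift towards 0.5 as the count grows.
    for n in (10, 1000, 100_000):
        print(n, relative_frequency_of_heads(n))
```

With few tosses the relative frequency can be well away from 0.5; with many tosses it settles close to it, which is the convergence the frequentist definition relies on.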
In the Bayesian view, and the one I will be using from now on, probability is subjective. This is usually expressed in terms of a bet. If a person would be willing to bet $0.60 to win $1 if the coin comes up heads, and would also accept the opposite bet of $0.40 to win $1 if the coin comes up tails, then their subjective probability of heads is 0.6. For another person the probability could be different.
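The betting definition can be written down as a one-line calculation. The sketch below assumes the bet is read as "pay the stake now, receive the payout if the event occurs"; the name implied_probability is hypothetical.

```python
def implied_probability(stake, payout):
    """Subjective probability implied by paying `stake` for a bet that
    returns `payout` if the event occurs (and nothing otherwise)."""
    if payout <= 0 or not 0 <= stake <= payout:
        raise ValueError("need 0 <= stake <= payout and payout > 0")
    return stake / payout


# The two bets from the text: $0.60 on heads and $0.40 on tails, each to win $1.
p_heads = implied_probability(0.60, 1.00)
p_tails = implied_probability(0.40, 1.00)

# For the two bets to be consistent with each other, the implied
# probabilities of heads and tails should add up to 1.
total = p_heads + p_tails
```

Another person offering different stakes for the same bets would get a different implied probability, which is the sense in which the probability is subjective.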
In Bayesian statistics, experimental data X can be combined with a subjective probability to produce an updated probability. The initial probability P(A) is called the prior probability, while the updated probability, which is the probability of A conditioned on the data X, P(A|X) is called the posterior probability.
Conditional probability P(Y|Z) is defined as what you will bet on Y occurring, where the bet is only carried out if Z occurs. (If Z is already known to have occurred, then this conditional probability is not meaningful.)
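Bayes’ law provides the way to combine the data with the prior probability. We will use Ac to denote the complement of A, the event where A does not occur. In its standard form for the two events A and Ac, Bayes’ law is:
P(A|X) = P(X|A) P(A) / [ P(X|A) P(A) + P(X|Ac) P(Ac) ]
Since we know that the data X has occurred, it does not make sense to speak of the probability of X, so we represent P(X|A) instead by L(A;X), the likelihood:
P(A|X) = L(A;X) P(A) / [ L(A;X) P(A) + L(Ac;X) P(Ac) ]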
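where the likelihoods can be interpreted as representing how strongly the data X support A and Ac respectively. The relative size of the likelihoods determines the relative sizes of the prior and posterior probabilities. If the likelihoods are equal, the posterior probability equals the prior probability. If L(A;X)>L(Ac;X) then P(A|X)>P(A); if L(A;X)<L(Ac;X) then P(A|X)<P(A). (The strict inequalities assume the prior P(A) is strictly between 0 and 1; a prior of exactly 0 or 1 is not changed by any data.)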
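The update can be checked numerically. The sketch below assumes the standard two-event form of Bayes' law, P(A|X) = L(A;X) P(A) / [ L(A;X) P(A) + L(Ac;X) P(Ac) ], with P(Ac) = 1 - P(A); the function name posterior and the example numbers are made up for illustration.

```python
def posterior(prior_a, likelihood_a, likelihood_not_a):
    """Return P(A|X) from the prior P(A) and the likelihoods L(A;X), L(Ac;X)."""
    if not 0.0 <= prior_a <= 1.0:
        raise ValueError("prior_a must be between 0 and 1")
    if likelihood_a < 0 or likelihood_not_a < 0:
        raise ValueError("likelihoods must be non-negative")
    numerator = likelihood_a * prior_a
    # P(Ac) is 1 - P(A) because Ac is the complement of A.
    evidence = numerator + likelihood_not_a * (1.0 - prior_a)
    if evidence == 0:
        raise ValueError("the data are impossible under both A and Ac")
    return numerator / evidence


prior = 0.3  # illustrative prior, not taken from any real data

unchanged = posterior(prior, 0.5, 0.5)  # equal likelihoods
raised = posterior(prior, 0.8, 0.2)     # L(A;X) > L(Ac;X)
lowered = posterior(prior, 0.2, 0.8)    # L(A;X) < L(Ac;X)
```

The three calls correspond to the three cases in the text: equal likelihoods leave the prior where it was, a larger likelihood for A raises it, and a smaller one lowers it.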
These ideas can be extended if we are concerned with more than 2 outcomes, but the above will be sufficient for my needs. I hope the above is understandable; let me know if I need to add any clarifications.