Hypothesis Testing: A Bayesian Perspective

A hypothesis test is a statistical method of making decisions, given data. Frequentist inference contrasts a null hypothesis with an alternative hypothesis. For example, given a set of measurements on two subsets of a population, the null hypothesis may be that the means of the measurements do not differ, while the alternative hypothesis may be that the means differ. In this example, suppose height was measured for males and females. The mean female height was 64 inches, and the mean male height was 70 inches. Do these means differ, indicating separate subpopulations, or are they effectively the same?

To perform this hypothesis test with frequentist inference, a level of statistical significance (α) must first be chosen, and the choice is arbitrary. The most common α is 0.05. Sample size also affects statistical significance: given a large enough sample, the null hypothesis can usually be rejected, because even a trivially small difference between the means becomes statistically significant. If the sample consisted of 10 females and 10 males, the null hypothesis is less likely to be rejected at a given α than if the sample consisted of 100,000 females and 100,000 males.
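To make the sample-size point concrete, here is a minimal sketch that holds a hypothetical observed difference fixed (0.1 inch, with a standard deviation of 3 inches in both groups, neither taken from the text) and varies only the number of people per group. It assumes the standard deviation is known, so it uses a two-sample z-test rather than the more usual t-test; the function name is hypothetical.

```python
from statistics import NormalDist


def two_sided_z_pvalue(mean_diff: float, sd: float, n_per_group: int) -> float:
    """Two-sided p value for a difference in means.

    Assumes (hypothetically) a known, common standard deviation `sd` in both
    groups and equal group sizes, i.e. a two-sample z-test.
    """
    se = sd * (2.0 / n_per_group) ** 0.5  # standard error of the difference
    z = abs(mean_diff) / se
    return 2.0 * (1.0 - NormalDist().cdf(z))


# Same hypothetical 0.1-inch difference, increasingly large samples.
for n in (10, 1_000, 100_000):
    p = two_sided_z_pvalue(0.1, 3.0, n)
    print(f"n = {n:>7} per group: p = {p:.4g}, reject at 0.05: {p <= 0.05}")
```

With these inputs, the same 0.1-inch difference is far from significant at 10 people per group and highly significant at 100,000 per group.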

Finally, a p value is computed and is often used in frequentist hypothesis testing to reject, or fail to reject, the null hypothesis. A p value ranges from 0 to 1 and is interpreted as the probability of obtaining a result at least as extreme as the observed result, given that the null hypothesis is true. P values are often misinterpreted as the probability that the null hypothesis is true. In this example, if the p value is less than or equal to 0.05, a frequentist would typically reject the null hypothesis and conclude that the mean heights of females and males differ significantly. Many modern frequentists understand the issues with p values and emphasize frequentist confidence intervals instead; confidence intervals are debated elsewhere on this site.
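The definition of a p value can be illustrated with a permutation test, a technique chosen here for clarity rather than one the text prescribes. If the null hypothesis is true, the group labels are exchangeable, so reshuffling them shows how often a difference at least as extreme as the observed one arises by chance. The heights below are invented so that the group means match the 64- and 70-inch example; the function name and its defaults are hypothetical.

```python
import random


def permutation_pvalue(a, b, n_perm=10_000, seed=0):
    """Approximate two-sided p value for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # relabel at random, as the null hypothesis allows
        first, second = pooled[:n_a], pooled[n_a:]
        diff = abs(sum(first) / len(first) - sum(second) / len(second))
        if diff >= observed:  # "at least as extreme as the observed result"
            extreme += 1
    # Counting the observed labelling itself keeps the estimate above 0.
    return (extreme + 1) / (n_perm + 1)


# Invented heights in inches, chosen so the means are 64 and 70.
females = [63, 65, 62, 66, 64, 63, 65, 64, 66, 62]
males = [69, 71, 70, 72, 68, 70, 71, 69, 70, 70]
print(permutation_pvalue(females, males))
```

Because these invented groups do not overlap, almost no reshuffle is as extreme as the observed 6-inch gap, so the estimated p value is tiny. It is still a statement about the data given the null hypothesis, not the probability that the null hypothesis is true.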

Because the frequentist interpretation of probability concerns long-run frequencies rather than only the data at hand, frequentist inference cannot assign a probability to a hypothesis, given the data. In an individual case, such as a criminal trial, it may be desirable to estimate the probability of guilt vs. innocence. This is possible only with Bayesian inference, because frequentist inference estimates whether or not to reject the null hypothesis in the long run. Such an estimate should not be permissible in court, because it is the immediate case that is being considered, not long-run behavior.
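For illustration, write G for guilt, I for innocence, and D for the evidence (symbols introduced here, not in the original). Bayes' theorem gives the probability of each hypothesis given the data, which is the quantity frequentist inference cannot supply. The second line is the same result in odds form, where the likelihood ratio is the Bayes factor.

```latex
P(G \mid D) = \frac{P(D \mid G)\,P(G)}{P(D \mid G)\,P(G) + P(D \mid I)\,P(I)},
\qquad P(I) = 1 - P(G)

\frac{P(G \mid D)}{P(I \mid D)}
  = \underbrace{\frac{P(D \mid G)}{P(D \mid I)}}_{\text{Bayes factor}}
    \times \frac{P(G)}{P(I)}
```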

The examples above oversimplify frequentist hypothesis testing. Frequentist hypothesis testing has been widely criticized since it was introduced, yet it remains the dominant form of hypothesis testing everywhere from scholarly journals and statistics classes to non-academic practice. Below are some Bayesian criticisms of frequentist hypothesis testing, followed by a brief introduction to Bayesian hypothesis testing.

Criticisms of Frequentist Hypothesis Testing

The goal of the hypothesis test in the examples above should be to estimate which hypothesis is more probable, given the data. Frequentist hypothesis tests cannot assign a probability to a hypothesis at all. The direction of frequentist hypothesis testing is also reversed. The name "hypothesis test" implies that a hypothesis is being tested, yet frequentist tests do not technically test any hypothesis, given the data. Instead, frequentists test the data, given the hypothesis, whereas Bayesians test the hypothesis, given the data. If you were on trial and a study of your guilt or innocence mattered to you, you would want the hypotheses evaluated given the data, not the converse. You would certainly not want them judged by long-run frequencies describing what would happen under repeated sampling of you and your case, when the immediate result for you and your case may or may not be represented in those results (see confidence intervals).
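A minimal sketch with made-up numbers shows why the direction matters: evidence can be improbable under innocence while innocence remains probable given that evidence, because the prior probability of guilt is low. It applies Bayes' theorem to two exhaustive hypotheses; the function name `posterior_prob` and every input value are hypothetical.

```python
def posterior_prob(prior_h1: float, lik_h1: float, lik_h0: float) -> float:
    """P(H1 | D) for two exhaustive hypotheses H1 and H0, via Bayes' theorem.

    prior_h1: P(H1); lik_h1: P(D | H1); lik_h0: P(D | H0).
    """
    numer = lik_h1 * prior_h1
    return numer / (numer + lik_h0 * (1.0 - prior_h1))


# Hypothetical numbers, not from any real case:
# P(guilty) = 0.001, P(D | guilty) = 0.95, P(D | innocent) = 0.01.
p_guilty = posterior_prob(prior_h1=0.001, lik_h1=0.95, lik_h0=0.01)
print(f"P(D | innocent) = 0.01, but P(innocent | D) = {1 - p_guilty:.3f}")
```

Here P(D | innocent) is only 0.01, yet P(innocent | D) is about 0.91: a small probability of the data given a hypothesis does not imply a small probability of that hypothesis given the data.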

Bayesian Hypothesis Testing

Bayesian hypothesis testing estimates the probability of a hypothesis, given the data. Bayesians may use Bayes factors to compare hypotheses, but in practice these can be difficult to calculate. Spiegelhalter's DIC (Deviance Information Criterion) is an alternative to hypothesis testing for model comparison and selection. It allows Bayesians to compare the fit of multiple models that attempt to estimate the same dependent variable in a data set, even when each model uses a different methodology, whereas frequentist model fit statistics such as AIC (Akaike's Information Criterion) cannot be compared among multiple models in this way. DIC is valid only when the posterior distribution is approximately multivariate normal, which is usually the case.
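As a rough illustration of how DIC is assembled from posterior draws, the sketch below uses Spiegelhalter et al.'s definitions: the deviance is D(θ) = -2 log p(y | θ), D̄ is its posterior mean, pD = D̄ - D(θ̄) is the effective number of parameters, and DIC = D̄ + pD. (Some software uses a variance-based pD instead.) The model, a Normal mean with known σ and a flat prior whose posterior can be sampled directly, as well as the simulated data and the function names, are assumptions for illustration.

```python
import math
import random


def normal_loglik(y, mu, sigma):
    """Log-likelihood of data y under Normal(mu, sigma^2)."""
    return sum(-0.5 * math.log(2 * math.pi * sigma**2)
               - (yi - mu) ** 2 / (2 * sigma**2) for yi in y)


def dic(y, mu_samples, sigma):
    """Return (DIC, pD) using D(mu) = -2 log p(y | mu) and
    pD = Dbar - D(posterior mean of mu)."""
    deviances = [-2.0 * normal_loglik(y, mu, sigma) for mu in mu_samples]
    d_bar = sum(deviances) / len(deviances)        # posterior mean deviance
    mu_bar = sum(mu_samples) / len(mu_samples)     # posterior mean of mu
    p_d = d_bar - (-2.0 * normal_loglik(y, mu_bar, sigma))
    return d_bar + p_d, p_d


# Hypothetical data with sigma treated as known. With a flat prior on mu,
# the posterior is Normal(ybar, sigma^2 / n), so it can be sampled directly.
rng = random.Random(1)
sigma = 3.0
y = [rng.gauss(67.0, sigma) for _ in range(50)]
ybar = sum(y) / len(y)
mu_draws = [rng.gauss(ybar, sigma / math.sqrt(len(y))) for _ in range(10_000)]
dic_value, p_d = dic(y, mu_draws, sigma)
print(f"DIC = {dic_value:.2f}, effective number of parameters pD = {p_d:.3f}")
```

With one free parameter, pD should come out close to 1. To compare models, one would compute DIC for each model on the same data and prefer the lower value.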

Because Bayesian and frequentist methods of hypothesis testing differ, their results can range from nearly identical in some cases to very different in others. When results differ because of the interpretation of probability, all else being equal, which is preferable? Frequentist inference is incoherent in a Bayesian sense, whereas Bayesian inference is coherent in a frequentist sense. The preferable answer is Bayesian inference.

The main reason frequentist inference became the dominant interpretation of probability is that, traditionally, it has been much easier to estimate solutions to problems with frequentist methods. Until recently, most real-world problems were impractical to estimate with Bayesian methods, either because the mathematics was too demanding or because it was outright intractable. Advances in MCMC (Markov chain Monte Carlo) and in computing have finally made Bayesian inference practical for statisticians, and the Laplace Approximation is gaining popularity as well. It is now possible to estimate Bayesian solutions to complex problems that are beyond the capability of frequentist inference. The tables have finally turned, and it is now merely a matter of time before Bayesian inference dominates academic and non-academic use.
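For a sense of what MCMC does, here is a minimal random-walk Metropolis sampler, one of the simplest MCMC algorithms, applied to a one-parameter model chosen for illustration: a Normal mean with known σ = 3 and a Normal(65, 10²) prior, using invented data. Because this model is conjugate, the exact posterior mean is available for comparison; in the complex problems described above no such closed form exists, which is why sampling is needed. The step size, iteration count, burn-in, and all names are assumptions.

```python
import math
import random


def metropolis(log_post, init, n_iter=20_000, step=0.5, seed=0):
    """Random-walk Metropolis sampler for a one-dimensional log posterior."""
    rng = random.Random(seed)
    x, lp = init, log_post(init)
    samples = []
    for _ in range(n_iter):
        prop = x + rng.gauss(0.0, step)  # symmetric Normal proposal
        lp_prop = log_post(prop)
        # Accept with probability min(1, post(prop) / post(x));
        # 1 - random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < lp_prop - lp:
            x, lp = prop, lp_prop
        samples.append(x)
    return samples


# Invented data (inches) and hypothetical model settings.
y = [69, 71, 70, 72, 68, 70, 71, 69, 70, 70]
sigma, prior_mean, prior_sd = 3.0, 65.0, 10.0


def log_post(mu):
    # Unnormalized log posterior: Normal likelihood (sigma known) + Normal prior.
    log_lik = -sum((yi - mu) ** 2 for yi in y) / (2 * sigma**2)
    log_prior = -((mu - prior_mean) ** 2) / (2 * prior_sd**2)
    return log_lik + log_prior


draws = metropolis(log_post, init=65.0)[2_000:]  # discard burn-in
mcmc_mean = sum(draws) / len(draws)

# Conjugate closed form, for comparison only.
precision = len(y) / sigma**2 + 1 / prior_sd**2
exact_mean = (sum(y) / sigma**2 + prior_mean / prior_sd**2) / precision
print(f"MCMC posterior mean {mcmc_mean:.2f} vs exact {exact_mean:.2f}")
```

The MCMC estimate should land close to the exact value. Tuning `step` trades the acceptance rate against how far each proposal moves.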

http://www.bayesian-inference.com/hypotheses