4.4 The Null Hypothesis Significance Test (NHST)

The null hypothesis is that there is no difference between the population means of the two groups. The NHST process involves a few steps:

- Begin with the null hypothesis.
- Select the probability level at which the null hypothesis will be rejected (the "alpha level"). A standard alpha level is .05; stricter levels are .01 or .005.
- Gather data and perform a statistical test, such as the two-sample t-test for independent means, to obtain the p-value, denoted by the letter p. This is the probability, if the null hypothesis were true, of obtaining a result at least as extreme as the one observed.
- Reject the null hypothesis if p is less than the alpha level selected in advance, for instance, if p = .049 while alpha was set at .05.
- When the null hypothesis is rejected, this can be viewed as evidence in favor of some unspecified alternative hypothesis. The significance test itself does not state what that alternative hypothesis might be, nor how likely any particular alternative is to be correct.
- Do not reject the null hypothesis if p is higher than the alpha level chosen in advance, for instance, if p = .051 while alpha was set at .05.

Failure to reject the null hypothesis does not imply acceptance of the null hypothesis; rather, it indicates that the data do not provide sufficient evidence against it. Similarly, the p-value is not the probability that the null hypothesis is true.
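The steps above can be sketched in base R. This is a minimal illustration under stated assumptions, not a prescribed workflow: `nhst_decision()` is a hypothetical helper, and it relies on `t.test()`, which by default runs Welch's two-sample t-test (unequal variances) rather than the pooled-variance version. The example compares fuel economy (`mpg`) between automatic (`am == 0`) and manual (`am == 1`) cars in `mtcars`.

```r
# Minimal sketch of the NHST decision rule using base R's t.test().
# nhst_decision() is a hypothetical helper written for illustration only.
nhst_decision <- function(x, y, alpha = 0.05) {
  # Compute the p-value with a two-sample t-test
  # (t.test() uses Welch's test by default, i.e. var.equal = FALSE)
  p <- t.test(x, y)$p.value
  # Reject H0 only if p < alpha; otherwise we fail to reject H0
  # (which is not the same as accepting it)
  list(p.value  = p,
       decision = if (p < alpha) "reject H0" else "fail to reject H0")
}

mpg_auto   <- mtcars$mpg[mtcars$am == 0]  # automatic transmissions
mpg_manual <- mtcars$mpg[mtcars$am == 1]  # manual transmissions
nhst_decision(mpg_auto, mpg_manual, alpha = 0.05)
```

Passing the `hp` columns instead applies the same rule to horsepower; the decision depends only on whether p falls below the alpha chosen beforehand.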

4.4.1 Flaws of NHST

The NHST has flaws. A p-value provides no information about the magnitude of the likely difference between the means. For instance, it is not enough to tell a decision maker that there is a difference between automatic and manual transmissions, because the magnitude of the difference is often what matters more. One way to address this problem is to report an effect size, which measures the strength or magnitude of a statistical finding.

Effect Size

The effect size is a measure of the strength of the relationship between two variables. Researchers aim to obtain large effect sizes, that is, large differences in group means (and other statistical effects) relative to the variability in the data. Cohen's d provides a standardized measure of the size of a mean difference: divide the difference between the two sample means by the pooled standard deviation of the two samples. By Cohen's conventional benchmarks, d values of about 0.2, 0.5, and 0.8 are considered small, medium, and large effects. In the output below, the d estimate of 0.49 means that cars with automatic transmissions had mean horsepower nearly half a standard deviation higher than cars with manual transmissions, roughly a medium effect.

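# install.packages("lsr")   # run once if lsr is not yet installed
library(lsr)
# Cohen's d for horsepower: automatic (am == 0) vs. manual (am == 1)
cohensD(mtcars$hp[mtcars$am == 0], mtcars$hp[mtcars$am == 1])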
## [1] 0.4943081
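To make the formula concrete, Cohen's d can also be computed by hand in base R. This sketch assumes that `cohensD()` is using its default pooled-standard-deviation method, in which case the two results should agree; `cohens_d_pooled()` is a hypothetical helper written for illustration, not part of lsr.

```r
# Cohen's d computed by hand in base R (no packages needed).
# cohens_d_pooled() is a hypothetical helper used for illustration.
cohens_d_pooled <- function(x, y) {
  n1 <- length(x)
  n2 <- length(y)
  # Pooled SD: each sample variance weighted by its degrees of freedom
  s_pooled <- sqrt(((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2))
  # Standardized mean difference
  (mean(x) - mean(y)) / s_pooled
}

hp_auto   <- mtcars$hp[mtcars$am == 0]  # automatic transmissions
hp_manual <- mtcars$hp[mtcars$am == 1]  # manual transmissions
cohens_d_pooled(hp_auto, hp_manual)
```

The sign of the result follows the order of the arguments: it is positive here because the automatic group is passed first and has the higher mean horsepower.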

Another flaw is that a null hypothesis of exactly zero difference often makes little sense conceptually or intuitively, since two real groups almost never have precisely identical means. The issue is ameliorated by the concept of the "region of practical equivalence" (ROPE) (Kruschke), a range of values that the researcher considers practically equivalent to the null value. For example, in our mtcars analysis we could have stated in advance that if automatic and manual transmissions differed in mean fuel economy by less than 2 mpg (a ROPE from -2 to +2 mpg), we would consider them practically equivalent.
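The ROPE idea can be sketched as a simple decision rule. Kruschke's own formulation compares a Bayesian highest-density interval (HDI) with the ROPE; as an assumption for this sketch, a frequentist 95% confidence interval from `t.test()` stands in for the HDI. `rope_decision()` is a hypothetical helper, and the ±2 mpg ROPE is the one from the example above.

```r
# Frequentist stand-in for the ROPE decision rule: compare a confidence
# interval for the mean difference (x - y) with the ROPE.
# Kruschke's rule uses a Bayesian HDI; the CI here is an assumption.
# rope_decision() is a hypothetical helper written for illustration.
rope_decision <- function(x, y, rope = c(-2, 2), conf.level = 0.95) {
  ci <- t.test(x, y, conf.level = conf.level)$conf.int
  if (ci[1] >= rope[1] && ci[2] <= rope[2]) {
    "practically equivalent"   # whole interval lies inside the ROPE
  } else if (ci[2] < rope[1] || ci[1] > rope[2]) {
    "practically different"    # whole interval lies outside the ROPE
  } else {
    "undecided"                # interval overlaps a ROPE boundary
  }
}

mpg_auto   <- mtcars$mpg[mtcars$am == 0]  # automatic transmissions
mpg_manual <- mtcars$mpg[mtcars$am == 1]  # manual transmissions
rope_decision(mpg_auto, mpg_manual, rope = c(-2, 2))
```

Unlike the NHST rule, this rule can conclude that two groups are practically equivalent, which is exactly what a point null of zero difference cannot support.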