5.1 ANOVA

ANOVA is a statistical method that is used to compare the means of more than two groups. ANOVA belongs to the family of the general linear model (GLM) There are other techniques in the GLM family such as linear multiple regression. The difference between other GLM techniques and ANOVA is that the independent variables in ANOVA tend to be categorical rather than metrics The core concept: ANOVA uses the between-groups and within-groups variance estimates to assess whether group means differ.

Formula: \[\frac{SS_{between}}{df_{between}} / \frac{SS_{within}}{df_{within}}\] where \(SS_{between}\) is the sum of squares between groups, \(df_{between}\) is the degrees of freedom between groups, \(SS_{within}\) is the sum of squares within groups, and \(df_{within}\) is the degrees of freedom within groups. \(SS_{between}\) is the sum of the squared differences between the group means and the overall mean. \(SS_{within}\) is the sum of the squared deviations of each score from its respective group mean, with the results from all the groups summed together.

Within-Groups Sum-of-Squares \[SS_{within} = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x_i})^2\] where \(k\) is the number of groups, \(n_i\) is the number of observations in group \(i\), \(x_{ij}\) is the \(j\)th observation in group \(i\), and \(\bar{x_i}\) is the mean of group \(i\).

Between-Groups Sum-of-Squares \[SS_{between} = \sum_{i=1}^{k} n_i (\bar{x_i} - \bar{x})^2\] where \(k\) is the number of groups, \(n_i\) is the number of observations in group \(i\), \(\bar{x_i}\) is the mean of group \(i\), and \(\bar{x}\) is the overall mean.

F-ratio

F-ratio is the ratio of the between-groups variance to the within-groups variance. The F-ratio is used to test the null hypothesis that the group means are equal. The F-ratio is calculated as follows: \[F = \frac{SS_{between}}{SS_{within}}\] If F-ratio is larger than 1, then it is likely that at least one of the groups is from a population with a different mean.