10.1 Repeated Measures

If we have two groups of data measured at two points in time (A1, B1, C1) and (A2, B2, C2) where A, B, C are three different subjects. The within-group variance is a function of individual differences, which has a large variance. By repeated-measures, we can examine the individual differences. Let’s look at the weights of chicks in grams versus days after hatching.

boxplot(weight ~ Time, data=ChickWeight)

ch16index <- ChickWeight$Time == 16
ch18index <- ChickWeight$Time == 18
bothChicks <- ChickWeight[ch16index | ch18index,] # Both sets together
time16weight <- bothChicks[bothChicks$Time == 16,"weight"]
time18weight <- bothChicks[bothChicks$Time == 18,"weight"]
cor(time16weight,time18weight)         # Are they correlated?
## [1] 0.9789155

The correlation is 0.98, which means that the weights of the chicks at 16 days and 18 days are highly correlated. Let’s analyze whether they are two independent groups.

library(BEST)
mean(time16weight)
## [1] 168.0851
# mean(time18weight)
# Independent groups t-test
t.test(time18weight,time16weight,paired = FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  time18weight and time16weight
## t = 2.0446, df = 88.49, p-value = 0.04386
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   0.6216958 43.5910701
## sample estimates:
## mean of x mean of y 
##  190.1915  168.0851
BESTmcmc(time18weight,time16weight)
## MCMC fit results for BEST analysis:
## 100002 simulations saved.
##          mean     sd median   HDIlo  HDIup Rhat n.eff
## mu1    189.86  8.700 189.82 172.910 207.09    1 61598
## mu2    167.91  6.994 167.89 154.091 181.59    1 62435
## nu      36.60 29.509  27.99   2.982  95.76    1 19116
## sigma1  56.63  6.751  56.17  44.007  70.33    1 45051
## sigma2  46.01  5.550  45.63  35.373  57.10    1 43351
## 
## 'HDIlo' and 'HDIup' are the limits of a 95% HDI credible interval.
## 'Rhat' is the potential scale reduction factor (at convergence, Rhat=1).
## 'n.eff' is a crude measure of effective sample size.

The t.test with paired=FALSE, treats the two groups as independent, calculates a separate variance for each group, and then pools the variance to calculate the t-test. The 95% CI suggests the mean difference is barely larger than 0, and the 95% HDI for the two means overlap, suggests that there is no significant difference between these two means when two groups are considered independent. The paired=False is incorrect analysis.

Now let’s treat these two groups as dependent.

t.test(time18weight,time16weight,paired = TRUE)    # Dependent groups t-test
## 
##  Paired t-test
## 
## data:  time18weight and time16weight
## t = 10.136, df = 46, p-value = 2.646e-13
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  17.71618 26.49658
## sample estimates:
## mean difference 
##        22.10638

This paired t-test means the data is treated as matched pairs. The CI is much narrower, meaning a greater certainty that these two groups are different. When we use a dependent samples test, we eliminate the individual differences and focus on the change over time for each case. This is the same as the following:

weightDiffs <- time18weight - time16weight
t.test(weightDiffs)  # Run a one sample t-test on difference scores
## 
##  One Sample t-test
## 
## data:  weightDiffs
## t = 10.136, df = 46, p-value = 2.646e-13
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  17.71618 26.49658
## sample estimates:
## mean of x 
##  22.10638
# Run the Bayesian one- sample test on difference scores
BESTmcmc(weightDiffs)
## MCMC fit results for BEST analysis:
## 100002 simulations saved.
##        mean     sd median  HDIlo  HDIup  Rhat n.eff
## mu    21.97  2.290  21.96 17.466  26.48 1.000 60520
## nu    40.47 30.880  32.01  3.121 102.78 1.001 22872
## sigma 14.98  1.715  14.84 11.714  18.37 1.000 50313
## 
## 'HDIlo' and 'HDIup' are the limits of a 95% HDI credible interval.
## 'Rhat' is the potential scale reduction factor (at convergence, Rhat=1).
## 'n.eff' is a crude measure of effective sample size.

By combining the two sets of measurements, we are appropriately considering the two groups of chicks as a single group at two distinct times.

Instead of comparing cases across only one set of points, repeated measures ANOVA enables us to do so over two or more time points. We need to first make sure about the balanced design, which means that each subject is measured at each time point. Not all the ChickWeight is measured at each time, so we need to subset the data.

chwBal <- ChickWeight
chwBal$TimeFact <- as.factor(chwBal$Time)
list <- rowSums(table(chwBal$Chick,chwBal$TimeFact))==12 # Keep only those with 12 observations
list <- list[list==TRUE]
list <- as.numeric(names(list))    # Extract the row indices
chwBal <- chwBal[chwBal$Chick %in% list,]  # Match against the data

chwBal now keeps only the data with 12 observations for each chick. Now we run an ANOVA test on the data to see the effects of Time on weight:

summary(aov(weight ~ TimeFact + Error(Chick), data=chwBal))
## 
## Error: Chick
##           Df Sum Sq Mean Sq F value Pr(>F)
## Residuals 44 429899    9770               
## 
## Error: Within
##            Df  Sum Sq Mean Sq F value Pr(>F)    
## TimeFact   11 1982388  180217   231.6 <2e-16 ***
## Residuals 484  376698     778                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The ANOVA test is to examine whether weight vary over time. It is noted that “Error(Chick)” specifies the individual differences among chicks as error variance that we want to leave out of the F-test. We are able to eliminate individual differences’ confounding effects from the aggregate error term, increasing the likelihood that we can spot change over time.

  • The Error: Chick section: the Residual refers to the variance attributed to the individual differences among the chicks. The key benefit of repeated measures ANOVA is that we can isolate and set aside this component of the dependent variable’s variance, preventing it from showing up in the denominator of our F-test.
  • The second section: F(11,484) = 231.6,p < .001. The df=11 reflects the 12 points and the df=484 reflects the remaining error variance that is not attributable to individual differences. The p-value suggests that the weight of the chicks vary over time. The eta-squared effect size for the influence of the Time variable from the ANOVA table is: 1982388/(1982388+376698+429899) = 0.71. This is the proportion of variance in weight that is accounted for by Time.
# install.packages("ez")
library("ez")
ezANOVA(data=chwBal,dv=.(weight),within=.(TimeFact),wid=.(Chick), detailed=TRUE)
## $ANOVA
##        Effect DFn DFd     SSn      SSd        F             p p<.05       ges
## 1 (Intercept)   1  44 8431251 429898.6 862.9362  1.504306e-30     * 0.9126857
## 2    TimeFact  11 484 1982388 376697.6 231.5519 7.554752e-185     * 0.7107921
## 
## $`Mauchly's Test for Sphericity`
##     Effect            W             p p<.05
## 2 TimeFact 1.496988e-17 2.370272e-280     *
## 
## $`Sphericity Corrections`
##     Effect       GGe        p[GG] p[GG]<.05       HFe       p[HF] p[HF]<.05
## 2 TimeFact 0.1110457 7.816387e-23         * 0.1125621 4.12225e-23         *

The first line contains the individual indifferences. The second line contains the variance attributed to the time factor. The p-value suggests that we can reject the null hypothesis of no change in weight over time. The final value on the second line, 0.71 is the eta-squared.

The next section is the test for homogeneity of variance of the differences among pairs of time groups. The p-value is 0.0001, which suggests that the variance of the differences is not equal across the time groups. This suggests a potential a violation of the assumption of homogeneity of variance, which is a requirement for the ANOVA test. We can use the Greenhouse-Geisser correction or Huynh-Feldt correction to the degrees of freedom to counteract the possible inflation of the F-ratio. The p value of the correction suggests that the inflation in the F-ratio does not affect the decision to reject the null hypothesis.