Computer Laboratory

Course pages 2013–14

Experimental Methods

Statistical analysis

Variance

  • Experiments are unlikely to have binary conclusions
  • Variance in measurements arises for two main reasons
    • Changes in independent variables
    • Random fluctuations
  • Model effects of the former to minimise residual effects of the latter

Probability revision

  • Population of size N
    • Mean μ = ΣXi / N
    • Variance σ2 = Σ(Xi-μ)2 / N
    • Covariance Σ(Xi-μ)(Yi-ν) / N
  • Sample of size n
    • Estimated mean m = ΣXi / n
    • Estimated variance s2 = Σ(Xi-m)2 / (n-1)
  • Median (halfway through sample data)
    • Quartiles, deciles, percentiles
  • Mode (most frequent occurrence)

Linear regression

  • Given data series (xi) and (yi) each with n values
    • Fit a model Yi = a + bXi + εi
    • where residual εi ~ N(0,σ2)
    • to estimate the intercept a, regression coefficient (slope) b
      and residual variance σ2
    • that minimise the sum of squared residuals (SSR) Σεi2
  • Minimising SSR by taking partial derivative w.r.t. b gives
    • Estimated b = Cov(xi,yi) / Var(xi)
  • Requiring Σεi = 0 gives
    • Estimated a = Mean(yi) – b Mean(xi)
  • Then estimated σ2 = Σεi2 / (n-2)
  • Pearson correlation coefficient
    • r2 = Cov(xi,yi)2 / (Var(xi) × Var(yi))
  • Total variance is sum of model variance and residual variance
  • Proportion of variance explained by the linear model is also
    • r2 = Variancemodel / Variancedata

Linear regression example

  • (xi) = (1,2,3,4)
    (yi) = (1,3,2,4)
  • Mean(xi) = 10 / 4 = 2.5
    Var(xi) = 5 / 3 ≈ 1.667
    Cov(xi,yi) = 4 / 3 ≈ 1.333
  • Estimated b ≈ 1.333 / 1.667 ≈ 0.8
    estimated a = 2.5 – 0.8×2.5 = 0.5
  • Pearson r2 ≈ (1.333×1.333) / (1.667×1.667) ≈ 0.64
    Variancemodel / Variancedata = (3.2/3) / (5/3) = 0.64
  • so linear model explains 64% of total variance

Linear regression in R

  • Given data series x and y
  • Look at the data
    plot (x,y)
  • Calculate the intercept and slope
    m <- lm (y~x)
  • Superimpose the line of best fit
    abline (m)
  • Observe residuals
    segments (x,fitted(m),x,y)
  • Test the null hypothesis that slope b = 0
    summary (m)

Normal distribution

  • Recall normal distribution
  • X ~ N(μ,σ2)
  • Z-value Z = (X-μ) / σ
  • Z ~ N(0,1)
  • If Zi ~ N(0,1) for 1 ≤ i ≤ n and S = ΣZi / n
  • S ~ N(0,1) in the limit for large n
  • Models should give normally distributed residuals for statistics to work

Chi-square and t distributions

  • Z ~ N(0,1)
  • U = Z2
  • U ~ Χ2 (chi-square)
  • If Ui ~ Χ2 for 1 ≤ i ≤ n and V = ΣUi
  • V ~ Χn2
  • tn = Z / √(V/n)
  • t test for independence with t = Est(b) / StdErr(Est(b))

Comparing text entry systems

  • Two methods: Qwerty & Opti
  • 4 participants
  • 3 sessions with each method
  • One half tried Qwerty then Opti, the other half tried Opti then Qwerty

Contingency tables

  • Data is often collected in a matrix with one row for each participant:
    Partic't Q1 Q2 Q3 O1 O2 O3
    1       
    2       
    3       
    4       
  • However, it may be convenient to arrange the data into a vector with one measurement for each row and columns for each of the factors:
    Partic't Method Group Trial Time
    1 Q QO 1
    2 Q QO 1
    3 Q OQ 1
    4 Q OQ 1
    1 Q QO 2
    ...

Preliminary investigation in R

  • Read times from spreadsheet of comma separated values
    typing <- read.csv ("Text entry times.csv",header=T)
    attach (typing)
  • Concatenate all times
    times <- c (Q1,Q2,Q3,O1,O2,O3)
  • Describe factors
    method <- gl (2,12,labels=c("Qwerty","Opti"))
    session <- gl (3,4,24)
    group <- gl (2,2,24)
    participant <- gl (4,1,24)
  • Inspect data
    plot (method:session,times)
    plot (method:group,times)
  • Work out average times
    qwerty <- (Q1+Q2+Q3)/3
    opti <- (O1+O2+O3)/3
  • Look for correlation
    plot (qwerty,opti)
    m <- lm (opti~qwerty)
    abline (m)
    summary (m)
  • Check for normally distributed residuals
    hist (resid(m),prob=T)
    lines (density(resid(m)))
  • Compare quantiles with normal distribution
    qqnorm (resid(m))
    qqline (resid(m))
  • Conduct a t test
    t.test(qwerty,opti,paired=T)

Power estimation

The power of a test is the probability of (correctly) rejecting the null hypothesis when it is false. The power of a two-sample t test will depend on several factors:

  • The sample sizes m and n
  • The real difference Δ = |μXY|.
  • The smallness of the population standard deviation σ.
  • The significance level α at which the test is done.
  • If the population means differ by δ and the populations share the same variance s2, then
    n ≈ 16 × s2 ÷ δ2
    estimates the number of replicates required to reject the null hypothesis with power = 80%

Power calculation in R

  • The function power.t.test will calculate an appropriate value for any one of the parameters from the others.
  • The arguments are:
    • n - the number of samples in each group
    • delta - the difference between the means
    • sd - the standard deviation of the population (defaults to 1)
    • sig.level - the significance level (defaults to 0.05)
    • power - the probability of rejecting the null hypothesis
  • It is also possible to give further arguments specifying details of the t test:
    • type - "two.sample", "one.sample" or "paired"
    • alternative - "two.sided" or "one.sided"
  • For example, the number of samples required to achieve an 80% chance of detecting a difference of 50 between two groups drawn from a population with standard deviation 100 while retaining less than a 5% false-positive rate could be calculated as:
    • power.t.test (delta=50,sd=100,sig.level=0.05,power=0.8)
    • The answer is a rather surprising 64 samples in each group
  • Alternatively, the probability of successfully detecting the same difference with just 35 samples in each group could be calculated as:
    • power.t.test (n=35,delta=50,sd=100,sig.level=0.05)
    • The answer is just 54%
  • power.anova.test provides similar calculations for one-way analysis of variance

Analysis of variance

  • Given a factor with levels i = 1, 2, …
    and observations (xij) where j = 1, 2, …
  • Fit a model Xij = μ + ai + εij
    where residuals εij ~ N(0,σ2)
  • Calculate overall mean and factor means
    m = meanij(xij)
    mi = meanj(xij)
    so xij = m + (mi - m) + (xij - mi)

Deviation from the mean

  • Total variation is sum of squares of deviations
    SSDtotal = Σi Σj (xij – m)2
  • Variation between groups
    SSDbetween = Σi Σj (mi – m)2 = Σi ni (mi – m)2
    where ni is the number of observations at level i
  • Residual variation within groups
    SSDwithin = Σi Σj (xij – mi)2
  • Total variation
    SSDtotal = SSDbetween + SSDwithin

Degrees of freedom

  • For each factor
    • Degrees of freedom between levels = #levels – 1
    • Degrees of freedom within levels = #participants – #levels
  • Calculate mean square deviations
    • MSDbetween = SSDbetween / dfbetween
    • MSDwithin = SSDwithin / dfwithin
  • Calculate ratio of variances
    • Fbetween,within = MSDbetween / MSDwithin

Analysis of variance example

  • Experienced users
    • (xi) = (1,2,2,3,3,3,4,4,5)
      Mean(xi) = 27 / 9 = 3
      SSD within experienced users = 4+2×1+0+2×1+4 = 12
  • New users
    • (yi) = (3,4,4,5,5,5,6,6,7)
      Mean(yi) = 45 / 9 = 5
      SSD within new users = 4+2×1+0+2×1+4 = 12
  • SSD within levels = 12 +12 = 24
  • Combine samples
    • Mean = 4
      Total SSD = 9+2×4+4×1+4×0+4×1+2×4+9 = 42
  • SSD between levels = 42 – 12 – 12 = 18
  • Degrees of freedom
    • dfbetween = 2 – 1 = 1
    • dfwithin = 18 - 2 = 16
  • F statistic
  • F1,16 = (18 / 1) / (24 /16) = 12 which is significant at the 0.01 level

t test

  • A t test is equivalent to an F test for a factor with two levels
  • A one-sided t test only looks for differences in one direction
  • A paired-samples t test compares repeated measures

Analysis of variance in R

  • Given data series x and y
    • Concatenate into single series
      times <- c (x,y)
    • Define factor
      group <- gl (2,9,labels=c("experienced","new"))
    • Run ANOVA
      a <- aov (times~group)
      summary(a)
    • Compare with t test
      t.test (x,y)

F distribution

  • If U ~ Χm2 and V ~ Χn2
  • Fm,n = (U/m) / (V/n)

Factorial ANOVA

  • Consider the following observations
    Method 1 Method 2
    Task 1 123345
    Task 2 234456
    Task 3 345567
  • Total SSD = 42
  • SSD within methods = 12 + 12 = 24
    SSD between methods = 42 – 24 = 18
    df between methods = #methods – 1 = 2 – 1 = 1
    MSD between methods = 18 / 1 = 18
  • SSD within tasks = 10 + 10 + 10 = 30
    SSD between tasks = 42 – 30 = 12
    df between tasks = #tasks – 1 = 3 – 1 = 2
    MSD between tasks = 12 / 2 = 6
  • Residual SSD = 6 × 2 = 12
    df for residuals = #observations – #groups = 18 – 6 = 12
    MSD for residuals = 12 / 12 = 1
  • Analysis of methods
    • F1,12 = 18 / 1 = 18
    • Significant at the 0.01 level
  • Analysis of tasks
    • F2,12 = 6 / 1 = 6
    • Significant at the 0.05 level
  • In R
    • times <- c (1,2,3,2,3,4,3,4,5,3,4,5,4,5,6,5,6,7)
      method <- gl (2,9)
      task <- gl (3,3,18)
      aov (times~(method*task))

Repeated measures ANOVA

  • Repeated measures reduce the degrees of freedom
  • Suppose there were just three participants who each tried both methods and all three tasks
    • Residual df for methods = (#participants – 1) × (#methods – 1) = 2
    • Residual df for tasks = (#participants – 1) × (#tasks – 1) = 4
  • Repeated measures in R
    • part <- gl (3,1,18)
      aov (times~(method*task+Error(part/(method*task))))

Comparing text entry systems revisited

  • Given 24 measurement times and factors for method, session, group and participant
    • Compare methods
      a <- aov (times~method)
      summary (a)
    • Compare groups
      a <- aov (times~group)
      summary (a)
    • Factorial comparison
      a <- aov (times~(method*group))
      summary (a)
  • These analyses have assumed that each measurement was taken from a different participant, a repeated measures analysis is slightly different
    a <- aov (times~(method+Error(participant/method)))
    summary (a)

Interaction effects

  • Two factors may interact so that a particular level in one combined with a particular level in the other significantly affects the measurement even when the factors taken overall do not have a significant effect
  • The method:task term shows any interaction effect after any separate factor effects on variance have been analysed
  • If interaction effects are negligible, it may be appropriate to use a multi-way analysis instead of a factorial one

Two-way ANOVA

  • Given two separate factors with p and q levels respectively
    and observations (xij)
  • Fit a model Xij = μ + ai + bj + εij
    • where residual εij ~ N(0,σ2)
  • Overall mean m, row means ri and column means cj
  • Consider variations between rows and columns
    • SSD between rows = q Σi (ri – m)2 with (p-1) degrees of freedom
    • SSD between columns = p Σj (cj – m)2 with (q-1) degrees of freedom
  • Residual SSD = Σi Σj (xij – (ri – m) – (cj – m) – m)2
    = Total SSD – SSD between rows – SSD between columns
    with (p-1)×(q-1) = pq – (p-1) – (q-1) – 1 degrees of freedom

Two-way ANOVA analysis of tasks and methods

  • df for residuals = 18 – 1 – 2 – 1 = 14
    MSD for residuals = 12 / 14 = 6 / 7 ≈ 0.857
  • Analysis of methods
    • F1,14 = 18 / (6/7) = 21
    • Significant at 0.001 level
  • Analysis of tasks
    • F2,14 = 6 / (6/7) = 7
    • Significant at the 0.01 level
  • Two-way ANOVA in R
    • aov (times~(method+task))

Formulae for models in R

  • Given factors A, B and C with p, q and r levels respectively
  • A has (p – 1) degrees of freedom
  • A:B:C represents the set product of A, B and C
    with (p×q×r – 1) degrees of freedom and no residuals
  • A+B+C represents A, B and C as three-way factors
    with (p – 1), (q – 1) and (r – 1) degrees of freedom
    and (p×q×r – 1) – (p – 1) – (q – 1) – (r – 1) residual degrees of freedom
  • A*B*C = A + B + C + A:B + A:C + B:C + A:B:C
    with (p–1), (q–1), (r–1), (p–1)×(q–1), (p–1)×(r–1), (q–1)×(r–1) and (p–1)×(q–1)×(r–1) degrees of freedom

Reporting an F statistic

There are strict conventions on how statistical results should be presented. The F statistic should be reported as follows:

There was a significant main effect of independent variable on dependent variable (F(d,e) = F-value, p < threshold).

Note that:

  • F is in upper case italic
  • d and e are the degrees of freedom in the methods and the samples
  • the F-value statistic is specified to three significant figures
  • p is in lower case italic
  • the p-value probability of a false positive is specified to two significant figures unless it is less than 0.001 in which case write "p < 0.001"
    • it is impossible to demonstrate significance if the F-value is less than 1, in which case the probability is replaced by "n.s."
    • if the F-value is greater than 1 but a large probability makes the result unlikely, the probability is specified as "p > 0.05"
  • some journals prefer the probability of a false positive to be specified as being below a threshold (of 0.05, 0.01 and 0.001) rather than exactly
    (These values are often denoted by one, two or three asterisks; probabilities between 0.10 and 0.05 are denoted by a plus sign.)