Experimental Methods
Statistical analysis
Variance
- Experiments are unlikely to have binary conclusions
- Variance in measurements arises for two main reasons
- Changes in independent variables
- Random fluctuations
- Model effects of the former to minimise residual effects of the latter
Probability revision
- Population of size N
- Mean μ = ΣXi / N
- Variance σ2 = Σ(Xi-μ)2 / N
- Covariance Σ(Xi-μ)(Yi-ν) / N
- Sample of size n
- Estimated mean m = ΣXi / n
- Estimated variance s2 = Σ(Xi-m)2 / (n-1)
- Median (halfway through sample data)
- Quartiles, deciles, percentiles
- Mode (most frequent occurrence)
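- These quantities can be checked directly in R (a quick sketch, not part of the original notes; note that var uses the n-1 sample formula):
x <- c (1,2,3,4)
mean (x)       # sample mean
var (x)        # sample variance with n-1 denominator, here 5/3
median (x)     # halfway through the sorted data
quantile (x)   # quartiles (0%, 25%, 50%, 75%, 100%)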
Linear regression
- Given data series (xi) and (yi) each with n values
- Fit a model Yi = a + bXi + εi
- where residual εi ~ N(0,σ2)
- to estimate the intercept a, regression coefficient (slope) b and residual variance σ2
- by choosing a and b to minimise the sum of squared residuals (SSR) Σεi2
- Minimising SSR by taking partial derivative w.r.t. b gives
- Estimated b = Cov(xi,yi) / Var(xi)
- Requiring Σεi = 0 gives
- Estimated a = Mean(yi) – b Mean(xi)
- Then estimated σ2 = Σεi2 / (n-2)
- Pearson correlation coefficient
- r2 = Cov(xi,yi)2 / (Var(xi) × Var(yi))
- Total variance is sum of model variance and residual variance
- Proportion of variance explained by the linear model is also
- r2 = Variancemodel / Variancedata
Linear regression example
- (xi) = (1,2,3,4)
(yi) = (1,3,2,4)
- Mean(xi) = 10 / 4 = 2.5
Mean(yi) = 10 / 4 = 2.5
Var(xi) = 5 / 3 ≈ 1.667
Cov(xi,yi) = 4 / 3 ≈ 1.333
- Estimated b ≈ 1.333 / 1.667 ≈ 0.8
Estimated a = 2.5 – 0.8×2.5 = 0.5
- Pearson r2 ≈ (1.333×1.333) / (1.667×1.667) ≈ 0.64
- Variancemodel / Variancedata = (3.2/3) / (5/3) = 0.64
- so the linear model explains 64% of total variance
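- The arithmetic above can be checked directly in R (a quick sketch; cov and var use the n-1 sample formulae, and lm on the following slide gives the same coefficients):
x <- c (1,2,3,4)
y <- c (1,3,2,4)
b <- cov (x,y) / var (x)       # 0.8
a <- mean (y) - b * mean (x)   # 0.5
cor (x,y)^2                    # 0.64, the proportion of variance explained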
Linear regression in R
- Given data series x and y
- Look at the data
plot (x,y)
- Calculate the intercept and slope
m <- lm (y~x)
- Superimpose the line of best fit
abline (m)
- Observe residuals
segments (x,fitted(m),x,y)
- Test the null hypothesis that slope b = 0
summary (m)
Normal distribution
- Recall normal distribution
- X ~ N(μ,σ2)
- Z-value Z = (X-μ) / σ
- Z ~ N(0,1)
- If the Zi are independent with mean 0 and variance 1 for 1 ≤ i ≤ n, and S = ΣZi / √n
- S ~ N(0,1) in the limit for large n (the central limit theorem)
- Models should give normally distributed residuals for statistics to work
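- A small simulation sketch (not from the notes) illustrating that a scaled sum of standardised values, here uniform ones, is approximately standard normal for large n:
s <- replicate (10000, sum (runif (30, -sqrt(3), sqrt(3))) / sqrt (30))   # S = ΣZi / √n with standardised uniform Zi, n = 30
qqnorm (s)   # quantiles should lie close to those of N(0,1)
qqline (s)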
Chi-square and t distributions
- Z ~ N(0,1)
- U = Z2
- U ~ Χ12 (chi-square with 1 degree of freedom)
- If Ui ~ Χ12 independently for 1 ≤ i ≤ n and V = ΣUi
- V ~ Χn2
- tn = Z / √(V/n) where Z and V are independent
- t test for the null hypothesis b = 0 uses t = Est(b) / StdErr(Est(b))
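- In R, the estimate, standard error and t value for the slope appear in the coefficient table of a fitted model (a sketch reusing the small regression example above):
x <- c (1,2,3,4)
y <- c (1,3,2,4)
m <- lm (y~x)
coef (summary (m))   # columns: Estimate, Std. Error, t value, Pr(>|t|)
coef (summary (m))["x","Estimate"] / coef (summary (m))["x","Std. Error"]   # t = Est(b) / StdErr(Est(b))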
Comparing text entry systems
- Two methods: Qwerty & Opti
- 4 participants
- 3 sessions with each method
- Half of the participants tried Qwerty then Opti; the other half tried Opti then Qwerty (to counterbalance order effects)
Contingency tables
- Data is often collected in a matrix with one row for each participant:
Partic't  Q1  Q2  Q3  O1  O2  O3
1         …   …   …   …   …   …
2         …   …   …   …   …   …
3         …   …   …   …   …   …
4         …   …   …   …   …   …
- However, it may be convenient to arrange the data into a vector with one measurement for each row and columns for each of the factors:
Partic't  Method  Group  Trial  Time
1         Q       QO     1      …
2         Q       QO     1      …
3         Q       OQ     1      …
4         Q       OQ     1      …
1         Q       QO     2      …
...
Preliminary investigation in R
- Read times from spreadsheet of comma separated values
typing <- read.csv ("Text entry times.csv",header=T)
attach (typing)
- Concatenate all times
times <- c (Q1,Q2,Q3,O1,O2,O3)
- Describe factors
method <- gl (2,12,labels=c("Qwerty","Opti"))
session <- gl (3,4,24)
group <- gl (2,2,24)
participant <- gl (4,1,24)
- Inspect data
plot (method:session,times)
plot (method:group,times)
- Work out average times
qwerty <- (Q1+Q2+Q3)/3
opti <- (O1+O2+O3)/3
- Look for correlation
plot (qwerty,opti)
m <- lm (opti~qwerty)
abline (m)
summary (m)
- Check for normally distributed residuals
hist (resid(m),prob=T)
lines (density(resid(m)))
- Compare quantiles with normal distribution
qqnorm (resid(m))
qqline (resid(m))
- Conduct a t test
t.test(qwerty,opti,paired=T)
Power estimation
The power of a test is the probability of (correctly) rejecting the null hypothesis when it is false. The power of a two-sample t test will depend on several factors:
- The sample sizes m and n
- The real difference δ = |μX-μY|.
- The population standard deviation σ (the smaller σ is, the greater the power).
- The significance level α at which the test is done.
- If the population means differ by δ and the populations share the same variance σ2, then
n ≈ 16 × σ2 ÷ δ2
estimates the number of replicates per group required to reject the null hypothesis with power = 80% at significance level α = 0.05
Power calculation in R
- The function power.t.test will calculate an appropriate value for any one of the parameters from the others.
- The arguments are:
- n - the number of samples in each group
- delta - the difference between the means
- sd - the standard deviation of the population (defaults to 1)
- sig.level - the significance level (defaults to 0.05)
- power - the probability of rejecting the null hypothesis
- It is also possible to give further arguments specifying details of the t test:
- type - "two.sample", "one.sample" or "paired"
- alternative - "two.sided" or "one.sided"
- For example, the number of samples required to achieve an 80% chance of detecting a difference of 50 between two groups drawn from a population with standard deviation 100 while retaining less than a 5% false-positive rate could be calculated as:
power.t.test (delta=50,sd=100,sig.level=0.05,power=0.8)
- The answer is a rather surprising 64 samples in each group, matching the rule of thumb n ≈ 16 × σ2 ÷ δ2 with σ = 100 and δ = 50
- Alternatively, the probability of successfully detecting the same difference with just 35 samples in each group could be calculated as:
power.t.test (n=35,delta=50,sd=100,sig.level=0.05)
- The answer is just 54%
- power.anova.test provides similar calculations for one-way analysis of variance
Analysis of variance
- Given a factor with levels i = 1, 2, …
and observations (xij) where j = 1, 2, …
- Fit a model Xij = μ + ai + εij
where residuals εij ~ N(0,σ2)
- Calculate overall mean and factor means
m = meanij(xij)
mi = meanj(xij)
so xij = m + (mi - m) + (xij - mi)
Deviation from the mean
- Total variation is sum of squares of deviations
SSDtotal = Σi Σj (xij – m)2
- Variation between groups
SSDbetween = Σi Σj (mi – m)2 = Σi ni (mi – m)2
where ni is the number of observations at level i
- Residual variation within groups
SSDwithin = Σi Σj (xij – mi)2
- Total variation decomposes as
SSDtotal = SSDbetween + SSDwithin
Degrees of freedom
- For each factor
- Degrees of freedom between levels = #levels – 1
- Degrees of freedom within levels = #participants – #levels
- Calculate mean square deviations
- MSDbetween = SSDbetween / dfbetween
- MSDwithin = SSDwithin / dfwithin
- Calculate ratio of variances
- Fbetween,within = MSDbetween / MSDwithin
Analysis of variance example
- Experienced users
- (xi) = (1,2,2,3,3,3,4,4,5)
Mean(xi) = 27 / 9 = 3
SSD within experienced users = 4+2×1+0+2×1+4 = 12
- New users
- (yi) = (3,4,4,5,5,5,6,6,7)
Mean(yi) = 45 / 9 = 5
SSD within new users = 4+2×1+0+2×1+4 = 12
- SSD within levels = 12 + 12 = 24
- Combine samples
- Mean = 4
Total SSD = 9+2×4+4×1+4×0+4×1+2×4+9 = 42
- SSD between levels = 42 – 12 – 12 = 18
- Degrees of freedom
- dfbetween = 2 – 1 = 1
- dfwithin = 18 - 2 = 16
- F statistic
- F1,16 = (18 / 1) / (24 / 16) = 12, which is significant at the 0.01 level
t test
- A t test is equivalent to an F test for a factor with two levels
- A one-sided t test only looks for differences in one direction
- A paired-samples t test compares repeated measures
Analysis of variance in R
- Given data series x and y
- Concatenate into single series
times <- c (x,y)
- Define factor
group <- gl (2,9,labels=c("experienced","new"))
- Run ANOVA
a <- aov (times~group)
summary(a)
- Compare with t test
t.test (x,y)
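- Note that t.test uses Welch's unequal-variance test by default; a sketch (not from the notes) of the variants mentioned above, using the same x and y:
t.test (x,y,var.equal=T)          # pooled two-sample test: t2 equals the F statistic (12) above
t.test (x,y,alternative="less")   # one-sided test that x is smaller than y
t.test (x,y,paired=T)             # paired-samples test (only meaningful for repeated measures on the same participants)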
F distribution
- If U ~ Χm2 and V ~ Χn2 are independent
- Fm,n = (U/m) / (V/n)
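- A simulation sketch (not from the notes) of this definition:
u <- rchisq (10000, df=3)                  # U ~ Χ32
v <- rchisq (10000, df=12)                 # V ~ Χ122
f <- (u/3) / (v/12)                        # F3,12 = (U/m) / (V/n)
qqplot (qf (ppoints (10000), 3, 12), f)    # quantiles should follow the F3,12 distribution
abline (0,1)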
Factorial ANOVA
- Consider the following observations
         Method 1   Method 2
Task 1   1 2 3      3 4 5
Task 2   2 3 4      4 5 6
Task 3   3 4 5      5 6 7
- Total SSD = 42
- SSD within methods = 12 + 12 = 24
SSD between methods = 42 – 24 = 18
df between methods = #methods – 1 = 2 – 1 = 1
MSD between methods = 18 / 1 = 18
- SSD within tasks = 10 + 10 + 10 = 30
SSD between tasks = 42 – 30 = 12
df between tasks = #tasks – 1 = 3 – 1 = 2
MSD between tasks = 12 / 2 = 6
- Residual SSD = 6 × 2 = 12
df for residuals = #observations – #groups = 18 – 6 = 12
MSD for residuals = 12 / 12 = 1
- Analysis of methods
- F1,12 = 18 / 1 = 18
- Significant at the 0.01 level
- Analysis of tasks
- F2,12 = 6 / 1 = 6
- Significant at the 0.05 level
- In R
times <- c (1,2,3,2,3,4,3,4,5,3,4,5,4,5,6,5,6,7)
method <- gl (2,9)
task <- gl (3,3,18)
aov (times~(method*task))
Repeated measures ANOVA
- Repeated measures reduce the degrees of freedom
- Suppose there were just three participants who each tried both methods and all three tasks
- Residual df for methods = (#participants – 1) × (#methods – 1) = 2
- Residual df for tasks = (#participants – 1) × (#tasks – 1) = 4
- Repeated measures in R
part <- gl (3,1,18)
aov (times~(method*task+Error(part/(method*task))))
Comparing text entry systems revisited
- Given 24 measurement times and factors for method, session, group and participant
- Compare methods
a <- aov (times~method)
summary (a)
- Compare groups
a <- aov (times~group)
summary (a)
- Factorial comparison
a <- aov (times~(method*group))
summary (a)
- These analyses have assumed that each measurement was taken from a different participant; a repeated measures analysis is slightly different
a <- aov (times~(method+Error(participant/method)))
summary (a)
Interaction effects
- Two factors may interact: a particular level of one factor combined with a particular level of the other may significantly affect the measurement, even when neither factor has a significant effect overall
- The method:task term shows any interaction effect after any separate factor effects on variance have been analysed
- If interaction effects are negligible, it may be appropriate to use a multi-way analysis instead of a factorial one
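- A sketch using the factorial example above, comparing the factorial model, which includes the method:task interaction, with the additive (multi-way) model that drops it:
times <- c (1,2,3,2,3,4,3,4,5,3,4,5,4,5,6,5,6,7)
method <- gl (2,9)
task <- gl (3,3,18)
summary (aov (times~(method*task)))   # includes a method:task row for the interaction
summary (aov (times~(method+task)))   # additive model with no interaction term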
Two-way ANOVA
- Given two separate factors with p and q levels respectively
and observations (xij)
- Fit a model Xij = μ + ai + bj + εij
- where residual εij ~ N(0,σ2)
- Overall mean m, row means ri and column means cj
- Consider variations between rows and columns
- SSD between rows = q Σi (ri – m)2 with (p-1) degrees of freedom
- SSD between columns = p Σj (cj – m)2 with (q-1) degrees of freedom
- Residual SSD = Σi Σj (xij – (ri – m) – (cj – m) – m)2
= Total SSD – SSD between rows – SSD between columns
with (p-1)×(q-1) = pq – (p-1) – (q-1) – 1 degrees of freedom
Two-way ANOVA analysis of tasks and methods
- df for residuals = 18 – 1 – 2 – 1 = 14
MSD for residuals = 12 / 14 = 6 / 7 ≈ 0.857
- Analysis of methods
- F1,14 = 18 / (6/7) = 21
- Significant at the 0.001 level
- Analysis of tasks
- F2,14 = 6 / (6/7) = 7
- Significant at the 0.01 level
- Two-way ANOVA in R
aov (times~(method+task))
Formulae for models in R
- Given factors A, B and C with p, q and r levels respectively
- A has (p – 1) degrees of freedom
- A:B:C represents the set product of A, B and C
with (p×q×r – 1) degrees of freedom and no residuals
- A+B+C represents A, B and C as three separate factors
with (p – 1), (q – 1) and (r – 1) degrees of freedom
and (p×q×r – 1) – (p – 1) – (q – 1) – (r – 1) residual degrees of freedom
- A*B*C = A + B + C + A:B + A:C + B:C + A:B:C
with (p–1), (q–1), (r–1), (p–1)×(q–1), (p–1)×(r–1), (q–1)×(r–1) and (p–1)×(q–1)×(r–1) degrees of freedom
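- A sketch (with arbitrary simulated data, purely to see the degrees of freedom R assigns) for factors with p = 2, q = 3 and r = 4 levels:
set.seed (1)
A <- gl (2,12)       # p - 1 = 1 degree of freedom
B <- gl (3,4,24)     # q - 1 = 2 degrees of freedom
C <- gl (4,1,24)     # r - 1 = 3 degrees of freedom
y <- rnorm (24)
summary (aov (y~(A+B+C)))   # residual df = 23 - 1 - 2 - 3 = 17
summary (aov (y~(A*B)))     # A:B interaction has (p-1)×(q-1) = 2 df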
Reporting an F statistic
There are strict conventions on how statistical results should be presented. The F statistic should be reported as follows:
There was a significant main effect of independent variable on dependent variable (F(d,e) = F-value, p < threshold).
Note that:
- F is in upper case italic
- d and e are the degrees of freedom of the effect (between levels) and of the residuals (within levels)
- the F-value statistic is specified to three significant figures
- p is in lower case italic
- the p-value probability of a false positive is specified to two significant figures unless it is less than 0.001 in which case write "p < 0.001"
- it is impossible to demonstrate significance if the F-value is less than 1, in which case the probability is replaced by "n.s."
- if the F-value is greater than 1 but the probability is too large for the result to be significant, the probability is specified as "p > 0.05"
- some journals prefer the probability of a false positive to be specified as being below a threshold
(of 0.05, 0.01 and 0.001) rather than exactly
(These values are often denoted by one, two or three asterisks; probabilities between 0.10 and 0.05 are denoted by a plus sign.)
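- For example, the analysis of methods in the factorial example above might be written up as: There was a significant main effect of method on the measured time (F(1,12) = 18.0, p < 0.01).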