# Computer Laboratory

Course pages 2013–14

# Experimental Methods

## Statistical analysis

### Variance

• Experiments are unlikely to have binary conclusions
• Variance in measurements arises for two main reasons
  • Changes in independent variables
  • Random fluctuations
• Model effects of the former to minimise residual effects of the latter

### Probability revision

• Population of size N
  • Mean μ = ΣXi / N
  • Variance σ² = Σ(Xi − μ)² / N
  • Covariance Σ(Xi − μ)(Yi − ν) / N
• Sample of size n
  • Estimated mean m = ΣXi / n
  • Estimated variance s² = Σ(Xi − m)² / (n − 1)
  • Median (halfway through the sorted sample data)
  • Quartiles, deciles, percentiles
  • Mode (most frequent value)
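As a quick sanity check on these formulae, each one is implemented in Python's standard `statistics` module (shown here as a language-neutral cross-check; the data values are arbitrary illustrations):

```python
import statistics

x = [2, 4, 4, 4, 5, 5, 7, 9]       # arbitrary illustrative data

# Treating x as a whole population of size N
mu = statistics.mean(x)            # mu = sum(Xi) / N
var = statistics.pvariance(x)      # sigma^2 = sum((Xi - mu)^2) / N

# Treating x as a sample of size n
s2 = statistics.variance(x)        # s^2 = sum((Xi - m)^2) / (n - 1)

med = statistics.median(x)         # halfway through the sorted data
mode = statistics.mode(x)          # most frequent value

print(mu, var, s2, med, mode)
```

Note that `pvariance` divides by N while `variance` divides by (n − 1), matching the population and sample formulae above.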

### Linear regression

• Given data series (xi) and (yi), each with n values
• Fit a model Yi = a + bXi + εi
  • where residual εi ~ N(0, σ²)
  • to estimate the intercept a, the regression coefficient (slope) b
    and the residual variance σ²
  • that minimise the sum of squared residuals (SSR) Σεi²
• Minimising SSR by taking the partial derivative w.r.t. b gives
  • Estimated b = Cov(xi, yi) / Var(xi)
• Requiring Σεi = 0 gives
  • Estimated a = Mean(yi) − b Mean(xi)
• Then estimated σ² = Σεi² / (n − 2)
• Pearson correlation coefficient
  • r² = Cov(xi, yi)² / (Var(xi) × Var(yi))
• Total variance is the sum of model variance and residual variance
• The proportion of variance explained by the linear model is also
  • r² = Variance_model / Variance_data

### Linear regression example

• (xi) = (1, 2, 3, 4)
  (yi) = (1, 3, 2, 4)
• Mean(xi) = 10 / 4 = 2.5
  Var(xi) = 5 / 3 ≈ 1.667
  Cov(xi, yi) = 4 / 3 ≈ 1.333
• Estimated b ≈ 1.333 / 1.667 = 0.8
  Estimated a = 2.5 − 0.8 × 2.5 = 0.5
• Pearson r² ≈ (1.333 × 1.333) / (1.667 × 1.667) = 0.64
  Variance_model / Variance_data = (3.2/3) / (5/3) = 0.64
• so the linear model explains 64% of the total variance
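The arithmetic of this example can be checked mechanically; a short Python sketch using the sample-variance formulae above:

```python
xs = [1, 2, 3, 4]
ys = [1, 3, 2, 4]
n = len(xs)

mx = sum(xs) / n                                        # 2.5
my = sum(ys) / n                                        # 2.5
var_x = sum((x - mx) ** 2 for x in xs) / (n - 1)        # 5/3
var_y = sum((y - my) ** 2 for y in ys) / (n - 1)        # 5/3
cov = sum((x - mx) * (y - my)
          for x, y in zip(xs, ys)) / (n - 1)            # 4/3

b = cov / var_x                                         # slope = 0.8
a = my - b * mx                                         # intercept = 0.5
r2 = cov ** 2 / (var_x * var_y)                         # 0.64

print(a, b, r2)
```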

### Linear regression in R

• Given data series x and y
• Look at the data
`plot (x,y)`
• Calculate the intercept and slope
`m <- lm (y~x)`
• Superimpose the line of best fit
`abline (m)`
• Observe residuals
`segments (x,fitted(m),x,y)`
• Test the null hypothesis that slope b = 0
`summary (m)`

### Normal distribution

• Recall the normal distribution
  • X ~ N(μ, σ²)
  • Z-value Z = (X − μ) / σ
  • Z ~ N(0, 1)
• Central limit theorem: if the Xi are i.i.d. with mean μ and variance σ², and m = ΣXi / n, then (m − μ) / (σ/√n) ~ N(0, 1) in the limit for large n
• Models should give normally distributed residuals for these statistics to work

### Chi-square and t distributions

• Z ~ N(0, 1)
  • U = Z²
  • U ~ χ₁² (chi-square with one degree of freedom)
• If Ui ~ χ₁² independently for 1 ≤ i ≤ n and V = ΣUi
  • V ~ χₙ²
• tₙ = Z / √(V/n), where Z ~ N(0, 1) is independent of V
• t test for independence (null hypothesis b = 0) uses t = Est(b) / StdErr(Est(b)) with n − 2 degrees of freedom
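As an illustration of the slope test, the t statistic for the earlier regression example ((xi) = (1, 2, 3, 4), (yi) = (1, 3, 2, 4)) can be computed by hand, using the standard result that Var(Est(b)) = σ² / Σ(xi − x̄)²; a Python sketch:

```python
import math

xs = [1, 2, 3, 4]
ys = [1, 3, 2, 4]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n

sxx = sum((x - mx) ** 2 for x in xs)                         # sum (xi - mean)^2 = 5
b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx   # slope = 0.8
a = my - b * mx                                              # intercept = 0.5

ssr = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))    # SSR = 1.8
sigma2 = ssr / (n - 2)                                       # estimated residual variance
se_b = math.sqrt(sigma2 / sxx)                               # StdErr(Est(b))

t = b / se_b
print(t)    # about 1.89, compared against t with n - 2 = 2 degrees of freedom
```

With only two residual degrees of freedom this t value is nowhere near significant, which is why `summary (m)` matters more on realistic sample sizes.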

### Comparing text entry systems

• Two methods: Qwerty & Opti
• 4 participants
• 3 sessions with each method
• One half tried Qwerty then Opti; the other half tried Opti then Qwerty (to counterbalance order effects)

### Contingency tables

• Data is often collected in a matrix with one row for each participant:

  | Partic't | Q1 | Q2 | Q3 | O1 | O2 | O3 |
  |----------|----|----|----|----|----|----|
  | 1        |    |    |    |    |    |    |
  | 2        |    |    |    |    |    |    |
  | 3        |    |    |    |    |    |    |
  | 4        |    |    |    |    |    |    |

• However, it may be convenient to arrange the data into a vector with one measurement per row and a column for each factor:

  | Partic't | Method | Group | Trial | Time |
  |----------|--------|-------|-------|------|
  | 1        | Q      | QO    | 1     |      |
  | 2        | Q      | QO    | 1     |      |
  | 3        | Q      | OQ    | 1     |      |
  | 4        | Q      | OQ    | 1     |      |
  | 1        | Q      | QO    | 2     |      |
  | ...      |        |       |       |      |

### Preliminary investigation in R

`typing <- read.csv ("Text entry times.csv",header=T)`
`attach (typing)`
• Concatenate all times
`times <- c (Q1,Q2,Q3,O1,O2,O3)`
• Describe factors
`method <- gl (2,12,labels=c("Qwerty","Opti"))`
`session <- gl (3,4,24)`
`group <- gl (2,2,24)`
`participant <- gl (4,1,24)`
• Inspect data
`plot (method:session,times)`
`plot (method:group,times)`
• Work out average times
`qwerty <- (Q1+Q2+Q3)/3`
`opti <- (O1+O2+O3)/3`
• Look for correlation
`plot (qwerty,opti)`
`m <- lm (opti~qwerty)`
`abline (m)`
`summary (m)`
• Check for normally distributed residuals
`hist (resid(m),prob=T)`
`lines (density(resid(m)))`
• Compare quantiles with normal distribution
`qqnorm (resid(m))`
`qqline (resid(m))`
• Conduct a t test
`t.test(qwerty,opti,paired=T)`

### Power estimation

The power of a test is the probability of (correctly) rejecting the null hypothesis when it is false. The power of a two-sample t test will depend on several factors:

• The sample sizes m and n
• The real difference δ = |μX − μY|
• The population standard deviation σ (power increases as σ falls)
• The significance level α at which the test is done
• If the population means differ by δ and the populations share the same variance s², then
  n ≈ 16 × s² ÷ δ²
  estimates the number of replicates needed in each group to reject the null hypothesis with power = 80% at α = 0.05
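The constant 16 in this rule of thumb comes from the normal approximation n ≈ 2 (z(1 − α/2) + z(power))² (s/δ)² with α = 0.05 and power 0.8; a check using only the Python standard library:

```python
from statistics import NormalDist

z = NormalDist().inv_cdf          # standard normal quantile function

alpha, power = 0.05, 0.80
constant = 2 * (z(1 - alpha / 2) + z(power)) ** 2

print(round(constant, 1))         # 15.7, conventionally rounded up to 16
```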

### Power calculation in R

• The function `power.t.test` will calculate an appropriate value for any one of the parameters from the others.
• The arguments are:
• n - the number of samples in each group
• delta - the difference between the means
• sd - the standard deviation of the population (defaults to 1)
• sig.level - the significance level (defaults to 0.05)
• power - the probability of rejecting the null hypothesis
• It is also possible to give further arguments specifying details of the t test:
• type - "two.sample", "one.sample" or "paired"
• alternative - "two.sided" or "one.sided"
• For example, the number of samples required to achieve an 80% chance of detecting a difference of 50 between two groups drawn from a population with standard deviation 100 while retaining less than a 5% false-positive rate could be calculated as:
• `power.t.test (delta=50,sd=100,sig.level=0.05,power=0.8)`
• The answer is a rather surprising 64 samples in each group
• Alternatively, the probability of successfully detecting the same difference with just 35 samples in each group could be calculated as:
• `power.t.test (n=35,delta=50,sd=100,sig.level=0.05)`
• The answer is just 54%
• `power.anova.test` provides similar calculations for one-way analysis of variance
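The two `power.t.test` answers above can be approximated without R, using the normal approximation to the t test (which slightly understates the required n); a Python sketch:

```python
from statistics import NormalDist
import math

nd = NormalDist()
z975 = nd.inv_cdf(0.975)          # two-sided 5% critical value, about 1.96

delta, sd = 50, 100

# Sample size per group for 80% power (the exact t-based answer is 64)
n = 2 * ((z975 + nd.inv_cdf(0.80)) * sd / delta) ** 2
print(math.ceil(n))               # 63 by this approximation

# Power with n = 35 per group (the exact t-based answer is about 54%)
power = nd.cdf(delta / (sd * math.sqrt(2 / 35)) - z975)
print(round(power, 2))            # about 0.55
```

The small discrepancies (63 vs 64, 55% vs 54%) are the cost of replacing the t distribution by the normal.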

### Analysis of variance

• Given a factor with levels i = 1, 2, …
  and observations (xij) where j = 1, 2, …
• Fit a model Xij = μ + ai + εij
  • where residuals εij ~ N(0, σ²)
• Calculate the overall mean and the factor-level means
  m = mean_ij (xij)
  mi = mean_j (xij)
  so xij = m + (mi − m) + (xij − mi)

### Deviation from the mean

• Total variation is the sum of squares of deviations
  SSD_total = Σi Σj (xij − m)²
• Variation between groups
  SSD_between = Σi Σj (mi − m)² = Σi ni (mi − m)²
  where ni is the number of observations at level i
• Residual variation within groups
  SSD_within = Σi Σj (xij − mi)²
• Total variation decomposes as
  SSD_total = SSD_between + SSD_within

### Degrees of freedom

• For each factor
  • Degrees of freedom between levels = #levels − 1
  • Degrees of freedom within levels = #observations − #levels
• Calculate mean square deviations
  • MSD_between = SSD_between / df_between
  • MSD_within = SSD_within / df_within
• Calculate the ratio of variances
  • F(between, within) = MSD_between / MSD_within

### Analysis of variance example

• Experienced users
  • (xi) = (1, 2, 2, 3, 3, 3, 4, 4, 5)
    Mean(xi) = 27 / 9 = 3
    SSD within experienced users = 4 + 2×1 + 0 + 2×1 + 4 = 12
• New users
  • (yi) = (3, 4, 4, 5, 5, 5, 6, 6, 7)
    Mean(yi) = 45 / 9 = 5
    SSD within new users = 4 + 2×1 + 0 + 2×1 + 4 = 12
• SSD within levels = 12 + 12 = 24
• Combine the samples
  • Mean = 72 / 18 = 4
    Total SSD = 9 + 2×4 + 4×1 + 4×0 + 4×1 + 2×4 + 9 = 42
• SSD between levels = 42 − 12 − 12 = 18
• Degrees of freedom
  • df_between = 2 − 1 = 1
  • df_within = 18 − 2 = 16
• F statistic
  • F(1,16) = (18 / 1) / (24 / 16) = 12, which is significant at the 0.01 level
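The whole worked example can be replayed in a few lines; a Python sketch of the same arithmetic:

```python
# Reproduce the worked one-way ANOVA by hand
exp = [1, 2, 2, 3, 3, 3, 4, 4, 5]   # experienced users
new = [3, 4, 4, 5, 5, 5, 6, 6, 7]   # new users
groups = [exp, new]

allx = exp + new
m = sum(allx) / len(allx)                                   # overall mean = 4

# Sum of squared deviations within each group, then overall
ssd_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g)
                 for g in groups)                           # 24
ssd_total = sum((x - m) ** 2 for x in allx)                 # 42
ssd_between = ssd_total - ssd_within                        # 18

df_between = len(groups) - 1                                # 1
df_within = len(allx) - len(groups)                         # 16

f = (ssd_between / df_between) / (ssd_within / df_within)   # 12
print(ssd_total, ssd_between, ssd_within, f)
```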

### t test

• A t test is equivalent to an F test for a factor with two levels (F = t²)
• A one-sided t test only looks for differences in one direction
• A paired-samples t test compares repeated measures

### Analysis of variance in R

• Given data series x and y
• Concatenate into single series
`times <- c (x,y)`
• Define factor
`group <- gl (2,9,labels=c("experienced","new"))`
• Run ANOVA
`a <- aov (times~group)`
`summary(a)`
• Compare with t test
`t.test (x,y)`

### F distribution

• If U ~ χₘ² and V ~ χₙ² are independent
• F(m,n) = (U/m) / (V/n)

### Factorial ANOVA

• Consider the following observations (three observations per method:task cell, as in the R code below)

  | Task | Method 1 | Method 2 |
  |------|----------|----------|
  | 1    | 1, 2, 3  | 3, 4, 5  |
  | 2    | 2, 3, 4  | 4, 5, 6  |
  | 3    | 3, 4, 5  | 5, 6, 7  |

• Total SSD = 42
• SSD within methods = 12 + 12 = 24
  SSD between methods = 42 − 24 = 18
  df between methods = #methods − 1 = 2 − 1 = 1
  MSD between methods = 18 / 1 = 18
• SSD within tasks = 10 + 10 + 10 = 30
  SSD between tasks = 42 − 30 = 12
  df between tasks = #tasks − 1 = 3 − 1 = 2
  MSD between tasks = 12 / 2 = 6
• Residual SSD = 6 × 2 = 12
  df for residuals = #observations − #groups = 18 − 6 = 12
  MSD for residuals = 12 / 12 = 1
• Analysis of methods
  • F(1,12) = 18 / 1 = 18
  • Significant at the 0.01 level
• Analysis of tasks
  • F(2,12) = 6 / 1 = 6
  • Significant at the 0.05 level
• In R
• `times <- c (1,2,3,2,3,4,3,4,5,3,4,5,4,5,6,5,6,7)`
  `method <- gl (2,9)`
  `task <- gl (3,3,18)`
  `aov (times~(method*task))`
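The sums of squares in this factorial example can also be verified by hand; a Python sketch (note that in this particular data set the method:task interaction SSD happens to be zero, so total − methods − tasks equals the within-cell residual of 12):

```python
# Reproduce the factorial ANOVA sums of squares by hand
times = [1, 2, 3, 2, 3, 4, 3, 4, 5,    # method 1, tasks 1-3
         3, 4, 5, 4, 5, 6, 5, 6, 7]    # method 2, tasks 1-3
method = [i // 9 for i in range(18)]        # 0 or 1, matches gl(2,9)
task = [(i // 3) % 3 for i in range(18)]    # 0, 1 or 2, matches gl(3,3,18)

m = sum(times) / len(times)                 # overall mean = 4

def ssd_between(factor, levels):
    """Between-groups SSD for one factor: sum of n_i * (mean_i - m)^2."""
    total = 0.0
    for lvl in range(levels):
        group = [t for t, f in zip(times, factor) if f == lvl]
        total += len(group) * (sum(group) / len(group) - m) ** 2
    return total

ss_method = ssd_between(method, 2)          # 18
ss_task = ssd_between(task, 3)              # 12
ss_total = sum((t - m) ** 2 for t in times)  # 42
ss_resid = ss_total - ss_method - ss_task   # 12 (interaction SSD is zero here)

f_method = (ss_method / 1) / (ss_resid / 12)    # F(1,12) = 18
f_task = (ss_task / 2) / (ss_resid / 12)        # F(2,12) = 6
print(f_method, f_task)
```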

### Repeated measures ANOVA

• Repeated measures reduce the degrees of freedom
• Suppose there were just three participants who each tried both methods and all three tasks
• Residual df for methods = (#participants – 1) × (#methods – 1) = 2
• Residual df for tasks = (#participants – 1) × (#tasks – 1) = 4
• Repeated measures in R
• `part <- gl (3,1,18)`
  `aov (times~(method*task+Error(part/(method*task))))`

### Comparing text entry systems revisited

• Given 24 measurement times and factors for method, session, group and participant
• Compare methods
`a <- aov (times~method)`
`summary (a)`
• Compare groups
`a <- aov (times~group)`
`summary (a)`
• Factorial comparison
`a <- aov (times~(method*group))`
`summary (a)`
• These analyses have assumed that each measurement was taken from a different participant; a repeated-measures analysis is slightly different
`a <- aov (times~(method+Error(participant/method)))`
`summary (a)`

### Interaction effects

• Two factors may interact: a particular level of one factor combined with a particular level of the other may significantly affect the measurement, even when neither factor taken overall has a significant effect
• The method:task term shows any interaction effect after any separate factor effects on variance have been analysed
• If interaction effects are negligible, it may be appropriate to use a multi-way analysis instead of a factorial one

### Two-way ANOVA

• Given two separate factors with p and q levels respectively
  and observations (xij)
• Fit a model Xij = μ + ai + bj + εij
  • where residual εij ~ N(0, σ²)
• Overall mean m, row means ri and column means cj
• Consider variations between rows and columns
  • SSD between rows = q Σi (ri − m)² with (p − 1) degrees of freedom
  • SSD between columns = p Σj (cj − m)² with (q − 1) degrees of freedom
  • Residual SSD = Σi Σj (xij − (ri − m) − (cj − m) − m)²
    = Total SSD − SSD between rows − SSD between columns
    with (p − 1)×(q − 1) = pq − (p − 1) − (q − 1) − 1 degrees of freedom

### Two-way ANOVA analysis of tasks and methods

• df for residuals = 18 − 1 − 2 − 1 = 14
  MSD for residuals = 12 / 14 = 6 / 7 ≈ 0.857
• Analysis of methods
  • F(1,14) = 18 / (6/7) = 21
  • Significant at the 0.001 level
• Analysis of tasks
  • F(2,14) = 6 / (6/7) = 7
  • Significant at the 0.01 level
• Two-way ANOVA in R
  • `aov (times~(method+task))`

### Formulae for models in R

• Given factors A, B and C with p, q and r levels respectively
  • A has (p − 1) degrees of freedom
• A:B:C represents the set product of A, B and C
  with (p×q×r − 1) degrees of freedom and no residuals
• A+B+C represents A, B and C as three separate factors
  with (p − 1), (q − 1) and (r − 1) degrees of freedom
  and (p×q×r − 1) − (p − 1) − (q − 1) − (r − 1) residual degrees of freedom
• A*B*C = A + B + C + A:B + A:C + B:C + A:B:C
  with (p−1), (q−1), (r−1), (p−1)×(q−1), (p−1)×(r−1), (q−1)×(r−1) and (p−1)×(q−1)×(r−1) degrees of freedom

### Reporting an F statistic

There are strict conventions on how statistical results should be presented. The F statistic should be reported as follows:

There was a significant main effect of *independent variable* on *dependent variable* (*F*(d,e) = F-value, *p* < threshold).

Note that:

• F is in upper case italic
• d and e are the degrees of freedom of the effect (numerator) and of the residuals (denominator)
• the F-value statistic is specified to three significant figures
• p is in lower case italic
• the p-value probability of a false positive is specified to two significant figures unless it is less than 0.001 in which case write "p < 0.001"
• it is impossible to demonstrate significance if the F-value is less than 1, in which case the probability is replaced by "n.s."
• if the F-value is greater than 1 but a large probability makes the result unlikely, the probability is specified as "p > 0.05"
• some journals prefer the probability of a false positive to be specified as being below a threshold (of 0.05, 0.01 and 0.001) rather than exactly
(These values are often denoted by one, two or three asterisks; probabilities between 0.10 and 0.05 are denoted by a plus sign.)