Course pages 2013–14

# Experimental Methods

## Statistical analysis

### Variance

- Experiments are unlikely to have binary conclusions
- Variance in measurements arises for two main reasons
  - Changes in independent variables
  - Random fluctuations
- Model effects of the former to minimise residual effects of the latter

### Probability revision

- Population of size N
  - Mean μ = ΣX_i / N
  - Variance σ² = Σ(X_i - μ)² / N
  - Covariance Σ(X_i - μ)(Y_i - ν) / N
- Sample of size n
  - Estimated mean m = ΣX_i / n
  - Estimated variance s² = Σ(X_i - m)² / (n - 1)
  - Median (halfway through sample data)
  - Quartiles, deciles, percentiles
  - Mode (most frequent occurrence)

### Linear regression

- Given data series (x_i) and (y_i), each with n values
- Fit a model Y_i = a + bX_i + ε_i
  - where residual ε_i ~ N(0, σ²)
  - to estimate the intercept a, the regression coefficient (slope) b and the residual variance σ²
  - that minimise the sum of squared residuals (SSR) Σε_i²
- Minimising SSR by taking the partial derivative w.r.t. b gives
  - Estimated b = Cov(x_i, y_i) / Var(x_i)
- Requiring Σε_i = 0 gives
  - Estimated a = Mean(y_i) - b Mean(x_i)
- Then estimated σ² = Σε_i² / (n - 2)
- Pearson correlation coefficient
  - r² = Cov(x_i, y_i)² / (Var(x_i) × Var(y_i))
- Total variance is the sum of model variance and residual variance
- Proportion of variance explained by the linear model is also
  - r² = Variance_model / Variance_data

### Linear regression example

- (x_i) = (1, 2, 3, 4)
  - (y_i) = (1, 3, 2, 4)
- Mean(x_i) = 10 / 4 = 2.5
  - Var(x_i) = 5 / 3 ≈ 1.667
  - Cov(x_i, y_i) = 4 / 3 ≈ 1.333
- Estimated b ≈ 1.333 / 1.667 = 0.8
  - Estimated a = 2.5 - 0.8 × 2.5 = 0.5
- Pearson r² ≈ (1.333 × 1.333) / (1.667 × 1.667) = 0.64
  - Variance_model / Variance_data = (3.2/3) / (5/3) = 0.64
- So the linear model explains 64% of the total variance
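
The arithmetic of this worked example can be cross-checked with a few lines of Python (a sketch alongside the course's R, using the sample-variance formulae from the revision slide):

```python
# Recompute the worked example: (x_i) = (1,2,3,4), (y_i) = (1,3,2,4)
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
n = len(x)

mx = sum(x) / n                                    # Mean(x_i) = 2.5
my = sum(y) / n                                    # Mean(y_i) = 2.5
var_x = sum((xi - mx) ** 2 for xi in x) / (n - 1)  # 5/3
var_y = sum((yi - my) ** 2 for yi in y) / (n - 1)  # 5/3
cov_xy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)  # 4/3

b = cov_xy / var_x                  # slope, 0.8
a = my - b * mx                     # intercept, 0.5
r2 = cov_xy ** 2 / (var_x * var_y)  # proportion of variance explained, 0.64

print(round(b, 3), round(a, 3), round(r2, 3))
```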

### Linear regression in R

- Given data series `x` and `y`
- Look at the data: `plot(x, y)`
- Calculate the intercept and slope: `m <- lm(y ~ x)`
- Superimpose the line of best fit: `abline(m)`
- Observe the residuals: `segments(x, fitted(m), x, y)`
- Test the null hypothesis that slope b = 0: `summary(m)`

### Normal distribution

- Recall the normal distribution
  - X ~ N(μ, σ²)
  - Z-value Z = (X - μ) / σ
  - Z ~ N(0, 1)
- If Z_i are independent with mean 0 and variance 1 for 1 ≤ i ≤ n, and S = ΣZ_i / √n
  - S ~ N(0, 1) in the limit for large n (the central limit theorem)
- Models should give normally distributed residuals for these statistics to work

### Chi-square and *t* distributions

- Z ~ N(0, 1)
- U = Z²
  - U ~ χ² (chi-square with one degree of freedom)
- If U_i ~ χ² for 1 ≤ i ≤ n and V = ΣU_i
  - V ~ χ²_n
- t_n = Z / √(V/n), where Z is independent of V
- *t* test for independence with t = Est(b) / StdErr(Est(b))
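
These constructions are easy to check by simulation. The sketch below (illustrative Python, not part of the course material) builds chi-square variates from squared standard normals and confirms their expected values:

```python
import random

random.seed(1)   # fixed seed so the run is reproducible
N = 100_000      # number of simulated variates
n = 5            # degrees of freedom for V

# U = Z^2 with Z ~ N(0,1): chi-square with 1 degree of freedom, E[U] = 1
u = [random.gauss(0, 1) ** 2 for _ in range(N)]

# V = sum of n such squares: chi-square with n degrees of freedom, E[V] = n
v = [sum(random.gauss(0, 1) ** 2 for _ in range(n)) for _ in range(N)]

print(sum(u) / N)   # close to 1
print(sum(v) / N)   # close to n = 5
```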

### Comparing text entry systems

- Two methods: Qwerty & Opti
- 4 participants
- 3 sessions with each method
- One half tried Qwerty then Opti, the other half tried Opti then Qwerty

### Contingency tables

- Data is often collected in a matrix with one row for each participant:

      Partic't  Q1  Q2  Q3  O1  O2  O3
      1
      2
      3
      4

- However, it may be convenient to arrange the data into a vector with one measurement for each row and columns for each of the factors:

      Partic't  Method  Group  Trial  Time
      1         Q       QO     1
      2         Q       QO     1
      3         Q       OQ     1
      4         Q       OQ     1
      1         Q       QO     2
      ...

### Preliminary investigation in R

- Read times from a spreadsheet of comma-separated values

      typing <- read.csv("Text entry times.csv", header=T)
      attach(typing)

- Concatenate all times

      times <- c(Q1, Q2, Q3, O1, O2, O3)

- Describe the factors

      method <- gl(2, 12, labels=c("Qwerty", "Opti"))
      session <- gl(3, 4, 24)
      group <- gl(2, 2, 24)
      participant <- gl(4, 1, 24)

- Inspect the data

      plot(method:session, times)
      plot(method:group, times)

- Work out average times

      qwerty <- (Q1+Q2+Q3)/3
      opti <- (O1+O2+O3)/3

- Look for correlation

      plot(qwerty, opti)
      m <- lm(opti ~ qwerty)
      abline(m)
      summary(m)

- Check for normally distributed residuals

      hist(resid(m), prob=T)
      lines(density(resid(m)))

- Compare quantiles with the normal distribution

      qqnorm(resid(m))
      qqline(resid(m))

- Conduct a paired *t* test

      t.test(qwerty, opti, paired=T)

### Power estimation

The *power* of a test is the probability of (correctly) rejecting the null hypothesis when it is false.
The power of a two-sample *t* test will depend on several factors:

- The sample sizes *m* and *n*
- The real difference δ = |μ_X - μ_Y|
- The smallness of the population standard deviation σ
- The significance level α at which the test is done

If the population means differ by δ and the populations share the same variance σ², then

    n ≈ 16 × σ² / δ²

estimates the number of replicates required in each group to reject the null hypothesis with power = 80% (at α = 0.05)
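
Plugging in illustrative numbers (δ = 50 and σ = 100, the same values used with `power.t.test` on the next slide) shows the rule of thumb at work:

```python
# Rule-of-thumb sample size for 80% power: n ≈ 16 × σ² / δ²
def replicates_for_80_power(sigma, delta):
    return 16 * sigma ** 2 / delta ** 2

print(replicates_for_80_power(100, 50))  # 64.0 per group
```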

### Power calculation in R

- The function `power.t.test` will calculate an appropriate value for any one of the parameters from the others
- The arguments are:
  - `n` - the number of samples in each group
  - `delta` - the difference between the means
  - `sd` - the standard deviation of the population (defaults to 1)
  - `sig.level` - the significance level (defaults to 0.05)
  - `power` - the probability of rejecting the null hypothesis
- It is also possible to give further arguments specifying details of the *t* test:
  - `type` - `"two.sample"`, `"one.sample"` or `"paired"`
  - `alternative` - `"two.sided"` or `"one.sided"`
- For example, the number of samples required to achieve an 80% chance of detecting a difference of 50 between two groups drawn from a population with standard deviation 100, while retaining less than a 5% false-positive rate, could be calculated as:

      power.t.test(delta=50, sd=100, sig.level=0.05, power=0.8)

- The answer is a rather surprising 64 samples in each group
- Alternatively, the probability of successfully detecting the same difference with just 35 samples in each group could be calculated as:

      power.t.test(n=35, delta=50, sd=100, sig.level=0.05)

- The answer is just 54%
- `power.anova.test` provides similar calculations for one-way analysis of variance
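
For readers without R to hand, a normal-approximation version of this power calculation can be sketched in Python; `power.t.test` uses the noncentral *t* distribution, so the figures differ slightly (about 0.81 rather than 0.80, and 0.55 rather than 0.54):

```python
from statistics import NormalDist

def approx_power(n, delta, sd, sig_level=0.05):
    """Approximate power of a two-sided, two-sample test, treating the
    test statistic as normal rather than noncentral t."""
    nd = NormalDist()
    se = sd * (2 / n) ** 0.5                 # std error of the difference in means
    z_crit = nd.inv_cdf(1 - sig_level / 2)   # two-sided critical value, ~1.96
    return nd.cdf(delta / se - z_crit)

print(round(approx_power(64, 50, 100), 2))   # ~0.81, near the 80% target
print(round(approx_power(35, 50, 100), 2))   # ~0.55, near R's 54%
```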

### Analysis of variance

- Given a factor with levels i = 1, 2, … and observations (x_ij) where j = 1, 2, …
- Fit a model X_ij = μ + a_i + ε_ij
  - where residuals ε_ij ~ N(0, σ²)
- Calculate the overall mean and the factor means
  - m = mean_ij(x_ij)
  - m_i = mean_j(x_ij)
  - so x_ij = m + (m_i - m) + (x_ij - m_i)

### Deviation from the mean

- Total variation is the sum of squares of deviations
  - SSD_total = Σ_i Σ_j (x_ij - m)²
- Variation between groups
  - SSD_between = Σ_i Σ_j (m_i - m)² = Σ_i n_i (m_i - m)²
  - where n_i is the number of observations at level i
- Residual variation within groups
  - SSD_within = Σ_i Σ_j (x_ij - m_i)²
- Total variation
  - SSD_total = SSD_between + SSD_within

### Degrees of freedom

- For each factor
  - Degrees of freedom between levels = #levels - 1
  - Degrees of freedom within levels = #participants - #levels
- Calculate mean square deviations
  - MSD_between = SSD_between / df_between
  - MSD_within = SSD_within / df_within
- Calculate the ratio of variances
  - F_between,within = MSD_between / MSD_within

### Analysis of variance example

- Experienced users
  - (x_i) = (1, 2, 2, 3, 3, 3, 4, 4, 5)
  - Mean(x_i) = 27 / 9 = 3
  - SSD within experienced users = 4 + 2×1 + 0 + 2×1 + 4 = 12
- New users
  - (y_i) = (3, 4, 4, 5, 5, 5, 6, 6, 7)
  - Mean(y_i) = 45 / 9 = 5
  - SSD within new users = 4 + 2×1 + 0 + 2×1 + 4 = 12
- SSD within levels = 12 + 12 = 24
- Combine samples
  - Mean = 4
  - Total SSD = 9 + 2×4 + 4×1 + 4×0 + 4×1 + 2×4 + 9 = 42
- SSD between levels = 42 - 12 - 12 = 18
- Degrees of freedom
  - df_between = 2 - 1 = 1
  - df_within = 18 - 2 = 16
- F statistic
  - F_1,16 = (18 / 1) / (24 / 16) = 12, which is significant at the 0.01 level
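
The worked example can be verified with a short Python sketch (hypothetical helper names; the course itself uses R):

```python
x = [1, 2, 2, 3, 3, 3, 4, 4, 5]   # experienced users
y = [3, 4, 4, 5, 5, 5, 6, 6, 7]   # new users

def ssd(data, mean):
    # sum of squared deviations about a given mean
    return sum((v - mean) ** 2 for v in data)

mx = sum(x) / len(x)              # 3
my = sum(y) / len(y)              # 5
m = sum(x + y) / len(x + y)       # combined mean, 4

ssd_within = ssd(x, mx) + ssd(y, my)   # 12 + 12 = 24
ssd_total = ssd(x + y, m)              # 42
ssd_between = ssd_total - ssd_within   # 18

f = (ssd_between / 1) / (ssd_within / 16)      # F(1,16)
print(ssd_total, ssd_between, ssd_within, f)   # 42.0 18.0 24.0 12.0
```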

### *t* test

- A *t* test is equivalent to an F test for a factor with two levels
- A one-sided *t* test only looks for differences in one direction
- A paired-samples *t* test compares repeated measures

### Analysis of variance in R

- Given data series `x` and `y`
- Concatenate into a single series

      times <- c(x, y)

- Define the factor

      group <- gl(2, 9, labels=c("experienced", "new"))

- Run the ANOVA

      a <- aov(times ~ group)
      summary(a)

- Compare with a *t* test

      t.test(x, y)

### F distribution

- If U ~ χ²_m and V ~ χ²_n are independent
  - F_m,n = (U/m) / (V/n)

### Factorial ANOVA

- Consider the following observations

              Method 1   Method 2
      Task 1  1  2  3    3  4  5
      Task 2  2  3  4    4  5  6
      Task 3  3  4  5    5  6  7

- Total SSD = 42
- SSD within methods = 12 + 12 = 24
  - SSD between methods = 42 - 24 = 18
  - df between methods = #methods - 1 = 2 - 1 = 1
  - MSD between methods = 18 / 1 = 18
- SSD within tasks = 10 + 10 + 10 = 30
  - SSD between tasks = 42 - 30 = 12
  - df between tasks = #tasks - 1 = 3 - 1 = 2
  - MSD between tasks = 12 / 2 = 6
- Residual SSD = 6 × 2 = 12
  - df for residuals = #observations - #groups = 18 - 6 = 12
  - MSD for residuals = 12 / 12 = 1
- Analysis of methods
  - F_1,12 = 18 / 1 = 18
  - Significant at the 0.01 level
- Analysis of tasks
  - F_2,12 = 6 / 1 = 6
  - Significant at the 0.05 level
- In R

      times <- c(1,2,3,2,3,4,3,4,5,3,4,5,4,5,6,5,6,7)
      method <- gl(2, 9)
      task <- gl(3, 3, 18)
      aov(times ~ (method * task))
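
The same breakdown can be checked outside R; the Python sketch below (illustrative only) recomputes the sums of squares and both F statistics from the 18 observations:

```python
# (task, method) -> three observations, matching the table above
cells = {
    (1, 1): [1, 2, 3], (1, 2): [3, 4, 5],
    (2, 1): [2, 3, 4], (2, 2): [4, 5, 6],
    (3, 1): [3, 4, 5], (3, 2): [5, 6, 7],
}
values = [v for obs in cells.values() for v in obs]
m = sum(values) / len(values)   # grand mean, 4

def ssd(data, mean):
    return sum((v - mean) ** 2 for v in data)

total = ssd(values, m)          # 42

def between(level_of):
    # SSD between the groups picked out by level_of
    groups = {}
    for key, obs in cells.items():
        groups.setdefault(level_of(key), []).extend(obs)
    return sum(len(g) * (sum(g) / len(g) - m) ** 2 for g in groups.values())

between_methods = between(lambda k: k[1])   # 18
between_tasks = between(lambda k: k[0])     # 12
residual = sum(ssd(obs, sum(obs) / len(obs)) for obs in cells.values())  # 12

f_methods = (between_methods / 1) / (residual / 12)   # F(1,12) = 18
f_tasks = (between_tasks / 2) / (residual / 12)       # F(2,12) = 6
print(f_methods, f_tasks)   # 18.0 6.0
```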

### Repeated measures ANOVA

- Repeated measures reduce the degrees of freedom
- Suppose there were just three participants who each tried both methods and all three tasks
  - Residual df for methods = (#participants - 1) × (#methods - 1) = 2
  - Residual df for tasks = (#participants - 1) × (#tasks - 1) = 4
- Repeated measures in R

      part <- gl(3, 1, 18)
      aov(times ~ (method * task + Error(part / (method * task))))

### Comparing text entry systems revisited

- Given 24 measurement times and factors for method, session, group and participant
- Compare methods

      a <- aov(times ~ method)
      summary(a)

- Compare groups

      a <- aov(times ~ group)
      summary(a)

- Factorial comparison

      a <- aov(times ~ (method * group))
      summary(a)

- These analyses have assumed that each measurement was taken from a different participant; a repeated-measures analysis is slightly different

      a <- aov(times ~ (method + Error(participant / method)))
      summary(a)

### Interaction effects

- Two factors may interact: a particular level of one combined with a particular level of the other may significantly affect the measurement even when neither factor taken overall has a significant effect
- The `method:task` term shows any interaction effect after the separate factor effects on variance have been analysed
- If interaction effects are negligible, it may be appropriate to use a multi-way analysis instead of a factorial one

### Two-way ANOVA

- Given two separate factors with p and q levels respectively, and observations (x_ij)
- Fit a model X_ij = μ + a_i + b_j + ε_ij
  - where residual ε_ij ~ N(0, σ²)
- Overall mean m, row means r_i and column means c_j
- Consider variations between rows and columns
  - SSD between rows = q Σ_i (r_i - m)² with (p - 1) degrees of freedom
  - SSD between columns = p Σ_j (c_j - m)² with (q - 1) degrees of freedom
  - Residual SSD = Σ_i Σ_j (x_ij - (r_i - m) - (c_j - m) - m)²
    = Total SSD - SSD between rows - SSD between columns
    with (p - 1)×(q - 1) = pq - (p - 1) - (q - 1) - 1 degrees of freedom

### Two-way ANOVA analysis of tasks and methods

- df for residuals = 18 - 1 - 2 - 1 = 14
  - MSD for residuals = 12 / 14 = 6/7 ≈ 0.857
- Analysis of methods
  - F_1,14 = 18 / (6/7) = 21
  - Significant at the 0.001 level
- Analysis of tasks
  - F_2,14 = 6 / (6/7) = 7
  - Significant at the 0.01 level
- Two-way ANOVA in R

      aov(times ~ (method + task))
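
A quick arithmetic check of these figures (Python, with the sums of squares taken from the factorial example above):

```python
ssd_methods, ssd_tasks, ssd_residual = 18, 12, 12
df_residual = 18 - 1 - 2 - 1                # 14
msd_residual = ssd_residual / df_residual   # 6/7 ≈ 0.857

f_methods = (ssd_methods / 1) / msd_residual   # F(1,14) = 21
f_tasks = (ssd_tasks / 2) / msd_residual       # F(2,14) = 7
print(df_residual, round(f_methods, 6), round(f_tasks, 6))
```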

### Formulae for models in R

- Given factors A, B and C with p, q and r levels respectively
- A has (p - 1) degrees of freedom
- A:B:C represents the set product of A, B and C
  - with (p×q×r - 1) degrees of freedom and no residuals
- A+B+C represents A, B and C as three separate factors
  - with (p - 1), (q - 1) and (r - 1) degrees of freedom
  - and (p×q×r - 1) - (p - 1) - (q - 1) - (r - 1) residual degrees of freedom
- A*B*C = A + B + C + A:B + A:C + B:C + A:B:C
  - with (p-1), (q-1), (r-1), (p-1)×(q-1), (p-1)×(r-1), (q-1)×(r-1) and (p-1)×(q-1)×(r-1) degrees of freedom
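
As a sanity check on the bookkeeping, the separate degrees of freedom listed for A*B*C always sum to the p×q×r - 1 of the full set product; a tiny Python check with hypothetical level counts:

```python
p, q, r = 2, 3, 4   # hypothetical numbers of levels for A, B and C

parts = [
    p - 1, q - 1, r - 1,                                      # A, B, C
    (p - 1) * (q - 1), (p - 1) * (r - 1), (q - 1) * (r - 1),  # two-way terms
    (p - 1) * (q - 1) * (r - 1),                              # A:B:C
]
print(sum(parts), p * q * r - 1)   # 23 23
```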

### Reporting an F statistic

There are strict conventions on how statistical results should be presented.
The *F* statistic should be reported as follows:

There was a significant main effect of *independent variable* on *dependent variable* (*F*(*d*, *e*) = *F-value*, *p* < *threshold*).

Note that:

- *F* is in upper case italic
- *d* and *e* are the degrees of freedom in the methods and the samples
- the *F-value* statistic is specified to three significant figures
- *p* is in lower case italic
- the *p-value* probability of a false positive is specified to two significant figures, unless it is less than 0.001, in which case write "*p* < 0.001"
- it is impossible to demonstrate significance if the *F-value* is less than 1, in which case the probability is replaced by "n.s."
- if the *F-value* is greater than 1 but a large probability makes the result unlikely, the probability is specified as "*p* > 0.05"
- some journals prefer the probability of a false positive to be specified as being below a *threshold* (of 0.05, 0.01 or 0.001) rather than exactly

(These values are often denoted by one, two or three asterisks; probabilities between 0.10 and 0.05 are denoted by a plus sign.)