Experimental Methods
Statistical analysis
Variance
- Experiments are unlikely to have binary conclusions
- Variance in measurements arises for two main reasons
- Changes in independent variables
- Random fluctuations
- Model effects of the former to minimise residual effects of the latter
Probability revision
- Population of size N
- Mean μ = ΣXi / N
- Variance σ2 = Σ(Xi-μ)2 / N
- Covariance Σ(Xi-μ)(Yi-ν) / N
- Sample of size n
- Estimated mean m = ΣXi / n
- Estimated variance s2 = Σ(Xi-m)2 / (n-1)
- Median (halfway through sample data)
- Quartiles, deciles, percentiles
- Mode (most frequent occurrence)
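- These quantities can be checked directly in R (a quick sketch, not part of the original notes; note that var uses the n-1 sample formula):
x <- c (1,2,3,4)
mean (x)       # sample mean
var (x)        # sample variance with n-1 denominator, here 5/3
median (x)     # halfway through the sorted data
quantile (x)   # quartiles (0%, 25%, 50%, 75%, 100%)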
Linear regression
- Given data series (xi) and (yi) each with n values
- Fit a model Yi = a + bXi + εi
- where residual εi ~ N(0,σ2)
- to estimate the intercept a, regression coefficient (slope) b and residual variance σ2
- by choosing a and b to minimise the sum of squared residuals (SSR) Σεi2
- Minimising SSR by taking partial derivative w.r.t. b gives
- Estimated b = Cov(xi,yi) / Var(xi)
- Requiring Σεi = 0 gives
- Estimated a = Mean(yi) – b Mean(xi)
- Then estimated σ2 = Σεi2 / (n-2)
- Pearson correlation coefficient
- r2 = Cov(xi,yi)2 / (Var(xi) × Var(yi))
- Total variance is sum of model variance and residual variance
- Proportion of variance explained by the linear model is also
- r2 = Variancemodel / Variancedata
Linear regression example
- (xi) = (1,2,3,4)
(yi) = (1,3,2,4)
- Mean(xi) = 10 / 4 = 2.5
Mean(yi) = 10 / 4 = 2.5
Var(xi) = 5 / 3 ≈ 1.667
Cov(xi,yi) = 4 / 3 ≈ 1.333
- Estimated b ≈ 1.333 / 1.667 ≈ 0.8
Estimated a = 2.5 – 0.8×2.5 = 0.5
- Pearson r2 ≈ (1.333×1.333) / (1.667×1.667) ≈ 0.64
- Variancemodel / Variancedata = (3.2/3) / (5/3) = 0.64
- so the linear model explains 64% of total variance
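- The arithmetic above can be checked directly in R (a quick sketch; cov and var use the n-1 sample formulae, and lm on the following slide gives the same coefficients):
x <- c (1,2,3,4)
y <- c (1,3,2,4)
b <- cov (x,y) / var (x)       # 0.8
a <- mean (y) - b * mean (x)   # 0.5
cor (x,y)^2                    # 0.64, the proportion of variance explained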
Linear regression in R
- Given data series x and y
- Look at the data
plot (x,y)
- Calculate the intercept and slope
m <- lm (y~x)
- Superimpose the line of best fit
abline (m)
- Observe residuals
segments (x,fitted(m),x,y)
- Test the null hypothesis that slope b = 0
summary (m)
Normal distribution
- Recall normal distribution
- X ~ N(μ,σ2)
- Z-value Z = (X-μ) / σ
- Z ~ N(0,1)
- If the Zi are independent with mean 0 and variance 1 for 1 ≤ i ≤ n, and S = ΣZi / √n
- S ~ N(0,1) in the limit for large n (the central limit theorem)
- Models should give normally distributed residuals for statistics to work
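- A small simulation sketch (not from the notes) illustrating that a scaled sum of standardised values, here uniform ones, is approximately standard normal for large n:
s <- replicate (10000, sum (runif (30, -sqrt(3), sqrt(3))) / sqrt (30))   # S = ΣZi / √n with standardised uniform Zi, n = 30
qqnorm (s)   # quantiles should lie close to those of N(0,1)
qqline (s)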
Chi-square and t distributions
- Z ~ N(0,1)
- U = Z2
- U ~ Χ12 (chi-square with 1 degree of freedom)
- If Ui ~ Χ12 independently for 1 ≤ i ≤ n and V = ΣUi
- V ~ Χn2
- tn = Z / √(V/n) where Z and V are independent
- t test for the null hypothesis b = 0 uses t = Est(b) / StdErr(Est(b))
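- In R, the estimate, standard error and t value for the slope appear in the coefficient table of a fitted model (a sketch reusing the small regression example above):
x <- c (1,2,3,4)
y <- c (1,3,2,4)
m <- lm (y~x)
coef (summary (m))   # columns: Estimate, Std. Error, t value, Pr(>|t|)
coef (summary (m))["x","Estimate"] / coef (summary (m))["x","Std. Error"]   # t = Est(b) / StdErr(Est(b))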
Comparing text entry systems
- Two methods: Qwerty & Opti
- 4 participants
- 3 sessions with each method
- Half of the participants tried Qwerty then Opti; the other half tried Opti then Qwerty (to counterbalance order effects)
Contingency tables
- Data is often collected in a matrix with one row for each participant:
Partic't  Q1  Q2  Q3  O1  O2  O3
1         …   …   …   …   …   …
2         …   …   …   …   …   …
3         …   …   …   …   …   …
4         …   …   …   …   …   …
- However, it may be convenient to arrange the data into a vector with one measurement for each row and columns for each of the factors:
Partic't  Method  Group  Trial  Time
1         Q       QO     1      …
2         Q       QO     1      …
3         Q       OQ     1      …
4         Q       OQ     1      …
1         Q       QO     2      …
...
Preliminary investigation in R
- Read times from spreadsheet of comma separated values
typing <- read.csv ("Text entry times.csv",header=T)
attach (typing)
- Concatenate all times
times <- c (Q1,Q2,Q3,O1,O2,O3)
- Describe factors
method <- gl (2,12,labels=c("Qwerty","Opti"))
session <- gl (3,4,24)
group <- gl (2,2,24)
participant <- gl (4,1,24)
- Inspect data
plot (method:session,times)
plot (method:group,times)
- Work out average times
qwerty <- (Q1+Q2+Q3)/3
opti <- (O1+O2+O3)/3
- Look for correlation
plot (qwerty,opti)
m <- lm (opti~qwerty)
abline (m)
summary (m)
- Check for normally distributed residuals
hist (resid(m),prob=T)
lines (density(resid(m)))
- Compare quantiles with normal distribution
qqnorm (resid(m))
qqline (resid(m))
- Conduct a t test
t.test(qwerty,opti,paired=T)
Power estimation
The power of a test is the probability of (correctly) rejecting the null hypothesis when it is false. The power of a two-sample t test will depend on several factors:
- The sample sizes m and n
- The real difference δ = |μX-μY|.
- The population standard deviation σ (the smaller σ is, the greater the power).
- The significance level α at which the test is done.
- If the population means differ by δ and the populations share the same variance σ2, then
n ≈ 16 × σ2 ÷ δ2
estimates the number of replicates per group required to reject the null hypothesis with power = 80% at significance level α = 0.05
Power calculation in R
- The function power.t.test will calculate an appropriate value for any one of the parameters from the others.
- The arguments are:
- n - the number of samples in each group
- delta - the difference between the means
- sd - the standard deviation of the population (defaults to 1)
- sig.level - the significance level (defaults to 0.05)
- power - the probability of rejecting the null hypothesis
- It is also possible to give further arguments specifying details of the t test:
- type - "two.sample", "one.sample" or "paired"
- alternative - "two.sided" or "one.sided"
- For example, the number of samples required to achieve an 80% chance of detecting a difference of 50 between two groups drawn from a population with standard deviation 100 while retaining less than a 5% false-positive rate could be calculated as:
power.t.test (delta=50,sd=100,sig.level=0.05,power=0.8)
- The answer is a rather surprising 64 samples in each group, matching the rule of thumb n ≈ 16 × σ2 ÷ δ2 with σ = 100 and δ = 50
- Alternatively, the probability of successfully detecting the same difference with just 35 samples in each group could be calculated as:
power.t.test (n=35,delta=50,sd=100,sig.level=0.05)
- The answer is just 54%
- power.anova.test provides similar calculations for one-way analysis of variance
Analysis of variance
- Given a factor with levels i = 1, 2, …
and observations (xij) where j = 1, 2, …
- Fit a model Xij = μ + ai + εij
where residuals εij ~ N(0,σ2)
- Calculate overall mean and factor means
m = meanij(xij)
mi = meanj(xij)
so xij = m + (mi - m) + (xij - mi)
Deviation from the mean
- Total variation is sum of squares of deviations
SSDtotal = Σi Σj (xij – m)2
- Variation between groups
SSDbetween = Σi Σj (mi – m)2 = Σi ni (mi – m)2
where ni is the number of observations at level i
- Residual variation within groups
SSDwithin = Σi Σj (xij – mi)2
- Total variation decomposes as
SSDtotal = SSDbetween + SSDwithin
Degrees of freedom
- For each factor
- Degrees of freedom between levels = #levels – 1
- Degrees of freedom within levels = #participants – #levels
- Calculate mean square deviations
- MSDbetween = SSDbetween / dfbetween
- MSDwithin = SSDwithin / dfwithin
- Calculate ratio of variances
- Fbetween,within = MSDbetween / MSDwithin
Analysis of variance example
- Experienced users
- (xi) = (1,2,2,3,3,3,4,4,5)
Mean(xi) = 27 / 9 = 3
SSD within experienced users = 4+2×1+0+2×1+4 = 12
- New users
- (yi) = (3,4,4,5,5,5,6,6,7)
Mean(yi) = 45 / 9 = 5
SSD within new users = 4+2×1+0+2×1+4 = 12
- SSD within levels = 12 + 12 = 24
- Combine samples
- Mean = 4
Total SSD = 9+2×4+4×1+4×0+4×1+2×4+9 = 42
- SSD between levels = 42 – 12 – 12 = 18
- Degrees of freedom
- dfbetween = 2 – 1 = 1
- dfwithin = 18 - 2 = 16
- F statistic
- F1,16 = (18 / 1) / (24 / 16) = 12, which is significant at the 0.01 level
t test
- A t test is equivalent to an F test for a factor with two levels
- A one-sided t test only looks for differences in one direction
- A paired-samples t test compares repeated measures
Analysis of variance in R
- Given data series x and y
- Concatenate into single series
times <- c (x,y)
- Define factor
group <- gl (2,9,labels=c("experienced","new"))
- Run ANOVA
a <- aov (times~group)
summary(a)
- Compare with t test
t.test (x,y)
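- Note that t.test uses Welch's unequal-variance test by default; a sketch (not from the notes) of the variants mentioned above, using the same x and y:
t.test (x,y,var.equal=T)          # pooled two-sample test: t2 equals the F statistic (12) above
t.test (x,y,alternative="less")   # one-sided test that x is smaller than y
t.test (x,y,paired=T)             # paired-samples test (only meaningful for repeated measures on the same participants)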
F distribution
- If U ~ Χm2 and V ~ Χn2 are independent
- Fm,n = (U/m) / (V/n)
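- A simulation sketch (not from the notes) of this definition:
u <- rchisq (10000, df=3)                  # U ~ Χ32
v <- rchisq (10000, df=12)                 # V ~ Χ122
f <- (u/3) / (v/12)                        # F3,12 = (U/m) / (V/n)
qqplot (qf (ppoints (10000), 3, 12), f)    # quantiles should follow the F3,12 distribution
abline (0,1)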
Factorial ANOVA
- Consider the following observations
         Method 1   Method 2
Task 1   1 2 3      3 4 5
Task 2   2 3 4      4 5 6
Task 3   3 4 5      5 6 7
- Total SSD = 42
- SSD within methods = 12 + 12 = 24
SSD between methods = 42 – 24 = 18
df between methods = #methods – 1 = 2 – 1 = 1
MSD between methods = 18 / 1 = 18
- SSD within tasks = 10 + 10 + 10 = 30
SSD between tasks = 42 – 30 = 12
df between tasks = #tasks – 1 = 3 – 1 = 2
MSD between tasks = 12 / 2 = 6
- Residual SSD = 6 × 2 = 12
df for residuals = #observations – #groups = 18 – 6 = 12
MSD for residuals = 12 / 12 = 1
- Analysis of methods
- F1,12 = 18 / 1 = 18
- Significant at the 0.01 level
- Analysis of tasks
- F2,12 = 6 / 1 = 6
- Significant at the 0.05 level
- In R
times <- c (1,2,3,2,3,4,3,4,5,3,4,5,4,5,6,5,6,7)
method <- gl (2,9)
task <- gl (3,3,18)
aov (times~(method*task))
Repeated measures ANOVA
- Repeated measures reduce the degrees of freedom
- Suppose there were just three participants who each tried both methods and all three tasks
- Residual df for methods = (#participants – 1) × (#methods – 1) = 2
- Residual df for tasks = (#participants – 1) × (#tasks – 1) = 4
- Repeated measures in R
part <- gl (3,1,18)
aov (times~(method*task+Error(part/(method*task))))
Comparing text entry systems revisited
- Given 24 measurement times and factors for method, session, group and participant
- Compare methods
a <- aov (times~method)
summary (a)
- Compare groups
a <- aov (times~group)
summary (a)
- Factorial comparison
a <- aov (times~(method*group))
summary (a)
- These analyses have assumed that each measurement was taken from a different participant; a repeated measures analysis is slightly different
a <- aov (times~(method+Error(participant/method)))
summary (a)
Interaction effects
- Two factors may interact: a particular level of one factor combined with a particular level of the other may significantly affect the measurement, even when neither factor has a significant effect overall
- The method:task term shows any interaction effect after any separate factor effects on variance have been analysed
- If interaction effects are negligible, it may be appropriate to use a multi-way analysis instead of a factorial one
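- A sketch using the factorial example above, comparing the factorial model, which includes the method:task interaction, with the additive (multi-way) model that drops it:
times <- c (1,2,3,2,3,4,3,4,5,3,4,5,4,5,6,5,6,7)
method <- gl (2,9)
task <- gl (3,3,18)
summary (aov (times~(method*task)))   # includes a method:task row for the interaction
summary (aov (times~(method+task)))   # additive model with no interaction term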
Two-way ANOVA
- Given two separate factors with p and q levels respectively
and observations (xij)
- Fit a model Xij = μ + ai + bj + εij
- where residual εij ~ N(0,σ2)
- Overall mean m, row means ri and column means cj
- Consider variations between rows and columns
- SSD between rows = q Σi (ri – m)2 with (p-1) degrees of freedom
- SSD between columns = p Σj (cj – m)2 with (q-1) degrees of freedom
- Residual SSD = Σi Σj (xij – (ri – m) – (cj – m) – m)2
= Total SSD – SSD between rows – SSD between columns
with (p-1)×(q-1) = pq – (p-1) – (q-1) – 1 degrees of freedom
Two-way ANOVA analysis of tasks and methods
- df for residuals = 18 – 1 – 2 – 1 = 14
MSD for residuals = 12 / 14 = 6 / 7 ≈ 0.857
- Analysis of methods
- F1,14 = 18 / (6/7) = 21
- Significant at the 0.001 level
- Analysis of tasks
- F2,14 = 6 / (6/7) = 7
- Significant at the 0.01 level
- Two-way ANOVA in R
aov (times~(method+task))
Formulae for models in R
- Given factors A, B and C with p, q and r levels respectively
- A has (p – 1) degrees of freedom
- A:B:C represents the set product of A, B and C
with (p×q×r – 1) degrees of freedom and no residuals
- A+B+C represents A, B and C as three separate factors
with (p – 1), (q – 1) and (r – 1) degrees of freedom
and (p×q×r – 1) – (p – 1) – (q – 1) – (r – 1) residual degrees of freedom
- A*B*C = A + B + C + A:B + A:C + B:C + A:B:C
with (p–1), (q–1), (r–1), (p–1)×(q–1), (p–1)×(r–1), (q–1)×(r–1) and (p–1)×(q–1)×(r–1) degrees of freedom
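- A sketch (with arbitrary simulated data, purely to see the degrees of freedom R assigns) for factors with p = 2, q = 3 and r = 4 levels:
set.seed (1)
A <- gl (2,12)       # p - 1 = 1 degree of freedom
B <- gl (3,4,24)     # q - 1 = 2 degrees of freedom
C <- gl (4,1,24)     # r - 1 = 3 degrees of freedom
y <- rnorm (24)
summary (aov (y~(A+B+C)))   # residual df = 23 - 1 - 2 - 3 = 17
summary (aov (y~(A*B)))     # A:B interaction has (p-1)×(q-1) = 2 df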
Reporting an F statistic
There are strict conventions on how statistical results should be presented. The F statistic should be reported as follows:
There was a significant main effect of independent variable on dependent variable (F(d,e) = F-value, p < threshold).
Note that:
- F is in upper case italic
- d and e are the degrees of freedom of the effect (between levels) and of the residuals (within levels)
- the F-value statistic is specified to three significant figures
- p is in lower case italic
- the p-value probability of a false positive is specified to two significant figures unless it is less than 0.001 in which case write "p < 0.001"
- it is impossible to demonstrate significance if the F-value is less than 1, in which case the probability is replaced by "n.s."
- if the F-value is greater than 1 but the probability is too large for the result to be significant, the probability is specified as "p > 0.05"
- some journals prefer the probability of a false positive to be specified as being below a threshold
(of 0.05, 0.01 and 0.001) rather than exactly
(These values are often denoted by one, two or three asterisks; probabilities between 0.10 and 0.05 are denoted by a plus sign.)
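- For example, the analysis of methods in the factorial example above might be written up as: There was a significant main effect of method on the measured time (F(1,12) = 18.0, p < 0.01).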