
Computer Laboratory

Course pages 2013–14

Experimental Methods

Design of experiments

What is research?

Not just diligent search to collect information about a topic
Development and revision of theories
Enquiry and examination using experiments aimed at the discovery or interpretation of facts

Originating in observations or experience
Capable of being verified or disproved by observation or experiment

Practical application of theories

Observations and inferences

Descriptive

X is happening
Observations, field studies, focus groups, interviews

Relational

X is related to Y
Observations, field studies, surveys

Experimental

X is responsible for Y
Controlled experiments

Theory and hypothesis

A theory can be very broad

In movement tasks, movement time increases as the movement distance increases and as the size of the target decreases. Movement time is linear in the logarithm of the ratio of movement distance to target width. (Fitts' Law)

A concrete research hypothesis lays the foundation for an experiment and can be the basis for testing statistical significance

Fitts' Law predicts navigation times successfully for a mouse and for an eye tracker.
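As a rough illustration of the theory above, the following sketch computes a predicted movement time from a movement distance and a target width. It assumes the Shannon formulation MT = a + b × log₂(D ÷ W + 1), which is one common way of writing Fitts' Law; the slides give no formula, and the constants a and b below are hypothetical values that would normally be fitted to measured data.

```python
import math


def fitts_movement_time(distance, width, a=0.1, b=0.15):
    """Predicted movement time in seconds (Shannon formulation, assumed).

    a and b are hypothetical device-specific constants; in practice they
    are estimated by regression on measured movement times.
    """
    if distance < 0 or width <= 0:
        raise ValueError("distance must be >= 0 and width must be > 0")
    index_of_difficulty = math.log2(distance / width + 1)  # in bits
    return a + b * index_of_difficulty


# A longer movement to a smaller target gives a longer predicted time.
near_large = fitts_movement_time(distance=100, width=50)
far_small = fitts_movement_time(distance=400, width=10)
```

A hypothesis such as the one about the mouse and the eye tracker could then be tested by fitting a and b separately for each device and checking how well the fitted line predicts the measured times.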

Phrasing a hypothesis

Hypothesis should be testable
Strength (weakest to strongest)

Are pop-up menus any good?
Are pop-up menus better than pull-down menus?
Are pop-up menus faster than pull-down menus?
Is the time taken for an experienced user to invoke a command using a pop-up menu less than the time taken using a pull-down menu?

The scientific method

Formulate hypothesis
Design experiment
Test with pilot, revising design if necessary
Run experiment and collect data
Analyse data
Draw conclusions
Start again with a revised hypothesis if necessary

Null and alternative hypotheses

Null hypothesis – H₀

Treatment has no effect
Any difference in measurements can be explained by random variation resulting from experimental procedure

Alternative hypothesis – H₁

Treatment has an effect
Difference in measurements is unlikely to be explained by random variation

Experiment and statistical analysis determine whether to reject the null hypothesis or accept it (strictly, fail to reject it)

Comparing pull-down and pop-up menus

Speed

H₀: There is no difference between the times taken to select an item using pull-down and pop-up menus
H₁: There is a difference between the times taken to select an item using pull-down and pop-up menus

Satisfaction

H₀: There is no difference between user satisfaction selecting an item using pull-down and pop-up menus
H₁: There is a difference between user satisfaction selecting an item using pull-down and pop-up menus

Two different measures – timing and questionnaire
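The speed hypothesis above could be tested in several ways; the slides do not name a particular test. The sketch below uses a two-sided permutation test on the difference of mean selection times, assuming a between-subjects design. The function name and the timing data are hypothetical and invented purely for illustration.

```python
import random


def _mean(values):
    return sum(values) / len(values)


def permutation_test(group_a, group_b, n_resamples=10_000, seed=0):
    """Two-sided permutation test on the difference of means.

    Estimates the probability, under H0, of seeing a difference at least
    as large as the one observed.
    """
    rng = random.Random(seed)
    observed = abs(_mean(group_a) - _mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # relabel the measurements at random
        diff = abs(_mean(pooled[:n_a]) - _mean(pooled[n_a:]))
        if diff >= observed - 1e-12:  # tolerance for floating-point ties
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (at_least_as_extreme + 1) / (n_resamples + 1)


# Hypothetical selection times in seconds (not real measurements).
pull_down = [1.92, 2.10, 1.85, 2.30, 2.05, 1.98, 2.21, 2.15]
pop_up = [1.60, 1.75, 1.58, 1.90, 1.72, 1.66, 1.81, 1.70]

p_value = permutation_test(pull_down, pop_up)
reject_h0 = p_value < 0.05  # compare against the chosen significance level
```

The satisfaction hypothesis uses questionnaire data, which is ordinal, so a rank-based test would usually be a better fit there than a comparison of means.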

Errors

Decision	H₁ true	H₀ true
Reject H₀	True positive; probability 1 − β (power)	False positive (Type I error); probability α
Accept H₀	False negative (Type II error); probability β	True negative; probability 1 − α (confidence)

Aim for α < 0.05 (95% confidence level) or α < 0.01 (99% confidence level)
Aim for β < 0.20, giving power greater than 80% (the probability of correctly rejecting a false null hypothesis)
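The targets for α and β together determine roughly how many participants are needed. The sketch below uses the standard normal approximation for a two-sided comparison of two group means, n ≈ 2 × ((z₁₋α⁄₂ + z₁₋β) ÷ d)² per group. The standardised effect size d is an assumption that does not appear in the slides, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def participants_per_group(effect_size, alpha=0.05, beta=0.20):
    """Approximate participants per group for a two-group comparison.

    effect_size is the expected difference between group means divided by
    the common standard deviation (an assumed input, usually estimated
    from a pilot study or from earlier work).
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # two-sided test
    z_beta = standard_normal.inv_cdf(1 - beta)
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


n_medium_effect = participants_per_group(0.5)
```

With the default α and β, a medium effect (d = 0.5) needs about 63 participants per group under this approximation; an exact calculation based on the t distribution gives a slightly larger number.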

Measures of accuracy

Precision = TP ÷ (TP + FP)

probability of a detected positive being true

Recall = TP ÷ (TP + FN) = 1 - β

probability of true positive being detected
Power or Sensitivity

Specificity = TN ÷ (FP + TN) = 1 - α

probability of true negative being detected

Accuracy = (TP + TN) ÷ (TP + TN + FP + FN)
F1 score = 2 × Precision × Recall ÷ (Precision + Recall)
F_λ = (1 + λ²) × Precision × Recall ÷ (λ²×Precision + Recall)

recall λ times more important than precision
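The measures above can be computed directly from the four counts in the decision table. The sketch below is a minimal version that assumes every denominator is non-zero; the function name and the example counts are hypothetical.

```python
def accuracy_measures(tp, fp, fn, tn, lam=1.0):
    """Compute the measures listed above from confusion-matrix counts.

    lam is the weight λ in the F_λ score; lam=1 gives the F1 score.
    Assumes no denominator is zero.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (fp + tn)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f_score = (1 + lam ** 2) * precision * recall / (lam ** 2 * precision + recall)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "accuracy": accuracy,
        "f_score": f_score,
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
measures = accuracy_measures(tp=40, fp=10, fn=20, tn=30)
```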

Receiver operating characteristic

Plots recall (1-β) against false positive rate (α)
Area under the curve (A') is the probability that a classifier will rank a random positive instance higher than a random negative one
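The ranking interpretation of the area under the curve can be computed directly, without plotting anything. The sketch below compares every positive score with every negative score and counts ties as half a win; the function name and the scores are hypothetical.

```python
def auc_by_ranking(positive_scores, negative_scores):
    """Probability that a random positive is scored above a random negative."""
    wins = 0.0
    for positive in positive_scores:
        for negative in negative_scores:
            if positive > negative:
                wins += 1.0
            elif positive == negative:
                wins += 0.5  # a tie counts as half
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical classifier scores for three positive and three negative cases.
area = auc_by_ranking([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
```

Here eight of the nine positive-negative pairs are ranked correctly, so the area is 8 ÷ 9.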

Controlling errors

Control groups with no treatment

Randomisation

Single and double blinding

Unconscious bias from well intentioned evaluators
Placebos

Learning and order effects
Confounding

Two different factors give rise to the same effect
e.g. age and experience may both contribute to ability
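Randomisation is simple to automate. The sketch below shuffles a list of participant identifiers and deals them alternately into a control group and a treatment group, so that the group sizes differ by at most one. The group names, identifiers and function name are hypothetical.

```python
import random


def assign_groups(participants, groups=("control", "treatment"), seed=None):
    """Randomly allocate participants to groups of near-equal size."""
    rng = random.Random(seed)  # a fixed seed makes the allocation repeatable
    shuffled = list(participants)
    rng.shuffle(shuffled)
    allocation = {group: [] for group in groups}
    for index, person in enumerate(shuffled):
        allocation[groups[index % len(groups)]].append(person)
    return allocation


# Hypothetical participant identifiers.
participant_ids = [f"P{number:02d}" for number in range(1, 11)]
allocation = assign_groups(participant_ids, seed=1)
```

Because the allocation does not depend on anything the experimenter knows about a participant, factors such as age and experience should be spread across groups by chance rather than systematically.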

Fair testing

Take care to match ratios of samples to population size in different strata
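One way to match sample ratios to the population is proportional allocation: each stratum receives a share of the sample equal to its share of the population. The sketch below rounds using the largest-remainder method so that the allocations always sum to the requested sample size. The strata, their sizes and the function name are hypothetical.

```python
def proportional_allocation(strata_sizes, sample_size):
    """Split sample_size across strata in proportion to their population sizes."""
    total = sum(strata_sizes.values())
    exact = {name: sample_size * size / total for name, size in strata_sizes.items()}
    # Start from the whole-number part of each exact share.
    allocation = {name: int(share) for name, share in exact.items()}
    remaining = sample_size - sum(allocation.values())
    # Hand out the leftover places to the largest fractional remainders.
    by_remainder = sorted(exact, key=lambda name: exact[name] - allocation[name], reverse=True)
    for name in by_remainder[:remaining]:
        allocation[name] += 1
    return allocation


# Hypothetical population of 1000 users in three experience strata.
population = {"novice": 500, "intermediate": 300, "expert": 200}
sample = proportional_allocation(population, sample_size=50)
```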

Validity

External validity

The extent to which results can be generalised to other people in other situations
Requires representative participants and representative environment

Internal validity

The extent to which effects observed can be attributed to the test conditions
Differences caused by conditions and variance caused by participants

Increasing validity

Relaxing the test environment and experimental procedures to mimic the real world is likely to introduce uncontrolled variation from sources such as distractions or secondary tasks
Pose several narrow (testable) questions whose outcomes together address the broader (untestable) question

A technique that is faster, is more accurate, is easier to learn and is easier to remember is generally better

Answers to testable and untestable questions are usually correlated
Comparative evaluations are more informative than user studies that identify strengths and weaknesses in a single technique

Variables

Independent variables

Different treatments being compared
Controlled by the experimenter
e.g. type of menu

Dependent variables

Effects being observed
Measured during the experiment
e.g. time taken to select an item and user satisfaction

Qualitative variables

Nominal or categorical

e.g. pull-down or pop-up menu

Ordinal or ranked

e.g. computer experience: < 1 year, 1-5 years, 5-15 years, > 15 years

Likert scale

e.g. pull-down was better: strongly disagree, disagree, neutral, agree, strongly agree
Even number of answers excludes neutral: strongly disagree, disagree, slightly disagree, slightly agree, agree, strongly agree
Balanced, so not: poor, average, good, very good, excellent
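Likert responses are ordinal: the answers have an order but the gaps between them are not necessarily equal. A common convention, assumed here rather than stated in the slides, is therefore to summarise them with counts and a median rather than a mean. The sketch below does this for the five-point scale above; the responses and the function name are hypothetical.

```python
import statistics
from collections import Counter

LIKERT_LEVELS = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]


def summarise_likert(responses):
    """Return the median response and the count of each response."""
    ranks = [LIKERT_LEVELS.index(response) for response in responses]
    # median_low always returns one of the observed ranks, never a midpoint.
    median_label = LIKERT_LEVELS[statistics.median_low(ranks)]
    counts = Counter(responses)
    return median_label, {level: counts.get(level, 0) for level in LIKERT_LEVELS}


# Hypothetical answers to "pull-down was better".
answers = ["agree", "agree", "neutral", "strongly agree", "disagree"]
median_answer, answer_counts = summarise_likert(answers)
```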

Quantitative variables

Discrete or continuous
Interval

Equally spaced
e.g. temperature in degrees Celsius

Ratio

Zero based
e.g. time taken to select an item, number of errors made

Other variables

Control variables

Factors that may influence a dependent variable but are not under investigation – so control them
Improve internal validity at the expense of external validity

Random variables

Factors that are allowed to vary randomly
Improve external validity at the expense of internal validity

Confounding variable

Factor that varies systematically with an independent variable

e.g. Prior experience with a particular technique

Design

Between subjects

Each participant is exposed to a single condition
No risk of learning and skill transfer so no need to counterbalance
Variance is not controlled

Within subjects or repeated measures

Each participant is exposed to all conditions
Variance is controlled
More demanding for participants and need to counterbalance

Mixed design or split plot

Balance load on participants and control of variance

Fisher balanced Latin square

Counterbalancing

Participants' performance may improve with practice during a repeated measures experiment
Counterbalance the order of presenting conditions
Latin square has each condition appearing once in each row and in each column
Balanced Latin square has each condition preceding and following each other condition equal numbers of times
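The following sketch generates a balanced Latin square using a widely used construction: the first row runs 0, 1, n−1, 2, n−2, … and each later row adds one modulo n. For an odd number of conditions a single square cannot be balanced, so the reversed rows are appended, giving 2n orderings. The function name is hypothetical, and "preceding and following" is taken to mean immediately preceding and following.

```python
def balanced_latin_square(n):
    """Return presentation orders for n conditions, numbered 0 to n-1.

    For even n there are n rows and every ordered pair of conditions is
    adjacent exactly once. For odd n there are 2n rows and every ordered
    pair is adjacent exactly twice.
    """
    if n < 2:
        raise ValueError("need at least two conditions")
    first_row = [0]
    for position in range(1, n):
        if position % 2:  # odd positions count up: 1, 2, 3, ...
            first_row.append((position + 1) // 2)
        else:  # even positions count down: n-1, n-2, ...
            first_row.append(n - position // 2)
    rows = [[(condition + shift) % n for condition in first_row] for shift in range(n)]
    if n % 2:
        rows += [row[::-1] for row in rows]
    return rows


# Hypothetical usage: participant i receives the order in row i modulo the row count.
conditions = ["A", "B", "C", "D"]
orders = [[conditions[c] for c in row] for row in balanced_latin_square(len(conditions))]
```

The number of participants should be a multiple of the number of rows, so that each order is used equally often.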

Succinct design statement

'3×2 repeated measures design'
An experiment with two different factors, having three levels for the first and two levels for the second

e.g. Three different text entry systems and two different tasks

Factorial design tests all participants on all six conditions
Mixed design might test half the participants on all three systems for one task, and the other half on all three systems for the other task
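The difference between the two designs can be made concrete by listing the conditions each participant sees. In the sketch below the system and task names are hypothetical placeholders, and the mixed design alternates tasks between participants so that half receive each task.

```python
from itertools import product

# Hypothetical levels for the two factors of a 3×2 design.
systems = ["system A", "system B", "system C"]
tasks = ["task 1", "task 2"]

# Full factorial, repeated measures: every participant sees all six conditions.
factorial_conditions = list(product(systems, tasks))


def mixed_design_conditions(participant_index):
    """Conditions for one participant in the mixed design.

    The task is a between-subjects factor (alternating by participant),
    and the system is a within-subjects factor.
    """
    task = tasks[participant_index % len(tasks)]
    return [(system, task) for system in systems]


first_participant = mixed_design_conditions(0)
second_participant = mixed_design_conditions(1)
```

In either design the order in which each participant meets the systems would still need to be counterbalanced.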

Participants

'Participants' preferable to 'subjects'
Distinguish from users of the resulting system
Report recruitment procedure and selection criteria
Report relevant demographic information

Number of participants
Age (range, mean, standard deviation)
Balance of sexes
Prior experience

Ethical procedures

Appropriate experimental design

Test with a pilot

Recruitment of participants
Informed consent with signed form
Briefing with full written instructions for participant and experimenter
Treatment of participants

Dignity and respect, right to withdraw without penalty

Debriefing interview and further explanation
Data retention subject to Data Protection Act
Incentives and compensation

Ethical problems

Participants who are minors or have disabilities
Experiments that are likely to cause physical or mental distress or embarrassment (in any case, consent forms should make it clear that participants can withdraw at any time)
Experiments involving deception or emotional manipulation
Keeping data in any form that would allow individuals to be identified
Auditing compensation
Medical experiments
Experiments on non-human species

Ethical approval

Laboratory Research Ethics Committee
Application
- Description of the experiment
- Consent form
- Questionnaires
- Details of remuneration

Reporting experiments

Method

Participants
Apparatus
Procedure
Design

Results
Discussion

Further information

Computer Laboratory procedures for experiments with human participants
More general advice on research involving human participants

© 2014 Neil Dodgson
Information provided by Prof. Neil Dodgson