Course pages 2013–14
Experimental Methods
Design of experiments
What is research?
- Not just diligent search to collect information about a topic
- Development and revision of theories
- Enquiry and examination using experiments aimed at the discovery or interpretation of facts
- Originating in observations or experience
- Capable of being verified or disproved by observation or experiment
- Practical application of theories
Observations and inferences
- Descriptive
- X is happening
- Observations, field studies, focus groups, interviews
- Relational
- X is related to Y
- Observations, field studies, surveys
- Experimental
- X is responsible for Y
- Controlled experiments
Theory and hypothesis
- A theory can be very broad
- In movement tasks, the movement time increases as the movement distance increases and the size of the target decreases. The movement time is linear in the logarithm of the ratio of movement distance to target width. (Fitts' Law)
- A concrete research hypothesis lays the foundation for an experiment and can be the basis for testing of statistical significance
- Fitts' Law predicts navigation times successfully for a mouse and for an eye tracker.
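As a rough illustration of the log-linear relationship, the sketch below uses one common formulation of Fitts' Law, MT = a + b × log2(2D ÷ W). The constants a and b here are hypothetical placeholders; in practice they are fitted by regression for each pointing device.

```python
import math

def index_of_difficulty(distance: float, width: float) -> float:
    """Index of difficulty in bits, log2(2D / W)."""
    return math.log2(2 * distance / width)

def movement_time(distance: float, width: float,
                  a: float = 0.1, b: float = 0.15) -> float:
    """Predicted movement time in seconds.

    a (intercept) and b (slope) are made-up illustrative values,
    not measurements for any real device.
    """
    return a + b * index_of_difficulty(distance, width)

# Doubling the distance and halving the width each add one bit of
# difficulty, so each adds the same b seconds to the prediction.
base = movement_time(distance=200, width=20)
print(movement_time(400, 20) - base)
print(movement_time(200, 10) - base)
```

A hypothesis such as the one for the mouse and eye tracker would then be tested by fitting a and b separately for each device and checking how well the line fits.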
Phrasing a hypothesis
- Hypothesis should be testable
- Strength
- Are pop-up menus any good?
- Are pop-up menus better than pull-down menus?
- Are pop-up menus faster than pull-down menus?
- Is the time taken for an experienced user to invoke a command using a pop-up menu less than the time taken using a pull-down menu?
The scientific method
- Formulate hypothesis
- Design experiment
- Test with pilot, revising design if necessary
- Run experiment and collect data
- Analyse data
- Draw conclusions
- Start again with a revised hypothesis if necessary
Null and alternative hypotheses
- Null hypothesis – H0
- Treatment has no effect
- Any difference in measurements can be explained by random variation resulting from experimental procedure
- Alternative hypothesis – H1
- Treatment has an effect
- Difference in measurements is unlikely to be explained by random variation
- Experiment and statistical analysis determine whether to accept or reject the null hypothesis
Comparing pull-down and pop-up menus
- Speed
- H0: There is no difference between the times taken to select an item using pull-down and pop-up menus
- H1: There is a difference between the times taken to select an item using pull-down and pop-up menus
- Satisfaction
- H0: There is no difference between user satisfaction selecting an item using pull-down and pop-up menus
- H1: There is a difference between user satisfaction selecting an item using pull-down and pop-up menus
- Two different measures – timing and questionnaire
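One way to test the speed hypothesis is a permutation test, sketched below. It assumes a between-subjects comparison with independent groups, and the timing values are invented purely for illustration. The function name and parameters are likewise hypothetical.

```python
import random

def permutation_test(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test on the difference of group means.

    Returns an estimate of the probability, under H0, of seeing a
    difference at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_resamples):
        # Under H0 the group labels are arbitrary, so reshuffle them.
        rng.shuffle(pooled)
        left, right = pooled[:len(a)], pooled[len(a):]
        diff = abs(sum(left) / len(left) - sum(right) / len(right))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_resamples + 1)

# Hypothetical selection times in seconds (not real measurements).
pull_down = [1.42, 1.58, 1.37, 1.66, 1.49, 1.53, 1.61, 1.45]
pop_up = [1.21, 1.35, 1.18, 1.40, 1.29, 1.33, 1.26, 1.31]

p = permutation_test(pull_down, pop_up)
alpha = 0.05
print(f"p = {p:.4f}:", "reject H0" if p < alpha else "do not reject H0")
```

Questionnaire data for the satisfaction hypothesis is ordinal, so a rank-based test would usually be a better fit there than a comparison of means.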
Errors
| | H1 true | H0 true |
|---|---|---|
| Reject H0 | True positive decision, probability 1-β (power) | False positive decision (Type I error), probability α |
| Accept H0 | False negative decision (Type II error), probability β | True negative decision, probability 1-α (confidence) |
- Aim for α < 0.05 (95% confidence level) or α < 0.01 (99% confidence level)
- Aim for β < 0.20, giving power greater than 80% (the probability of correctly rejecting a false null hypothesis)
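The link between α, β and the number of participants can be sketched for the simplest case. The example assumes a two-sided, one-sample z-test with known standard deviation, which is a simplification; real experiments usually call for a t-test or similar. The function names are hypothetical.

```python
from statistics import NormalDist

def z_test_power(effect_size: float, n: int, alpha: float = 0.05) -> float:
    """Power (1 - beta) of a two-sided one-sample z-test.

    effect_size is the true shift of the mean, in units of the
    (assumed known) standard deviation.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect_size * n ** 0.5
    # Probability that the statistic lands in either rejection tail.
    return (1 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)

def smallest_n_for_power(effect_size: float, target_power: float = 0.80,
                         alpha: float = 0.05) -> int:
    """Smallest sample size whose power reaches target_power."""
    if effect_size == 0:
        raise ValueError("power never exceeds alpha when there is no effect")
    n = 2
    while z_test_power(effect_size, n, alpha) < target_power:
        n += 1
    return n

# With no real effect, the rejection rate is just alpha (Type I errors).
print(z_test_power(0.0, n=20))
# Sample size for 80% power at a medium effect of half a standard deviation.
print(smallest_n_for_power(0.5))
```

Smaller effects need many more participants for the same power, which is one reason to estimate the effect size from a pilot before fixing the design.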
Measures of accuracy
- Precision = TP ÷ (TP + FP)
- probability of a detected positive being true
- Recall = TP ÷ (TP + FN) = 1 - β
- probability of true positive being detected
- Power or Sensitivity
- Specificity = TN ÷ (FP + TN) = 1 - α
- probability of true negative being detected
- Accuracy = (TP + TN) ÷ (TP + TN + FP + FN)
- F1 score = 2 × Precision × Recall ÷ (Precision + Recall)
- Fλ = (1 + λ²) × Precision × Recall ÷ (λ²×Precision + Recall)
- recall λ times more important than precision
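The measures above can be computed directly from the four cells of the decision table. The sketch below is a minimal version; the counts are hypothetical and the function name is an assumption made for illustration.

```python
def accuracy_measures(tp: int, fp: int, fn: int, tn: int,
                      lam: float = 1.0) -> dict:
    """Compute the listed measures from confusion-matrix counts.

    lam is the weight lambda in the F-lambda score; lam = 1 gives F1.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (fp + tn)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f_lambda = ((1 + lam ** 2) * precision * recall
                / (lam ** 2 * precision + recall))
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "accuracy": accuracy,
        "f_lambda": f_lambda,
    }

# Hypothetical counts, chosen only to make the arithmetic easy to follow.
for name, value in accuracy_measures(tp=40, fp=10, fn=20, tn=30).items():
    print(f"{name}: {value:.3f}")
```

Setting lam above 1 pulls the score towards recall, and setting it below 1 pulls it towards precision.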
Receiver operating characteristic
- Plots recall (1-β) against false positive rate (α)
- Area under the curve (A') is the probability that a classifier will rank a random positive instance higher than a random negative one
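The ranking interpretation of the area under the curve can be checked directly by comparing every positive instance with every negative one. This is a minimal sketch; counting ties as half a win is a common convention assumed here, and the scores are invented.

```python
def auc_by_ranking(positive_scores, negative_scores) -> float:
    """Fraction of (positive, negative) pairs ranked in the right order.

    Ties count as half a correctly ranked pair (an assumed convention).
    """
    wins = 0.0
    for pos in positive_scores:
        for neg in negative_scores:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))

# Hypothetical classifier scores for positive and negative instances.
print(auc_by_ranking([0.9, 0.4], [0.5, 0.1]))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes.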
Controlling errors
- Control groups with no treatment
- Randomisation
- Single and double blinding
- Unconscious bias from well intentioned evaluators
- Placebos
- Learning and order effects
- Confounding
- Two different factors give rise to the same effect
- e.g. age and experience may both contribute to ability
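Randomisation into control and treatment groups might be sketched as follows. The function and group names are hypothetical. Fixing the seed makes the assignment reproducible for the experimental record.

```python
import random

def assign_groups(participants, groups=("control", "treatment"), seed=None):
    """Randomly assign participants to groups of near-equal size."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    # Deal the shuffled list round-robin, so sizes differ by at most one.
    return {group: shuffled[i::len(groups)] for i, group in enumerate(groups)}

# Participant identifiers P1..P10 are placeholders.
assignment = assign_groups([f"P{i}" for i in range(1, 11)], seed=42)
for group, members in assignment.items():
    print(group, members)
```

Because the assignment does not depend on any property of the participants, factors such as age and experience are spread across groups by chance rather than systematically.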
Fair testing
- Take care to match ratios of samples to population size in different strata
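Matching sample ratios to strata sizes amounts to proportional allocation. The sketch below uses largest-remainder rounding, which is one reasonable choice among several; the strata and their sizes are hypothetical.

```python
def proportional_allocation(strata_sizes: dict, sample_size: int) -> dict:
    """Split sample_size across strata in proportion to population size."""
    total = sum(strata_sizes.values())
    exact = {name: sample_size * size / total
             for name, size in strata_sizes.items()}
    allocation = {name: int(quota) for name, quota in exact.items()}
    leftover = sample_size - sum(allocation.values())
    # Give the remaining places to the strata with the largest fractions.
    by_remainder = sorted(exact, key=lambda name: exact[name] - allocation[name],
                          reverse=True)
    for name in by_remainder[:leftover]:
        allocation[name] += 1
    return allocation

# Hypothetical population strata.
population = {"undergraduate": 600, "postgraduate": 300, "staff": 100}
print(proportional_allocation(population, sample_size=20))
```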
Validity
- External validity
- The extent to which results can be generalised to other people in other situations
- Requires representative participants and representative environment
- Internal validity
- The extent to which effects observed can be attributed to the test conditions
- Differences caused by conditions and variance caused by participants
Increasing validity
- Relaxing the test environment and experimental procedures to mimic the real world is likely to introduce uncontrolled variation from sources such as distractions or secondary tasks
- Pose several narrow (testable) questions whose outcomes together cover the broader (untestable) question
- A technique that is faster, is more accurate, is easier to learn and is easier to remember is generally better
- Testable and untestable questions are usually correlated
- Comparative evaluations are more informative than user studies that only identify strengths and weaknesses in a single technique
Variables
- Independent variables
- Different treatments being compared
- Controlled by the experimenter
- e.g. type of menu
- Dependent variables
- Effects being observed
- Measured during the experiment
- e.g. time taken to select an item and user satisfaction
Qualitative variables
- Nominal or categorical
- e.g. pull-down or pop-up menu
- Ordinal or ranked
- e.g. computer experience: < 1 year, 1-5 years, 5-15 years, > 15 years
- Likert scale
- e.g. pull-down was better: strongly disagree, disagree, neutral, agree, strongly agree
- Even number of answers excludes neutral: strongly disagree, disagree, slightly disagree, slightly agree, agree, strongly agree
- Balanced, so not: poor, average, good, very good, excellent
Quantitative variables
- Discrete or continuous
- Interval
- Equally spaced
- e.g. temperature
- Ratio
- Zero based
- e.g. time taken to select an item, number of errors made
Other variables
- Control variables
- Factors that may influence a dependent variable but are not under investigation – so control them
- Improve internal validity at the expense of external validity
- Random variables
- Factors that are allowed to vary randomly
- Improve external validity at the expense of internal validity
- Confounding variable
- Factor that varies systematically with an independent variable
- e.g. Prior experience with a particular technique
Design
- Between subjects
- Each participant is exposed to a single condition
- No risk of learning and skill transfer so no need to counterbalance
- Variance between participants is not controlled
- Within subjects or repeated measures
- Each participant is exposed to all conditions
- Variance between participants is controlled
- More demanding for participants and need to counterbalance
- Mixed design or split plot
- Balance load on participants and control of variance

Counterbalancing
- Participants' performance may improve with practice during a repeated measures experiment
- Counterbalance the order of presenting conditions
- Latin square has each condition appearing once in each row and in each column
- Balanced Latin square has each condition preceding and following each other condition equal numbers of times
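A balanced Latin square can be generated with a standard construction, sketched below. Here "preceding and following" is taken to mean immediately preceding and following. For an odd number of conditions the construction needs the square plus its mirror image, giving twice as many rows. Conditions are numbered from 0 and the function name is hypothetical.

```python
def balanced_latin_square(n: int) -> list[list[int]]:
    """Return presentation orders for n conditions, one order per row."""
    # First row follows the pattern 0, 1, n-1, 2, n-2, ...
    first = [0]
    low, high = 1, n - 1
    while len(first) < n:
        first.append(low)
        low += 1
        if len(first) < n:
            first.append(high)
            high -= 1
    # Each later row shifts every condition in the first row by one.
    rows = [[(c + shift) % n for c in first] for shift in range(n)]
    if n % 2 == 1:
        # Odd n: append the mirrored orders to balance the sequence pairs.
        rows += [list(reversed(row)) for row in rows]
    return rows

for order in balanced_latin_square(4):
    print(order)
```

Participants are then assigned to rows in rotation, so the number of participants should be a multiple of the number of rows.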
Succinct design statement
- '3×2 repeated measures design'
- An experiment with two different factors, having three levels for the first and two levels for the second
- e.g. Three different text entry systems and two different tasks
- Factorial design tests all participants on all six conditions
- Mixed design might test half the participants on all three systems for one task, and the other half on all three systems for the other task
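The two designs can be enumerated as follows. The system and task names are placeholders, and alternating participants between tasks is just one simple way to split them in the mixed design.

```python
from itertools import product

# Placeholder names for the three text entry systems and two tasks.
systems = ["system A", "system B", "system C"]
tasks = ["task 1", "task 2"]

# Full factorial (3x2 repeated measures): every participant sees all six.
conditions = list(product(systems, tasks))

def mixed_design_conditions(participant_index: int) -> list:
    """Mixed design: task is between subjects, system is within subjects."""
    task = tasks[participant_index % len(tasks)]
    return [(system, task) for system in systems]

print(len(conditions), "conditions in the factorial design")
print(mixed_design_conditions(0))
print(mixed_design_conditions(1))
```

In either design the order of the within-subjects conditions would still need counterbalancing.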
Participants
- 'Participants' preferable to 'subjects'
- Distinguish from users of the resulting system
- Report recruitment procedure and selection criteria
- Report relevant demographic information
- Number of participants
- Age (range, mean, standard deviation)
- Balance of sexes
- Prior experience
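The age statistics for the report can be produced with the standard library. This is a small sketch using the sample standard deviation; the ages are invented and the function name is hypothetical.

```python
from statistics import mean, stdev

def summarise_ages(ages) -> dict:
    """Number of participants, age range, mean and sample standard deviation."""
    return {
        "n": len(ages),
        "min": min(ages),
        "max": max(ages),
        "mean": mean(ages),
        "sd": stdev(ages),
    }

# Hypothetical participant ages in years.
summary = summarise_ages([19, 21, 22, 24, 27, 31, 35, 42])
print("{n} participants, aged {min}-{max} "
      "(mean {mean:.1f}, SD {sd:.1f})".format(**summary))
```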
Ethical procedures
- Appropriate experimental design
- Test with a pilot
- Recruitment of participants
- Informed consent with signed form
- Briefing with full written instructions for participant and experimenter
- Treatment of participants
- Dignity and respect, right to withdraw without penalty
- Debriefing interview and further explanation
- Data retention subject to Data Protection Act
- Incentives and compensation
Ethical problems
- Participants who are minors or have disabilities
- Experiments that are likely to cause physical or mental distress or embarrassment (in any case, consent forms should make it clear that participants can withdraw at any time)
- Experiments involving deception or emotional manipulation
- Keeping data in any form that would allow individuals to be identified
- Auditing compensation
- Medical experiments
- Experiments on non-human species
Ethical approval
- Laboratory Research Ethics Committee
- Application
- Description of the experiment
- Consent form
- Questionnaires
- Details of remuneration
Reporting experiments
- Method
- Participants
- Apparatus
- Procedure
- Design
- Results
- Discussion
Further information
- Computer Laboratory procedures for experiments with human participants
- More general advice on research involving human participants