E-Prime User’s Guide
Appendix B: Considerations in Research
Page A-31
How many trials?
Why not just have the subject respond once to each type of display, and take that single RT as
the "score" for that condition? This would certainly be faster, since few trials would be needed.
The problem with using this procedure, however, is that it ignores the large variability in RT that is
due to factors other than the independent variables. RT varies from trial to trial, even if the
stimulus does not. That variability comes from momentary changes in attention and muscular
preparation, among other things. Note that subjects cannot pay attention evenly and uniformly for
any length of time. Even when you are listening to a fascinating lecture, you will find your
attention wandering from time to time. The same thing happens in RT experiments, when the
subject sits doing trial after trial. Occasionally, subjects will start a trial when their attention is not
focused on the task. When this happens, a very long RT usually results. Long RT's due to
inattentiveness would be expected to occur about equally often for all stimulus types, so
averaging a few such trials with many others does not create a problem.
Another way to look at the problem of number of trials per condition is to realize that the RT on
each trial provides an estimate of that subject's "true" RT for that condition. Each individual
estimate is not very reliable, for the reasons given above. Therefore, averaging a number of
estimates (RT's on many trials) provides a better (more reliable) estimate of "true" RT. Recall
that the confidence interval estimate of a population mean becomes more and more precise as
the sample size increases. Similarly, the estimate of true RT becomes better and better as
sample size increases--though in this instance, sample size refers to the number of trials per
subject, rather than the number of subjects. By employing the formula for the confidence
interval, one can determine the number of trials needed to achieve a certain level of accuracy. In
practice, 15-30 trials per condition per subject seem to provide satisfactory results. This is
enough trials that a
few aberrant trials will have little effect on the mean RT for that condition.
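As a rough sketch of this reasoning, the known-sigma confidence-interval formula can be solved for the number of trials: n = (z * sigma / E)^2, where E is the desired half-width of the interval. The 100 ms trial-to-trial standard deviation below is an assumed, illustrative value, not a figure from this guide:

```python
import math

def trials_needed(sd_ms, half_width_ms, z=1.96):
    """Trials needed so the 95% CI for mean RT has the given half-width.

    Uses n = (z * sigma / E)^2, treating the trial-to-trial SD as known.
    The SD value passed in is illustrative, not taken from real data.
    """
    return math.ceil((z * sd_ms / half_width_ms) ** 2)

# Assuming a trial-to-trial SD of 100 ms:
print(trials_needed(100, 50))  # 16 trials for a +/-50 ms interval
print(trials_needed(100, 25))  # 62 trials; halving E quadruples n
```

Note that the requirement grows with the square of the desired precision, which is why the modest gain from going beyond 15-30 trials per condition is often not worth the added testing time.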
Between- Versus Within-Subjects Designs
Another issue of importance to RT experiments is that of whether the independent variables
should be manipulated between subjects or within subjects. Between-subjects variables are ones
where different subjects are tested on each level of the variable. For the example of two- versus
four-choice RT, that would mean that subjects do either the two-choice version or the four-choice
version, but not both. Within-subjects variables are those where each subject is tested at each
level of the variable. For the same example, this would mean that each subject does both two-
and four-choice trials (in either random or blocked order).
Which method is preferred? To simplify, we use a different example. Suppose an
experimenter wanted to determine the effect of alcohol on RT's to a simple stimulus, and had 20
subjects available. He or she could randomly assign 10 subjects to perform the task drunk and
10 to perform it sober, then compare those mean RT's. This would be a between-subjects
design. But why not test each subject both sober and drunk? That way all 20 subjects contribute
data to each condition. This would be a within-subjects design. (Of course, he or she would want to
counterbalance the order, and test some subjects sober and then drunk, and others drunk and
then sober.) It should be clear that an analysis based on 20 subjects per group is more powerful
than one based on only 10 subjects per group. (Note that the type of statistical analysis would
change slightly, since a within-subjects design violates the assumption of independent samples.
In this case, comparing two means, the t-test for independent samples would be used with the
between-subject design, and the t-test for dependent ("correlated", "matched-pairs") samples with
the within-subject design. If there were several levels of dosage used, the appropriate test would
be the standard ANOVA for the between-subjects design, and the repeated-measures ANOVA for
the within-subjects design.)
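The contrast between the two t-tests can be sketched with only the standard library; the RT values below are invented for illustration. The paired test removes stable subject-to-subject differences (each subject serves as his or her own control), which is why it is typically the more sensitive analysis:

```python
import math
import statistics as st

def t_independent(a, b):
    """Pooled-variance t for two independent samples (between-subjects)."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * st.variance(a) + (nb - 1) * st.variance(b)) / (na + nb - 2)
    return (st.mean(a) - st.mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))

def t_paired(a, b):
    """t for dependent (matched-pairs) samples: a one-sample t on the differences."""
    d = [x - y for x, y in zip(a, b)]
    return st.mean(d) / (st.stdev(d) / math.sqrt(len(d)))

# Invented RTs (ms) for five subjects tested both sober and drunk; every
# subject is slower drunk, but subjects differ widely from one another.
sober = [420, 510, 380, 600, 450]
drunk = [455, 540, 405, 650, 480]

print(round(t_independent(sober, drunk), 2))  # small |t|: subject variance masks the effect
print(round(t_paired(sober, drunk), 2))       # large |t|: subject variance is removed
```

On these data the independent-samples t is close to zero while the paired t is large, even though both tests see the same 34 ms mean difference: the paired analysis computes its error term from the within-subject differences alone.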