E-Prime User’s Guide
Appendix B: Considerations in Research
Page A-31
How many trials?
Why not just have the subject respond once to each type of display, and take that single RT as
the "score" for that condition? This would certainly be faster, since few trials would be needed.
The problem with using this procedure, however, is that it ignores the large variability in RT that is
due to factors other than the independent variables. RT varies from trial to trial, even if the
stimulus does not. That variability comes from momentary changes in attention and muscular
preparation, among other things. Note that subjects cannot pay attention evenly and uniformly for
any length of time. Even when you are listening to a fascinating lecture, you will find your
attention wandering from time to time. The same thing happens in RT experiments, when the
subject sits doing trial after trial. Occasionally, subjects will start a trial when their attention is not
focused on the task. When this happens, a very long RT usually results. Long RT's due to
inattentiveness would be expected to occur about equally often for all stimulus types, so
averaging a few such trials with many others does not create a problem.
Another way to look at the problem of number of trials per condition is to realize that the RT on
each trial provides an estimate of that subject's "true" RT for that condition. Each individual
estimate is not very reliable, for the reasons given above. Therefore, averaging a number of
estimates (RT's on many trials) provides a better (more reliable) estimate of "true" RT. Recall
that the confidence interval estimate of a population mean becomes more and more precise as
the sample size increases. Similarly, the estimate of true RT becomes better and better as
sample size increases--though in this instance, sample size refers to the number of trials per
subject, rather than the number of subjects. By employing the formula for the confidence
interval, one can determine the number of trials needed to achieve a certain level of accuracy. In
practice, 15-30 trials per condition per subject seem to provide satisfactory results. This is
enough trials that a
few aberrant trials will have little effect on the mean RT for that condition.
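As a rough sketch of this reasoning, the known-sigma confidence-interval formula can be solved for the number of trials: n = (z * sigma / E)^2, where E is the desired half-width of the interval. The 100 ms trial-to-trial standard deviation below is an assumed, illustrative value, not a figure from this guide:

```python
import math

def trials_needed(sd_ms, half_width_ms, z=1.96):
    """Trials needed so the 95% CI for mean RT has the given half-width.

    Uses n = (z * sigma / E)^2, treating the trial-to-trial SD as known.
    The SD value passed in is illustrative, not taken from real data.
    """
    return math.ceil((z * sd_ms / half_width_ms) ** 2)

# Assuming a trial-to-trial SD of 100 ms:
print(trials_needed(100, 50))  # 16 trials for a +/-50 ms interval
print(trials_needed(100, 25))  # 62 trials; halving E quadruples n
```

Note that the requirement grows with the square of the desired precision, which is why the modest gain from going beyond 15-30 trials per condition is often not worth the added testing time.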
Between- Versus Within-Subjects Designs
Another issue of importance to RT experiments is that of whether the independent variables
should be manipulated between subjects or within subjects. Between-subjects variables are ones
where different subjects are tested on each level of the variable. For the example of two- versus
four-choice RT, that would mean that subjects do either the two-choice version or the four-choice
version, but not both. Within-subjects variables are those where each subject is tested at each
level of the variable. For the same example, this would mean that each subject does both two-
and four-choice trials (in either random or blocked order).
Which method is preferred? To simplify, we use a different example. Suppose an
experimenter wanted to determine the effect of alcohol on RT's to a simple stimulus, and had 20
subjects available. He or she could randomly assign 10 subjects to perform the task drunk and
10 to perform it sober, then compare those mean RT's. This would be a between-subjects
design. But why not test each subject both sober and drunk? That way all 20 subjects contribute
data to each condition. This would be a within-subjects design. (Of course, he or she would want to
counterbalance the order, and test some subjects sober and then drunk, and others drunk and
then sober.) It should be clear that an analysis based on 20 subjects per group is more powerful
than one based on only 10 subjects per group. (Note that the type of statistical analysis would
change slightly, since a within-subjects design violates the assumption of independent samples.
In this case, comparing two means, the t-test for independent samples would be used with the
between-subject design, and the t-test for dependent ("correlated", "matched-pairs") samples with
the within-subject design. If there were several levels of dosage used, the appropriate test would
be the standard ANOVA for the between-subjects design, and the repeated-measures ANOVA for
the within-subjects design.)
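The contrast between the two t-tests can be sketched with only the standard library; the RT values below are invented for illustration. The paired test removes stable subject-to-subject differences (each subject serves as his or her own control), which is why it is typically the more sensitive analysis:

```python
import math
import statistics as st

def t_independent(a, b):
    """Pooled-variance t for two independent samples (between-subjects)."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * st.variance(a) + (nb - 1) * st.variance(b)) / (na + nb - 2)
    return (st.mean(a) - st.mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))

def t_paired(a, b):
    """t for dependent (matched-pairs) samples: a one-sample t on the differences."""
    d = [x - y for x, y in zip(a, b)]
    return st.mean(d) / (st.stdev(d) / math.sqrt(len(d)))

# Invented RTs (ms) for five subjects tested both sober and drunk; every
# subject is slower drunk, but subjects differ widely from one another.
sober = [420, 510, 380, 600, 450]
drunk = [455, 540, 405, 650, 480]

print(round(t_independent(sober, drunk), 2))  # small |t|: subject variance masks the effect
print(round(t_paired(sober, drunk), 2))       # large |t|: subject variance is removed
```

On these data the independent-samples t is close to zero while the paired t is large, even though both tests see the same 34 ms mean difference: the paired analysis computes its error term from the within-subject differences alone.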