TEST ANXIETY
Applied Research, Assessment, and
Treatment Interventions
2nd Edition

Marty Sapp

University Press of America, Inc.
Lanham • New York • Oxford

Copyright © 1999 by
University Press of America,® Inc.
4720 Boston Way
Lanham, Maryland 20706
12 Hid's Copse Rd.
Cumnor Hill, Oxford OX2 9JJ
All rights reserved
Printed in the United States of America
British Library Cataloging in Publication Information Available
Library of Congress Cataloging-in-Publication Data

Sapp, Marty.
Test Anxiety : applied research, assessment, and treatment interventions / Marty Sapp. —2nd ed.
p. cm.
Includes bibliographical references and indexes.
1. Test anxiety—Research—Statistical methods. 2. Social sciences—Statistical methods. I. Title.
LB3060.6.S27 1999 371.26'01'9—dc21 99—22530 CIP

ISBN 0-7618-1386-1 (cloth: alk. ppr.)

The paper used in this publication meets the minimum requirements of American National Standard for Information
Sciences—Permanence of Paper for Printed Library Materials,
ANSI Z39.48—1984

To my students

Preface to First Edition

This text is divided into three parts. Part I deals with applied research design and statistical methodology frequently occurring in test anxiety literature. Part II focuses on theories and methods of assessing test anxiety using standardized instruments. Part III extensively describes and provides treatment scripts for test anxiety. In addition to advanced undergraduate and graduate students in the social sciences, this text is designed to attract two audiences: quantitatively oriented professors teaching statistics and research methodology courses, and counseling psychology professors teaching counseling and social sciences research courses. Essentially, the purpose of this text is to present a conceptual understanding of test anxiety within a research context.
On a semester system it is possible to cover all eleven chapters within two semesters. It seems plausible, since this is an innovative applied research textbook on test anxiety, that chapters can be adjusted to fit a professor's specific objectives. For example, the treatment scripts could be used in a counseling fieldwork or practicum course, while the research section would be appropriate for a research methods or statistics course in which the instructor could edit or expand upon the topics presented.

Preface to Second Edition

Over the last five years, the nature of test anxiety research has been influenced greatly by structural equation modeling. One purpose of the second edition of this text is to introduce researchers to the logic of structural equations and to show how the EQS structural equation program can easily perform structural equation modeling. Another purpose of this second edition is to synthesize more than 100 studies that have been published on test anxiety since 1993. Moreover, researchers generally view test anxiety as consisting of factors, as in Sarason's four-factor model or Spielberger's two-factor model. All of these models of test anxiety can be easily analyzed by EQS; therefore, this second edition provides an entire chapter on structural equation modeling.
The features that made the first edition popular, such as applied research, assessment, and treatment interventions, are retained in the second edition; however, a chapter on measurement issues—item response theory and generalizability theory—was added to the second edition. In addition, control lines are provided for the SAS statistical software.
Moreover, nested designs, both the univariate case and multivariate case, are covered in this edition. Comments from students, faculty, and researchers at various institutions indicated that their institutions had at least one of these major statistical packages; therefore, readers will have the option of two packages.
Test Anxiety: Applied Research, Assessment, and Treatment
Interventions, Second Edition, like the previous edition, is directed toward students in the social sciences because it integrates statistical methodology and research design with actual research situations that occur within the test anxiety area. The current edition will draw from two major audiences—the quantitative professors who teach statistics and research methodology courses and others who teach counseling psychology and related courses.
In closing, the current edition is a brief, applied text on research, assessment, and treatment interventions for test anxiety. Moreover, the current edition demonstrates how to conduct test anxiety research, and it provides actual empirically based treatment interventions. Finally, this edition presents the two most-employed statistical packages, and it provides illustrations of EQS for structural equation modeling and confirmatory factor analysis.


Preface
To Students and Social Scientists
This text was designed to give you the courage and confidence to understand and conduct test anxiety research; however, the research skills employed in test anxiety are those generally employed in the social sciences. By combining research methods and design with applied research statistics, this text offers a perspective not found in any other texts on applied research that this writer is aware of; therefore, this text would be useful for applied social scientists.
Students taking an introductory or intermediate statistics or research methods course will find this text a useful supplement. Unlike other texts, this text describes example after example of applied research situations, along with an adequate sample of exercises followed by detailed solutions. In contrast to traditional texts with statistical exercises, this text provides solutions following every exercise so that students and social scientists can obtain instant feedback. Possibly, the most useful feature of this text for advanced level students and applied researchers is the set of complete control lines for running statistical analyses on the SPSSX and SAS computer software. Many statistical analyses that are usually covered in introductory, intermediate, and advanced statistics courses are discussed along with the exact codes for running them on SPSSX and SAS.
In summary, a text on test anxiety that combines applied research, assessment, and treatment interventions has not heretofore been available; neither has one offering clear procedures for assessing test anxiety. In order to demonstrate how to assess test anxiety, reproductions of several commonly used self-report measures of test anxiety are provided.
Finally, it is hoped that this text will facilitate your development as an applied researcher in test anxiety or in social science research.

Acknowledgments
Now it is time to thank individuals who helped facilitate bringing this work into press. First, I would like to thank Daniel Bieser for running many of the statistical exercises for the first edition and Khyana
Pumphrey for running many of the exercises for the second edition.
Second, June Lehman deserves thanks for proofreading this entire text.
In addition, I offer thanks to Cathy Mae Nelson for bringing this text into camera-ready condition. I am grateful to the Literary Executor of the late
Sir Ronald A. Fisher, F.R.S., to Dr. Frank Yates, F.R.S., to Longman
Group, Ltd., London, and to Oliver and Boyd, Ltd., Edinburgh for permission to reproduce statistical tables B and J from their book,
Statistical Tables for Biological, Agricultural and Medical Research.
Thanks go to Helen Hudson and James E. Lyons for their continued support and encouragement. I would also like to thank my department at the University of Wisconsin-Milwaukee and the following individuals at the University of Cincinnati: first, Dr. James Stevens, who taught me research design and statistics; second, Dr. Patricia O'Reilly, the chairperson of my doctoral committee; and third, Dr. Purcell Taylor, Dr.
Judith Frankel, and Dr. Marvin Berlowitz, all of whom served on my doctoral committee. Finally, I would like to thank Dr. David L. Johnson who served as my doctoral internship supervisor.
In closing, it is hoped that this text will help students and social scientists learn to understand and conduct test anxiety research. Social scientists and students in social sciences will find that this is an excellent applied research reference book with many exercises and examples. Note that careful studying of the research chapters is necessary to facilitate one's understanding of the social science research. Comments or discussions concerning this text—both positive and negative—are encouraged. My address is The University of Wisconsin-Milwaukee,
Department of Educational Psychology, 2400 E. Hartford Avenue,
Milwaukee, WI 53211. My telephone number is (414) 229-6347, my email address is Sapp@uwm.edu, and my Fax number is (414) 229-4939.
Marty Sapp

Contents
Preface to First Edition
Preface to Second Edition
Preface to Students and Social Scientists

Part I: Applied Research

Chapter 1 Variables Employed in Test Anxiety Research
1.1 Variables
1.2 Confounding Variables
1.3 Independent Variables
1.4 Dependent Variables
1.5 Moderator Variables
1.6 Control Variables
1.7 Intervening Variables
1.8 Suppressor Variables
1.9 Exercises
1.10 Summary

Chapter 2 Internal Validity
2.1 Threats to Internal Validity
2.2 History
2.3 Maturation
2.4 Pretest Sensitization
2.5 Selection
2.6 Statistical Regression
2.7 Experimental Mortality or Attrition
2.8 Instrumentation
2.9 Statistical Error
2.10 Expectation Effects
2.11 Double- and Single-Blind Controls for Expectation Effects
2.12 Exercises
2.13 Summary

Chapter 3 Difficulties that Occur with Test Anxiety Research
3.1 External Validity
3.2 Difficulties that Occur with Test Anxiety Research
3.3 Hawthorne Effect
3.4 Demand Characteristics
3.5 Evaluation Apprehension
3.6 Social Desirability
3.7 Placebo Effect
3.8 Controlling the Hawthorne Effect
3.9 Reactivity
3.10 Pretest and Posttest Sensitization
3.11 Generalization of Results
3.12 Summary

Chapter 4 Common Research Designs
4.1 One-Group Designs
4.2 Independent Two-Group Designs
4.3 Related Two-Group Designs
4.4 Multiple Treatment Designs:
    Factorial Designs
    Solomon Design
4.5 Quasi-Experimental Designs:
    Time-Series Designs
    Nonequivalent Control Group Designs
    Equivalent Time-Samples
4.6 Counterbalanced Designs
4.7 Nested Designs
4.8 Exercises
4.9 Summary

Chapter 5 Measures of Central Tendency and Measures of Variability
Measures of Central Tendency
Averages
Mean
Mode
Median
Characteristics of the Mean
When to Use the Mode
When to Use the Median
Skewed Distributions
When to Use the Mean
Measures
Standard Deviation
Variance
Computer Examples for Measures of Central Tendency and Measures of Variability
SPSSX Release 4.0
Applications of the Mean and Standard Deviation to the Normal Curve
Moments: Measures of Skewness and Kurtosis
Summary
Exercises

Chapter 6 Common Univariate Statistics
6.1 Hypothesis Testing
6.2 t-test for Independent Groups
6.3 t-test for Related or Correlated Groups
6.4 t-test a Special Case of Correlation or Regression
6.5 One-way Analysis of Variance
6.6 SPSSX Power Estimate and Effect Size Measures
6.7 Two-way ANOVA
6.8 Disproportional Cell Size or Unbalanced Factorial Designs
6.9 Planned Comparisons and the Tukey Post Hoc Procedure
6.10 One-Way Analysis of Covariance
6.11 Post Hoc Procedures for ANCOVA
6.12 SPSSX Control Lines for Factorial Analysis of Covariance
6.13 Nested Designs
6.14 Summary

Chapter 7 Multivariate Research Statistical Methodology Using SPSSX and SAS
7.1 One-Group Repeated Measures ANOVA
7.2 Assumptions of Repeated Measures
7.3 Violations of Sphericity Assumption
7.4 Controversy in Calculations of Greenhouse Epsilon Statistic
7.5 Tukey Post Hoc Procedure for One-Group Repeated Measures Design
7.6 Tukey Confidence Intervals for One-Group Repeated Measures ANOVA
7.7 One-Group Repeated Measures ANOVA Exercises
7.8 Multiple Regression
7.9 Steps for Cross-Validating Regression Equations
7.10 Number of Recommended Subjects Per Predictor for Regression Equations
7.11 Relationship Among R2, Y, and F
7.12 Assumptions of Multiple Regression
7.13 When Regression Assumptions Appear to Be Violated
7.14 Six Methods of Selecting Predictors and Regression Models
7.15 SPSSX Control Lines for Running the Backward Elimination Process
7.16 Multiple Regression Exercises
7.17 K Group MANOVA
7.18 Assumptions of MANOVA
7.19 SPSSX for K Group MANOVA
7.20 K Group MANOVA Exercises
7.21 Factorial Multivariate Analysis of Variance
7.22 Factorial MANOVA Exercises
7.23 Multivariate Analysis of Covariance: Three Covariates and Three Dependent Variables
7.24 Factorial MANCOVA: One Covariate and Two Dependent Variables
7.25 One-Way MANCOVA Exercises
7.26 Post Hoc Procedures for MANCOVA
7.27 Nested MANOVA
7.28 Summary
7.29 Choosing a Statistical Procedure
7.30 Statistical Application Exercises

Part II: Measurement Issues

Chapter 8 Measurement Issues
8.1 Measurement Issues
8.2 Testing the Dimensionality of the Worry Component of the Test Anxiety Inventory with Economically and Educationally At-Risk High School Students: Employing Item Response Theory Analysis and Principal Components Analysis

Chapter 9 Introduction to Structural Equation Modeling Using EQS
9.1 Overview of Structural Equation Models
9.2 Elements of the EQS Control Language
9.3 EQS Path Analysis
9.4 Selected Output from EQS Path Analysis
9.5 EQS Confirmatory Factor Analysis
9.6 Selected Output from Confirmatory Factor Analysis
9.7 Confirmatory Factor Analysis Exercise
9.8 Identification
9.9 Model Modification
9.10 Summary

Part III: Assessment

Chapter 10 Assessment
10.1 Constructs of Test Anxiety
10.2 Defining Test Anxiety
10.3 Parent-Child Interactions and the Development of Test Anxiety
10.4 Measuring Test Anxiety in Children
10.5 The School Environment, Motivation, Learned Helplessness, and Test Anxiety
10.6 Self-Efficacy and Test Anxiety
10.7 Measuring Test-Wiseness
10.8 The Components of Test Anxiety
10.9 Recommendations for Parents
10.10 Test Anxiety and Performance
10.11 Test Anxiety Measures for Adolescents and Adults
10.12 The Development of Mathematical Test Anxiety
10.13 Test Anxiety in the Academically At-Risk
10.14 Summary

Part IV: Treatment Interventions

Chapter 11 Treatment Interventions
11.1 Psychotherapy Efficacy
11.2 Research to Support Treatment Scripts for Test Anxiety
11.3 Appropriate Clientele and Qualifications for Treatment Scripts
11.4 Introduction to Study Skills Counseling Script
11.5 Study Skills Counseling Session 1
11.6 Study Skills Counseling Session 2
11.7 Study Skills Counseling Session 3
11.8 Study Skills Counseling Session 4
11.9 Supportive Counseling Script
11.10 Introduction to Relaxation Therapy Script
11.11 Relaxation Therapy Session 1
11.12 Relaxation Therapy Session 2
11.13 Relaxation Therapy Session 3
11.14 Relaxation Therapy Session 4
11.15 Systematic Desensitization
11.16 Summary of Cognitive-Behavioral Hypnosis Therapy Script
11.17 Cognitive-Behavioral Therapy

APPENDIX
Table A Percent Area Under the Normal Curve Between the Mean and Z
Table B Critical Values of t
Table C Critical Values of F
Table D Percentage Points of the Studentized Range
Table E Critical Values for Bryant-Paulson Procedure
Table F The Hartley F-MAX Test for Homogeneity of Variances
Table G Critical Values for Dunnett's Test
Table H Critical Values of Pearson r
Table I Critical Values of rs (Spearman Rank-Order Correlation Coefficient)
Table J Critical Value of Chi-Square
Table K A Table of Random Numbers

Author Index

Subject Index

Part I: Applied Research

1. Applied Research
Chapter 1
CONTENTS
Variables Employed In Test Anxiety Research:
1.1 Variables
1.2 Confounding Variables
1.3 Independent Variables
1.4 Dependent Variables
1.5 Moderator Variables
1.6 Control Variables
1.7 Intervening Variables
1.8 Suppressor Variables
1.9 Exercises
1.10 Summary

1.1 VARIABLES
A variable is any condition in a scientific investigation that may change in quantity and/or quality. For example, a self-report measure of test anxiety is a variable that can change due to many factors. Specifically, test anxiety changes in individuals over time (Anastasi, 1988). Often test anxiety increases as an evaluative situation approaches, such as an examination in a course. From a practical standpoint, weight, height, room temperature, and time of day are likewise variables, or conditions that can vary in quantity. Confounding, independent, dependent, moderator, control, intervening, and suppressor variables are the common types of variables employed in test anxiety research.
1.2 CONFOUNDING VARIABLES
Confounding variables are variables that affect the internal validity of a study. That is, to what extent are the results of a study attributable to flaws in the research design? When a variable obscures the effects of another variable, it is said to be a confounding variable. Chapter 2 provides a discussion of confoundment and threats to internal validity.
1.3 INDEPENDENT VARIABLES
Independent variables are those variables which are manipulated or selected by a researcher, therapist, or clinician for study. These are presumed causes of change on the dependent variable(s). For test anxiety, independent variables are treatments such as supportive counseling, relaxation therapy, systematic desensitization, cognitive-behavioral hypnosis, and so on. Furthermore, classrooms of students selected for studying test anxiety would also represent independent variables.
1.4 DEPENDENT VARIABLES
Dependent variables are measures of some behavior, or the presumed effects of the independent variables. A common dependent variable for test anxiety is the Test Anxiety Inventory (Spielberger, 1980).
One can probably infer that measures on a dependent variable are dependent upon the values of an independent variable (a treatment), which results in changes on the dependent variable. Let us take a simple example of one treatment: relaxation therapy. Suppose we decided to have two levels of this treatment. For simplicity, let us assume that we had a class of students suffering from test anxiety and we decided to randomly divide them into two groups. We could treat one group and use the other as a referent or control group. Thus, the independent variable would have two levels: treatment and control. Moreover, measures on the dependent variable are dependent upon the value or level of the independent variable. Thus, we would expect students in the treatment group to have lower levels of test anxiety than the control group. In summary, dependent variables are dependent upon the levels of an independent variable, which theoretically always has at least two levels.
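The random division of a class into a treatment group and a control group can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name randomly_assign, the roster of 20 hypothetical student labels, and the fixed seed are introduced here for the example and do not come from the text.

```python
import random


def randomly_assign(students, seed=None):
    """Shuffle a copy of the roster and split it into two halves.

    The first half becomes the treatment group and the second half the
    control group, so the independent variable has two levels.
    """
    rng = random.Random(seed)      # a seed makes the split reproducible
    shuffled = list(students)      # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}


# Hypothetical class of 20 test-anxious students.
roster = [f"student_{i:02d}" for i in range(1, 21)]
groups = randomly_assign(roster, seed=42)
print(len(groups["treatment"]), len(groups["control"]))
```

Because every student has the same chance of landing in either group, differences that exist before treatment are expected to balance out across the two levels of the independent variable.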
1.5 MODERATOR VARIABLES
Moderator variables are special types of independent variables that moderate the effect between the primary independent and dependent variables. Moderator variables are factors that are measured, manipulated, or chosen for study to determine if they modify the relationship between the primary independent and dependent variables. Levels of motivation and intelligence are moderating variables that can affect the results of test anxiety research. Another example of a moderator variable is what Rotter (1982) calls a generalized expectancy, which is an expectation that one generally applies in a variety of related experiences. Of course, this is an individual's subjective expectancy that occurs in a variety of related situations. Specifically, an example of a generalized expectancy that is important to consider as a moderating variable in test anxiety research is locus of control. Locus of control is how an individual generally perceives the source of his or her outcomes. These outcomes can be positive or negative. For example, suppose you receive an "A" in a psychology course. Is this result luck or ability? Internal locus of control means that an individual attributes his or her reinforcements and punishments to his or her own abilities, while external locus of control means that an individual's reinforcements and punishments are attributed to outside or external events.
1.6 CONTROL VARIABLES
Control variables are independent variables that the researcher does not want to affect the results of a research design. Moreover, control variables are confounding variables that a researcher must take into account in a research investigation. A researcher usually wants to hold a control variable constant. Commonly, levels of test anxiety are held constant. For example, we usually want subjects or clients with high levels of test anxiety. This can be accomplished by screening for or retaining subjects with high test anxiety scores.
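Screening subjects so that only those with high test anxiety scores are retained can be sketched as a simple filter. The scores, the subject labels, and the cutoff of 55 below are hypothetical values chosen for illustration; they are not norms from the Test Anxiety Inventory or any other instrument.

```python
def screen_high_anxiety(scores, cutoff):
    """Retain only subjects whose screening score is at or above the cutoff."""
    return {subject: score for subject, score in scores.items() if score >= cutoff}


# Hypothetical screening scores for five volunteers.
screening_scores = {"s01": 62, "s02": 35, "s03": 58, "s04": 41, "s05": 70}

# The cutoff is an assumption made for this sketch.
retained = screen_high_anxiety(screening_scores, cutoff=55)
print(sorted(retained))
```

Holding the level of test anxiety roughly constant in this way keeps it from varying freely across subjects, which is the sense in which it serves as a control variable.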

1.7 INTERVENING VARIABLES
Intervening variables or mediating variables are theoretical variables; unlike independent variables, they cannot be seen, measured, or manipulated. They can influence the relationship between the primary independent and dependent variables. One's familiarity with theory will suggest which factors can theoretically affect the observed test anxiety phenomenon. The effect of an intervening variable must be inferred from the effects of the independent and moderating variables on the observed phenomenon. In this case it is, of course, test anxiety. Learning styles are often intervening variables in test anxiety research. Learning is a theoretical construct that cannot be directly seen, measured or manipulated but can be indirectly measured and inferred to have existed.
1.8 SUPPRESSOR VARIABLES
Suppressor variables are independent variables that are often used in regression analysis, and they conceal, obscure, or suppress the relationship between variables. Suppressor variables are correlated with other independent variables, but they are uncorrelated with the dependent variable. When a suppressor variable is removed from a study, irrelevant variance is eliminated, and the correlation between the residual independent variables and dependent variable is increased.
1.9 EXERCISES
For the following examples, identify the independent variables, the corresponding levels, and the dependent variables.
1. Cognitive-behavioral hypnosis, relaxation therapy, and supportive counseling were used to reduce the worry and emotionality components of test anxiety.
2. A researcher found a significant correlation between educational status, undergraduate and graduate, and the reduction of test anxiety.

Answers to Exercises
1. The treatment has three levels (cognitive-behavioral hypnosis, relaxation therapy, and supportive counseling) and is the independent variable; worry and emotionality test anxiety are the dependent variables.
2. Educational status, undergraduate and graduate, is the independent variable, and test anxiety is the dependent variable.

1.10 SUMMARY
In test anxiety research there are two very important kinds of variables: the treatments for test anxiety (independent variables) and the measures of test anxiety (dependent variables). Moderating and control variables are special types of independent variables that can affect test anxiety results. Finally, intervening variables are factors which can theoretically affect the relationship between a treatment for test anxiety and a measure of test anxiety.
References
Anastasi, A. (1988). Psychological testing (6th ed.). New York: Macmillan.
Rotter, J. B. (1982). The development and application of social learning theory. New York: Praeger.
Spielberger, C. D. (1980). Test anxiety inventory. Palo Alto, CA: Consulting Psychologists Press.

2. Internal Validity
Chapter 2
CONTENTS
2.1 Threats to Internal Validity
2.2 History
2.3 Maturation
2.4 Pretest Sensitization
2.5 Selection
2.6 Statistical Regression
2.7 Experimental Mortality or Attrition
2.8 Instrumentation
2.9 Statistical Error
2.10 Expectation Effects
2.11 Double and Single Blind Controls for Expectation Effects
2.12 Exercises
2.13 Summary

2.1 THREATS TO INTERNAL VALIDITY
Internal validity answers the following question: Did the treatment actually make a difference? More formally, did the independent variable cause a change on the dependent variable? There are many possible threats to internal validity in test anxiety research. However, only nine will be discussed. These threats are history, maturation, pretest sensitization, selection, statistical regression, experimental mortality or attrition, instrumentation, statistical error, and expectation. Each term will be defined followed by a vignette which will help illustrate the concept through a practical example.
External validity focuses on whether results obtained from an experiment can apply to the actual world or to other similar programs, situations, and approaches (Tuckman, 1978). Generally, as a researcher tightens control over internal validity, he or she decreases the probability of external validity. In essence, internal validity answers the question of whether a treatment for test anxiety makes a difference, while external validity answers the question of whether the results obtained from a study can be generalized to other situations, subjects, or settings.
2.2 HISTORY
History occurs when something happens to research groups during the treatment period which can cause the groups to differ on the dependent variables. Suppose 30 test-anxious subjects were recruited for a test anxiety study, and they were randomly assigned to a cognitive-behavioral counseling group and a Hawthorne control group. After 8 weeks of treatment, we decide to evaluate the effectiveness of the cognitive-behavioral counseling in comparison to the control group. Let us assume we are doing a posttest on both groups in separate rooms. During the posttest measure for the treatment group a small fire occurs. Now, we measure both groups on the Test Anxiety Inventory once the fire is under control. It is clear from this vignette that the fire could have had unusual effects for the treatment group. Hence, it is likely that the fire will influence the treatment group's responses on the test anxiety measure. Because of the stress of the fire, the treatment group's test anxiety scores may well be higher than, or statistically equal to, those of the control group.

2.3 MATURATION
Maturation refers to developmental changes that occur in subjects during the treatment period and that affect the dependent variables. Let us assume we have identified 50 third grade children for a test anxiety study. We test them for test anxiety during the beginning and toward the end of the school year to determine if our treatment interventions have made a difference. The point to remember from this example is the fact that with young children, over the course of a school year, there will be developmental changes that can affect measures such as achievement, as well as test anxiety.

2.4 PRETEST SENSITIZATION
Pretest sensitization occurs when pretesting influences or sensitizes subjects to the dependent variable. Let us return to the maturation example, but for this instance, we will assume we are using the same instrument for both the beginning and end of the school year assessments of test anxiety. On many measures, such as self-report, using the same individuals on both pretest and posttest measures tends to affect the correlation between the two points in time. When the same self-report questionnaire is employed, this increases the correlation due to repeated measures from point one to point two. Primarily, this results from subjects remembering responses from the pretest and reporting similar responses on the posttest, which can lead to spurious results.
2.5 SELECTION
Selection is a process where the research groups differ or are unequal on some dependent variables before treatment begins. Suppose we recruit a group of 20 high school students for a test anxiety study and we place the high test-anxious students in one group and the low test-anxious students in another group. Now, we decide to treat each group with covert desensitization therapy. We notice that after treatment, the low test-anxious students improved more than the high test-anxious students. These results can be attributed to the fact that the groups differed initially on the test anxiety variable; the results are not due to the treatment.
2.6 STATISTICAL REGRESSION
Statistical regression is the statistical fact that extremely high or low scores regress towards the arithmetic mean. Let us assume we are able to select 30 college students who scored above the 90th percentile on the Test Anxiety Inventory. Students are given six sessions of relaxation therapy combined with study skills counseling. After the treatment sessions, we again measure the subjects on test anxiety and find that the mean percentile is now at the 50th. With extreme scores, there is a tendency for regression towards the mean. If the scores are high, the regression will be downward towards the mean, which occurred in this case. If the scores are low, the regression will be upward towards the mean. It should be noted that on the Test Anxiety Inventory the mean is the 50th percentile, which is where the group's scores fell in this example.
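Regression towards the mean in a group selected for extreme pretest scores can be illustrated with a small simulation in which no treatment is given at all. The sketch below rests on assumptions that are not in the text: each observed score is a stable true score plus independent random error, scores are standardized, the test-retest reliability is 0.7, and the function name simulate_regression_to_mean is hypothetical.

```python
import random
import statistics


def simulate_regression_to_mean(n=10000, reliability=0.7, top_fraction=0.10, seed=1):
    """Return the pretest and posttest means of the top scorers on the pretest.

    Observed score = true score + measurement error, with variances chosen
    so that observed scores have unit variance and the pretest-posttest
    correlation equals `reliability`. No treatment effect is simulated.
    """
    rng = random.Random(seed)
    true_sd = reliability ** 0.5
    error_sd = (1 - reliability) ** 0.5

    pretest, posttest = [], []
    for _ in range(n):
        true_score = rng.gauss(0, true_sd)
        pretest.append(true_score + rng.gauss(0, error_sd))
        posttest.append(true_score + rng.gauss(0, error_sd))

    # Select the subjects who scored in the top fraction on the pretest.
    cutoff = sorted(pretest)[int(n * (1 - top_fraction))]
    selected = [i for i in range(n) if pretest[i] >= cutoff]

    pre_mean = statistics.mean(pretest[i] for i in selected)
    post_mean = statistics.mean(posttest[i] for i in selected)
    return pre_mean, post_mean


pre_mean, post_mean = simulate_regression_to_mean()
print(f"pretest mean of selected group:  {pre_mean:.2f}")
print(f"posttest mean of selected group: {post_mean:.2f}")
```

Under these assumptions the selected group's posttest mean falls back toward the overall mean of zero even though nothing was done to the subjects, which is why a drop in scores among subjects chosen for extreme pretest scores cannot by itself be credited to treatment.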

2.7 EXPERIMENTAL MORTALITY OR ATTRITION
Experimental mortality or attrition is the differential loss of subjects from research groups. If one starts off with 20 subjects in a treatment and control group and after four weeks 12 subjects withdraw from the control group, this clearly indicates attrition or experimental mortality, the systematic withdrawal of subjects from experimental groups.
2.8 INSTRUMENTATION
Instrumentation is error in measurement procedures or instruments.
That is, the differences between or among research groups are the product of the dependent variables employed. Suppose Jack, a new Ph.D. in counseling, decides to construct a new test anxiety instrument and he administers it to 50 subjects. He randomly selects 20 for research purposes in which 10 are in a hypnosis group and 10 are in a covert modeling group. Towards the end of treatment, he finds a significantly lower treatment mean for the hypnosis group. Jack concludes that his treatment reduced test anxiety. The difficulty with this example lies in the fact that Jack did not use a standardized instrument with adequate reliability and validity.
2.9 STATISTICAL ERROR
Statistical error is an error that can occur in a statistical analysis as a result of the null hypothesis being rejected when it is true. The null hypothesis simply states that the group means are equal. Suppose Jill decides to conduct a two-group MANOVA (multivariate analysis of variance) to determine if rational emotive behavior therapy is more effective in reducing test anxiety than a placebo control group.


from the relaxation group scored significantly lower on test anxiety than the cognitive-behavioral hypnosis group.
A researcher takes one item from Test Anxiety Inventory, a standardized test anxiety measure, to determine if a sample of 50 participants experienced test anxiety. The researcher used this item to diagnose students with test anxiety.

Let us assume that Jill conducts a MANOVA and finds multivariate significance and decides to conduct ten univariate tests and finds one significant. This significance is spurious because one would expect one significant univariate test out often to be significant just simply due to change alone—especially if the difference was not predicted a priori.
2.10 EXPECTATION EFFECTS
Expectation effects can occur and are due to influences of the experimenter or the subjects. The experimenter can unconsciously influence the results of a research project; in contrast, subjects can determine the research hypothesis and give the experimenter the responses needed to support a hypothesis.
The expectation effect is extremely problematic when the experimenter knows which subjects are receiving treatments and also gathers data after the treatment. It is not uncommon for an experimenter to fit the responses into a certain theoretical framework. Good corrections for this bias are double blind studies or interrater measures of consistency or other measures of reliability included within a study.
2.11 DOUBLE AND SINGLE BLIND CONTROLS FOR
EXPECTATION EFFECTS
In a double blind study, neither the experimenter nor the subjects are aware of who is receiving the treatment or the experimental manipulations. A single blind study is a control procedure in which the experimenter measuring the behavior does not know whether the subject is a member of the treatment or control group. Here a research assistant could make this possible by keeping track of subjects' experimental status.
Another example of a single blind study is where subjects are ignorant of the purpose of study or to the specific treatments they are receiving. A single blind study can control for the experimenter expectancy bias; similarly, double blind procedures control for expectancy bias in both the experimenter and subjects.
2.12 EXERCISES
Identify the potential threats to internal validity.
1. Suppose 10 high test-anxious students are assigned to a cognitivebehavioral hypnosis group, and 10 low test-anxious students are assigned to a relaxation therapy group. After treatment, students

15

Answers to Exercises
1.
2.

Selection.
Instrumentation.

2.13 SUMMARY
In terms of test anxiety research, internal validity allows one to determine if a given treatment resulted in a decrease in test anxiety. The shortcomings of test anxiety research can be evaluated by investigating and ruling out threats to internal validity. Finally, in essence, internal validity allows a researcher to establish causal inferences within a certain confidence limit.
Reference
Tuckman, B. W. (1978). Conducting educational research. New York:
Harcourt Brace Jovanovich.

3. Difficulties That Occur With Test Anxiety Research

Chapter 3
CONTENTS
3.1 External Validity
3.2 Difficulties that Occur with Test Anxiety Research
3.3 Hawthorne Effect
3.4 Demand Characteristics
3.5 Evaluation Apprehension
3.6 Social Desirability
3.7 Placebo Effect
3.8 Controlling the Hawthorne Effect
3.9 Reactivity
3.10 Pretest and Posttest Sensitization
3.11 Generalization of Results
3.12 Summary


3.1 EXTERNAL VALIDITY
External validity examines whether results obtained from a given experimental investigation apply to situations outside of the initial experimental setting. Essentially, external validity answers the question of whether the results obtained from a study generalize to other situations, subjects, or settings. External validity falls into two broad categories: population validity, the generalization of the results to other subjects, and ecological validity, the generalization of the results to similar settings. The difficulties that occur with true experimental designs are often related to threats to external validity, as are the difficulties with test anxiety research that follow.
3.2 DIFFICULTIES WITH TEST ANXIETY RESEARCH
There are many difficulties that can occur with test anxiety research; however, only the following will be discussed: Hawthorne Effect, demand characteristics, evaluation apprehension, social desirability, placebo effect, controlling the Hawthorne Effect, reactivity, pretest/posttest sensitization, and generalization of results.
3.3 HAWTHORNE EFFECT
The Hawthorne Effect is particularly problematic for behavioral researchers. It can easily confound the effects of treatments for test anxiety. This effect makes it difficult to separate the portion of a test anxiety dependent variable that results from treatment interventions from the portion that is a consequence of the Hawthorne Effect. The Hawthorne Effect explains how a subject's knowledge of participating in an experiment can influence the outcome or results of a study. There are at least four features associated with the Hawthorne Effect: demand characteristics, evaluation apprehension, social desirability, and the placebo effect.
3.4 DEMAND CHARACTERISTICS
Demand characteristics are the subtle cues and nuances that subjects detect from an experiment which may convey the purpose of a study. The mere fact that subjects know that they are participating in an experiment can have a significant influence on their behavior. The demand characteristics account for the fact that subjects can determine the research hypothesis and thereby produce the desired results.


3.5 EVALUATION APPREHENSION
Evaluation apprehension is especially detrimental for test anxiety research. With pretest-posttest test anxiety research designs, initial pretest scores may be extremely high due to evaluation apprehension, the anxiety of participating in an experiment.

3.6 SOCIAL DESIRABILITY
Social desirability, which is related to demand characteristics, is the subject's motivation to produce socially acceptable results. This is the motivation the subject has to please the experimenter. For example, if a subject discovers that it is socially desirable to report less test anxiety after a certain treatment intervention, this may well be the outcome.
3.7 PLACEBO EFFECT
Placebo effects are extremely prevalent when blind studies are not employed in test anxiety research, which is often the case with practical research endeavors. The placebo effect is a change in a subject's behavior due to expectations of treatment effectiveness. The subject's tendency to believe in the effectiveness of a particular treatment for test anxiety can influence the results on the dependent variable(s) in an experiment.
In summary, the mere fact that subjects are participating in a study can change measured behavior. It is important to be aware that subjects' beliefs, expectations, perceptions, attitudes, and values can influence behavior in an experiment.
3.8 CONTROLLING THE HAWTHORNE EFFECT
Rosenthal and Rosnow (1984) recommend several strategies to combat the Hawthorne Effect. One strategy is to employ field experiments and quasi-experimental designs that use nonreactive measures. Nonreactive measures will not alert the subject to the fact that he or she is participating in an experiment. Another helpful technique for countering the Hawthorne Effect is not telling subjects the purpose of an experiment until it is over. In addition, it can be stated to subjects that it is important for them not to attempt to figure out the purpose of a given study. Also, a Hawthorne control group, a control group that receives attention, is a good control for this effect. Finally, self-monitoring can serve as an attentional procedure, and when the experimenter interacts with participants, this reduces the Hawthorne Effect.
3.9 REACTIVITY
Reactivity is the notion that tests, inventories, rating scales, and even observing subjects' behavior can change the events that a researcher is attempting to measure. Since test anxiety research often involves the administration of tests, reactivity becomes a concern. Whenever possible, nonreactive measures should be employed. These are items that are normally part of a subject's environment. For example, school enrollment records are usually nonreactive or passive measures. Whenever reactive measures are employed in test anxiety research, the researcher can control this confound by controlling in the research design for the testing effect.

3.10 PRETEST AND POSTTEST SENSITIZATION
Pretest posttest sensitization indicates that the pretest and/or posttest affects the results on the dependent variable. The pretest and/or the posttest can sensitize subjects to the effects of a treatment. Pretests, as well as posttests, can facilitate learning by helping subjects determine what effects they should be getting from a given treatment. It is important to remember that pretests and posttests can serve as a learning experience for subjects and thereby influence the results of a treatment.
3.11 GENERALIZATION OF RESULTS
One of the goals of test anxiety research is to generalize the results from a sample to some clearly defined population. Random selection or selecting subjects who are representative of a population facilitates the researcher's ability to generalize results.
Similarly, randomly assigning subjects to groups is another factor that contributes to generalization of results. Finally, a researcher must consider the limitations of his or her results with respect to the experimentally accessible population and the population to which he or she wishes to generalize the results. In essence, since no study can sample the entire universe, every study will inevitably have limited generalizability in some respect.
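The two randomization steps described above can be kept distinct in a short sketch. The population of 200 student ID numbers, the sample size of 40, and the function name select_and_assign are hypothetical, chosen only for illustration.

```python
import random


def select_and_assign(population, sample_size, n_groups, seed=None):
    """Randomly select a sample, then randomly assign it to groups."""
    rng = random.Random(seed)
    # Random selection: every member of the accessible population
    # has the same chance of entering the sample.
    sample = rng.sample(population, sample_size)
    # Random assignment: shuffle the sample, then deal it out in turn.
    rng.shuffle(sample)
    return [sample[i::n_groups] for i in range(n_groups)]


# Hypothetical accessible population of 200 student ID numbers.
population = list(range(1, 201))
treatment, control = select_and_assign(population, 40, 2, seed=1)
print(len(treatment), len(control))
```

Random selection governs who enters the study, while random assignment governs which condition each selected subject receives.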
3.12 SUMMARY
Many difficulties can occur with test anxiety research. Subject-experimenter artifacts and the lack of external validity can contribute to systematic error in test anxiety research. The mere fact that subjects are participating in an experiment can result in the Hawthorne Effect, reactivity, and pretest and posttest sensitization. Due to sampling limitations, limited external validity, or generalization of results, is also a necessary restriction of research that a researcher must consider.
Reference
Rosenthal, R., & Rosnow, R. (1984). Essentials of behavioral research. New York: McGraw-Hill.

4. Common Research Designs

Chapter 4
CONTENTS
4.1 One-Group Designs
4.2 Independent Two-Group Designs
4.3 Related Two-Group Designs
4.4 Multiple Treatment Designs:
    Factorial Designs
    Solomon Design
4.5 Quasi-Experimental Designs:
    Time-Series Designs
    Nonequivalent Control Group Designs
    Equivalent Time-Samples
4.6 Counterbalanced Designs
4.7 Nested Designs
4.8 Exercises
4.9 Summary


4.1 ONE-GROUP DESIGNS
One-group designs involve the observation of a single group of subjects under two or more experimental conditions. Each subject serves as his or her own control by contributing experimental and control group scores (Matheson, Bruce, & Beauchamp, 1978). One-group designs can serve as useful experimental designs when random sampling is employed and the independent variable is manipulated.
The simplest of the one-group designs is the one-shot case study. Schematically, this design is depicted as: X O. The X denotes a treatment, while the O indicates an observation or dependent variable measure. This is not an experimental design and should not be used under any circumstances because there is neither a pretest nor a comparison group.
Before-After Designs
One-group before-after designs consist of observing subjects before treatment and after treatment. The data are analyzed by comparing before and after measurements. Schematically, this design is: O1 X O2. O1 is the before-treatment observation, O2 is the after-treatment observation, and X indicates a treatment.
Function
This design can be used when one knows that an experimental condition will occur. Essentially, one is able to observe subjects before and after the occurrence of a treatment or experimental condition. The experimental condition can be designed naturally. For example, it could be possible to observe the attitudes of teachers to a new teaching method.
Suppose this new method is one of teaching math. Teachers are measured on a dependent variable before and after receiving the new method of teaching math. The teachers are serving as their own control. In summary, this means additional subjects are not needed for a control group.
Advantages
The one-group before-after design is an improvement over one-shot case studies. This design is useful for descriptive or correlational research. It is not helpful in making causal conclusions.


Limitations
This design leaves a number of variables uncontrolled. Any outside influences that occur between the two observations may account for observed differences (history effect). If the time between the two observations is more than a few days, the intervening effects of learning and maturational processes can change behavior (maturational effect).
The process of collecting the pretest may alert subjects to the experimental condition and can change behavior (pretest sensitization).
Finally, if subjects are not selected at random, any observed differences may be due to unknown factors.

Statistical Analysis
The appropriate analysis for this design is the dependent measures t-test or analysis of variance for repeated measures.
4.2 INDEPENDENT TWO-GROUP DESIGNS
Independence means that the probability of each subject being assigned to a group is not affected in any way by the nature of the other members of each group. In the independent two-group design, subjects are equivalent on all variables except the independent one(s). This equivalence is achieved by randomly assigning subjects to groups.
Advantages
1. The observations on the two groups can be made at the same time, so that time-related variables such as aging, maturation, history, and so on are controlled.
2. In a one-group, before-after design, the pretest can affect the posttest observations. In a two-group design, this sequence effect can be eliminated by not using a pretest or is controlled by using the pretest on both groups. Below, in Figure 4.3, is the schematic representation of a randomized pre-post two-group design. The X corresponds to the treatment and the dashed line "-" represents the control procedure.

Figure 4.3 Randomized Pre-Posttest Two-Group Design
Schematically, this design can be depicted as:

Assignment   Group          Pretest   Treatment   Posttest
R            Experimental   O1        X           O2
R            Control        O1        -           O2

Due to randomization, this is a true experimental design, since the experimenter randomly assigns subjects to the two groups. The pretest, or before measure, allows one to test the initial equivalence of the two groups.
Advantages
Pretests provide a check for the effectiveness of random assignment.
Limitations
The pretest can sensitize subjects to the treatment. Similarly, it can interact with the treatment and affect the posttests (dependent variables).
Statistical Analysis
Analysis of covariance.
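One step of an analysis of covariance, adjusting each group's posttest mean for its pretest standing, can be sketched as follows. The scores and the function name ancova_adjusted_means are hypothetical; the sketch covers only the adjusted means and omits the F test and the checks of the covariance assumptions.

```python
from statistics import mean


def ancova_adjusted_means(groups):
    """groups maps a label to (pretest_scores, posttest_scores)."""
    grand_pre = mean(x for pre, _ in groups.values() for x in pre)
    # Pooled within-group slope of posttest on pretest.
    sxy = sxx = 0.0
    for pre, post in groups.values():
        m_pre, m_post = mean(pre), mean(post)
        sxy += sum((x - m_pre) * (y - m_post) for x, y in zip(pre, post))
        sxx += sum((x - m_pre) ** 2 for x in pre)
    slope = sxy / sxx
    # Move each posttest mean to where it would fall if the group
    # had started at the grand pretest mean.
    adjusted = {
        label: mean(post) - slope * (mean(pre) - grand_pre)
        for label, (pre, post) in groups.items()
    }
    return slope, adjusted


# Hypothetical test anxiety scores (pretest, posttest).
data = {
    "experimental": ([50, 54, 58, 62], [40, 43, 46, 49]),
    "control": ([48, 52, 56, 60], [47, 50, 53, 56]),
}
slope, adjusted = ancova_adjusted_means(data)
print(slope, adjusted)
```

The adjusted means remove the part of the posttest difference that is predictable from initial pretest differences between the groups.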
4.3 RELATED TWO-GROUP DESIGNS
Related two-group designs involve the observation of an experimental group and a control group that have been matched on some variables. The matching helps each individual in the experimental group to be identified with his or her counterpart in the control group on some measure. The dependent variables, which in this case would be some measure of test anxiety, such as the Test Anxiety Inventory, can be thought of as occurring in pairs, with each matched subject contributing one-half of a pair under each condition. Matching minimizes between-group variability or error at the onset of the experiment. Related two-group designs employ the same statistical analysis as one-group before-after designs, the t-test for related or correlated measures. If random assignment can be employed, individual differences can be controlled. In summary, the combination of matching and random assignment of matched pairs to experimental conditions results in a more precise statistical analysis of the effects of treatment interventions for test anxiety than does random assignment alone.
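The t-test for related measures, used for both one-group before-after designs and related two-group designs, works on the difference within each pair. A minimal sketch follows, with hypothetical scores and a hypothetical function name related_t.

```python
import math
from statistics import mean, stdev


def related_t(first, second):
    """t statistic and degrees of freedom for paired scores."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    # Mean difference divided by its standard error.
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Hypothetical test anxiety scores for five matched pairs
# (or five subjects measured before and after treatment).
control_scores = [60, 55, 62, 58, 65]
treatment_scores = [52, 50, 55, 54, 56]
t, df = related_t(control_scores, treatment_scores)
print(round(t, 2), df)
```

The obtained t is then compared with the critical value for n - 1 degrees of freedom, where n is the number of pairs.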

Figure 4.4 Factorial Design

4.4 MULTIPLE TREATMENT DESIGNS
Multiple treatment designs involve more than two levels of an independent variable or more than one independent variable in a single experiment. In a sense, multiple treatment designs or multilevel designs involve several two-group designs run simultaneously. Suppose subjects suffering from test anxiety were randomly assigned to a study skills counseling group, relaxation therapy group, nondirective counseling group, and a hypnosis group. This is an example of a multiple treatment design in which four treatments are used to treat test anxiety. The statistical analysis for such designs is the one-way ANOVA (analysis of variance) or F test for independent samples.
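The F ratio for such a four-group design can be sketched as below. The scores are hypothetical, as is the function name one_way_anova; the sketch computes only the F ratio and its degrees of freedom, not the associated probability.

```python
from statistics import mean


def one_way_anova(groups):
    """F ratio and degrees of freedom for independent groups."""
    scores = [x for g in groups for x in g]
    grand = mean(scores)
    # Between-groups variability: group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Within-groups variability: scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical posttest anxiety scores for the four treatments.
study_skills = [50, 52, 54]
relaxation = [44, 46, 48]
nondirective = [55, 57, 59]
hypnosis = [40, 42, 44]
f, df_b, df_w = one_way_anova(
    [study_skills, relaxation, nondirective, hypnosis]
)
print(f, df_b, df_w)
```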
Factorial Designs
Factorial designs also represent another type of multiple treatment design. Factorial designs employ two or more independent variables simultaneously in one experiment. A factorial design examines the combination of all levels of two or more independent variables on a dependent variable. The factorial design permits an experimenter to test the interaction effect of two or more independent variables upon a dependent variable. That is, factorial designs determine whether the effect of one independent variable depends on the level of one or more other independent variables.
In factorial designs, the effect of each independent variable is called a main effect. Schematically, the effect of both levels of B under both levels of A is called an interaction effect.
Similarly, the fact that the effect of one variable depends on the levels of another also represents interaction. In diagram form, interaction for a two-way design can be represented as:

        B1      B2
A1      I       II
A2      III     IV

            Treatments
Sex       B1      B2
M         21      33
W         29      28

The As and Bs correspond to independent variables, while the Roman numerals represent the cells formed by combining their levels. A factorial design can be used to control for a moderating variable. The design above is sometimes called a 2 X 2 factorial design because there are two rows and two columns. When the effect of one independent variable on the dependent variable is not the same for all levels of the other independent variable, this is another example of interaction.
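The four values shown for Figure 4.4 allow a quick descriptive check on interaction. The sketch below assumes those values are cell means; the names simple_effect and interaction_contrast are illustrative only. A nonzero contrast describes the pattern in the cells, and an analysis of variance is still needed to test it.

```python
# Cell values from Figure 4.4, assumed here to be cell means.
cell_means = {
    ("M", "B1"): 21, ("M", "B2"): 33,
    ("W", "B1"): 29, ("W", "B2"): 28,
}


def simple_effect(cells, sex):
    """Difference between treatments B2 and B1 for one sex."""
    return cells[(sex, "B2")] - cells[(sex, "B1")]


effect_men = simple_effect(cell_means, "M")
effect_women = simple_effect(cell_means, "W")
# If the treatment difference were the same for both sexes,
# this contrast would be zero (no interaction).
interaction_contrast = effect_men - effect_women
print(effect_men, effect_women, interaction_contrast)
```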
Notice in Figure 4.4 that the best treatment depends on the moderating variable, sex. Men had higher scores with treatment B2, while women had higher scores with treatment B1. A 3 X 2 factorial design would have three rows and two columns. Factorial designs can employ more than two independent variables, such as three-way (three independent variables) and four-way (four independent variables) factorial designs. In essence, a factorial design allows a researcher to study the effects of two or more factors at once. This design allows a researcher to observe the separate effects of each independent variable and the interaction effects of several independent variables interacting simultaneously. The statistical analysis for these types of designs is the analysis of variance. In summary, multiple treatment designs can become complex by being either independent group designs or related or dependent group designs, with or without some type of factorial combinations.
Solomon Design
The Solomon design in its exact form is not common in test anxiety research, but a discussion of it is important for two reasons. First, the Solomon design is a factorial design; second, it can control for the pretest sensitization threat to external validity.


The Solomon design usually occurs as a four-group design or some factor times four. As previously stated, it is used to determine the effects of pretesting on the dependent variable. The Solomon four-group design conceptually is a combination of a two-group pretest-posttest control group design and a two-group posttest only control group design. The Solomon designs are true experimental designs, since they involve randomization or randomly assigning subjects to groups.
Specifically, the Solomon four-group designs involve two independent variables, a treatment with two levels, and a pretest with two levels. This results in a 2 X 2 design or a two-way analysis of variance (ANOVA) design. Schematically, this design can be depicted as:

Group 1    R    O1    X    O2
Group 2    R    O3         O4
Group 3    R          X    O5
Group 4    R               O6

R = random assignment
X = a treatment
O1 and O3 = pretests
O2, O4, O5, and O6 = posttests
In the schematic presentation of this design, the first two groups correspond to a pretest-posttest control group design; in contrast, the last two groups are equivalent to a posttest-only control group design. Similar to the first example of a factorial design, the Solomon four-group design can be diagramed as:

                 Treatment
                 Yes     No
Pretest   Yes    O2      O4
          No     O5      O6

The Solomon four-group design is analyzed like any 2 X 2 factorial design by means of a two-way analysis of variance (ANOVA) on the four groups' posttest scores. Note that pretest scores are not part of the statistical analysis.
Since Solomon designs employ randomization, they control for all threats to internal validity. In addition, after performing a factorial ANOVA on the posttest, if there is a significant interaction between the pretest and the treatment, this suggests pretest sensitization. In other words, treatment effectiveness varies as a function of pretesting. When one finds a significant pretest by treatment interaction, an examination of the simple main effects can be obtained by comparing the posttest scores of O2 versus O4 and O5 versus O6. If there is a significant difference between posttest scores in O2 versus O4 and not a significant statistical difference between posttest scores in O5 versus O6, this indicates that pretesting affected the treatment groups but not the control groups. In conclusion, if there is pretest sensitization, one cannot generalize findings to nonpretested subjects, thus limiting external validity. Finally, the major limitation of this design is the large number of subjects needed in order to randomly form four separate groups.
4.5 QUASI-EXPERIMENTAL DESIGNS
Cook and Campbell (1979) discuss 14 variations of quasi-experimental designs. Three designs will be discussed in this text: time series, nonequivalent control group, and equivalent time-samples.
Quasi-experimental designs have some of the features of "true" experimental designs, such as control groups (and multiple testing). However, they do not control all sources of threat to internal validity. In the educational world, where it is often impossible to have complete experimental control, quasi-experimental designs become extremely useful. In summary, when an experimenter can neither randomly select subjects nor randomly assign them to experimental situations, consideration should be given to quasi-experimental designs.

Time-Series Designs
This is an extended version of the one-group before-after design. Multiple observations before and after treatment are compared. Schematically, this design is: O1 O2 O3 X O4 O5 O6. The time-series design is used to measure changes in subjects' behavior under at least three observations at fixed time intervals. Once a baseline or trend has been established, the treatment is introduced and observations are continued. Any changes in observations after treatment are attributed to the independent variable. At least two things can happen with observations. First, the before measures or pretests may consistently change. Second, the before measures and after measures may maintain a consistent trend.

Week 2     Week 4     Week 6       Week 8      Week 10
Pretest    Pretest    Treatment    Posttest    Posttest

One method of strengthening the basic interrupted time series design is to add a second interrupted time series design with nonequivalent dependent variables. This is similar to a nonequivalent control group design (Figure 4.7) and it reduces the historical threat to internal validity. The interrupted time series design with nonequivalent dependent variables is exactly the same as the simple interrupted time series design, except that it has two sets of dependent variables instead of one. Schematically, this design is:

Figure 4.6 Interrupted Time Series Design with Nonequivalent Dependent Variables
OA1 OA2 OA3 X OA4 OA5 OA6
OB1 OB2 OB3 X OB4 OB5 OB6
OA represents the first set of dependent variables and OB represents the second set of dependent variables.
Advantages
Fewer subjects are required, since subjects are serving as their own control. Also, multiple observations reduce the chance of erroneous observations and provide a measure of maturational and learning effects. The maturational effects can be controlled statistically by comparing the trend established by the pretests before treatment with the trend created by the posttest measures.
Limitations
The major limitation of these designs is the lack of control for history effects. Even though the interrupted time series design with nonequivalent dependent variables and the interrupted time series with a nonequivalent no-treatment control group minimize all threats to internal validity, unlike experimental designs these threats are not totally eliminated. Another difficulty with time-series designs lies in determining which scores to analyze. When there is a changing trend in scores, Cook and Campbell (1979) suggested comparing the before and after trend at the point where the treatment occurred. Graphical information on discontinuity may help in determining which scores to analyze. If data consist of a constant baseline and a different posttest level of performance, averaging scores may be adequate for data analysis.

Statistical Analysis
The average before and after scores can be compared. Essentially, data can be analyzed with a slope analysis technique. A straight line is fitted to the average before and average after measurements. The slopes of the lines are tested to determine if they are reliably different from each other. The SPSSX User's Guide (3rd ed.) has a section on the Box-Jenkins procedure. It can be used to fit and forecast time series data by means of a general class of statistical models (SPSSX User's Guide [3rd ed.], 1988, pp. 385-395). In summary, school settings where certain behavior occurs periodically are excellent settings for time-series designs.
Finally, Cook and Campbell (1979) give more complete descriptions, applications, and analyses of time-series designs.
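The line-fitting step of a slope analysis can be sketched with ordinary least squares. The observation times, scores, and the function name ls_slope are hypothetical, and the sketch stops at describing the two slopes; deciding whether they are reliably different requires an inferential test that is not shown here.

```python
def ls_slope(times, scores):
    """Least squares slope of scores regressed on observation time."""
    n = len(times)
    mean_t = sum(times) / n
    mean_s = sum(scores) / n
    sxy = sum((t - mean_t) * (s - mean_s) for t, s in zip(times, scores))
    sxx = sum((t - mean_t) ** 2 for t in times)
    return sxy / sxx


# Hypothetical average anxiety scores at six fixed intervals,
# with the treatment introduced between observations 3 and 4.
before_times, before_scores = [1, 2, 3], [60, 61, 62]
after_times, after_scores = [4, 5, 6], [50, 48, 46]

slope_before = ls_slope(before_times, before_scores)
slope_after = ls_slope(after_times, after_scores)
print(slope_before, slope_after)
```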
Figure 4.7 Nonequivalent Control Group Design
Schematically, this design can be depicted as:

O1    X    O2
-------------
O3         O4

O1 is the pretest for the treatment group. The dashed line indicates that subjects were not randomly assigned to groups; thus, this is an intact group design. O2 is the posttest or dependent variable for the treatment group. O3 is the pretest for the control group, while O4 is the posttest or dependent variable for the control group.
This is similar to the pretest-posttest independent two-group experimental design except for the lack of random assignment of subjects to groups. Thus, the nonequivalent control group design is not as good as the pretest-posttest independent two-group experimental design, but it is far superior to the one-group before-after design. Unless the assumptions of analysis of covariance are met, the correct statistical analysis is a repeated measures analysis and not a gain score analysis. Statistically, gain score (pretest minus posttest) analyses are less precise than repeated measures analyses.

Figure 4.8 Equivalent Time-Samples Designs
The time series design can be modified into an equivalent time-samples design. This design does control for the history bias. This design looks like this:
O1 X0 O2 X1 O3 X0 O4 X1 O5 X0 O6 X1 O7
X1 is a treatment, while X0 is some control experience that is available in the absence of the treatment. When a single group of subjects is available for study, an equivalent time-samples design can be employed to control for the history threat to internal validity; however, this design lacks external validity. The effect of the treatment can be different when it is continuous as opposed to being dispersed, which makes the results sample specific. Moreover, subjects often adapt to the repeated presentation of a treatment. This makes it difficult to draw conclusions concerning the continuous effect of a given treatment. The foregoing discussion underscores the limited external validity of the equivalent time-samples design.
Function
Like the one-group time series design, the equivalent time-samples design is used with a single group of subjects. This design, like the time series design, is a repeated measures one-group design.
Advantages
Similar to the time series design, the equivalent time-samples design uses fewer subjects, since subjects serve as their own control. This design controls for threats to internal validity, including historical bias.

Limitations
The major weakness of the equivalent time-samples design is in the area of external validity. This is especially problematic when the effect of the independent variable is different when continuous than when dispersed, which makes it difficult to generalize to other independent samples. Similarly, often with equivalent time-samples design, subjects tend to adapt to the independent variables; thus, lessening external validity of this design. In summary, the major weakness of the equivalent time-samples design is its lack of external validity.
Statistical Analysis
The equivalent times-samples design can be analyzed by a repeated measures analysis of variance, if pretests or covariates are used, as was the case with the present example. Winer (1971, pp. 796-809) recommends combining analysis of covariance with repeated measures analysis of variance. Essentially, a two-factor analysis of covariance with repeated measures can be constructed to analyze equivalent time-samples designs.
4.6 COUNTERBALANCED DESIGNS
An experimenter may consider incorporating counterbalancing techniques with one-group designs or time series designs. In counterbalanced designs, subjects are given treatment one, then observation one is made. Next, treatment two is given, followed by observation two. This procedure is repeated with treatment two given again, followed by an observation, which is the third one. The procedure is then repeated with treatment one. That is, treatment one is given again, followed by observation four. It should be noted that two observations are obtained under each of treatments one and two. Schematically, this design is:

Figure 4.9 Counterbalanced Design

Treatments            X1    X2    X2    X1
After Observations    O1    O2    O3    O4

For two treatments, the sequence of treatment is ABBA. Three treatments would yield the following sequence: ABCCBA.
The ABBA sequence is often referred to as a 1221 sequence. The 1s and 2s refer to the treatments. Counterbalanced means balanced sequences or orders of treatments. For example, with the 1221 sequence, the sequence of treatments is 1,2 in that order followed by 2,1. The counterbalanced design can be referred to as a posttest-posttest-posttest-posttest design, or an after-after-after-after design.
Function
The counterbalanced design is used to control the order effect when employing several treatments. More thorough counterbalancing can be achieved with the ABBA sequence by having half of the subjects serve under the 1221 sequence while the other half experience 2112. The counterbalanced design controls for any peculiar order effect.
Advantages
Counterbalanced designs, using one group of subjects, require fewer subjects than two-group experiments. Like other one-group designs, subjects serve as their own control. Time related variables such as maturation, learning, outside events, frustration, and fatigue are controlled by the data collection sequence. Finally, the counterbalanced design requires fewer observations than the time-series design.
Limitations
Sufficient time must be allowed between observations. If sufficient time is not allowed, there will be carry-over effects from the previous trials. Another difficulty with counterbalanced designs is the assumption of linearity among all time related variables. That is, one is assuming the effect of change from trial one to trial two is the same as between all other adjacent trials. Finally, the last difficulty to consider with counterbalanced designs is the possibility of multiple treatment interactions.

Statistical Analyses
Matheson, Bruce, and Beauchamp (1978) recommended the analysis of variance procedure for the counterbalanced design. Essentially, the average performance under treatment one is compared with the average performance of treatment two to determine the differential effect of the two treatments.
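The comparison of treatment averages described above can be sketched in a few lines of Python. This is only an illustration, not the analysis of variance that Matheson, Bruce, and Beauchamp recommend: the function name `counterbalanced_order` and the four observation scores are hypothetical and were made up for the example.

```python
from statistics import mean


def counterbalanced_order(treatments):
    """Return the mirrored order, e.g. ['A', 'B'] -> ['A', 'B', 'B', 'A']."""
    return list(treatments) + list(reversed(treatments))


# Treatment order for the two-treatment (1221 / ABBA) design.
order = counterbalanced_order(["X1", "X2"])

# Hypothetical scores for observations O1..O4 (made-up numbers).
observations = [52, 47, 45, 50]

# Collect each observation under the treatment that preceded it.
by_treatment = {}
for treatment, score in zip(order, observations):
    by_treatment.setdefault(treatment, []).append(score)

# Average performance under each treatment.
treatment_means = {t: mean(scores) for t, scores in by_treatment.items()}

print(order)
print(treatment_means)
```

Each treatment mean averages one early and one late observation, which is how the mirrored order balances time-related effects across the two treatments.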
4.7 NESTED DESIGNS
Factorial designs involve the complete crossing of all levels of a given factor with each level of the other factor. A factor is completely nested in another factor if each level of one factor occurs at only one level of another factor (Honeck, Kibler, & Sugar, 1983).
For example, a 4 X 2 factorial design would be depicted as follows:

        B1      B2
A1      A1B1    A1B2
A2      A2B1    A2B2
A3      A3B1    A3B2
A4      A4B1    A4B2
Suppose two treatments for reducing test anxiety (B1 and B2) combined with four classrooms (A1, A2, A3, A4) produced the following design:

        B1      B2
A1      A1B1
A2      A2B1
A3              A3B2
A4              A4B2


It is apparent that all levels of one factor do not occur under all levels of the other factor; hence, this is a nested design. Factor A has four levels (A1, A2, A3, and A4), and Factor B has two levels (B1 and B2). A1 and A2 occur at B1, forming the combinations A1B1 and A2B1, and A3 and A4 occur at B2, forming the combinations A3B2 and A4B2. It can be said that Factor A is nested completely within Factor B. Nested designs are also called incomplete or asymmetrical designs because not every cell contains scores or data. In addition, Bryk and Raudenbush (1992) refer to these designs as hierarchical linear models, multilevel linear models, mixed-effects models, random-effects models, random-coefficient regression models, and covariance component models. Finally, A(B) is used to denote that Factor A is nested within Factor B.
Another example of a nested design is the evaluation of two treatments for test anxiety with six schools. Suppose schools 1, 2, and 3 are confined to Treatment 1, and schools 4, 5, and 6 are restricted to Treatment 2. Again, when effects are confined or limited to a single level of a factor, nesting has occurred. This design can be depicted schematically as follows:

            Treatment 1    Treatment 2
School 1    X
School 2    X
School 3    X
School 4                   X
School 5                   X
School 6                   X
In addition, this design can also be represented as follows:

            Treatment 1                          Treatment 2
School 1    School 2    School 3     School 4    School 5    School 6
n           n           n            n           n           n

Finally, a common nesting situation that occurs with educational research is students nested within classes, and classes nested within schools.
Advantages
Nested designs provide a smaller and more appropriate standard error than between group designs (i.e., one-way designs). This increases statistical power; however, there are some disadvantages or limitations of these designs.
Limitations
One major limitation of these designs is that they do not permit a researcher to test interaction effects. Moreover, it is not uncommon, within educational settings, for researchers to ignore class and school effects and simply analyze the data as a one-way or between-subjects design. Careful thought has to go into interpreting the results of nested designs, and a researcher must determine the appropriate error terms to use. Finally, the analysis and interpretation of unbalanced designs, those with an unequal number of observations in each treatment combination, is complex.
Statistical Analysis
Lindman (1991) provides the computational formulas for analysis of variance for nested designs. Moreover, Wang (1997) describes how to analyze nested designs using the mixed procedure of Statistical Analysis System (SAS). Wang uses a Hierarchical Linear Model (HLM) approach in contrast to analysis of variance for nested designs. Bryk and Raudenbush (1992) developed a statistical program called Hierarchical Linear Models (HLM) to analyze complex nested or hierarchical designs. Finally, whenever variables such as city, county, state, and so on are located within other variables, the variables are said to be nested and the appropriate statistical analysis is analysis of variance for nested designs or HLM.
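As a rough illustration of analysis of variance for a balanced nested design, the sketch below partitions the total sum of squares for schools nested within treatments. It is written in Python rather than SAS or HLM, the scores are hypothetical, and it assumes that schools are treated as a random factor, so the treatment effect is tested against the schools-within-treatments mean square. Lindman (1991) remains the source for the full computational formulas.

```python
from statistics import mean

# Hypothetical test anxiety scores: treatment -> school -> student scores.
scores = {
    "Treatment 1": {
        "School 1": [48, 50, 52],
        "School 2": [45, 47, 49],
        "School 3": [50, 53, 56],
    },
    "Treatment 2": {
        "School 4": [55, 57, 59],
        "School 5": [58, 60, 62],
        "School 6": [52, 54, 56],
    },
}

all_scores = [x for schools in scores.values() for s in schools.values() for x in s]
grand_mean = mean(all_scores)

ss_treat = ss_school = ss_within = 0.0
for schools in scores.values():
    treat_scores = [x for s in schools.values() for x in s]
    treat_mean = mean(treat_scores)
    # Treatment means around the grand mean.
    ss_treat += len(treat_scores) * (treat_mean - grand_mean) ** 2
    for s in schools.values():
        school_mean = mean(s)
        # School means around their own treatment mean (the nested term).
        ss_school += len(s) * (school_mean - treat_mean) ** 2
        # Students around their own school mean.
        ss_within += sum((x - school_mean) ** 2 for x in s)

n_treat = len(scores)
n_school = sum(len(schools) for schools in scores.values())
df_treat = n_treat - 1
df_school = n_school - n_treat
df_within = len(all_scores) - n_school

ms_treat = ss_treat / df_treat
ms_school = ss_school / df_school
ms_within = ss_within / df_within

# Error terms: schools within treatments for the treatment effect,
# students within schools for the school effect.
f_treat = ms_treat / ms_school
f_school = ms_school / ms_within

print(ss_treat, ss_school, ss_within)
print(f_treat, f_school)
```

There is no treatment-by-school term in the partition, which reflects the limitation noted earlier: a nested design does not allow the interaction to be tested.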

4.8 EXERCISES
1. Suppose we were interested in how males and females responded to relaxation therapy and cognitive-behavioral hypnosis in reducing test anxiety. Describe and schematically sketch this design.
2. A researcher investigated the effects of hypnosis in reducing test anxiety and improving achievement with introductory psychology students. Participants were pretested simultaneously within the hypnosis and control groups. After the pretesting, the hypnosis group received hypnosis, and the control group served as a comparison group. Describe this design and schematically sketch it out.
Answers to Exercises
1. This is a 2 X 2 factorial design. Gender has two levels, males and females, and the treatment variable has two levels, relaxation therapy and cognitive-behavioral hypnosis. Schematically, this design would be depicted as:

            Treatments
Gender      1       2
Males
Females

2. This is a quasi-experimental design called a nonequivalent control group design, and it can be depicted as the following:

O1   X   O2
O1   -   O2

O1 = pretests on test anxiety
O2 = posttests on test anxiety
X = hypnosis treatment
- = control group

4.9 SUMMARY
This chapter covered common research designs that are employed in test anxiety research. Many of these designs can be viewed as an extension of the one-group before-after design. This design involves measuring subjects before and after treatment. When randomization is impossible and a control group is available, the one-group before-after design can be improved by adding a control group that also receives a pretest and posttest simultaneously with the treatment group. Such a design is called a nonequivalent control group design and falls within the category of quasi-experimental designs.
If randomization can be added to the nonequivalent control group design, this improves internal validity and results in a randomized pretest-posttest two-group design. The construction of useful research designs involves ingenuity and extensive thought. Finally, once one understands the methodology underlying applied research designs, it is possible to construct designs that answer important questions in the area of test anxiety.
References
Bryk, A. S., & Raudenbush, S. W. (1992). Hierarchical linear models. Newbury Park, CA: Sage Publications.
Cook, T. D., & Campbell, D. T. (1979). Quasi-experimentation: Design and analysis issues for field settings. Chicago: Rand McNally.
Honeck, R. P., Kibler, C. T., & Sugar, J. (1983). Experimental design and analysis. Lanham, MD: University Press of America.
Lindman, H. R. (1991). Analysis of variance in experimental design. New York: Springer-Verlag.
Matheson, D. W., Bruce, R. L., & Beauchamp, K. L. (1978). Experimental psychology: Research design and analysis (3rd ed.). New York: Holt, Rinehart and Winston.
Wang, J. (1997). Using SAS PROC mixed to demystify the hierarchical linear model. The Journal of Experimental Education, 66(1), 84-94.
Winer, B. J. (1971). Statistical principles in experimental design. New York: McGraw-Hill.

5. Measures of Central Tendency and Measures of Variability

Chapter 5
CONTENTS
Measures of Central Tendency
5.1 Averages: Mean, Mode, Median
5.2 Characteristics of the Mean
5.3 When to Use the Mode
5.4 When to Use the Median
5.5 Skewed Distributions
5.6 When to Use the Mean
5.7 Measures of Variability: Standard Deviation, Variance
5.8 Computer Examples for Measures of Central Tendency and Measures of Variability
5.9 SPSSX Release 4.0
5.10 Applications of the Mean and Standard Deviation to the Normal Curve
5.11 Moments: Measures of Skewness and Kurtosis
5.12 Summary
5.13 Exercises


5.1 AVERAGES
There are three commonly used averages in univariate statistical methodology. These measures of central tendency are the mean, mode, and median. First, the mode and median will be discussed, since they are used less often in statistics than the mean. In addition, they lack the properties needed for advanced statistics. The mode is the most frequently occurring score in a frequency distribution; that is, it is the score with the greatest frequency. The mode is not a very stable measure of central tendency. For example, a distribution of scores can be bimodal, trimodal, or multimodal. Let us assume the following distribution of scores existed.
Figure 5.1 Frequency Distribution

X = some score    F = frequency of a score
3                 10
2                 9
1                 8

The highest frequency in the above distribution is "10," indicating that the mode is 3. By definition, the median is the middle value in a distribution of scores ordered from lowest to highest or from highest to lowest. With the median, half of the scores fall above it and half fall below it. Let us take another example, with the following distribution of scores.
X
10
9
8
7*
6
5
4
The median for the above distribution is 7, since three scores fall above "7" and three scores fall below "7." The median is sometimes called the fiftieth percentile. The mean is the most used measure of central tendency. It is the summation or addition of a group of scores, divided by the total number of scores. The following is the formula for the sample mean:

$$\bar{X} = \frac{\sum X}{N}$$

Where: $X$ is some score in a distribution.
$\sum X$ is the summation or addition of every score in a distribution.
$N$ is the number of scores.

5.2 CHARACTERISTICS OF THE MEAN
The mean has six very important characteristics. First, changing a score in a distribution will change the mean. Second, adding or subtracting a constant from each score in a distribution will have the same effect on the mean. If a constant value is added to every score in a distribution, the same constant will be added to the mean. Similarly, subtracting a constant value from every score in a distribution will result in the constant being subtracted from the mean.
Third, multiplying or dividing each score in a distribution by a constant will result in the mean being changed in the same way. Fourth, the mean is a balance point. By definition the mean, written as "X bar," is:

$$\bar{X} = \frac{\sum X}{N}$$

If one were to cross multiply, the result would be

$$\sum X = N\bar{X}.$$

Suppose we had the following simple distribution of scores:

$X$    $X - \bar{X}$
3      3 - 2
2      2 - 2
1      1 - 2

$$\sum (X - \bar{X}) = 0$$

To see why $\sum (X - \bar{X}) = 0$, we can employ the distributive property and get

$$\sum (X - \bar{X}) = \sum X - N\bar{X}.$$

By definition, the summation of a constant is N times the constant, which is why the second term is $N\bar{X}$. The summation of $X$ is also $N\bar{X}$, due to the previously mentioned cross multiplication. So

$$\sum (X - \bar{X}) = N\bar{X} - N\bar{X} = 0.$$

The summation of a constant ($\sum K$) can be shown with the distribution below by adding a constant of "2" to each score in the distribution.

$X$    $K$
3      +2
2      +2
1      +2

$\sum K = NK$, or $3 \times 2 = 6$
The above example arithmetically shows how the mean is a balance point in a distribution: when scores are expressed as deviations from the mean, the sum of the deviation scores equals zero. Fifth, the mean is that point in a distribution about which the sum of squared deviations is at a minimum. When deviation scores are calculated using the mean, the sum of the squares of these values is smaller than it would be about any other point or score. This is demonstrated using simple algebra and summation notation below.
Demonstration that the Sum of the Squared Deviation Scores Is at a Minimum
Arithmetically, $\sum (X - P)^2$ is a minimum when $P = \bar{X}$.

Let $Z$ be some point about which the deviations are to be taken. It differs from the mean by $a$, so $Z = \bar{X} + a$.
Then,

$$
\begin{aligned}
\sum (X - Z)^2 &= \sum [X - (\bar{X} + a)]^2 \\
&= \sum [(X - \bar{X}) - a]^2 \\
&= \sum [(X - \bar{X})^2 - 2a(X - \bar{X}) + a^2] \\
&= \sum (X - \bar{X})^2 - 2a \sum (X - \bar{X}) + Na^2
\end{aligned}
$$

Since $\sum (X - \bar{X}) = 0$,

$$\sum (X - Z)^2 = \sum (X - \bar{X})^2 + Na^2$$

Therefore, observation of the right side of the above equation shows that $\sum (X - Z)^2$ is smallest when $a = 0$. From the definition of $a$, if $a = 0$, then $Z = \bar{X}$.
It is apparent that the sum of squares is at a minimum value when deviations are taken from the mean.
Sixth, the sample mean has some important sampling properties. The random sample mean of some distribution is the best linear, unbiased estimate of the population mean for that distribution. This property of the sample mean allows one to make generalizations from a sample to some population of interest.
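The first five characteristics can be checked numerically on the three-score distribution (3, 2, 1) used in this section. The short Python sketch below is an illustration only; the helper name `sum_of_squares_about` and the trial values of a are arbitrary choices for the example.

```python
from statistics import mean

scores = [3, 2, 1]
m = mean(scores)

# Adding a constant to every score adds the same constant to the mean.
shifted_mean = mean([x + 2 for x in scores])

# Multiplying every score by a constant multiplies the mean by that constant.
scaled_mean = mean([x * 10 for x in scores])

# Balance point: deviations from the mean sum to zero.
deviation_sum = sum(x - m for x in scores)


def sum_of_squares_about(point):
    """Sum of squared deviations of the scores from an arbitrary point."""
    return sum((x - point) ** 2 for x in scores)


# Least squares: moving the point away from the mean by a adds N * a**2.
for a in (-1, 0.5, 1):
    print(a, sum_of_squares_about(m + a), sum_of_squares_about(m) + len(scores) * a ** 2)

print(m, shifted_mean, scaled_mean, deviation_sum)
```

The two values printed on each line of the loop agree, which is the numerical counterpart of the demonstration that the sum of squares about any point Z equals the sum of squares about the mean plus N times a squared.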
5.3 WHEN TO USE THE MODE
The mode can be used with nominal scaled data. Nominal scales of measurement are just the naming or classification of events. For example, if one observed the number of students entering or leaving the rest rooms on campus, this would correspond to a nominal scale of measurement. The number of men and women represent discrete categories used for discriminating two separate classes of gender. In addition to gender, psychiatric classification and the number of football players or even basketball players are other examples of nominal data. Nominal scales of measurement are the weakest level of measurement. In essence, this scale is the naming or assigning of numbers to classify behavioral categories.
5.4 WHEN TO USE THE MEDIAN
The median is used on ordinal level scales or observations rank ordered from least to most on some attribute. Individuals rated on beauty during beauty contests, body building contests, or the order of finish for stock racing drivers are examples of ordinal scaled data. One can possibly infer that ordinal scales are also nominal; however, these ordinal scales do not tell one the distance apart for units of measurement. The median is also a good measure for skewed distributions.
5.5 SKEWED DISTRIBUTIONS
A distribution can be positively or negatively skewed. On a positively skewed distribution, the tail of the distribution goes towards the right and the mean is greater than the median. With a negatively skewed distribution, the tail of the distribution goes towards the left and the mean is less than the median. Since the mean is affected by extreme scores, this makes the median a more appropriate measure for skewed distributions.
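The relationship between the mean and the median in skewed distributions can be illustrated with Python's standard `statistics` module, which is used here only as a convenient calculator and is not part of the book's SPSSX or SAS examples. The first two data sets are the small distributions from section 5.1; the two skewed data sets are made-up scores chosen to have one extreme value in a single tail.

```python
from statistics import mean, median, mode

# Figure 5.1: score 3 occurs 10 times, score 2 nine times, score 1 eight times.
frequency_table = {3: 10, 2: 9, 1: 8}
figure_5_1_scores = [score for score, f in frequency_table.items() for _ in range(f)]

# The seven ordered scores used to illustrate the median.
median_example = [10, 9, 8, 7, 6, 5, 4]

# Hypothetical skewed distributions (illustration only).
positively_skewed = [40, 42, 44, 45, 46, 48, 90]   # long tail to the right
negatively_skewed = [10, 52, 54, 55, 56, 58, 60]   # long tail to the left

print("mode of Figure 5.1 scores:", mode(figure_5_1_scores))
print("median of the seven scores:", median(median_example))

for name, data in (("positive skew", positively_skewed),
                   ("negative skew", negatively_skewed)):
    print(name, "mean =", round(mean(data), 2), "median =", median(data))
```

In the positively skewed set the single high score pulls the mean above the median, and in the negatively skewed set the single low score pulls the mean below it, while the median stays near the bulk of the scores in both cases.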

Figure 5.2 Negatively Skewed Distribution (skewed to the left; along the score axis the mean falls below the median, and the median falls below the mode)

Figure 5.3 Positively Skewed Distribution (skewed to the right; along the score axis the mode falls below the median, and the median falls below the mean)

5.6 WHEN TO USE THE MEAN
The mean is used with interval or ratio scales. Interval scales of measurement include nominal and ordinal information along with an arbitrary zero point. Unlike ordinal scales, interval scales allow the distance between units to be measured; that is, they permit one to measure how far apart values are. Ratio scaled data include the properties of nominal, ordinal, and interval scales and, in addition, have an absolute zero point. Ratio scales pertain mostly to physical measurements such as inches, centimeters, pounds, miles per hour, and so on. If one were doing research on learning errors or the number of correct scores on a learning task, a perfect score or zero errors would be a real, nonarbitrary zero point. This is another example of a ratio scale of measurement.
5.7 MEASURES OF VARIABILITY: STANDARD DEVIATION AND VARIANCE
Measures of variability determine how much scores vary or disperse from the mean. Essentially, they measure how far scores spread out from the mean. There are measures of variability that do not fit this property of spread-outedness; however, these measures will not be discussed since they seldom occur in test anxiety research. The standard deviation is the most widely used measure of variability. By definition, the variance is the sum of squares divided by the degrees of freedom, and the standard deviation, denoted by S, is the square root of the variance.
The following relationships hold for $S$ and $S^2$.

$$S^2 = \frac{\sum (X - \bar{X})^2}{N - 1} = \frac{\text{Sum of Squares}}{\text{Degrees of Freedom}}$$

$$S = \sqrt{S^2}$$

Similarly, the standard deviation is the square root of the second moment ($m_2$). For example, the first and second moments are:

$$m_1 = \frac{\sum (X - \bar{X})}{N} = 0$$

$$m_2 = \frac{\sum (X - \bar{X})^2}{N}$$

If we make an adjustment on the population variance by replacing N with N-1, or the degrees of freedom, this formula becomes the sample variance. The square root of this sample variance or moment is the standard deviation. Let us take a simple example of 3 scores that were used for calculating the mean.

$X$    $X - \bar{X}$    $(X - \bar{X})^2$
3      1                1
2      0                0
1      -1               1

$$S = \sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}}$$

By substitution the standard deviation is $\sqrt{2/2}$ = the square root of "1" or 1. Similarly, the variance is 1 squared or 1. Like the mean, the standard deviation has some interesting properties. I will mention two properties of the standard deviation. First, adding or subtracting a constant to each score in a distribution will not change the standard deviation. Second, multiplying or dividing each score in a distribution by a constant results in the standard deviation being multiplied or divided by the same constant.
Traditionally, many textbooks present the sum of squares, $\sum (X - \bar{X})^2$, using computational formulas. With summation operations, we will define a computational formula for the sums of squares.

Steps   Algebraic Expression                                Reason
1.      $\sum (X - \bar{X})^2$                              Definition of sum of squares
2.      $\sum (X^2 - 2X\bar{X} + \bar{X}^2)$                Expansion of a polynomial
3.      $\sum X^2 - 2\sum X\bar{X} + \sum \bar{X}^2$        Distribution of a summation sign
4.      $\sum X^2 - 2N\bar{X}^2 + \sum \bar{X}^2$           Substitution, since $\sum X = N\bar{X}$. Thus, $2\sum X\bar{X} = 2(N\bar{X})\bar{X} = 2N\bar{X}^2$
5.      $\sum X^2 - 2N\bar{X}^2 + N\bar{X}^2$               Effects of summation over a constant
6.      $\sum X^2 - \frac{N(\sum X)^2}{N^2}$                Definition of a mean
7.      $\sum X^2 - \frac{(\sum X)^2}{N}$                   Combination of terms

Finally, the sum of squares can be expressed as a definitional formula, $\sum (X - \bar{X})^2$, or as a computational formula,

$$\sum X^2 - \frac{(\sum X)^2}{N}$$

The difficulty with computational formulas, and there are a variety of them, is they do not define or explain the operations one needs to perform; therefore, we will only emphasize definitional formulas.

5.8 COMPUTER EXAMPLES FOR MEASURES OF CENTRAL TENDENCY AND MEASURES OF VARIABILITY
The SPSSX statistical package will be used to illustrate an analysis of some actual test anxiety scores in which we would like to obtain measures of variability and measures of central tendency. The following are data from a treatment and control group measured on the Test Attitude Inventory (TAI). The TAI has a mean of 50 and a standard deviation of 10, which corresponds to what statisticians call T-scores.

Group   TAI      Group   TAI
1       50       2       65
1       51       2       56
1       45       2       32
1       53       2       50
1       55       2       51
1       47       2       53
1       46       2       50
1       61       2       47
1       55       2       47
1       53       2       44

The "1s" correspond to TAI scores of group one (treatment group), while the "2s" are for group two (control group).
SPSSX Computer Example
The following are the control lines for finding measures of central tendency and variability for the actual data listed above.
Example 1
Title "Measures of Central Tendency and Measures of Variability"
Data List/GPID 1 TAI 3-4
Begin Data
1 50
1 51
1 45
1 53
1 55
1 47
1 46
1 61
1 55
1 53
End data
Frequencies Variables=TAI/
 Statistics=All
Example 2
Title "Measures of Central Tendency and Measures of Variability"
Data List/GPID 1 TAI 3-4
Begin Data
2 65
2 56
2 32
2 50
2 51
2 53
2 50
2 47
2 47
2 44
End data
Frequencies Variables=TAI/
 Statistics=All
5.9 SPSSX RELEASE 4.0
SPSSX is a relatively easy statistical software package to use. It can run in an interactive or batch mode; however, the options and statistics commands only work in the batch mode. The SPSSX commands provided in this text are designed for batch mode operation. If one is using SPSSX interactively, any control lines in this text employing the options or statistics commands must be replaced with new subcommands and key words, which are found in SPSSX User's Guide [3rd ed.] (1988, pp. 1027-1044). SPSSX's interactive mode allows one to execute each command immediately, whereas with the batch mode one submits a file of SPSSX commands for execution. The batch mode is the preferred method for using SPSSX. It allows one to perform the same analysis repeatedly, and it is less tedious and error prone when one is performing detailed analyses. Batch processing also allows a file to be saved, retrieved, and edited. After running the numerous computer examples provided in this text, the reader should purchase the SPSSX manual and be able to follow it without too much difficulty. SPSSX consists of commands and subcommands. All commands must start in column one, while all subcommands must be indented at least one space. There are four commands that are common to all SPSSX computer runs. These are the title, data list, begin data, and end data commands.
The Title command specifies the text for the first line of each page of SPSSX display. The Title can be up to 60 characters long. The second command used in this SPSSX control language is the Data List command, placed before the variable definitions, which in the previous examples were GPID (group identification) and TAI (Test Attitude Inventory). The "1" after GPID indicated that groups were identified in column one. The 3-4 after TAI indicated that data for this variable occurred in columns three to four. It should be noted that a "List" command can be inserted before the Begin Data command in order to get a listing of the data. This is a good idea in terms of checking for data entry errors.
The Begin Data command informs SPSSX that lines of data will follow, while the End Data command tells SPSSX that it has read all the data. The Frequencies command works through subcommands. As previously stated, all SPSSX subcommands must be indented at least one space.
Variables is a subcommand of the Frequencies command. With the two previous examples, it named the variable that was analyzed, which was TAI. Names of variables can only be eight characters or less. Other examples of variable names are Y1, y2, x1, x, and y. It should be clear to the reader that in order to use SPSSX one must be familiar with one's computer software or hardware system. It should also be noted that on a personal computer some minor modification of the control lines may be necessary to complete a run, such as adding a period to each control line. Many of the control lines in this text will run on SPSS/PC+ if a period is added at the end of each line; however, other control lines may need additional modification according to the SPSS/PC+ manual.
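The column specification in the Data List command (GPID in column 1, TAI in columns 3-4) can be mimicked in Python to show what a fixed-column read does. This is only a sketch of the idea; the function name `parse_record` is hypothetical, and nothing here reflects how SPSSX itself is implemented.

```python
def parse_record(line):
    """Read GPID from column 1 and TAI from columns 3-4 (1-based, inclusive)."""
    gpid = int(line[0:1])   # column 1
    tai = int(line[2:4])    # columns 3 through 4
    return gpid, tai


# A few data lines laid out exactly as between Begin Data and End Data.
records = ["1 50", "1 51", "2 65", "2 56"]
parsed = [parse_record(r) for r in records]

# Listing the parsed values plays the role of the "List" command:
# it makes data entry errors easy to spot.
for gpid, tai in parsed:
    print(gpid, tai)
```

Because the values are read by column position, a score typed one column too far to the left or right would be misread, which is why listing the data before analysis is a useful check.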

Figure 5.4 Selected Output Example 1

Selected Printout From SPSSX Runs
Page 1 Measures of Central Tendency and Variability for Treatment Group
TAI
Mean        51.600      Median      52.000
Std dev     4.881       Mode        53.000*
Variance    23.822
* Multiple modes exist. The smallest value is shown.

Figure 5.5 Selected Output Example 2

Page 1 Measures of Central Tendency and Variability for Control Group
TAI
Mean        49.500      Median      50.000
Std dev     8.475       Mode        47.000*
Variance    71.8333
* Multiple modes exist. The smallest value is shown.

It was demonstrated earlier that the mean and standard deviation can be calculated using simple arithmetic. For the treatment group, the mean is

$$\bar{X} = \frac{\sum X}{N} = \frac{516}{10} = 51.600$$

and the variance is

$$S^2 = \frac{\text{Sum of Squares}}{N - 1} = \frac{214.4}{9} = 23.822.$$

If we take the square root of the variance, the standard deviation is 4.881.
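The arithmetic behind both printouts can be reproduced with a short Python sketch. It is offered as a check on the hand calculations rather than as a replacement for the SPSSX runs; the function name `describe` is hypothetical, and reporting the smallest of several modes simply imitates the note on the printout.

```python
from math import sqrt
from statistics import mean, median, multimode

treatment = [50, 51, 45, 53, 55, 47, 46, 61, 55, 53]
control = [65, 56, 32, 50, 51, 53, 50, 47, 47, 44]


def describe(scores):
    """Return the descriptive statistics shown on the selected printouts."""
    n = len(scores)
    m = mean(scores)
    # Definitional formula: sum of squared deviations from the mean.
    ss_definitional = sum((x - m) ** 2 for x in scores)
    # Computational formula: sum of X squared minus (sum of X) squared over N.
    ss_computational = sum(x * x for x in scores) - sum(scores) ** 2 / n
    variance = ss_definitional / (n - 1)
    return {
        "mean": m,
        "median": median(scores),
        "mode": min(multimode(scores)),   # smallest mode when several exist
        "ss_definitional": ss_definitional,
        "ss_computational": ss_computational,
        "variance": variance,
        "std_dev": sqrt(variance),
    }


for name, scores in (("treatment", treatment), ("control", control)):
    stats = describe(scores)
    print(name, {key: round(value, 3) for key, value in stats.items()})
```

The definitional and computational sums of squares agree for both groups, which illustrates that the two formulas are algebraically the same quantity.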
It will be left to the reader to do the same arithmetic for the control group data.
Statistical Analysis System (SAS)
SAS consists of statements or instructions that must end in a semicolon. As we saw, SPSSX runs consist of commands such as data list, begin data, and end data. SAS code consists of three kinds of statements: (1) statements for setting up data, (2) statements indicating where the data are located (input statement), and (3) procedure statements (PROC) that tell the computer which statistical analysis to perform.
To reiterate, all SAS statements have to end in a semicolon. Omitting the semicolon is the most common error. SAS code must start with a data statement that tells the computer about your data and where variables are located. A cards statement is placed before data are entered. The PROC or procedure statement tells SAS which statistical analysis to perform.
Finally, variable names must start with a letter, cannot exceed 8 characters, and cannot contain blanks or special characters such as commas or semicolons.

SAS Control Lines for Example 1
Data Example1;
Input gpid 1 TAI 3-4;
cards;
1 50
1 51
1 45
1 53
1 55
1 47
1 46
1 61
1 55
1 53
Proc print;
Title "Measures of Central Tendency and Measure of Variability";
Proc univariate;
Var TAI;

SAS Control Lines for Example 2
Data Example2;
Input gpid 1 TAI 3-4;
cards;
2 65
2 56
2 32
2 50
2 51
2 53
2 50
2 47
2 47
2 44
Proc print;
Title "Measures of Central Tendency and Measure of Variability";
Proc univariate;
Var TAI;

SAS Generated Frequency Distribution and Bar Graph for Example 1
Data Example1;
Input gpid 1 tai 3-4;
cards;
1 50
1 51
1 45
1 53
1 55
1 47
1 46
1 61
1 55
1 53
Proc print;
Proc Freq;
Tables tai;
Proc Chart;
Vbar tai;

5.10 APPLICATIONS OF THE MEAN AND STANDARD DEVIATION TO THE NORMAL CURVE

[Figure: Percent of cases under portions of the normal curve. The horizontal axis is marked in standard deviations from the mean and is aligned with cumulative percentages (0.1%, 2.3%, 15.9%, 50.0%, 84.1%, 97.7%, 99.9%), percentile equivalents, and typical standard scores: T-scores, GRE scores, AGCT scores, stanines with the percent in each stanine, Wechsler scales, and deviation IQs. Areas marked under the curve include 34.13%, 13.59%, and 0.13%.]

Unlike other measures of central tendency and variability, the mean and standard deviation can be applied to the normal curve. The normal curve is a theoretical probability distribution. If we take the previous computer exercise for Example 1 and look at the TAI scores of subjects five and nine, which are both 55, it is clear that these subjects' scores are 1/2 standard deviation above the mean, since the TAI has a mean of 50 and a standard deviation of 10.
As one can see from the normal curve, about 34% of all cases fall between the mean and 1 standard deviation above the mean. Furthermore, 34% + 14%, or about 48%, of all cases fall between the mean and 2 standard deviations above the mean. Finally, as is apparent, approximately 50% of all cases fall between the mean and 3 standard deviations above the mean. In summary, on a normal curve the mean, median, and mode are located at the same position, the center of the curve; hence, they are called measures of central tendency.
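The percentages read from the normal curve can be reproduced from the cumulative normal distribution. The sketch below uses only Python's `math.erf`; the helper names `normal_cdf` and `z_score` are hypothetical, and the TAI mean of 50 and standard deviation of 10 are the values given in the text.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Proportion of a normal distribution falling below a z value."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def z_score(x, mean, sd):
    """Distance of a score from the mean in standard deviation units."""
    return (x - mean) / sd


# A TAI score of 55 with mean 50 and standard deviation 10.
z = z_score(55, 50, 10)

# Areas between the mean and 1, 2, and 3 standard deviations above it.
area_1sd = normal_cdf(1) - normal_cdf(0)
area_2sd = normal_cdf(2) - normal_cdf(0)
area_3sd = normal_cdf(3) - normal_cdf(0)

print("z for a TAI score of 55:", z)
print("proportion scoring below 55:", round(normal_cdf(z), 4))
print("mean to +1 SD:", round(area_1sd, 4))
print("mean to +2 SD:", round(area_2sd, 4))
print("mean to +3 SD:", round(area_3sd, 4))
```

The three areas come out near .34, .48, and .50, which matches the rounded percentages read from the curve.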
5.11 MOMENTS: MEASURES OF SKEWNESS AND KURTOSIS
A moment is the sum of the deviation scores raised to some power, divided by N. We mentioned earlier that the standard deviation is the square root of the second moment. The two printouts from the SPSSX computer runs provide two pieces of information that relate to the normal curve and moments. Similar information can be found from the SAS runs. First, the printouts provide a measure of skewness, which is measured by the third moment. By definition the third moment is:

$$m_3 = \frac{\sum (X - \bar{X})^3}{N}$$

The SPSSX printout shows a positive value for skewness, indicating a positively skewed curve; however, the value is not large, so there is not a large amount of skewness.
Kurtosis
The second statistic that is related to moments reported on the SPSSX printout is kurtosis, which is measured by the fourth moment defined by the following formula:

$$m_4 = \frac{\sum (X - \bar{X})^4}{N}$$

Kurtosis measures how much peakedness exists within a distribution. Below are illustrations of leptokurtic, mesokurtic, and platykurtic distributions. The closer the values of skewness and kurtosis are to zero, the less skewness and kurtosis.

Figure 5.7 Graphs of Kurtosis
A = Leptokurtic
B = Mesokurtic
C = Platykurtic

Specifically, if the measure for skewness is positive, the distribution is positively skewed; however, if it is negative, the distribution is negatively skewed. Similarly, a zero measure of skewness indicates a lack of skewness, as in a normal distribution. In terms of kurtosis, if the measure for kurtosis is zero the shape of the distribution is mesokurtic. When the measure of kurtosis is negative, the distribution is platykurtic, and when the measure is positive, the distribution is leptokurtic. For those who are interested, the precise formulas for skewness and kurtosis are the following:

$$\text{Skewness} = \frac{m_3}{m_2 \sqrt{m_2}}$$

$$\text{Kurtosis} = \frac{m_4}{m_2^2} - 3$$

where $m_2$ = the second moment, $m_3$ = the third moment, and $m_4$ = the fourth moment.
Skewness is the third moment divided by the second moment times the square root of the second moment. Kurtosis is the fourth moment divided by the second moment squared, minus three.
To summarize, the first moment equals 0, the second moment is the population variance, the third moment is used to measure skewness, and the fourth moment measures kurtosis.
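The moment formulas can be applied directly to the ten treatment group scores from Example 1. The Python sketch below computes the moments as defined in this section. It also applies a common small-sample adjustment to skewness and kurtosis; that adjustment is an assumption about how the printed values were produced rather than something stated in the text, although it does reproduce the skewness of .418 and kurtosis of .052 reported for these data in the exercises. All function names are hypothetical.

```python
from math import sqrt

treatment = [50, 51, 45, 53, 55, 47, 46, 61, 55, 53]


def central_moment(scores, k):
    """k-th moment about the mean: sum of (X - mean)**k divided by N."""
    n = len(scores)
    m = sum(scores) / n
    return sum((x - m) ** k for x in scores) / n


def skewness(scores):
    """Third moment divided by the second moment times its square root."""
    m2 = central_moment(scores, 2)
    return central_moment(scores, 3) / (m2 * sqrt(m2))


def kurtosis(scores):
    """Fourth moment divided by the second moment squared, minus three."""
    return central_moment(scores, 4) / central_moment(scores, 2) ** 2 - 3


def adjusted_skewness(scores):
    """Small-sample adjustment (assumed, not taken from the text)."""
    n = len(scores)
    return skewness(scores) * sqrt(n * (n - 1)) / (n - 2)


def adjusted_kurtosis(scores):
    """Small-sample adjustment (assumed, not taken from the text)."""
    n = len(scores)
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * kurtosis(scores) + 6)


print("first moment:", round(central_moment(treatment, 1), 6))
print("second moment:", round(central_moment(treatment, 2), 3))
print("skewness (moment formula):", round(skewness(treatment), 3))
print("kurtosis (moment formula):", round(kurtosis(treatment), 3))
print("skewness (adjusted):", round(adjusted_skewness(treatment), 3))
print("kurtosis (adjusted):", round(adjusted_kurtosis(treatment), 3))
```

With only ten scores the unadjusted and adjusted values differ noticeably, so a value computed by hand from the moment formulas should not be expected to match a printout exactly.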
5.12 SUMMARY
Measures of central tendency and measures of variability represent the foundations of statistical reasoning, because most applications for advanced statistical methodology involve means and variances. In this chapter, it was demonstrated that skewness and kurtosis are also measures of variability, corresponding to approximately the third and fourth moments, respectively. Finally, it was demonstrated how SPSSX and SAS can be easily employed to find measures of central tendency and variability and to estimate deviations from an ideal normal curve.

5.13 EXERCISES
1. The following is the frequency distribution for the previous exercise. Run the data using the previously given control lines. What is the value for skewness and kurtosis? (skewness=.418, kurtosis=.052).

Value    Frequency
45       1
46       1
47       1
50       1
51       1
53       2
55       2
61       1

2. With the data of exercise one, what is the minimum and maximum value? (minimum=45 and maximum=61).
3. What would be your best estimate of kurtosis for exercise 1? (mesokurtic, since the value is very close to 0).
4. The following statistics are the results of the TAI for three groups of subjects. Compute the mean and standard deviation for the three groups combined. (Overall mean=56.05 and standard deviation=11.95)

Figure 5.8 Calculations of Grand Mean and Averaged Standard Deviation

           Mean     Standard deviation    N
Group 1    66       12.02                 5
Group 2    44       9.00                  7
Group 3    59.89    4.31                  9

The reader should have noticed that the means and standard deviations cannot simply be averaged. The formula for averaging means is the summation of the weighted means (each mean multiplied by its group size) divided by the total N. The reader remembers that the formula for the mean is

$$\bar{X} = \frac{\Sigma X}{N}$$

Therefore, $\Sigma X = \bar{X}(N)$.
This is the weighted mean formula. To find the grand mean the formula is

$$\bar{X}_t = \frac{\Sigma \bar{X}(N)}{N_t}$$

where $N_t$ = the total sample size. Only when common group sizes exist can the mean be averaged without using the weighted formula.
Standard deviations cannot be averaged using the weighted formula.
The following is the formula for averaging three standard deviations.

$$\text{Averaged standard deviation} = \sqrt{\frac{N_1(\bar{X}_1^2 + S_1^2) + N_2(\bar{X}_2^2 + S_2^2) + N_3(\bar{X}_3^2 + S_3^2)}{N_1 + N_2 + N_3} - \bar{X}_t^2}$$

where $\sqrt{\ }$ = the square root of the expression
$N_1$, $N_2$, $N_3$ = the number of individuals in each of the three groups
$\bar{X}_1$, $\bar{X}_2$, $\bar{X}_3$ = the means for the three groups
$\bar{X}_t$ = the weighted mean of the three groups combined
$S_1$, $S_2$, $S_3$ = standard deviations for the three groups.
Substitute the corresponding values into the above formula and see how close your answer comes to the answer listed. Your answer may be off by a few decimal points due to rounding error.

The actual data for this example are listed below. Run this data using SPSSX.

Group    TAI
1        55
1        60
1        78
1        80
1        57
2        47
2        44
2        56
2        32
2        32
2        50
2        47
3        61
3        67
3        55
3        67
3        58
3        58
3        58
3        58
3        57
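The substitution asked for in exercise 4 can be sketched in Python. This is a minimal illustration, assuming the group means, standard deviations, and group sizes (5, 7, and 9) read from Figure 5.8; the function names are hypothetical. Because the averaging formula works from rounded summary statistics, its result for the standard deviation may differ somewhat from the value obtained by running the raw scores.

```python
from math import sqrt


def grand_mean(means, sizes):
    """Weighted mean: sum of each mean times its group size, over total N."""
    total_n = sum(sizes)
    return sum(m * n for m, n in zip(means, sizes)) / total_n


def averaged_sd(means, sds, sizes):
    """Averaged standard deviation from group means, SDs, and sizes."""
    total_n = sum(sizes)
    mean_t = grand_mean(means, sizes)
    weighted = sum(n * (m ** 2 + s ** 2) for m, s, n in zip(means, sds, sizes))
    return sqrt(weighted / total_n - mean_t ** 2)


# Summary statistics for the three TAI groups in Figure 5.8.
means = [66, 44, 59.89]
sds = [12.02, 9.00, 4.31]
sizes = [5, 7, 9]

print(round(grand_mean(means, sizes), 2))
print(round(averaged_sd(means, sds, sizes), 2))
```

Note that a simple unweighted average of the three means would give a different answer, which is why the weighted formula is needed when group sizes are unequal.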

Chapter 6
Common Univariate Statistics
CONTENTS
6.1 Hypothesis Testing
6.2 t-test for Independent Groups
6.3 t-test for Related or Correlated Groups
6.4 t-test a Special Case of Correlation or Regression
6.5 One-way Analysis of Variance
6.6 SPSSX Power Estimates and Effect Size Measures
6.7 Two-way ANOVA
6.8 Disproportional Cell Size or Unbalanced Factorial Designs
6.9 Planned Comparisons and the Tukey Post Hoc Procedure
6.10 One-Way Analysis of Covariance
6.11 Post Hoc Procedures for ANCOVA
6.12 SPSSX Control Lines for Factorial Analysis of Covariance
6.13 Nested Designs
6.14 Summary

6.1 HYPOTHESIS TESTING
Before the t-test can be discussed, it is important to define the following seven terms related to hypothesis testing: null hypothesis, alternative hypothesis, type I error, type II error, power, one-tailed tests, and two-tailed tests. The possible outcomes of hypothesis testing are presented in Figure 6.4.
The null hypothesis ($H_0$) states that the independent variable had no effect on the dependent variable; hence the population means are equal. The $H_0$ is an actual or theoretical set of population parameters or values that would occur if an experiment were performed on an entire population, where the independent variable had no effect on the dependent variable.
The alternative hypothesis ($H_1$), also called the scientific or research hypothesis, is the opposite of the null hypothesis. This hypothesis states that the treatment had an effect on the dependent variable; therefore, the population means or parameters are not equivalent. In sum, both $H_0$ and $H_1$ are statistical hypotheses.
Type I error is where the null hypothesis is rejected when it is actually true. One is saying that group differences exist when there are not any. The probability of a type I error is called the alpha level, symbolized by the Greek letter alpha ($\alpha$). This level is often set at .05 in the social sciences. It determines how much risk one is willing to take in committing a type I error. When statistical significance is reached, the alpha is the level of significance for the statistical test. Some researchers like to control type I error by testing statistics at stringent levels such as .01, .001, or .0001. The difficulty with controlling type I error by using such small $\alpha$ levels is that as type I error decreases, type II error increases. Therefore, these two errors are inversely related.
Type II error is where the null hypothesis is accepted when it is actually false; its probability is symbolized by the Greek letter beta ($\beta$). One is saying the groups do not differ when they do.
Power of a statistical test is the probability of rejecting a false null hypothesis. This is the probability of making a correct decision. Power is defined as 1 minus type II error, or $1 - \beta$. Stevens (1990, p. 84) points out that power is dependent on at least five factors: $\alpha$ level, sample size, effect size (the amount of difference the treatment makes), the statistical test used, and the research design employed (Heppner, Kivlighan, & Wampold, 1992).
A one-tailed test, or directional test, states the statistical hypothesis as either an increase or decrease in the population mean value. In a two-tailed test, or nondirectional test, the statistical hypothesis states only that the population means differ, without specifying the direction of the difference. One-tailed tests are more powerful than two-tailed tests, but they result in an increase of type I error. In addition, they must be stated before an experiment is conducted and should be based on theory. This means one cannot start an experiment with a two-tailed test, fail to find statistical significance, and later decide to use a one-tailed test because it can result in statistical significance. This is why many researchers consider one-tailed tests invalid, since the null hypothesis can be rejected when differences between population means are relatively small. Graphically, with two-tailed tests, the alpha level is divided between the two tail ends of a normal curve. For example, if $\alpha = .05$, 2.5% of the $\alpha$ is distributed on each tail of the normal curve. With a one-tailed test the total alpha value is placed on the right or left tail of a normal curve. Figure 6.1 graphically presents a two-tailed test and Figures 6.2 and 6.3 present one-tailed tests.

Figure 6.1 Two-Tailed Test $\alpha$ = .05
Figure 6.2 One-Tailed Test $\alpha$ = .05
Figure 6.3 One-Tailed Test $\alpha$ = .05

Steps in Hypothesis Testing
1. State the null and alternative hypotheses.
2. Choose a statistical test.
3. Select an alpha level or level of significance.
4. Calculate the test statistic; this is just performing the statistical analysis.
5. Compare the test statistic with the critical value of its sampling distribution. Sampling distributions provide values that a statistic can take and the probability of obtaining each value under the $H_0$. For example, if the t-test were calculated, one could find the critical value of t from a table which presents values for the sampling distribution of t.
6. Make a decision. If the absolute value of the test statistic is greater than the critical value, reject the null hypothesis at the set alpha level. If the test statistic is not greater than the critical value, one fails to reject the null hypothesis and therefore reports the failure to obtain statistical significance.
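The decision step can be sketched as a small Python function. This is only an illustration: the function name is hypothetical, the first call uses the t value of .68 and the critical value of 2.101 (18 degrees of freedom, two-tailed, .05 level) that appear in the worked example in Section 6.2, and the second call uses a made-up test statistic.

```python
def decide(test_statistic, critical_value):
    """Reject H0 when the absolute test statistic exceeds the critical value."""
    if abs(test_statistic) > critical_value:
        return "reject H0"
    return "fail to reject H0"


# t = .68 against the tabled critical value for 18 degrees of freedom.
print(decide(0.68, 2.101))

# A hypothetical test statistic that falls in the rejection region.
print(decide(-2.50, 2.101))
```

Taking the absolute value handles both tails of a two-tailed test with a single comparison.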
Figure 6.4 Possible Outcomes of Hypothesis Testing

                        State of Reality
Decision       $H_0$ True                      $H_0$ False
               (No Treatment Effect)           (Treatment Effect)
Retain $H_0$   Correct Decision                Type II Error
               $1 - \alpha$                    $\beta$
Reject $H_0$   Type I Error                    Correct Decision (Power)
               $\alpha$                        $1 - \beta$

6.2 t-TEST FOR INDEPENDENT GROUPS
Sampling Distribution of the Mean
The sampling distribution of the mean permits one to employ inferential statistics. Suppose we had a computer that contained only TAI scores from individuals suffering from test anxiety. Now, imagine randomly drawing samples or groups of TAI scores with various values from the computer. Let us assume we kept drawing samples of 100, 200, and so on of TAI scores and we decided to record each score value for every sample.
Now we can calculate the mean TAI score for each sample. Suppose we kept drawing random samples until we got an infinitely large number of samples. Again, we calculate the sample mean for each sample of TAI scores drawn. For example, we calculate the mean TAI score for sample one, two, and so on. We can treat each sample mean of TAI scores as a raw score. From these raw scores or sample means for TAI scores, we can construct a frequency distribution of sample means. This frequency distribution would tell how many times each sample mean occurred. This sampling distribution of TAI means is a theoretical distribution of sample means for TAI scores. (With samples of 30 or more randomly drawn TAI scores, the shape of this distribution of TAI means will be approximately normal.) The mean of the sampling distribution of TAI means is the population mean for all TAI scores. This mean would be the same value obtained if every TAI score from the computer were added and divided by the number of scores on the computer. Remember, from chapter 5, that in the long run the average of the sample means will equal the population mean. The standard deviation for the sample means of TAI scores is called the standard error of the mean. It provides a measure of how much the sample mean varies from the population mean. Additionally, it provides information about the amount of error likely to be made by inferring the value of the population TAI mean from the sample TAI mean. The greater the variability among sample means, the greater the chances that the inference made about the population TAI mean from a single sample TAI mean will be in error.
The standard error of the mean for TAI scores is a function of the population standard deviation for all TAI scores and the sample size. As the number of cases for TAI scores increases, the standard error for TAI scores decreases.
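The thought experiment above can be approximated with a short simulation. This is a sketch under stated assumptions: the "computer" of TAI scores is a made-up, positively skewed population between 20 and 80, samples are drawn with replacement, and the theoretical standard error is taken to be the population standard deviation divided by the square root of the sample size. All names and values are hypothetical.

```python
import random
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def pop_sd(values):
    """Standard deviation dividing by N (treating values as a population)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / len(values))


def sample_means(population, sample_size, n_samples, seed=0):
    """Draw repeated random samples and record the mean of each one."""
    rng = random.Random(seed)
    result = []
    for _ in range(n_samples):
        sample = rng.choices(population, k=sample_size)
        result.append(sum(sample) / sample_size)
    return result


# Hypothetical skewed population of 1,000 scores ranging from 20 to 80.
population = [20 + 60 * (i / 999) ** 2 for i in range(1000)]

means = sample_means(population, sample_size=30, n_samples=2000)
standard_error = pop_sd(population) / sqrt(30)

print("population mean:", round(mean(population), 2))
print("mean of sample means:", round(mean(means), 2))
print("theoretical standard error:", round(standard_error, 2))
print("SD of sample means:", round(pop_sd(means), 2))
```

With a few thousand samples, the mean of the sample means should sit close to the population mean, and the standard deviation of the sample means should sit close to the theoretical standard error, even though the population itself is skewed.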
Central Limit Theorem
The previous observations lead us to the central limit theorem. If random samples of a fixed number of cases are drawn from any population, regardless of the shape of the distribution, as the number of cases gets larger, the distribution of sample means approaches normality, with an overall mean approaching the population mean, and the variance of the sample means equals $\sigma^2/N$. The standard error equals the population standard deviation divided by the square root of the number of cases. The formula is: $\sigma/\sqrt{N}$.
t-Distribution
The t-distribution is a family of distributions that changes with the degrees of freedom. A different sampling distribution exists for every degree of freedom. As the degrees of freedom increase, the t-distribution gets closer in shape to a normal z-distribution. The z-distribution is less variable because the standard error of the mean is constant. That is, the standard error will not vary from sample to sample because it is derived from the population standard deviation. The standard error for the t-distribution is not constant, since it is estimated. It is based on a sample standard deviation which varies from sample to sample. As the number of cases increases, the variability of t decreases and the distribution gets closer and closer in shape to the z-distribution. The differences between the t-distributions and the z-distribution become negligible when a sample becomes large, such as 30 or more cases.
Assumptions of Independent t-test
The independent t-test is used to compare the difference between two independent sample means. There are three important assumptions of the t-test for independent samples.
1. Normality: the variables are normally distributed in each population.
2. Homogeneity of variance: the variances of the two populations are equal.
3. Independence: the observations are independent.
Robust Assumptions of Independent t-test
A test statistic is said to be robust when a given assumption is violated and the results are still fairly accurate. The independent samples t-test is robust to the assumption of normality. Similarly, the homogeneity of variance assumption can be ignored if the two sample sizes are equal. If the sample sizes are fairly equal, that is, the larger group size is less than 1.5 times the smaller, the test is still robust.
Specifically, if the group sizes are approximately equal,

$$\frac{\text{Larger group size}}{\text{Smaller group size}} < 1.5,$$

the t-test statistic is still robust to the violation of the homogeneity of variance assumption (Welkowitz, Ewen, & Cohen, 1982, p. 163).

Violations of the Homogeneity of Variance Assumption for t-test for Independent Samples
There are two conditions to be aware of in terms of violating the homogeneity assumption. That is, a researcher should be aware of what happens to the test statistic when the homogeneity of variance assumption is violated. First, we have to present two definitions concerning alpha levels (levels of significance).
A nominal alpha level (level of significance) is the level set by an experimenter. If the assumption concerning homogeneity of variance is met, the actual alpha level equals the nominal level. The actual alpha level is the percent of time one is rejecting the null hypothesis falsely when one or more assumptions are violated; therefore, when one says the t-test is robust, this means that the actual alpha level is close to the nominal alpha level. For example, for the normality assumption, the actual alpha level is close to the nominal because of the central limit theorem. It states that the sum of independent observations having any shape distribution (skewed, rectangular, and so on) approaches normality as the number of observations increases. Remember, when the group sizes are equal or approximately equal, the t-statistic is robust to heterogeneous variances. For instance, if the larger group size divided by the smaller group size is less than 1.5, the t-test is robust to violations of the homogeneity assumption.
When the group sizes are sharply unequal and the population variances are different, what happens to the actual alpha level? If the larger variance is associated with the smaller group size, the t-test is liberal. This means we are rejecting the null hypothesis falsely too often; that is, the actual alpha level > the nominal alpha level. An experimenter may think he or she is rejecting falsely 5% of the time (nominal alpha), but in reality may be falsely rejecting the null hypothesis at an actual alpha of 11%. When the larger variance is associated with the larger group size, the t-statistic is conservative. This means the actual alpha level is less than the nominal. You may not think this is a serious problem, but the smaller alpha level will decrease statistical power (the probability of rejecting a false null hypothesis). This conservative test statistic results in not rejecting the null hypothesis as often as it should be rejected, thus leading to what is called a type II error.
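The liberal and conservative cases described above can be explored with a small Monte Carlo sketch. This is an illustration under assumptions that are not from the text: normal populations with equal means, group sizes of 10 and 30, standard deviations of 15 and 5, a pooled-variance t-test, and the tabled two-tailed .05 critical value of 2.024 for 38 degrees of freedom. The function names are hypothetical.

```python
import random
from math import sqrt


def pooled_t(sample1, sample2):
    """Pooled-variance t statistic for two independent samples."""
    n1, n2 = len(sample1), len(sample2)
    mean1, mean2 = sum(sample1) / n1, sum(sample2) / n2
    ss1 = sum((x - mean1) ** 2 for x in sample1)
    ss2 = sum((x - mean2) ** 2 for x in sample2)
    pooled_var = (ss1 + ss2) / (n1 + n2 - 2)
    return (mean1 - mean2) / sqrt(pooled_var * (1 / n1 + 1 / n2))


def actual_alpha(n1, sd1, n2, sd2, critical_t, reps=4000, seed=1):
    """Proportion of false rejections when H0 is true (equal means)."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        group1 = [rng.gauss(50, sd1) for _ in range(n1)]
        group2 = [rng.gauss(50, sd2) for _ in range(n2)]
        if abs(pooled_t(group1, group2)) > critical_t:
            rejections += 1
    return rejections / reps


CRITICAL_T_DF38 = 2.024  # two-tailed .05 critical value, 38 degrees of freedom

# Larger variance paired with the smaller group: expected to be liberal.
liberal = actual_alpha(10, 15, 30, 5, CRITICAL_T_DF38)
# Larger variance paired with the larger group: expected to be conservative.
conservative = actual_alpha(10, 5, 30, 15, CRITICAL_T_DF38)

print("nominal alpha: .05")
print("actual alpha, liberal case:", liberal)
print("actual alpha, conservative case:", conservative)
```

The exact proportions depend on the chosen sizes, variances, and random seed, but the direction of the departure from the nominal .05 level should match the pattern described in the text.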

If the normality assumption is tenable, the homogeneity assumption can be tested by Hartley's F-Max test, which is:

$$F\text{-Max} = \frac{S^2(\text{largest})}{S^2(\text{smallest})}$$

df = (k, n), where k is the number of variances and n is the common group size if there are equal n's or the harmonic mean (see Section 6.9 for the harmonic mean formula) if the n's are unequal. Critical values of F-Max are found in Table F. Suppose F-Max = 44.8/16 = 2.8 for three variances with an n of 20. The critical value of F-Max with (3,20) degrees of freedom is 2.95 at $\alpha = .05$; hence, the test statistic is not greater than the critical value, and we fail to reject the null hypothesis. This indicates that the homogeneity assumption is tenable, or that the population variances are equal. The reason it is important to ensure that the normality assumption is not violated before performing the F-Max test is that all tests of homogeneity, other than Levene's test, are extremely sensitive to violations of normality. That is, the null hypothesis can be rejected due to violations of normality. In summary, Levene's test, found on SPSSX version 4, is not as sensitive to nonnormality as other tests of homogeneity of variance. If the homogeneity of variance assumption is violated, it is possible to perform nonparametric statistics such as the Mann-Whitney U (Siegel, 1956, pp. 116-127) or the Kruskal-Wallis one-way ANOVA (Siegel, 1956, pp. 184-193), which do not make any assumptions about population variances; however, these tests are not as powerful as their parametric counterparts, the t-test for independent samples and the one-way ANOVA, respectively. Stevens (1990) recommends the Welch t statistic for heterogeneous variances and unequal group sizes. Monte Carlo studies have shown that the Welch t statistic provides better control of Type I error, and it provides greater power than the nonparametric alternatives. In chapter 7, under the regression section, we discuss how to handle assumption violations by using data transformation strategies.
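The F-Max ratio itself is simple to compute. The sketch below is a minimal illustration with a hypothetical function name; the two variances (23.82 and 71.83) are the group variances used in the independent samples t-test example later in Section 6.2. The resulting ratio would still have to be compared with the tabled critical value of F-Max.

```python
def f_max(variances):
    """Hartley's F-Max: largest sample variance over smallest sample variance."""
    return max(variances) / min(variances)


# Group variances from the independent samples t-test example in Section 6.2.
variances = [23.82, 71.83]
print(round(f_max(variances), 2))
```

A ratio near 1 suggests similar variances, while larger ratios point toward heterogeneity.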
The Assumption of Independence for the Independent t-test
The most important assumption of the t-test is the independence assumption. A small violation of this assumption has a considerable impact on both the level of significance and the power of the statistical test. A small amount of dependence among observations causes the actual alpha level to be several times greater than the nominal alpha level.
The intraclass correlation measures dependence among observations. The formula is

$$R = \frac{MSB - MSW}{MSB + (n - 1)MSW}$$

MSB is the numerator of the F statistic, while MSW is the denominator of the F statistic; n is the number of subjects per group.
Let us take an example with a moderate amount of dependence. Using certain statistical tables, with moderate dependence, n=30, for a two group situation, and an intraclass correlation = .30, it can be determined that the actual alpha level is .59. Now we will take an example with a small amount of dependence. With a small amount of dependence, n=30 and the intraclass correlation = .10, for a two group situation, the actual alpha level is .34, whereas the experimenter would assume it is .05.
The moral of the story is that even for a very small amount of dependence, the actual alpha level is several times greater than the nominal alpha level.
The dependence previously discussed is sometimes called statistical dependence. There is another form of dependence called linear dependence, where one or more vectors of a matrix (columns or rows) is a linear combination of other vectors of the matrix. Suppose we have a vector a' = [1, 2] and a vector b' = [2, 4]. The vectors are dependent, since 2a' = b'. The correlation coefficient between the two vectors is 1.00; hence one vector is a function of the other.
A matrix is also dependent when the determinant is zero. For example, the matrix formed by transposing and joining the two previously given vectors has a determinant of zero, since:

$$\begin{vmatrix} 1 & 2 \\ 2 & 4 \end{vmatrix} = 1(4) - 2(2) = 0$$

which is the determinant of the matrix. This suggests that vectors $Y_1, Y_2, \ldots, Y_n$ of the same order are linearly dependent if there exist scalars $S_1, S_2, \ldots, S_n$, not all zero, such that $S_1Y_1 + S_2Y_2 + \cdots + S_nY_n = 0$. Vectors are linearly independent when the preceding equation is only satisfied when all the scalars are zero (Kirk, 1982, p. 784). In summary, for linear independence, one vector cannot be a linear combination of the other. In addition, as Rummel (1970, p. 66) notes, statistical independence means the intraclass correlation is not significantly different from zero, while in terms of vectors statistical independence implies that the vectors are orthogonal or uncorrelated.
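Both forms of dependence can be checked numerically. The sketch below assumes the usual form of the intraclass correlation, with MSB minus MSW in the numerator, and uses made-up mean squares; the determinant example uses the vectors [1, 2] and [2, 4] from the text. The function names are hypothetical.

```python
def intraclass_correlation(msb, msw, n):
    """Intraclass correlation from mean squares and subjects per group."""
    return (msb - msw) / (msb + (n - 1) * msw)


def det2(matrix):
    """Determinant of a 2 x 2 matrix given as two rows."""
    (a, b), (c, d) = matrix
    return a * d - b * c


# Hypothetical mean squares for groups of 30 subjects.
print(round(intraclass_correlation(msb=40.0, msw=10.0, n=30), 3))

# Matrix formed from the vectors [1, 2] and [2, 4] discussed in the text.
print(det2([[1, 2], [2, 4]]))
```

When MSB equals MSW the intraclass correlation is zero, and a determinant of zero signals that the rows of the matrix are linearly dependent.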
Formula for the Independent t-test
The t-test determines if two groups came from the same population. The null hypothesis states that the population means are equal. Symbolically, $\mu_1 = \mu_2$, or $\mu_1 - \mu_2 = 0$. The formula for the t-test is the difference between the group means divided by the standard error. Symbolically, the formula is

$$t = \frac{\bar{X}_1 - \bar{X}_2}{S_{\bar{X}}}$$

$\bar{X}_1$ = the mean for group one, while $\bar{X}_2$ is the mean for group two.
$S_{\bar{X}}$ = the standard error of the mean. The standard error can be rewritten as the square root of the variance of group one divided by its group size plus the variance of the second group divided by its group size.

Symbolically, the standard error = $\sqrt{\frac{S_1^2}{N_1} + \frac{S_2^2}{N_2}}$

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{S_1^2}{N_1} + \frac{S_2^2}{N_2}}}$$

with N-2 degrees of freedom.
Symbolically, the null hypothesis is
$H_0$: $\mu_1 = \mu_2$
$H_1$: $\mu_1 \neq \mu_2$
Let us take a computer example from the measure of central tendency data.

Table 1
SPSSX Control Lines for Independent Samples t-test
The control lines for the t-test are the following:
Title 't-test for independent groups'
Data list free/gpid TAI [1*]
Begin data
1 50 2 65 [2*]
1 51 2 56
1 45 2 32
1 53 2 50
1 55 2 51
1 47 2 53
1 46 2 50
1 61 2 47
1 55 2 47
1 53 2 44
End data
t-test groups = gpid (1,2) [3*]
Variables = TAI/
[1*] Free on the Data List line indicates that data is in the free format.
[2*] The "1s" and "2s" correspond to the group identification numbers.
[3*] This is the command for the t-test with two levels in parentheses.

Table 2
SAS Control Lines for Independent Samples t-Test
Independent Samples t-test
Data TAI;
Input gpid 1 TAI 3-4;
Cards;
1 50
1 51
1 45
1 53
1 55
1 47
1 46
1 61
1 55
1 53
2 65
2 56
2 32
2 50
2 51
2 53
2 50
2 47
2 47
2 44
Proc TTEST;
Class gpid;
Proc Print;

Figure 6.5 Selected Printout from t-test for Independent Samples
t-test for independent groups
Group 1: GPID Eq 1.00
Group 2: GPID Eq 2.00
t-test for: TAI

           Number of Cases    Mean     Standard Deviation    Standard Error
Group 1    10                 51.60    4.88                  1.54
Group 2    10                 49.50    8.48                  2.68

F Value    2-Tail Prob.
3.02       .116

Pooled Variance Estimate
t Value    Degrees of Freedom    2-Tail Prob.
*.68       **18                  ***.506

* = the value of the t-test statistic.
** = the degrees of freedom, which is N-2 or 18.
*** = the two-tailed probability, which is greater than .05, so the null hypothesis is not rejected. This indicates that the group means did not differ more than we would expect by chance alone.
The results of the above selected printout can be verified by substituting the appropriate values in the t formula:

$$t = \frac{51.60 - 49.50}{\sqrt{\frac{23.82}{10} + \frac{71.83}{10}}} = \frac{51.60 - 49.50}{3.09} = .68$$

.68 must be compared with its critical value from Table B (critical values of t). With 18 degrees of freedom, the critical value at the .05 alpha level is 2.101. Since the test statistic is not greater than the critical value, we fail to reject the null hypothesis at the .05 alpha level. In APA journal form the results are t(18)=.68, p>.05.
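The hand verification above can also be sketched in Python using the twenty TAI scores from the control lines. This is a minimal illustration with a hypothetical function name; it uses the standard error written as the square root of the sum of each group variance divided by its group size, which coincides with the pooled variance estimate here because the two groups are the same size.

```python
from math import sqrt


def independent_t(group1, group2):
    """Return (t, degrees of freedom) for two independent samples."""
    n1, n2 = len(group1), len(group2)
    mean1, mean2 = sum(group1) / n1, sum(group2) / n2
    var1 = sum((x - mean1) ** 2 for x in group1) / (n1 - 1)
    var2 = sum((x - mean2) ** 2 for x in group2) / (n2 - 1)
    standard_error = sqrt(var1 / n1 + var2 / n2)
    return (mean1 - mean2) / standard_error, n1 + n2 - 2


# TAI scores for groups 1 and 2 from the control lines.
group1 = [50, 51, 45, 53, 55, 47, 46, 61, 55, 53]
group2 = [65, 56, 32, 50, 51, 53, 50, 47, 47, 44]

t_value, df = independent_t(group1, group2)
print(round(t_value, 2), df)
```

The printed t value and degrees of freedom can be compared directly with the entries marked * and ** in Figure 6.5.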

6.3 THE T-TEST FOR RELATED SAMPLES OR CORRELATED GROUPS
The t-test for related samples is sometimes called the t-test for repeated measures or the t-test for correlated or paired measures. The formula for the repeated measures t-test is the mean of the difference scores divided by the standard error of the difference. The formula is:

$$t_{\text{repeated measures}} = \frac{\bar{D}}{\sqrt{S^2/N}}$$

$\bar{D}$ = the mean of the difference scores.
$\sqrt{S^2/N}$ = standard error of the difference scores.
N = the number of pairs; the degrees of freedom are N-1.
The repeated measures t-test is used with pretest/posttest designs. Let us take the data provided for the t-test for independent measures and show how the data can be analyzed assuming repeated measures. Suppose one group of 10 matched pairs of subjects on the TAI are randomly assigned by pairs in a pretest and posttest design. Subjects are given guided imagery training to reduce test anxiety. We measure subjects before treatment (O1) and after treatment (O2).
We would like to know if there are any statistically significant changes from pre-post measures on the TAI. Schematically, the design is:

O1    X (Treatment)    O2

X is the guided imagery training.

O1    O2    D = Difference Scores or Gain Scores
50    65    65-50 = 15
51    56    56-51 = 5
45    32    32-45 = -13
53    50    50-53 = -3
55    51    51-55 = -4
47    53    53-47 = 6
46    50    50-46 = 4
61    47    47-61 = -14
55    47    47-55 = -8
53    44    44-53 = -9

$\bar{D}$ = mean of difference scores = -2.1
S = standard deviation of difference scores = 9.386

$$t = \frac{-2.1}{\sqrt{9.386^2/10}} = \frac{-2.1}{2.968} = -.71$$

with N-1 or 9 degrees of freedom.
Using Table B for critical values of t, with 9 degrees of freedom the critical value is 2.262 at the .05 level for a two-tailed test. Since the absolute value of the test statistic is not greater than the critical value, we fail to reject the null hypothesis at the .05 level. Therefore, the results in APA journal form are t(9)=-.71, p>.05.
It should be noted that the pretest-posttest design, when used without careful matching and random assignment, leaves a number of threats to internal validity uncontrolled. As discussed in Chapter 4, the one-group before-after design is extremely susceptible to experimental contamination. This design suffers from maturational, testing, and history threats to internal validity. In addition, the pretesting can sensitize subjects to the treatment. Finally, reactive effects are also left uncontrolled.

Table 3
Control Lines for Running the Pearson-Product Moment Correlation
Title "correlation"
Data list free/TAI1 TAI2
Begin data
50 65
51 56
45 32
53 50
55 51
47 53
46 50
61 47
55 47
53 44
End data
Pearson Corr variables = TAI1 TAI2/ [*]
Option 3 [**]
[*] = the command for the Pearson correlation, using variables TAI1 and TAI2.
Note: The new SPSSX command for Pearson Corr is Correlations.
[**] = Option 3 provides a two-tail test of significance.
Note: The new subcommand for a two-tail test for correlations is print=twotail/.

Table 4
SPSSX Control Lines for Repeated Measures t-test
Title 't-test for repeated measures'
Data list/TAI1 3-4 TAI2 9-10
List
Begin data
  50    65
  51    56
  45    32
  53    50
  55    51
  47    53
  46    50
  61    47
  55    47
  53    44
End data
t-test Pairs = TAI1 TAI2
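The repeated measures calculation can be reproduced with a short Python sketch using the ten pretest and posttest TAI scores. The function name is hypothetical, and difference scores are taken as posttest minus pretest, an assumption that matches the gain scores worked out by hand in this section; reversing the subtraction only changes the sign of the mean difference and of t.

```python
from math import sqrt


def paired_t(pre, post):
    """Return mean difference, SD of differences, standard error, and t."""
    diffs = [after - before for before, after in zip(pre, post)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    sd = sqrt(sum((d - mean_d) ** 2 for d in diffs) / (n - 1))
    standard_error = sd / sqrt(n)
    return mean_d, sd, standard_error, mean_d / standard_error


# Pretest (O1) and posttest (O2) TAI scores for the ten pairs.
pre = [50, 51, 45, 53, 55, 47, 46, 61, 55, 53]
post = [65, 56, 32, 50, 51, 53, 50, 47, 47, 44]

mean_d, sd, standard_error, t_value = paired_t(pre, post)
print(round(mean_d, 1), round(sd, 3), round(standard_error, 3), round(t_value, 2))
```

The degrees of freedom for this test are the number of pairs minus one, so the result is evaluated against the critical value of t for 9 degrees of freedom.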

Table 5
SAS Control Lines for Repeated Measures t-test
Data Depen;
Input Pre 3-4 Post 9-10;
Diff=Pre-Post;
Cards;
  50    65
  51    56
  45    32
  53    50
  55    51
  47    53
  46    50
  61    47
  55    47
  53    44
Proc Print;
Proc Means N Mean Stderr T PRT;
Var diff;
Exercises
1. Using SPSSX and SAS, run the t-test for the data given in the section for the t-test for independent samples.
2. Subtract 10 from each score for the data used in exercise 1 and perform the t-test for independent samples again. Compare your answer from each run.
3. Using SPSSX and SAS, run the data from exercise 1, but this time perform the t-test for repeated measures. What is the mean for the difference scores? Answer (2.10). What is the standard error? Answer (2.968). From the printout, what are the degrees of freedom? Answer (9).
4. It was determined in an earlier example that the t-test for independent samples was .68 and the one for the dependent samples was .71. Why is the dependent measures t-test larger? Also, why does the dependent measures t-test have a smaller standard error? Answer (the t-test for repeated measures is more powerful than the t-test for independent samples. The repeated measures t-test reduces error variability due to individual differences, thus leading to a smaller standard error and hence a more powerful statistical test).
6.4 T-TEST AS A SPECIAL CASE OF CORRELATION OR REGRESSION
The t-test for independent samples is a special case of correlation. It can be shown that the relationship between the Pearson Product Moment correlation and the t-test for independent samples is:

$$t = r\sqrt{\frac{N - 2}{1 - r^2}}$$

r = the Pearson Product Moment correlation.
N = the number of pairs.
N-2 = the degrees of freedom for using the critical values of t.
The formula for transforming critical values of t to critical values of r is the following:

$$r = \sqrt{\frac{t^2}{N - 2 + t^2}}$$

Exercise
Using SPSSX, find the correlation for the data given with the t-test for independent samples. Now, make the appropriate substitution in the formula above. Critical values of the Pearson r are found in Table H. What is your decision in terms of rejecting the null hypothesis at the .05 probability level? Answer (r=.23, p>.05), failure to reject the null hypothesis. Using the following control lines for SAS, rerun the correlation for the data given for the t-test of independent samples.
Data Relate;
Input TAI1 1-2 TAI2 4-5;
Cards;
Proc Corr;
Var TAI1 TAI2;
Note: the data are inserted between the cards and proc corr statements.
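The two conversion formulas in this section are inverses of each other, which can be checked with a short Python sketch. The function names are hypothetical; the example substitutes t = .68 with N = 10 pairs, which appears to be the substitution behind the r of .23 given in the exercise answer.

```python
from math import sqrt


def r_to_t(r, n_pairs):
    """Convert a Pearson r to t with n_pairs - 2 degrees of freedom."""
    return r * sqrt((n_pairs - 2) / (1 - r ** 2))


def t_to_r(t, n_pairs):
    """Convert a t value back to the magnitude of the Pearson r."""
    return sqrt(t ** 2 / (n_pairs - 2 + t ** 2))


r_value = t_to_r(0.68, 10)
print(round(r_value, 2))
print(round(r_to_t(r_value, 10), 2))
```

Because the second formula takes a square root, it returns only the magnitude of r; the sign has to be carried over from the direction of the original difference.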


6.5 ONE-WAY ANALYSIS OF VARIANCE
The one-way analysis of variance is used to compare two or more group means in order to determine if they differ more than one would expect by chance. One should not compare three or more groups with multiple t-tests, because the overall alpha level (experimentwise error) becomes several times larger than the level set by the experimenter, for example the .05 level. The point to be remembered is that if one conducts several t-tests, the chance of at least one yielding significance increases with the number of tests conducted. The advantage of the ANOVA is that it keeps the risk of a type I error small even when several group means are compared. The one-way analysis of variance, or one-way ANOVA, has the same three assumptions as the t-test for independent samples. Essentially, when its assumptions are violated, the same thing happens to the F-test, which is used to calculate ANOVA, as happens to the t-test for independent samples.
Assumptions of ANOVA
1. Normality: the variables are normally distributed in each population. This is equivalent to the errors ($e_{ij}$) within each population being normally distributed.
2. Homogeneity of variance: the variances of the populations are equal, or the samples are drawn from populations with equal variances.
3. Independence: the observations are independent. This assumption also includes the notion that the numerator and denominator of the F-test are independent. If one suspects dependence or correlated observations, calculate the intraclass correlation, and if there is a high correlation among observations, perform statistical tests at a stringent (lower) level of significance. Finally, if observations are correlated within groups, but not across groups, employ the group mean as the unit of analysis. Even though this will reduce the sample size, it will not lead to a substantial decrease in statistical power because the means are more stable than individual observations.
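The inflation of the overall alpha level from running several t-tests, mentioned at the start of this section, can be illustrated with a short Python sketch. It assumes the tests are independent, which pairwise t-tests on the same groups are not, so the figures are an approximation rather than an exact experimentwise error rate. The function names are hypothetical.

```python
def pairwise_comparisons(n_groups):
    """Number of distinct pairs of groups that could be compared."""
    return n_groups * (n_groups - 1) // 2


def experimentwise_alpha(alpha, n_tests):
    """Chance of at least one type I error across independent tests."""
    return 1 - (1 - alpha) ** n_tests


for groups in (3, 4, 5):
    tests = pairwise_comparisons(groups)
    print(groups, tests, round(experimentwise_alpha(0.05, tests), 3))
```

Even with only three groups, the approximate chance of at least one false rejection is well above the .05 level set for each individual test.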

Relationship Among t-test, F-test, and Correlation or Regression
It was just mentioned that ANOVA is calculated using a statistic called an F-test. The t-test squared is an F-test. Also, remember it was stated that the t-test is a special case of correlation or regression. This is also the case for the F-test, since

$$F = \frac{r^2}{(1 - r^2)/(n - 2)}$$

n = the number of pairs.
It becomes clear that the t-test, as well as the F-test, is a special case of regression or correlation. It should also be noted that the one-way ANOVA fits a linear model. Suppose the ith subject is in group j; for a one-way ANOVA the model for a subject's score is:

$$y_{ij} = \mu + \alpha_j + e_{ij}$$

where:
$\mu$ = the general effect or grand mean
$\alpha_j$ = the main effect
$e_{ij}$ = the error
Verbally, this linear model states that an observed value of a dependent variable is equal to a weighted sum of values associated with the independent variable(s), plus an error term (Hays, 1981, p. 326).
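The statement that the t-test squared is an F-test can be checked numerically from the correlation formulas. The sketch below assumes F is computed as r squared divided by (1 minus r squared) over (n minus 2), and t as r times the square root of (n minus 2) over (1 minus r squared); the function names and the r values are hypothetical.

```python
from math import sqrt


def t_from_r(r, n_pairs):
    """t statistic implied by a Pearson r with n_pairs - 2 degrees of freedom."""
    return r * sqrt((n_pairs - 2) / (1 - r ** 2))


def f_from_r(r, n_pairs):
    """F statistic implied by the same r, with (1, n_pairs - 2) degrees of freedom."""
    return r ** 2 / ((1 - r ** 2) / (n_pairs - 2))


for r in (0.10, 0.23, 0.50):
    t_value = t_from_r(r, 10)
    print(r, round(t_value ** 2, 4), round(f_from_r(r, 10), 4))
```

For each r, the squared t value and the F value agree, which is the two-group relationship described above.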
Formulas for One-way ANOVA
With the one-way ANOVA, we calculate a test statistic called an F-test. This F-test, similar to the t-test, is a ratio. The numerator is called the mean square between groups (MSB), which measures the variability among groups. The denominator is the mean square within (MSW), or the average variability within groups. Hence, with the ANOVA, we measure variability across groups and variability within groups. Below are the formulas for a one-way ANOVA.

$$F = \frac{MSB}{MSW}$$

This ratio is called an F-test.

$$MSB = \frac{SSB}{K - 1}$$

K-1 is the degrees of freedom between for the term SSB.

$$SSB = \sum n_j (\bar{x}_j - \bar{x})^2$$

Test Anxiety: Applied Research

6. Common Univariate Statistics

This is the formula for the sum of the squares between (SSB), which measures how much each group mean varies about the grand mean. It is a weighted sum of squares, since each squared deviation is multiplied by its corresponding group size $n_j$. $\Sigma$ (Sigma) indicates that these weighted squared deviations are added across groups.

MSW = SSW/(N-K)

This is the formula for the mean square within groups. N-K is the degrees of freedom for the MSW, while K is the number of groups.

SSW is the pooled within-group variability; it measures how much each score deviates from its corresponding group mean. $\Sigma$ (Sigma) indicates that the squared deviations, $(x - \bar{x}_j)^2$, are pooled or added across each group.
In summary, MSB and MSW are both variance estimates, so with the F-test we are analyzing the variances, hence the name analysis of variance.

Figure 6.6 Schematic Diagram of One-Way ANOVA

Numerical Calculations of One-Way ANOVA
Let us take an example to illustrate how the ANOVA is calculated. Later, the control lines for running this analysis on SPSSX and SAS are provided. Suppose we had three groups of subjects randomly assigned to three treatments for test anxiety. Let us assume subjects were randomly assigned to a relaxation therapy, cognitive-behavioral counseling, or study skills counseling group for test anxiety reduction. The dependent variable is the score on the TAI.

relaxation therapy    cognitive-behavioral counseling    study skills counseling
53                    56                                 55
55                    55                                 53
52                    56                                 54
54                    54                                 55
54                    55                                 52
53                    53                                 54
mean = 53.5           mean = 54.83                       mean = 53.83

(Grand mean) = 54.05
ANOVA computes and compares two sources of variation (Stevens, 1990).
1. Between group variation: how much the group means vary about the grand mean.
2. Within group variation: how much the subjects' scores vary about their corresponding group means. This variation is due to individual differences, and possibly experimental error.

Between Group Variability
The first term to calculate is the sum of the squares between groups, which is $SSB = \sum n_j (\bar{x}_j - \bar{x})^2$. $\Sigma$ denotes the summation symbol, $n_j$ is the number of subjects in a given group, while SSB stands for the sum of the squares between groups. This is a weighted sum of squares in that each squared deviation is weighted by the number of subjects in a given group.
$SSB = 6(53.5 - 54.05)^2 + 6(54.83 - 54.05)^2 + 6(53.83 - 54.05)^2 = 5.78$


Now we need to calculate the mean square between (MSB), which is simply SSB/(K-1), where K = the number of groups.

MSB = SSB/(K-1) = 5.78/2 = 2.89
Within-Group Variability
The next term to calculate is the sum of the squares within (SSW), which is:

$$SSW = \sum (x - \bar{x}_1)^2 + \sum (x - \bar{x}_2)^2 + \cdots + \sum (x - \bar{x}_K)^2$$

where x is the score of a subject in a given group, and $\bar{x}_j$ is the mean of that group.
$SSW = (53 - 53.5)^2 + (55 - 53.5)^2 + (52 - 53.5)^2 + (54 - 53.5)^2 + (54 - 53.5)^2 + (53 - 53.5)^2 + \cdots + (54 - 53.83)^2 = 19.16$
The sum of the squares within is found by computing the variability for each group, then totalling the separate variabilities. This is the pooled within-group variability. In order to get the variance estimate (MSW), we divide the sum of squares within (SSW) by its degrees of freedom, which gives SSW/(N-K), where N is the total number of subjects and K equals the number of groups.
MSW = 19.16/15 = 1.28
The F-statistic tests a hypothesis that involves population means, just like the one used for the t-test for independent samples. The null hypothesis is that the population means are equal, or:

$$H_0: \mu_1 = \mu_2 = \mu_3$$

The F-test is MSB/MSW = 2.89/1.28 = 2.26, p>.05. Using Table C, the critical value of F with 2,15 degrees of freedom is 3.68 at the .05 level of significance; hence the test statistic is not greater than the critical value, so we fail to reject the null hypothesis.

It should be apparent that, like the t-test, the F-test is a ratio. The numerator reflects the variance among the means, taking into account the different sample sizes, while the denominator reflects the average variance of the groups involved. The results of the above example indicate that the means do not differ more than one would expect by chance alone; hence the null hypothesis is not rejected. The p>.05 indicates the lack of statistical significance.
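The hand calculations for the one-way ANOVA above can be retraced with a short program. This is a minimal sketch rather than a replacement for the SPSSX or SAS runs: the function name one_way_anova is hypothetical, and the scores are the eighteen TAI scores that appear in the control lines for this example.

```python
# TAI scores for the three treatment groups in the worked example.
groups = {
    "relaxation therapy": [53, 55, 52, 54, 54, 53],
    "cognitive-behavioral counseling": [56, 55, 56, 54, 55, 53],
    "study skills counseling": [55, 53, 54, 55, 52, 54],
}


def one_way_anova(samples):
    """Return SSB, SSW, MSB, MSW and F for a list of groups of scores."""
    all_scores = [x for sample in samples for x in sample]
    n_total = len(all_scores)          # N, the total number of subjects
    k = len(samples)                   # K, the number of groups
    grand_mean = sum(all_scores) / n_total
    group_means = [sum(sample) / len(sample) for sample in samples]

    # Between groups: each squared deviation is weighted by its group size.
    ssb = sum(len(sample) * (m - grand_mean) ** 2
              for sample, m in zip(samples, group_means))
    # Within groups: squared deviations of scores from their own group mean.
    ssw = sum((x - m) ** 2
              for sample, m in zip(samples, group_means) for x in sample)

    msb = ssb / (k - 1)
    msw = ssw / (n_total - k)
    return {"SSB": ssb, "SSW": ssw, "MSB": msb, "MSW": msw, "F": msb / msw}


result = one_way_anova(list(groups.values()))
for name, value in result.items():
    print(name, round(value, 2))
```

Carried at full precision, SSW comes out at about 19.17 rather than the 19.16 shown in the text; the difference is only rounding, and the F ratio still rounds to 2.26.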


Table 6
SPSSX Control Lines for One-Way ANOVA
Title "one way anova"
data list/gpid 1 Y 3-4
begin data
1 53
1 55
1 52
1 54
1 54
1 53
2 56
2 55
2 56
2 54
2 55
2 53
3 55
3 53
3 54
3 55
3 52
3 54
end data
Manova y by gpid (1,3)/
 power = F (.05)/
 print cellinfo (means)
 Signif (efsize)
1 - Manova is the code name for multivariate analysis of variance, which is used to calculate the one-way analysis of variance. The numbers in parentheses indicate the levels of the groups being compared; in this case the levels are 1 through 3. If there were four groups, it becomes gpid(1,4).
2 - power = F (.05) provides a power estimate at the .05 level.
3 - print cellinfo (means) yields the cell means and standard deviations.
4 - Signif (efsize) provides an effect size measure.
Below is the selected printout from the one-way ANOVA computer run.
Table 7
SAS Control Lines for One-Way ANOVA
Data TAI;
Input Gpid 1 Y 3-4;
Cards;
1 53
1 55
1 52
1 54
1 54
1 53

2 56
2 55
2 56
2 54
2 55
2 53
3 55
3 53
3 54
3 55
3 52
3 54
Proc Means;
By gpid;
Proc ANOVA;
Class Gpid;
Model Y = gpid;
Means Gpid/Tukey;
Proc Print;

Figure 6.7 Selected Printout from One-Way ANOVA

Tests of Significance of Y using unique sum of squares
Source of Variation      SS       DF      MS       F       Sig of F
Within Cells             19.16    15      1.28
GPID                      5.78     2      2.89     2.26    .139*

* Indicates that statistical significance was not reached, since the two-tailed probability level is greater than .05. Had significance been reached, it would have been necessary to conduct post hoc tests such as the Tukey (HSD).
Those who are interested in what to do after the F-test is significant with three or more groups should read the section on planned comparisons and the Tukey post hoc procedure.

Figure 6.8 Power Analysis for One-Way ANOVA

Selected Printout Power Analysis for One Way ANOVA
Source of Variation      Partial ETA Sqd      Noncentrality      Power
GPID                     .232*                4.522              .388**

*Cohen (1988, pp. 280-284) notes that eta squared or the correlation ratio is a generalization of the Pearson Product Moment Correlation squared. Stevens (1990, p. 94) characterizes values of eta squared of around .01 as small effect sizes, values of .06 as medium effect sizes, and values of .14 as large effect sizes.

**This is the measure of power or the probability of making a correct decision in terms of rejecting the null hypothesis. Notice that power is low, the effect size is large, and the null hypothesis was not rejected due to a small power level. More will be said about power in Section 6.6.


6.6 SPSSX POWER ESTIMATES
AND EFFECT SIZE MEASURES
Using the MANOVA command and the power subcommand on SPSSX, power values between 0 and 1 can be obtained for fixed effects models. Fixed effects means that the inferences are fixed or are specific only to the study of interest. Within fixed effect models, the researcher is not attempting to generalize results beyond the given levels of a factor in a particular study. Similarly, effect size, the practical significance of a statistical test, can also be obtained from SPSSX with the Signif(efsize)/ subcommand. Table 6 presented the control lines for finding power and effect size estimates for a one-way ANOVA. In order to obtain power at a different significance level, one can put any value expressed in hundredths in the power subcommand. For example, if we wanted power at the .15 level, the power subcommand would be: Power=F(.15)/.
The effect size obtained from SPSSX is actually the correlation coefficient eta squared. This eta squared is also called the coefficient of determination or the amount of variance accounted for on a dependent variable or set of dependent variables. As previously stated, eta squared of .01 is considered a small effect size, values of .06 are medium values and values of .14 are considered large effect size measures. In terms of measures of power, Stevens (1990, p. 85) characterizes power values greater than .70 as adequate and values greater than .90 as excellent.
The SPSSX Users Guide (1988, p. 602) states that the effect size measures obtained from SPSSX computer runs are actually partial eta squared correlations. Stevens (1990, p. 85) notes that partial eta squared tends to overestimate the actual effect size. The actual value for eta squared can be obtained from the following formula (Stevens, 1990, p. 94):

$$\text{eta squared} = \frac{(K-1) \cdot F}{(K-1) \cdot F + N}$$

The formula for the partial eta squared is:

$$\text{partial eta squared} = \frac{(K-1) \cdot F}{(K-1) \cdot F + (N-K)}$$


K-1 and N-K are the degrees of freedom from the one-way ANOVA. K-1 is the number of groups minus 1, while N-K is the total sample size minus the number of groups. The raised dot denotes multiplication. When sample sizes are large (N > 100), the differences between eta squared and partial eta squared are negligible. The two formulas differ only by the term -K in the denominator.
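The two effect size formulas above are easy to compare side by side. The sketch below is illustrative only: the function names are hypothetical, and the inputs (F = 2.26, K = 3, N = 18) are taken from the one-way ANOVA example earlier in this chapter.

```python
def eta_squared(f_value, k, n_total):
    """Eta squared from F, the number of groups K, and total sample size N."""
    df_between = k - 1
    return (df_between * f_value) / (df_between * f_value + n_total)


def partial_eta_squared(f_value, k, n_total):
    """Partial eta squared; the denominator uses N - K instead of N."""
    df_between = k - 1
    return (df_between * f_value) / (df_between * f_value + (n_total - k))


# Values from the one-way ANOVA example: F = 2.26 with K = 3 groups and N = 18.
f_value, k, n_total = 2.26, 3, 18

eta_sq = eta_squared(f_value, k, n_total)
partial_eta_sq = partial_eta_squared(f_value, k, n_total)

print(round(eta_sq, 3), round(partial_eta_sq, 3))
```

The partial value reproduces the .232 shown in the SPSSX printout, while the first formula gives a smaller value, which is the sense in which partial eta squared is said to overestimate the effect size in small samples.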
SPSSX Power Analysis for t-test for
Independent Samples ANOVA
Title "Power for t-test"
Data list free/gpid TAI
Begin data
1 50 2 65
1 51 2 56
1 45 2 32
1 53 2 50
1 55 2 51
1 47 2 53
1 46 2 50
1 61 2 47
1 55 2 47
1 53 2 44
End data
MANOVA TAI by gpid (1,2)/
 Power = F(.05)/
 Print = Cellinfo (Means)
 Signif(EFSIZE)/


SPSSX Power Analysis for One-way ANOVA
Title "Power for One-way ANOVA"
Data list/gpid 1 Y 3-4
Begin data
End data
MANOVA y by gpid (1,3)/
Power = F(.15)/
Print Cellinfo (Means)
Signif(EFSIZE)/
Note: Data are inserted between the begin data and end data commands.
Exercise
The following data represent the results of three treatments for test anxiety. Perform the correct analysis on the data using SPSSX. What are your results? Perform a power analysis at the .05 and .15 levels.

Figure 6.9 Results of Three Treatments for Test Anxiety

relaxation therapy    cognitive-behavioral counseling    study skills counseling
TAI scores            TAI scores                         TAI scores
62                    63                                 64
66                    59                                 65
67                    58                                 63
mean = 65             mean = 60                          mean = 64

Was statistical significance reached at the .05 level?
(Answer: no, F=4.20, p=.072.)
6.7 TWO-WAY ANOVA
The two-way ANOVA is a factorial design, which was discussed in Chapter 4. The factorial ANOVA allows one to answer three specific statistical questions. First, do the row means differ significantly? This is the row main effect. Second, do the column means differ significantly (column main effect)? Finally, is the profile of cell means in row one significantly nonparallel to that of row two? This is called an interaction between factors one and two. In summary, if the profile of cell means for row one crosses the profile of cell means for row two, this indicates an interaction. Similarly, if the profile of cell means for row one can be extended in such a way that it crosses the profile of cell means for row two, this also indicates an interaction. When the profiles of cell means are parallel, this indicates a noninteraction. Graphically, Figures 6.10-6.13, below, represent profiles of adjusted cell means for two kinds of two-way interactions and two forms of noninteractions. See Figure 6.15 for the calculations of adjusted cell means.

Figure 6.10 Disordinal Interaction (Nonparallelism)

Figure 6.11 Noninteraction (Parallelism)

Figure 6.12 Ordinal Interaction (Nonparallelism)

Figure 6.13 Noninteraction (Parallelism)

Just like the t-test and the one-way ANOVA, the two-way ANOVA is also a linear model. The model looks like this:

$$y_{ijk} = \mu + A_i + B_j + O_{ij} + e_{ijk}$$

Where:
$\mu$ is the general effect or grand mean
$A_i$ is the main effect for factor A
$B_j$ is the main effect for factor B
$O_{ij}$ is the interaction effect
$e_{ijk}$ is the error
Formulas for Two-way ANOVA

$$SSA = nJ \sum (\bar{x}_{i.} - \bar{x})^2$$

This is the formula for the sum of squares for factor A. The subscript i indicates a row. The nJ is the number of observations each row mean is based upon. The sum of the squares reflects the variability of the row means about the grand mean.

$$SSB = nI \sum (\bar{x}_{.j} - \bar{x})^2$$

This is the formula for the sum of squares for factor B. The subscript j indicates a column. The nI is the number of observations each column mean is based upon. This reflects the variability of the column means about the grand mean.

Formulas for F-tests Main Effects

MSA = SSA/(I-1) is the formula for the mean sum of squares for factor A. I is the number of rows.
MSB = SSB/(J-1) is the formula for the mean sum of squares for factor B. J is the number of columns.

F(A) = MSA/MSW
F(B) = MSB/MSW

Error Term
The formula for the sum of the squares within the cells is

$$SSW = \sum (x - \bar{x}_{ij})^2$$

where x is a score in a cell, and $\bar{x}_{ij}$ is the mean of a given cell.

SSW is called the error term, the sum of squares within each cell, or the pooled within cell variability. For each cell, every score is deviated or subtracted from the cell mean and squared. Basically, the sum of squares for each cell is calculated and added across every cell.

MSW = SSW/(N-IJ)

N is the total number of subjects, I is the number of rows, while J is the number of columns.

Next we will provide the interaction sum of squares for the two-way ANOVA.

Formula for the Interaction Sum of Squares

$$SSAB = n \sum O_{ij}^2$$

is the formula for the interaction sum of squares (SSAB), where

$$O_{ij} = \bar{x}_{ij} - \bar{x}_{i.} - \bar{x}_{.j} + \bar{x}$$

is the estimated cell interaction effect.
$\bar{x}_{ij}$ is the mean of some cell defined by row i and column j.
$\bar{x}_{i.}$ is the row mean. $\bar{x}_{.j}$ is the column mean. $\bar{x}$ is the grand mean. n is the number of scores or subjects within a cell.
$\Sigma$ (Sigma) denotes that each interaction effect is added or summed.

Formula for Interaction Effect

MSAB = SSAB/[(I-1)(J-1)]

F(AB) = MSAB/MSW

where I = the number of rows and J = the number of columns. The expression (I-1)(J-1) is the degrees of freedom for interaction.

The factorial ANOVA is more powerful than the one-way ANOVA. In terms of variation, Stevens (1990, p. 103) notes that there are four sources of variance for the factorial ANOVA, and they are:
1. Variation due to factor 1
2. Variation due to factor 2
3. Variation due to the interaction of factors 1 and 2
4. Within cell or error variation
The assumptions for the factorial ANOVA are the same as for the one-way ANOVA. That is, normality on the dependent variable in each cell and equal cell population variances.
Let us consider a 2 X 2 design with two scores per cell. With this design, there are two levels of hypnotic susceptibility and two levels of treatments.

Figure 6.14 2 X 2 Design

                                  Treatments (B)
Hypnotic Susceptibility (A)       1                    2                    Row Means
(1) High, row A(1)                50, 70 (mean = 60)   30, 60 (mean = 45)   52.5
(2) Low, row A(2)                 50, 50 (mean = 50)   60, 60 (mean = 60)   55
Column means                      55                   52.5                 Grand mean = 53.75

The scores in this design represent the TAI scores for high and low hypnotic susceptible subjects randomly assigned to two groups: relaxation therapy and hypnosis. In the dot notation, a dot in the second position of the subscript, as in $\bar{x}_{1.}$, denotes a row mean, while a dot in the first position, as in $\bar{x}_{.1}$, denotes a column mean. We are going to first test the main effect of factor A. The null hypothesis for the row effect is $H_0: \mu_{1.} = \mu_{2.} = \cdots = \mu_{I.}$. This indicates that the population row means are equal. From the above table, $\bar{x}_{1.} = 52.5$ and $\bar{x}_{2.} = 55$, which are the estimates of the population means $\mu_{1.}$ and $\mu_{2.}$ for the high and low hypnotic susceptible subjects. The "I" in the above notation indicates rows. The null hypothesis for the B main effect is $H_0: \mu_{.1} = \mu_{.2} = \cdots = \mu_{.J}$. This denotes that the population column means are equal. Similarly, $\bar{x}_{.1} = 55$ and $\bar{x}_{.2} = 52.5$ are estimates of the population column means $\mu_{.1}$ and $\mu_{.2}$. The "J" above indicates columns. Thus, this is an I X J design. That is, there are two levels of A (susceptibility: high and low) and two levels of treatment.

MSW = SSW/ (N-IJ) = 650/(8-4)=162.50. This represents the average of the cell variances.

Sum of the Squares for a Balanced Factorial Design
Balanced designs mean that there are an equal number of observations per cell. Later, we will provide the SPSSX control lines for running unbalanced designs.
$SSA = nJ \sum (\bar{x}_{i.} - \bar{x})^2$. The nJ is the number of observations each row mean is based upon, or 2 X 2. The sum of the squares reflects the variability of the row means about the grand mean. For this example, SSA is:
$SSA = 2(2)[(52.5 - 53.75)^2 + (55 - 53.75)^2] = 12.50$
The mean sum of squares for factor A (MSA) is:
MSA = SSA/(I-1) = 12.5/1 = 12.5
$SSB = nI \sum (\bar{x}_{.j} - \bar{x})^2$. For this example, $SSB = 2(2)[(55 - 53.75)^2 + (52.5 - 53.75)^2] = 12.50$, and MSB = SSB/(J-1) = 12.50/1 = 12.50.
The F-test for factor A is F(A) = MSA/MSW = 12.50/162.50 = .08, which is reported as F(A) = .08(1,4), p>.05. The numbers in parentheses are the degrees of freedom, which are I-1 and N-IJ, or 1,4. With (1,4) degrees of freedom the critical value of F at the .05 level is 7.71. Since the test statistic is not greater than the critical value of F, we fail to reject the null hypothesis. F(A) listed above is the way statistics are usually presented in APA journals. The F(B) = MSB/MSW = 12.50/162.50 = .08. The degrees of freedom are J-1, N-IJ or 2-1, 8-2x2, which is (1,4). The critical value for these degrees of freedom at the .05 level is 7.71. The result is F(B) = .08(1,4), p>.05. This indicates that the null hypothesis was not rejected; that is, there is no evidence that the population column means differ. In sum, statistical significance was not obtained for either factor A or factor B.
Interaction
One should note that if a significant interaction is found, the main effects are ignored, since the interaction is a more complex model for the data. If a significant interaction is found, it is used to explain the data. Similarly, after a significant interaction is found, one must conduct post hoc tests to determine exactly where the differences are located. See Hays (1981) for a discussion of post hoc procedures. Interaction is defined as $O_{ij}$, where

$$O_{ij} = (\mu_{ij} - \mu) - (\mu_{i.} - \mu) - (\mu_{.j} - \mu) = \mu_{ij} - \mu_{i.} - \mu_{.j} + \mu$$

$\mu$ denotes a population mean or the mean of every score in some population. The sum of the squares for the interaction is $SSAB = n \sum O_{ij}^2$, where n is the number of observations within a cell, and $O_{ij} = \bar{x}_{ij} - \bar{x}_{i.} - \bar{x}_{.j} + \bar{x}$ is the estimated cell interaction effect. We can obtain this formula because a sample mean is the best linear unbiased estimate of the population mean. It is known that the sum of the interaction effects for a fixed effects design equals 0 for every row and column; therefore, it is only necessary to find the interaction effects for cells $O_{11}$ and $O_{12}$.


$O_{11} = 60 - 52.5 - 55 + 53.75 = 6.25$
$O_{12} = 45 - 52.5 - 52.5 + 53.75 = -6.25$
$O_{21} = 50 - 55 - 55 + 53.75 = -6.25$
$O_{22} = 60 - 55 - 52.5 + 53.75 = 6.25$

$SSAB = 2[(6.25)^2 + (-6.25)^2 + (-6.25)^2 + (6.25)^2] = 312.50$
MSAB = SSAB/[(I-1)(J-1)] = 312.50/[(2-1)(2-1)] = 312.50
F(AB) = MSAB/MSW = 312.50/162.50 = 1.92
The degrees of freedom for this test are (I-1)(J-1) and N-IJ, or 1,4. The critical value of F for 1,4 degrees of freedom is 7.71 at the .05 level; hence F(AB)(1,4) = 1.92, p>.05. The null hypothesis was not rejected.

Figure 6.15
Adjusted Cell Means

                                  Treatments (B)
Hypnotic Susceptibility (A)       1                    2
(1) High                          O_11 = 6.25          O_12 = -6.25
(2) Low                           O_21 = -6.25         O_22 = 6.25

$O_{ij}$ is also called an adjusted cell mean, and these values should be used to graph interactions. It is common practice for many research design and statistics textbook writers to recommend the graphing of observed cell means. Such graphs can lead to the misinterpretation of interaction effects. This is because interaction effects are residual effects in that lower-order effects must be removed. Essentially, interaction effects are the residual (remaining) effects after the row and column effects are removed, especially if they are statistically significant (Harwell, 1998).
Interaction contrasts can be identified and tested if they are not of the following form: $\mu_1 - \mu_2$. This type of one-way contrast is most appropriate for one independent variable. The following form is appropriate for factorial designs: $(\bar{Y}_{11} - \bar{Y}_{12}) - (\bar{Y}_{21} - \bar{Y}_{22})$.
If we let $C_i$ denote a contrast for a factorial design, then
$C_i = W_1\bar{x}_1 + W_2\bar{x}_2 + \cdots + W_k\bar{x}_k = \sum W_i\bar{x}_i$, where $W_1, W_2, \ldots, W_k$ are the coefficients or weights, and $\bar{x}_k$ is the adjusted cell mean or the unadjusted cell mean.
These contrasts can be tested by the following t-test:

$$t = \frac{C_i}{\text{standard error}}$$

where the standard error is:

$$\left[ MSW \left( \sum W_k^2 / n \right) \right]^{1/2}$$

and n = the cell size.

The reader can read the section on planned comparisons in order to understand the rationale for contrasts; however, the calculations for the one-way and the factorial cases are similar. Finally, Kirk (1995) provides an excellent discussion of how to interpret interactions. He noted that whenever two treatments interact, this interaction is referred to as a treatment-contrast interaction. He also notes that the follow-up procedures for such designs can be complex, and researchers have different perspectives for handling such designs.
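The two-way calculations worked through in this section (SSA, SSB, the interaction effects, SSAB, SSW, and the three F ratios) can be retraced in a few lines of code. The following is a minimal sketch for a balanced design only; the variable names are hypothetical, and the scores are those of the 2 X 2 hypnotic susceptibility by treatment example.

```python
# Cell scores keyed by (row, column): rows are hypnotic susceptibility (A),
# columns are treatments (B). The design is balanced with n = 2 per cell.
cells = {
    (1, 1): [50, 70],
    (1, 2): [30, 60],
    (2, 1): [50, 50],
    (2, 2): [60, 60],
}


def mean(values):
    return sum(values) / len(values)


rows = sorted({i for i, _ in cells})
cols = sorted({j for _, j in cells})
n_rows, n_cols = len(rows), len(cols)
n_cell = len(cells[(1, 1)])            # n, assumed equal in every cell
n_total = n_cell * n_rows * n_cols     # N

grand_mean = mean([x for scores in cells.values() for x in scores])
row_means = {i: mean([x for j in cols for x in cells[(i, j)]]) for i in rows}
col_means = {j: mean([x for i in rows for x in cells[(i, j)]]) for j in cols}
cell_means = {key: mean(scores) for key, scores in cells.items()}

# Estimated interaction effects (adjusted cell means).
interaction = {(i, j): cell_means[(i, j)] - row_means[i] - col_means[j] + grand_mean
               for (i, j) in cells}

ssa = n_cell * n_cols * sum((row_means[i] - grand_mean) ** 2 for i in rows)
ssb = n_cell * n_rows * sum((col_means[j] - grand_mean) ** 2 for j in cols)
ssab = n_cell * sum(effect ** 2 for effect in interaction.values())
ssw = sum((x - cell_means[key]) ** 2 for key, scores in cells.items() for x in scores)

msw = ssw / (n_total - n_rows * n_cols)
f_a = (ssa / (n_rows - 1)) / msw
f_b = (ssb / (n_cols - 1)) / msw
f_ab = (ssab / ((n_rows - 1) * (n_cols - 1))) / msw

print(ssa, ssb, ssab, ssw, msw)
print(round(f_a, 2), round(f_b, 2), round(f_ab, 2))
```

All three F ratios share the same error term, MSW, which is based on N-IJ = 4 degrees of freedom in this example.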
The reader may encounter designs with random factors. That is, the researcher would like to generalize to some population of treatments, if treatments were the factor with various levels. If a researcher wishes to generalize beyond the factor levels in a given study, this represents a random factor. Also, it is possible to have a mixed model in which one factor is fixed and the other is random. Hays (1981) discusses the correct error term to use in each case; that is, for fixed and random factors, or some combination thereof. The control lines for running a balanced factorial ANOVA are provided below. The same control lines are used for an unbalanced design, since the unique sum of squares will be provided, which is the default option of SPSSX.


Table 8
Control Lines for Running a Two-Way Anova
With Equal n's on SPSSX

Title "two-way anova equal ns"
Data list/gpid1 1 gpid2 3 dep 5-6
begin data
1 1 50
1 1 70
1 2 30
1 2 60
2 1 50
2 1 50
2 2 60
2 2 60
end data
Manova dep by gpid1(1,2) gpid2(1,2)/
 design/
List variables=gpid1 gpid2 dep

1 - Manova produces the factorial ANOVA. The numbers in parentheses are the levels being used for each factor.
2 - design/ specifies the design, which is gpid1, gpid2, gpid1*gpid2/. For a full model, only design/ is needed.
3 - List provides a listing of the data.
Note: This analysis will also run with unequal ns.

Table 9
SAS Control Lines for Two-Way ANOVA

Data Twoway;
Input gpid1 1 gpid2 3 dep 5-6;
Cards;
1 1 50
1 1 70
1 2 30
1 2 60
2 1 50
2 1 50
2 2 60
2 2 60
Proc Print;
Proc GLM;
Class gpid1 gpid2;
Model dep=gpid1 gpid2 gpid1*gpid2;
Means gpid1 gpid2 gpid1*gpid2;


Table 10
Selected Printout Two-Way ANOVA

Below is the selected output from the SPSSX run.

Tests of Significance for dep using unique sum of squares
Source of Variation      SS          DF     MS          F         Sig of F
Within Cells             650.00      4      162.50
Constant                 23112.50    1      23112.50    142.23    .000
GPID1                    12.50       1      12.50       .08       .795
GPID2                    12.50       1      12.50       .08       .795
GPID1 BY GPID2           312.50      1      312.50      1.92      .238

Sig of F gives the probability value for each statistic. Because none of the probability values for the main effects or the interaction was less than .05, statistical significance was not found for any main effect or the interaction.
6.8 DISPROPORTIONAL CELL SIZE OR
UNBALANCED FACTORIAL DESIGNS
For disproportional cell size or unbalanced factorial designs, the effects are correlated or confounded. If the correlation is not accounted for, the results can be misinterpreted. When we calculate a one-way ANOVA, the sum of the squares is partitioned into two independent sources of variation, which we previously noted were between and within variation.
With a factorial ANOVA, for equal cell sizes, the sums of squares for all effects are independent. That is, the sums of squares for the main effects, interaction effects and error effects are independent. For an A X B design, remember that SSA, SSB, SSAB and SSW are independent. SSA corresponds to the A effect, SSB is the B effect, SSAB is the interaction effect, and SSW is the error effect. Similarly, the variance estimates or mean squares will also be independent. Stevens (1990) notes that there are three approaches to unbalanced factorial designs:

Method 1: Find the unique contribution of each effect. This is adjusting each effect for every other effect in the design, which is called the regression approach. On SPSSX, this approach is called the unique sum of squares and is the default option.
Method 2: Estimate the main effects, disregarding the interaction effect. Now estimate the interaction effect, adjusting it for the main effects. This is the experimental approach.
Method 3: Due to previous research and theory, establish an ordering of effects, then adjust each effect for the preceding effects. This is called the hierarchical approach. On SPSSX, this is called the sequential sum of squares. Suppose we had effects in the following order: A, B, A*B, B*C, A*C, A*B*C. In the sequential approach, the main effect A is not adjusted. The main effect B is adjusted for the A effect. The interaction effect A*B is adjusted for each main effect. The interaction B*C is adjusted for the A*B interaction and the separate main effects A and B. A*C is adjusted for the two interaction effects B*C and A*B and the two main effects. Finally, the second order interaction A*B*C is adjusted for the three first order interactions A*C, B*C and A*B along with the two main effects A and B.
Table 11 is the command used to obtain the sequential sum of squares on SPSSX for the hierarchical approach, and Tables 12 and 13 are the control lines for a three-way ANOVA.

Table 11

Method = SSTYPE (Sequential)/

Higher Order Factorial ANOVA 2 X 2 X 2
Below are the control lines for running a three-way ANOVA.

Table 12
SPSSX Three-way ANOVA

Title "three-way ANOVA"
Data list/gpid1 1 gpid2 3 gpid3 5 dep 7-8
Begin data
1 1 1 52
1 1 1 57
1 1 1 53
1 1 1 52
1 1 1 56
1 1 1 58
1 1 2 61
1 1 2 61
1 1 2 60
1 1 2 55
1 1 2 60
1 1 2 58
2 2 1 55
2 2 1 55
2 2 1 57
2 2 1 60
2 2 1 65
2 2 1 62
2 2 2 63
2 2 2 60
2 2 2 61
2 2 2 62
2 2 2 64
2 2 2 65
End data
Manova dep by gpid1(1,2), gpid2(1,2), gpid3(1,2)/
List variables = gpid1 gpid2 gpid3 dep


With higher order factorial designs, the major difficulty is that the overall alpha level can become extremely high. For example, with a three-way factorial design, there are 8 or $2^k$ sources of variation. The sources are the following: A, B and C main effects; AB, AC and BC first order interactions; the ABC second-order interaction; and within cell or error variation. Because several sources of variation exist, the probability increases that one interaction may be significant simply as the result of chance; therefore, significant interactions must be hypothesized a priori.
If an interaction is found to be significant and it was not hypothesized a priori, it should be tested at a smaller alpha level such as .02. This is obtained by setting alpha at .05 divided by three, the number of statistical tests in a two-way factorial design.

Table 13
SAS Three-way ANOVA

Title "Three-way ANOVA";
Data Threeway;
Input gpid1 1 gpid2 3 gpid3 5 dep 7-8;
Cards;
1 1 1 52
1 1 1 57
1 1 1 53
1 1 1 52
1 1 1 56
1 1 1 58
1 1 2 61
1 1 2 61
1 1 2 60
1 1 2 55
1 1 2 60
1 1 2 58
2 2 1 55
2 2 1 55
2 2 1 57
2 2 1 60
2 2 1 65
2 2 1 62
2 2 2 63
2 2 2 60
2 2 2 61
2 2 2 62
2 2 2 64
2 2 2 65
Proc Print;
Proc GLM;
Class gpid1 gpid2 gpid3;
Model dep = gpid1|gpid2|gpid3;
Means gpid1 gpid2 gpid3 gpid1*gpid2
 gpid1*gpid3 gpid2*gpid3
 gpid1*gpid2*gpid3;

6.9 PLANNED COMPARISONS AND THE
TUKEY POST HOC PROCEDURE
A priori or planned comparisons are extremely useful statistical tools for conducting research on test anxiety. Both post hoc procedures and a priori procedures are multiple comparisons. These procedures permit a researcher to test differences among means. As the name suggests, post hoc comparisons or procedures are performed after an ANOVA test has been found to be significant.
Some statisticians use the term a posteriori in reference to post hoc tests. If an F-test is significant, among K groups there are K(K-1)/2 possible pairwise comparisons. For three groups there will be three pairwise comparisons. This can be verified by the formula for combinations. For example, when three (n=3) groups are taken two (r=2) at a time, the number of combinations is:

$$\binom{n}{r} = \frac{n!}{r!(n-r)!} = \frac{3(2)(1)}{2(1) \cdot 1} = 3$$

The symbol ! is called a factorial: n! = n(n-1)(n-2)...1. By definition 0! = 1.
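The count of pairwise comparisons can be confirmed by listing the pairs directly. The short sketch below is only an illustration; it reuses the three treatment names from the one-way ANOVA example, and the variable names are hypothetical.

```python
import math
from itertools import combinations

# The three treatment groups from the one-way ANOVA example.
groups = [
    "relaxation therapy",
    "cognitive-behavioral counseling",
    "study skills counseling",
]

k = len(groups)
n_pairs_formula = k * (k - 1) // 2        # K(K-1)/2
n_pairs_comb = math.comb(k, 2)            # n! / (r!(n-r)!) with r = 2
pairs = list(combinations(groups, 2))     # the comparisons themselves

print(n_pairs_formula, n_pairs_comb)
for first, second in pairs:
    print(first, "vs.", second)
```

The number of pairs grows quickly with the number of groups (six groups already give fifteen comparisons), which is part of why control of the type I error rate becomes an issue for post hoc procedures.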
A Priori or Planned Comparisons
Before one can understand the meaning of pairwise comparisons, it is necessary to define the term contrast. A contrast is the difference among means, given the appropriate algebraic signs (Kirk, 1982, p. 90). $C_i$ denotes a contrast. A contrast can be expressed as a linear combination of means with the appropriate coefficients, such that at least one coefficient is not equal to zero and the coefficients sum to zero. Specifically, a contrast can be expressed as:

$$C_i = W_1\bar{x}_1 + W_2\bar{x}_2 + \cdots + W_k\bar{x}_k = \sum W_i\bar{x}_i$$

where $W_1, W_2, \ldots, W_k$ correspond to the coefficients or weights, and $\bar{x}_k$ = the mean for group k.
Now the concept of a pairwise comparison becomes sensible. If two coefficients of a contrast are two opposite integers, such as 1 and -1, and all other coefficients equal zero, the contrast is called a pairwise comparison; otherwise, it is a nonpairwise comparison (Kirk, 1982, p. 91). It should be noted that the F-test does not have to be significant in order to perform a priori or planned comparisons.
Post Hoc Procedures
Post hoc procedures are used to determine which pairs of means differ significantly. In essence, they are follow-up tests to F-tests according to Kirk (1982, pp. 90-133). Huck, Cormier, and Bounds (1974, pp. 68-72) describe the five basic post hoc comparisons and label them from liberal to conservative, as the following: Fisher's LSD (Least Significant Difference), Duncan's new multiple range test, Newman-Keuls, Tukey HSD (Honestly Significant Difference), and Scheffe's test. The Fisher's LSD is the most liberal; the Scheffe's test is on the other side of the spectrum, being extremely conservative. Figure 15 depicts this liberal to conservative continuum for post hoc procedures.

Figure 15
Liberal to Conservative Continuum for Five Commonly Used Post Hoc Procedures

LIBERAL                                                        CONSERVATIVE
Fisher's - Duncan's - Newman-Keuls' - Tukey - Scheffe

Liberal post hoc procedures increase the chance that an experimenter will find statistical significance when two means are relatively close together, while conservative procedures will only indicate statistical significance when two means are relatively far apart. Thus, liberal procedures are more powerful than conservative procedures; however, conservative procedures such as the Scheffe test control type I error better than do liberal procedures.
There is another post hoc test, called the Dunnett's test, which can only be used when one has a control group and wishes to compare each treatment group with the control group. The Dunnett's test can be expressed as the following t statistic:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{MSW\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$

with df = N-K, or the total sample size minus the number of groups. The critical values for the Dunnett test are found in Table G.
Because there is little agreement about which techniques are the most appropriate, the topic of multiple comparisons has engendered great debate among statisticians. This writer supports the position of Kirk (1982, p. 127), who recommends finding a statistical test that controls type I error and simultaneously provides maximum power.
The Tukey's HSD is one post hoc procedure that fits this recommendation, as it allows a researcher to make any or all pairwise comparisons. Later, the control lines for obtaining the Tukey will be provided. Many post hoc procedures are a modification of an independent ttest. In contrast to post hoc procedures, a priori or planned comparisons are planned before an experiment is conducted. Unlike post hoc procedures, it is not necessary for the F-test to be significant to employ planned comparisons. Planned comparisons are used for theory testing, whereas post hoc comparisons are often employed in exploratory studies to investigate all possible differences among group means. Planned comparisons can be viewed as more precise tests than the global post hoc statistical tests. A priori comparison should be strongly based on theory and/or empirical evidence.
Note that planned comparisons are more powerful than post hoc procedures. In fact, planned comparisons are the most powerful means of statistically testing differences among means (Shavelson 1981, p. 468).


Test Anxiety: Applied Research

6. Common Univariate Statistics

In addition to the assumptions of the one-way ANOVA, planned comparisons have the following three assumptions:
1. Hypotheses or comparisons are planned and justified prior to conducting a study.
2. The sum of the weights for each comparison equals zero.
3. K-1 independent comparisons are performed.
With equal n sizes, two comparisons are independent if the sum of the crossproducts of their corresponding weights equals zero. If sample sizes are not equal, the orthogonality or independence of comparisons can be expressed with the following equation:

$$\frac{w_{11}w_{21}}{n_1} + \frac{w_{12}w_{22}}{n_2} + \dots + \frac{w_{1k}w_{2k}}{n_k} = 0$$

$w_{ik}$ corresponds to a given weight.
$n_k$ is the corresponding n size for a group.
Suppose we are interested in using planned comparisons to test the effects of hypnosis, relaxation therapy, and a Hawthorne control group on the reduction of test anxiety. It is possible to perform K-1 or 2 independent comparisons. Suppose theory suggested that hypnosis would be the most effective treatment, followed by relaxation therapy as the second most effective treatment. The following contrasts or comparisons can be established:

        Hypnosis group    Relaxation therapy group    Hawthorne control group
C1           -1                     1                           0
C2            0                    -1                           1

$C_i$ corresponds to a set of weights.
The crossproducts of the comparison weights do not sum to zero, since
C1 vs C2 = -1(0) + 1(-1) + 0(1) = -1; therefore, the two comparisons are not independent. However, the statistical procedure for both independent and dependent comparisons is the same.
Two questions can be asked of these data.
1. Is hypnosis more effective than relaxation therapy? The $-\mu_1$ indicates a reduction in test anxiety for the hypnosis group.
2. Is the relaxation therapy more effective than a Hawthorne control group? The $-\mu_2$ indicates a reduction in test anxiety for the relaxation therapy group.
There is a statistical test to determine if a given contrast differs significantly from 0. The null hypothesis can be stated as $H_0: C_1 = 0$, while the alternative hypothesis is $H_1: C_1 \neq 0$.
There are two statistical tests for testing the difference of a contrast. One test makes use of a t-test; the other employs an F-test. The corresponding t and F formulas are:

$$t = \frac{C}{\sqrt{MS_W \sum (w_i^2/n_i)}} \qquad F = \frac{C^2 / \sum (w_i^2/n_i)}{MS_W}$$

When the group sizes are equal, the simpler formula that follows can be employed for testing the difference of contrasts.

$$F = \frac{nC^2 / \sum w_i^2}{MS_W}$$
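The independence check described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `is_contrast` and `cross_product_sum`, the extra weight set `[1, 1, -2]`, and the unequal group sizes are assumptions introduced here, not part of the original text.

```python
from fractions import Fraction


def is_contrast(weights):
    """A set of weights is a valid contrast when the weights sum to zero."""
    return sum(weights) == 0


def cross_product_sum(w1, w2, ns=None):
    """Sum of w1k * w2k / nk over the groups.

    With ns=None the groups are treated as equal in size, so the test
    reduces to the plain sum of crossproducts of the weights.
    """
    if ns is None:
        ns = [1] * len(w1)
    return sum(Fraction(a * b, n) for a, b, n in zip(w1, w2, ns))


# Weights from the text: hypnosis, relaxation therapy, Hawthorne control.
c1 = [-1, 1, 0]
c2 = [0, -1, 1]
# Hypothetical extra contrast, used only to show an orthogonal pair.
c3 = [1, 1, -2]

print(is_contrast(c1), is_contrast(c2))         # both sets sum to zero
print(cross_product_sum(c1, c2))                # nonzero: not independent
print(cross_product_sum(c1, c3))                # zero with equal n
print(cross_product_sum(c1, c3, ns=[4, 8, 8]))  # nonzero with these unequal n
```

The last line illustrates why the unequal-n formula matters: a pair of contrasts that is orthogonal with equal group sizes need not remain orthogonal when the group sizes differ.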

Table 14
SPSSX for Testing Contrasts
SPSSX uses the t test to perform contrasts. Below are the control lines for running this analysis on SPSSX.
Title 'A priori Comparisons'
Data List Free/Gpid TAI
Begin data
1 55 1 58 1 58 1 61 1 51 1 59 1 55 1 59
2 66 2 68 2 55 2 62 2 61 2 62 2 73 2 69
3 66 3 57 3 60 3 54 3 57 3 73 3 62 3 63
End data
Oneway TAI by GPID (1,3)/
Contrast -1 1 0/
Contrast 0 -1 1/
Statistics= all
The following is the annotated printout for this analysis.


A Priori Comparison
By GPID

Analysis of Variance

Source            DF    Sum of Squares    Mean Square    F Ratio    F Prob.
Between Groups     2        228.00           114.00        4.38      .0256
Within Groups     21        546.00            26.00

Tests for Homogeneity of Variances
Cochrans C = Max. Variance/Sum(Variances) = .4652, p = .537 (Approx.)
Bartlett-Box F = 1.395, P = .248


Table 15
SAS for Testing Contrasts


**This indicates that the first contrast is significant, meaning the hypnosis group significantly reduced test anxiety when compared with the relaxation therapy group.
***This indicates that the second contrast was not significant. This means that the relaxation therapy group did not significantly reduce test anxiety more than the Hawthorne control group.
The reader can confirm how the t value of 2.94 was obtained by substituting into the appropriate formulas.

Data Compare;
Input gpid TAI @@;
Cards;
1 55 1 58 1 58 1 61 1 51 1 59 1 55 1 59
2 66 2 68 2 55 2 62 2 61 2 62 2 73 2 69
3 66 3 57 3 60 3 54 3 57 3 73 3 62 3 63
Proc Print;
Proc Means;
By gpid;
Proc GLM;
Class gpid;
Model TAI = gpid;
Contrast "gpid1 vs. gpid2"
gpid -1 1 0;
Contrast "gpid2 vs. gpid3"
gpid 0 -1 1;

$C_1 = -1(57) + 1(64.5) + 0(61.5) = 7.5$

$$F = \frac{8(56.25)/2}{26} = 8.65$$

df = 1 and (N-K), or 1, 21.
It is known that $t = \sqrt{F} = 2.94$. The df for t = (N-K).
It is left to the reader to confirm the calculation for the second contrast.
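For readers who want to check both contrasts numerically, the following Python sketch applies the t and F formulas for a contrast to the TAI scores listed in Table 14. The helper names (`mean`, `ms_within`, `contrast_test`) are illustrative assumptions rather than anything used by SPSSX or SAS.

```python
from math import sqrt

# TAI scores from Table 14, in group order 1, 2, 3.
samples = [
    [55, 58, 58, 61, 51, 59, 55, 59],  # group 1: hypnosis
    [66, 68, 55, 62, 61, 62, 73, 69],  # group 2: relaxation therapy
    [66, 57, 60, 54, 57, 73, 62, 63],  # group 3: Hawthorne control
]


def mean(xs):
    return sum(xs) / len(xs)


def ms_within(groups):
    """Pooled within-group mean square (MSW) and its degrees of freedom N-K."""
    ss = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)
    df = sum(len(g) for g in groups) - len(groups)
    return ss / df, df


def contrast_test(groups, weights):
    """Return the contrast value, its standard error, t, F (= t squared), and df."""
    msw, df = ms_within(groups)
    c = sum(w * mean(g) for w, g in zip(weights, groups))
    se = sqrt(msw * sum(w * w / len(g) for w, g in zip(weights, groups)))
    t = c / se
    return c, se, t, t * t, df


for label, weights in (("Contrast 1", [-1, 1, 0]), ("Contrast 2", [0, -1, 1])):
    c, se, t, f, df = contrast_test(samples, weights)
    print(f"{label}: C={c:.3f} SE={se:.2f} t={t:.2f} F={f:.2f} df={df}")
```

Under these assumptions the first contrast reproduces the values worked out above (C = 7.5, F of about 8.65, t of about 2.94 on 21 degrees of freedom).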
Tukey HSD
Even though the formulas for the various post hoc procedures will not be provided, the control lines and the formula for obtaining the Tukey Honestly Significant Difference (HSD) will be provided. For a thorough discussion of the various post hoc procedures, consult Kirk (1982, pp. 90-133).

Table 16
SPSSX Results of Planned Comparisons

Pooled Variance Estimate*

              Value    S. Error    T value    d.f.    T prob.
Contrast 1    7.500      2.55        2.94      21     .008**
Contrast 2   -3.000      2.55       -1.17      21     .252***

*If the homogeneity assumption is not tenable, the separate variance estimates should be used. One can see that this is not the case since Cochran's tailed probability is greater than .05.

Formula for the Tukey Procedure

$$HSD = q\sqrt{\frac{MS_W}{n}}$$

q = the value of the studentized range statistic, which is found in Table D by using the within-group degrees of freedom and the number of groups (K). Specifically, we use K, N-K as the degrees of freedom for locating q. Winer (1971, p. 216) defines the studentized range statistic as the difference between the largest and smallest treatment means (range) divided by the square root of the mean square error divided by n, the common group size. The studentized range statistic can be defined with the equation that follows. But first, note that critical values of the studentized range statistic can be found in Table D. In Table D, "Error df" corresponds to the within degrees of freedom and "number of means (p)" corresponds to K, the number of groups.

Formula for q-Studentized Range Statistic

$$q = \frac{\bar{X}_{largest} - \bar{X}_{smallest}}{\sqrt{MS_W/n}}$$

n = the common group size from a one-way ANOVA. For factorial designs, n is the number of observations a row or column mean is based upon.
Provided the homogeneity assumption is tenable, when there are unequal n's, n can be replaced by the harmonic mean or by the harmonic mean of the largest and smallest group sizes. The latter can be accomplished by replacing the denominator with

$$\sqrt{\frac{MS_W}{2}\left(\frac{1}{n_{largest}} + \frac{1}{n_{smallest}}\right)}$$

This latter procedure for unequal n's is conservative in the sense of reducing the nominal alpha level of significance. For example, one can set the nominal alpha level at .05 while the actual alpha level is less than .05, possibly .04. This procedure is not recommended since it tends to reduce statistical power. The harmonic mean, for which the formula follows, is recommended instead and will be employed with the Tukey procedure. When the harmonic mean is used, the formula is referred to as the Tukey-Kramer procedure. We will provide simultaneous confidence intervals employing this procedure.
Formula for Harmonic Mean
$H_n$ = the harmonic mean, that is, the number of groups divided by the sum of the reciprocals of the Ns.

$$H_n = \frac{N_g}{1/N_1 + 1/N_2 + \dots + 1/N_j}$$

$N_g$ = the number of groups.
Formula for Simultaneous Confidence Intervals for the Tukey-Kramer Procedure
Simultaneous confidence intervals can be obtained with the following formula:

$$(\bar{x}_i - \bar{x}_j) \pm q\sqrt{\frac{MS_W}{H_n}}$$

$\bar{x}_i$ and $\bar{x}_j$ are the means of two groups. If the confidence interval includes 0, we fail to reject the null hypothesis and conclude that the population means are not different. Below are the control lines for running the Tukey-Kramer procedure and harmonic mean on SPSSX.
Table 17
Obtaining the Tukey Procedure and Harmonic Mean on SPSSX

Title "One way ANOVA with the tukey procedure"
Data list/gpid 1 dep 3-4
Begin Data
1 50
1 60
1 70
2 55
2 45
2 35
3 30
3 38
3 29
End data
Oneway dep by gpid(1,3)/
ranges=tukey/
Harmonic=all/
statistics=all/
List variables=gpid dep


1 - Oneway is the code name for one-way ANOVA.
2 - Ranges=tukey is the subcommand for the Tukey procedure.
3 - List variables provides a listing of the data.
4 - Harmonic=all provides the harmonic mean.
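As a hedged illustration of the harmonic mean and HSD formulas, the Python sketch below computes the harmonic mean of the group sizes and the within-group mean square for an unequal-n data set (three groups of sizes 3, 4, and 3, matching the SAS listing for the Tukey procedure). The function names are assumptions made for this sketch, and the critical value q is left as an argument that must be looked up in a studentized range table; no critical value is supplied here.

```python
from math import sqrt


def mean(xs):
    return sum(xs) / len(xs)


def harmonic_mean_n(sizes):
    """Number of groups divided by the sum of the reciprocals of the group sizes."""
    return len(sizes) / sum(1.0 / n for n in sizes)


def ms_within(groups):
    """Pooled within-group mean square (MSW) from a one-way layout."""
    ss = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)
    df = sum(len(g) for g in groups) - len(groups)
    return ss / df


def hsd(q_crit, msw, n):
    """HSD = q * sqrt(MSW / n); pass the harmonic mean as n when group sizes differ."""
    return q_crit * sqrt(msw / n)


# Unequal-n data: groups 1, 2, and 3.
groups = [[50, 60, 70], [55, 45, 35, 30], [30, 38, 29]]

h_n = harmonic_mean_n([len(g) for g in groups])
msw = ms_within(groups)
print(f"harmonic mean of n = {h_n:.4f}")
print(f"MSW = {msw:.4f}")
# hsd(q_crit, msw, h_n) would then give the Tukey-Kramer yardstick once
# q_crit has been read from a studentized range table for K and N-K.
```

Any pair of group means whose absolute difference exceeds the resulting HSD would be declared significantly different under this procedure.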

Obtaining the Tukey Procedure on SAS
Data Tukey;
Input gpid 1 dep 3-4;
Cards;
1 50
1 60
1 70
2 55
2 45
2 35
2 30
3 30
3 38
3 29
Proc Means;
By gpid;
Proc ANOVA;
Class gpid;
Model dep = gpid;
Means gpid/Tukey;
Proc Print;

Exercises
1. The data below represent TAI scores for three treatment groups. Run the data using SPSSX and SAS.

T1    T2    T3
53    56    55
55    55    53
52    56    54
54    54    55
54    55    52
53    53    54

What is the value of the test statistic? (2.26)
Was statistical significance found? no
Write the result of this analysis in journal form.
F(2,15)=2.26,p>.05.
2. Rerun number 1, but insert the command for the Tukey post hoc test.
Were any group means statistically different? (No).
3. Rerun number 1, and perform the following contrasts:

        T1    T2    T3
C1      -1     1     0
C2       0     1    -1

Discuss the results in terms of test anxiety from the analysis.
The results indicated that neither contrast was significant. Contrast 1 had a t value of 2.043 and a probability value of .059; contrast 2 had a t value of 1.532 and a probability value of .146. The results indicated that treatment 1 did not significantly reduce test anxiety more than treatment 2, nor did treatment 3 significantly reduce test anxiety more than treatment 2.
4. The data below represent a study of males and females randomly assigned to three treatment conditions. The dependent variable is the TAI. Using SPSSX and SAS, perform the appropriate statistical analysis on the data. Is there a GPID1 by GPID2 interaction? If so, is the interaction significant at the .05 level? (Yes)

GPID1    GPID2    TAI
1        1        66
1        1        67
1        2        63
1        2        59
1        2        58
1        3        65
1        3        63
2        1        56
2        1        60
2        1        58
2        2        61
2        2        58
2        2        58
2        3        62

5. How would you classify the design in number 4?
(unbalanced 2 X 3 design)
6. Let us consider the calculations for a 2 X 3 design with three scores per cell. This time, A(1) and A(2) correspond to the two levels of hypnotic susceptibility and B represents the three levels of treatment.

Treatments (B)

Hypnotic Susceptibility (A)       1            2            3         Row Means
(1) High                       62,66,67     63,59,58     64,65,63
                                x=65         x=60         x=64        Row mean A(1) = 63
(2) Low                        56,60,58     61,58,58     62,60,58
                                x=58         x=59         x=60        Row mean A(2) = 59
Column means                    61.5         59.5         62          Grand mean = 61

The scores in this design represent the TAI scores for subjects who had high and low susceptibility to hypnosis and who were randomly assigned to three groups: relaxation therapy, hypnosis, and systematic desensitization. As you remember, a dot in the second part of the subscript ($\bar{x}_{i.}$) indicates row means, and a dot in the first part of the subscript ($\bar{x}_{.j}$) indicates column means. Let us go through the steps we used earlier. First, we will test the main effect of factor A. The null hypothesis for the row effect is $H_0: \mu_{1.} = \mu_{2.} = \dots = \mu_{I.}$. This indicates that the population row means are equal. From the above table, $\bar{x}_{1.} = 63$ and $\bar{x}_{2.} = 59$ are the estimates of the population means $\mu_{1.}$ and $\mu_{2.}$ for the subjects with high and low susceptibility to hypnosis. The "I" in the above notation indicates rows.
The null hypothesis for the B main effect is $H_0: \mu_{.1} = \mu_{.2} = \dots = \mu_{.J}$. This denotes that the population column means are equal. Again, $\bar{x}_{.1} = 61.5$, $\bar{x}_{.2} = 59.5$, and $\bar{x}_{.3} = 62$ are estimates of the population column means $\mu_{.1}$, $\mu_{.2}$, and $\mu_{.3}$. The "J" above indicates the column. Thus, this is an I X J design. That is, there are two levels of A (susceptibility: high and low) and three levels of treatments.
Sum of Squares for a Balanced Factorial Design
Now let us calculate the sum of squares. Remember that a balanced design means that there are an equal number of observations per cell. It is now possible to find the sum of squares of A, which is denoted by the following notation:

$$SS_A = nJ \sum_i (\bar{x}_{i.} - \bar{x})^2$$

Here nJ is the number of observations upon which each row mean is based. The sum of squares reflects the variability of the row means about the grand mean. For this example $SS_A$ is:
$SS_A = 3(3)[(63 - 61)^2 + (59 - 61)^2] = 72$
Our next step is to find the mean sum of squares, which is:
$MS_A = SS_A/(I-1) = 72/1 = 72$
Now we can determine the sum of squares for factor B:

$$SS_B = nI \sum_j (\bar{x}_{.j} - \bar{x})^2$$

This reflects the variability of the column means about the grand mean. For our example, it is:
$SS_B = 3(2)[(61.5 - 61)^2 + (59.5 - 61)^2 + (62 - 61)^2] = 21$
Now, the mean sum of squares of B ($MS_B$) is:
$MS_B = SS_B/(J-1) = 21/2 = 10.5$
Error Term
It was stated earlier that the error term represents the pooled within-cell variability. Basically, for every cell, we take each score and deviate it about the cell mean, square the deviation, and add the squared deviations across all the cells. Using the same formula presented earlier, the error term ($SS_W$) is:

$$SS_W = \sum_{cells} \sum (x - \bar{x}_{ij})^2$$

$SS_W = (62-65)^2 + (66-65)^2 + (67-65)^2 + \dots + (62-60)^2 + (60-60)^2 + (58-60)^2 = 52$
$MS_W = SS_W/(N-IJ) = 52/(18-6) = 4.33$. This represents the average of the cell variances.
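The Main Effects for the F-Tests
Now, we can test the main effects with F-tests.
$F(A) = MS_A/MS_W = 72/4.33 = 16.63$ with (1,12) degrees of freedom, p < .05. This indicates that the null hypothesis for the row effect was rejected.
$F(B) = MS_B/MS_W = 10.5/4.33 = 2.42$ with (2,12) degrees of freedom, p > .05. This indicates that the null hypothesis was not rejected, or the population column means are equal.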
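Interaction
The sum of squares for the interaction is $SS_{AB} = n \sum_i \sum_j \hat{\theta}_{ij}^2$,
where $\hat{\theta}_{ij} = \bar{x}_{ij} - \bar{x}_{i.} - \bar{x}_{.j} + \bar{x}$ is the estimated cell interaction effect.
It is known that the interaction effects for a fixed effects design sum to 0 for every row and column; therefore, it is only necessary to find the interaction effects for cells $\hat{\theta}_{11}$ and $\hat{\theta}_{12}$.
$\hat{\theta}_{11} = 65 - 63 - 61.5 + 61 = 1.5$ (cell 1 1)
$\hat{\theta}_{12} = 60 - 63 - 59.5 + 61 = -1.5$ (cell 1 2)
The remaining effects follow from the zero-sum constraints: $\hat{\theta}_{13} = 0$, $\hat{\theta}_{21} = -1.5$, $\hat{\theta}_{22} = 1.5$, and $\hat{\theta}_{23} = 0$.
$SS_{AB} = 3[(1.5)^2 + (-1.5)^2 + (-1.5)^2 + (1.5)^2] = 27$
$MS_{AB} = SS_{AB}/[(I-1)(J-1)] = 27/[(2-1)(3-1)] = 13.5$
$F(AB) = MS_{AB}/MS_W = 13.5/4.33 = 3.12$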


The critical value for 2,12 degrees of freedom is 3.88; hence, F(AB)(2,12) = 3.12, p > .05. The null hypothesis was not rejected.
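The hand calculations for the balanced 2 X 3 design can be checked with a short Python sketch. It assumes a balanced layout (equal n per cell) and uses the cell scores from the worked example; the function name `two_way_anova` and the dictionary layout are illustrative choices, not part of SPSSX or SAS.

```python
# Cell scores keyed by (row, column): rows are hypnotic susceptibility
# (1 = high, 2 = low), columns are the three treatments.
cells = {
    (1, 1): [62, 66, 67], (1, 2): [63, 59, 58], (1, 3): [64, 65, 63],
    (2, 1): [56, 60, 58], (2, 2): [61, 58, 58], (2, 3): [62, 60, 58],
}


def mean(xs):
    return sum(xs) / len(xs)


def two_way_anova(cells):
    """Sums of squares and F ratios for a balanced fixed-effects I x J design."""
    rows = sorted({i for i, _ in cells})
    cols = sorted({j for _, j in cells})
    n = len(next(iter(cells.values())))  # common cell size (balanced design assumed)
    I, J = len(rows), len(cols)
    cell_mean = {k: mean(v) for k, v in cells.items()}
    grand = mean([x for v in cells.values() for x in v])
    row_mean = {i: mean([cell_mean[(i, j)] for j in cols]) for i in rows}
    col_mean = {j: mean([cell_mean[(i, j)] for i in rows]) for j in cols}

    ss_a = n * J * sum((row_mean[i] - grand) ** 2 for i in rows)
    ss_b = n * I * sum((col_mean[j] - grand) ** 2 for j in cols)
    ss_ab = n * sum(
        (cell_mean[(i, j)] - row_mean[i] - col_mean[j] + grand) ** 2
        for i in rows for j in cols
    )
    ss_w = sum((x - cell_mean[k]) ** 2 for k, v in cells.items() for x in v)
    ms_w = ss_w / (n * I * J - I * J)  # df = N - IJ
    return {
        "SSA": ss_a, "SSB": ss_b, "SSAB": ss_ab, "SSW": ss_w, "MSW": ms_w,
        "F_A": (ss_a / (I - 1)) / ms_w,
        "F_B": (ss_b / (J - 1)) / ms_w,
        "F_AB": (ss_ab / ((I - 1) * (J - 1))) / ms_w,
    }


result = two_way_anova(cells)
for name, value in result.items():
    print(f"{name} = {value:.2f}")
```

Because the sketch carries MSW at full precision rather than the rounded 4.33, F(A) comes out as about 16.62, which agrees with the computer printout rather than the hand-rounded 16.63.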

Table 18
Control Lines for Running a Two-way ANOVA with Equal n's on SPSSX

Title "two way ANOVA equal ns"
Data List/gpid1 1 gpid2 3 dep 5-6
Begin data
1 1 62
1 1 66
1 1 67
1 2 63
1 2 59
1 2 58
1 3 64
1 3 65
1 3 63
2 1 56
2 1 60
2 1 58
2 2 61
2 2 58
2 2 58
2 3 62
2 3 60
2 3 58
End data
Manova dep by gpid1(1,2) gpid2(1,3)/
Design/
List variables=gpid1 gpid2 dep

1 - The Manova line produces the factorial ANOVA. The numbers in parentheses are the levels being used for each factor.
2 - Design/ specifies the design, which is gpid1,gpid2,gpid1*gpid2/. For a full model only design/ is needed.
3 - List provides a listing of the data.

Table 19
Below is the selected output from the SPSSX run.

Tests of Significance for dep using unique sum of squares

Source of Variation       SS        DF       MS           F        Sig of F
Within Cells             52.00      12        4.33
Constant              66978.00       1    66978.00    15456.46      .000
GPID1                    72.00       1       72.00       16.62      .002*
GPID2                    21.00       2       10.50        2.42      .131
GPID1 By GPID2           27.00       2       13.50        3.12      .081

*Indicates that statistical significance is reached since the two-tailed probability level is less than .05.
Run this design on SPSSX, and compare your results with the ones listed above from both the computer run and the actual calculations.
6.10 ONE-WAY ANALYSIS OF COVARIANCE
Earlier it was noted that the analysis of covariance (ANCOVA) is used for randomized pre-post control group designs. Suppose 21 test anxious subjects who scored above the mean of the TAI are pretested and randomly assigned to a covert modeling group (Treatment 1), a systematic desensitization group (Treatment 2), and a group to monitor study behavior (Treatment 3). Provided that the assumptions of ANCOVA are met, ANCOVA would be the appropriate statistical analysis.
Schematically, the design is presented below:

Table 20
One-Way Analysis of Covariance

      Treatment 1                Treatment 2                Treatment 3
Pretest   X   Posttest     Pretest   X   Posttest     Pretest   X   Posttest
  O1            O2           O1            O2           O1            O2
  53            56           54            58           53            56
  51            54           55            59           52            57
  53            55           55            57           52            57
  51            53           54            59           53            57
  52            54           53            58           54            58
  51            53           51            55           51            55
  54            56           52            57           54            57

O1 is the covariate or pretest.
O2 is the dependent variable or posttest.
X indicates treatment.
ANCOVA blends techniques of regression with ANOVA, which permits statistical rather than experimental control of variables.
ANCOVA involves the use of a pretest or covariate, which is the variable to be controlled, and a posttest (criterion). The pretest is correlated with the posttest, thereby permitting one variable to predict the other.
ANCOVA determines the proportion of the variance of a posttest that existed prior to experimentation; this proportion is removed from the final statistical analysis. ANCOVA is used to adjust the posttest means on the basis of the pretest (covariate) means. Then the adjusted means [$\bar{y}_i^* = \bar{y}_i - b(\bar{x}_i - \bar{x}_g)$] are compared to see if they are significantly different from each other. In the above equation for the adjusted mean, $\bar{y}_i^*$ equals the adjusted mean of the posttest or dependent variable, $\bar{y}_i$ is the mean of the dependent variable scores for group i, b is the slope or regression coefficient, and $\bar{x}_i$ is the mean of the covariate scores for group i. Finally, $\bar{x}_g$ is the grand mean of the covariate scores. Like ANOVA, ANCOVA is a linear model with the following structural model:

$$y_{ij} = \mu + \alpha_j + b(x_{ij} - \bar{x}_g) + e_{ij}$$

The only difference between this linear model and that of the one-way ANOVA is the regression expression $b(x_{ij} - \bar{x}_g)$, which is used to find the regression of Y on X. Also, it is used in the regression equation to predict Y values from X values.
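The adjusted-mean equation can be illustrated with the pretest and posttest scores from Table 20. The Python sketch below assumes that b is the pooled within-group regression slope (the within-group sum of crossproducts divided by the within-group sum of squares on the covariate); the function name `adjusted_means` and the data layout are illustrative assumptions.

```python
# (pretest, posttest) pairs for each treatment group, from Table 20.
data = {
    1: [(53, 56), (51, 54), (53, 55), (51, 53), (52, 54), (51, 53), (54, 56)],
    2: [(54, 58), (55, 59), (55, 57), (54, 59), (53, 58), (51, 55), (52, 57)],
    3: [(53, 56), (52, 57), (52, 57), (53, 57), (54, 58), (51, 55), (54, 57)],
}


def mean(xs):
    return sum(xs) / len(xs)


def adjusted_means(groups):
    """Return the pooled within-group slope b and the adjusted posttest means."""
    grand_x = mean([x for pairs in groups.values() for x, _ in pairs])
    sp_within = 0.0   # pooled within-group sum of crossproducts
    ss_within_x = 0.0  # pooled within-group sum of squares on the covariate
    group_means = {}
    for gid, pairs in groups.items():
        mx = mean([x for x, _ in pairs])
        my = mean([y for _, y in pairs])
        sp_within += sum((x - mx) * (y - my) for x, y in pairs)
        ss_within_x += sum((x - mx) ** 2 for x, _ in pairs)
        group_means[gid] = (mx, my)
    b = sp_within / ss_within_x
    adjusted = {gid: my - b * (mx - grand_x) for gid, (mx, my) in group_means.items()}
    return b, adjusted


b, adjusted = adjusted_means(data)
print(f"pooled within-group slope b = {b:.4f}")
for gid in sorted(adjusted):
    print(f"group {gid}: adjusted posttest mean = {adjusted[gid]:.2f}")
```

With these data the adjusted means round to 54.89, 57.08, and 56.75, the values used later in the post hoc comparisons for ANCOVA.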
ANCOVA is a statistical method of controlling for the selection threat to internal validity. Note that it is only controlling for covariate(s) within an experiment; therefore, groups could conceivably differ on other variables not used in the experiment. Hence, ANCOVA is not a complete control for the selection threat like randomization. Earlier it was reported that the dependent measure t-test is more powerful than the t-test for independent samples. Similarly, when comparing ANCOVA to ANOVA,
ANCOVA is a more powerful test statistic. Like ANOVA, ANCOVA has assumptions; three are the same as those of ANOVA and three are unique.
Assumptions of ANCOVA
1. Normality
2. Homogeneity of variance
3. Independence
4. Linearity: a linear relationship between the covariate and dependent variable.
5. One covariate: the assumption is homogeneity of regression slopes.
   Two covariates: parallelism of regression planes.
   More than two covariates: homogeneity of regression hyperplanes.
6. The covariate is measured without error.
The first three assumptions were discussed with ANOVA; therefore, the consequences of violating them are the same as those that occurred with ANOVA. Even though this is not an assumption of ANCOVA, Stevens (1990, p. 163) recommends limiting the number of covariates to the extent that the following inequality holds:

$$\frac{C + (J - 1)}{N} < .10$$

C is the number of covariates.
J is the number of groups.
N is the total sample size.
If the above inequality holds, the adjusted means are likely to be stable. That is, the results should be reliable and should cross-validate (Stevens, 1990, p. 163). The Johnson-Neyman technique is recommended when the homogeneity of regression slopes assumption is violated (Stevens, ibid).
Shortly, the control lines for running the ANCOVA on SPSSX will be presented. It is important to check the linearity and homogeneity assumptions from the SPSSX computer printout.

Table 21
Control lines for running one-way ANCOVA

Title "one-way ANCOVA"
Data List List/gpid pretest posttest
Begin data
1 53 56
1 51 54
1 53 55
1 51 53
1 52 54
1 51 53
1 54 56
2 54 58
2 55 59
2 55 57
2 54 59
2 53 58
2 51 55
2 52 57
3 53 56
3 52 57
3 52 57
3 53 57
3 54 58
3 51 55
3 54 57
End data
Manova pretest posttest by gpid(1,3)/
analysis=posttest with pretest/
print=pmeans/
design/
analysis=posttest/
design=pretest,gpid,pretest by gpid/
analysis=pretest/

1 - List (in Data List List) tells the computer that the data will be in the order of the Data List command. The only requirement of the List format is that the data values be separated by a space.
2 - In analysis=posttest with pretest/, the covariate follows the keyword with.
3 - print=pmeans/ provides adjusted means.
4 - design=pretest,gpid,pretest by gpid/ tests homogeneity of regression slopes. The format for this design subcommand is covariate, grouping variable, covariate by grouping variable.
5 - analysis=pretest/ tests whether the groups differ significantly on the pretest.

Table 22
SPSSX Control Lines for Testing the Homogeneity Assumption with Two Covariates
For two covariates, the control lines for testing the homogeneity of regression assumption are as follows:
analysis=posttest/
design=pretest1+pretest2,gpid,pretest1 by gpid+pretest2 by gpid/

Note that when multiple covariates (two or more) are employed, simple linear regression is no longer applied. This situation requires multiple regression, a statistical procedure that finds predictors which are maximally correlated with a dependent variable. Logically, the formula for the adjusted means becomes:

$$\bar{y}_j^* = \bar{y}_j - b_1(\bar{x}_{1j} - \bar{x}_1) - b_2(\bar{x}_{2j} - \bar{x}_2) - \dots - b_k(\bar{x}_{kj} - \bar{x}_k)$$

The $b_i$ are regression coefficients, $\bar{x}_{1j}$ is the mean for covariate 1 in group j, $\bar{x}_{2j}$ is the mean for covariate 2 in group j, and so on. Finally, $\bar{x}_1, \bar{x}_2, \dots, \bar{x}_k$ are the grand means for the covariates.

SAS ANCOVA
Data Ancova;
Input gpid pretest posttest @@;
Cards;
1 53 56 1 51 54 1 53 55 1 51 53 1 52 54 1 51 53 1 54 56
2 54 58 2 55 59 2 55 57 2 54 59 2 53 58 2 51 55 2 52 57
3 53 56 3 52 57 3 52 57 3 53 57 3 54 58 3 51 55 3 54 57
Proc Print;
Proc GLM;
Class gpid;
Model posttest = pretest gpid gpid*pretest;
Proc GLM;
Class gpid;
Model posttest = pretest gpid;
LSMeans gpid/PDIFF;

Table 23
Selected Output from SPSSX ANCOVA run
Tests of significance for posttest using the unique sum of squares

Source of variation     SS       DF      MS        F       Sig of F
Within cells           10.30     17       .61
Regression             16.56      1     16.56     27.32     .000*
Constant                3.13      1      3.13      5.16     .036
GPID                   16.93      2      8.47     13.97     .000**

*This indicates a significant relationship between the covariate and the dependent variable.
**These are the main results of ANCOVA. The results indicate that the adjusted population means are unequal, indicating statistical significance.

Tests of significance for pretest using the unique sum of squares

Source of variation     SS       DF      MS        F       Sig of F
Within+residual         9.63     15       .64
Constant                2.88      1      2.88      4.49     .051
Pretest                15.67      1     15.67     24.40     .000
GPID                     .83      2       .41       .65     .538
Pretest by GPID          .67      2       .33       .52     .605***

***This is the test of homogeneity of regression slopes. The slopes are not significantly different; hence, the assumption is tenable.

Tests of significance for pretest using the unique sum of squares

Source of variation     SS          DF      MS           F        Sig of F
Within Cells            30.00       18       1.67
Constant             58460.19        1   58460.19      35076      .000
GPID                     5.81        2       2.90       1.74      .203****

****This indicates that the three groups did not differ on the pretest.

The F ratio for ANCOVA = (SSb*/(K-1))/(SSw*/(N-K-C)) = MSb*/MSw*
C is the number of covariates, and one degree of freedom for the error term is lost for every covariate employed.

Exercises
1. Run the data given with the one-way ANCOVA example. First, run the data as a simple one-way ANOVA, omitting the covariate. On the second run, include the covariate. Compare your results.
2. Use the following equation to verify the adjusted means from the ANCOVA run of exercise 1: $\bar{y}_i^* = \bar{y}_i - b(\bar{x}_i - \bar{x}_g)$. Compare your results.

6.11 POST HOC PROCEDURES FOR ANCOVA
Stevens (1990) and Kirk (1982) recommend the Bryant-Paulson procedure, which is a generalization of the Tukey procedure, as a post hoc test for ANCOVA. There are two formulas for this procedure, one for randomized designs and another for non-randomized designs. The randomized and nonrandomized formulas are given below:
randomized design:

$$\frac{\bar{y}_i^* - \bar{y}_j^*}{\sqrt{MS_w^*\left[1 + MS_{bx}/SS_{wx}\right]/n}}$$

nonrandomized design:

$$\frac{\bar{y}_i^* - \bar{y}_j^*}{\sqrt{MS_w^*\left(\left[2/n + (\bar{x}_i - \bar{x}_j)^2/SS_{wx}\right]/2\right)}}$$

n is the common group size.
If the group sizes are unequal, the harmonic mean is used.
$MS_w^*$ is the error term for the covariance.
$MS_{bx}$ and $SS_{wx}$ are the between mean square and the within sum of squares from the analysis of variance on the covariate only.
Let us conduct three post hoc tests for the ANCOVA example, assuming a nonrandomized design.
Groups 1 and 2

$$\frac{54.89 - 57.08}{\left(.61\left[2/7 + (54.43 - 57.57)^2/30\right]/2\right)^{1/2}} = -5.06$$

Groups 2 and 3

$$\frac{57.08 - 56.75}{\left(.61\left[2/7 + (57.57 - 56.71)^2/30\right]/2\right)^{1/2}} = 1.07$$

Groups 1 and 3

$$\frac{54.89 - 56.75}{\left(.61\left[2/7 + (54.43 - 56.71)^2/30\right]/2\right)^{1/2}} = -4.97$$

With an alpha level of .05, one covariate, three groups, and 17 (N-J-C) error degrees of freedom, the critical value of the Bryant-Paulson procedure is 3.68. This value was found in Table E. Therefore, we can conclude that groups 1 and 2 and groups 1 and 3 are significantly different, while groups 2 and 3 are not significantly different.
6.12 SPSSX CONTROL LINES FOR FACTORIAL ANALYSIS OF COVARIANCE
The control lines for factorial analysis of covariance are basically a generalization of the ones for one-way ANCOVA and factorial ANOVA; therefore, factorial ANCOVA is a logical combination of factorial ANOVA and the one-way ANCOVA. The following are the control lines for running a factorial ANCOVA design.

Title "Factorial Ancova"
Data List Free/gpid1 gpid2 dep covar
List
Begin data
1 1 95 40
1 1 70 20
1 2 80 30
1 2 40 90
1 3 40 70
1 3 30 90
2 1 50 70
2 1 80 70
2 2 90 70
2 2 35 95
2 3 95 50
2 3 85 35
End data
Manova covar dep by gpid1 (1,2) gpid2 (1,3)/
Method = SSTYPE (Unique)/
Analysis = dep with covar/
Print = pmeans/
Design/
Analysis = dep/
Design = covar, gpid1, gpid2, covar by gpid1 gpid2/
Analysis = covar/
6.13 NESTED DESIGNS
We stated in Chapter 4 that nested designs are called incomplete designs. Usually, one level of one factor is paired with one level of another factor. This is called a completely nested design. Suppose we were interested in the extent that students' test anxiety depended on their schools and teachers. We get three schools to participate in our study, and each school has two teachers who are willing to participate. Next, we are able to randomly assign three students to teachers' classes during an exam phase. We can treat the schools as levels of Factor A and the teachers as levels of Factor B, and we have a 6 X 3 design with three test anxiety scores in each cell. Essentially, students are nested within classes and schools. Schematically, this design can be depicted as:

6 X 3 Nested Design

          A1              A2              A3
B1    70, 68, 64
B2    69, 70, 70
B3                    64, 68, 64
B4                    62, 62, 59
B5                                    63, 66, 63
B6                                    59, 54, 54

Each level of Factor B is associated or paired with one level of A; in contrast, each level of Factor A is connected with two levels of Factor B.
Below are the SPSSX control lines for running this design.

SPSSX Nested Design

Title "Nested design for TAI scores"
Data list free/Teachers Schools dep
List
Begin data
1 1 70
1 1 68
1 1 64
2 1 69
2 1 70
2 1 70
3 2 64
3 2 68
3 2 64
4 2 62
4 2 62
4 2 59
5 3 63
5 3 66
5 3 63
6 3 59
6 3 54
6 3 54
End data
Manova dep by Teachers (1,6) Schools (1,3)/
Design = Schools, Teachers, Teachers within Schools
The within keyword indicates that teachers were nested within schools.
To summarize, we described a 6 X 3 nested design. The term nested suggests that observations will not be in every cell; hence, these are called incomplete designs. Factor B was nested within Factor A. The design notation for this would be B(A). The following are the SAS control lines for nested ANOVA, and a discussion of how to calculate the sum of squares is provided.

SAS Nested Design
Data Nested;
Input Teachers Schools Dep;
Cards;
Data Lines
Proc ANOVA;
Class Teachers Schools;
Model Dep = Teachers|Schools|Teachers(Schools);
Means Teachers|Schools;
Test H=Teachers Teachers*Schools E=Teachers*Schools(Schools);
The sum of squares for a nested factor ($SS_{nf}$) can be calculated as a residual or through subtraction using the following:
$SS_{nf} = SS_{cells} - SS_b = SS_t - SS_b - SS_w$
For example, if $SS_t$ = 235.00, $SS_b$ = 112.50, and $SS_w$ = 18.50, then $SS_{nf}$ = 104, which is the sum of squares for the nested factor. For fixed effects models, the nested factor is not a random effect, and the mean square within cells is used for post hoc and planned comparisons. For mixed models, the nested factor is a random effect, and the appropriate error term is the mean square for the nested factor. In summary, careful thought must be used in determining the appropriate error term for nested designs.
Finally, nested designs do not allow one to separate main effects from interaction effects. If effects cannot be separated, they are said to be confounded. When a design is not completely crossed, some factors will be confounded. The table that follows provides the degrees of freedom for the nested ANOVA.

Degrees of Freedom for Nested ANOVA

Source    Df
A         I-1
B(A)      I(J-1)
Error     N-IJ
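The subtraction rule for the nested sum of squares can be illustrated with the teacher-within-school TAI scores from the nested design example. The following Python sketch is only illustrative: the function name `nested_ss` and the dictionary layout are assumptions, I is taken as the number of schools (Factor A), and J as the number of teachers within each school (Factor B).

```python
# TAI scores keyed by school (Factor A), then by teacher (Factor B, nested in A).
data = {
    1: {1: [70, 68, 64], 2: [69, 70, 70]},
    2: {3: [64, 68, 64], 4: [62, 62, 59]},
    3: {5: [63, 66, 63], 6: [59, 54, 54]},
}


def mean(xs):
    return sum(xs) / len(xs)


def nested_ss(design):
    """Sums of squares and degrees of freedom for a balanced two-level nested design."""
    all_scores = [x for teachers in design.values() for t in teachers.values() for x in t]
    grand = mean(all_scores)
    ss_total = sum((x - grand) ** 2 for x in all_scores)
    ss_a = ss_b_in_a = ss_w = 0.0
    for teachers in design.values():
        school_scores = [x for t in teachers.values() for x in t]
        school_mean = mean(school_scores)
        ss_a += len(school_scores) * (school_mean - grand) ** 2
        for scores in teachers.values():
            teacher_mean = mean(scores)
            ss_b_in_a += len(scores) * (teacher_mean - school_mean) ** 2
            ss_w += sum((x - teacher_mean) ** 2 for x in scores)
    n_a = len(design)                                   # I
    n_b_per_a = len(next(iter(design.values())))        # J
    df = {"A": n_a - 1, "B(A)": n_a * (n_b_per_a - 1),
          "Error": len(all_scores) - n_a * n_b_per_a}
    return {"SSA": ss_a, "SSB(A)": ss_b_in_a, "SSW": ss_w, "SST": ss_total}, df


ss, df = nested_ss(data)
print({k: round(v, 3) for k, v in ss.items()})
print(df)
# The nested sum of squares also falls out by subtraction: SST - SSA - SSW.
print(round(ss["SST"] - ss["SSA"] - ss["SSW"], 3))
```

In this sketch the sum of squares for schools plays the role of $SS_b$ in the subtraction formula, so the directly computed nested term and the residual obtained by subtraction agree.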

6.14 SUMMARY
In summary, this chapter described three sampling distributions, or theoretical probability distributions, that are important for conducting research on test anxiety. First, we described the normal distribution. It should be understood that the normal distribution is a special case of the t-distribution. For example, using Table A, a z-score of 1.96 cuts off 95% of the area under the normal curve; one of 2.58 cuts off 99% of the area.
When 95% of the area under the normal curve is cut off, the remaining 5% is equally distributed between the two tails, with 2.5% in each tail. A similar result holds for the z that corresponds to 99% of the area under the normal curve. Using Table A, if one locates a two-tailed t value with infinite degrees of freedom at the .05 level, it becomes apparent that this value equals the z score at the .05 level of significance, which is 1.96. The same thing occurs for a two-tailed critical value of t at the .01 level, where the critical value is 2.58.
In this chapter, the second distribution described that is related to the normal curve is the t-distribution. It was noted earlier that a t value squared equals F. Using Table C, if we locate an F value with 1 and 6 degrees of freedom at the .05 level of significance, the value equals 5.99.
Now, if we locate a critical value of t with 6 degrees of freedom at the .05 level of significance for a two-tailed test, the value is 2.447, which is the square root of F.
The third distribution described in this chapter was the F-distribution, which is related to a distribution called Chi-Square. If a critical value from the F-distribution, with the degrees of freedom of the denominator set at infinity, is multiplied by the degrees of freedom of the numerator, the result is a critical value of the Chi-Square distribution with the same degrees of freedom as the numerator of the F-distribution. For example, the critical value of F with 6 and infinite degrees of freedom at the .05 level of significance is 2.10. If we multiply 2.10 by 6, the result is 12.6; this is the critical value of Chi-Square with 6 degrees of freedom at the .05 level of significance. The exact value of Chi-Square found in Table J is 12.592; but when rounded to the nearest tenth, it equals 12.6. In summary, there is a relationship among the normal, t, F, and Chi-Square distributions. Figure 16 depicts the t distribution, and Figures 17 and 18 illustrate the F and Chi-Square distributions, respectively.
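The relationships summarized above can be checked with a few lines of arithmetic. The following is only a sketch: the z values are computed with Python's standard library, while the t, F, and Chi-Square critical values are simply the tabled values quoted in the text, entered by hand (the variable names are ours, not the book's).

```python
from statistics import NormalDist

# Two-tailed critical z values from the standard normal distribution.
z_05 = NormalDist().inv_cdf(1 - 0.05 / 2)   # alpha = .05, two-tailed
z_01 = NormalDist().inv_cdf(1 - 0.01 / 2)   # alpha = .01, two-tailed

# Tabled critical values quoted in the summary (entered by hand, not computed).
t_crit_6df = 2.447      # two-tailed t, 6 df, alpha = .05
f_crit_1_6 = 5.99       # F with 1 and 6 df, alpha = .05
f_crit_6_inf = 2.10     # F with 6 and infinite df, alpha = .05
chi2_crit_6 = 12.592    # Chi-Square with 6 df, alpha = .05

t_squared = t_crit_6df ** 2        # should be close to F(1, 6)
chi2_from_f = f_crit_6_inf * 6     # should be close to Chi-Square(6)

print(round(z_05, 2), round(z_01, 2))
print(round(t_squared, 2), round(chi2_from_f, 1))
```

Small discrepancies in the last decimal place are expected because the tabled values are themselves rounded.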

Figure 16
t-distribution (horizontal axis: values of t)

Figure 17
F distribution (critical value of F marked)

Figure 18
Chi-Square distribution

References
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (3rd ed.). New York: Academic Press.
Harwell, M. (1998). Misinterpreting interaction effects in analysis of variance. Measurement and Evaluation in Counseling and Development, 31(2), 125-136.
Hays, W. (1981). Statistics (3rd ed.). New York: Holt, Rinehart and Winston.
Heppner, P. P., Kivlighan, D. M., & Wampold, B. E. (1992). Research design in counseling. Pacific Grove, CA: Brooks/Cole.
Huck, S. W., Cormier, W. L., & Bounds, W. G. (1974). Reading statistics and research. New York: Harper and Row.
Kirk, R. (1982). Experimental design: Procedures for the behavioral sciences (2nd ed.). Pacific Grove, CA: Brooks/Cole.
Kirk, R. (1995). Experimental design: Procedures for the behavioral sciences (3rd ed.). Pacific Grove, CA: Brooks/Cole.
Rummel, R. J. (1970). Applied factor analysis. Evanston, IL: Northwestern University Press.
Shavelson, R. J. (1981). Statistical reasoning for the behavioral sciences. Boston: Allyn and Bacon.
Siegel, S. (1956). Nonparametric statistics for the behavioral sciences. New York: McGraw-Hill.
SPSSX User's guide (3rd ed.). (1988). Chicago: SPSS, Inc.
Stevens, J. P. (1990). Intermediate statistics: A modern approach. Hillsdale, NJ: Lawrence Erlbaum.
Welkowitz, J., Ewen, R. B., & Cohen, J. (1982). Introductory statistics for the behavioral sciences (3rd ed.). New York: Academic Press.
Winer, B. J. (1971). Statistical principles in experimental design. New York: McGraw-Hill.

7. Multivariate Research Statistical Methodology Using SPSSX and SAS 161

Chapter 7
CONTENTS
Multivariate Research Statistical Methodology Using SPSSX Computer Software:
7.1 One-Group Repeated Measures ANOVA
7.2 Assumptions of Repeated Measures
7.3 Violations of Sphericity Assumption
7.4 Controversy in Calculations of Greenhouse Epsilon Statistic
7.5 Tukey Post Hoc Procedure for One-Group Repeated Measures Design
7.6 Tukey Confidence Intervals for One-Group Repeated Measures ANOVA
7.7 One-Group Repeated Measures ANOVA Exercises
7.8 Multiple Regression
7.9 Steps for Cross-Validating Regression Equations
7.10 Number of Recommended Subjects Per Predictor for Regression Equations
7.11 Relationship Among R2, Y, and F
7.12 Assumptions of Multiple Regression
7.13 When Regression Assumptions Appear to Be Violated
7.14 Six Methods of Selecting Predictors and Regression Models
7.15 SPSSX Control Lines for Running the Backward Elimination Process
7.16 Multiple Regression Exercises
7.17 K Group MANOVA
7.18 Assumptions of MANOVA
7.19 SPSSX for K Group MANOVA
7.20 K Group MANOVA Exercises
7.21 Factorial Multivariate Analysis of Variance
7.22 Factorial MANOVA Exercises
7.23 Multivariate Analysis of Covariance: Three Covariates and Three Dependent Variables
7.24 Factorial MANCOVA: One Covariate and Two Dependent Variables
7.25 One-Way MANCOVA Exercises
7.26 Post Hoc Procedures for MANCOVA
7.27 Nested MANOVA
7.28 Summary
7.29 Choosing a Statistical Procedure
7.30 Statistical Application Exercises

7.1 ONE-GROUP REPEATED MEASURES ANOVA
The one-group repeated measures ANOVA (analysis of variance) can be viewed as an extension of the repeated measures t-test. With the repeated measures t-test, one group of subjects is measured at two points in time. With the repeated measures ANOVA, however, one group of subjects is measured at three or more points in time. Just as the repeated measures t-test is more powerful than the independent measures t-test, the repeated measures ANOVA is more powerful than the ANOVA for k independent groups. The reader should note that k independent group designs are sometimes referred to as between-group designs.
Note that there are many names for the repeated measures ANOVA, including Lindquist Type I, blocked designs, split-plot ANOVA, within-subjects design with repeated measures on one factor, two-way ANOVA with repeated measures on one factor, treatments-by-subjects designs, and mixed designs.
In many situations, the repeated measures design is the only design of choice. For example, the repeated measures ANOVA is useful for measuring educational performance over time. Similarly, with clinical populations where subjects' availability is limited, repeated measures designs are the ones of choice. Unlike between-subjects or completely randomized designs, repeated measures designs require fewer subjects because the same subjects serve as their own controls. Not only is this beneficial in reducing the number of subjects required, but it also reduces within-group variability, or individual differences among subjects.
Essentially, with the repeated measures ANOVA, variability due to individual differences is removed from the error term. This is what makes the repeated measures design more powerful than a between-groups design or a completely randomized design, where different subjects are randomly assigned to different treatments.
7.2 ASSUMPTIONS OF REPEATED MEASURES ANOVA
The three assumptions of the repeated measures design are as follows:
1. Independence of observations
2. Multivariate normality
3. Sphericity or circularity: homogeneity of variance for the difference scores for all pairs of repeated measures, or equality of variances for all difference variables. A measure of this assumption is found on the SPSSX and SAS printout as the Greenhouse-Geisser Epsilon. The assumption is tenable if the Greenhouse-Geisser Epsilon equals one. The worst possible violation of the sphericity assumption occurs when the value of the Greenhouse-Geisser Epsilon = 1/(k-1), where k is the number of repeated measures (Greenhouse & Geisser, 1959; Keppel, 1982; Stevens, 1990).
The reader can recall that the first assumption, independence of observations, was discussed in chapter 6. It was noted that a violation of this assumption is serious. The same holds true for a repeated measures ANOVA. Also, recall from chapter 6 that the independent t-test and the independent k group ANOVA are robust to the violation of the normality assumption. Shortly, the reader will notice a similar finding for the repeated measures ANOVA. In terms of multivariate normality, Stevens (1986, p. 205) and Johnson and Wichern (1988, pp. 120-168) stated three necessary conditions for its tenability. First, each variable must approach normality. Second, linear combinations of any variables are normally distributed. Third, all subsets of variables have a multivariate normal distribution. That is, all pairs of variables must be bivariate normal. Do not be concerned with the multivariate normality assumption, since Stevens (1986, p. 207) and Bock (1975, pp. 104-163) presented evidence that the repeated measures ANOVA is robust to the violation of this assumption, which is the same thing that occurred in chapter 6 with the t-test for independence and the F-test for k independent groups (Stevens, 1996).
7.3 VIOLATIONS OF THE SPHERICITY ASSUMPTION
If the sphericity assumption is violated, the type I error rate of the repeated measures ANOVA is positively biased. That is, the null hypothesis is rejected falsely too often. When this assumption is violated, Huynh and Feldt (1976) and Stevens (1986, p. 413) recommend adjusting the degrees of freedom of the numerator and denominator from (k-1) and (k-1)(n-1), respectively, to ε(k-1) and ε(k-1)(n-1), where k is the number of repeated measures or the number of treatments, n is the number of subjects within a treatment, and ε is the Greenhouse-Geisser Epsilon statistic obtained from the SPSSX printout. These new degrees of freedom are used to locate the critical value of F to test for statistical significance. Other tests of sphericity can be obtained from SPSSX; however, Stevens (1986, p. 414) and Keselman, Rogan, Mendoza, and Breen (1980) argue that these tests, like their univariate counterparts, are sensitive to violations of multivariate normality; therefore, these tests are not recommended.
7.4 CONTROVERSY IN CALCULATIONS OF GREENHOUSE EPSILON STATISTIC
It should be noted that a controversy has arisen over the calculation of the Greenhouse-Geisser Epsilon statistic. In the Journal of Educational Statistics, Lecoutre (1991, p. 371) reported that the routine formula used to calculate the Greenhouse-Geisser Epsilon statistic, when the condition of circularity is not fulfilled, may lead to a substantial underestimation of the deviation from circularity, especially when the number of subjects is small. Similarly, he reported that this routine formula, which is the one used by SPSSX, is erroneous in the case of two or more groups. Therefore, he recommended the following corrected formula when the number of groups (g) ≥ 2:
$$\tilde{\varepsilon} = \frac{(N-g+1)\,r\,\hat{\varepsilon} - 2}{r\,(N-g-r\,\hat{\varepsilon})}$$
Where N is the total number of subjects, g is the number of groups, and r is the number of orthogonal (uncorrelated) normalized variables associated with each within-subject factor and their interactions
Where
$$\hat{\varepsilon} = \frac{k^2(\bar{s}_{ii} - \bar{s})^2}{(k-1)\left(\sum_i \sum_j s_{ij}^2 - 2k\sum_i \bar{s}_i^2 + k^2\bar{s}^2\right)}$$
k is the number of levels for the within variable
$\bar{s}$ is the mean of all entries in the covariance matrix S
$\bar{s}_{ii}$ is the mean of entries on the main diagonal of S
$\bar{s}_i$ is the mean of all entries in row i of S
$s_{ij}$ is the ijth entry of S
Clearly the repeated measures ANOVA requires fewer subjects than does a k independent group design. Similarly, the repeated measures ANOVA removes individual differences from the repeated measures F-test; however, there are disadvantages, such as carry-over effects from one treatment to the next, which can make interpretations difficult. From Chapter 4, the reader can remember that counterbalancing is a method of handling carry-over effects in one-group designs. Suppose we wanted to know the effects of four treatments (supportive counseling, relaxation therapy, systematic desensitization, and hypnosis) on reducing test anxiety in a group of five subjects. Schematically, this one-group repeated measures design can be depicted as:
Table 1
One-Group Repeated Measures Design

                Treatments for Test Anxiety
Subjects        1       2       3       4       Row Means
1               50      48      36      54      47
2               34      38      30      42      36
3               44      40      38      50      43
4               58      54      40      64      54
5               46      48      34      50      44.5
Column Means    46.4    45.6    35.6    52

Grand Mean = 44.9
Overall Standard Deviation = 8.86
Completely Randomized Univariate Repeated Measures Analysis
SSb = Sums of squares for the column means or the between-group sum of squares.
SSb = 5[(46.4-44.9)^2 + (45.6-44.9)^2 + (35.6-44.9)^2 + (52-44.9)^2] = 698.2
SSw = (50-46.4)^2 + (34-46.4)^2 + ... + (46-46.4)^2    Treatment 1
    + (48-45.6)^2 + (38-45.6)^2 + ... + (48-45.6)^2    Treatment 2
    + (36-35.6)^2 + (30-35.6)^2 + ... + (34-35.6)^2    Treatment 3
    + (54-52)^2 + (42-52)^2 + ... + (50-52)^2          Treatment 4
    = 793.60

SSbl, or the sum of squares for blocks, = $K\sum(\bar{y}_i - \bar{y})^2$, where $\bar{y}_i$ is the mean for participant i, $\bar{y}$ is the grand mean, and K is the number of repeated measures.
SSbl can be viewed as the sum of squares for blocks because we are blocking on the participants. Error variability can be calculated by segmenting variability into three parts: SSw, SSbl, and SSres.
SSres is called the sum of squares residual, and SSw can be partitioned into SSbl plus SSres; therefore, SSw = SSbl + SSres, and SSres = SSw - SSbl.
The calculation for the sum of squares for blocks is as follows:
SSbl = $K\sum(\bar{y}_i - \bar{y})^2$
= 4[(47-44.9)^2 + (36-44.9)^2 + (43-44.9)^2 + (54-44.9)^2 + (44.5-44.9)^2]
= 680.80
SSres = SSw - SSbl = 793.60 - 680.80 = 112.80
MSres = SSres/[(n-1)(K-1)], where n equals the number of participants and K is the number of repeated measures; therefore, MSres = 112.80/[4(3)] = 9.40.
F = MSb/MSres = 232.73/9.40 = 24.76, where MSb = SSb/(K-1) = 698.2/3 = 232.73, and the degrees of freedom are (K-1) = 3 and (n-1)(K-1) = 12. This test statistic is significant beyond the .01 level and is about 5 times larger than the F one would obtain from a between-groups or completely randomized design (MSw = 793.60/16 = 49.60, so F = 232.73/49.60 = 4.69).
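The hand calculations above can be reproduced with a short script. This is a minimal sketch in plain Python (the variable names are ours); it partitions the sums of squares for the Table 1 data and also computes the F that the same data would yield if treated as a completely randomized design.

```python
# Scores from Table 1: one row per subject, one column per treatment.
scores = [
    [50, 48, 36, 54],
    [34, 38, 30, 42],
    [44, 40, 38, 50],
    [58, 54, 40, 64],
    [46, 48, 34, 50],
]

n = len(scores)        # number of subjects (blocks)
k = len(scores[0])     # number of repeated measures (treatments)

grand_mean = sum(sum(row) for row in scores) / (n * k)
col_means = [sum(row[j] for row in scores) / n for j in range(k)]
row_means = [sum(row) / k for row in scores]

# Between-treatments, within-treatments, and blocks sums of squares.
ss_b = n * sum((m - grand_mean) ** 2 for m in col_means)
ss_w = sum((row[j] - col_means[j]) ** 2 for row in scores for j in range(k))
ss_bl = k * sum((m - grand_mean) ** 2 for m in row_means)
ss_res = ss_w - ss_bl

ms_b = ss_b / (k - 1)
ms_res = ss_res / ((n - 1) * (k - 1))
f_repeated = ms_b / ms_res

# F for the same data treated as a completely randomized design.
ms_w = ss_w / (k * (n - 1))
f_between = ms_b / ms_w

print(round(ss_b, 2), round(ss_w, 2), round(ss_bl, 2), round(ss_res, 2))
print(round(ms_res, 2), round(f_repeated, 2), round(f_between, 2))
```

Removing the blocks sum of squares from the error term is what raises F from roughly 4.7 to roughly 24.8 for these data.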
Table 2
Control Lines for Running One Group Repeated Measures Design on SPSSX
Title 'Repeated Measures ANOVA'
Data List/TAI1 1-2 TAI2 4-5 TAI3 7-8 TAI4 10-11
List
Begin data
50 48 36 54
34 38 30 42
44 40 38 50
58 54 40 64
46 48 34 50
End data
MANOVA TAI1 to TAI4/
Wsfactor=TAI(4)/
Wsdesign=TAI/
Print=transform cellinfo(means) error(cor) signif(averf)/
Analysis(repeated)/
The numbered annotations on these control lines are as follows:
1 - List gives the listing of the data
2 - Wsfactor=TAI(4) lets the computer know there are four levels of the within factor
3 - Wsdesign=TAI specifies the design used
The annotated printout from SPSSX for the repeated measures ANOVA appears in the tables that follow the SAS control lines.
SAS Repeated Measures ANOVA
Title "Repeated Measures ANOVA";
Data Repeated;
Input gpid1 1 gpid2 2 TAI 4-5;
Cards;
Data Lines
Proc Print;
Proc GLM;
Class gpid1 gpid2;
Model TAI=gpid1 gpid2;

Table 3
Greenhouse-Geisser Epsilon Statistic
Mauchly Sphericity Test, W =      .18650
Chi-Square Approx. =              4.57156 with 5 D.F.
Significance =                    .470
Greenhouse-Geisser Epsilon =      .60487
Huynh-Feldt Epsilon =             1.00000
Lower-Bound Epsilon =             .33333
The Greenhouse-Geisser Epsilon corresponds to ε, indicating a moderate departure from sphericity.

Table 4
Multivariate Tests of Significance
Effect TAI
Test Name     Value     Approx F    Hypoth DF    Error DF    Sig. of F
Pillais       .97707    28.41       3.00         2.00        .034*
Hotellings    42.62     28.41       3.00         2.00        .034*
Wilks         .022      28.41       3.00         2.00        .034*
Roys          .98
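The Greenhouse-Geisser Epsilon on the printout can be reproduced from the covariance-matrix formula given in section 7.4. The sketch below is an illustration in plain Python (variable names are ours): it builds the sample covariance matrix of the four repeated measures for the five subjects, applies the routine formula, and then forms the epsilon-adjusted degrees of freedom.

```python
# Scores from the one-group repeated measures example (rows = subjects).
scores = [
    [50, 48, 36, 54],
    [34, 38, 30, 42],
    [44, 40, 38, 50],
    [58, 54, 40, 64],
    [46, 48, 34, 50],
]
n = len(scores)
k = len(scores[0])
col_means = [sum(row[j] for row in scores) / n for j in range(k)]

# Sample covariance matrix S of the k repeated measures (divisor n - 1).
S = [
    [
        sum((row[i] - col_means[i]) * (row[j] - col_means[j]) for row in scores)
        / (n - 1)
        for j in range(k)
    ]
    for i in range(k)
]

mean_diag = sum(S[i][i] for i in range(k)) / k    # mean of diagonal entries
mean_all = sum(sum(r) for r in S) / (k * k)       # mean of all entries
row_means = [sum(r) / k for r in S]               # mean of each row of S
sum_sq = sum(v * v for r in S for v in r)         # sum of squared entries

numerator = k ** 2 * (mean_diag - mean_all) ** 2
denominator = (k - 1) * (
    sum_sq - 2 * k * sum(m * m for m in row_means) + k ** 2 * mean_all ** 2
)
gg_epsilon = numerator / denominator
lower_bound = 1 / (k - 1)

# Epsilon-adjusted degrees of freedom for the univariate F test.
df_num = gg_epsilon * (k - 1)
df_den = gg_epsilon * (k - 1) * (n - 1)

print(round(gg_epsilon, 5), round(lower_bound, 5))
print(round(df_num, 2), round(df_den, 2))
```

For these data the script should agree with the printed values of the Greenhouse-Geisser and lower-bound epsilons to the decimals shown.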

Table 5
Univariate Tests of Significance
Averaged Test of Significance for TAI Using Unique Sums of Squares
Source of Variation    SS        DF    MS        F        Sig of F
Within Cells           112.80    12    9.40
TAI                    698.20    3     232.73    24.76    .000**

Table 6
Notations from Annotated Output for One Group Repeated Measures ANOVA
*Corresponds to the multivariate tests, which are significant at the .05 level.
**This is the univariate test. We must adjust the degrees of freedom. The adjusted degrees of freedom for the univariate test are .60487(3) and .60487(3)(4), or 1.81461 and 7.25844. Rounding to whole numbers, we have 2 and 7 degrees of freedom. The critical value of F with 2 and 7 degrees of freedom at the .05 level is 4.74, which indicates that the adjusted univariate test is still significant at the .05 level. Note: the subcommand Signif(AVERF UNIV GG HF)/ provides this adjusted univariate test.

7.5 TUKEY POST HOC PROCEDURE FOR ONE-GROUP REPEATED MEASURES DESIGN
If the F-test for a repeated measures ANOVA is found to be significant and the sphericity assumption is tenable, a modification of the Tukey HSD can be used to make pairwise comparisons.
As with the K group ANOVA, the Tukey procedure can be modified to obtain simultaneous confidence intervals. In contrast to the K group Tukey procedure, with the one-sample repeated measures design, MSw is replaced with MSres. This is the error term for the repeated measures ANOVA, and n is still the common group size or the number of subjects.

7.6 TUKEY CONFIDENCE INTERVALS FOR ONE-GROUP REPEATED MEASURES ANOVA
For the one-group repeated measures ANOVA, simultaneous confidence intervals can be obtained with the following formula:
$$(\bar{X}_i - \bar{X}_j) \pm q_{\alpha=.05;\,k,\,(n-1)(k-1)}\sqrt{\frac{MS_{res}}{n}}$$
$\bar{X}_i$ and $\bar{X}_j$ are the means of two groups, and df for error = (n-1)(k-1). If the confidence interval includes 0, we fail to reject the null hypothesis and conclude that the population means are not different.
Using Table D, if alpha is set at .05, q = 4.20 with df = k, (n-1)(k-1) = 4, (5-1)(4-1) = 4, 12. The following confidence intervals can be established:
$(\bar{X}_i - \bar{X}_j) \pm 4.20\sqrt{9.40/5} = (\bar{X}_i - \bar{X}_j) \pm 5.76$

TAI1 vs TAI3
(46.4 - 35.6) = 10.8 ± 5.76
10.8 - 5.76 = 5.04 (lower limit)
10.8 + 5.76 = 16.56 (upper limit)

TAI2 vs TAI3
(45.6 - 35.6) = 10 ± 5.76
10 - 5.76 = 4.24 (lower limit)
10 + 5.76 = 15.76 (upper limit)

TAI3 vs TAI4
(35.6 - 52) = -16.4 ± 5.76
-16.4 - 5.76 = -22.16 (lower limit)
-16.4 + 5.76 = -10.64 (upper limit)
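The Tukey intervals for every pair of treatment means can be generated in one pass. The sketch below is illustrative only: the treatment means, MSres, and the tabled studentized range value q = 4.20 are taken from the example and entered by hand, and the variable names are ours.

```python
from itertools import combinations
from math import sqrt

# Treatment means, error term, and group size from the worked example.
means = {"TAI1": 46.4, "TAI2": 45.6, "TAI3": 35.6, "TAI4": 52.0}
ms_res = 9.40
n = 5
q_crit = 4.20   # tabled studentized range value for k = 4, df = 12, alpha = .05

half_width = q_crit * sqrt(ms_res / n)

intervals = {}
for a, b in combinations(means, 2):
    diff = means[a] - means[b]
    lower, upper = diff - half_width, diff + half_width
    # A pair differs significantly when 0 lies outside the interval.
    significant = not (lower <= 0 <= upper)
    intervals[(a, b)] = (round(lower, 2), round(upper, 2), significant)
    print(a, "vs", b, intervals[(a, b)])
```

With four treatments the loop produces the six nonredundant intervals, including the three worked by hand above.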

All three confidence intervals are significant, since 0 is not included in any of the intervals.
7.7 ONE-GROUP REPEATED MEASURES ANOVA EXERCISES
1. How many nonredundant confidence intervals can be established for the data given at the beginning of this chapter for the repeated measures ANOVA?
(Answer: K(K-1)/2 = 4(3)/2 = 6)

2. With the data given for the repeated measures example, establish a confidence interval for TAI1 vs TAI2. What is the decision in terms of the null hypothesis?
(Answer: The null hypothesis is not rejected since 0 is included in the confidence interval.)
TAI1 vs TAI2
(46.4 - 45.6) = .8 ± 5.76
.8 - 5.76 = -4.96 (lower limit)
.8 + 5.76 = 6.56 (upper limit)

3. With the same data, establish a confidence interval for TAI1 vs TAI4.
(Answer: -11.36 to .16)
(46.4 - 52) = -5.6 ± 5.76
-5.6 - 5.76 = -11.36 (lower limit)
-5.6 + 5.76 = .16 (upper limit)

4. Run the following repeated measures design on SPSSX and perform the appropriate post hoc procedures. Which group means are significantly different?

Subjects    TAI1    TAI2    TAI3
1           50      48      54
2           34      38      42
3           44      40      50
4           58      54      64
5           46      48      50

(Answer: The univariate F test is 12.67, p=.003. The adjusted univariate test is significant at the .05 level, since the adjusted degrees of freedom are ε(k-1) and ε(k-1)(n-1) = .66564(3-1) and .66564(3-1)(5-1) = 1.33128 and 5.32512. The critical value of F with 1 and 5 degrees of freedom is 6.61. Post hoc tests indicate the following differences: TAI3 differed from TAI1 and TAI3 differed from TAI2.)
Solutions for exercise 4
If α = .05, q.05 with df = 3, 8 is 4.041.
4.041 sqrt(MSres/n) = 4.041 sqrt(4.80/5) = 4.041(.9797959) = 3.9593552 = 3.96

TAI1 and TAI3
(46.4 - 52) = -5.6 ± 3.96
-5.6 + 3.96 = -1.64
-5.6 - 3.96 = -9.56
TAI3 and TAI2
(52 - 45.6) = 6.4 ± 3.96
6.4 + 3.96 = 10.36
6.4 - 3.96 = 2.44

Using Table D, if alpha is set at .05, q = 4.041 with df = k = 3 and (n-1)(k-1) = (5-1)(3-1) = 8. Confidence intervals can be established with the following formula:
$$(\bar{X}_i - \bar{X}_j) \pm 4.041\sqrt{\frac{MS_{res}}{n}}$$

7.8 MULTIPLE REGRESSION
The reader can consult Sapp (1997) for a detailed discussion of univariate correlational techniques and regression. The reader should also note that regression is a special case of structural equation modeling. Chapter 9 will provide an introductory discussion of structural equation modeling. With multiple regression, one is interested in using predictors (Xs, also called independent variables) to predict some criterion or dependent variable (the Ys) (Pedhazur, 1982). Essentially, a set of X values is used to predict a dependent variable. For the one-predictor case, the mathematical model is:
Y' = bX + C
b = the slope, which can be expressed as (Y2 - Y1)/(X2 - X1)
C = the Y-intercept
The above equation for Y' is called a regression equation (Pedhazur, 1982, p. 45). The values of X are fixed (predictors); the values of Y are subject to vary.
We are interested in finding a line that best fits the relationship between Y and X; hence, this is called the regression of Y on X. The graph below represents the regression of Y on X.
Regression of Y on X

For the SPSSX computer example that will follow, we will only consider the two-predictor case; however, one can easily generalize to the k-predictor case. The mathematical model for the two-predictor case is (Pedhazur, 1982, p. 46):
Y' = b1X1 + b2X2 + C
b1 is the slope of the dependent variable (Y) on predictor X1, while the second predictor X2 is held constant. Similarly, b2 is the slope of Y on predictor X2, with predictor X1 held constant.
C = the Y intercept
Below is a schematic design for the two-predictor case.
Table 7
Schematic Design for Two-Predictor Case

X1    X2    Y
52    51    58
53    52    57
53    52    58
54    53    55
55    54    54
55    55    55
55    56    53
57    58    55
58    59    53
58    60    52

The Xs are the predictors, and Y is the dependent variable. For this example, X1 represents measures of stress, X2 represents measures of worry, and Y is the dependent variable, the TAI.

The two-predictor case can also be expressed using matrix algebra and parameters or population values. For example, for the two-predictor case, the model is:
y = B0 + B1X1 + B2X2 + e_i
Where B0 is the regression constant or Y intercept, B1 and B2 are the parameters to be estimated, and e_i is the error of prediction.
If we let y be a column vector, XB the product of two matrices, and e a column vector, the traditional matrix equation for multiple regression can be established:
y = XB + e
Using differential calculus, it can be shown that the least squares estimates of the B's are (Finn, 1974, p. 135):
$$\hat{B} = (X'X)^{-1}X'y$$
Note that the least squares criterion minimizes error in prediction. This is done by finding a linear combination of the Xs that is maximally correlated with y. This is why there is shrinkage, or drop-off of predictive power, with regression equations and the reason they must be cross-validated.
7.9 STEPS FOR CROSS-VALIDATING REGRESSION EQUATIONS
There are four steps to cross-validation (Huck, Cormier, & Bounds, 1974, p. 159). First, the original group of subjects for whom the predictor and criterion scores are available is randomly split into two groups. Second, one of the subgroups is used to derive a regression equation. Third, the regression equation is used to predict criterion scores for the group from which it was not derived. Fourth, the predicted criterion scores are correlated with the actual criterion scores. If there is a significant correlation, this indicates there was not shrinkage in predictive power.
7.10 NUMBER OF RECOMMENDED SUBJECTS PER PREDICTOR FOR REGRESSION EQUATIONS
When several predictors are employed in regression with a minuscule sample size, the resulting equation will predict poorly for an independent sample of subjects. If the number of predictors approximates the sample size, R will produce a value close to or equal to 1, even if none of the predictors correlate with the criterion (Borg & Gall, 1983, pp. 602-603). In order to have a reliable regression equation, Stevens (1986, p. 58; 1996) recommends approximately 15 subjects per predictor. He suggests that an n/k ratio of 15/1 is needed to have a regression equation that cross-validates, where n is the sample size and k is the number of predictors. In chapter 6, it was noted that analysis of variance was a special case of regression. Below is the analysis of variance table for regression.

Table 8
Analysis of Variance Table for Regression

Source               SS       df       MS                 F
Regression           SSreg    k        SSreg/k            MSreg/MSres
Residual or Error    SSres    n-k-1    SSres/(n-k-1)

Table 9
Multiple Regression Broken Down into Sums of Squares

sum of squares about the mean       $\sum(Y - \bar{Y})^2$     df = n-1
sum of squares about regression     $\sum(Y_i - Y')^2$        df = n-k-1
sum of squares due to regression    $\sum(Y' - \bar{Y})^2$    df = k
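The least squares estimates and the sums of squares in Tables 8 and 9 can be illustrated with the two-predictor data from Table 7. The sketch below is only an illustration in plain Python: it solves the two normal equations in deviation-score form rather than inverting X'X directly, and it assumes that the three data columns are X1 (stress), X2 (worry), and Y (TAI) in that order. Function and variable names are ours.

```python
# Table 7 data; the column-to-variable assignment is an assumption.
x1 = [52, 53, 53, 54, 55, 55, 55, 57, 58, 58]   # stress
x2 = [51, 52, 52, 53, 54, 55, 56, 58, 59, 60]   # worry
y = [58, 57, 58, 55, 54, 55, 53, 55, 53, 52]    # TAI

n, k = len(y), 2


def mean(values):
    return sum(values) / len(values)


def cross(a, b):
    """Sum of cross-products of deviations from the means."""
    ma, mb = mean(a), mean(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b))


s11, s22, s12 = cross(x1, x1), cross(x2, x2), cross(x1, x2)
s1y, s2y = cross(x1, y), cross(x2, y)

# Solve the two normal equations in deviation form for b1 and b2.
det = s11 * s22 - s12 ** 2
b1 = (s1y * s22 - s12 * s2y) / det
b2 = (s11 * s2y - s12 * s1y) / det
c = mean(y) - b1 * mean(x1) - b2 * mean(x2)

predicted = [b1 * u + b2 * v + c for u, v in zip(x1, x2)]
ss_total = cross(y, y)                                        # about the mean
ss_reg = sum((p - mean(y)) ** 2 for p in predicted)           # due to regression
ss_res = sum((obs - p) ** 2 for obs, p in zip(y, predicted))  # about regression

r_squared = ss_reg / ss_total
f_from_ss = (ss_reg / k) / (ss_res / (n - k - 1))
f_from_r2 = (r_squared / k) / ((1 - r_squared) / (n - k - 1))

print(round(b1, 4), round(b2, 4), round(c, 4))
print(round(ss_total, 4), round(ss_reg, 4), round(ss_res, 4))
print(round(r_squared, 4), round(f_from_ss, 4), round(f_from_r2, 4))
```

The two F values printed on the last line should coincide, since the ratio of mean squares and the ratio built from R2 are algebraically the same quantity.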

7.11 RELATIONSHIP AMONG R2, Y AND F
The squared multiple correlation R2 can be expressed as the ratio of the sum of squares due to regression to the sum of squares about the mean. The following is the algebraic formula for R2:
$$R^2 = \frac{\sum(Y' - \bar{Y})^2}{\sum(Y - \bar{Y})^2}$$
R2 is called the coefficient of determination, or the proportion of variance in Y accounted for by a set of predictors. Bock (1975, p. 184) demonstrated a more direct relationship between R2 and F. The equation is:
$$F = \frac{R^2/k}{(1 - R^2)/(n-k-1)}$$
with df = k and (n-k-1)
7.12 ASSUMPTIONS OF MULTIPLE REGRESSION
The assumptions of multiple regression are as follows:
1. Linearity
2. Homoscedasticity of variance
3. Normality
4. Independence of error
Normality and Independence of Error
Normality and independence of error were discussed in chapter 6. As was the case with the independent t-test and F-test, multiple regression is robust to violations of the normality assumption. In terms of the independence of error assumption, however, a violation creates severe problems, as it did in chapter 6 with the independent t-test and F-test. As was the case in chapter 6, independence implies that subjects are responding independently of each other. Moreover, the independence of error assumption suggests that the residual values e_i = (Y_i - Y'_i) are independent and normally distributed with a mean of zero and constant variance. Stevens (1986, p. 87; 1996) noted that the residuals are only independent when n is large relative to the number of predictors; however, residuals do have different variances. Since the last two assumptions were covered extensively in chapter 6, attention will only be given to the first two assumptions.

Linearity
Linearity is a linear relationship between the predictors and the dependent variable. Scatter diagrams can be used to investigate this assumption. If linearity is violated, other regression techniques must be employed such as ones for curvilinear relationships. An example of such a statistic is the coefficient eta, also called the correlation ratio.
Homoscedasticity
Homoscedasticity of variance, or constant variance, means the variances for columns are equal, and the variances for rows are equal.
This implies that, within a distribution, the scatter is the same throughout; or there is uniformity in spread about the regression line. This assumption suggests that if data is sectioned into columns, the variability of Y is the same from column to column. Similarly, if data is sectioned into rows, the variability of Y would be the same from row to row. Figures 1-3 illustrate equal variability among rows, equal variability among columns and violations of homoscedasticity, respectively.
Tables 11-15 graphically illustrate how to ascertain assumption violations with scatter plots from SPSSX.

Figure 1
Equal Variability among Rows

Figure 2
Equal Variability Among Columns

Figure 3
Violation of Homoscedastic Relationship

Regression assumptions can be checked by using a scatter plot of the standardized e_i's (standardized or studentized residuals) against the Y's (predicted values of Y). If no systematic pattern or clustering of residuals occurs, one can assume that the assumptions are tenable. To standardize residuals, we divide each residual [e_i = (Y - Y')] by the standard deviation of the residuals, which is the standard error of estimate. It can be expressed by the following (Pedhazur, 1982, pp. 28, 36):
$$S_{est} = \sqrt{\frac{\text{sum of squares}}{\text{degrees of freedom}}} = \sqrt{\frac{\sum(Y - Y')^2}{N-k-1}}$$
Where
R2 = the squared multiple correlation coefficient, or the squared Pearson Product-Moment Correlation between a set of predictors and a dependent variable.
Sy = the standard deviation of Y
N = the sample size
k = the number of predictors.

Therefore, the standardized residual (e_i standardized) =
$$\frac{e_i}{\sqrt{\sum(Y - Y')^2/(N-k-1)}} = \frac{e_i}{S_y\sqrt{1 - R^2}}$$

Technically, residuals can be adjusted in two ways: by means of standardized residuals and by means of studentized residuals. Standardized residuals have a mean of 0 and a standard deviation of 1. A studentized residual is a residual divided by an estimate of its standard deviation. This is the reason that standardized and studentized residuals can take on different values with the same data set; however, standardized and studentized residuals are usually close in value.
Residuals are plotted to test the normality assumption. In addition, the studentized residuals should scatter randomly about the horizontal line defined by 0, which is found on the SPSSX printout. The control lines for obtaining residual scatter plots are provided with the control lines for the backward regression example that will soon follow.
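As a rough illustration of standardizing residuals, the sketch below fits the two-predictor equation to the Table 7 data and divides each residual by the standard error of estimate, using the N-k-1 divisor. It is written in plain Python, assumes the three data columns are X1, X2, and Y in that order, and uses our own variable names; it does not reproduce SPSSX's studentized residuals.

```python
from math import sqrt

# Stress (x1), worry (x2), and TAI (y) scores; column assignment is an assumption.
x1 = [52, 53, 53, 54, 55, 55, 55, 57, 58, 58]
x2 = [51, 52, 52, 53, 54, 55, 56, 58, 59, 60]
y = [58, 57, 58, 55, 54, 55, 53, 55, 53, 52]
n, k = len(y), 2


def mean(values):
    return sum(values) / len(values)


def cross(a, b):
    """Sum of cross-products of deviations from the means."""
    ma, mb = mean(a), mean(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b))


# Least squares slopes and intercept from the deviation-score normal equations.
s11, s22, s12 = cross(x1, x1), cross(x2, x2), cross(x1, x2)
s1y, s2y = cross(x1, y), cross(x2, y)
det = s11 * s22 - s12 ** 2
b1 = (s1y * s22 - s12 * s2y) / det
b2 = (s11 * s2y - s12 * s1y) / det
c = mean(y) - b1 * mean(x1) - b2 * mean(x2)

predicted = [b1 * u + b2 * v + c for u, v in zip(x1, x2)]
residuals = [obs - p for obs, p in zip(y, predicted)]

# Standard error of estimate: sqrt(sum of squared residuals / (N - k - 1)).
s_est = sqrt(sum(e ** 2 for e in residuals) / (n - k - 1))
standardized = [e / s_est for e in residuals]

# Pairs of (predicted value, standardized residual) for a residual plot.
for p, z in zip(predicted, standardized):
    print(round(p, 2), round(z, 2))
```

Plotting the printed pairs, with predicted values on the horizontal axis, gives the kind of residual scatter plot described above.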

same data. The six methods on SPSSX are forward entry, backward elimination, stepwise selection, forced entry, forced removal, and test.

7.13 WHEN REGRESSION ASSUMPTIONS
APPEAR TO BE VIOLATED
Rarely are the assumptions of regression not violated in one way or another. Usually the correction for the violation of one assumption will also correct other violations; however, this is not the case for the independence assumption. In terms of equal variances, if the homogeneity assumption is violated, often the variance stabilizing techniques will also correct violations of normality. When the assumptions of regression appear to be violated, a data transformation of variables may stabilize variance and achieve normality. Transformation may be on the independent variables, dependent variables, or both. The three most common transformations is the square root, logarithm to the base 10, and the negative reciprocal. If there is a moderate violation of assumptions, the square root may be an adequate transformation.
When there is a strong violation of assumptions, the logarithm to the base 10 provides a better transformation; however, if there are extreme violations of assumptions, the negative reciprocal provides the strongest transformation of data. The negative reciprocal is recommended over the reciprocal, since it preserves the order of observations. Commonly, when residual plots are positively skewed, the logarithmic transformation may be helpful. Nevertheless, the square root transformation is common for negatively skewed distributions.
SPSSX Users Guide 3rd edition (1988, pp. 109-140) discusses numeric transformations. Many complicated transformations can be performed on SPSSX. SPSSX uses the COMPUTE command to perform data transformations. For example, to obtain the square root of a variable called Yl, the command is: Compute X=SQRT(Y1). In this example, the transformed variable is called X.

Forward Entry or Selection
With the forward entry method, predictor variables are entered into the regression equation one at a time. At each step, the predictors not in the regression equation are examined for entry. Basically, the first predictor to enter the regression equation is the one with the largest correlation with the criterion. If this predictor is significant, the predictor with the largest semipartial (a variant of a partial correlation or the correlation of several variables with one or more variables held constant) correlation with the criterion is considered. The formula for the partial correlation is:

7.14 SIX COMPUTER METHODS OF SELECTING
PREDICTORS AND REGRESSION MODELS
SPSSX (1983, p. 604) describes six methods the computer can use for selecting predictors. The reader should be aware that these are six methods to build regression equations. Even with the same data, each method can lead to a different regression equation. This will be illustrated later by applying the forward, backward, and stepwise methods to the

$$r_{12.3} = \frac{r_{12} - r_{13} r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}}$$

$r_{12.3}$ denotes the correlation between Variables 1 and 2, with Variable 3 held constant. The formula for the semipartial correlation, which is used with $R^2$, is:

$$r_{12.3(s)} \text{ or } r_{1(2.3)} = \frac{r_{12} - r_{13} r_{23}}{\sqrt{1 - r_{23}^2}}$$

The semipartial correlation, also known as the part correlation, can be denoted as $r_{1(2.3)}$. This notation states that Variable 3 has been partialed out from Variable 2 but not from Variable 1; hence, this is the correlation between Variables 1 and 2 once Variable 3 has been held constant for Variable 2 but not for Variable 1.
Once a predictor fails to make a significant contribution to predictions, the process is terminated. The difficulty with forward entry is that it does not permit the removal of predictors from a regression equation once they are entered. In contrast, stepwise regression, a modification of forward entry, permits predictors to be entered and removed from regression equations at different steps in the process.
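The partial and semipartial correlations discussed in this section can be computed directly from the three zero-order correlations. The sketch below uses the standard first-order formulas; the function names and the three correlation values are hypothetical and serve only as an illustration.

```python
import math

def partial_r(r12, r13, r23):
    """Correlation of variables 1 and 2 with variable 3 removed from both."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))

def semipartial_r(r12, r13, r23):
    """Correlation of variables 1 and 2 with variable 3 removed from variable 2 only."""
    return (r12 - r13 * r23) / math.sqrt(1 - r23 ** 2)

# Hypothetical zero-order correlations.
r12, r13, r23 = 0.50, 0.30, 0.40
print(round(partial_r(r12, r13, r23), 4))
print(round(semipartial_r(r12, r13, r23), 4))
```

The two formulas share a numerator; the semipartial has the larger denominator, so its absolute value never exceeds that of the partial correlation.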


Backward Elimination
Backward elimination is the reverse of the forward entry selection process. With backward elimination, one starts with the squared multiple correlation of Y and all k predictors in the regression equation. The predictor that contributes least to the regression sum of squares when entered last is deleted, and a new value of $R^2$ with the remaining k-1 predictors is calculated. Next, the predictor that contributes least to the regression sum of squares when entered last is deleted, and a new value of $R^2$ using only k-2 predictors is calculated. The process continues until k predictors have been retained. In summary, the backward elimination process starts with a regression with all the predictors and attempts to eliminate them from the equation one at a time.

Test is an easy way to test a variety of models using R2 and its test of significance as a gauge for the "best" model.

Stepwise Selection
Stepwise selection is a variation of the forward entry process; however, at each stage a test is made to determine the usefulness of a predictor. Hence, a predictor that was earlier selected can be deleted if it loses its usefulness.

1 - Include forces X1 and X2 into the prediction equation.

Forced Entry
With forced entry, predictors are entered that satisfy a tolerance criterion. Specifically, predictors can be forced into a regression equation in a specific order. For example, using the enter subcommand on SPSSX, one can enter variables in a specific order in a regression equation. If one wanted to enter variable X1 followed by X2, the SPSSX subcommands are: Enter X1/Enter X2/.
For SAS, if we had a two-predictor case, the control line for forcing predictors X1 and X2 into a prediction equation is: Model Y = X1 X2/Include=2 Selection=Stepwise.
Forced Removal
Forced removal is the exact opposite of the forced entry process; predictors are removed from a regression equation in a specific order.
Test
Test is a process used to test various regression models. For example, it is used for model testing or for finding the "best" model for a data set.

SAS Stepwise Regression
Data Regress;
Input X1 1-2 X2 4-5 Y 7-8;
Cards;
Data Lines
Proc Reg Simple Corr;
Model Y = X1 X2/Include = 2 Selection = Stepwise;

Table 10
7.15 SPSSX Control Lines for Running the
Backward Elimination Process
The control lines for running multiple regression are similar for each selection process. Essentially, the changes needed in the subcommand are: dependent=dependent variable/selection procedure key word.
Title 'Backward elimination with two predictors'
Data list /X1 1-2 X2 4-5 Y 7-8
Begin data
52 51 58
53 52 57
53 52 58
54 53 55
54 55 54
55 55 55
55 56 53
57 58 55
58 59 53
58 60 52
End data
Regression variables=Y X1 X2/
 dependent=Y/backward/
 residuals/
 scatterplot=(*zresid,Y),(*zresid,X2),(*zresid,X1)/

1 - This is the subcommand for getting the backward elimination procedure. We could request the forward or stepwise procedures by changing the dependent subcommand to dependent=y/forward/ and dependent=y/stepwise/, respectively.
2 - The scatterplot subcommand provides residual plots, which are used to test the assumptions of multiple regression. On the printout, the standardized residuals are given the name *ZRESID, and the predicted values are given the name *PRED.

Figure 4
Backward Elimination Two Predictors
Selected Printout from SPSSX

Multiple R             .85
R Squared              .73
Adjusted R Squared     .65
Standard Error        1.25

Analysis of Variance
                     Df    Sum of Squares    Mean Square
Regression (k)        2        29.14            14.57
Residual (n-k-1)      7        10.86             1.55
F=9.29

Variable          B
X2            -.285714
X1            -.428571
Constant     92.285714

Y' = -.4286X1 - .2857X2 + 92.29
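The summary statistics on the printout can be recovered from the analysis of variance entries alone. The sketch below is a hedged illustration in Python, not SPSSX output; the function name is hypothetical, and the inputs are the rounded sums of squares and degrees of freedom from Figure 4. With these rounded inputs the ratio of the two mean squares comes to roughly 9.39, slightly different from the printed F of 9.29.

```python
import math

def regression_summary(ss_reg, df_reg, ss_res, df_res):
    """Derive R, R squared, adjusted R squared, standard error, and F from an ANOVA table."""
    ss_total = ss_reg + ss_res
    n = df_reg + df_res + 1              # total sample size
    ms_reg = ss_reg / df_reg             # regression mean square
    ms_res = ss_res / df_res             # residual mean square
    r_squared = ss_reg / ss_total
    adjusted = 1 - (1 - r_squared) * (n - 1) / df_res
    return {
        "multiple_r": math.sqrt(r_squared),
        "r_squared": r_squared,
        "adjusted_r_squared": adjusted,
        "standard_error": math.sqrt(ms_res),
        "f_ratio": ms_reg / ms_res,
    }

# Values taken from the printout: k = 2 predictors, n = 10 cases.
summary = regression_summary(ss_reg=29.14, df_reg=2, ss_res=10.86, df_res=7)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```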


Table 11
Graphical Checks for Violations of Regression Assumptions
Histogram of Standardized Residuals
(* = 1 Cases, . : = Normal Curve)

N    Exp N    Interval
0     .01     Out
0     .02     3.00
0     .04     2.67
0     .09     2.33
0     .18     2.00
0     .33     1.67
1     .55     1.33
1     .81     1.00
0    1.06      .67
3    1.25      .33
1    1.32      .00
1    1.25     -.33
1    1.06     -.67
1     .81    -1.00
0     .55    -1.33
1     .33    -1.67
0     .18    -2.00
0     .09    -2.33
0     .04    -2.67
0     .02    -3.00
0     .01     Out

This graph is a check for normality. The first and last out contain residuals more than 3.16 standard deviations from the mean. In practice, these outlier residuals would be examined in order to determine how they occurred. In large data sets, these outlier values may be the result of data entry errors. When observed and expected frequencies overlap, a colon is printed. It is not realistic to expect the residuals to be exactly normal.


The asterisks indicate a few cases that strayed from normality; however, overall the residuals appear to approach normality. Note: it is assumed that standardized residuals have equal variances; however, this has been shown not to be the case. The SPSSX Advanced Statistics Guide suggests that studentized residuals, in contrast to standardized residuals, more accurately reflect differences in true error variances and are recommended instead of standardized residuals. Studentized residuals can be obtained on SPSSX with the keyword *SRESID. The subcommand residuals= outliers (resid, sresid, sdresid, mahal, cook, lever)/ produces outlier statistical information. See the SPSSX Advanced Statistics Guide (p. 61) for additional information on outlier statistics.
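The difference between standardized and studentized residuals can be seen in a small one-predictor example. The sketch below is written in Python rather than SPSSX. It assumes that a standardized residual is the raw residual divided by the standard error of estimate, and that a studentized residual additionally divides by the square root of one minus the case's leverage. The function name and the data are hypothetical.

```python
import math

def residual_diagnostics(x, y):
    """Return (standardized, studentized, leverage) lists for a one-predictor regression."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Standard error of estimate with n - 2 degrees of freedom.
    s = math.sqrt(sum(e * e for e in residuals) / (n - 2))
    # Leverage: cases far from the mean of x have larger values.
    leverage = [1.0 / n + (xi - mean_x) ** 2 / sxx for xi in x]
    standardized = [e / s for e in residuals]
    studentized = [e / (s * math.sqrt(1.0 - h)) for e, h in zip(residuals, leverage)]
    return standardized, studentized, leverage

# Hypothetical data; the last case lies far from the other x values.
x = [1.0, 2.0, 3.0, 4.0, 10.0]
y = [2.1, 3.9, 6.2, 7.8, 20.5]
z, t, h = residual_diagnostics(x, y)
for zi, ti, hi in zip(z, t, h):
    print(f"{zi:7.3f} {ti:7.3f} {hi:5.3f}")
```

The high-leverage case shows the largest gap between the two kinds of residual, which is the reason studentized residuals are recommended when error variances differ across cases.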

Table 12
Normal Probability (P-P) Plot
Standardized Residual
[Plot: observed cumulative probability (vertical axis, 0 to 1.0) against expected cumulative probability (horizontal axis, 0 to 1.0)]

The normal probability plot is another check for normality. It compares the residuals with expected values under the assumption of normality. If the normal probability and residuals are identical, a straight line results. Also, this plot suggests that the residuals appear close to normality.

Table 13
Standardized Scatterplot
Across - Y   Down - *ZRESID
[Plot: standardized residuals (vertical axis, -3 to 3, with Out regions) against Y (horizontal axis)]



- 2 .05.
(Answer: This indicates that the test statistic Chi Square is not significant at the .05 level.)
Interpret the following: r = .91, N = 10.
(Answer: Using Table H and locating an r value with the number of pairs minus two as the degrees of freedom (N-2, or 10-2 = 8) gives a critical value of r = .765 at the .01 level for a two-tailed test. Because .91 exceeds .765, this indicates statistical significance at the .01 level. The results can be rewritten as r(8) = .91, p < .01.)
Interpret the following:
Suppose the Spearman rank-order correlation coefficient (rho) is determined to be -.70, with 9 pairs.
What does this result mean?
(Answer: Using Table I for rho with 9 pairs, the critical value at the .05 level is .683 for a two-tailed test. This indicates that statistical significance was reached at this level. The results could be rewritten as rho(9) = -.70, p < .05.)
19. Can a regression equation be applied to exercise 18?
(Answer: no, since the correlation was not significant.)
20. Below are the t scores for 8 subjects on the TAI and STAI, two test anxiety instruments. If a score of 59 was obtained on the TAI, what would be the predicted value of STAI?

Subjects    TAI    STAI
1.          57     63
2.          56     59
3.          55     61
4.          54     57
5.          54     55
6.          53     57
7.          52     51
8.          51     53

Answer: The regression equation is as follows: y = -39.43 + 1.79X
Solving for X = 59 gives y = -39.43 + 1.79(59) = 66.18.
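The regression equation in this answer can be checked with a short least-squares calculation. The sketch below is in Python; the function name is hypothetical, and the data are the eight TAI and STAI scores from the exercise. With unrounded coefficients the prediction is about 65.93; the 66.18 in the answer comes from using the coefficients rounded to two decimals.

```python
def least_squares_line(x, y):
    """Return (intercept, slope) of the least-squares line predicting y from x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

tai = [57, 56, 55, 54, 54, 53, 52, 51]
stai = [63, 59, 61, 57, 55, 57, 51, 53]

intercept, slope = least_squares_line(tai, stai)
print(round(intercept, 2), round(slope, 2))

# Prediction for a TAI score of 59, with unrounded and with rounded coefficients.
print(round(intercept + slope * 59, 2))
print(round(round(intercept, 2) + round(slope, 2) * 59, 2))
```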
21. Jane obtained a t score of 60 on the TAI. Assume that scores on the TAI are normally distributed with a mean of 50 and a standard deviation of 10. What is the probability of Jane getting a t score this large? Hint: One must use Table A for area under the normal curve.
(Answer: Probability can be found for normally distributed variables by finding the area beyond a certain z score, or the area in the smaller portion of the normal curve. A t score of 60 corresponds to a z score of 1, which cuts off 34.13% of the area on the right side of the normal curve. The entire right side of the normal curve represents 50% of the area, since this represents half of the normal curve. The area beyond this point can be found by subtracting: .50 - .3413 = .1587. This is a proportion or probability. Therefore, the probability of obtaining a t score of 60 or higher on the TAI = .1587, or approximately a 16% chance.)
22. Suppose Jim obtained a raw score of 110 on a test that had a population mean of 80 and a population standard deviation of 16. Assume that the scores on the test are normally distributed. Five hundred ten students took this test. How many students in Jim's class scored above his raw score of 110?
(Answer: The first step is to find Jim's z score, which is found by the following formula:
$$z = \frac{x - \mu}{\sigma}$$
where $\mu$ is the population mean, or 80; $\sigma$ is the population standard deviation, or 16; and $x$ is Jim's score of 110. By substitution,
$$z = \frac{110 - 80}{16} = 1.875$$
The probability of getting a score this large (1.88 rounded) is .0304, which is the proportion of the 510 students scoring this high. The final step for this solution is multiplying the proportion by the sample size. So, .0304(510) = 15.504, or approximately 16 students.)
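The table lookups in exercises 21 and 22 can be reproduced with the normal distribution in Python's standard library. This is only a sketch; the helper name upper_tail is hypothetical, and the means, standard deviations, and class size are the ones stated in the exercises.

```python
from statistics import NormalDist

def upper_tail(score, mean, sd):
    """Proportion of a normal distribution lying above the given score."""
    return 1.0 - NormalDist(mean, sd).cdf(score)

# Exercise 21: t score of 60, mean 50, standard deviation 10 (z = 1).
p_jane = upper_tail(60, 50, 10)
print(round(p_jane, 4))

# Exercise 22: raw score of 110, mean 80, standard deviation 16 (z = 1.875).
p_jim = upper_tail(110, 80, 16)
print(round(p_jim, 4))

# Expected number of the 510 students scoring above Jim.
print(round(p_jim * 510, 1))
```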
23. Describe the meaning of the following notation:
A X B X C(AB)
(Answer: This is a three-factor design with factor C nested within A by B. If one were to run this analysis on SPSSX, the subcommand to indicate C(AB) would be as follows:
C within A by B.)
References
Bock, R. D. (1975). Multivariate statistical methods in behavioral research. New York: McGraw-Hill.
Borg, W. R., & Gall, M. D. (1983). Educational research: An introduction. New York: Longman.
Finn, J. (1974). A general model for multivariate analysis. New York: Holt, Rinehart and Winston.
Greenhouse, S., & Geisser, J. (1959). On methods in the analysis of profile data. Psychometrika, 24, 95-112.
Huck, S., Cormier, W., & Bounds, W. (1974). Reading statistics and research. New York: Harper and Row.
Huynh, H., & Feldt, L. (1976). Estimation of the Box correction for degrees of freedom from sample data in the randomized block and split plot designs. Journal of Educational Statistics, 1, 69-82.
Johnson, N., & Wichern, D. (1982). Applied multivariate statistical analysis. Englewood Cliffs, NJ: Prentice-Hall.
Keppel, G. (1983). Design and analysis: A researcher's handbook. Englewood Cliffs, NJ: Prentice-Hall.
Kesselman, H., Rogan, J., Mendoza, J., & Breen, L. (1980). Testing the validity conditions of repeated measures F tests. Psychological Bulletin, 87, 479-481.
Lindman, H. R. (1992). Analysis of variance in experimental design. New York: Springer-Verlag.
Pedhazur, E. (1982). Multiple regression in behavioral research. New York: Holt, Rinehart and Winston.
Sapp, M. (1997). Counseling and psychotherapy: Theories, associated research, and issues. Lanham, MD: University Press of America.
SPSSX user's guide (2nd ed.). (1983). Chicago: SPSS, Inc.
SPSSX user's guide (3rd ed.). (1988). Chicago: SPSS, Inc.
Stevens, J. P. (1990). Intermediate statistics: A modern approach. Hillsdale, NJ: Lawrence Erlbaum Associates.
Stevens, J. P. (1986). Applied multivariate statistics for the social sciences. Hillsdale, NJ: Lawrence Erlbaum Associates.
Stevens, J. P. (1996). Applied multivariate statistics for the social sciences (3rd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.
Tatsuoka, M. M. (1988). Multivariate analysis: Techniques for educational and psychological research (2nd ed.). New York: Wiley.

Part II: Measurement Issues

Chapter 8
CONTENTS
8.1 Measurement Issues
8.2 Testing the Dimensionality of the Worry Component of the Test Anxiety Inventory with Economically and Educationally At-Risk High School Students: Employing Item Response Theory Analysis and Principal Components Analysis

8.1 MEASUREMENT ISSUES
The measurement of test anxiety is an extremely complicated process.
Currently, classical test theory, item response theory, and generalizability theory are the approaches that dominate measurement theory.
Classical theory is the model often taught in introductory psychological measurement courses. Psychologists have used this theory of measurement since the turn of the century. Often, it is used to find reliability measures such as test-retest, internal consistency, and so on. It is also referred to as the true score theory, and it has the following mathematical model:
X=T+E
X = a person's score or an observed score
T = a person's true score
E = the error score
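Theoretically, reliability can be expressed as the ratio of true score variance divided by the observed score variance. If we symbolize reliability as $r_{xx}$, it can be expressed mathematically as

$$r_{xx} = \frac{\sigma_T^2}{\sigma_X^2} = \frac{\text{true score variance}}{\text{observed score variance}}$$

And if $S_e$ denotes the standard error of measurement and $S_x$ the standard deviation of the observed scores,

$$S_e = S_x \sqrt{1 - r_{xx}}$$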

The standard error estimates intraindividual variability, and it provides a measure of variability of a person's score from repeated testings.
To summarize, classical theory states that a person's score is composed of two components: the person's true score and random error.
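A small numerical illustration may help fix the classical definitions of reliability and the standard error of measurement. The sketch below is in Python; the function names and the variance figures are hypothetical, and the standard error uses the conventional formula, the observed standard deviation times the square root of one minus the reliability.

```python
import math

def reliability(true_variance, observed_variance):
    """Reliability as the ratio of true score variance to observed score variance."""
    return true_variance / observed_variance

def standard_error_of_measurement(observed_sd, rxx):
    """Standard error of measurement from the observed SD and the reliability."""
    return observed_sd * math.sqrt(1.0 - rxx)

# Hypothetical test: observed variance 100 (SD = 10), true score variance 80.
rxx = reliability(80.0, 100.0)
sem = standard_error_of_measurement(10.0, rxx)
print(rxx)
print(round(sem, 2))
```

In this example 80% of the observed variance is true score variance, and a person's observed scores would be expected to vary around the true score with a standard deviation of roughly 4.5 points.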
In contrast, item response theory (IRT) states that a person's score is the product of his or her ability and the easiness of a given item. IRT computes the odds that a participant will succeed on an item, denoted as
$$O_{ni} = Z_n E_i$$
$Z_n$ = the participant's ability
$E_i$ = item easiness
According to IRT, the greater a participant's ability level the more likely he or she will answer a particular item correctly.
Generalizability (G) theory subsumes and extends classical test theory (Brennan, 1983). G theory can simultaneously estimate several sources of error variance and interactions among variance sources. Unlike classical test theory, which estimates one source of variance, G theory can estimate multiple sources of variance simultaneously. For example, test-retest reliability only estimates error due to time; internal consistency is based on error due to test items. The reader can consult Brennan, who shows how to perform generalizability studies using an ANOVA-type approach.
The study that follows describes the testing of the dimensionality of the worry component of the Test Anxiety Inventory with economically and educationally at-risk high school students, employing item response theory and principal components analysis.

8.2 TESTING THE DIMENSIONALITY OF THE
WORRY COMPONENT OF THE TEST ANXIETY
INVENTORY WITH ECONOMICALLY AND
EDUCATIONALLY AT-RISK HIGH SCHOOL STUDENTS:
EMPLOYING ITEM RESPONSE THEORY ANALYSIS
AND PRINCIPAL COMPONENTS ANALYSIS
Item Response Theory
Hambleton and Swaminathan (1985, pp. 1-14) and Samejima (1969) compared and contrasted classical test theory models and neoclassical or modern test theory models (item response theory models). They found that each theory can produce a distinct form of item analysis. These researchers noted that classical test theory is based on "weak assumptions"; these are assumptions that can be "easily" met by most tests. Wright (1977a, 1977b), one of the earlier proponents and developers of IRT, distinguished between IRT and classical models by drawing the analogy between an individual approach to assessment versus a group method, with IRT being the individualistic model (Wright & Panchapakesan, 1969). The group assessment model (classical model) is based on standardization samples: the number of individuals who succeeded on an item and item difficulty, all of which are very dependent on the appropriateness of the standardization sample involved.
IRT, also called latent trait theory, makes no assumptions about the individuals involved. Instead, this simple model posits that a person's score is the product, in the arithmetic sense, of his or her ability and the easiness of an item. Essentially, the more capable a person, the greater his or her chances of succeeding on an item. Also, the easier an item, the greater the likelihood that an individual with a certain ability will solve that item (Fox & Jones, 1998).
IRT makes no assumptions about the abilities of the calibration sample. It allows the calibration of a test across an entire range of possibilities, even when everyone in the calibration sample gets the same score (Wright & Panchapakesan, 1969). This is possible because although everyone can theoretically get the same total score, they tend to differ on those items that they get correct. Once a pool of test items conforms to an item response analysis model, and has been calibrated, one can compute a calibration curve of estimated abilities for every possible score on any test or subtest (Wright & Panchapakesan, 1969) without the need for additional standardization samples.
IRT has two major statistical assumptions. First, it assumes unidimensionality. This assumption holds when a test measures only one underlying construct. The second assumption is local independence. It assumes that the error component of each item cannot be correlated across items. When a single score can be used to predict performance, and local independence holds, a test is said to be unidimensional (Allen & Yen, 1979, p. 57; Hambleton, 1983, p. 5).
In terms of item analysis models, IRT and classical theory have similar mathematical models. For example, classical test theory has the following model:
X=T+E
Where X = a person's score
T = the person's true score
E = the error.
The classical test theory model is often used to describe classical reliability theory. It states that a person's score (X) has two component parts: the person's true score (T) and some random error (E). A person's true score is unobservable; it is a theoretical construct, and it represents the part of a person's score not affected by error. This model is based upon three assumptions:
1. True and error scores are uncorrelated.
2. The mean of error scores in the population of examinees equals zero.
3. Error scores on parallel tests are not correlated.
IRT has the following model (Hambleton, 1983, p. 5):

$$Y_i = \rho_i \theta + \varepsilon_i$$

Where $Y_i$ = the latent response to item i
$\rho_i$ = the correlation between $Y_i$ and $\theta$. This is the regression coefficient of $Y_i$ on $\theta$.
$\theta$ = a person's ability level or odds of success on an item; theoretically, it can range over $\pm\infty$; however, in practice the value seldom exceeds ±4.0.
$\varepsilon_i$ = an error component with a mean of zero and a standard deviation $\sigma(\varepsilon_i)$.

One can see that the classical test theory model and IRT model have some similarities. For example, both models are mathematical functions with error components. In addition, $\rho_i \theta$, which is analogous to a person's true score under the classical model, is unobservable. For example, one can also view $\theta$ as a person's underlying ability or latent trait measured by items on a test. When we present an item to a person, for example the ith item, a response $Y_i$ is elicited; this response can have the same values as $\theta$. When an item is a perfect indicator of a trait, $Y_i$ and $\theta$ are equal. For practical purposes, $Y_i$ has an error component because it is not a perfect indicator of a trait. We see a similar occurrence with the classical model where a person's true score has to be adjusted with an error component (Hambleton, 1983, p. 4). In summary, IRT models have two assumptions:
1) a single ability underlies test performance, called unidimensionality of the test, and 2) the relationship between item performance and ability can be represented by a one-, two-, or three-parameter logistic function.
The Rasch or 1-parameter logistic model, the simplest IRT model, is the one employed in this study. It is based upon an objective method of transforming data into measurements. There are two conditions for this process. First, the calibration of the measuring instrument must be independent of the items used for calibration. Second, the measurement of objects must be independent of the instrument (Fox & Jones, 1998).
The Rasch model is based on object-free instrument calibration and instrument-free object measurement, which allows generalization beyond the particular instrument (i.e., set of items) used. The Rasch model simply states that when a person encounters a test item, his or her outcome is governed by the product of his or her ability and the easiness of the item (proportion of correct responses). For example, the odds of success $O_{ni}$ are:
$$O_{ni} = Z_n E_i$$
$Z_n$ = ability of subject
$E_i$ = item easiness or proportion of correct responses


Specifically, the higher the person's ability level the more likely he or she is to answer a particular item correctly. Likewise, the easier a particular item the greater the likelihood that any person will get it correct.
In terms of independence for obtaining odds of success, the Rasch model states (Waller, 1973, 1976):
$$O_{AB} = O_A O_B$$
$O_{AB}$ = Odds of obtaining AB
Odds denotes the ratio of success to failure during probability calculations. The Rasch model states that the parameters $O_A$ and $O_B$ (whose product gives the odds of obtaining AB) are independent.
Also, under the Rasch model:
$P_{ij}$ refers to Person "i" on the "jth" item.
$$P_{ij} = \frac{Z_i E_j}{1 + Z_i E_j}$$
which implies that $P_{ij}$ = Pr (Person "i" gets the "jth" item correct given his or her ability).
This equation indicates that the probability that a person will get an item with easiness $E_j$ correct is the product $Z_i E_j$ divided by one plus the product $Z_i E_j$.
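The step from odds of success to probability of success can be sketched in a few lines of Python. The function names and the ability and easiness values below are hypothetical; the sketch simply restates the product rule for the odds and the odds-over-one-plus-odds rule for the probability.

```python
def rasch_odds(ability, easiness):
    """Odds of success: the product of person ability and item easiness."""
    return ability * easiness

def rasch_probability(ability, easiness):
    """Probability of a correct response: odds divided by one plus the odds."""
    odds = rasch_odds(ability, easiness)
    return odds / (1.0 + odds)

# Hypothetical person ability and item easiness.
print(rasch_odds(2.0, 1.5))
print(rasch_probability(2.0, 1.5))

# A more able person, or an easier item, raises the probability of success.
print(rasch_probability(4.0, 1.5) > rasch_probability(2.0, 1.5))
print(rasch_probability(2.0, 3.0) > rasch_probability(2.0, 1.5))
```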
With the Rasch model, we estimate $Z_i$. For example, let:
$r_{ij}$ = 1 if person i is correct on item j
$r_{ij}$ = 0 if person i is wrong on item j

Pr is the probability of obtaining a given response vector.
Therefore, if the person's response vector $V_i$ is: $V_i = \{r_{ij}\} = \{1110001000\}$

Remember that 1's mean an item is correct and 0's mean an item is incorrect. We want a $Z_i$ that maximizes the probability of obtaining the observed data.

The principle of maximum likelihood estimation states that one should use an estimate of the parameter (i.e., ability) value that maximizes the probability or likelihood of getting the data obtained, $V_i$.
So, $\Pr(V_i) = \Pr(r_{i1}=1, r_{i2}=1, r_{i3}=1, r_{i4}=0, r_{i5}=0, r_{i6}=0, r_{i7}=1, r_{i8}=0, r_{i9}=0, r_{i10}=0)$

$Q_{ij}$ is the probability of not obtaining a correct response.
Only if we assume local independence will
$$\Pr(V_i) = \prod_j P_{ij}^{r_{ij}} Q_{ij}^{1 - r_{ij}},$$
which is the likelihood function.
Substituting for $P_{ij} = \frac{Z_i E_j}{1 + Z_i E_j}$ and $Q_{ij} = 1 - P_{ij} = \frac{1}{1 + Z_i E_j}$ gives

$$\Pr(V_i) = \prod_j \left[\frac{Z_i E_j}{1 + Z_i E_j}\right]^{r_{ij}} \left[\frac{1}{1 + Z_i E_j}\right]^{1 - r_{ij}}$$

If we let a person's ability parameter $\theta_i = \log Z_i$,
then $\theta_i = \log_e Z_i$, where e is a constant that equals 2.7183, and $\log_e Z_i$ is read logarithm to the base e of $Z_i$, or logarithm of $Z_i$ to the base e.
Thus, $\log_e Z_i = \theta_i$ implies $Z_i = e^{\theta_i}$.
An item's difficulty parameter is $d_j = \log_e E_j$, so $E_j = e^{d_j}$, and

$$P_{ij} = \frac{e^{\theta_i} e^{d_j}}{1 + e^{\theta_i} e^{d_j}}$$

Logs are used to transform the scale so the center is around zero.
The derivative $\frac{dP_{ij}}{d\theta_i} = P_{ij} Q_{ij}$.
This implies that there is a maximum likelihood that is unique.
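The maximum likelihood idea can be made concrete with a short Newton-Raphson search for one person's ability on the log scale. The sketch below is in Python and is only an illustration under stated assumptions: the item parameters are treated as known, their values are hypothetical, and the response vector is the 1110001000 pattern used above. It is not the estimation routine of any particular IRT program.

```python
import math

def rasch_p(theta, d):
    """Probability of a correct response given log ability theta and log easiness d."""
    return 1.0 / (1.0 + math.exp(-(theta + d)))

def log_likelihood(theta, responses, easiness):
    """Log of the likelihood of a response vector under local independence."""
    total = 0.0
    for r, d in zip(responses, easiness):
        p = rasch_p(theta, d)
        total += r * math.log(p) + (1 - r) * math.log(1.0 - p)
    return total

def estimate_theta(responses, easiness, start=0.0, tol=1e-10, max_iter=50):
    """Newton-Raphson search for the theta that maximizes the likelihood."""
    theta = start
    for _ in range(max_iter):
        ps = [rasch_p(theta, d) for d in easiness]
        gradient = sum(r - p for r, p in zip(responses, ps))   # first derivative
        information = sum(p * (1.0 - p) for p in ps)           # minus second derivative
        step = gradient / information
        theta += step
        if abs(step) < tol:
            break
    return theta

responses = [1, 1, 1, 0, 0, 0, 1, 0, 0, 0]
# Hypothetical log easiness values for the ten items.
easiness = [1.5, 1.0, 0.5, 0.0, -0.5, -1.0, 0.5, -1.5, -2.0, -2.5]

theta_hat = estimate_theta(responses, easiness)
print(round(theta_hat, 4))
print(round(log_likelihood(theta_hat, responses, easiness), 4))
```

Because each item contributes $P_{ij} Q_{ij}$ to the information term, the log likelihood is concave in theta, which is why the search settles on a single maximum.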
Summarizing, IRT can be employed to test the unidimensionality of a scale or test. If unidimensionality holds, students' ability level can be obtained and is independent of the instrument used. Essentially, IRT is a method of item analyses or scaling that is not based on group norms.


The Dimensionality of the Worry Component of the TAI
Traditionally, test anxiety, as measured by the Test Anxiety Inventory (TAI), is composed of two components: worry and emotionality (Liebert & Morris, 1967; Morris & Liebert, 1970; Morris & Perez, 1972; Sapp, 1993, 1996b; Spielberger, 1980). Worry is the cognitive concern about performance, whereas emotionality is the physiological reaction to anxiety (Sapp, 1994a, 1994b, 1996b).
Sarason (1972, 1980) pointed out that cognitive overload during a test situation or learning situation can result in impaired learning and performance. Essentially, test anxiety makes demands on the attention by dividing attention into task-relevant and task-irrelevant cognitions and behaviors. Benjamin, McKeachie, and Lin (1987) characterized these cognitive models of test anxiety along the lines of a computer model.
They viewed a student's capacity to store information as analogous to a computer's capacity to store data; hence, their model is called an information processing view of test anxiety.
Doctor and Altman (1969) factor analyzed Liebert and Morris' Test Anxiety Questionnaire (TAQ), a precursor to the TAI, and found that worry is more complex than Spielberger had conceptualized it. They did find that worry appears to be a trait factor whereas emotionality corresponds to a state factor; however, they further differentiated worry into cognitive perceptual features that result in a cognitive perceptual distortion of unpleasantness and self-evaluation. In addition, Finger and Galassi (1977) found that treatments designed to reduce worry also reduced emotionality. Their research suggests that worry and emotionality interact in some complex fashion. There is also research by Richardson, O'Neil, Whitmore, and Judd (1977) that failed to confirm the factor structure of the Liebert-Morris Scale using confirmatory factor analysis. Specifically, they did not find worry and emotionality emerging as factor structures.
Salamé (1984) conducted factor analysis and path analysis on the worry and emotionality components of test anxiety and found that neither worry nor emotionality was unidimensional. Salamé's factor analysis revealed four factors for the worry component, each of which had high internal consistency. The factors composing worry were:
(1) lack of confidence in ability to succeed,
(2) fear of failure and its consequences to academic career,
(3) expressing fearful anticipations of failure and worry about the consequences of failure, and


(4) fear of social devaluations.
Salamé's (1984) path analysis showed that "fear of social devaluation" appears to be a direct or indirect precursor to the other four factors. Path analysis also indicated that doubts about one's competence, lack of ability to succeed, and fear of failure serve as intervening variables between precursors of test anxiety and subsequent test anxiety reactions.
Salamé's (1984) factor analysis of the emotionality component revealed two factors: "apprehension" and "other anxiety reactions." The apprehension factor consisted of items that measure one's concern about exams without references to failure relating to one's ability or the degree of preparation for an exam. This factor appears to be a pure concern variable. The second factor, other anxiety reactions, included psychophysiological reactions, restlessness, rapid heartbeat, jitteriness, and situation and personality interferences.
Hagtvet (1984), Hagtvet and Johnsen (1992), and Schwarzer, Van der Ploeg, and Spielberger (1982, 1989) also questioned the assumed unitary structure of the worry and emotionality components of test anxiety. Worry and emotionality are seen as determined by fear of failure. Actually, test anxiety as contextualized by Hagtvet is a fear of failure construct. This is in direct opposition to Sarason's (1980) theory; Sarason (1984) reported that individuals scoring high on test anxiety rarely experienced fear of failure. Also, Sarason (1984) suggests that test anxiety scales may measure cognitive components of test anxiety in contrast to affective ones. Finally, Hagtvet (1984) suggests that fear of failure, worry, and emotionality are three different constructs; however, fear of failure can produce worry and emotionality responses; hence they can serve as test anxiety measures.
The purpose of this study is to apply the Rasch model of item analysis to worry test anxiety measures obtained from economically and educationally at-risk high school students.

developmental summer program. Forty percent of the students were
African American, 32% were Latino, 17% were Asian American, and
11% did not identify their racial heritage. Forty-eight percent of the students were male, and 52% were female. Students were high school seniors, ranging between 17 and 19 years of age. Many of the students were from a large metropolitan urban area located within a large northeastern state. The average family income of participating students was below $17,000.
The computer program used for the item response analysis (IRT) was an augmented version of LOGOG (Kolakowski & Bock, 1973). IRT assumes a one-dimensional underlying trait (Samejima, 1969). This program estimates ability and difficulty parameters through a quasimarginal maximum likelihood procedure that continues through an iterative process (Newton-Raphson) until a convergent criterion is obtained. Specifically, the estimation procedure is as follows: first, ability and difficulty parameters are estimated together, then difficulty parameters are estimated singly, next ability and difficulty parameters are estimated again, and finally the ability and difficulty parameters are estimated with estimations of ability held constant. LOGOG rank orders participants and distributes them into 10 score groups or fractiles (Waller,
1989). The Rasch model assumes aunidimensional underlying trait. Chisquare goodness-of-fit statistics allow one to determine if items deviate from unidimensionality. LOGOG calculates a goodness-of-fit chi-square for each item. Items with the best fit have the smallest chi-square values.
A chi-square (χ²) goodness-of-fit test statistic (Kolakowski & Bock, 1973) is used to test the unitary dimensionality of the worry scale. A significant χ² test for any item of the worry scale suggests that the item is not contributing to unidimensionality. Also, if the overall χ² test is significant, this suggests that unidimensionality is not tenable for the entire instrument. We wanted to determine if the worry subscale of the TAI consisted of more than one component; therefore, in addition to the Rasch analysis, we performed principal components analysis on the worry scale using the SPSSX computer program (1988).
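To make the principal components step more concrete, here is a minimal Python sketch that extracts the eigenvalues of an item correlation matrix, which is the core computation behind a principal components analysis. The SPSSX routine itself is not described in the text, so the Jacobi rotation method used here is a stand-in, and the three-item correlation matrix is hypothetical.

```python
import math


def jacobi_eigenvalues(matrix, max_sweeps=50, tolerance=1e-12):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]  # work on a copy
    size = len(a)
    for _ in range(max_sweeps):
        off_diagonal = sum(
            a[i][j] ** 2 for i in range(size) for j in range(size) if i != j
        )
        if off_diagonal < tolerance:
            break
        for p in range(size - 1):
            for q in range(p + 1, size):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle that zeroes the (p, q) element.
                theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for i in range(size):  # update columns p and q
                    aip, aiq = a[i][p], a[i][q]
                    a[i][p] = c * aip - s * aiq
                    a[i][q] = s * aip + c * aiq
                for j in range(size):  # update rows p and q
                    apj, aqj = a[p][j], a[q][j]
                    a[p][j] = c * apj - s * aqj
                    a[q][j] = s * apj + c * aqj
    return sorted((a[i][i] for i in range(size)), reverse=True)


# Hypothetical correlation matrix for three items (not data from the study).
example_correlations = [
    [1.0, 0.6, 0.5],
    [0.6, 1.0, 0.4],
    [0.5, 0.4, 1.0],
]
eigenvalues = jacobi_eigenvalues(example_correlations)
# Each eigenvalue divided by the number of items gives the proportion
# of total variance accounted for by that component.
explained = [value / len(eigenvalues) for value in eigenvalues]
print(eigenvalues, explained)
```

In a sketch like this, one dominant eigenvalue would be consistent with a single component, whereas several sizable eigenvalues would point toward more than one.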


Method
Participants
One hundred and one high school students were identified as being both educationally and economically disadvantaged. Students were attending an Educational Opportunity Program (EOP) at a northeastern university. This is a program designed to provide academic assistance for educationally and economically disadvantaged high school students.
During the summer of 1993, students were enrolled in a five-week

Independent Variables
Students were given diagnostic tests to assess writing, mathematical, and reading ability. Each test was given under standardized conditions; proctors read the instructions for each test aloud, while students read them silently.


Essays constituted the writing test. Students were required to respond to one of two possible passages on two controversial contemporary issues: drug legislation or women in combat. Students chose one of the passages and constructed an essay. Students were allotted one hour and fifteen minutes to complete their responses.
Department of English faculty and graduate assistants from the English Department and the university's learning skills center rated essays. Raters looked for the following:
• Ability to identify an issue and develop an argument,
• Capacity to proceed in a logical and orderly manner,
• Skills to specify general and abstract points,
• Ability to address the intended audience with appropriate tone and sense of purpose,
• Ability to write clearly and economically,
• Ability to express oneself in a natural voice,
• Ability to use grammar correctly, and
• Ability to spell and punctuate correctly.
The above criteria were rated on a scale of 1 to 6 by one rater. The rating was used to place students in English courses. In summary, the writing test was a diagnostic and placement tool used to assess the writing skill level of all entering first-time college students at this university.
The Department of Mathematical Sciences faculty constructed the mathematics test. Primarily, this test was used to place students into the appropriate math courses. The math test was multiple choice, with five options for each question, and comprised 43 questions divided into three clusters of increasing difficulty. Scoring in the highest cluster suggested readiness for a basic calculus course, scoring within the second cluster indicated placement in a pre-calculus class, and scoring within the third cluster indicated placement in beginning or intermediate algebra.
These tests were administered to all students during a 2½-day summer orientation.

Dependent Variable
The worry subscale of the Test Anxiety Inventory (TAI) was used as a measure of test anxiety. The worry subscale consists of eight items designed to measure the worry component of test anxiety. Since copyright permission could not be obtained, the eight items of the TAI worry subscale were paraphrased to give readers a feel for the actual items.
TAI 3 - Thoughts about my grade interfere with my course work on tests.
TAI 4 - Important exams cause me to freeze.
TAI 5 - Exams cause me to question if I will get through school.
TAI 6 - Working hard on a test causes me to get confused.
TAI 7 - Thoughts of performing poorly on tests interfere with my concentration.
TAI 14 - I defeat myself when working on important exams.
TAI 17 - I think about failing when taking exams.
TAI 20 - I forget facts I really know during exams.
Sapp (1993, 1996b) reported validity coefficients of the worry scale of .79 for males and .70 for females. The reliability coefficient for a six-month period was .66. The worry raw scores, using a table of norms, were converted into t-scores, which have a mean of 50 and a standard deviation of 10.
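The t-score scaling mentioned for the worry raw scores (mean of 50, standard deviation of 10) can be sketched in Python as follows. The study used a table of norms for the conversion; the linear formula below is a simplifying assumption, and the norm mean and standard deviation in the example are hypothetical values, not the published TAI norms.

```python
def linear_t_score(raw_score, norm_mean, norm_sd):
    """Convert a raw score to a T-score (mean 50, SD 10) with a linear transform.

    norm_mean and norm_sd describe the normative group; the values passed in
    the example below are placeholders, not published norms.
    """
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    z_score = (raw_score - norm_mean) / norm_sd
    return 50.0 + 10.0 * z_score


# Hypothetical example: a raw worry score one standard deviation above the norm mean.
print(linear_t_score(raw_score=20, norm_mean=16.0, norm_sd=4.0))
```

A raw score equal to the norm mean maps to 50 under this transform, and each standard deviation above or below the mean shifts the result by 10 points.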

Research Design
Students were pretested on the worry subscale of the TAI and randomly assigned to one of three test batteries designed to measure writing, mathematics, and reading ability. The three test sequences to which students were assigned were: (1) writing, mathematics, and reading; (2) mathematics, writing, and reading; and (3) reading, writing, and mathematics. After taking the test battery, students were posttested on the TAI worry subscale. Posttest worry scores were used for the IRT and principal components analyses.
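The random assignment to the three test sequences could be carried out along the lines of the Python sketch below. The text does not say how the randomization was done or whether group sizes were balanced, so the shuffle-and-deal approach, the function name, and the student identifiers are all assumptions for illustration.

```python
import random

# The three test sequences described in the research design.
TEST_SEQUENCES = {
    1: ("writing", "mathematics", "reading"),
    2: ("mathematics", "writing", "reading"),
    3: ("reading", "writing", "mathematics"),
}


def assign_sequences(student_ids, seed=None):
    """Randomly assign each student to one sequence, keeping groups near equal."""
    rng = random.Random(seed)
    shuffled = list(student_ids)
    rng.shuffle(shuffled)
    sequence_numbers = sorted(TEST_SEQUENCES)
    # Deal the shuffled students out to the sequences in rotation.
    return {
        student: sequence_numbers[index % len(sequence_numbers)]
        for index, student in enumerate(shuffled)
    }


# Hypothetical identifiers 1..101 standing in for the 101 participants.
assignments = assign_sequences(range(1, 102), seed=0)
group_sizes = {
    number: sum(1 for assigned in assignments.values() if assigned == number)
    for number in TEST_SEQUENCES
}
print(group_sizes)
```

Dealing a shuffled list out in rotation keeps the three groups within one student of each other, which is one common way to randomize while avoiding badly unequal group sizes.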
Results
The item response analysis attempted to fit the eight items of the worry scale to a unidimensional model using LOGOG. It is assumed that each item contributes to unidimensionality; therefore, if one item deviates from unidimensionality, the scale is probably not unidimensional. LOGOG suggests that the eight items of the worry subscale of the TAI are not measuring a unidimensional construct, χ²(100) = 211.5538, p
