Chapter 23

Chi-Square Tests

Chi-Square Procedures
The Chi-Square Formula
The Chi-Square Critical Value
Chi-Square Goodness of Fit Test
Chi-Square Test of Independence
Cautions in Using Chi-Square
Dr. Helen Ang studied the relationship between predominant leadership style and educational philosophy of administrators in Christian colleges and universities for her Ed.D. dissertation in 1984.1
Leadership Style was a categorical variable with the following five levels (with percentages of the 113 administrators studied): team administrator (high people/high task: 23%), constituency-centered (moderate people/moderate task: 16%), authority-obedience (low people/high task: 4%), comfortable-pleasant (high people/low task: 38%), and caretaker (low people/low task: 19%).2
Educational Philosophy Profile was a categorical variable with the following six levels (with percentages): idealism (7%), realism (4%), neo-thomism (15%), pragmatism (58%), existentialism (1%), and “eclectic” (16%).3
Applying the Chi-Square Test of Independence, Dr. Ang found that the variables Leadership Style and Educational Philosophy were independent (χ2 = 21.676, χ2cv = 31.410, α=0.05, df=20).4
The chi in chi-square is the Greek letter χ, pronounced ki as in kite. Chi-square (χ2) procedures measure the differences between observed (O) and expected (E) frequencies of nominal variables, in which subjects are grouped in categories or cells.
There are two basic types of chi-square analysis, the Goodness of Fit Test, used with a single nominal variable, and the Test of Independence, used with two nominal variables. Both types of chi-square use the same formula.

The Chi Square Formula
The chi-square formula is as follows:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where the letter O represents the Observed frequency -- the actual count -- in a given cell. The letter E represents the Expected frequency -- a theoretical count -- for that cell. Its value must be computed.
The formula reads as follows: “The value of chi-square equals the sum of O-E differences squared and divided by E.” The more O differs from E, the larger χ2 is. When χ2 exceeds the appropriate critical value, it is declared significant.

1 Helen C. Ang, “An Analytical Study of the Leadership Style of Selected Academic Administrators in Christian Colleges and Universities as Related to their Educational Philosophy Profile,” (Fort Worth, Texas: Southwestern Baptist Theological Seminary, 1984).
2 Ibid., 28-29, 46
3 Ibid., 45
4 Ibid., 47

© 4th ed. 2006 Dr. Rick Yount
Research Design and Statistical Analysis in Christian Ministry
IV: Statistical Procedures

The Goodness of Fit Test
The Goodness of Fit Test is applied to a single nominal variable and determines whether the frequencies we observe in k categories fit what we might expect. Some textbooks call this procedure the Badness of Fit Test because a significant χ2 value means that Observed counts do not fit what we Expect. The Goodness of Fit Test can be applied with equal or proportional expected frequencies (EE, PE).

Equal Expected Frequencies
Equal expected frequencies are computed by dividing the number of subjects (N) by the number of categories (k) in the variable. A classic example of equal expected frequencies is testing the fairness of a die. If a die is fair, we would expect equal tallies of faces over a series of rolls.

The Example of a Die
Let’s say I roll a real die 120 times (N) and count the number of times each face (k = 6) comes up. The number “1” comes up 17 times, the number “2” 21 times, “3” 22 times, “4” 19 times, “5” 16 times, and “6” 25 times. Results are listed under the “O” column below.
We would Expect a count of 20 (E=N/k) for each of the six faces (1-6). This E value of 20 is listed under the “E” column below.
Face      O      E    O-E   (O-E)²   (O-E)²/E
1        17     20     -3        9   .45 (=9/20)
2        21     20      1        1   .05
3        22     20      2        4   .20
4        19     20     -1        1   .05
5        16     20     -4       16   .80
6        25     20      5       25   1.25
Total   120    120   Σ(O-E) = 0      χ2 = 2.80

The chart above shows the step-by-step procedure in computing the chi-square formula. Notice that both O and E columns add to the same value (N=120).

Computing the Chi Square
The first step is to subtract expected frequencies (E) from the observed (O). These differences fall under the “O-E” column. Notice that Σ(O-E)=0, just as Σx=0.
The second step is to square the differences. These squares are found under the “(O-E)²” column.
The third step is to divide the squared differences by the expected values. Each of these values, shown in the last column, is the portion of the chi-square total derived from each category. For example, the largest contributor to the chi-square is the high tally in category “6”. It yields 1.25 of the 2.80 total.
The fourth step is to sum the values in the last column to produce the final chi-square value, in this case 2.80.
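The four steps can be traced in a few lines of Python. This is only a sketch of the hand computation using the die tallies given above; the variable names are illustrative and not part of the original text.

```python
# Goodness of Fit with equal expected frequencies (the die example).
observed = [17, 21, 22, 19, 16, 25]   # tallies for faces 1..6

n = sum(observed)                     # N = 120 rolls
k = len(observed)                     # k = 6 categories
expected = [n / k] * k                # E = N/k = 20 for every face

# Step 1: subtract E from O.
diffs = [o - e for o, e in zip(observed, expected)]
# Step 2: square the differences.
squares = [d ** 2 for d in diffs]
# Step 3: divide each squared difference by its expected value.
contributions = [s / e for s, e in zip(squares, expected)]
# Step 4: sum the contributions to get chi-square.
chi_square = sum(contributions)

print("O-E:", diffs)
print("chi-square:", round(chi_square, 2))
```

Printing `contributions` shows the share of the total that each face supplies, which is how the text identifies face “6” as the largest contributor.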

Testing the Chi Square Value
The computed value of χ2 is compared to the appropriate critical value. The critical value is found in the Chi-square Table (see Appendix A3-3). Using α and df, locate the critical value from the table.
For the Goodness of Fit Test, the degrees of freedom (df) equal the number of categories (k) minus one (df=k-1). In our example above, the critical value (α=0.05, df=5) is 11.07. Since the computed value (2.80) is less than the critical value (11.07), we declare the χ2 not significant.
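When a printed table is not at hand, the critical value can be approximated numerically. The sketch below uses only the Python standard library: it evaluates the chi-square cumulative distribution with a series for the incomplete gamma function and then searches for the value that leaves α in the upper tail. The function names `chi2_cdf` and `chi2_critical` are hypothetical, and this is an illustration rather than a replacement for the table in the appendix; statistical packages such as SciPy provide equivalent routines.

```python
import math

def chi2_cdf(x, df):
    """P(chi-square with df degrees of freedom <= x).

    Uses the series expansion of the regularized lower
    incomplete gamma function P(df/2, x/2).
    """
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))

def chi2_critical(alpha, df):
    """Value that cuts off an upper tail of area alpha (bisection search)."""
    target = 1.0 - alpha
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < target:   # widen the bracket until it contains the answer
        hi *= 2.0
    for _ in range(100):               # halve the bracket repeatedly
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

critical = chi2_critical(0.05, 5)
print(round(critical, 2))              # compare with the table entry for df = 5
print(2.80 > critical)                 # is the die result significant?
```

For the die example the comparison comes out False, in agreement with the decision reached from the table.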

Translating into English
What does this non-significant χ2 mean in English? The observed frequencies of the six categories of die rolls do not significantly differ from the expected frequencies.
The observed frequencies have a “good fit” with what was expected. Or, simply stated, “The die is fair.”
Had the computed value been greater than 11.07, the χ2 would have been declared significant. This would mean that the difference between observed and expected values is greater than we expect by chance. The observed frequencies would have a “bad fit” with what was expected. Or simply stated, “The die is loaded.”
Equal E is usually an unrealistic assumption of the break-down of categories. A better approach is to compute proportional expected frequencies (PE).

Proportional Expected Frequencies
With proportional expected frequencies, the expected values are derived from a known population. Suppose you are in an Advanced Greek class of 100 students. You notice a large number of women in the class, and wonder if there are more women in the class than one might expect, given the student population. Using equal E’s, you would use the value (E=N/k) of 50. But you know that women make up only 15% of the student population. This gives you expected frequencies of 15 women (.15 x 100) and 85 men (.85 x 100). This latter design is far more accurate than the EE value of 50.

The Example of Political Party Preference
Suppose you want to study whether political party preference has changed since the last Presidential election. A poll of 1200 voters taken four years before showed the following breakdown: 500 Republicans, 400 Democrats, and 300 Independents. The ratio equals 5:4:3. In your present study, you poll 600 registered voters and find 322 Republicans, 184 Democrats, and 94 Independents.5 The null hypothesis for this study is that party preference has not changed in four years. That is, your hypothesis is that the present observed preferences are in a ratio of 5:4:3.

5 Dennis E. Hinkle, William Wiersma, and Stephen G. Jurs, Basic Behavioral Statistics (Boston: Houghton Mifflin Company, 1982), 308-310

Computing the Chi Square Value
Compute the expected frequencies as follows. The ratio of 5:4:3 means there are 5+4+3=12 parts. Twelve parts divided into 600 voters yield 50 voters per part (600/12=50).
The first category, Republicans, has 5 parts (5:4:3), or 5x50=250 Expected voters. The second, Democrats, has 4 parts (5:4:3), or 4x50=200 Expected voters. The third, Independents, has 3 parts (5:4:3), or 3x50=150 Expected voters. Putting this in a table as before, we have the following:
Party     O      E    O-E   (O-E)²   (O-E)²/E
Rep     322    250     72     5184     20.74
Dem     184    200    -16      256      1.28
Ind      94    150    -56     3136     20.91
Total   600    600   Σ(O-E) = 0        χ2 = 42.93

Notice that both O and E columns add to 600 (N). Notice that the O-E column adds to zero. Notice that the E values are unequal, reflecting the 5:4:3 ratio derived from the earlier poll. The resulting χ2 value equals 42.93.
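The proportional computation can be sketched the same way. The code below derives the expected counts from the 5:4:3 ratio and then applies the chi-square formula; the dictionary names are illustrative. Carried at full precision the sum comes to about 42.92, while the table's 42.93 is the sum of the contributions after each was rounded to two decimals.

```python
# Goodness of Fit with proportional expected frequencies (party preference).
observed = {"Rep": 322, "Dem": 184, "Ind": 94}
ratio = {"Rep": 5, "Dem": 4, "Ind": 3}        # from the earlier poll

n = sum(observed.values())                    # 600 voters
parts = sum(ratio.values())                   # 12 parts
per_part = n / parts                          # 50 voters per part

# Expected count for each party = its parts x voters per part.
expected = {party: r * per_part for party, r in ratio.items()}

chi_square = sum(
    (observed[p] - expected[p]) ** 2 / expected[p] for p in observed
)
df = len(observed) - 1                        # k - 1

print(expected)
print(round(chi_square, 2), df)
```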

Testing the Chi Square
The critical value (α=0.05, df=2) is 5.991. Since the computed value of 42.93 is greater than the critical value of 5.991, we declare the chi-square value significant.
The observed values do not fit the expected values.

Translate into English
Since the recent poll does not fit the ratio of 5:4:3 found in the earlier poll, we can say that party preference has changed over the last four years.

Eyeball the Data
But HOW has political party preference changed? We can determine this by what some statisticians call “eye-balling the data.” The greatest part of the chi square value came from Republicans and Independents.
Party     O      E    O-E      (O-E)²   (O-E)²/E
R       322    250     72 ↑     5184     20.74
D       184    200    -16        256      1.28
I        94    150    -56 ↓     3136     20.91

Looking at the O-E column, we see that we observed more Republicans than we expected (322 > 250), and fewer Independents than expected (94 < 150), based on data from four years before. It is this twisting effect (↑↓) that causes the large chi-square value.
In summary, the Goodness of Fit procedure tests one variable across k categories.
The computed value is tested for significance at α and df = k-1. The expected frequencies for each category can be equal (EE) or proportional (PE).

Chi-Square Test of Independence
The test of independence analyzes the relationship between two nominal variables. The procedure uses the special terms independent to mean not related, and not independent to mean related. The two nominal variables form a contingency table of cells.

The Contingency Table
My wife’s Master’s thesis studied the relationship between whether Schools for the Deaf identified giftedness in their students (Schools) and whether the schools were predominantly aural/oral, total communication, or a combination (language preference).6 The column variable schools had two levels: Level I schools for the deaf did not identify students as “gifted,” while Level II schools for the deaf did.
The row variable language preference had three levels. Aural/Oral schools emphasized speech-reading methods of education of the deaf and did not use sign language. Total Comm schools emphasized the total communication method of deaf education, which includes American Sign Language. “Both” schools used both approaches.

               I     II    Total
Aural/Oral     3      0      3
Total Comm    20     15     35
Both           4      5      9
Total         27     20     47

Each of the 47 schools in the study was categorized by both variables and placed into one of 6 cells. How many deaf schools identify giftedness in their students (II) and use total communication as their primary approach? [15]. How many schools use aural/oral methods and do not identify giftedness in their students (I)? [3].
The table also includes margin totals, labelled “Total.” The total number of aural/oral schools, regardless of school level, for example, was 3. The total number of Level I schools, regardless of language preference, was 27. The margin totals for the row variable are called row totals (3, 35, 9). The margin totals for the column variable are called column totals (27, 20). The sum of column totals (47) equals the sum of row totals (47), which is a good check on math accuracy. Margin totals are the means by which expected values are computed.

Expected Cell Frequencies
Each cell requires an Expected value to match its O value. Expected cell frequencies are computed from the margin totals. Using the above contingency table, let’s focus on the Expected value for the upper left cell.
The three numbers needed to compute the upper left cell's E value are 47 (Total), 27 (the “Level I” column total), and 3 (the “aural/oral” row total).
The number of schools we would expect for this cell, given no relationship between the two variables, is found by multiplying the Column 1 Total (27) by the Row 1 Total (3) and dividing by the Total (47), or,

$$E = \frac{\text{col 1 total} \times \text{row 1 total}}{\text{total}} = \frac{27 \times 3}{47} = 1.723$$

6 Barbara Parish Yount, “An Analytical Study of the Procedures for Identifying Gifted Students in Programs for the Hearing-Impaired”, (Master of Arts Thesis, Texas Woman's University, 1986). The term “aural/oral” refers to use of speech-reading and speech skills in teaching. The term “total communication” refers to using any mode of communication, especially American Sign Language, in teaching.

Putting this in more general terms, we can show the computation of the Expected values for all cells in a 3x4 contingency table.

The above table shows three levels of a column variable (1, 2, 3) and four levels of a row variable (I, II, III, IV). Once the observed frequencies are placed in the table and margin totals computed, expected values for each cell can be computed. The Expected value for cell 3,2⁷ is found by multiplying the cell's row total (C) by its column total (Y) and dividing by the Table total (T). Once the expected cell frequencies are computed, the remainder of the computation is the same as demonstrated before: compute O-E, (O-E)², and (O-E)²/E for each cell, then sum the last column.
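The rule "row total times column total, divided by the table total" can be sketched for a table of any size. The function name `expected_frequencies` is hypothetical; the data are the observed counts from the deaf schools example (rows: Aural/Oral, Total Comm, Both; columns: Level I, Level II).

```python
def expected_frequencies(observed):
    """Expected count for every cell of a contingency table.

    `observed` is a list of rows; each expected value is
    (row total x column total) / table total.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]

schools = [
    [3, 0],     # Aural/Oral
    [20, 15],   # Total Comm
    [4, 5],     # Both
]

expected = expected_frequencies(schools)
for row in expected:
    print([round(e, 2) for e in row])
```

A useful check is that the expected values add up to the same table total as the observed counts.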

Degrees of Freedom
We determine df for the Test of Independence by the formula df = (r-1)(c-1), where r = the number of rows and c = the number of columns in the contingency table. For a contingency table of 5 rows and 6 columns, the degrees of freedom would be (5-1)(6-1) or 20. (Each variable loses one degree of freedom).

Application to a Problem
Let’s apply this to our example on deaf schools. The expected frequencies are shown in parentheses () below. It is suggested that you compute several of these to ensure your understanding of the procedure.

               I              II             Total
Aural/Oral     3 (1.72)       0 (1.28)         3
Total Comm    20 (20.11)     15 (14.90)       35
Both           4 (5.17)       5 (3.83)         9
Total         27             20               47

7 Cell 3,2 refers to the cell at row 3, column 2, shown in the table as the shaded cell.


Putting the O and E values into a chart, we have the following computations:
Cell              O       E     (O-E)   (O-E)²   (O-E)²/E
Aural/Oral, I     3     1.72     1.28    1.638    0.953
Total Comm, I    20    20.11    -0.11     .012     .001
Both, I           4     5.17    -1.17    1.369     .265
Aural/Oral, II    0     1.28    -1.28    1.638    1.280
Total Comm, II   15    14.90      .10     .010     .001
Both, II          5     3.83     1.17    1.369     .357

χ2 = 2.857    df = (3-1)(2-1) = 2    χ2cv = 5.991

The computed value of 2.857 is smaller than the critical value of 5.991. Therefore, the value is declared not significant. The statistical decision is to retain the null hypothesis. In terms of this study, language preference and school category are not related. It appears that educational approach is unrelated to identifying giftedness in deaf students in these 47 deaf schools.
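The whole Test of Independence for this example fits in one short function. This is a sketch under the same assumptions as the hand computation; the name `chi_square_independence` is illustrative, and the critical value is the table entry quoted above. Because the code keeps full precision in the expected values, it returns about 2.85 rather than the 2.857 obtained from rounded intermediate figures. The decision is the same.

```python
def chi_square_independence(observed):
    """Return (chi_square, df) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)

    chi_square = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / total   # expected count
            chi_square += (o - e) ** 2 / e

    df = (len(row_totals) - 1) * (len(col_totals) - 1)  # (r-1)(c-1)
    return chi_square, df

schools = [[3, 0], [20, 15], [4, 5]]   # rows: Aural/Oral, Total Comm, Both
chi_square, df = chi_square_independence(schools)

critical_value = 5.991                 # alpha = 0.05, df = 2, from the table
significant = chi_square > critical_value
print(f"chi-square = {chi_square:.3f}, df = {df}, significant = {significant}")
```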

Party Preference Revisited
Does gender relate to party preference? Let’s categorize our 600 voters on these two variables and test this. Again, expected values are shown in (). Here’s the data:
               Male            Female          Total
Republican     170 (187.83)    152 (134.17)     322
Democrat       112 (107.33)     72 (76.67)      184
Independent     68 (54.83)      26 (39.17)       94
Total          350             250              600

Here’s our chart. Identify the O’s and E’s above in the chart below.

Cell     O       E      (O-E)   (O-E)²   (O-E)²/E
RM     170   187.83   -17.83   317.91     1.69
DM     112   107.33     4.67    21.81      .20
IM      68    54.83    13.17   173.45     3.16
RF     152   134.17    17.83   317.91     2.37
DF      72    76.67    -4.67    21.81      .28
IF      26    39.17   -13.17   173.45     4.43
                                     χ2 = 12.13


The computed value of 12.13 is larger than the critical value of 5.991 (α=0.05, df=2). Therefore, the value is declared significant. The statistical decision is to reject the null hypothesis.
In terms of this study, this result means that gender and political party preference are related. One’s political preference is influenced by his or her gender. How are these two variables related? We can answer this by “eyeballing the data” in the table.
The greatest part of the chi square comes from the FEMALE-INDEPENDENT (IF) cell. We observe fewer women independents (↓) than we expect by chance (26 vs. 39.17).
The second highest value comes from the MALE-INDEPENDENT (IM) cell. We observe more male independents (↑) than we expect by chance (68 vs. 54.83). Notice that men outnumber women among independents.
The third highest value comes from the FEMALE-REPUBLICAN (RF) cell. We observe more women republicans (↑) than we expect by chance (152 vs. 134.17).
The fourth highest value comes from the MALE-REPUBLICAN (RM) cell. We observe fewer male republicans (↓) than we expect by chance (170 vs. 187.83). Notice that, relative to the expected counts, women are over-represented among republicans.
Cell     O       E            (O-E)   (O-E)²   (O-E)²/E
RM     170   187.83    ↓    -17.83   317.91     1.69
DM     112   107.33           4.67    21.81      .20
IM      68    54.83    ↑     13.17   173.45     3.16
RF     152   134.17    ↑     17.83   317.91     2.37
DF      72    76.67          -4.67    21.81      .28
IF      26    39.17    ↓    -13.17   173.45     4.43

The arrows show the twisting motion in the table that indicates that the two variables are related.
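"Eyeballing the data" amounts to ranking the cells by their contribution to chi-square and noting the direction of each difference. The sketch below does this for the gender by party table; the cell labels follow the chart (R/D/I for party, M/F for gender), and the variable names are illustrative. With unrounded expected values the total comes to about 12.14, against 12.13 from the rounded hand computation.

```python
parties = ["R", "D", "I"]
genders = ["M", "F"]
observed = [
    [170, 152],   # Republican: male, female
    [112, 72],    # Democrat
    [68, 26],     # Independent
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

cells = []
for i, party in enumerate(parties):
    for j, gender in enumerate(genders):
        o = observed[i][j]
        e = row_totals[i] * col_totals[j] / total
        direction = "↑" if o > e else "↓"        # more or fewer than expected
        cells.append((party + gender, o, e, direction, (o - e) ** 2 / e))

chi_square = sum(cell[4] for cell in cells)

# Largest contributors first.
ranked = sorted(cells, key=lambda cell: cell[4], reverse=True)
for label, o, e, direction, part in ranked:
    print(f"{label}: O={o}, E={e:.2f} {direction} contributes {part:.2f}")
print(f"chi-square = {chi_square:.2f}")
```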

Strength of Association
The chi-square test of independence tells you whether two nominal variables are related or not. It does not tell you how strong that relationship is. When you produce a significant chi-square (two variables are related), it is natural to wonder how strong the relationship is. Two procedures can provide such measures: the Contingency
Coefficient (C) and Cramer’s phi (φC).

Contingency Coefficient
The contingency coefficient (C) computes a “Pearson r” type correlation coefficient from a computed chi-square value. The formula is

$$C = \sqrt{\frac{\chi^2}{\chi^2 + N}}$$

where N is the sample size.

If you get, say, a chi-square value of 63.383 (significant at α = 0.001) with a sample size of 390, then you can compute the degree of association by substituting these values for χ2 and N in the formula.


If we were to compare this to a maximum value of 1.00, we would conclude that 0.398 is a weak correlation. But the maximum value for C is not 1.00. It is estimated by another formula:

$$C_{max} = \sqrt{\frac{k - 1}{k}}$$

where k is the number of categories in the variable with the fewer categories. Let’s say in our case that one of our variables has 6 categories and the other has 3. Then, k = 3.
The maximum value C can take is then computed as Cmax = √((3-1)/3), or 0.817. Comparing 0.398 to 0.817, we would say that we have a moderately strong correlation.8
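Both quantities are easy to check in code. The sketch below assumes the standard definitions C = √(χ²/(χ² + N)) and Cmax = √((k-1)/k); the function names are hypothetical. One caution: with the figures as printed (χ² = 63.383, N = 390) the first formula returns about 0.374 rather than the 0.398 quoted in the text, so one of the printed inputs may contain a typographical error. A chi-square of 73.383 with the same N would give 0.398.

```python
import math

def contingency_coefficient(chi_square, n):
    """Pearson's contingency coefficient C for a chi-square value and sample size n."""
    return math.sqrt(chi_square / (chi_square + n))

def max_contingency_coefficient(k):
    """Upper limit of C, where k is the number of categories in the smaller variable."""
    return math.sqrt((k - 1) / k)

c = contingency_coefficient(63.383, 390)   # figures as printed in the text
c_max = max_contingency_coefficient(3)     # variables with 6 and 3 categories -> k = 3

print(round(c, 3), round(c_max, 3))
print(round(c / c_max, 2))                 # C as a share of its maximum
```

Dividing C by Cmax expresses the coefficient as a fraction of the largest value it could take for that table, which is the comparison the text makes informally.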

Cramer’s Phi
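While the contingency coefficient is popular, a better alternative to the measurement of association in a contingency table is Cramer’s phi. The advantage of this procedure is that it ranges from 0.00 to +1.00 and is independent of the size of the table. Cramer’s Phi is defined as

$$\phi_C = \sqrt{\frac{\chi^2}{N(k - 1)}}$$

where N is the sample size and k is, as before, the number of categories in the variable with the fewer categories.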
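A minimal sketch of Cramer's phi, assuming the usual definition φC = √(χ²/(N(k-1))) with k the number of categories in the smaller variable. The function name is hypothetical. Applying it to the gender by party preference result (χ² = 12.13, N = 600, a 3 x 2 table, so k = 2) is an illustration chosen here, not a computation made in the chapter.

```python
import math

def cramers_phi(chi_square, n, k):
    """Cramer's phi for a chi-square value, sample size n, and
    k = number of categories in the variable with fewer categories."""
    return math.sqrt(chi_square / (n * (k - 1)))

# Gender (2 levels) by party preference (3 levels): k = 2.
phi = cramers_phi(12.13, 600, 2)
print(round(phi, 3))
```

A value this close to zero would suggest that a relationship can be statistically significant in a sample of 600 and still be weak.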

Cautions in Using Chi-Square
Chi square is a simple yet powerful statistic. It lends itself well to categorical data gained through questionnaires or interviews. It can also be used with continuous data that has been categorized — dividing test scores into high, medium, and low categories for example. This latter approach is easy, but there may be better ways (z, t, F) to analyze them, as we’ve already seen.
There are, however, dangers to avoid in choosing this technique. These include small expected frequencies, the assumption of independence, the inclusion of nonoccurrences, and whether this approach should be your primary statistical tool.

Small expected frequencies
When expected cell frequencies are small, the computed chi square does not fit the distribution of the statistic correctly. In this case, the results of significance testing are suspect. How small is small? Howell takes the conservative position that all expected frequencies be at least 5.⁹
Others hold that the average expected cell frequency should be at least 5. That is, the ratio of subjects to cells must be at least 5. Plan ahead. Don’t make the mistake of one of my students who planned to use chi-square to study two variables: one had 5 levels and the other 6 levels, giving 30 cells. He thought that 50 subjects would be “more than plenty.” Dividing 50 by 30 (N/k) gave him an average cell size of 1.67. To get up to the minimum of 5, he needed 150 subjects.
8 Hinkle, p. 320
9 Howell, p. 105

Another student of mine, having supposedly read the preceding warning, suggested a dissertation using a chi-square table of 16 rows and 16 columns (256 cells) and considered 200 subjects more than enough.10
The reason for the minimum is power. Fewer subjects than “5 per cell” will not allow the chi-square procedures to detect relationships that may exist. If you plan correctly, but lose subjects during the study, or find some category tallies to be much smaller than anticipated, remember that your significance tests are suspect.
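The planning arithmetic in these two stories can be sketched as a pair of helper functions. The names are hypothetical, and the rule of thumb of 5 subjects per cell is the one stated above.

```python
def min_subjects(rows, cols, per_cell=5):
    """Smallest sample giving an average of `per_cell` subjects per cell."""
    return rows * cols * per_cell

def average_cell_size(n, rows, cols):
    """Average number of subjects per cell for a sample of size n."""
    return n / (rows * cols)

# First student: 5 x 6 table with 50 subjects.
print(round(average_cell_size(50, 5, 6), 2), min_subjects(5, 6))

# Second student: 16 x 16 table with 200 subjects.
print(round(average_cell_size(200, 16, 16), 2), min_subjects(16, 16))
```

Running the check before collecting data shows at a glance whether the planned sample can support the planned table.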

Assumption of Independence
We noted in Barb's study of deaf schools that each one of the 47 schools was placed in one and only one cell in the contingency table. Each school was independent of every other school. The assumption of independence means that each subject is located in one and only one cell in the contingency table.
Violating this assumption is an easy mistake to make, usually by having subjects respond more than once. A student came into my office with a contingency table of tally marks in the fall of 1981 -- my first semester on faculty. His table was the result of $200 in mailings, $300 to a statistician across town, and the prior 10 months of his life. He had listed various educational programs down one side of the contingency table, and five levels of ratings across the top. Each subject checked off a rating for each program. He had 60 subjects and 300 tallies! The observations were not independent (each subject made five responses in the table). He had produced a chi-square value, but the value was meaningless. I encouraged him strongly to go back to his statistician and have him work out another approach to analyzing his data. Proper Planning Prevents Poor Performance -- and sleepless nights, as well.
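A quick safeguard is to compare the number of tallies in the table with the number of subjects: if each subject sits in exactly one cell, the two must be equal. The sketch below is a necessary check only, not proof of independence. The function name is hypothetical, and the 5 x 5 ratings table is invented purely for illustration; only the totals (60 subjects, 300 tallies) come from the story above.

```python
def one_tally_per_subject(table, n_subjects):
    """True if the table holds exactly one tally for each subject."""
    tallies = sum(sum(row) for row in table)
    return tallies == n_subjects

# Deaf schools table: 47 schools, 47 tallies.
schools = [[3, 0], [20, 15], [4, 5]]
print(one_tally_per_subject(schools, 47))

# Hypothetical ratings table: 5 programs x 5 rating levels,
# every one of the 60 subjects rates every program (each row sums to 60).
ratings = [
    [5, 10, 20, 15, 10],
    [8, 12, 18, 14, 8],
    [3, 9, 24, 16, 8],
    [10, 10, 15, 15, 10],
    [6, 14, 20, 12, 8],
]
print(one_tally_per_subject(ratings, 60))
```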

Inclusion of Non-Occurrences
There is one final warning I would make about use of chi-square, and this involves the handling of non-occurrences. Let’s say you ask 20 men and 20 women whether they favor “Variable” or not. Seventeen men and eleven women say "Yes."
With 28 “yes” responses, we can compute equal E’s as 28/2=14. The analysis would be set up as follows:

            O     E    O-E   (O-E)²   (O-E)²/E
Male       17    14      3      9      0.643
Female     11    14     -3      9      0.643
                         0             χ2 = 1.286

This faulty design produces a chi square of 1.286, which is not significant. The fault lies in the fact that the number of “no’s” for males and females is excluded.
The correct approach is to build a contingency table as follows, which includes both yes and no responses:
          Male   Female   Total
Yes        17      11       28
No          3       9       12
Total      20      20       40

10 16x16x5 = 1280 subjects minimum

Cell           O     E    O-E   (O-E)²   (O-E)²/E
Male Yes      17    14      3      9      0.643
Male No        3     6     -3      9      1.500
Female Yes    11    14     -3      9      0.643
Female No      9     6      3      9      1.500
                            0             χ2 = 4.286

Now χ2 = 4.286 and is significant (χ2cv = 3.84, df = 1, 0.05). Looking only at "yes" responses (excluding "no"s) invalidated the test. Further, it lowered the value of chi square, leaving us with a non-significant finding -- incorrectly.
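The two analyses can be placed side by side in code. This sketch computes the faulty "yes only" goodness of fit and the correct 2 x 2 test of independence from the same survey counts; the function names are illustrative.

```python
def goodness_of_fit_equal(observed):
    """Chi-square with equal expected frequencies (E = N/k)."""
    expected = sum(observed) / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)

def independence(observed):
    """Chi-square for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    return sum(
        (o - row_totals[i] * col_totals[j] / total) ** 2
        / (row_totals[i] * col_totals[j] / total)
        for i, row in enumerate(observed)
        for j, o in enumerate(row)
    )

yes_only = [17, 11]                  # faulty: "no" responses left out
full_table = [[17, 11], [3, 9]]      # rows: Yes, No; columns: Male, Female

faulty = goodness_of_fit_equal(yes_only)
correct = independence(full_table)

critical_value = 3.84                # alpha = 0.05, df = 1
print(round(faulty, 3), faulty > critical_value)
print(round(correct, 3), correct > critical_value)
```

Only the second analysis crosses the critical value, because it counts the non-occurrences that the first one discards.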

Chi-Square as Primary Statistic?
Some students make the mistake of depending on a single chi square test as their dissertation's only statistical tool. A doctoral student worked six months collecting data and synthesizing literature. He walked into my office with a 3x5 contingency table. I entered his 15 observed frequencies into my computer, hit the RUN key, and a second later the answer flashed on the screen: NOT SIGNIFICANT.
After a moment of shock, he said, “Six months of my life. . .and it took a second to say it's not significant?!” He had very little to say about his subjects because he had rested all his analytical hopes on a single chi-square statistic. His dissertation's Chapter Three (Procedure for Analysis of Data) was thin. His dissertation passed only after additional (unplanned) weeks of research and writing.
It is better to use the t-Test, ANOVA, or multiple regression as a primary statistic.
Then use several chi square tests to analyze secondary variables, or sub-hypotheses, in your study. For example, "Does gender, income level, geographic location, year of birth, marital status, age saved, years in the ministry, education level. . .relate to your main variable?"
Or, spend your time and energy developing the process and meaning of the variable categories themselves. Dr. Helen Ang, featured at the beginning of the chapter, used chi square to test the relationship between “leadership style” and “educational philosophy.” The chi square was simple, but creating the instruments to measure these variables -- the main focus of the dissertation -- was difficult.

Summary
In this chapter we’ve introduced the concept of non-parametric, or distribution-free, statistics. We’ve looked at the chi-square Goodness of Fit tests with both equal and proportional expected frequencies. We’ve studied the chi-square Test of Independence. The concept of degrees of freedom was discussed. We’ve illustrated how the chi-square statistic is computed, how the critical value is obtained and what “significance” means in English.

11 This student is now professor at a prominent Christian university, author of many books, and a prominent leader in his professional organization, proving that “unsignificant research findings” need not impair one's career!

Example
Dr. Roberta Damon's dissertation was cited (p. 19-1) for her use of the one-sample z-Test. She also used the chi-square Test of Independence to analyze relationships among several other variables.
First, she found that level of marital satisfaction and age category were not independent among missionary wives of her sample (χ2 = 7.525, χ2cv = 5.99, df=2, 0.05). The younger wives expressed higher marital satisfaction than older wives.
Second, she found that conflict resolution and age category were not independent among missionary wives of her sample (χ2 = 6.4513, χ2cv = 5.99, df=2, 0.05). The younger wives were more satisfied with the way conflict is resolved in their marriage than older women.12

12 Roberta Damon, “A Marital Profile,” p. 70

Vocabulary
contingency coefficient: measures strength of assoc’n between two nominal variables (C)
contingency table: table of rows and columns in chi square test of independence
Cramer’s phi: measures strength of assoc’n between two nominal variables (φC)
distribution-free tests: statistics which do not assume a normal distribution of data
equal expected frequencies: E-values computed by dividing N by k
expected frequencies: theoretical values by which observed frequencies are tested (E)
margin totals: sums of counts used to compute E’s in chi-sq test of independence
observed frequencies: actual counts of subjects in chi-square categories (O)
proportional expected frequencies: E-values computed by known percentages in population

Study Questions
1. What are the critical values for the following conditions:
a. 3 rows, 1 column, p=0.05
b. 5 rows, 3 columns, p=0.01
c. 4 rows, 9 columns, p=0.005
2. Define df for both Goodness of Fit and Test of Independence. Demonstrate that “k-1” and “(r-1)(c-1)” are the proper terms for the two df’s.
3. You’ve done your analysis and your computed chi square is less than the critical value. What does this mean, given you are testing one variable?
4. If you have a table with 5 rows (margin totals A..E) and 6 columns (margin totals U..Z), what would the expected value of the cell at row 4, column 2 be?

Sample Test Questions
1. All of the following kinds of data can be tested with chi square except
A. dichotomized data
B. categorized ratio data
C. nominal data
D. continuous interval data


2. The term (O-E) in chi square is most closely related to ___ in the z-test.
A. X2
B. X
C. x
D. x2
3. A “significant chi square” for the Test of Independence means that
A. two nominal variables are related.
B. two independent groups are different.
C. two dependent groups are different.
D. two categorical variables are independent.
4. A contingency table has 2 columns and 8 rows. The proper df is
A. 16
B. 14
C. 7
D. 1

© 4th ed. 2006 Dr. Rick Yount

2 3-13

Research Design and Statistical Analysis in Christian Ministry

2 3-14

IV: Statistical Procedures

© 4th ed. 2006 Dr. Rick Yount

Similar Documents

Free Essay

Chi Square

...CHI-SQUARE TEST Adapted by Anne F. Maben from "Statistics for the Social Sciences" by Vicki Sharp The chi-square (I) test is used to determine whether there is a significant difference between the expected frequencies and the observed frequencies in one or more categories. Do the number of individuals or objects that fall in each category differ significantly from the number you would expect? Is this difference between the expected and observed due to sampling error, or is it a real difference? Chi-Square Test Requirements 1. Quantitative data. 2. One or more categories. 3. Independent observations. 4. Adequate sample size (at least 10). 5. Simple random sample. 6. Data in frequency form. 7. All observations must be used. Expected Frequencies When you find the value for chi square, you determine whether the observed frequencies differ significantly from the expected frequencies. You find the expected frequencies for chi square in three ways: I . You hypothesize that all the frequencies are equal in each category. For example, you might expect that half of the entering freshmen class of 200 at Tech College will be identified as women and half as men. You figure the expected frequency by dividing the number in the sample by the number of categories. In this exam pie, where there are 200 entering freshmen and two categories, male and female, you divide your sample of 200 by 2, the number of categories, to get 100 (expected frequencies) in each category. 2. You determine the expected...

Words: 1536 - Pages: 7

Free Essay

Chi-Square Analysis

...Pearson Chi-Square significance value is less than .05 which means income can affect the probability that a person will eat at Hobbit’s Choice. Probable Hobbit’s patrons are more likely to make between $50,000 and 74,999 (93%) a year than non-probable patrons (7%). Income | Probable Patron | Non-Probable Patron | <$15,000 | 0% | 100% | $15,000 to 24,999 | 0% | 100% | $25,000 to 49,999 | 0% | 100% | $50,000 to 74,999 | 3% | 97% | $75,000 to 99,999 | 62.5% | 32.5% | $100,000 to 149,999 | 93% | 7% | $150,000+ | 84.8% | 15.2% | *Please see Appendix ____ for SPSS Output * The Pearson Chi-Square significance value is less than .05 which means that educational level has an effect on the probability that a person will be a patron of Hobbit’s Choice. In other words, level of education differentiates patrons from non-patrons. Probable Hobbit’s Choice patrons are more likely to have a Doctorate degree (77.8%) than non-patrons (22.2%). In fact, most/all (which one?) probable patrons have more than some college. 0% of survey respondents that list “no degree” are probable patrons. Educational Level | Probable Patron | Non-Probable Patron | Some College or Less | 0% | 100% | Associate Degree | 21.4% | 78.6% | Bachelor’s Degree | 27.7% | 72.3% | Master’s Degree | 39.5% | 60.5% | Doctorate Degree | 77.8% | 22.2% | *Please see Appendix ____ for SPSS Output * Gender does not differentiate patrons from non-patrons because its Pearson Chi-Square significance...

Words: 2269 - Pages: 10

Free Essay

Chi Square Test

...ANSWERS Process of Science (9.21) How Is the Chi-Square Test Used in Genetic Analysis? Lab Notebook Chi-Square test for Case 1 | | | | | | | |Phenotype |Observed No. (o) |Expected No. (e) |(o-e) |(o-e) 2 |(o-e) 2 | | | | | | |e | |Red eyes |31 |33 |2 |4 |0.1212 | |Sepia eyes |13 |11 |2 |4 |0.3636 | | |0.4848 | |(2 (to the nearest ten-thousandth) | | Questions 1. Why is it important to remove the adults in the parental generation? It is important to keep the generations separate so that you know you are crossing only F1 flies. 2. What generation will their offspring be? The new offspring are the F2 generation. 3. Based on the data obtained, is the cross in Case 1 monohybrid or dihybrid? Explain. The cross is monohybrid because only one trait –eye color– is involved...

Words: 449 - Pages: 2

Free Essay

Chi Square Test

...The goal of this exercise is to assess your understanding of the chi square test. Please read the problem carefully and answer the question. Good luck! Assume you have the data below, which display the number of students who elect different undergraduate majors.

Number of Students Selecting Different Majors
| Pre-Med | Computer Sciences | English Literature | Education | Engineering | Total |
| 50 | 85 | 25 | 60 | 80 | 300 |

We want to know whether those numbers differ due to chance. In other words, at the 0.01 level of significance, are some majors selected more often than others, or is the selection pattern essentially random? The null hypothesis is that the programs are equally preferred. Create a table that shows the computation of the Chi Square statistic [6 POINTS]. Use a decision rule to determine whether the null hypothesis is rejected or not [4 POINTS].

Solution: Ho: The majors are equally preferred (probability of liking each major = 1/5). HA: The majors are not equally preferred. The Chi Square statistic is used to evaluate to what extent the hypothesis and the data have a good fit: $\chi^2 = \sum_i (O_i - E_i)^2 / E_i$ where Oi is the actual frequency observed in cell i ...
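The computation the exercise asks for can be sketched as follows. The mapping of counts to major names follows the column order of the table as best it can be read (an assumption; the statistic itself does not depend on the labels), and the critical value 13.277 is the standard tabled chi-square value for 4 degrees of freedom at the 0.01 level rather than something stated in the excerpt.

```python
# Observed counts per major (label order assumed from the table).
observed = {
    "Pre-Med": 50,
    "Computer Sciences": 85,
    "English Literature": 25,
    "Education": 60,
    "Engineering": 80,
}

total = sum(observed.values())
# Under Ho every major is equally preferred, so each expects total / 5.
expected = total / len(observed)

# One row per major: observed, expected, and (O - E)^2 / E.
rows = [(name, o, expected, (o - expected) ** 2 / expected)
        for name, o in observed.items()]
chi2 = sum(row[3] for row in rows)
df = len(observed) - 1

# Tabled critical value for df = 4 at alpha = 0.01 (assumed, not from the excerpt).
CRITICAL_VALUE = 13.277
reject_null = chi2 > CRITICAL_VALUE

for name, o, e, contribution in rows:
    print(f"{name:20s} {o:4d} {e:6.1f} {contribution:8.4f}")
print(f"chi-square = {chi2:.4f}, df = {df}, reject Ho: {reject_null}")
```

Under these assumptions the statistic is far above the critical value, so the decision rule rejects the hypothesis that the majors are equally preferred.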

Words: 344 - Pages: 2

Free Essay

Chi Square

...Abstract: The purpose of my project is to find out two things about students at my school: 1. Is hair related to eye color? 2. Is favorite color related to favorite ice cream flavor? I took a survey of students, and used the chi square (χ2) statistic to see if the data is related. The χ2 statistic showed that hair color and eye color are related, but favorite color and favorite ice cream flavor are not related. Purpose: To use statistics to find out two things about students at my school: 1. Is hair related to eye color? 2. Is favorite color related to favorite ice cream flavor? Research: I chose this project because I wanted to learn more about probability and statistics. I can use statistics to answer a question about students at my school. χ2 is used to compare sets of descriptive data. Descriptive data are things like colors, flavors, names, and other things that cannot be described by just a number, like height or weight. I picked hair color and eye color because I thought they would be related. I wanted to test this. I picked favorite color and favorite ice cream flavor because I didn’t think they would be related. I wanted to test this also. Hypotheses: First Hypothesis: Eye color and hair color will be related. In statistical terms: Null Hypothesis (H0): There is no relationship between eye color and hair color. Alternative Hypothesis (HA): There...

Words: 1501 - Pages: 7

Premium Essay

Crosstabulation & Chi Square

...Crosstabulation & Chi Square Robert S Michael Chi-square as an Index of Association After examining the distribution of each of the variables, the researcher’s next task is to look for relationships among two or more of the variables. Some of the tools that may be used include correlation and regression, or derivatives such as the t-test, analysis of variance, and contingency table (crosstabulation) analysis. The type of analysis chosen depends on the research design, characteristics of the variables, shape of the distributions, level of measurement, and whether the assumptions required for a particular statistical test are met. A crosstabulation is a joint frequency distribution of cases based on two or more categorical variables. Displaying a distribution of cases by their values on two or more variables is known as contingency table analysis and is one of the more commonly used analytic methods in the social sciences. The joint frequency distribution can be analyzed with the chi-square statistic (χ2) to determine whether the variables are statistically independent or if they are associated. If a dependency between variables does exist, then other indicators of association, such as Cramer’s V, gamma, Somers’ d, and so forth, can be used to describe the degree to which the values of one variable predict or vary with those of the other variable. More advanced techniques such as log-linear models and multinomial regression can be used to clarify the relationships contained...

Words: 3702 - Pages: 15

Premium Essay

Marketing Research Cases 14 and 15

...Case 14.1 1. Correlations

| | | Prefer Drive Less than 30 Minutes | Prefer Unusual Desserts | Prefer Large Variety of Entrees | Prefer Unusual Entrees |
| Prefer Drive Less than 30 Minutes | Pearson Correlation | 1 | .768** | .806** | .765** |
| | Sig. (2-tailed) | | .000 | .000 | .000 |
| | N | 400 | 400 | 400 | 400 |
| Prefer Unusual Desserts | Pearson Correlation | .768** | 1 | .823** | .868** |
| | Sig. (2-tailed) | .000 | | .000 | .000 |
| | N | 400 | 400 | 400 | 400 |
| Prefer Large Variety of Entrees | Pearson Correlation | .806** | .823** | 1 | .831** |
| | Sig. (2-tailed) | .000 | .000 | | .000 |
| | N | 400 | 400 | 400 | 400 |
| Prefer Unusual Entrees | Pearson Correlation | .765** | .868** | .831** | 1 |
| | Sig. (2-tailed) | .000 | .000 | .000 | |
| | N | 400 | 400 | 400 | 400 |
**. Correlation is significant at the 0.01 level (2-tailed).

Null Hypothesis: No relation between preference to drive 30 minutes or less and preference of menu items.
Alternative Hypothesis: There is a relation between the preference to drive 30 minutes or less and preference of menu items.
Interpretation: All the correlations have Sig. values below .05, so they are significantly different from zero and we reject the null hypothesis. The correlations are positive and they are in the moderate range. As the preference to drive 30 minutes or less increases, so do preferences for unusual desserts, large variety of entrees, and unusual entrees.

Correlations | | Prefer Drive Less than 30 Minutes | Prefer...

Words: 3383 - Pages: 14

Free Essay

One Way Anova

...ONE WAY ANOVA One-way analysis of variance (abbreviated one-way ANOVA) is a technique used to compare means of two or more samples (using the F distribution). This technique can be used only for numerical data. The ANOVA tests the null hypothesis that samples in two or more groups are drawn from populations with the same mean values. To do this, two estimates are made of the population variance. These estimates rely on various assumptions. The ANOVA produces an F-statistic, the ratio of the variance calculated among the means to the variance within the samples. If the group means are drawn from populations with the same mean values, the variance between the group means should be lower than the variance of the samples, following the central limit theorem. A higher ratio therefore implies that the samples were drawn from populations with different mean values.

Descriptives
| | | N | Mean | Std. Deviation | Std. Error | 95% Confidence Interval for Mean: Lower Bound | 95% Confidence Interval for Mean: Upper Bound | Minimum | Maximum |
| QUALITY | 1 | 19 | 3.89 | .809 | .186 | 3.50 | 4.28 | 2 | 5 |
| | 2 | 12 | 3.83 | .937 | .271 | 3.24 | 4.43 | 1 | 5 |
| | Total | 31 | 3.87 | .846 | .152 | 3.56 | 4.18 | 1 | 5 |
| PRICE | 1 | 19 | 2.95 | .911 | .209 | 2.51 | 3.39 | 1 | 5 |
| | 2 | 12 | 2.75 | 1.055 | .305 | 2.08 | 3.42 | 1 | 5 |
| | Total | 31 | 2.87 | .957 | .172 | 2.52 | 3.22 | 1 | 5 |
| BRAND | 1 | 19 | 4.11 | .809 | .186 | 3.72 | 4.50 | 3 | 5 |
| | 2 | 12 | 4.17 | .577 | .167...

Words: 1377 - Pages: 6

Free Essay

Student

...
CROSSTABS VARIABLES ANALYZED
Row Variable ->> Do you use Friendly Market regularly
Column Variable ->> I always pay cash.

Observed Frequencies
| | Disagree | Neutral | Agree | Grand Total |
| No | 9 | 62 | 19 | 90 |
| Yes | 16 | 39 | 17 | 72 |
| Grand Total | 25 | 101 | 36 | 162 |

Statistical Values
| Chi Sq | df | Sig |
| 5.38 | 2 | 0.07 |

There is NO significant association between these two variables (95% level of confidence).

A total of 162 respondents answered both questions. Of the 36 who agree that they always pay cash, 17 use Friendly Market regularly and 19 do not; of the 101 who gave a neutral answer, 39 use it regularly and 62 do not. The chi square value is 5.38 with 2 degrees of freedom, and its significance is about 0.07. Because this is greater than 0.05, the null hypothesis of no association cannot be rejected: regular use of Friendly Market is not significantly related to always paying in cash. Recommendation: The customer does not care about the mode of payment.

Cross tabulation Analysis
CROSSTABS VARIABLES ANALYZED
Row Variable...
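The reported statistic can be reproduced from the observed frequencies. The sketch below is illustrative only: `chi_square_independence` is a hypothetical helper, expected counts are computed as row total times column total over grand total, and the p-value line uses the closed form exp(-x/2), which holds only for 2 degrees of freedom.

```python
import math


def chi_square_independence(table):
    """Return (chi-square, degrees of freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Rows: No, Yes (uses Friendly Market regularly).
# Columns: Disagree, Neutral, Agree (always pays cash).
table = [
    [9, 62, 19],
    [16, 39, 17],
]

chi2, df = chi_square_independence(table)
# Upper-tail probability of a chi-square variable; this shortcut is valid for df == 2 only.
p_value = math.exp(-chi2 / 2)

print(f"chi-square = {chi2:.2f}, df = {df}, p = {p_value:.2f}")
```

The result agrees with the Chi Sq, df and Sig values in the table, and since the p-value is above 0.05 the conclusion of no significant association follows.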

Words: 1190 - Pages: 5

Free Essay

Descriptive Statistics

...300 | 100.0% | 0 | .0% | 300 | 100.0% |
| Sex * Stock Trading | 300 | 100.0% | 0 | .0% | 300 | 100.0% |
| Sex * Chatting | 300 | 100.0% | 0 | .0% | 300 | 100.0% |
| Sex * News/Weather | 300 | 100.0% | 0 | .0% | 300 | 100.0% |

Sex * Music Crosstab
| | | | Music: 0 | Music: 1 | Total |
| Sex | 1 | Count | 91 | 97 | 188 |
| | | % within Sex | 48.4% | 51.6% | 100.0% |
| | | % within Music | 65.0% | 60.6% | 62.7% |
| | 2 | Count | 49 | 63 | 112 |
| | | % within Sex | 43.8% | 56.3% | 100.0% |
| | | % within Music | 35.0% | 39.4% | 37.3% |
| Total | | Count | 140 | 160 | 300 |
| | | % within Sex | 46.7% | 53.3% | 100.0% |
| | | % within Music | 100.0% | 100.0% | 100.0% |

Chi-Square Tests
| | Value | df | Asymp. Sig. (2-sided) | Exact Sig. (2-sided) | Exact Sig. (1-sided) |
| Pearson Chi-Square | .611a | 1 | .434 | | |
| Continuity Correctionb | .438 | 1 | .508 | | |
| Likelihood Ratio | .612 | 1 | .434 | | |
| Fisher's Exact Test | | | | .474 | .254 |
| Linear-by-Linear...
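The Pearson and continuity-corrected rows of the Chi-Square Tests output can be re-derived from the four counts in the Sex * Music crosstab. This is a sketch under stated assumptions, not the SPSS procedure itself: `chi_square_2x2` and `p_value_df1` are hypothetical names, the correction subtracts 0.5 from each absolute deviation (Yates), and the p-value uses the identity that a chi-square variable with 1 degree of freedom is a squared standard normal.

```python
import math


def chi_square_2x2(table, yates=False):
    """Chi-square statistic for a 2x2 table, optionally with Yates' correction."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            deviation = abs(observed - expected)
            if yates:
                # Continuity correction: shrink each deviation by 0.5, not below 0.
                deviation = max(deviation - 0.5, 0.0)
            chi2 += deviation ** 2 / expected
    return chi2


def p_value_df1(chi2):
    """Upper-tail probability for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


# Rows: Sex 1, Sex 2. Columns: Music 0, Music 1.
counts = [
    [91, 97],
    [49, 63],
]

pearson = chi_square_2x2(counts)
corrected = chi_square_2x2(counts, yates=True)

print(f"Pearson: {pearson:.3f}, p = {p_value_df1(pearson):.3f}")
print(f"Continuity corrected: {corrected:.3f}, p = {p_value_df1(corrected):.3f}")
```

Both statistics and their two-sided significance values line up with the first two rows of the output, and neither is below .05.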

Words: 4191 - Pages: 17

Premium Essay

Stats

...maximum of 10 rows and 10 columns.
f) T F Frequency graphs can determine the mode, Box & Whiskers does not.
g) T F The birth data from the Anaheim Ducks and Los Angeles Kings proved Outliers was correct.
h) T F For the Hypergeometric distribution the value of p changes each time an object is selected.
i) T F Heights of adult males is a good example of the Poisson distribution.
j) T F When children give their age, it’s continuous; for adults it’s integer.
k) T F If a LUMAT template cell is colored, you can enter data or labels.
l) T F The Box and Whiskers template gives indicators of data being normal, uniform or exponential.
m) T F Goodness of Fit templates use the Chi-square distribution to give the probability of a fit.
n) LUMAT stands for: Learning to Use Managerial Analysis Templates.
o) The name of our Excel Training program is ExcelEverest.
p) If the pieces of a pie chart in Excel add up to only...

Words: 1158 - Pages: 5

Free Essay

Paper on Grass

...Experimental Design and Analysis of Variance Review: chi square = we want to know whether a data set fits a certain distribution/independence model. We use the chi square distribution, then we check how far away the test statistic is from 0. As the data set gets farther away from what you expect to get, you get larger differences between the expected model and the actual data (you get a larger test statistic). Components of ANOVA: Factor: the independent variable. We want this variable to be qualitative. The classifications of the factor are called the treatments. (Ex. color of the light vs. the response variable, i.e. height of the plant. Light is qualitative; the treatments are the kinds of light, such as red, white, violet, green. In ANOVA, the response variable must be quantitative. If it is not quantitative, then go back to the chi square test.) When we design an experiment, the factors are controlled by you. But sometimes some factors are difficult to control, and if we want to do an experiment on that we will have to just look at observational data. An example of this kind of factor is the weather. Regardless, usually to test whether a certain factor has an effect on a response variable, we do replication. We look at replicating the experiment on more units. The more the better. If we find differences between the growths (in the mongo seeds) we do not know if this is true for the whole population, so the more elements of the sample we have the better. Gasoline Mileage Case: Factor: Gas Type. Treatments:...

Words: 630 - Pages: 3

Premium Essay

No-Show Clinical Data Analytics

...No-show rates range between 15% to 30% in an ambulatory setting and lead to wasted resources, increased financial burdens and inaccurate or missed diagnoses of patients (Goldman et al., 1982). Previous studies have shown that various patient factors can predict future no-show behavior. For example, the type of appointment scheduled for a patient can predict patient absenteeism (Zeber, Pearson, & Smith, 2009). Zeber et al. found that colonoscopy appointments are the most commonly missed appointments (Zeber et al., 2009). Furthermore, previous missed appointments is one of the most significant predictors of no-show appointments (Dove & Schneider, 1981). Studies have also shown that patients’ various psychosocial diagnoses are indicators of missed appointments (Goldman et al., 1982). Patients diagnosed with at least one psychological diagnosis, including mood disorders, such as depression and bipolar disease, anxiety disorders, such as panic attacks and posttraumatic stress disorder, and thought disorders, such as schizophrenia and personality disorders, were more likely to miss appointments compared to patients without psychological diagnoses (Savageau et al., 2004). Finally, Perron et al. showed that patients with substance abuse disorders are more likely to miss appointments (Perron et al., 2010). In order to reduce no-show rates in a hospital gastrointestinal (GI) clinic this project analyzed potential indicators of missed appointments. Based on a conceptual model grouping...

Words: 1517 - Pages: 7

Premium Essay

Whatever

...4/7/2014 Basic Statistics: An Overview

Basic Statistics: Review
Descriptive Statistics: Scatter graph; Measures of central tendency (Mean; Median, quartile, deciles, percentile; Mode; Weighted mean; GM; HM); Measures of dispersion (Range; IQR; Semi IQR; Mean deviation; Standard deviation; Variance; Coeff of variation)
Inferential Statistics: Populations; Sampling; Estimation of Parameters (Point Estimation; Interval Estimation); Properties of Point Estimators (Unbiased; Minimum Variance; Consistency; Efficiency); Statistical Inference: Hypothesis Testing (T test; F test; Chi square test)
Measures of shape of the curve: Moments; Skewness; Kurtosis
Probability distributions: Normal Distribution; T-student Distribution; Chi-Square Distribution; F Distribution
Index Number
Etc.
Correlational Statistics: Covariance; Correlations; Regressions

Some Terminology
Variables are things that we measure, control, or manipulate. They may be classified as:
1. Quantitative, i.e. numerical
Continuous: takes fractional values, ex. height in cm
Discrete: takes no fractional values, ex. GDP
Random variable: if the value of a variable cannot be predicted in advance
Non-random: if the value of a variable can be predicted in advance
2. Qualitative, i.e. non numerical
1. Nominal: Items are usually categorical and may have numbers...

Words: 1759 - Pages: 8

Premium Essay

Jack Get by

...a manuscript (unless the p value is less than .001). Please pay attention to issues of italics and spacing. APA style is very precise about these. Also, with the exception of some p values, most statistics should be rounded to two decimal places. 
Mean and Standard Deviation are most clearly presented in parentheses: The sample as a whole was relatively young (M = 19.22, SD = 3.45). The average age of students was 19.22 years (SD = 3.45). 
Percentages are also most clearly displayed in parentheses with no decimal places: Nearly half (49%) of the sample was married. 
Chi-Square statistics are reported with degrees of freedom and sample size in parentheses, the Pearson chi-square value (rounded to two decimal places), and the significance level: The percentage of participants that were married did not differ by gender, χ2(1, N = 90) = 0.89, p = .35. 
T Tests are reported like chi-squares, but only the degrees of freedom are in parentheses. Following that, report the t statistic (rounded to two decimal places) and the significance level. There was a significant effect for gender, t(54) = 5.43, p < .001, with men receiving higher scores than women. 
ANOVAs (both one-way and two-way) are reported like the t test, but there are two degrees-of-freedom numbers to report. First report the between-groups degrees of freedom, then report the within-groups degrees of freedom (separated by a comma). After that report the F statistic (rounded off to two decimal places)...

Words: 570 - Pages: 3