Read this article to learn about contingency tables and tests for independence of factors in data analysis.
Survival:
Is the probability of being male or female independent of being alive or dead? Let us use data on ruffed grouse.
You collect mortality data from 100 birds you radio-collared and test the following hypothesis:
H0:
The two sets of attributes (death and sex of bird) are unrelated (independent). Expected values for each cell can be calculated by multiplying the row total by the column total and dividing by the grand total.
EX:
Expected Value = 70 * 67/100 = 46.9
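The rule above can be sketched in Python. This is a minimal illustration, assuming a 2 x 2 table in which 70 is a row total and 67 is a column total, so the complementary margins are 30 and 33 out of 100 birds. Which margin belongs to sex and which to survival is not stated here, and the function name `expected_counts` is hypothetical.

```python
def expected_counts(row_totals, col_totals):
    """Expected cell counts under independence: row total * column total / grand total."""
    grand_total = sum(row_totals)
    if grand_total != sum(col_totals):
        raise ValueError("row and column totals must add up to the same grand total")
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Margins inferred from the example: 70 and 67 are given, 30 and 33 follow
# from the grand total of 100 (assumed 2 x 2 table).
expected = expected_counts([70, 30], [67, 33])
for row in expected:
    print([round(e, 1) for e in row])
```

The first cell reproduces the 46.9 computed above; the remaining three cells follow from the same formula.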
1. Calculate the chi-square value.
2. The critical value for 1 degree of freedom (alpha = 0.05) is 3.84146, and the null hypothesis is rejected when the chi-square value exceeds it. What can you conclude about the independence of these two factors?
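The two steps of the exercise can be sketched as follows. This is an illustration under stated assumptions, not a worked answer: the observed counts in `hypothetical_observed` are made up for demonstration and are not the grouse data, and the function name is hypothetical. The critical value is derived from the fact that a chi-square variable with 1 degree of freedom is a squared standard normal.

```python
from statistics import NormalDist


def chi_square_statistic(observed):
    """Pearson chi-square: sum of (observed - expected)^2 / expected over all cells."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / grand_total
            stat += (obs - exp) ** 2 / exp
    return stat


# With 1 degree of freedom the chi-square variable is a squared standard normal,
# so the alpha = 0.05 critical value is the square of the 97.5th normal percentile.
critical_value = NormalDist().inv_cdf(0.975) ** 2  # about 3.84146

# HYPOTHETICAL counts for illustration only (not the grouse data).
hypothetical_observed = [[50, 20], [17, 13]]
stat = chi_square_statistic(hypothetical_observed)
reject_h0 = stat > critical_value
print(round(stat, 3), round(critical_value, 5), reject_h0)
```

Substituting the real observed counts for the hypothetical ones gives the value asked for in step 1; comparing it with `critical_value` gives the decision in step 2.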
Chi-Square Test for Independence (2-Way Chi-Square, SPSS Output):
A large-scale national randomized experiment was conducted in the 1980s to see if daily aspirin consumption (as compared to an identical, but inert, placebo) would reduce the rate of heart attacks. This study (the Physicians' Health Study) was described in one of the episodes of the statistics video series Against All Odds.
Here are the actual results from the study using 22,071 doctors who were followed for 5 years:
Data > Weight Cases > Weight Cases by > click over Count
Analyze > Descriptive Statistics > Crosstabs
Row: aspirin
Column: heart attack
Statistics > Chi-square
Cells > row percentages
Chi-square = 25.01; critical chi-square = 3.84
Statistical decision: Reject H0
Conclusion:
A chi-square analysis indicated that there was a significant relationship between aspirin condition and incidence of heart attacks, chi-square (1, N = 22,071) = 25.01, p<.001. A greater percentage of heart attacks occurred for participants taking the placebo (M=1.7%) compared to those taking aspirin (M=0).
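The reported p-value can be checked from the statistic alone. The sketch below assumes only the reported chi-square of 25.01 with 1 degree of freedom. It uses the identity that, for 1 degree of freedom, the upper-tail probability equals the two-sided tail of a standard normal. The function name is hypothetical.

```python
import math


def chi_square_p_value_df1(statistic):
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom.

    For 1 df, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)) with Z standard normal.
    """
    return math.erfc(math.sqrt(statistic / 2))


p_value = chi_square_p_value_df1(25.01)  # statistic reported above
print(p_value < 0.001)  # consistent with the reported p < .001
```

The same function returns about 0.05 at the critical value of 3.84, which is why that cutoff corresponds to alpha = 0.05.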
Contingency Table Approach:
When items are classified according to two or more criteria, it is often of interest to decide whether these criteria act independently of one another. For example, suppose we wish to classify defects found in wafers produced in a manufacturing plant, first according to the type of defect and, second, according to the production shift during which the wafers were produced. If the proportions of the various types of defects are constant from shift to shift, then classification by defects is independent of the classification by production shift.
On the other hand, if the proportions of the various defects vary from shift to shift, then the classification by defects depends upon or is contingent upon the shift classification and the classifications are dependent.
In the process of investigating whether one method of classification is contingent upon another, it is customary to display the data by using a cross classification in an array consisting of r rows and c columns called a contingency table. A contingency table consists of r x c cells representing the r x c possible outcomes in the classification process.
Let us construct an industrial example. A total of 309 wafer defects were recorded, and each defect was classified as one of four types: A, B, C, or D.
At the same time, each wafer was identified according to the production shift in which it was manufactured: 1, 2, or 3.
These counts are presented in the following table:
Contingency table classifying defects in wafers according to type and production shift.
(Note: The numbers in parentheses are the expected cell frequencies.)
Column probabilities: Let pA be the probability that a defect will be of type A. Likewise, define pB, pC, and pD as the probabilities of observing the other three types of defects.
These probabilities, which are called the column probabilities, will satisfy the requirement:
pA + pB + pC + pD = 1
Row probabilities: By the same token, let pi (i = 1, 2, or 3) be the row probability that a defect occurred during shift i, where p1 + p2 + p3 = 1.
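A minimal sketch of how the row and column probabilities can be estimated from a table of counts, and how they combine into expected cell frequencies when the two classifications are independent. The counts in `toy_table` are hypothetical and are not the 309 wafer defects; they were chosen so that the mix of defect types is the same in every shift. The function name is also hypothetical, and the degrees of freedom use the standard (r - 1)(c - 1) rule.

```python
def margins_and_expected(table):
    """Estimate row/column probabilities and expected counts under independence."""
    n = sum(sum(row) for row in table)
    row_probs = [sum(row) / n for row in table]
    col_probs = [sum(col) / n for col in zip(*table)]
    # Under independence, P(row i and column j) = p_i * p_j.
    expected = [[n * pi * pj for pj in col_probs] for pi in row_probs]
    return row_probs, col_probs, expected


# HYPOTHETICAL counts (rows = shifts 1-3, columns = defect types A-D),
# chosen so that the defect mix is identical in every shift.
toy_table = [
    [10, 20, 30, 40],
    [20, 40, 60, 80],
    [5, 10, 15, 20],
]
row_probs, col_probs, expected = margins_and_expected(toy_table)
degrees_of_freedom = (len(toy_table) - 1) * (len(toy_table[0]) - 1)  # (r - 1)(c - 1)
print(col_probs, degrees_of_freedom)
```

Because the proportions are constant from shift to shift in this toy table, the expected counts coincide with the observed counts. In real data, the gap between the two is what the chi-square statistic measures.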