In this post, I am going to show you how to implement a Chi-Squared test (also Chi-Square or χ^{2} test) in R. We will only deal with the most straightforward scenario.

The χ^{2} test is used when you want to test whether two nominal (categorical) variables correlate.

It is probably the most widely used test in linguistics, as it is computationally and conceptually rather straightforward. The underlying logic is that you compare the distribution of observed values to the distribution you would expect if the variables were not correlated. The larger the difference between the observed and the expected values, the stronger the evidence that the variables are actually correlated (i.e. that they have an effect on each other).
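To make this logic concrete, here is a minimal sketch that computes the expected values and the χ^{2} statistic by hand, using the counts from the example further down. The object names (`observed`, `expected`, `chi.squared`, `p.value`) are chosen for illustration only; in practice `chisq.test()` does all of this for you.

```r
# Observed frequencies (the same counts as in the example below)
observed <- matrix(c(181, 177, 655, 67), ncol = 2, byrow = TRUE,
                   dimnames = list(c("kind.of", "sort.of"), c("BrE", "AmE")))

# Expected frequencies if the variables were not correlated:
# (row total * column total) / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# The chi-squared statistic sums the squared differences between observed
# and expected values, each scaled by the expected value
chi.squared <- sum((observed - expected)^2 / expected)

# Degrees of freedom: (rows - 1) * (columns - 1)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)

# p-value: upper tail of the chi-squared distribution
p.value <- pchisq(chi.squared, df = df, lower.tail = FALSE)

chi.squared  # approx. 220.73
```

The statistic obtained this way should agree with `chisq.test(observed, correct = FALSE)`, i.e. the test without continuity correction.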

A warning is in order here: Although frequently used, the χ^{2} test is almost never appropriate, as it does not take into account other variables that may also have an effect on the dependent variable. Thus, multivariate designs (Configural Frequency Analysis (CFA), multivariate regression models, etc.) are almost always a better choice.

This post exemplifies the implementation of the χ^{2} test in R and focuses on the simplest scenario, where we want to find out if two variables are correlated. To get valid results, at least 80% of the cells must have expected values of 5 or higher, and no cell should have an expected value below 1.
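This requirement can be checked before running the test. The following is only a sketch: the helper name `expected.ok` is hypothetical (it is not part of base R or of any package), and the counts are those of the example introduced below.

```r
# Hypothetical helper: TRUE if at least 80% of the expected values are
# 5 or higher and none is below 1
expected.ok <- function(tab) {
  # expected values under independence
  expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
  mean(expected >= 5) >= 0.8 && all(expected >= 1)
}

# Counts from the example below
example.table <- matrix(c(181, 177, 655, 67), ncol = 2, byrow = TRUE,
                        dimnames = list(c("kind.of", "sort.of"), c("BrE", "AmE")))
expected.ok(example.table)
# [1] TRUE
```

If the check returns `FALSE`, the χ^{2} approximation may be unreliable for that table.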

In this example, we want to test if British English speakers (BrE) differ significantly from American English speakers (AmE) in terms of their use of the two near-synonymous hedges "kind of" and "sort of".

First, we load packages that may be useful and generate some data. Next, we visualize the data, and then we perform the actual test. In addition, we calculate the effect sizes appropriate for χ^{2} tests.

```r
### --- Prepare analysis
# Remove all objects from the current workspace
rm(list = ls(all.names = TRUE))
# Install packages we need or which may be useful
# (to activate just delete the #)
# install.packages("vcd")
library(vcd)  # initialize package
# Now we put in the numbers to create the table
example.table <- matrix(c(181, 177, 655, 67), ncol = 2, byrow = TRUE)
# Now we label our table
rownames(example.table) <- c("kind.of", "sort.of")
colnames(example.table) <- c("BrE", "AmE")
example.table <- as.table(example.table)
example.table
#         BrE AmE
# kind.of 181 177
# sort.of 655  67
# Visualize data
par(mfrow = c(1, 2))  # plot two plots in two columns in one window
assocplot(example.table)
mosaicplot(example.table, shade = TRUE, type = "pearson", main = "")
par(mfrow = c(1, 1))  # restore original graphical parameters
```

The plots suggest that speakers of AmE prefer "kind of" while speakers of BrE prefer "sort of".
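The same impression can be backed up with column percentages. This is a small sketch using base R's `prop.table()`; the object name `col.props` is chosen for illustration, and the table is rebuilt here so that the snippet runs on its own.

```r
# Rebuild the example table so the snippet is self-contained
example.table <- as.table(matrix(c(181, 177, 655, 67), ncol = 2, byrow = TRUE,
                                 dimnames = list(c("kind.of", "sort.of"),
                                                 c("BrE", "AmE"))))

# Proportion of each hedge within each variety (columns sum to 1)
col.props <- prop.table(example.table, margin = 2)
round(col.props * 100, 1)
#          BrE  AmE
# kind.of 21.7 72.5
# sort.of 78.3 27.5
```

Roughly three quarters of the AmE tokens are "kind of", whereas roughly three quarters of the BrE tokens are "sort of".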

```r
# Perform test (correct = FALSE switches off Yates' continuity correction)
chisq.results <- chisq.test(example.table, correct = FALSE)
chisq.results  # inspect results
#         Pearson's Chi-squared test
# data:  example.table
# X-squared = 220.7339, df = 1, p-value < 2.2e-16
```

The results indicate that American and British English differ highly significantly with respect to their preference for "sort of" and "kind of" (cf. the low p-value). We could stop here, but it is preferable to also have a look at the expected values and to report the effect size of variety.

```r
# You may also want to consider the expected values, i.e.
# the values which would be expected if H0 were correct.
# H0 = BrE and AmE do not differ in terms of "sort of" and
# "kind of" use.
# correct = FALSE (no continuity correction) keeps the statistic
# identical to the test above
teststatistics <- chisq.test(example.table, correct = FALSE)
teststatistics$expected
#              BrE       AmE
# kind.of 277.1185  80.88148
# sort.of 558.8815 163.11852

# Which cell contributes most?
# To answer this question you need to extract the
# Chi-Square values of each cell
teststatistics$residuals^2
#              BrE       AmE
# kind.of 33.33869 114.22602
# sort.of 16.53082  56.63839

# In order to find out how strong the correlation between
# the two variables is, i.e. variety = BrE, AmE;
# hedge = sort of, kind of, we need to calculate the
# effect size of the correlation. With respect to 2 by 2
# tables, the appropriate effect size measure is the phi coefficient.

# Option 1: Manual calculation of the Phi Coefficient
# (note the parentheses around the whole denominator)
phi.coefficient <- sqrt(teststatistics$statistic /
                        (sum(example.table) * (min(dim(example.table)) - 1)))
phi.coefficient
# approx. 0.452, in line with the assocstats output below

# Option 2: Automatic calculation of the Phi Coefficient
# This method is handier and provides more information. In
# particular, the assocstats function provides Cramer's V,
# another effect size measure which is used if the table has
# more than two columns or rows.
assocstats(example.table)
#                     X^2 df P(> X^2)
# Likelihood Ratio 211.71  1        0
# Pearson          220.73  1        0
#
# Phi-Coefficient   : 0.452
# Contingency Coeff.: 0.412
# Cramer's V        : 0.452
```

In our case, the appropriate measure of effect size is the Phi Coefficient (φ), which at .452 indicates a moderate to strong correlation (around .1 = weak, around .3 = moderate, .5 or higher = strong).
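For tables with more than two rows or columns, Cramér's V takes the place of φ. The sketch below shows how it could be computed by hand; the function name `cramers.v` is hypothetical and only serves as an illustration (the `assocstats` function from `vcd` reports the same measure). For a 2 by 2 table, the result is identical to φ.

```r
# Hypothetical helper: Cramer's V for a two-way contingency table
cramers.v <- function(tab) {
  # Pearson chi-squared statistic without continuity correction
  chi2 <- unname(chisq.test(tab, correct = FALSE)$statistic)
  # scale by sample size and by the smaller table dimension minus 1
  sqrt(chi2 / (sum(tab) * (min(dim(tab)) - 1)))
}

example.table <- as.table(matrix(c(181, 177, 655, 67), ncol = 2, byrow = TRUE,
                                 dimnames = list(c("kind.of", "sort.of"),
                                                 c("BrE", "AmE"))))
round(cramers.v(example.table), 3)
# [1] 0.452
```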

The write-up of the results should be something like this:

A Chi-Squared test confirmed a highly significant correlation between variety of English (AmE vs BrE) and hedge use ("sort of" vs "kind of") (χ^{2} = 220.73, df = 1, p < .001***, φ = .452).
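If you report many tests, the numbers for such a sentence can be pulled directly from the test object instead of being copied by hand. This is just one possible sketch: the names `res`, `phi`, `p.text` and `report` are illustrative, and the output uses plain ASCII labels rather than the symbols χ^{2} and φ.

```r
example.table <- as.table(matrix(c(181, 177, 655, 67), ncol = 2, byrow = TRUE,
                                 dimnames = list(c("kind.of", "sort.of"),
                                                 c("BrE", "AmE"))))

# Test without continuity correction, as above
res <- chisq.test(example.table, correct = FALSE)

# Phi coefficient for a 2 by 2 table
phi <- sqrt(unname(res$statistic) / sum(example.table))

# Report very small p-values as "< .001"
p.text <- if (res$p.value < 0.001) "< .001" else sprintf("= %.3f", res$p.value)

report <- sprintf("chi-squared = %.2f, df = %d, p %s, phi = %.3f",
                  unname(res$statistic), as.integer(res$parameter), p.text, phi)
report
# [1] "chi-squared = 220.73, df = 1, p < .001, phi = 0.452"
```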
