Two-Level CFA

In this post, I am going to show you how to implement a configural frequency analysis (CFA) with only two-level configurations in R (here is the code for the TwoLevelCFA function). The cfa function (written beautifully, as far as I can see, by Stefan Funke) is set up in such a way that it only works properly on data with configurations of at least three levels. According to Bortz, Lienert, and Boehnke (2008: 155-157), CFAs can also be applied to data with only two-level configurations, i.e. configurations defined by just two variables. We therefore use Stefan Funke's function, extract the parameters we need, and then determine significance levels, effect sizes, and some other helpful parameters. Please keep in mind that this procedure is somewhat sloppy, as we do not calculate Q but work with χ² statistics.

Configural frequency analyses are used when you want to investigate whether individual cells in a larger table differ significantly from expectation. This is necessary because a χ² test only tells us whether there is, somewhere in a table, a cell that differs significantly from what we would expect if there were no association between the variables we are interested in; it does not tell us which cell. We use the quite conservative Bonferroni correction to account for multiple testing (a correction is needed because, without it, we would be vulnerable to alpha errors, i.e. claiming that something is significant although in reality it is not).
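To make the correction concrete, here is a minimal sketch in base R. It assumes 12 tests (one per cell of a 4 x 3 table, as in the example data used in this post) and one degree of freedom per cell test; the object names are purely illustrative.

```r
# Bonferroni correction: divide the nominal alpha by the number of tests
n.tests <- 12                      # assumption: one test per cell of a 4 x 3 table
alpha <- c(0.05, 0.01, 0.001)      # nominal significance levels
alpha.corrected <- alpha / n.tests

# Critical chi-squared values (df = 1) that a single cell must reach
# to count as significant at the corrected levels
crit <- qchisq(alpha.corrected, df = 1, lower.tail = FALSE)
round(crit, 4)
# 8.2097 11.1655 15.4811
```

A cell is only reported as significant at the .05 level if its χ² value is at least 8.2097, which is considerably stricter than the uncorrected threshold of about 3.84.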

First, we load packages that may be useful and generate some data and then we perform the actual test. In addition, we also calculate the effect sizes appropriate for χ2 tests.

```r
###############################################################
# Remove all objects from the current workspace
rm(list = ls(all.names = TRUE))
# Install package(s) (if not already installed)
#install.packages("cfa")
#install.packages("lattice")
# Load package(s)
library(cfa)
library(lattice)
###############################################################
### --- Before performing the test, we create the table
### --- to which we will apply the CFA.
### --- For now, we create a data frame which we call "mydata"
corpus <- c(rep("Corpus_C", 3), rep("Corpus_D", 3), rep("Corpus_B", 3), rep("Corpus_A", 3))
form <- rep(c("Variant_1", "Variant_2", "Variant_3"), 4)
counts <- c(1, 29, 9, 12, 41, 29, 700, 1031, 928, 305, 731, 568)
mydata <- data.frame(corpus, form, counts)
# inspect the data we have created
mydata
#      corpus      form counts
# 1  Corpus_C Variant_1      1
# 2  Corpus_C Variant_2     29
# 3  Corpus_C Variant_3      9
# 4  Corpus_D Variant_1     12
# 5  Corpus_D Variant_2     41
# 6  Corpus_D Variant_3     29
# 7  Corpus_B Variant_1    700
# 8  Corpus_B Variant_2   1031
# 9  Corpus_B Variant_3    928
# 10 Corpus_A Variant_1    305
# 11 Corpus_A Variant_2    731
# 12 Corpus_A Variant_3    568

################################################################
### --- Prepare data for CFA
################################################################
# Now, we are going to build on the existing function written by Stefan Funke
# Prepare data for analysis: configurations only (drop the counts)
mydatanew <- mydata[, -3]
# inspect mydatanew
head(mydatanew)
#     corpus      form
# 1 Corpus_C Variant_1
# 2 Corpus_C Variant_2
# 3 Corpus_C Variant_3
# 4 Corpus_D Variant_1
# 5 Corpus_D Variant_2
# 6 Corpus_D Variant_3
counts <- mydata[, 3]
# test <- mydata[c(3:5, 7, 8), 1:2]
# test2 <- cbind(mydata[, 1], mydata$form)
# inspect counts
head(counts)
# 1 29 9 12 41 29

################################################################
### --- Perform CFA
################################################################
# First, we use Funke's (2009) function to perform a CFA
raw.cfa <- cfa(cfg = mydatanew, cnts = counts)
# The results table is the first element of the returned object
cfa.tab <- raw.cfa$table
# Next, we determine the critical chi-squared values for
# alpha = .05, .01 and .001 BUT, because we perform multiple tests,
# we apply Bonferroni's correction
# (corrected alpha = uncorrected alpha / number of tests).
n.tests <- nrow(mydata)  # 12 configurations, i.e. 12 tests
crit <- qchisq(c(0.05, 0.01, 0.001) / n.tests, df = 1, lower.tail = FALSE)
crit.05 <- round(rep(crit[1], n.tests), 4)
crit.01 <- round(rep(crit[2], n.tests), 4)
crit.001 <- round(rep(crit[3], n.tests), 4)
# We now determine the level of significance for each configuration
# (column 5 of the cfa table holds the chi-squared values)
chisq <- cfa.tab[, 5]
sig <- ifelse(chisq >= crit[3], "p < .001 ***",
       ifelse(chisq >= crit[2], "p < .01 **",
       ifelse(chisq >= crit[1], "p < .05 *", "n.s.")))
# We now extract the columns which are of interest to us:
# configuration label, observed frequency, expected frequency, chi-squared
new.cfa <- cbind(as.character(cfa.tab[, 1]), cfa.tab[, 2],
                 round(cfa.tab[, 3], 4), round(chisq, 4),
                 crit.05, crit.01, crit.001, sig)
# We now determine whether each significant configuration is a type
# (observed > expected) or an anti-type (observed < expected);
# the comparison is done on the numeric columns, not on character strings
type <- ifelse(sig == "n.s.", "n.a.",
        ifelse(cfa.tab[, 2] > cfa.tab[, 3], "type", "anti-type"))
# Calculate an approximate effect size (phi)
eff <- round(sqrt(chisq / sum(cfa.tab[, 2])), 4)
# Add type and effect size vectors to our results table
cfa.rslt <- cbind(new.cfa, type, eff)
colnames(cfa.rslt) <- c("configuration", "obs.freq", "exp.freq", "chi.squared",
                        "crit.x2 (.05)", "crit.x2 (.01)", "crit.x2 (.001)",
                        "significance", "type vs. anti-type", "effect.size (phi)")
# Display resulting table
cfa.rslt
#       configuration        obs.freq exp.freq    chi.squared crit.x2 (.05) crit.x2 (.01) crit.x2 (.001) significance type vs. anti-type effect.size (phi)
#  [1,] "Corpus_A Variant_1" "305"    "372.4617"  "12.2189"   "8.2097"      "11.1655"     "15.4811"      "p < .01 **" "anti-type"        "0.0528"
#  [2,] "Corpus_B Variant_1" "700"    "617.4411"  "11.0391"   "8.2097"      "11.1655"     "15.4811"      "p < .05 *"  "type"             "0.0502"
#  [3,] "Corpus_C Variant_2" "29"     "16.2974"   "9.9006"    "8.2097"      "11.1655"     "15.4811"      "p < .05 *"  "type"             "0.0475"
#  [4,] "Corpus_C Variant_1" "1"      "9.0561"    "7.1665"    "8.2097"      "11.1655"     "15.4811"      "n.s."       "n.a."             "0.0404"
#  [5,] "Corpus_B Variant_2" "1031"   "1111.1515" "5.7816"    "8.2097"      "11.1655"     "15.4811"      "n.s."       "n.a."             "0.0363"
#  [6,] "Corpus_A Variant_2" "731"    "670.2847"  "5.4997"    "8.2097"      "11.1655"     "15.4811"      "n.s."       "n.a."             "0.0354"
#  [7,] "Corpus_D Variant_1" "12"     "19.0411"   "2.6037"    "8.2097"      "11.1655"     "15.4811"      "n.s."       "n.a."             "0.0244"
#  [8,] "Corpus_C Variant_3" "9"      "13.6464"   "1.5821"    "8.2097"      "11.1655"     "15.4811"      "n.s."       "n.a."             "0.019"
#  [9,] "Corpus_D Variant_2" "41"     "34.2664"   "1.3232"    "8.2097"      "11.1655"     "15.4811"      "n.s."       "n.a."             "0.0174"
# [10,] "Corpus_A Variant_3" "568"    "561.2536"  "0.0811"    "8.2097"      "11.1655"     "15.4811"      "n.s."       "n.a."             "0.0043"
# [11,] "Corpus_B Variant_3" "928"    "930.4074"  "0.0062"    "8.2097"      "11.1655"     "15.4811"      "n.s."       "n.a."             "0.0012"
# [12,] "Corpus_D Variant_3" "29"     "28.6925"   "0.0033"    "8.2097"      "11.1655"     "15.4811"      "n.s."       "n.a."             "9e-04"
```

The results indicate that there are three significant configurations:
"Corpus_B Variant_1" and "Corpus_C Variant_2", which are types, i.e. there are more observed cases than we would expect if there were no association between the variables.
"Corpus_A Variant_1", which is a significant anti-type, i.e. there are fewer observed cases than we would expect if there were no association between the variables.
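To see where the numbers for a single configuration come from, the values for "Corpus_A Variant_1" can be recomputed by hand. This is only a sketch in base R: it assumes the usual independence model (expected frequency = row total × column total / N) and uses illustrative object names that do not appear in the original script.

```r
# Observed counts as a 4 x 3 contingency table (same numbers as in mydata)
tab <- matrix(c(1, 29, 9,
                12, 41, 29,
                700, 1031, 928,
                305, 731, 568),
              nrow = 4, byrow = TRUE,
              dimnames = list(c("Corpus_C", "Corpus_D", "Corpus_B", "Corpus_A"),
                              c("Variant_1", "Variant_2", "Variant_3")))

n <- sum(tab)                               # total number of observations (4384)
obs <- tab["Corpus_A", "Variant_1"]         # observed frequency (305)
# expected frequency under independence: row total * column total / N
expd <- sum(tab["Corpus_A", ]) * sum(tab[, "Variant_1"]) / n
# chi-squared contribution of this single cell
chi <- (obs - expd)^2 / expd
# approximate effect size (phi)
phi <- sqrt(chi / n)

round(c(expected = expd, chi.squared = chi, phi = phi), 4)
# expected 372.4617, chi.squared 12.2189, phi 0.0528
```

The observed count (305) lies below the expected count (about 372.46), so this configuration occurs less often than the independence model predicts.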

We will now write a function which performs a 2-level cfa for us. The function requires a data frame as input where the configurations are in the first two columns and the observed cases are in the last, i.e. the third column.
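If your data come as a cross-tabulation rather than in this three-column format, base R can reshape them. The following is a minimal sketch using the counts from this post; the object names tab and long are illustrative.

```r
# Counts as a cross-tabulation (corpus in rows, form in columns)
tab <- matrix(c(1, 29, 9,
                12, 41, 29,
                700, 1031, 928,
                305, 731, 568),
              nrow = 4, byrow = TRUE,
              dimnames = list(corpus = c("Corpus_C", "Corpus_D", "Corpus_B", "Corpus_A"),
                              form = c("Variant_1", "Variant_2", "Variant_3")))

# Reshape into one row per configuration:
# two configuration columns followed by the counts in the last column
long <- as.data.frame(as.table(tab), responseName = "counts",
                      stringsAsFactors = FALSE)
head(long)
```

The resulting data frame has the columns corpus, form and counts, which is the layout the function expects. The rows come out in a different order than in mydata, but each configuration still carries its own count.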

```r
################################################################
### --- Write function to perform CFA on two-level configurations
################################################################
# We write a function which takes as its argument a data frame in which
# the last column holds the counts and the first two columns hold
# the configurations
TwoLevelCFA <- function(data) {
  # split data frame into configurations and counts
  cnts <- data[, ncol(data)]
  cfg <- data[, 1:(ncol(data) - 1)]
  # First, we use Funke's (2009) function to perform a CFA;
  # the results table is the first element of the returned object
  cfa.tab <- cfa(cfg = cfg, cnts = cnts)$table
  # Next, we determine the critical chi-squared values for
  # alpha = .05, .01 and .001 BUT, because we perform multiple tests,
  # we apply Bonferroni's correction
  # (corrected alpha = uncorrected alpha / number of tests).
  n.tests <- nrow(data)
  crit <- qchisq(c(0.05, 0.01, 0.001) / n.tests, df = 1, lower.tail = FALSE)
  crit.05 <- round(rep(crit[1], n.tests), 4)
  crit.01 <- round(rep(crit[2], n.tests), 4)
  crit.001 <- round(rep(crit[3], n.tests), 4)
  # We now determine the level of significance for each configuration
  # (column 5 of the cfa table holds the chi-squared values)
  chisq <- cfa.tab[, 5]
  sig <- ifelse(chisq >= crit[3], "p < .001 ***",
         ifelse(chisq >= crit[2], "p < .01 **",
         ifelse(chisq >= crit[1], "p < .05 *", "n.s.")))
  # We now extract the columns which are of interest to us
  new.cfa <- cbind(as.character(cfa.tab[, 1]), cfa.tab[, 2],
                   round(cfa.tab[, 3], 4), round(chisq, 4),
                   crit.05, crit.01, crit.001, sig)
  # We now determine whether each significant configuration is a type
  # (observed > expected) or an anti-type (observed < expected)
  type <- ifelse(sig == "n.s.", "n.a.",
          ifelse(cfa.tab[, 2] > cfa.tab[, 3], "type", "anti-type"))
  # Calculate an approximate effect size (phi)
  eff <- round(sqrt(chisq / sum(cfa.tab[, 2])), 4)
  # Add type and effect size vectors to our results table
  cfa.rslt <- data.frame(new.cfa, type, eff)
  colnames(cfa.rslt) <- c("configuration", "obs.freq", "exp.freq", "chi.squared",
                          "crit.x2 (.05)", "crit.x2 (.01)", "crit.x2 (.001)",
                          "significance", "type vs. anti-type", "effect.size (phi)")
  # return results
  return(cfa.rslt)
}
# We now apply our function to the data we created initially
TwoLevelCFA(mydata)
# Here are the results
#         configuration obs.freq  exp.freq chi.squared crit.x2 (.05) crit.x2 (.01) crit.x2 (.001) significance type vs. anti-type effect.size (phi)
# 1  Corpus_A Variant_1      305  372.4617     12.2189        8.2097       11.1655        15.4811   p < .01 **          anti-type            0.0528
# 2  Corpus_B Variant_1      700  617.4411     11.0391        8.2097       11.1655        15.4811    p < .05 *               type            0.0502
# 3  Corpus_C Variant_2       29   16.2974      9.9006        8.2097       11.1655        15.4811    p < .05 *               type            0.0475
# 4  Corpus_C Variant_1        1    9.0561      7.1665        8.2097       11.1655        15.4811         n.s.               n.a.            0.0404
# 5  Corpus_B Variant_2     1031 1111.1515      5.7816        8.2097       11.1655        15.4811         n.s.               n.a.            0.0363
# 6  Corpus_A Variant_2      731  670.2847      5.4997        8.2097       11.1655        15.4811         n.s.               n.a.            0.0354
# 7  Corpus_D Variant_1       12   19.0411      2.6037        8.2097       11.1655        15.4811         n.s.               n.a.            0.0244
# 8  Corpus_C Variant_3        9   13.6464      1.5821        8.2097       11.1655        15.4811         n.s.               n.a.            0.0190
# 9  Corpus_D Variant_2       41   34.2664      1.3232        8.2097       11.1655        15.4811         n.s.               n.a.            0.0174
# 10 Corpus_A Variant_3      568  561.2536      0.0811        8.2097       11.1655        15.4811         n.s.               n.a.            0.0043
# 11 Corpus_B Variant_3      928  930.4074      0.0062        8.2097       11.1655        15.4811         n.s.               n.a.            0.0012
# 12 Corpus_D Variant_3       29   28.6925      0.0033        8.2097       11.1655        15.4811         n.s.               n.a.            0.0009
```
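For readers who cannot install the cfa package, the same quantities can be approximated in base R. The following is a sketch and not Funke's implementation: it assumes a simple independence model with cell-wise χ² components, it assumes that every combination of the two variables appears exactly once in the input, and the function name TwoLevelCFA.base is made up for illustration.

```r
# Example data in the format described above (same counts as in this post)
mydata <- data.frame(
  corpus = rep(c("Corpus_C", "Corpus_D", "Corpus_B", "Corpus_A"), each = 3),
  form = rep(c("Variant_1", "Variant_2", "Variant_3"), 4),
  counts = c(1, 29, 9, 12, 41, 29, 700, 1031, 928, 305, 731, 568)
)

# Hypothetical base-R counterpart of TwoLevelCFA (illustrative name)
TwoLevelCFA.base <- function(data) {
  # configurations in columns 1 and 2, counts in the last column
  v1 <- as.character(data[, 1])
  v2 <- as.character(data[, 2])
  cnts <- data[, ncol(data)]
  n <- sum(cnts)
  # marginal totals for each level of the two variables
  tot1 <- tapply(cnts, v1, sum)
  tot2 <- tapply(cnts, v2, sum)
  # expected frequencies under independence and cell-wise chi-squared values
  expd <- as.numeric(tot1[v1]) * as.numeric(tot2[v2]) / n
  chisq <- (cnts - expd)^2 / expd
  # Bonferroni-corrected critical values (one test per row of the input)
  crit <- qchisq(c(0.05, 0.01, 0.001) / nrow(data), df = 1, lower.tail = FALSE)
  sig <- ifelse(chisq >= crit[3], "p < .001 ***",
         ifelse(chisq >= crit[2], "p < .01 **",
         ifelse(chisq >= crit[1], "p < .05 *", "n.s.")))
  # type: observed > expected; anti-type: observed < expected
  type <- ifelse(sig == "n.s.", "n.a.",
          ifelse(cnts > expd, "type", "anti-type"))
  res <- data.frame(configuration = paste(v1, v2),
                    obs.freq = cnts,
                    exp.freq = round(expd, 4),
                    chi.squared = round(chisq, 4),
                    significance = sig,
                    type = type,
                    effect.size = round(sqrt(chisq / n), 4),
                    stringsAsFactors = FALSE)
  # sort by chi-squared, largest first
  res[order(res$chi.squared, decreasing = TRUE), ]
}

res <- TwoLevelCFA.base(mydata)
head(res, 3)
```

For the example data, the expected frequencies and χ² values should agree with the table above, because both rest on the same independence model. Results for other data sets should be checked against the cfa package where it is available.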

References

Bortz, Jürgen, Gustav A. Lienert & Klaus Boehnke. 2008. Verteilungsfreie Methoden in der Biostatistik. 3rd edn. Heidelberg: Springer Medizin Verlag.

One thought on "Two-Level CFA"

1. Daniela

Hi Martin, thank you for this very helpful script. I tried to run it, but I ran into some issues. Under R 3.1.1 as well as R 3.1.3, I receive the message that no package named "cfa" exists. Have you encountered this as well? Would this also work with the "confreq" package? https://mran.microsoft.com/package/confreq/#dtable

Thanks a lot!