Two-Level CFA

In this post, I am going to show you how to implement a CFA (configural frequency analysis) with only two-level configurations in R (here is the code for the TwoLevelCFA function). The cfa function (written beautifully, as far as I can see, by Stefan Funke) is set up in such a way that it only works properly on data with at least three-level configurations. According to Bortz, Lienert, and Boehnke (2008: 155-157), CFAs can also be applied to data with only two-level configurations. We therefore use Stefan Funke's function, extract the parameters we need, and then determine significance levels, effect sizes, and some other helpful parameters. Please keep in mind that this procedure is somewhat sloppy, as we do not calculate Q but work with χ2 statistics.

Configural frequency analyses are used when you want to find out whether individual cells in a larger table differ significantly from expectation. This is necessary because a χ2 test only tells us whether, somewhere in a table, there is a cell that differs significantly from what we would expect if there were no association between the variables we are interested in; it does not tell us which cell. We use the rather conservative Bonferroni correction to account for multiple testing (a correction is needed because otherwise we would be vulnerable to alpha errors, i.e. claiming that something is significant although in reality it is not).
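To make the correction concrete, here is a minimal sketch of how a Bonferroni-corrected critical χ2 value can be obtained in base R. It assumes 12 tests (one per configuration, as in the example data used in this post) and one degree of freedom per test; the variable names are only illustrative.

```r
# Uncorrected significance level
alpha <- 0.05
# Assumed number of tests: one per configuration (4 corpora x 3 variants)
n.tests <- 12
# Bonferroni: divide the uncorrected alpha by the number of tests
alpha.corrected <- alpha / n.tests
# Critical chi-squared value (df = 1) for the corrected alpha
crit <- qchisq(alpha.corrected, df = 1, lower.tail = FALSE)
round(alpha.corrected, 5)  # 0.00417
round(crit, 4)             # 8.2097
```

A configuration only counts as significant at the .05 level if its χ2 value reaches this corrected threshold, which is considerably stricter than the uncorrected 3.84.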

First, we load packages that may be useful and generate some data; then we perform the actual test. In addition, we calculate effect sizes appropriate for χ2 tests.

###############################################################
# Remove all objects from the current workspace
rm(list = ls(all.names = TRUE))
# Install package(s) (if not already installed)
#install.packages("cfa")
#install.packages("lattice")
# Load package(s)
library(cfa)
library(lattice)
###############################################################
### --- Before performing the test, we will create a table
### --- to which we will apply the CFA.
### --- For now, we will create a data frame, which we call "mydata"
corpus <- c(rep("Corpus_C", 3), rep("Corpus_D", 3), rep("Corpus_B", 3), rep("Corpus_A", 3))
form <- c(rep(c("Variant_1", "Variant_2", "Variant_3"), 4))
counts <- c(1, 29, 9, 12, 41, 29, 700, 1031, 928, 305, 731, 568)
mydata <- as.data.frame(matrix(cbind(corpus, form), ncol = 2))
mydata[, 3] <- as.numeric(counts)
colnames(mydata) <- c("corpus","form","counts")
# inspect the data we have created
mydata
 
# corpus form counts
#1 Corpus_C Variant_1 1
#2 Corpus_C Variant_2 29
#3 Corpus_C Variant_3 9
#4 Corpus_D Variant_1 12
#5 Corpus_D Variant_2 41
#6 Corpus_D Variant_3 29
#7 Corpus_B Variant_1 700
#8 Corpus_B Variant_2 1031
#9 Corpus_B Variant_3 928
#10 Corpus_A Variant_1 305
#11 Corpus_A Variant_2 731
#12 Corpus_A Variant_3 568
 
################################################################
### --- Prepare data for CFA
################################################################
# Now, we are going to modify the existing function written by Stefan Funke
# (<http://cran.r-project.org/web/packages/cfa/cfa.pdf>)
# Prepare data for analysis
mydatanew = mydata[,-3]
# inspect mydatanew
head(mydatanew)
 
# corpus form
#1 Corpus_C Variant_1
#2 Corpus_C Variant_2
#3 Corpus_C Variant_3
#4 Corpus_D Variant_1
#5 Corpus_D Variant_2
#6 Corpus_D Variant_3
 
counts = mydata[,3]
# test <- mydata[c(3:5, 7,8), 1:2]
# test2 <- cbind(mydata[,1], mydata$form)
# inspect counts
head(counts)
 
#[1] 1 29 9 12 41 29
 
################################################################
### --- Perform CFA
################################################################
# First, we use Funke's (2009) function to perform a cfa
raw.cfa <- cfa(cfg = mydatanew, cnts = counts)
# Next, we determine the critical chi-squared values for
# alpha = .05, .01 and .001 BUT we take into account that we
# are performing multiple tests, so we apply
# Bonferroni's correction (corrected alpha = uncorrected alpha / number of tests).
crit.05 <- round(rep(qchisq((0.05/12), 1, lower.tail = FALSE), 12), 4)
crit.01 <- round(rep(qchisq((0.01/12), 1, lower.tail = FALSE), 12), 4)
crit.001 <- round(rep(qchisq((0.001/12), 1, lower.tail = FALSE), 12), 4)
# We now determine the level of significance for each configuration
sig <- as.vector(unlist(sapply(raw.cfa[[1]][, 5], function(x) {
 ifelse(x < qchisq((0.05/12), 1, lower.tail = FALSE), "n.s.",
 ifelse(x >= qchisq((0.001/12), 1, lower.tail = FALSE), "p < .001 ***",
 ifelse(x >= qchisq((0.01/12), 1, lower.tail = FALSE), "p < .01 **",
 "p < .05 *"))) } )))
# We will now extract the columns which are of interest to us.
new.cfa <- cbind(
 as.character(raw.cfa[[1]][, 1]),
 raw.cfa[[1]][, 2],
 round(raw.cfa[[1]][, 3], 4),
 round(raw.cfa[[1]][, 5], 4),
 crit.05, crit.01, crit.001, sig)
# We now determine whether each significant configuration is a type
# (observed > expected) or an anti-type (observed < expected).
# new.cfa is a character matrix, so we convert to numeric before comparing.
type <- as.vector(unlist(apply(new.cfa, 1, function(x) {
 ifelse(x[8] == "n.s.", "n.a.",
 ifelse(as.numeric(x[2]) > as.numeric(x[3]), "type", "anti-type")) } )))
# Calculate an approximate effect size (phi)
sum.obs <- sum(as.numeric(new.cfa[, 2]))
eff <- round(sqrt(as.numeric(new.cfa[, 4]) / sum.obs), 4)
# Add type and effect size vectors to our data table
cfa.rslt <- cbind(new.cfa, type, eff)
colnames(cfa.rslt) <- c("configuration", "obs.freq", "exp.freq", "chi.squared", "crit.x2 (.05)", "crit.x2 (.01)", "crit.x2 (.001)", "significance", "type vs. anti-type", "effect.size (phi)")
# Display resulting table
cfa.rslt
 
# configuration obs.freq exp.freq chi.squared crit.x2 (.05) crit.x2 (.01) crit.x2 (.001) significance type vs. anti-type effect.size (phi)
# [1,] "Corpus_A Variant_1" "305" "372.4617" "12.2189" "8.2097" "11.1655" "15.4811" "p < .01 **" "anti-type" "0.0528"
# [2,] "Corpus_B Variant_1" "700" "617.4411" "11.0391" "8.2097" "11.1655" "15.4811" "p < .05 *" "type" "0.0502"
# [3,] "Corpus_C Variant_2" "29" "16.2974" "9.9006" "8.2097" "11.1655" "15.4811" "p < .05 *" "type" "0.0475"
# [4,] "Corpus_C Variant_1" "1" "9.0561" "7.1665" "8.2097" "11.1655" "15.4811" "n.s." "n.a." "0.0404"
# [5,] "Corpus_B Variant_2" "1031" "1111.1515" "5.7816" "8.2097" "11.1655" "15.4811" "n.s." "n.a." "0.0363"
# [6,] "Corpus_A Variant_2" "731" "670.2847" "5.4997" "8.2097" "11.1655" "15.4811" "n.s." "n.a." "0.0354"
# [7,] "Corpus_D Variant_1" "12" "19.0411" "2.6037" "8.2097" "11.1655" "15.4811" "n.s." "n.a." "0.0244"
# [8,] "Corpus_C Variant_3" "9" "13.6464" "1.5821" "8.2097" "11.1655" "15.4811" "n.s." "n.a." "0.019"
# [9,] "Corpus_D Variant_2" "41" "34.2664" "1.3232" "8.2097" "11.1655" "15.4811" "n.s." "n.a." "0.0174"
#[10,] "Corpus_A Variant_3" "568" "561.2536" "0.0811" "8.2097" "11.1655" "15.4811" "n.s." "n.a." "0.0043"
#[11,] "Corpus_B Variant_3" "928" "930.4074" "0.0062" "8.2097" "11.1655" "15.4811" "n.s." "n.a." "0.0012"
#[12,] "Corpus_D Variant_3" "29" "28.6925" "0.0033" "8.2097" "11.1655" "15.4811" "n.s." "n.a." "9e-04"

The results indicate that there are three significant configurations:
"Corpus_B Variant_1" and "Corpus_C Variant_2", which are significant types, i.e. there are more observed cases than we would expect if there were no association between the variables.
"Corpus_A Variant_1", which is a significant anti-type, i.e. there are fewer observed cases than we would expect if there were no association between the variables.
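To see where the numbers in the table come from, the following sketch recomputes the expected frequency, the χ2 component, and phi for a single cell by hand in base R. It assumes that the expected frequencies follow the usual independence model (row total times column total divided by N); the object names `tab`, `expected`, `chi.cell`, and `phi.cell` are only illustrative and not part of the cfa package.

```r
# Observed counts from the example data, arranged as corpus x variant
counts <- c(1, 29, 9, 12, 41, 29, 700, 1031, 928, 305, 731, 568)
tab <- matrix(counts, nrow = 4, byrow = TRUE,
              dimnames = list(c("Corpus_C", "Corpus_D", "Corpus_B", "Corpus_A"),
                              c("Variant_1", "Variant_2", "Variant_3")))
n <- sum(tab)
# Expected frequency under independence: row total * column total / N
expected <- outer(rowSums(tab), colSums(tab)) / n
# Chi-squared component and approximate effect size (phi) per cell
chi.cell <- (tab - expected)^2 / expected
phi.cell <- sqrt(chi.cell / n)
round(expected["Corpus_A", "Variant_1"], 4)  # 372.4617
round(chi.cell["Corpus_A", "Variant_1"], 4)  # 12.2189
round(phi.cell["Corpus_A", "Variant_1"], 4)  # 0.0528
```

For "Corpus_A Variant_1", the observed count of 305 lies below the expected 372.46, and the resulting values agree with the corresponding row of the table above.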

We will now write a function which performs a two-level CFA for us. The function requires a data frame as input in which the configurations are in the first two columns and the observed counts are in the last, i.e. the third, column.
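If your counts are stored as a cross-tabulation rather than in this long format, they can be reshaped with base R first. The following is only a sketch: the small matrix `counts.wide` is a made-up subset of the example data, and the names `counts.wide` and `long` are hypothetical.

```r
# Hypothetical cross-tabulation (two corpora x three variants)
counts.wide <- matrix(c(1, 29, 9,
                        12, 41, 29),
                      nrow = 2, byrow = TRUE,
                      dimnames = list(corpus = c("Corpus_C", "Corpus_D"),
                                      form = c("Variant_1", "Variant_2", "Variant_3")))
# Convert to long format: one row per configuration, counts in the last column
long <- as.data.frame(as.table(counts.wide),
                      responseName = "counts",
                      stringsAsFactors = FALSE)
long
```

The resulting data frame has the columns corpus, form, and counts, which is the layout the function expects.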

################################################################
### --- Write function to perform CFA on two-level configurations
################################################################
# We write a function which takes as its argument a data frame in which
# the last column holds the counts and the first two columns hold
# the configurations
TwoLevelCFA <- function(data) {
# split data frame into configurations
 cnts <- data[, ncol(data)]
 cfg <- data[, 1:(ncol(data)-1)]
# First, we use Funke's (2009) function to perform a cfa
 raw.cfa <- cfa(cfg = cfg, cnts = cnts)
# Next, we determine the critical chi-squared values for
# alpha = .05, .01 and .001 BUT we take into account that we
# are performing multiple tests, so we apply
# Bonferroni's correction (corrected alpha = uncorrected alpha / number of tests).
# The number of tests is taken from the data instead of being hard-coded as 12.
 crit.05 <- round(rep(qchisq((0.05/nrow(data)), 1, lower.tail = FALSE), nrow(data)), 4)
 crit.01 <- round(rep(qchisq((0.01/nrow(data)), 1, lower.tail = FALSE), nrow(data)), 4)
 crit.001 <- round(rep(qchisq((0.001/nrow(data)), 1, lower.tail = FALSE), nrow(data)), 4)
# We now determine the level of significance for each configuration
 sig <- as.vector(unlist(sapply(raw.cfa[[1]][, 5], function(x) {
 ifelse(x < qchisq((0.05/nrow(data)), 1, lower.tail = FALSE), "n.s.",
 ifelse(x >= qchisq((0.001/nrow(data)), 1, lower.tail = FALSE), "p < .001 ***",
 ifelse(x >= qchisq((0.01/nrow(data)), 1, lower.tail = FALSE), "p < .01 **",
 "p < .05 *"))) } )))
# We will now extract the columns which are of interest to us.
 new.cfa <- cbind(
 as.character(raw.cfa[[1]][, 1]),
 raw.cfa[[1]][, 2],
 round(raw.cfa[[1]][, 3], 4),
 round(raw.cfa[[1]][, 5], 4),
 crit.05, crit.01, crit.001, sig)
# We now determine whether each significant configuration is a type
# (observed > expected) or an anti-type (observed < expected).
# new.cfa is a character matrix, so we convert to numeric before comparing.
 type <- as.vector(unlist(apply(new.cfa, 1, function(x) {
 ifelse(x[8] == "n.s.", "n.a.",
 ifelse(as.numeric(x[2]) > as.numeric(x[3]), "type", "anti-type")) } )))
# Calculate an approximate effect size (phi)
 sum.obs <- sum(as.numeric(new.cfa[, 2]))
 eff <- round(sqrt(as.numeric(new.cfa[, 4]) / sum.obs), 4)
# Add type and effect size vectors to our data table
 cfa.rslt <- data.frame(new.cfa, type, eff)
 colnames(cfa.rslt) <- c("configuration", "obs.freq", "exp.freq", "chi.squared", "crit.x2 (.05)", "crit.x2 (.01)", "crit.x2 (.001)", "significance", "type vs. anti-type", "effect.size (phi)")
# return results
 return(cfa.rslt)
 }
# We will now apply our function to the data we have created initially
TwoLevelCFA(mydata)
 
# Here are the results (values as in the table above, shown in the same layout)
# configuration obs.freq exp.freq chi.squared crit.x2 (.05) crit.x2 (.01) crit.x2 (.001) significance type vs. anti-type effect.size (phi)
# [1,] "Corpus_A Variant_1" "305" "372.4617" "12.2189" "8.2097" "11.1655" "15.4811" "p < .01 **" "anti-type" "0.0528"
# [2,] "Corpus_B Variant_1" "700" "617.4411" "11.0391" "8.2097" "11.1655" "15.4811" "p < .05 *" "type" "0.0502"
# [3,] "Corpus_C Variant_2" "29" "16.2974" "9.9006" "8.2097" "11.1655" "15.4811" "p < .05 *" "type" "0.0475"
# [4,] "Corpus_C Variant_1" "1" "9.0561" "7.1665" "8.2097" "11.1655" "15.4811" "n.s." "n.a." "0.0404"
# [5,] "Corpus_B Variant_2" "1031" "1111.1515" "5.7816" "8.2097" "11.1655" "15.4811" "n.s." "n.a." "0.0363"
# [6,] "Corpus_A Variant_2" "731" "670.2847" "5.4997" "8.2097" "11.1655" "15.4811" "n.s." "n.a." "0.0354"
# [7,] "Corpus_D Variant_1" "12" "19.0411" "2.6037" "8.2097" "11.1655" "15.4811" "n.s." "n.a." "0.0244"
# [8,] "Corpus_C Variant_3" "9" "13.6464" "1.5821" "8.2097" "11.1655" "15.4811" "n.s." "n.a." "0.019"
# [9,] "Corpus_D Variant_2" "41" "34.2664" "1.3232" "8.2097" "11.1655" "15.4811" "n.s." "n.a." "0.0174"
#[10,] "Corpus_A Variant_3" "568" "561.2536" "0.0811" "8.2097" "11.1655" "15.4811" "n.s." "n.a." "0.0043"
#[11,] "Corpus_B Variant_3" "928" "930.4074" "0.0062" "8.2097" "11.1655" "15.4811" "n.s." "n.a." "0.0012"
#[12,] "Corpus_D Variant_3" "29" "28.6925" "0.0033" "8.2097" "11.1655" "15.4811" "n.s." "n.a." "9e-04"

References

Bortz, Jürgen, Gustav A. Lienert & Klaus Boehnke. 2008. Verteilungsfreie Methoden in der Biostatistik, 3rd edn. Heidelberg: Springer Medizin Verlag.
