In this post I want to show you how to write and call functions in R.

Functions are an extremely powerful feature of r especially as they can easily be written and customized. Very generally, functions are objects in R which can be called to perform customized calculations or tasks. Functions typically require arguments on which they perform some action.

The general form of a function is *function(argument _{1}, argument_{2}, …, agument_{n})*. To exemplify what this means and how you can write your own function, I am going to write a function which will produce a customized output table for a simple linear regression model. But before we write a rather complicated function, we are writing a very simple example, so you understand the basic logic behind it. We will write a function called my.summary which reports the mean, the median, and the standard deviation for a numeric vector.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 | ############################################################### ### writing a very simple function my.summary <- function(x) { ifelse(is.numeric(x) == FALSE, return("ERROR: vector must be numeric"), x == x) m1 <- mean(x) m2 <- median(x) sd1 <- sd(x) result <- list(m1, m2, sd1) names(result) <- c("Mean", "Median", "Standard Dviation") return(result) } ############################################################### ### --- Test our function ############################################################### # create avector test <- c(1,2,3,4,5,6,7,8,9) my.summary(test) # below is the output of our function #> $Mean #>[1] 5 #> #>$Median #>[1] 5 #> #>$`Standard Dviation` #>[1] 2.738613 ############################################################### |

Our simple example function works and return the mean, the median, and the standard deviation when we apply it to a numeric vector.

Now for the more complex example:

The way that our more complex function will work is that the “summary” function for a regression model will produce a list which contains several objects. Our function will then extract some of these objects from the regression summary and reorganize them into a neat table.

The table will consist out of 7 columns and 8 rows. We are thus creating 8 vectors with 7 elements each. Empty elements will be represented as empty cells.

I am going to annotate the code to make it as clear as I can so that you can follow what I do.

So, here we go…

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 | ############################################################### ### Writing functions ############################################################### ### --- Prepare data # Remove all lists from the current workspace rm(list=ls(all=T)) # We are going to start right away by creating an object which # we call "lm.summary" and we define it as a function with # a single argument (x). Th argument is a lm object so a # simple linear regression model – to make it more comprehensible, # you can simply think of x as standing for # lm(prep.ptw ~ year, data = slrm.data) lm.summary <- function(x) { # next, we are defining another function in this function which # will produce summaries of p-values. So instead of the actual # value, the p.nice function will output the level of significance # at which the p-value is significant. For example, if we # have a p-value of .03, the p.nice function will output "p<.05*" p.nice <- function(z) { as.vector(unlist(sapply(z, function(w) { # the line below reads: if the value is lower than 0.001 print "p<.001***" ifelse(w < .001, return("p < .001***"), # the line below reads: if the value is lower than 0.01 print "p<.01**" ifelse(w < .01, return("p < .01**"), # the line below reads: if the value is lower than 0.05 print "p<.05*" # else print the exact value (of w) ifelse(w < .05, return("p < .05*"), return(w)))) } ))) } # the next section will create a vector called "intercept" # with 6 elements which we are going to extract from the # summary of the lm object. This vector will represent a row in # the table with the parametsr of the coefficient, i.e. its # estimate, standard error, t-value, and p-value. intercept <- c( # the first element of "intercept" is the first element of fourth object # of the summary list. And because this element is a coefficient # estimate, i.e. a numeric value, we are going to round it # to 2 decimal places. round(summary(x)[[4]][1], 2), # the next element is empty because it will be in a column showing # standardized betas and we do not get standardized betas for the # intercept in a linear regression model. "", round(summary(x)[[4]][3], 2), round(summary(x)[[4]][5], 2), round(summary(x)[[4]][7], 4), p.nice(summary(x)[[4]][7])) # the next vector will hold the parameters for the predictor predictor <- c( # extract the estimate for the predictor and round the value # to 4 decimal places round(summary(x)[[4]][2], 2), # extract the standardized beta for the predictor and round the value # to 4 decimal places round(lm.beta(x)[[1]], 4), # extract the standard error for the predictor and round the value # to 4 decimal places round(summary(x)[[4]][4], 2), # extract the t-value for the predictor and round the value # to 4 decimal places round(summary(x)[[4]][6], 2), # extract the p-value for the predictor and round the value # to 4 decimal places round(summary(x)[[4]][8], 4), # apply the p.nice function to the p-value and return the nice p-value p.nice(summary(x)[[4]][8])) # in the fowllowing , we create vectors which hold only one # character string as the last element: we now add a row # to the table which separates the rows for the predictor # from the overall model statistics # first, we create a vector which will serve as a header for # the model statz mdl.statz <- c("", "", "", "", "", "Value") # we are now extracting the number of cases in our model nbcases <- c("", "", "", "", "", length(summary(x)[[3]])) # extract the residual standard error rse <- c("", "", "", "", "", round(summary(x)[[6]], 2)) # extract the common R<sup>2</sup> value multR2 <- c("", "", "", "", "", round(summary(x)[[8]], 4)) # extract the adjusted R<sup>2</sup> value (this R<sup>2</sup> value # provides an estimate of how much the common R<sup>2</sup> value # will vary, if we the cross-validate it, i.e. how much variation there # would be, if we took many samples and replicated the model many times adjR2 <- c("", "", "", "", "", round(summary(x)[[9]], 4)) # extract the F statistic of the overall model F <- c("", "", "", "", "", round(summary(x)[[10]][1], 2)) # extract the p p <- c("", "", "", "", "", round(summary(x)[[4]][8], 4)) # we are almost done: now, we are going to create a table # called <em>slrm.tb</em> by binding the vectors together # (the command/function <em>rbind</em> means: bind # vectors as rows together slrm.tb <- rbind(intercept, predictor, mdl.statz, nbcases, rse, multR2, adjR2, F, p ) # now, we add column names to the table colnames(slrm.tb) <- c(colnames(summary(x)[[4]])[1], "Std. Beta", colnames(summary(x)[[4]])[c(2:4)], "P-value sig.") # now, we add rownames to the table rownames(slrm.tb) <- c( rownames(summary(x)[[4]])[1], rownames(summary(x)[[4]])[2], "Model statistics", "Number of cases in model", paste("Residual standard error", paste("on", summary(x)[[7]][2],"DF")), "Multiple R-squared", "Adjusted R-squared", paste("F-statistic", paste("(", round(summary(x)[[10]][2], 0), ", ", round(summary(x)[[10]][3], 0), ")", sep = "", collapse = "")), "Model p-value") # we converting the table into a data frame slrm.tb <- as.data.frame(slrm.tb) # now we tell R to print the resulting table to the console, once # it is done with this prodedure. return(slrm.tb) } # voila: we are done with writing a function which outputs # a customized summary table for a simple linear regression model |

The next issue is how we can utilize our function whenever we want. R makes that really easy: all we need to do is to define the source (the path to the function). So you only need to save the above code in an editor (I use TinnR as it highlights R syntax) and save it as an R file (I saved the code above as *slr.summary.tb.R*) in my directory for customized R functions. The path thus is “C:\\R /slr.summary.tb” (I need to use double backslashes in front of directories as my machine runs on Windows). The only thing you need to add is the *source* command which tells R that it is supposed the function from the path which is its argument.

So let’ check how to call and execute our function.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 | ############################################################### ### Calling functions ############################################################### # clean our workspace rm(list=ls(all=T)) # Initiate packages we need library(QuantPsyc) library(car) # load our function source("C:\\R/slr.summary.tb.R") # now we read in some data slr.data <- read.delim("C:\\MyProjects\\SimpleLinearRegression/slr.data.txt", header = TRUE) # remove columns we do not need slr.data <- as.data.frame(cbind(slr.data$datems, slr.data$P.ptw)) colnames(slr.data) <- c("year", "prep.ptw") # add column names slr.data <- slr.data[!is.na(slr.data$year) == T, ] # delete incomplete cases prep.lm <- lm(prep.ptw ~ year, data = slr.data) # call our function slr.summary(prep.lm) # and here is our output: # Estimate Std. Beta Std. Error t value Pr(>|t|) P-value sig. #(Intercept) 102.09 10.86 9.4 0 p < .001*** #year 0.02 0.1039 0.01 2.56 0.0107 p < .05* #Model statistics Value #Number of cases in model 603 #Residual standard error on 601 DF 21.11 #Multiple R-squared 0.0108 #Adjusted R-squared 0.0091 #F-statistic (1, 601) 6.55 #Model p-value 0.0107 |