Plotting examples – Boxplots in R

Boxplots are probably my favorite way to display data. They are used to display a single numeric vector, or a numeric variable split by a categorical one, e.g. reaction times across different groups or frequencies across sex and age. Their advantage over other displays is that they show aspects of the underlying distribution (median, quartiles, spread, and outliers) and, when drawn with notches, allow rough statistical inferences directly from the display: notches that do not overlap suggest that the medians differ. Quick-R offers a very nice introduction to boxplots, and I highly recommend having a look at it.
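
To make this a bit more concrete, here is a small sketch of the numbers a boxplot is drawn from, using base R's boxplot.stats(). The vector x and the seed are made up purely for illustration and are not the data used in the rest of this post.

```r
# Fictitious data, purely for illustration
set.seed(1)
x <- rnorm(100, mean = 50, sd = 10)

# boxplot.stats() returns the values a boxplot is drawn from
bs <- boxplot.stats(x)
bs$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
bs$conf   # lower and upper end of the notch around the median
bs$out    # points that would be drawn individually as outliers
```

The notch is an interval around the median. When the notches of two boxes do not overlap, this is usually read as evidence that the two medians differ, but it is a visual rule of thumb rather than a formal test.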

The example I chose is fairly complex, but you can easily adapt it to your needs and delete the code that produces things you don’t want or need. As always with R, there are a lot of options that you can specify; simply modify the code to match your needs.
But let’s start and set up the boxplots. In a first step, we are going to generate some data and set up a data frame called “df”:

#####################################################
### R Script "Visualizations with R: Boxplot"
#####################################################
###                   START
#####################################################
# Remove all objects from the current workspace
rm(list = ls(all.names = TRUE))
# set up fictitious data
ES <- rnorm(100, 50, 10)
HS <- rnorm(100, 50, 15)
SS <- rnorm(100, 35, 5)
duration <- c(ES, HS, SS)
speakers <- c(rep("ES", 100), rep("HS", 100), rep("SS", 100))
df <- data.frame(speakers, duration)
df[, 2] <- as.numeric(df[, 2])
# inspect data
head(df)
 
# and this is what the first rows of the data frame look like:
 
#>  speakers duration
#>1       ES 58.58587
#>2       ES 45.10878
#>3       ES 70.49455
#>4       ES 51.82427
#>5       ES 51.55624
#>6       ES 57.09725
 
#####################################################

In a next step, we are going to create the simplest boxplot possible (it doesn’t look very fancy yet, but we are going to customize it later on).
The function we use to set up a boxplot is simply called “boxplot”. Here it takes a formula that names the variables to be plotted (duration ~ speakers, i.e. duration split by speakers) and, via the data argument, the data frame that contains them.

#####################################################
# set up a first simple box plot
boxplot(duration ~ speakers, data = df)
 
#####################################################
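
If you want to see the numbers behind the plot, boxplot() can also return them without drawing anything. The following is a minimal sketch; it simulates a fresh data frame of the same shape as “df” above (so the values will differ from the ones shown earlier), and the object name b is just chosen for illustration.

```r
# Fictitious data of the same shape as df above (values will differ)
df <- data.frame(
  speakers = rep(c("ES", "HS", "SS"), each = 100),
  duration = c(rnorm(100, 50, 10), rnorm(100, 50, 15), rnorm(100, 35, 5))
)

# plot = FALSE returns the summary statistics instead of drawing the plot
b <- boxplot(duration ~ speakers, data = df, plot = FALSE)
b$names  # group labels, in plotting order
b$n      # number of observations per group
b$stats  # one column per group: whisker, hinge, median, hinge, whisker
```

The order of b$names is also the order of the boxes on the x-axis, which is handy to know when you later place text at positions 1 to 3.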

Here is our first (let’s say, very basic) boxplot:

[Figure: boxplotexp1]

After having created a first very simple boxplot, we are going to customize it and make it look much nicer.
To do so, we are going to make use of the built-in arguments that can be used to specify features of our boxplot. Something that is not strictly necessary, but which lets you customize the axes, is to not draw them at first and instead draw them separately from the plot. This is exactly what we are going to do now:

 
#####################################################
# set up a nicer box plot
boxplot(duration ~ speakers,
  data = df, # the data we want to display
  main = "", # you could specify a title here
  ylab = "Duration (ms)", # label of the y-axis
  ylim = c(0, 100), # range of the y-axis
  axes = FALSE, # do not draw axes yet
  notch = TRUE, # include notches
  col = c("lightgreen", "lightgrey", "lightblue")) # one fill color per box
 
# now, we create the x-axis
axis(1, # side 1 = bottom (x-axis), side 2 = left (y-axis)
  at = 1:3, # the locations where we want the tick marks
  labels = c("", "", ""), # left empty for now; you could specify the text here
  lty = 1, # we define the line type (1 = solid line)
  col = "black", # the axis line and tick marks should be black
  cex.axis = .8) # tick labels at 80% of the normal size
 
# we now set up the y-axis
axis(2, # set up the y-axis
  at = c(0, 20, 40, 60, 80, 100), # create tick marks at the specified locations
  labels = c("0", "20", "40", "60", "80", "100"), # create text at the specified locations
  lty = 1, # we define the line type (1 = solid line)
  col = "black", # the axis line and tick marks should be black
  cex.axis = .8) # tick labels at 80% of the normal size
 
#####################################################
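
Two axis() arguments are easy to mix up: “las” controls the orientation of the tick labels, while “cex.axis” controls their size. The sketch below draws the same labels in two ways so you can compare them. It uses a made-up toy plot rather than the boxplot above, and it draws to a null PDF device only so that it runs without opening a window.

```r
# Draw to a null PDF device; drop the pdf() and dev.off() lines
# to see the plot on screen instead
pdf(file = NULL)

# A toy plot without axes, purely for illustration
plot(1:3, c(10, 50, 90), axes = FALSE, xlab = "", ylab = "")

# Left axis: default orientation (labels parallel to the axis), normal size
ticks_left <- axis(2, las = 0)

# Right axis: horizontal labels (las = 1) at 80% of the normal size
ticks_right <- axis(4, las = 1, cex.axis = 0.8)

box()
dev.off()
```

axis() invisibly returns the locations of the tick marks it drew, which is why the results can be stored in ticks_left and ticks_right.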

Here is our customized boxplot:
[Figure: boxplotexp2]

Now, we are going to finish off our customized boxplot by adding “+” symbols at the locations of the means, together with text that gives the value of the mean for each group.

 
#####################################################
# compute the group means once and reuse them below
means <- tapply(df$duration, df$speakers, mean)
 
mtext(c("Group 1", "Group 2", "Group 3"), # create specified text
  side = 1, # put text below the x-axis
  line = 3, # place text on the 3rd margin line
  at = 1:3) # put text at locations 1 to 3
 
# mark the mean of each group with a "+"
text(1:3, means, "+")
 
# print the value of each mean near the bottom of the plot
text(1:3,
  rep(-1, 3), # one y-position per group
  cex = 0.85,
  labels = paste("mean\n", round(means, 2), sep = ""))
# add a jittered rug of all observations on the right-hand side
rug(jitter(df$duration),
  side = 4)
grid() # add grid lines
box() # draw a frame around the plot
 
###############################################################
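
The group means that the “+” symbols mark can also be inspected on their own. Here is a small sketch, again with freshly simulated data of the same shape as “df” (so the values will differ from the ones above), showing that by() and tapply() give the same numbers. The names m_by and m_tapply are just chosen for illustration.

```r
# Fictitious data of the same shape as df above (values will differ)
df <- data.frame(
  speakers = rep(c("ES", "HS", "SS"), each = 100),
  duration = c(rnorm(100, 50, 10), rnorm(100, 50, 15), rnorm(100, 35, 5))
)

# Group means via by(), converted to a plain numeric vector
m_by <- as.vector(by(df$duration, df$speakers, mean))

# The same means via tapply(), which keeps the group names
m_tapply <- tapply(df$duration, df$speakers, mean)

round(m_tapply, 2)                    # named vector of rounded means
paste0("mean\n", round(m_tapply, 2))  # the labels printed under the boxes
```

Computing the means once and storing them keeps the plotting calls short, and the names on the tapply() result make it easy to check which mean belongs to which group.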

Below is what the code produces.

[Figure: boxplotexp]
