Category archive: General

Text Mining with R: Building a Text Classifier

This post shows how to build a text classifier in R: it implements a machine-learning algorithm that classifies texts as speeches by either Barack Obama or Mitt Romney. The script is based on Timothy D'Auria's YouTube tutorial "How to Build a Text Mining, Machine Learning Document Classification System in R!" (https://www.youtube.com/watch?v=j1V2McKbkLo). Since it has been suggested that making the speeches available for download would make this example reproducible, the folders with the speeches are accessible here, and the code for downloading the speeches is available here.
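
To give a feel for the general idea, here is a minimal sketch of a bag-of-words classifier. It is not the script from the tutorial: the training strings are invented keyword lists (not quotes from any speech), the helper names `tokenize` and `to_dtm` are hypothetical, and k-nearest-neighbour classification via `knn()` from the `class` package is simply one reasonable choice of algorithm for illustration.

```r
library(class)  # provides knn(); a recommended package bundled with R

# Invented toy "documents" (NOT real speech text) with known labels
train_docs <- c("hope change hope future",
                "change future together hope",
                "jobs taxes business jobs",
                "business taxes economy jobs")
train_labels <- factor(c("obama", "obama", "romney", "romney"))

# Unlabelled toy documents we want to classify
test_docs <- c("hope future change", "jobs economy taxes")

# Hypothetical helper: lower-case a document and split it into words
tokenize <- function(doc) strsplit(tolower(doc), "[^a-z]+")[[1]]

# The vocabulary is taken from the training documents only
vocab <- sort(unique(unlist(lapply(train_docs, tokenize))))

# Hypothetical helper: document-term matrix (rows = documents, columns = vocab)
to_dtm <- function(docs, vocab) {
  t(vapply(docs, function(doc) {
    as.numeric(table(factor(tokenize(doc), levels = vocab)))
  }, numeric(length(vocab)), USE.NAMES = FALSE))
}

# Classify each test document by its nearest training document
predicted <- knn(train = to_dtm(train_docs, vocab),
                 test  = to_dtm(test_docs, vocab),
                 cl    = train_labels,
                 k     = 1)
print(predicted)
```

With real speeches you would typically also remove stop words and punctuation and hold out part of the labelled data to estimate accuracy.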

(Syntactic) Parsing in R

This post shows how to syntactically parse a corpus with R (here is the code with the paRsing function). Syntactic parsing is a form of text annotation in which POS tags are assigned to lexical items, which are then grouped into phrasal constituents. Syntactic parsing thus extends POS tagging, which it requires as a first step. This post will not go into the theoretical background and the various approaches to syntactic parsing, which is quite complex in terms of both theory and practical implementation; it simply shows how you can use R to parse some text with the Apache OpenNLP Maxent Parser.

Two-Level CFA

In this post, I am going to show you how to implement a CFA (configural frequency analysis) with only two-level configurations in R (here is the code for the TwoLevelCFA function). The cfa function (written beautifully, as far as I can see, by Stefan Funke) is set up in such a way that it only works properly on data with at least three-level configurations. According to Bortz, Lienert, and Boehnke (2008: 155-157), CFAs can also be applied to data where there are only two configurations. We are thus using Stefan Funke's function, extracting the parameters we need, and then calculating p-values, effect sizes, and some other helpful parameters. Please keep in mind that this procedure is somewhat sloppy, as we do not calculate Q but work with χ² statistics.
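
The χ²-based logic described above can be sketched in base R. This is not the TwoLevelCFA function and does not call the cfa function: the frequency table is hypothetical, the Bonferroni correction is one common choice for CFA-style tests, and using phi = sqrt(χ²/N) as the effect size is an assumption made purely for illustration.

```r
# Hypothetical observed frequencies for configurations of two variables
observed <- matrix(c(30, 10, 20,
                     10, 25, 25),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(sex = c("f", "m"),
                                   variant = c("x", "y", "z")))

n <- sum(observed)

# Expected frequencies under independence of the two variables
expected <- outer(rowSums(observed), colSums(observed)) / n

# One row per configuration (cells are listed in column-major order)
res <- as.data.frame(as.table(observed), responseName = "observed")
res$expected <- as.vector(expected)

# Chi-square component per configuration, tested with 1 degree of freedom
res$chi2 <- (res$observed - res$expected)^2 / res$expected
res$p <- pchisq(res$chi2, df = 1, lower.tail = FALSE)

# Correct for testing several configurations at once
res$p_bonf <- p.adjust(res$p, method = "bonferroni")

# Assumed effect size measure: phi = sqrt(chi2 / N)
res$phi <- sqrt(res$chi2 / n)

# Label over-represented (type) and under-represented (antitype) configurations
res$result <- ifelse(res$p_bonf >= 0.05, "n.s.",
                     ifelse(res$observed > res$expected, "type", "antitype"))
print(res)
```

The per-configuration χ² components add up to the overall Pearson χ² of the table, which is a handy sanity check on the expected frequencies.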


Plotting examples – Boxplots in R

Boxplots are probably my favorite way to display data. They are used if you want to display a single numeric vector or when you have one categorical and one numeric variable, e.g. when you are looking at reaction times across different groups or at frequencies across sex and age. Their advantage over other displays is that boxplots show aspects of the underlying distribution and also allow statistical inferences to be drawn directly from the display. Quick R offers a very nice introduction to boxplots, and I highly recommend you have a look at the link.
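
As a minimal sketch of the categorical-plus-numeric case, the following uses base R's `boxplot()` with a formula. The data are simulated and the variable names (`rt_data`, `rt`, `group`) are made up for this example.

```r
# Simulated (hypothetical) reaction times in ms for three groups
set.seed(42)
rt_data <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = 30)),
  rt    = c(rnorm(30, mean = 500, sd = 50),
            rnorm(30, mean = 550, sd = 60),
            rnorm(30, mean = 620, sd = 55))
)

# One box per group; notches give a rough visual check of whether
# two medians differ (non-overlapping notches suggest a difference)
bp <- boxplot(rt ~ group, data = rt_data,
              notch = TRUE,
              xlab = "Group", ylab = "Reaction time (ms)",
              main = "Reaction times by group")

# boxplot() invisibly returns the statistics it draws; per group the rows are
# lower whisker, lower hinge, median, upper hinge, upper whisker
bp$stats
```

For a single numeric vector, `boxplot(rt_data$rt)` is enough.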


Plotting Examples – Line Graph in R

R is extremely versatile when it comes to plotting data, but it can be troublesome to use R for visualizations, particularly when you are not yet used to R. In the following, I will show you how to set up a line graph with three lines representing the mean frequencies of three groups during three stages of a process. I frequently use line plots, and as a colleague struggled to set one up in R, I thought that including an example might be of interest to some of you.
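
One way to draw such a graph in base R is `matplot()`, which plots one line per matrix column. The mean frequencies below are invented for illustration, and the group and stage names are placeholders.

```r
# Hypothetical mean frequencies: rows = stages, columns = groups
stages <- c("Stage 1", "Stage 2", "Stage 3")
means <- cbind(
  GroupA = c(12.1, 15.4, 19.8),
  GroupB = c(10.3, 11.0, 14.2),
  GroupC = c(8.7, 9.5, 9.9)
)
rownames(means) <- stages

# Draw one line (with point markers) per group; suppress the default x axis
matplot(1:3, means, type = "b", pch = 1:3, lty = 1:3, col = 1:3,
        xaxt = "n", xlab = "Stage", ylab = "Mean frequency",
        ylim = range(means))

# Label the x axis with the stage names instead of 1, 2, 3
axis(1, at = 1:3, labels = stages)

# Legend matching the line types, symbols, and colours used above
legend("topleft", legend = colnames(means),
       pch = 1:3, lty = 1:3, col = 1:3, bty = "n")
```

The same plot can be built with `plot()` for the first group followed by `lines()` for the others; `matplot()` just saves the repetition.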