**TUTORIALS**

Below are links to tutorials I created for the Language Technology and Data Analysis Laboratory (LADAL).

**DATA SCIENCE BASICS**

- This LADAL tutorial provides some useful tips and tricks for working with computers, e.g. how to keep your computer running smoothly.
- This LADAL tutorial introduces basic concepts of data science
- This LADAL tutorial introduces quantitative reasoning
- This LADAL tutorial introduces basic concepts of quantitative research (methodology)

**INTRODUCTION TO R**

- This LADAL tutorial is an introduction to R for (absolute) beginners
- This LADAL tutorial introduces string processing with R
- This LADAL tutorial introduces regular expressions in R
- This LADAL tutorial shows how to create, manipulate, and process tabulated data in R

**DATA VISUALIZATION**

- This LADAL tutorial introduces data visualization with R
- This LADAL tutorial exemplifies how to create common visualization types (scatter plots, line graphs, bar plots, box plots, etc.) with R
- This LADAL tutorial exemplifies how to generate some lesser-known but very useful visualization types in R
- This LADAL tutorial introduces geo-spatial data visualization (mapping) with R
- This LADAL tutorial shows how to generate interactive data visualizations in R using googleVis

**STATISTICS**

- This LADAL tutorial introduces descriptive statistics.
- This LADAL tutorial introduces basic inferential statistics
- This LADAL tutorial introduces fixed- and mixed-effects regression
- This LADAL tutorial introduces tree-based models
- This LADAL tutorial introduces cluster and correspondence analysis
- This LADAL tutorial introduces other grouping procedures like Semantic Vector Space models

**TEXT ANALYTICS / TEXT MINING / CORPUS LINGUISTICS**

- This LADAL tutorial introduces text analysis and distant reading.
- This LADAL tutorial shows how to generate keyword-in-context concordances in R
- This LADAL tutorial introduces Network Analysis in R
- This LADAL tutorial introduces Co-occurrence and Collocation Analysis in R
- This LADAL tutorial introduces Topic Modeling with R
- This LADAL tutorial introduces Sentiment Analysis with R
- This LADAL tutorial shows how to add part-of-speech annotation (POS tagging) and syntactic parses to texts in English, German, Spanish, Italian, and Dutch using R.

**CASE STUDIES / FOCUS TUTORIALS**

**Creating vowel charts with Praat and R**

This LADAL tutorial shows how to extract formant values in Praat and use these to create a vowel chart in R.

**Text Mining with R: Building a Text Classifier**

This tutorial exemplifies how to create a text classifier with R, i.e. it implements a machine-learning algorithm that classifies texts as speeches by either Barack Obama or Mitt Romney. The script is based on Timothy D’Auria’s YouTube tutorial “How to Build a Text Mining, Machine Learning Document Classification System in R!” (https://www.youtube.com/watch?v=j1V2McKbkLo). The data is available here and the code for downloading the speeches is available here.

**Corpus Linguistics: Gender and Age Differences in Swearing**

This LADAL tutorial exemplifies how to perform a simple corpus analysis with R by focusing on gender and age differences in swear word use in Irish English.

**PDF to txt**

This LADAL tutorial shows how to extract the text from PDF files and save it as TXT files for further processing.

**Webcrawling and -scraping with R**

This LADAL tutorial shows how to crawl and scrape websites using R.

**FOR STUDENTS**

**General Notes for Students attending my Courses (Merkblatt für Seminare)**

You will find a document with general information about my seminars here. Please read this document if you are attending or plan to attend one of my seminars! (last updated 2015/02/16)

**Model term paper**

You will find a model term paper here. It includes information about the structure, content, and formatting of term papers. You can also use it as a template for your own term papers and adopt its formatting. (last updated 2015/04/08)

**Course Materials**

“Introduction to English Linguistics”

“Methods in Linguistics/Methoden der Linguistik”

**PROGRAMMING / SOFTWARE DEVELOPMENT / CORPUS LINGUISTICS**

Below you can find some resources such as scripts and data sets that you may find useful.

**R scripts**

- Chi-squared test for subtables of 2 × k tables (R script)
- Configural Frequency Analysis for data with only two-level configurations (R script)
- Function written by Tony Breyal for downloading text from websites (to create corpora containing web data) (R script)
- Function providing nice summaries of simple linear regressions (R script)
- Function providing nice summaries of multiple linear regressions (R script)
- Function providing nice summaries of fixed-effects binomial logistic regressions (R script)
- Function providing nice summaries of stepwise step-up model fitting of fixed-effects binomial logistic regressions (R script)
- Function providing nice summaries of stepwise step-up model fitting of mixed-effects binomial logistic regressions (R script)
- Function providing nice summaries of stepwise step-down model fitting of mixed-effects binomial logistic regressions (R script)

**Biodata scripts & data sets** (last updated 2015/02/09)

If you find any bugs in the code or mistakes in the results, please let me know so I can correct the scripts and update the results.

- ICE Canada: word counts and biodata (R script, result)
- ICE GB-R2: word counts and biodata (R script, result)
- ICE India: word counts and biodata (R script, result)
- ICE Ireland 1.2.2: word counts and biodata (R script, result)
- ICE Jamaica: word counts and biodata (R script, result)
- ICE New Zealand: word counts and biodata (R script, result)
- ICE Philippines: word counts and biodata (R script, result)
- ICE Singapore: word counts (R script, result)
- ICE Hong Kong: word counts (R script, result)
- SBCAE: word counts and biodata (R script, result)

**TestCorpus**

A small sample corpus for testing functions.

(last updated 2020/09/25)