Resources

TUTORIALS

Below are links to tutorials I created for the Language Technology and Data Analysis Laboratory (LADAL).

  • DATA SCIENCE BASICS
  • INTRODUCTION TO R
  • DATA VISUALIZATION
  • STATISTICS
  • TEXT ANALYTICS / TEXT MINING / CORPUS LINGUISTICS
  • CASE STUDIES / FOCUS TUTORIALS
    • Creating vowel charts with Praat and R
      This LADAL tutorial shows how to extract formant values in Praat and use these to create a vowel chart in R.
    • Text Mining with R: Building a Text Classifier
      This tutorial shows how to build a text classifier in R: it implements a machine-learning algorithm that classifies a text as a speech by either Barack Obama or Mitt Romney. The script is based on Timothy DAuria’s YouTube tutorial “How to Build a Text Mining, Machine Learning Document Classification System in R!” (https://www.youtube.com/watch?v=j1V2McKbkLo). The data are available here, and the code for downloading the speeches is available here.
    • Corpus Linguistics: Gender and Age Differences in Swearing
      This LADAL tutorial exemplifies how to perform a simple corpus analysis with R by focusing on gender and age differences in swear word use in Irish English.
    • PDF to txt
      This LADAL tutorial shows how to extract the text from PDF files and save it as TXT files for further processing.
    • Webcrawling and -scraping with R
      This LADAL tutorial shows how to crawl and scrape websites using R.

FOR STUDENTS

  • General Notes for Students attending my Courses (Merkblatt für Seminare)
    You will find a document with general information about my seminars here. Please read it if you are attending or plan to attend one of my seminars. (last updated 2015/02/16)
  • Model term paper
    You will find a model term paper here. This model term paper includes information about the structure, content, and formatting of term papers. You can also use it as a template for your own term papers and use the formatting within the model. (last updated 2015/04/08)
  • Course Materials
    “Introduction to English Linguistics”
    “Methods in Linguistics/Methoden der Linguistik”

PROGRAMMING / SOFTWARE DEVELOPMENT / CORPUS LINGUISTICS

Below are some resources, such as scripts and data sets, that you may find useful.

  • R scripts
    • Chi-squared test for subtables of 2*k tables (R script)
    • Configural Frequency Analysis for data with only two-level configurations (R script)
    • Function written by Tony Breyal for downloading text from websites (to create corpora containing web data) (R script)
    • Function providing nice summaries of simple linear regressions (R script)
    • Function providing nice summaries of multiple linear regressions (R script)
    • Function providing nice summaries of fixed-effects binomial logistic regressions (R script)
    • Function providing nice summaries of stepwise step-up model fitting of fixed-effects binomial logistic regressions (R script)
    • Function providing nice summaries of stepwise step-up model fitting of mixed-effects binomial logistic regressions (R script)
    • Function providing nice summaries of stepwise step-down model fitting of mixed-effects binomial logistic regressions (R script)
    • Biodata scripts & data sets (last updated 2015/02/09)
      If you find any bugs in the code or mistakes in the results, please let me know so I can correct the scripts and update the results.

  • TestCorpus
    A small sample corpus for testing functions.

(last updated 2020/09/25)
