I recently gave a workshop at Donuts and Distribution, a statistics reading group for the Second Language Studies program at Michigan State University, on the topic of visualizing data. The presentation slides and workshop materials can be found on my RPubs page, in Part 1 and Part 2. These were designed for a novice/intermediate audience. Part 1 … Continue reading Visualizing second language research data in R using ggplot2
I started using R relatively recently, but I see more and more people around me learning and using it. In this post, I offer a few small tips that might help beginners set up their RStudio environment. It is the small things that make life better, and if you don't already know, here are … Continue reading Getting comfortable with the RStudio environment
I've created a Shiny document analyzing the Detroit Tigers' batting statistics over the first 60 games. Click here to view the document.
In this post, I document how to reformat the XML files output by the Stanford CoreNLP tool. This might not be the most elegant approach, but it works for me. Here, I will be using R and the XML files produced in the previous step. Creating tagged text The … Continue reading Working with XML-formatted text annotations in R
As the title suggests, this is a guide to automatically annotating raw texts using Stanford CoreNLP. This tool serves a similar function to the cleanNLP and spaCy combination that I discussed in a previous post. When working with CoreNLP, the annotation itself does not require R, and the annotated output is … Continue reading A guide to using the Stanford CoreNLP Tools for automatic text annotation
If you're working with language data, you probably want to process text files rather than strings of words you type into an R script. Here is how to work with files. Refer to the previous post if you need to set up the tools. Again, please see the PDF version for the R script output. … Continue reading A basic guide to using NLP for corpus analysis with R (Part 2): Processing text files
This is Part 1 of a basic guide to setting up and using a natural language processing (NLP) tool with R. I specifically use spaCy, an “industrial strength natural language processing” Python library, and cleanNLP, an R wrapper that provides tools for annotating texts and obtaining data tables. In this post, I will explain … Continue reading A basic guide to using NLP for corpus analysis with R (Part 1): Installing Python, spaCy, and cleanNLP