I recently gave a workshop on visualizing data at Donuts and Distribution, a statistics reading group for the Second Language Studies program at Michigan State University. The presentation slides and workshop materials can be found on my RPubs page, Part 1 and Part 2. These were designed for a novice-to-intermediate audience. Part 1 … Continue reading Visualizing second language research data in R using ggplot2
I started using R relatively recently, but I see more and more people around me learning and using it. In this post, I offer a few small tips that might be helpful for beginners setting up their RStudio environment. It is the small things that make your life better, and if you don't already know, here are … Continue reading Getting comfortable with the RStudio environment
I've created a Shiny document analyzing the Detroit Tigers' batting statistics over the first 60 games. Click here to view the document.
In this post, I document how to reformat the XML-formatted files output by the Stanford CoreNLP tool. This might not be the most elegant way to go about it, but it works for me. Here, I will be using R and the XML files produced in the previous step. Creating tagged text The … Continue reading Working with XML-formatted text annotations in R
***This post has been updated on my new website. To use CoreNLP from the command line and export to XML or other formats, see this post. To use CoreNLP with R, see this post.*** As the title suggests, this is a guide to automatically annotating raw texts using Stanford CoreNLP. This tool carries out a similar … Continue reading A guide to using the Stanford CoreNLP Tools for automatic text annotation
*** I have this post on a new website with some updates *** If you're working with language data, you probably want to process text files rather than strings of words you type into an R script. Here is how to deal with files. Refer to the previous post for setting the tools up if … Continue reading A basic guide to using NLP for corpus analysis with R (Part 2): Processing text files
*** I have this post on a new website with some updates *** This is Part 1 of a basic guide to setting up and using a natural language processing (NLP) tool with R. Specifically, I use the spaCy “industrial strength natural language processing” Python library and an R wrapper called cleanNLP that provides tools … Continue reading A basic guide to using NLP for corpus analysis with R (Part 1): Installing Python, spaCy, and cleanNLP