Working with XML-formatted text annotations in R

In this post, I'm documenting how to reformat the XML-formatted files output by the Stanford CoreNLP tool. This might not be the most elegant way to go about it, but it works for me. Here, I will be using R and the XML files produced in the previous step. Creating tagged text The … Continue reading Working with XML-formatted text annotations in R

A guide to using the Stanford CoreNLP Tools for automatic text annotation

As the title suggests, this is a guide to automatically annotating raw texts using the Stanford CoreNLP. This tool performs a function similar to that of the cleanNLP and spaCy combination that I discussed in a previous post. When working with CoreNLP, the annotation itself does not require using R and the annotated output is … Continue reading A guide to using the Stanford CoreNLP Tools for automatic text annotation

A basic guide to using NLP for corpus analysis with R (Part 2): Processing text files

If you're working with language data, you probably want to process text files rather than strings of words you type into an R script. Here is how to deal with files. Refer to the previous post for setting the tools up if needed. Again, please see the pdf version to see the R script output. … Continue reading A basic guide to using NLP for corpus analysis with R (Part 2): Processing text files

A basic guide to using NLP for corpus analysis with R (Part 1): Installing Python, spaCy, and cleanNLP

This is Part 1 of a basic guide for setting up and using a natural language processing (NLP) tool with R. I specifically utilize the spaCy “industrial strength natural language processing” Python library, and an R wrapper called cleanNLP that provides tools for annotating texts and obtaining data tables. In this post, I will explain … Continue reading A basic guide to using NLP for corpus analysis with R (Part 1): Installing Python, spaCy, and cleanNLP