Working with XML-formatted text annotations in R

In this post, I document how to reformat the XML files output by the Stanford CoreNLP tool. This might not be the most elegant approach, but it works for me. Here, I will be using R and the XML files produced in the previous step. Creating tagged text The …

A guide to using the Stanford CoreNLP Tools for automatic text annotation

***This post has been updated on my new website. To use CoreNLP from the command line and export to XML or other formats: see this post. To use CoreNLP with R: see this post.*** As the title suggests, this is a guide to automatically annotating raw texts using the Stanford CoreNLP. This tool carries out a similar …

A basic guide to using NLP for corpus analysis with R (Part 2): Processing text files

*** I have this post on a new website with some updates *** If you're working with language data, you probably want to process text files rather than strings of words typed into an R script. Here is how to deal with files. Refer to the previous post for setting the tools up if …

A basic guide to using NLP for corpus analysis with R (Part 1): Installing Python, spaCy, and cleanNLP

*** I have this post on a new website with some updates *** This is Part 1 of a basic guide for setting up and using a natural language processing (NLP) tool with R. I specifically use the spaCy “industrial strength natural language processing” Python library, and an R wrapper called cleanNLP that provides tools …