Quantitative Text Analysis and Textual Similarity in R

Offered by Coursera Project Network
In this Guided Project, you will:

  • Tokenize the dataset and convert the data into a document feature matrix
  • Calculate cosine similarity across documents and plot the output

1 hour
Beginner
No download needed
Split-screen video
English
Desktop only

By the end of this project, you will understand document similarity in textual analysis and how to measure it in R. You will know how to load and pre-process a data set of text documents by converting it into a corpus and then into a document feature matrix. You will also know how to calculate the cosine similarity between documents, and how to explore and plot the results.

Skills you will develop

  • Cosine Similarity
  • Text Analysis
  • Document Similarity
  • Data Visualization (DataViz)
  • Text Corpus

Learn step-by-step

In a video that plays in a split-screen with your work area, your instructor will walk you through these steps:

  1. Load textual data into R, turn it into a corpus object, and learn how document similarity is calculated in textual analysis

  2. Extract metadata from text document filenames and subset the data frame to exclude unwanted data

  3. Tokenize and clean the dataset and convert the data into a document feature matrix

  4. Calculate cosine similarity across documents and plot the output
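
The four steps above can be sketched in a few lines of code. The project itself works in R; the block below is only a language-neutral illustration written in Python with the standard library, not the instructor's code. The file names, the "<year>_<label>.txt" naming pattern, the toy texts, and the tiny stop-word list are all hypothetical, chosen so the example runs on its own.

```python
import math
import re
from collections import Counter

# Step 1: load the texts. These toy documents and file names are hypothetical;
# the assumed naming pattern is "<year>_<label>.txt".
files = {
    "2019_alpha.txt": "The economy grows and the economy creates jobs.",
    "2020_beta.txt": "Jobs grow when the economy grows.",
    "2020_gamma.txt": "The weather is sunny today.",
    "2020_draft.txt": "Unfinished notes.",
}

# Illustrative stop-word list (an assumption, not the list used in the project).
STOPWORDS = {"the", "and", "is", "when"}


def parse_metadata(filename):
    """Step 2: split a file name into metadata fields."""
    stem = filename.rsplit(".", 1)[0]
    year, label = stem.split("_")
    return {"year": int(year), "label": label}


def tokenize(text):
    """Step 3a: lowercase, keep alphabetic tokens, drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def build_dfm(documents):
    """Step 3b: build a document feature matrix of raw token counts."""
    counts = {name: Counter(tokenize(text)) for name, text in documents.items()}
    features = sorted(set().union(*counts.values()))
    matrix = {name: [c[f] for f in features] for name, c in counts.items()}
    return features, matrix


def cosine(u, v):
    """Step 4: cosine similarity, the dot product divided by the vector norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_u * norm_v)


# Step 2 (continued): exclude unwanted documents using the extracted metadata.
metadata = {name: parse_metadata(name) for name in files}
kept = {name: text for name, text in files.items()
        if metadata[name]["label"] != "draft"}

features, dfm = build_dfm(kept)
names = list(dfm)
similarity = {(a, b): cosine(dfm[a], dfm[b]) for a in names for b in names}

# Print the similarity matrix, one row per document.
for a in names:
    print(a, " ".join(f"{similarity[(a, b)]:.2f}" for b in names))
```

Each row of the printed matrix compares one document with every other. Values near 1 mean the documents use the same features in similar proportions, and 0 means they share no features. Plotting is left out of the sketch; a heatmap of this matrix is one common way to display it.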

How Guided Projects work

Your workspace is a cloud desktop right in your browser; no download is required

In a split-screen video, your instructor guides you step-by-step

Frequently asked questions


More questions? Visit the Learner Help Center.