The field of natural language processing (NLP) aims to get computers to perform useful and interesting tasks with human language. This course introduces students to the three pillars underlying modern NLP: probabilistic language models, simple neural networks with a focus on gradient-based learning, and vector-based meaning representations in the form of word embeddings. At the end of the course, students will be able to implement and analyze probabilistic language models based on N-grams, text classifiers using logistic regression and gradient-based learning, and vector-based approaches to word meaning and text classification.
This course can be taken for academic credit as part of CU Boulder’s MS in Data Science or MS in Computer Science degrees offered on the Coursera platform. These fully accredited graduate degrees offer targeted courses, short 8-week sessions, and pay-as-you-go tuition. Admission is based on performance in three preliminary courses, not academic history. CU degrees on Coursera are ideal for recent graduates or working professionals. Learn more:
MS in Data Science: https://www.coursera.org/degrees/master-of-science-data-science-boulder
MS in Computer Science: https://coursera.org/degrees/ms-computer-science-boulder
This first week of Fundamentals of Natural Language Processing introduces the core concepts of natural language processing (NLP), focusing on how computers process and analyze human language. You will explore key linguistic structures, including words and morphology, and learn essential techniques for text normalization and tokenization.
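Subword tokenization in this module is covered through Byte-Pair Encoding (BPE). The sketch below shows the core BPE learning loop: repeatedly count adjacent symbol pairs and merge the most frequent one. It assumes an end-of-word marker `_`, frequency-weighted pair counts, and ties broken by first occurrence; these choices and the function name `learn_bpe` are illustrative assumptions, not taken from the course materials.

```python
from collections import Counter


def learn_bpe(words, num_merges):
    """Learn BPE merge rules from a list of words (toy sketch).

    Each word starts as a sequence of characters followed by an
    end-of-word marker "_" (an illustrative convention).
    """
    # Map each word (as a tuple of symbols) to its corpus frequency.
    vocab = Counter(tuple(word) + ("_",) for word in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pair_counts = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break  # every word is already a single symbol
        # Most frequent pair; max() keeps the first pair seen on ties.
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        # Rewrite every word, replacing occurrences of `best` with one symbol.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges


if __name__ == "__main__":
    corpus = ["low", "low", "lower", "newest", "newest", "widest"]
    print(learn_bpe(corpus, num_merges=5))
```

The learned merge list is ordered; to tokenize new text, the same merges would be applied in that order to each word's character sequence.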
What's included
5 videos, 8 readings, 2 assignments
5 videos•Total 56 minutes
Meet Your Instructor•1 minute
Course Introduction•7 minutes
Morphology•16 minutes
Text Normalization•17 minutes
Subword Tokenization•15 minutes
8 readings•Total 141 minutes
Course Updates and Accessibility Support•1 minute
Earn Academic Credit for Your Work!•10 minutes
Course Support•10 minutes
Assessment Expectations•5 minutes
AI Citation and Acknowledgement•10 minutes
Morphology•30 minutes
Text Normalization•60 minutes
Byte-Pair Encoding•15 minutes
2 assignments•Total 35 minutes
AI Policy Quiz•5 minutes
Quiz 1: Morphology and Tokenization•30 minutes
Probabilistic Language Models
Module 2•6 hours to complete
This week explores foundational language modeling techniques, focusing on n-gram models and their role in statistical Natural Language Processing. You will learn how n-gram language models are constructed, smoothed, and evaluated for effectiveness.
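As a minimal sketch of constructing, smoothing, and evaluating an n-gram model, the code below builds a bigram model with add-one (Laplace) smoothing and computes perplexity. It assumes whitespace tokenization and `<s>`/`</s>` sentence markers; add-one is chosen only as the simplest smoothing method (interpolation and backoff, also covered this week, are not shown). The function names are hypothetical and are not the interface of the course's programming assignment.

```python
import math
from collections import Counter


def pad(sentence):
    """Whitespace-tokenize and add sentence boundary markers."""
    return ["<s>"] + sentence.split() + ["</s>"]


def train_bigram(sentences):
    """Count unigrams and bigrams over boundary-padded sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = pad(sentence)
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def laplace_prob(prev, word, unigrams, bigrams):
    """Add-one (Laplace) smoothed estimate of P(word | prev)."""
    # Possible next tokens: every type except <s>, which is never predicted.
    vocab_size = len(unigrams) - 1
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)


def perplexity(sentences, unigrams, bigrams):
    """exp of the average negative log-probability per predicted token."""
    log_prob, n_predicted = 0.0, 0
    for sentence in sentences:
        tokens = pad(sentence)
        for prev, word in zip(tokens, tokens[1:]):
            log_prob += math.log(laplace_prob(prev, word, unigrams, bigrams))
            n_predicted += 1
    return math.exp(-log_prob / n_predicted)


if __name__ == "__main__":
    uni, bi = train_bigram(["the cat sat", "the dog sat"])
    print(laplace_prob("the", "cat", uni, bi))
    print(perplexity(["the cat sat"], uni, bi))
```

Lower perplexity means the model assigns higher probability to the held-out text. This sketch has no `<UNK>` handling, so test words outside the training vocabulary only receive add-one mass.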
N-Gram Language Models: Smoothing, Interpolation, and Backoff•20 minutes
Evaluating Language Models•30 minutes
1 assignment•Total 30 minutes
Quiz 2: Language Models•30 minutes
1 programming assignment•Total 180 minutes
Constructing a Language Model•180 minutes
Text Classification and Logistic Regression
Module 3•7 hours to complete
This week introduces text classification and explores logistic regression as a powerful classification technique. You will learn how logistic regression models work, including key mathematical concepts such as the logit function, gradients, and stochastic gradient descent. The week also covers evaluation metrics for assessing classifier performance.
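The sketch below trains a binary logistic regression classifier with stochastic gradient descent on the cross-entropy loss and scores it with precision, recall, and F1. It is pure Python for readability; the two-feature representation (counts of positive and negative lexicon words), the learning rate, the epoch count, and all function names are hypothetical choices for illustration, not the course's assignment setup.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_sgd(X, y, lr=0.1, epochs=100, seed=0):
    """Binary logistic regression trained by SGD on cross-entropy loss."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit examples in a fresh random order each epoch
        for i in order:
            z = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            # d(loss)/dz for cross-entropy with a sigmoid output is (y_hat - y).
            err = sigmoid(z) - y[i]
            w = [wj - lr * err * xj for wj, xj in zip(w, X[i])]
            b -= lr * err
    return w, b


def predict(w, b, x, threshold=0.5):
    """Return 1 if P(y = 1 | x) >= threshold, else 0."""
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= threshold)


def precision_recall_f1(gold, pred):
    """Precision, recall and F1 for the positive class (label 1)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical features: [positive-lexicon count, negative-lexicon count]
    X = [[2, 0], [3, 1], [0, 2], [1, 3]]
    y = [1, 1, 0, 0]
    w, b = train_sgd(X, y)
    preds = [predict(w, b, x) for x in X]
    print(w, b, preds, precision_recall_f1(y, preds))
```

Each update moves the weights against the per-example gradient `(sigmoid(z) - y) * x`; in practice the metrics would be computed on held-out data rather than on the training set as in this toy run.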
Learning Algorithms for Logistic Regression•17 minutes
Evaluating Classifiers•21 minutes
3 readings•Total 125 minutes
Introduction to Text Classification•35 minutes
Logistic Regression•60 minutes
Evaluating Classifiers•30 minutes
1 assignment•Total 30 minutes
Quiz 3: Logistic Regression•30 minutes
1 programming assignment•Total 180 minutes
Sentiment Classification with Logistic Regression•180 minutes
Vector Space Semantics and Word Embeddings
Module 4•7 hours to complete
This final week explores how words can be represented as vectors in a high-dimensional space, allowing computational models to capture semantic relationships between words. You will learn about both sparse and dense vector representations, including TF-IDF, Pointwise Mutual Information (PMI), Latent Semantic Analysis (LSA), and Word2Vec. The module also covers techniques for evaluating and applying word embeddings.
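To make the sparse-vector idea concrete, the sketch below builds word-context co-occurrence counts, reweights them with positive PMI (PPMI), and compares words by cosine similarity. The symmetric window of 2, base-2 logarithm, unsmoothed context probabilities, the toy corpus, and the function names are all assumptions for illustration; the course may use different weightings (for example, TF-IDF or smoothed PMI).

```python
import math
from collections import Counter, defaultdict


def cooccurrence_counts(sentences, window=2):
    """Count how often each context word appears within `window` tokens of a target."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, target in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[target][tokens[j]] += 1
    return counts


def ppmi_vectors(counts):
    """Sparse PPMI vectors: max(0, log2 P(w, c) / (P(w) P(c)))."""
    total = sum(sum(ctx.values()) for ctx in counts.values())
    word_totals = {w: sum(ctx.values()) for w, ctx in counts.items()}
    context_totals = Counter()
    for ctx in counts.values():
        context_totals.update(ctx)  # Counter.update adds counts
    vectors = {}
    for w, ctx in counts.items():
        vectors[w] = {}
        for c, n in ctx.items():
            pmi = math.log2(n * total / (word_totals[w] * context_totals[c]))
            if pmi > 0:  # PPMI keeps only positive associations
                vectors[w][c] = pmi
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(x * v.get(k, 0.0) for k, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


if __name__ == "__main__":
    corpus = ["the cat drinks milk", "the dog drinks milk", "the car needs fuel"]
    vecs = ppmi_vectors(cooccurrence_counts(corpus))
    print(cosine(vecs["cat"], vecs["dog"]), cosine(vecs["cat"], vecs["car"]))
```

On this toy corpus, "cat" and "dog" share identical contexts and end up with identical PPMI vectors, while "car" overlaps only on "the". Dense methods such as LSA or Word2Vec would compress such sparse vectors into a few hundred dimensions.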
Evaluation and Application of Word Embeddings•45 minutes
1 assignment•Total 30 minutes
Quiz 4: Vector-Space Semantics•30 minutes
1 programming assignment•Total 180 minutes
Training and Applying Word Embeddings•180 minutes
Build toward a degree
This course is part of the following degree program(s) offered by University of Colorado Boulder. If you are admitted and enroll, your completed coursework may count toward your degree learning and your progress can transfer with you.¹
¹Successful application and enrollment are required. Eligibility requirements apply. Each institution determines the number of credits recognized by completing this content that may count towards degree requirements, considering any existing credits you may have. Click on a specific course for more information.
CU Boulder is a dynamic community of scholars and learners on one of the most spectacular college campuses in the country. As one of 34 U.S. public institutions in the Association of American Universities (AAU), we have a proud tradition of academic excellence, with five Nobel laureates and more than 50 members of prestigious academies.
What is the recommended background for this course?
Learners should be proficient in Python programming, including the use of packages such as NumPy, scikit-learn, and pandas. Students should be proficient in data structures and basic topics in algorithm design, such as sorting and searching, dynamic programming, and algorithm analysis. Students should also have basic familiarity with introductory concepts from calculus, discrete probability, and linear algebra.
When will I have access to the lectures and assignments?
To access the course materials and assignments and to earn a Certificate, you will need to purchase the Certificate experience when you enroll in a course. You can try a Free Trial instead, or apply for Financial Aid. The course may offer 'Full Course, No Certificate' instead. This option lets you see all course materials, submit required assessments, and get a final grade. This also means that you will not be able to purchase a Certificate experience.
What will I get if I subscribe to this Specialization?
When you enroll in the course, you get access to all of the courses in the Specialization, and you earn a Certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page; from there, you can print your Certificate or add it to your LinkedIn profile.
Is financial aid available?
Yes. In select learning programs, you can apply for financial aid or a scholarship if you can’t afford the enrollment fee. If financial aid or a scholarship is available for your learning program selection, you’ll find a link to apply on the description page.