Do I need any specific software or tools to complete the course successfully?

Only a modern web browser is required to complete this course and all hands-on labs. You will be given access to cloud-based lab environments at no charge.

What will I get if I subscribe to this Certificate?

When you enroll in the course, you get access to all of the courses in the Certificate, and you earn a certificate when you complete the work. Your electronic Certificate is added to your Accomplishments page, where you can print it or add it to your LinkedIn profile.

Generative AI Language Modeling with Transformers

This course is part of multiple programs.

Instructor: Joseph Santarcangelo

21,174 already enrolled

2 modules

Gain insight into a topic and learn the fundamentals.

4.5 (130 reviews)

Intermediate level

9 hours to complete

Flexible schedule: learn at your own pace

What you'll learn

Explain the role of attention mechanisms in transformer models for capturing contextual relationships in text
Describe the differences in language modeling approaches between decoder-based models like GPT and encoder-based models like BERT
Implement key components of transformer models, including positional encoding, attention mechanisms, and masking, using PyTorch
Apply transformer-based models for real-world NLP tasks, such as text classification and language translation, using PyTorch and Hugging Face tools

Details to know

Shareable certificate

Add to your LinkedIn profile

Assessments

6 assignments

Taught in English

Build your subject-matter expertise

This course is available as part of

When you enroll in this course, you'll also be asked to select a specific program.

Learn new concepts from industry experts
Gain a foundational understanding of a subject or tool
Develop job-relevant skills with hands-on projects
Earn a shareable career certificate

There are 2 modules in this course

This course provides a practical introduction to using transformer-based models for natural language processing (NLP) applications. You will learn to build and train models for text classification using encoder-based architectures like Bidirectional Encoder Representations from Transformers (BERT), and explore core concepts such as positional encoding, word embeddings, and attention mechanisms.

The course covers multi-head attention, self-attention, and causal language modeling with GPT for tasks like text generation and translation. You will gain hands-on experience implementing transformer models in PyTorch, including pretraining strategies such as masked language modeling (MLM) and next sentence prediction (NSP). Through guided labs, you’ll apply encoder and decoder models to real-world scenarios. This course is designed for learners interested in generative AI engineering and requires prior knowledge of Python, PyTorch, and machine learning. Enroll now to build your skills in NLP with transformers!

In this module, you will learn how transformers process sequential data using positional encoding and attention mechanisms. You will explore how to implement positional encoding in PyTorch and understand how attention helps models focus on relevant parts of input sequences. You'll dive deeper into self-attention and scaled dot-product attention with multiple heads to see how they contribute to language modeling tasks. The module also explains how the transformer architecture leverages these mechanisms efficiently. Through hands-on labs, you’ll implement these concepts and build transformer encoder layers in PyTorch. Finally, you'll apply transformer models for text classification, including building a data pipeline, defining the model, and training it, while also exploring techniques to optimize transformer training performance.
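As a rough illustration of two mechanisms named above, the sketch below computes the sinusoidal positional encoding from the original Transformer paper and single-head scaled dot-product attention, softmax(QK^T / sqrt(d_k))V, with an optional boolean mask. It is written in plain Python rather than PyTorch so it runs without dependencies; the function names are hypothetical and this is not the course's lab code. Multi-head attention repeats the same computation over several learned projections of Q, K, and V and concatenates the results.

```python
import math

def positional_encoding(seq_len, d_model):
    """Sinusoidal encoding: even dims use sin, odd dims use cos,
    both with angle pos / 10000^(2i / d_model)."""
    pe = []
    for pos in range(seq_len):
        row = []
        for dim in range(d_model):
            # dim and dim+1 share the same frequency index i = dim // 2
            angle = pos / (10000 ** ((dim - dim % 2) / d_model))
            row.append(math.sin(angle) if dim % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe

def softmax(scores):
    # Subtract the max for numerical stability; exp(-inf) evaluates to 0.0
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(Q, K, V, mask=None):
    """Q, K, V are lists of row vectors. mask[i][j] == False blocks query i
    from attending to key j (each row must keep at least one True).
    Returns (outputs, attention_weights)."""
    d_k = len(K[0])
    outputs, weights = [], []
    for i, q in enumerate(Q):
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        if mask is not None:
            scores = [s if mask[i][j] else float("-inf") for j, s in enumerate(scores)]
        w = softmax(scores)
        weights.append(w)
        # Each output row is the attention-weighted sum of the value rows
        outputs.append([sum(w[j] * V[j][c] for j in range(len(V)))
                        for c in range(len(V[0]))])
    return outputs, weights

# Usage: add positional encodings to toy embeddings, then self-attend (Q = K = V)
emb = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8], [0.9, 1.0, 1.1, 1.2]]
pe = positional_encoding(len(emb), len(emb[0]))
x = [[e + p for e, p in zip(er, pr)] for er, pr in zip(emb, pe)]
out, attn = scaled_dot_product_attention(x, x, x)
```

Each row of `attn` sums to 1, and passing a lower-triangular mask turns the same function into the causal attention used by decoder models.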

What's included

6 videos, 4 readings, 2 assignments, 2 app items, 2 plugins

6 videos (total 39 minutes)

Course Introduction (2 minutes)
Positional Encoding (6 minutes)
Attention Mechanism (7 minutes)
Self-attention Mechanism (7 minutes)
From Attention to Transformers (7 minutes)
Transformers for Classification: Encoder (8 minutes)

4 readings (total 17 minutes)

Course Overview (5 minutes)
Specialization Overview (7 minutes)
Optimization Techniques for Efficient Transformer Training (3 minutes)
Summary and Highlights (2 minutes)

2 assignments (total 45 minutes)

Practice Quiz: Positional Encoding, Attention, and Application in Classification (15 minutes)
Graded Quiz: Fundamental Concepts of Transformer Architecture (30 minutes)

2 app items (total 105 minutes)

Hands-on Lab: Attention Mechanism and Positional Encoding (45 minutes)
Hands-on Lab: Applying Transformers for Classification (60 minutes)

2 plugins (total 7 minutes)

Helpful Tips for Course Completion (2 minutes)
Reading: Beginner's Guide to Transformer Model Fundamentals (5 minutes)

In this module, you will learn how decoder-based models like GPT are trained using causal language modeling and implemented in PyTorch for both training and inference. You will explore encoder-based models, such as Bidirectional Encoder Representations from Transformers (BERT), and understand their pretraining strategies using masked language modeling (MLM) and next sentence prediction (NSP), along with data preparation techniques in PyTorch. You will also examine how transformer architectures are applied to machine translation, including their implementation using PyTorch. Through hands-on labs, you will gain practical experience with decoder models, encoder models, and translation tasks. The module concludes with a cheat sheet, glossary, and summary to help consolidate your understanding of key concepts.
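The decoder and encoder pretraining objectives described above differ mainly in how inputs and labels are built. The plain-Python sketch below (hypothetical helper names, not the course's lab code) shows causal language modeling as next-token prediction with a lower-triangular attention mask, and BERT-style masked language modeling using the 15% selection rate and 80/10/10 replacement split described in the BERT paper. Unselected positions get the label -100, which matches the default `ignore_index` of PyTorch's `CrossEntropyLoss`; the token IDs in the usage lines are illustrative only.

```python
import random

def causal_lm_pair(token_ids):
    """Next-token prediction: the model reads tokens[:-1] and is trained to
    predict tokens[1:], so each position predicts the token after it."""
    return token_ids[:-1], token_ids[1:]

def causal_mask(n):
    """mask[i][j] is True when position i may attend to position j (j <= i),
    so a decoder never sees future tokens."""
    return [[j <= i for j in range(n)] for i in range(n)]

def mlm_mask(token_ids, mask_id, vocab_size, special_ids=frozenset(),
             prob=0.15, rng=None):
    """BERT-style masking. Each non-special token is selected with probability
    `prob`; a selected token becomes mask_id 80% of the time, a random token
    10% of the time, and stays unchanged 10% of the time. Labels hold the
    original token at selected positions and -100 (ignored by the loss) elsewhere."""
    rng = rng if rng is not None else random.Random()
    inputs, labels = list(token_ids), []
    for idx, tok in enumerate(token_ids):
        if tok in special_ids or rng.random() >= prob:
            labels.append(-100)
            continue
        labels.append(tok)
        r = rng.random()
        if r < 0.8:
            inputs[idx] = mask_id
        elif r < 0.9:
            inputs[idx] = rng.randrange(vocab_size)
        # otherwise keep the original token
    return inputs, labels

# Usage with illustrative IDs: 101/102 play the role of [CLS]/[SEP], 103 of [MASK]
tokens = [101, 2023, 2003, 1037, 7099, 102]
x, y = causal_lm_pair(tokens)
inputs, labels = mlm_mask(tokens, mask_id=103, vocab_size=30522,
                          special_ids={101, 102}, rng=random.Random(0))
```

Keeping 10% of selected tokens unchanged and replacing another 10% with random tokens is the BERT paper's way of reducing the mismatch between pretraining, where [MASK] appears, and fine-tuning, where it does not.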

What's included

10 videos, 6 readings, 4 assignments, 4 app items, 3 plugins

10 videos (total 67 minutes)

Language Modeling with the Decoders and GPT-like Models (6 minutes)
Training Decoder Models (7 minutes)
Decoder Models: PyTorch Implementation (Causal LM) (5 minutes)
Decoder Models: PyTorch Implementation Using Training and Inference (5 minutes)
Encoder Models with BERT: Pretraining Using MLM (5 minutes)
Encoder Models with BERT: Pretraining Using NSP (6 minutes)
Data Preparation for BERT with PyTorch (8 minutes)
Pretraining BERT Models with PyTorch (8 minutes)
Transformer Architecture for Language Translation (5 minutes)
Transformer Architecture for Translation: PyTorch Implementation (7 minutes)

6 readings (total 9 minutes)

Summary and Highlights (1 minute)
Summary and Highlights (1 minute)
Summary and Highlights (1 minute)
Course Conclusion (2 minutes)
Thanks from the Course Team (2 minutes)
Congratulations and Next Steps (2 minutes)

4 assignments (total 63 minutes)

Practice Quiz: Decoder Models (12 minutes)
Practice Quiz: Encoder Models (12 minutes)
Practice Quiz: Application of Transformers for Translation (9 minutes)
Graded Quiz: Advanced Concepts of Transformer Architecture (30 minutes)

4 app items (total 180 minutes)

Hands-on Lab: Decoder GPT-like Models (45 minutes)
Hands-on Lab: Pretraining BERT Models (60 minutes)
Hands-on Lab: Data Preparation for BERT (45 minutes)
Lab: Transformers for Translation (30 minutes)

3 plugins (total 25 minutes)

Reading: Getting Started with Advanced Concepts of Transformer Models (7 minutes)
Cheat Sheet: Language Modeling with Transformers (15 minutes)
Course Glossary: Language Modeling with Transformers (3 minutes)

Earn a career certificate

Add this credential to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.

Instructors

Instructor ratings

4.2 (30 ratings)

Joseph Santarcangelo

IBM

36 courses, 2,268,886 learners

Offered by

IBM

Explore more from Machine Learning

Generative Pre-trained Transformers (GPT), University of Glasgow (course; Preview)
Introduction to Transformer Models for NLP: Unit 1, Pearson (course; Free Trial)
Transformers in Action: A Practical Approach to NLP and AI, Board Infinity (course; Preview)
NVIDIA: Fundamentals of NLP and Transformers, Whizlabs (course; Free Trial)

Learner reviews

4.5 (130 reviews)

5 stars: 74.80%
4 stars: 12.21%
3 stars: 4.58%
2 stars: 2.29%
1 star: 6.10%

Reviewed on Jan 17, 2025

Exceptional course and all the labs are industry related

Reviewed on Dec 29, 2024

This course gives me a wide picture of what transformers can be.

Reviewed on Sep 1, 2025

I loved this course. It is very informative and has a lot of examples. It will take some time to master all this information.

Frequently asked questions

The course takes about two weeks to complete if you spend 3–5 hours per week studying.

You should have a basic knowledge of Python and be familiar with machine learning and neural network concepts. Familiarity with text preprocessing steps and with N-gram, Word2Vec, and sequence-to-sequence models is beneficial, and knowledge of evaluation metrics such as bilingual evaluation understudy (BLEU) is an advantage.

This course is part of the Generative AI Engineering Essentials with LLMs PC specialization. Completing the specialization will prepare you with the skills and confidence to take on roles such as AI Engineer, NLP Engineer, Machine Learning Engineer, Deep Learning Engineer, and Data Scientist.

To access the course materials and assignments and to earn a Certificate, you need to purchase the Certificate experience when you enroll in a course. You can try a Free Trial instead or apply for Financial Aid. The course may also offer a 'Full Course, No Certificate' option, which lets you see all course materials, submit required assessments, and get a final grade, but means you will not be able to purchase a Certificate experience.