Introduces the theoretical foundations and advanced concepts of neural networks, generative models, transformers, and large language models. Students will explore how these AI systems create new data, process information, and learn through feedback, while analyzing their applications across various fields. The course emphasizes key principles in model building, optimization, and real-world generative AI use cases.

Generative AI Part 1

Recommended experience
Intermediate level
Understanding of linear algebra.
Details to know

18 assignments
April 2026

There are 7 modules in this course
In this module, you will explore the foundations of neural networks, including perceptrons, architectures, and learning algorithms. You will dive deeply into optimization methods critical for efficient training, focusing on advanced techniques like Newton’s and quasi-Newton methods, momentum, RMSProp, and Adam optimization algorithms.
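The Adam update named above combines momentum's running mean of gradients with RMSProp's running mean of squared gradients. As a quick illustration (not part of the course materials; a minimal NumPy sketch with illustrative hyperparameter values):

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: first/second moment estimates with bias correction."""
    m = beta1 * m + (1 - beta1) * grad        # first moment (momentum term)
    v = beta2 * v + (1 - beta2) * grad ** 2   # second moment (RMSProp term)
    m_hat = m / (1 - beta1 ** t)              # bias correction for zero init
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Minimize f(theta) = theta^2 (gradient 2*theta) starting from theta = 5.0.
theta, m, v = 5.0, 0.0, 0.0
for t in range(1, 201):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t)
print(theta)  # close to the minimum at 0
```

The per-parameter scaling by `sqrt(v_hat)` is what makes Adam adaptive; with `beta1 = 0` the update reduces to RMSProp.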
What's included
6 videos • 17 readings • 2 assignments
6 videos•Total 29 minutes
- Neural Networks Part 1: Perceptron•6 minutes
- Neural Networks Part 2: How Neural Networks Learn•6 minutes
- Neural Networks Part 3: Back Propagation•7 minutes
- Optimization Technique Overview Part 1•3 minutes
- Optimization Technique Overview Part 2•4 minutes
- Optimization Technique Overview Part 3•3 minutes
17 readings•Total 257 minutes
- Course Introduction•1 minute
- Meet Your Faculty•1 minute
- Syllabus - Generative AI Part 1•10 minutes
- Recommended Prior Knowledge•100 minutes
- Academic Integrity•1 minute
- Perceptron In-Depth•10 minutes
- Neural Network Breakdown•15 minutes
- Neural Network Structure•5 minutes
- How Neural Networks Learn: Deep Dive•10 minutes
- Backpropagation & SGD•20 minutes
- Module Overview•3 minutes
- Matrices•15 minutes
- Newton's Methods•15 minutes
- Quasi-Newton Methods•15 minutes
- Root-Mean-Square Propagation•15 minutes
- Adaptive Moment Estimation•20 minutes
- Module Wrap-Up•1 minute
2 assignments•Total 20 minutes
- Module 1- Assess Your Learning 1•10 minutes
- Module 1- Assess Your Learning 2•10 minutes
This module guides you through the mathematical approaches to regularization techniques that enhance neural network generalization and prevent overfitting. You will analyze concepts including Stein’s unbiased risk estimator, eigen decomposition, ensemble methods, dropout mechanisms, and advanced normalization techniques such as batch normalization.
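The dropout mechanism discussed here can be sketched in a few lines. This is a hedged illustration, not course code: it shows the common "inverted dropout" convention, where survivors are rescaled at training time so evaluation needs no correction.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, p=0.5, train=True):
    """Inverted dropout: zero each unit with probability p, rescale survivors
    by 1/(1-p) so the expected activation matches evaluation mode."""
    if not train:
        return x          # no-op at test time
    mask = rng.random(x.shape) >= p
    return x * mask / (1 - p)

x = np.ones((4, 1000))
y = dropout(x, p=0.5)
print(y.mean())  # close to 1.0 in expectation
```

Because each retained unit is scaled by `1/(1-p)`, the expected value of every activation is unchanged, which is why dropout can be viewed as training an implicit ensemble of subnetworks.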
What's included
4 videos • 17 readings • 2 assignments
4 videos•Total 23 minutes
- Regularization: Model Selection and Complexity•5 minutes
- Regularization Techniques•8 minutes
- Introduction to Dropout•4 minutes
- Introduction to Batch Normalization•6 minutes
17 readings•Total 160 minutes
- Module Overview•1 minute
- Stein’s Unbiased Risk Estimator•15 minutes
- Stein's Lemma•15 minutes
- Regularization•10 minutes
- Why Does Regularization Work?•15 minutes
- Eigen Decomposition and Singular Value Decomposition•15 minutes
- Understanding the Search Space•5 minutes
- Regularization Techniques•15 minutes
- Bagging and Other Ensemble Methods•5 minutes
- Deep Dive Into Dropout•15 minutes
- Applying Dropout to Linear Regression•15 minutes
- Deep Dive Into Batch Normalization•2 minutes
- Internal Covariate Shift and Domain Adaptation•10 minutes
- New Batch Normalization Techniques•15 minutes
- Batch Normalization Effects•5 minutes
- Alternatives to Batch Normalization•1 minute
- Module Wrap-Up•1 minute
2 assignments•Total 20 minutes
- Module 2- Assess Your Learning 1•10 minutes
- Module 2- Assess Your Learning 2•10 minutes
In this module, you will examine convolutional neural networks (CNNs), including convolution operations, parameter sharing, kernel methods, and multi-dimensional data structures. You'll explore advanced CNN architectures, regularization, normalization techniques, and the implications of random kernels on network learning behavior.
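The convolution operation at the heart of this module reduces, for 1D inputs, to sliding a shared kernel over the signal. A minimal sketch (illustrative only; like most deep learning libraries, it computes cross-correlation, i.e. convolution without kernel flipping):

```python
import numpy as np

def conv1d_valid(x, w):
    """'Valid' cross-correlation: slide kernel w over x with no padding;
    output length is len(x) - len(w) + 1."""
    k = len(w)
    return np.array([np.dot(x[i:i + k], w) for i in range(len(x) - k + 1)])

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
w = np.array([1.0, 0.0, -1.0])   # a simple edge-detecting kernel
print(conv1d_valid(x, w))        # [-2. -2. -2.]
```

Every output position reuses the same three weights, which is the parameter sharing that makes convolutional layers so much cheaper than fully connected ones.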
What's included
5 videos • 31 readings • 2 assignments
5 videos•Total 46 minutes
- Convolutional Neural Networks Part 1: The First Principles•10 minutes
- Convolutional Neural Networks Part 2: 1D Input•8 minutes
- Convolutional Neural Networks Part 3: Multiple Dimensions•9 minutes
- Convolutional Neural Networks Part 4: Backpropagation•12 minutes
- Convolutional Neural Networks Part 5: PixelCNN•7 minutes
31 readings•Total 270 minutes
- Module Overview•1 minute
- Introduction to Convolutional Neural Networks•2 minutes
- Invariance and Equivariance•5 minutes
- Convolution•5 minutes
- Translation•5 minutes
- Kernel Flipping•5 minutes
- Convolution vs. Cross-Correlation•5 minutes
- Edge Detection•15 minutes
- Types of Kernels•5 minutes
- Parameter Sharing and Filters•2 minutes
- CNNs for 1D Inputs•10 minutes
- Padding•5 minutes
- Stride, Kernel Size, and Dilation•2 minutes
- Convolutional Layers as Fully Connected Layers•10 minutes
- Convolution in Multidimensional Arrays•5 minutes
- Architecture of Convolutional NNs•10 minutes
- Downsampling•15 minutes
- Upsampling and Layers•5 minutes
- End-to-End Visualization of CNNs•30 minutes
- Backpropagation•15 minutes
- Convolutional Layers•25 minutes
- Kernel Weights•15 minutes
- Applications of CNNs•20 minutes
- Residual Neural Networks•20 minutes
- Recap on Regularization•2 minutes
- Ideas to Get Around the Optimization Problem•5 minutes
- Layer Normalization Formulas•5 minutes
- Filter Response Normalization (FRN)•10 minutes
- Normalizer-Free Networks•5 minutes
- Why Random Kernels Learn Different Things•5 minutes
- Module Wrap-Up•1 minute
2 assignments•Total 13 minutes
- Module 3- Assess Your Learning 1•10 minutes
- Module 3- Assess Your Learning 2•3 minutes
In this module, you will analyze the mathematics underpinning generative models and maximum likelihood estimation (MLE). You will explore divergence metrics such as Kullback-Leibler divergence, Bayesian network structures, and autoregressive modeling methods, focusing on their theoretical foundations and practical implications.
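The KL divergence central to this module measures how one distribution diverges from another; minimizing KL from the data distribution to the model is equivalent to maximum likelihood learning. A small sketch for discrete distributions (illustrative, not course code):

```python
import numpy as np

def kl_divergence(p, q):
    """D_KL(p || q) = sum_x p(x) * log(p(x) / q(x)) for discrete distributions.
    Non-negative, zero iff p == q, and not symmetric in p and q."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(np.where(p > 0, p * np.log(p / q), 0.0)))

p = [0.5, 0.5]
q = [0.9, 0.1]
print(kl_divergence(p, q))   # positive
print(kl_divergence(p, p))   # 0.0
```

Note the asymmetry: `kl_divergence(p, q)` and `kl_divergence(q, p)` generally differ, which is one reason KL is called a divergence rather than a distance.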
What's included
6 videos • 32 readings • 3 assignments
6 videos•Total 53 minutes
- Intro to Maximum Likelihood Learning•9 minutes
- Divergence Methods & Gradient Descent•11 minutes
- Representation Part 1: Distributions•10 minutes
- Representation Part 2: Discriminative vs. Generative Models•9 minutes
- Autoregressive Models General Principles•9 minutes
- Autoregressive Models Continued•7 minutes
32 readings•Total 225 minutes
- Module Overview•1 minute
- Learning a Generative Model•8 minutes
- Goal of Learning•3 minutes
- What is “Best?”•2 minutes
- Learning as Density Estimation•1 minute
- Kullback-Leibler (KL-Divergence)•3 minutes
- Detour on KL-Divergence•3 minutes
- Expected Log-Likelihood•5 minutes
- Monte Carlo Estimation•8 minutes
- Extending the MLE Principle to Autoregressive Models•5 minutes
- MLE Learning: Gradient Descent•3 minutes
- MLE Learning: Stochastic Gradient Descent•4 minutes
- Empirical Risk and Overfitting•10 minutes
- Learning a Generative Model Part 2•5 minutes
- Basic Discrete Distributions•10 minutes
- Structure Through Independence•3 minutes
- Key Notion: Conditional Independence•15 minutes
- Bayesian Networks•5 minutes
- Examples•10 minutes
- Naive Bayes•8 minutes
- Discriminative vs. Generative Models•10 minutes
- Generative Models Are Still Useful•8 minutes
- Bayesian Networks vs. Neural Models•20 minutes
- Motivating Example: MNIST•2 minutes
- Introduction to Autoregressive Models•10 minutes
- Fully Visible Sigmoid Belief Networks (FVSBN)•10 minutes
- NADE: Neural Autoregressive Density Estimation•25 minutes
- General Discrete Distributions•5 minutes
- Real-Valued Neural Autoregressive Density-Estimator (RNADE)•5 minutes
- Autoregressive Models vs. Autoencoder•15 minutes
- Summary of Autoregressive Models•2 minutes
- Module Wrap-Up•1 minute
3 assignments•Total 30 minutes
- Module 4- Assess Your Learning 1•10 minutes
- Module 4- Assess Your Learning 2•10 minutes
- Module 4- Assess Your Learning 3•10 minutes
In this module, you will rigorously examine the foundations and implementation details of Recurrent Neural Networks (RNNs) for modeling sequential data. You will study the structure, dynamics, training procedures, and limitations of standard RNNs, explore gated architectures like LSTM and GRU mathematically, and extend these models with bidirectional and multilayer approaches.
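The gate equations of an LSTM cell studied in this module can be written out directly. This is a minimal NumPy sketch under illustrative sizes and random weights, not an efficient or trained implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, b, hidden):
    """One LSTM step: forget, input, and output gates plus a candidate cell
    state. W maps [x; h] to the four stacked gate pre-activations."""
    z = W @ np.concatenate([x, h]) + b
    f = sigmoid(z[:hidden])                  # forget gate
    i = sigmoid(z[hidden:2 * hidden])        # input gate
    o = sigmoid(z[2 * hidden:3 * hidden])    # output gate
    g = np.tanh(z[3 * hidden:])              # candidate cell state
    c = f * c + i * g                        # gated cell update
    h = o * np.tanh(c)                       # new hidden state
    return h, c

rng = np.random.default_rng(0)
hidden, inp = 8, 4
W = rng.normal(scale=0.1, size=(4 * hidden, inp + hidden))
b = np.zeros(4 * hidden)
h = c = np.zeros(hidden)
for x in rng.normal(size=(5, inp)):          # unroll over a length-5 sequence
    h, c = lstm_step(x, h, c, W, b, hidden)
print(h.shape)  # (8,)
```

The additive cell update `c = f * c + i * g` is what lets gradients flow across many time steps, addressing the vanishing-gradient problem of standard RNNs.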
What's included
4 videos • 14 readings • 3 assignments
4 videos•Total 31 minutes
- Introduction to Recurrent Neural Networks•11 minutes
- Training RNNs•7 minutes
- Long Short-Term Memory•8 minutes
- Gated Recurrent Unit (GRU)•5 minutes
14 readings•Total 93 minutes
- Module Overview•10 minutes
- Introduction to Recurrent Neural Networks•5 minutes
- Dynamic Systems•5 minutes
- Computing Gradient in RNNs•10 minutes
- Training an RNN Language Model•8 minutes
- Problems with RNNs•8 minutes
- Potential Solutions to RNN Issues•10 minutes
- Gated RNNs and LSTM•10 minutes
- Gated Recurrent Unit: In-Depth•10 minutes
- Extension of Residual Networks to RNNs•5 minutes
- Motivation•1 minute
- Intro to Bidirectional RNNs•5 minutes
- Multilayer RNNs•5 minutes
- Module Wrap-Up•1 minute
3 assignments•Total 9 minutes
- Module 5- Assess Your Learning 1•3 minutes
- Module 5- Assess Your Learning 2•3 minutes
- Module 5- Assess Your Learning 3•3 minutes
You will explore techniques essential to sequence-to-sequence modeling, with special emphasis on attention mechanisms. The module will guide you through the motivations behind attention, how attention weights are calculated, and how attention significantly improves sequence models in practical tasks.
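The attention-weight computation described here is a score, a softmax, and a weighted sum. A minimal dot-product-attention sketch with illustrative dimensions (not course code):

```python
import numpy as np

def softmax(z):
    z = z - z.max()            # numerical stability
    e = np.exp(z)
    return e / e.sum()

def attend(decoder_state, encoder_states):
    """Dot-product attention: score each encoder state against the decoder
    state, softmax the scores into weights, return the weighted context."""
    scores = encoder_states @ decoder_state   # one score per source position
    weights = softmax(scores)                 # normalized attention weights
    context = weights @ encoder_states        # convex combination of states
    return context, weights

rng = np.random.default_rng(0)
enc = rng.normal(size=(6, 16))   # 6 source positions, 16-dim encoder states
dec = rng.normal(size=16)        # current decoder state
context, weights = attend(dec, enc)
print(weights.sum())  # 1.0: a distribution over source positions
```

Because the context is recomputed at every decoding step, the decoder is no longer squeezed through a single fixed-length vector, which is the key improvement over plain Seq2Seq.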
What's included
3 videos • 8 readings • 2 assignments
3 videos•Total 20 minutes
- Sequence to Sequence Models•7 minutes
- Attention in Seq2Seq: Dynamic Attention•9 minutes
- Attention in Translation: Decoding•4 minutes
8 readings•Total 38 minutes
- Module Overview•2 minutes
- Motivation for Attention Mechanism•2 minutes
- Seq2Seq•7 minutes
- Challenges of Seq2Seq•5 minutes
- Attention Mechanism•10 minutes
- Computing Attention Weights•5 minutes
- Detailed Attention in Seq2Seq & Decoding•5 minutes
- Module Wrap-Up•2 minutes
2 assignments•Total 6 minutes
- Module 6- Assess Your Learning 1•3 minutes
- Module 6- Assess Your Learning 2•3 minutes
This module offers a deep investigation into Transformer architectures, focusing on self-attention mechanisms, positional encodings, multi-head attention, and various Transformer configurations. You will analyze how Transformers structurally differ from RNNs, and mathematically explore their capabilities and limitations.
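The self-attention mechanism at the core of this module is softmax(QK^T / sqrt(d_k)) V, optionally with future masking for decoding. A single-head NumPy sketch under illustrative shapes and random projections (not course code):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv, causal=False):
    """Scaled dot-product self-attention; with causal=True, position i
    is masked from attending to positions j > i."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    if causal:
        scores = np.where(np.tril(np.ones_like(scores)) == 1, scores, -1e9)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V

rng = np.random.default_rng(0)
T, d = 5, 8                       # sequence length, model dimension
X = rng.normal(size=(T, d))
Wq, Wk, Wv = (rng.normal(scale=0.5, size=(d, d)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv, causal=True)
print(out.shape)  # (5, 8)
```

Unlike an RNN, every position attends to every (unmasked) position in one step, so there is no sequential recurrence; that is also why explicit position information must be added.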
What's included
3 videos • 16 readings • 4 assignments
3 videos•Total 25 minutes
- Transformers Part 1: Applications and Key Query Value•7 minutes
- Transformers Part 2: Self-Attention•9 minutes
- Transformers Part 3: Position Information•9 minutes
16 readings•Total 139 minutes
- Module Overview•2 minutes
- Key, Query, Value & Self-Attention•15 minutes
- Self-Attention As Routing•4 minutes
- Computing and Weighting Values•15 minutes
- Self-Attention in Summary•15 minutes
- Position Representation•10 minutes
- The Intuition•8 minutes
- Changing Alpha Directly & Future Masking•10 minutes
- Multihead Attention•10 minutes
- Sequence Tensor Form•5 minutes
- Transformer Mechanisms•10 minutes
- Types of Transformers•15 minutes
- Decoder Process with Cross-Attention•8 minutes
- Drawbacks of Transformers•8 minutes
- Module Wrap-Up•3 minutes
- Congratulations!•1 minute
4 assignments•Total 12 minutes
- Module 7- Assess Your Learning 1•3 minutes
- Module 7- Assess Your Learning 2•3 minutes
- Module 7- Assess Your Learning 3•3 minutes
- Module 7- Assess Your Learning 4•3 minutes
Instructor

Offered by

Founded in 1898, Northeastern is a global research university with a distinctive, experience-driven approach to education and discovery. The university is a leader in experiential learning, powered by the world’s most far-reaching cooperative education program. The spirit of collaboration guides a use-inspired research enterprise focused on solving global challenges in health, security, and sustainability.