Programming Generative AI: Unit 3

This course is part of Programming Generative AI Specialization

Instructor: Pearson

1 module

Gain insight into a topic and learn the fundamentals.

Intermediate level

8 hours to complete

Flexible schedule

Learn at your own pace

What you'll learn

Understand and implement multimodal models that integrate images and text for advanced AI applications.
Build and optimize semantic image search engines using contrastive language-image pre-training.
Master the principles and practicalities of latent diffusion and stable diffusion for text-to-image generation.
Adapt, fine-tune, and efficiently evaluate pre-trained generative models for new tasks, styles, and real-time performance.

Details to know

Shareable certificate

Add to your LinkedIn profile

Assessments

3 assignments

Taught in English

Build your subject-matter expertise

This course is part of the Programming Generative AI Specialization

When you enroll in this course, you'll also be enrolled in this Specialization.

Learn new concepts from industry experts
Gain a foundational understanding of a subject or tool
Develop job-relevant skills with hands-on projects
Earn a shareable career certificate

There is 1 module in this course

This advanced course module covers state-of-the-art multimodal generative models. It is designed for learners who want to connect images and text and to learn current techniques for AI-driven content generation. You'll begin with the foundational concepts behind multimodal models, learning how contrastive language-image pre-training lets a model relate visual and textual data. These models power applications such as semantic image search, which lets you query image content without manual labeling. You'll then study the mechanics of latent diffusion models and the inner workings of Stable Diffusion, gaining the skills to turn text prompts into new images. The course also covers strategies for evaluating generative models and introduces efficient methods for fine-tuning and adapting pre-trained models to new styles and subjects. By the end, you'll be equipped to build, adapt, and optimize text-to-image systems for creative, research, or commercial settings.

This module delves into multimodal generative AI, focusing on models that connect images and text. Learners explore contrastive language-image pre-training for semantic image search and uncover the workings of latent diffusion and stable diffusion for text-to-image generation. The module then covers evaluation of generative models, parameter-efficient fine-tuning, and techniques to teach pre-trained models new styles and subjects. It concludes with methods to optimize diffusion models for faster, near real-time image generation, equipping students with both conceptual understanding and practical skills in advanced multimodal AI systems.
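As a rough illustration of the semantic image search idea mentioned above, the sketch below ranks images by cosine similarity between a text-query embedding and image embeddings. It is a minimal sketch under stated assumptions, not course material: the three-dimensional vectors are made-up stand-ins for what a contrastively trained model such as CLIP would produce, and the names `cosine_similarity`, `rank_images`, and the file names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_images(query_embedding, image_embeddings):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [
        (name, cosine_similarity(query_embedding, emb))
        for name, emb in image_embeddings.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy embeddings; a real system would compute these with
# an image encoder and a text encoder trained to share one vector space.
image_embeddings = {
    "dog.jpg": [0.9, 0.1, 0.0],
    "cat.jpg": [0.1, 0.9, 0.0],
    "car.jpg": [0.0, 0.1, 0.9],
}

# Stand-in for the embedding of a text query such as "a photo of a dog".
query_embedding = [0.8, 0.2, 0.1]

ranking = rank_images(query_embedding, image_embeddings)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Because the text and image vectors are compared directly, no image needs a manual label; the search quality depends entirely on how well the two encoders were aligned during pre-training.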

What's included

44 videos, 3 assignments

44 videos, total 407 minutes

Topics - 0 minutes
Components of a Multimodal Model - 5 minutes
Vision-Language Understanding - 9 minutes
Contrastive Language-Image Pretraining - 6 minutes
Embedding Text and Images with CLIP - 14 minutes
Zero-Shot Image Classification with CLIP - 3 minutes
Semantic Image Search with CLIP - 10 minutes
Conditional Generative Models - 5 minutes
Introduction to Latent Diffusion Models - 8 minutes
The Latent Diffusion Model Architecture - 5 minutes
Failure Modes and Additional Tools - 6 minutes
Stable Diffusion Deconstructed - 11 minutes
Writing Our Own Stable Diffusion Pipeline - 11 minutes
Decoding Images from the Stable Diffusion Latent Space - 4 minutes
Improving Generation with Guidance - 9 minutes
Playing with Prompts - 30 minutes
Topics - 0 minutes
Methods and Metrics for Evaluating Generative AI - 7 minutes
Manual Evaluation of Stable Diffusion with DrawBench - 13 minutes
Quantitative Evaluation of Diffusion Models with Human Preference Predictors - 20 minutes
Overview of Methods for Fine-Tuning Diffusion Models - 9 minutes
Sourcing and Preparing Image Datasets for Fine-Tuning - 7 minutes
Generating Automatic Captions with BLIP-2 - 8 minutes
Parameter Efficient Fine-Tuning with LoRA - 11 minutes
Inspecting the Results of Fine-Tuning - 5 minutes
Inference with LoRAs for Style-Specific Generation - 12 minutes
Conceptual Overview of Textual Inversion - 8 minutes
Subject-Specific Personalization with Dreambooth - 7 minutes
Dreambooth versus LoRA Fine-Tuning - 6 minutes
Dreambooth Fine-Tuning with Hugging Face - 14 minutes
Inference with Dreambooth to Create Personalized AI Avatars - 14 minutes
Adding Conditional Control to Text-to-Image Diffusion Models - 4 minutes
Creating Edge and Depth Maps for Conditioning - 15 minutes
Depth and Edge-Guided Stable Diffusion with ControlNet - 17 minutes
Understanding and Experimenting with ControlNet Parameters - 8 minutes
Generative Text Effects with Font Depth Maps - 2 minutes
Few Step Generation with Adversarial Diffusion Distillation (ADD) - 7 minutes
Reasons to Distill - 6 minutes
Comparing SDXL and SDXL Turbo - 11 minutes
Text-Guided Image-to-Image Translation - 16 minutes
Video-Driven Frame-by-Frame Generation with SDXL Turbo - 13 minutes
Near Real-Time Inference with PyTorch Performance Optimizations - 11 minutes
Programming Generative AI: Summary - 1 minute
Course Summary - 1 minute