When you enroll in this course, you are also enrolled in this Specialization.
Learn new concepts from industry experts
Gain a foundational understanding of a subject or tool
Develop job-relevant skills with hands-on projects
Earn a shareable career certificate
There are 4 modules in this course
Generative AI for Audio and Images: Models and Applications offers an in-depth exploration of how modern generative models such as Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), Transformers, and Diffusion models are used to create, manipulate, and enhance audio, image, and video content.
Learners examine the architectures, training processes, and use cases of these models across different modalities, gaining both conceptual understanding and practical insights through hands-on activities. The course also highlights the ethical and societal implications of generative AI, including bias, transparency, intellectual property, and the challenges of deepfake technologies.
By covering foundational theory as well as state-of-the-art approaches and applications, this course prepares learners to apply and develop generative AI creatively and responsibly for the audio and image modalities.
By the end of this course, learners will be able to:
Outline core concepts, challenges, and the history of AI-generated audio.
Analyze important foundational audio generation models, such as variational and vector quantized autoencoders (VAE and VQ-VAE).
Examine how these models integrate with the latest GenAI technologies to form hybrid, state-of-the-art transformer and diffusion-based audio generation systems.
Study the architecture and functionality of Generative Adversarial Networks (GANs), and their variations.
Implement and train GAN models for creating and enhancing visual content.
Explore cutting-edge techniques such as diffusion models and transformers for image and video creation.
Discuss the ethical considerations regarding generative AI for audio and images.
This module introduces the foundations and core concepts of AI-generated audio. Learners explore why audio generation is uniquely challenging, including the difficulties of representation and evaluation. They learn how audio is represented and processed, compare waveform and symbolic formats, and survey common audio data formats and Python libraries for working with audio. The module also examines methods for evaluating generated audio and provides a framework for categorizing audio generation approaches by their functionality and human–AI collaboration level. It concludes with a historical overview of AI-generated audio, tracing its evolution from early rule-based methods to modern deep generative models.
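To make the waveform versus symbolic distinction concrete, here is a minimal sketch in plain Python. It is not taken from the course materials: the sample rate, the `midi_to_hz` and `render` helpers, and the three-note melody are all assumptions chosen for illustration. A symbolic representation stores a few note events, while the waveform representation stores thousands of amplitude samples for the same sound.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed; real audio often uses 16 kHz to 48 kHz)


def midi_to_hz(note: int) -> float:
    """Convert a MIDI note number to a frequency in Hz (A4 = note 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12)


def render(notes, sample_rate=SAMPLE_RATE):
    """Render symbolic (midi_note, seconds) events into a list of sine-wave samples."""
    samples = []
    for note, seconds in notes:
        freq = midi_to_hz(note)
        count = int(round(seconds * sample_rate))
        samples.extend(
            math.sin(2 * math.pi * freq * t / sample_rate) for t in range(count)
        )
    return samples


# Symbolic form: three events (C4, E4, G4) with durations in seconds.
melody = [(60, 0.25), (64, 0.25), (67, 0.5)]

# Waveform form: one amplitude value per sample.
waveform = render(melody)

print(len(melody), "symbolic events ->", len(waveform), "waveform samples")
```

The gap between the two lengths is one reason waveform generation is harder: a model has to stay coherent over far longer sequences than it does with symbolic data.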
Included
21 videos · 3 readings · 4 assignments · 2 discussion prompts
21 videos•Total 135 minutes
Course Introduction•6 minutes
Meet your instructor: Anahita Doosti•1 minute
Meet your instructor: Nasimeh Asgarian•1 minute
Overview of AI for Audio and Music Generation•7 minutes
Why Is Audio Generation Difficult?•9 minutes
Data representation: Waveform vs Symbolic•8 minutes
Early ML Approaches: HMMs, FF Neural Networks•7 minutes
Modern Approaches 1: RNNs and CNNs•10 minutes
Modern Approaches 2: Autoencoders/VAEs and GANs•6 minutes
Modern Approaches 3: Transformers and Diffusion•9 minutes
Module 1 Recap•2 minutes
3 readings•Total 140 minutes
Terminology•10 minutes
Python Libraries for Audio Data•10 minutes
WaveNet Implementation (Hands-on Lab)•120 minutes
4 assignments•Total 145 minutes
Module 1 Quiz•80 minutes
Practice Quiz 1•30 minutes
Practice Quiz 2•20 minutes
Practice Quiz 3•15 minutes
2 discussion prompts•Total 20 minutes
Learning Goal•10 minutes
Is AI even capable of achieving true creativity?•10 minutes
Advanced audio generation with Generative AI
Module 2•8 hours to complete
Building on the fundamentals, this module dives into advanced models for audio generation. Learners study Variational Autoencoders (VAEs) and their variants, and how they apply to melody generation and speech synthesis. The module also explores transformer-based models, such as Music Transformer, AudioLM, and FastSpeech, as well as diffusion-based models like DiffWave and Stable Audio. Through these lessons, learners gain a comprehensive understanding of how modern generative architectures produce realistic, high-quality audio and music.
Included
31 videos · 2 readings · 4 assignments
31 videos•Total 202 minutes
Introduction to Variational Autoencoders•4 minutes
How to Condition VAEs with Additional Musical Information Such as Chord, Scale?•7 minutes
Example: MusicVAE•8 minutes
Attribute Vector Arithmetic for Melodies•8 minutes
Example: Jukebox•6 minutes
Example: Speech Synthesis•8 minutes
Strengths and limitations of VAE-based approaches•5 minutes
Transformer Primer•6 minutes
Transformers for Audio Generation•6 minutes
Example: Music Transformer•13 minutes
Revisiting Jukebox: How Transformers Can Generate Waveform Audio! (Part 1)•9 minutes
Revisiting Jukebox: How Transformers Can Generate Waveform Audio! (Part 2)•4 minutes
A New Paradigm: Audio Codec + Language Model (Part 1)•6 minutes
A New Paradigm: Audio Codec + Language Model (Part 2)•8 minutes
Example: FastSpeech•8 minutes
Strengths and Limitations of Transformer-Based Approaches•5 minutes
What Are Diffusion Models, and How Can They Generate Audio?•5 minutes
Example: Stable Audio•6 minutes
Example: DiffWave•5 minutes
Strengths and Limitations of Diffusion-Based Approaches•5 minutes
How Do the Recent Models Compare to Each Other?•9 minutes
What Is on the Horizon? Where Are We Headed?•7 minutes
Module 2 Recap•3 minutes
2 readings•Total 130 minutes
Resource Guide•10 minutes
Audio Generation Models Inference and Comparison (Hands-on Lab)•120 minutes
4 assignments•Total 125 minutes
Module 2 Quiz•80 minutes
Practice Quiz•15 minutes
Practice Quiz•15 minutes
Practice Quiz•15 minutes
Introduction to Generative Image Models
Module 3•7 hours to complete
This module transitions from audio to image generation, introducing the principles and evolution of image and video synthesis. Learners examine key architectures like GANs and VAEs, explore how adversarial training works, and study variations such as Conditional and Progressive GANs, Pix2Pix, and CycleGAN. The module also connects theory to practice by showcasing creative and commercial applications—from art and design to data augmentation—demonstrating how generative models enhance realism and variety in visual outputs.
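As a rough illustration of the adversarial training mentioned above, the sketch below computes the standard per-sample GAN losses from discriminator outputs. It is not taken from the course materials: the function names and the example probabilities are assumptions, and the generator loss shown is the commonly used non-saturating variant rather than the original minimax form.

```python
import math


def discriminator_loss(d_real: float, d_fake: float) -> float:
    """Binary cross-entropy for the discriminator on one real and one fake sample.

    d_real: discriminator's probability that a real image is real.
    d_fake: discriminator's probability that a generated image is real.
    """
    return -(math.log(d_real) + math.log(1.0 - d_fake))


def generator_loss(d_fake: float) -> float:
    """Non-saturating generator loss: low when the discriminator is fooled."""
    return -math.log(d_fake)


# Hypothetical discriminator outputs at two points in training.
early = {"d_real": 0.9, "d_fake": 0.1}  # discriminator easily spots fakes
later = {"d_real": 0.6, "d_fake": 0.4}  # generator has improved

for name, p in (("early", early), ("later", later)):
    print(
        name,
        "D loss:", round(discriminator_loss(p["d_real"], p["d_fake"]), 3),
        "G loss:", round(generator_loss(p["d_fake"]), 3),
    )
```

The two losses pull in opposite directions: as the generator's samples become more convincing, its loss falls while the discriminator's loss rises, which is the tension that adversarial training relies on.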
Included
22 videos · 3 readings · 5 assignments
22 videos•Total 156 minutes
Overview of AI for Image and Video Generation•8 minutes
Applications of Image and Video Generation•8 minutes
DALL-E and MidJourney Examples•8 minutes
Sora Examples•5 minutes
A Short History of Image Generation•8 minutes
Revisit VAE•6 minutes
Introducing GAN•8 minutes
Discriminator•7 minutes
Generator•9 minutes
GAN Training•6 minutes
Challenges and Best Practices for GAN Training•6 minutes
Progressive GAN•8 minutes
Conditional GANs•8 minutes
Applications, Advantages and Limitations of cGANs•7 minutes
Image-to-Image Translation•7 minutes
Challenges and Applications of Image-to-Image Translation•5 minutes
Text to Image GAN•9 minutes
Other GAN Variations: CycleGAN, DCGAN, StyleGAN•10 minutes
Creative design•9 minutes
Commercial Use Cases•7 minutes
Data Augmentation•7 minutes
Module 3 Recap•2 minutes
3 readings•Total 140 minutes
StyleGAN•10 minutes
Data synthesis•10 minutes
DCGAN from Scratch (Hands-on Lab)•120 minutes
5 assignments•Total 140 minutes
Module 3 Quiz•80 minutes
Practice Quiz 1•15 minutes
Practice Quiz 2•15 minutes
Practice Quiz 3•15 minutes
Practice Quiz 4•15 minutes
Advanced Image and Video Generation with Generative AI
Module 4•7 hours to complete
In this module, we explore the final stages of what large language models (LLMs) can offer. You’ll learn how and when to use fine-tuning, along with the pros and cons of different approaches. Throughout the course, you will receive relevant assignments that prepare you for the capstone project: building a fully functional chatbot.
Included
21 videos · 1 reading · 4 assignments
21 videos•Total 146 minutes
Overview on Key Models and Architectures•8 minutes
High-Level Overview of Vision Transformer•8 minutes
Encoder-Decoder Design Pattern•9 minutes
Convolutional Encoders•10 minutes
Self Attention•9 minutes
Spatial vs. Channel vs. Temporal Attention•8 minutes
Diffusion Model Architecture High-Level Overview•7 minutes
Forward / Diffusion Process•7 minutes
Reverse Process•7 minutes
Diffusion Model Training•5 minutes
Examples of Diffusion Model•6 minutes
Bias in Training Data•8 minutes
Transparency•9 minutes
Intellectual Property•8 minutes
Data Privacy•7 minutes
Deepfake Intro•9 minutes
Deepfake - Face Swap•5 minutes
Voice Cloning•4 minutes
Video Deepfake•6 minutes
Module 4 Recap•2 minutes
Course Wrap Up•3 minutes
1 reading•Total 120 minutes
ViT vs. Diffusion (Hands-on Lab)•120 minutes
4 assignments•Total 158 minutes
Module 4 Quiz•80 minutes
Practice Quiz 1•30 minutes
Practice Quiz 2•30 minutes
Practice Quiz 3•18 minutes
Earn a career certificate
Add this credential to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.
The Alberta Machine Intelligence Institute (Amii) is home to some of the world’s top talent in machine intelligence. We’re an Alberta-based research institute that pushes the bounds of academic knowledge and guides business understanding of artificial intelligence and machine learning.
Why do learners on Coursera choose us for their careers?
Felipe M.
Learner since 2018
"Being able to take courses at my own pace has been an extraordinary experience. I can learn whenever my schedule allows and depending on my mood."
Jennifer J.
Learner since 2020
"I directly applied the concepts and skills I learned from my courses to an exciting new project at work."
Larry W.
Learner since 2021
"When I need courses on topics that my university doesn't offer, Coursera is one of the best places to go."
Chaitanya A.
"Learning isn't just about getting better at your job: it's so much more than that. Coursera lets me learn without limits."
When will I have access to the lectures and assignments?
To access the course materials and assignments and to earn a Certificate, you will need to purchase the Certificate experience when you enroll in a course. You can try a Free Trial instead, or apply for Financial Aid. The course may offer 'Full Course, No Certificate' instead. This option lets you see all course materials, submit required assessments, and get a final grade. This also means that you will not be able to purchase a Certificate experience.
What will I get if I subscribe to this Specialization?
When you enroll in the course, you get access to all of the courses in the Specialization, and you earn a certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page - from there, you can print your Certificate or add it to your LinkedIn profile.
Is financial aid available?
Yes. In select learning programs, you can apply for financial aid or a scholarship if you can’t afford the enrollment fee. If financial aid or a scholarship is available for your learning program selection, you’ll find a link to apply on the description page.