When you enroll in this course, you will also be enrolled in this Specialization.
Learn new concepts from industry experts
Gain a foundational understanding of a subject or tool
Develop job-relevant skills with hands-on projects
Earn a shareable career certificate
There are 4 modules in this course
Generative AI for Audio and Images: Models and Applications offers an in-depth exploration of how modern generative models such as Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), Transformers, and Diffusion models are used to create, manipulate, and enhance audio, image, and video content.
Learners examine the architectures, training processes, and use cases of these models across different modalities, gaining both conceptual understanding and practical insights through hands-on activities. The course also highlights the ethical and societal implications of generative AI, including bias, transparency, intellectual property, and the challenges of deepfake technologies.
By covering foundational theory as well as state-of-the-art approaches and applications, this course prepares learners to apply and develop generative AI creatively and responsibly for the audio and image modalities.
By the end of this course, learners will be able to:
Outline core concepts, challenges, and the history of AI-generated audio.
Analyze important foundational audio generation models, such as variational and vector quantized autoencoders (VAE and VQ-VAE).
Examine how these models integrate with the latest GenAI technologies to form hybrid, state-of-the-art transformer and diffusion-based audio generation systems.
Study the architecture and functionality of Generative Adversarial Networks (GANs) and their variations.
Implement and train GAN models for creating and enhancing visual content.
Explore cutting-edge techniques such as diffusion models and transformers for image and video creation.
Discuss the ethical considerations regarding generative AI for audio and images.
This module introduces the foundations and core concepts of AI-generated audio. Learners explore why audio generation is uniquely challenging, including the difficulties of representing and evaluating audio. They learn how audio is represented and processed, compare waveform and symbolic formats, and survey common audio data formats and Python libraries for working with audio. The module also examines methods for evaluating generated audio and provides a framework for categorizing audio generation approaches by their functionality and human–AI collaboration level. It concludes with a historical overview of AI-generated audio, tracing its evolution from early rule-based methods to modern deep generative models.
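To make the waveform versus symbolic distinction concrete, here is a minimal sketch in plain Python. It is not taken from the course materials: the sample rate, the `(note, duration)` event format, and the helper names `midi_to_hz` and `render` are assumptions chosen for illustration. It renders a short symbolic melody into raw amplitude samples and compares the size of the two representations.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed value for illustration)

def midi_to_hz(note: int) -> float:
    """Convert a MIDI note number to a frequency in Hz (A4 = note 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12)

def render(notes, sample_rate=SAMPLE_RATE):
    """Render symbolic (note, duration_in_seconds) events as a sine waveform."""
    samples = []
    for note, duration in notes:
        freq = midi_to_hz(note)
        n = int(duration * sample_rate)
        samples.extend(
            math.sin(2 * math.pi * freq * i / sample_rate) for i in range(n)
        )
    return samples

# Symbolic representation: three note events (C4, E4, G4), half a second each.
melody = [(60, 0.5), (64, 0.5), (67, 0.5)]

# Waveform representation: the same melody as a list of amplitude samples.
waveform = render(melody)

print(len(melody), "symbolic events vs", len(waveform), "waveform samples")
```

The symbolic form is compact and easy to edit but carries no timbre, while the waveform holds every sample and is therefore much longer, which is one reason waveform-level generation is harder to model.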
What's included
21 videos, 3 readings, 4 assignments, 2 discussion prompts
21 videos•Total 135 minutes
Course Introduction•6 minutes
Meet your instructor: Anahita Doosti•1 minute
Meet your instructor: Nasimeh Asgarian•1 minute
Overview of AI for Audio and Music Generation•7 minutes
Why Is Audio Generation Difficult?•9 minutes
Data Representation: Waveform vs. Symbolic•8 minutes
Early ML Approaches: HMMs, FF Neural Networks•7 minutes
Modern Approaches 1: RNNs and CNNs•10 minutes
Modern Approaches 2: Autoencoders/VAEs and GANs•6 minutes
Modern Approaches 3: Transformers and Diffusion•9 minutes
Module 1 Recap•2 minutes
3 readings•Total 140 minutes
Terminology•10 minutes
Python Libraries for Audio Data•10 minutes
WaveNet Implementation (Hands-on Lab)•120 minutes
4 assignments•Total 145 minutes
Module 1 Quiz•80 minutes
Practice Quiz 1•30 minutes
Practice Quiz 2•20 minutes
Practice Quiz 3•15 minutes
2 discussion prompts•Total 20 minutes
Learning Goal•10 minutes
Is AI even capable of achieving true creativity?•10 minutes
Advanced audio generation with Generative AI
Module 2•8 hours to complete
Module details
Building on the fundamentals, this module dives into advanced models for audio generation. Learners study Variational Autoencoders (VAEs) and their variants, and how they apply to melody generation and speech synthesis. The module also explores transformer-based models, such as Music Transformer, AudioLM, and FastSpeech, as well as diffusion-based models like DiffWave and Stable Audio. Through these lessons, learners gain a comprehensive understanding of how modern generative architectures produce realistic, high-quality audio and music.
What's included
31 videos, 2 readings, 4 assignments
31 videos•Total 202 minutes
Introduction to Variational Autoencoders•4 minutes
How to Condition VAEs with Additional Musical Information Such as Chord, Scale?•7 minutes
Example: MusicVAE•8 minutes
Attribute Vector Arithmetic for Melodies•8 minutes
Example: Jukebox•6 minutes
Example: Speech Synthesis•8 minutes
Strengths and Limitations of VAE-Based Approaches•5 minutes
Transformer Primer•6 minutes
Transformers for Audio Generation•6 minutes
Example: Music Transformer•13 minutes
Revisiting Jukebox: How Transformers Can Generate Waveform Audio! (Part 1)•9 minutes
Revisiting Jukebox: How Transformers Can Generate Waveform Audio! (Part 2)•4 minutes
A New Paradigm: Audio Codec + Language Model (Part 1)•6 minutes
A New Paradigm: Audio Codec + Language Model (Part 2)•8 minutes
Example: FastSpeech•8 minutes
Strengths and Limitations of Transformer-Based Approaches•5 minutes
What Are Diffusion Models, and How Can They Generate Audio?•5 minutes
Example: Stable Audio•6 minutes
Example: DiffWave•5 minutes
Strengths and Limitations of Diffusion-Based Approaches•5 minutes
How Do the Recent Models Compare to Each Other?•9 minutes
What Is on the Horizon? Where Are We Headed?•7 minutes
Module 2 Recap•3 minutes
2 readings•Total 130 minutes
Resource Guide•10 minutes
Audio Generation Models Inference and Comparison (Hands-on Lab)•120 minutes
4 assignments•Total 125 minutes
Module 2 Quiz•80 minutes
Practice Quiz•15 minutes
Practice Quiz•15 minutes
Practice Quiz•15 minutes
Introduction to Generative Image Models
Module 3•7 hours to complete
Module details
This module transitions from audio to image generation, introducing the principles and evolution of image and video synthesis. Learners examine key architectures like GANs and VAEs, explore how adversarial training works, and study variations such as Conditional and Progressive GANs, Pix2Pix, and CycleGAN. The module also connects theory to practice by showcasing creative and commercial applications, from art and design to data augmentation, and demonstrates how generative models enhance realism and variety in visual outputs.
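As a rough illustration of the adversarial training idea mentioned above, the sketch below computes the standard GAN losses from discriminator outputs. It is not the course's lab code: it assumes the common non-saturating generator loss, and the function names `discriminator_loss` and `generator_loss` are hypothetical. Here `d_real` and `d_fake` stand for the discriminator's estimated probability that a real or a generated image is real.

```python
import math

def discriminator_loss(d_real: float, d_fake: float) -> float:
    """Discriminator objective: score real images near 1 and fakes near 0."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def generator_loss(d_fake: float) -> float:
    """Non-saturating generator objective: make the discriminator score fakes near 1."""
    return -math.log(d_fake)

# A confident discriminator: low discriminator loss, high generator loss.
print(discriminator_loss(d_real=0.9, d_fake=0.1), generator_loss(d_fake=0.1))

# A discriminator that cannot tell real from fake (both scores at 0.5).
print(discriminator_loss(d_real=0.5, d_fake=0.5), generator_loss(d_fake=0.5))
```

The two losses pull in opposite directions: as the generator improves and `d_fake` rises, its own loss falls while the discriminator's loss grows, which is the tension that adversarial training alternates between.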
What's included
22 videos, 3 readings, 5 assignments
22 videos•Total 156 minutes
Overview of AI for Image and Video Generation•8 minutes
Applications of Image and Video Generation•8 minutes
DALL-E and MidJourney Examples•8 minutes
Sora Examples•5 minutes
A Short History of Image Generation•8 minutes
Revisit VAE•6 minutes
Introducing GAN•8 minutes
Discriminator•7 minutes
Generator•9 minutes
GAN Training•6 minutes
Challenges and Best Practices for GAN Training•6 minutes
Progressive GAN•8 minutes
Conditional GANs•8 minutes
Applications, Advantages and Limitations of cGANs•7 minutes
Image-to-Image Translation•7 minutes
Challenges and Applications of Image-to-Image Translation•5 minutes
Text to Image GAN•9 minutes
Other GAN Variations: CycleGAN, DCGAN, StyleGAN•10 minutes
Creative Design•9 minutes
Commercial Use Cases•7 minutes
Data Augmentation•7 minutes
Module 3 Recap•2 minutes
3 readings•Total 140 minutes
StyleGAN•10 minutes
Data Synthesis•10 minutes
DCGAN from Scratch (Hands-on Lab)•120 minutes
5 assignments•Total 140 minutes
Module 3 Quiz•80 minutes
Practice Quiz 1•15 minutes
Practice Quiz 2•15 minutes
Practice Quiz 3•15 minutes
Practice Quiz 4•15 minutes
Advanced Image and Video Generation with Generative AI
Module 4•7 hours to complete
Module details
This module explores advanced techniques for image and video generation. Learners study Vision Transformers, encoder-decoder designs, attention mechanisms, and diffusion models, including the forward and reverse processes and how diffusion models are trained. The module also examines ethical considerations such as bias in training data, transparency, intellectual property, data privacy, and deepfakes.
What's included
21 videos, 1 reading, 4 assignments
21 videos•Total 146 minutes
Overview of Key Models and Architectures•8 minutes
High-Level Overview of Vision Transformer•8 minutes
Encoder-Decoder Design Pattern•9 minutes
Convolutional Encoders•10 minutes
Self Attention•9 minutes
Spatial vs. Channel vs. Temporal Attention•8 minutes
Diffusion Model Architecture High-Level Overview•7 minutes
Forward / Diffusion Process•7 minutes
Reverse Process•7 minutes
Diffusion Model Training•5 minutes
Examples of Diffusion Model•6 minutes
Bias in Training Data•8 minutes
Transparency•9 minutes
Intellectual Property•8 minutes
Data Privacy•7 minutes
Deepfake Intro•9 minutes
Deepfake - Face Swap•5 minutes
Voice Cloning•4 minutes
Video Deepfake•6 minutes
Module 4 Recap•2 minutes
Course Wrap Up•3 minutes
1 reading•Total 120 minutes
ViT vs. Diffusion (Hands-on Lab)•120 minutes
4 assignments•Total 158 minutes
Module 4 Quiz•80 minutes
Practice Quiz 1•30 minutes
Practice Quiz 2•30 minutes
Practice Quiz 3•18 minutes
Earn a career certificate.
Add this credential to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.
The Alberta Machine Intelligence Institute (Amii) is home to some of the world’s top talent in machine intelligence. We’re an Alberta-based research institute that pushes the bounds of academic knowledge and guides business understanding of artificial intelligence and machine learning.
When will I have access to the lectures and assignments?
To access the course materials, assignments and to earn a Certificate, you will need to purchase the Certificate experience when you enroll in a course. You can try a Free Trial instead, or apply for Financial Aid. The course may offer 'Full Course, No Certificate' instead. This option lets you see all course materials, submit required assessments, and get a final grade. This also means that you will not be able to purchase a Certificate experience.
What will I get if I subscribe to this Specialization?
When you enroll in the course, you get access to all of the courses in the Specialization, and you earn a certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page - from there, you can print your Certificate or add it to your LinkedIn profile.
Is financial aid available?
Yes. In select learning programs, you can apply for financial aid or a scholarship if you can’t afford the enrollment fee. If financial aid or a scholarship is available for your learning program selection, you’ll find a link to apply on the description page.