When you enroll in this course, you'll also be enrolled in this Specialization.
Learn new concepts from industry experts
Gain a foundational understanding of a subject or tool
Develop job-relevant skills with hands-on projects
Earn a shareable career certificate
There are 4 modules in this course
Step into the frontier of artificial intelligence with this advanced course designed to explore the latest models powering visual and multimodal intelligence. From foundational mathematical tools to state-of-the-art architectures, you'll gain the skills to understand and build systems that interpret images, text, and more—just like today’s leading AI models.
You'll begin by discovering how Nonlinear Support Vector Machines (NSVMs) and Fourier transforms lay the groundwork for signal processing and pattern recognition in visual data. You'll then build a strong foundation in probabilistic reasoning and temporal modeling with RNNs, enabling AI systems to understand sequences and context. Next, you'll learn how transformer architectures have revolutionized both language and vision tasks. Finally, you'll dive into multimodal learning with CLIP, which connects images and text, and explore diffusion models that generate high-fidelity images through iterative refinement.
This course is ideal for learners who want to go beyond traditional deep learning and explore the models shaping the future of AI. With a blend of theory, code, and real-world applications, you'll be equipped to tackle cutting-edge challenges in computer vision and multimodal AI.
This course can be taken for academic credit as part of CU Boulder’s MS in Data Science or MS in Computer Science degrees offered on the Coursera platform. These fully accredited graduate degrees offer targeted courses, short 8-week sessions, and pay-as-you-go tuition. Admission is based on performance in three preliminary courses, not academic history. CU degrees on Coursera are ideal for recent graduates or working professionals. Learn more:
MS in Data Science: https://www.coursera.org/degrees/master-of-science-data-science-boulder
MS in Computer Science: https://coursera.org/degrees/ms-computer-science-boulder
Welcome to Modern AI Models for Vision and Multimodal Understanding, the third course in the Computer Vision specialization. In this first module, you’ll explore foundational mathematical tools used in modern AI models for vision and multimodal understanding. You’ll begin with Support Vector Machines (SVMs), learning how linear and radial basis function (RBF) kernels define decision boundaries and how support vectors influence classification. Then, you’ll dive into the Fourier Transform, starting with 1D signals and progressing to 2D applications. You’ll learn how to move between time/spatial and frequency domains using the Discrete Fourier Transform (DFT) and its inverse, and how these transformations reveal patterns and structures in data. By the end of this module, you’ll understand how SVMs and Fourier analysis contribute to feature extraction, signal decomposition, and model interpretability in AI systems.
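The move between the time domain and the frequency domain described above can be sketched in a few lines of NumPy. This is a minimal illustration and is not taken from the course workbooks: the helper names `dft` and `idft`, the 64-sample window, and the two test frequencies are assumptions chosen for the example.

```python
import numpy as np

def dft(x):
    """Naive O(N^2) DFT: X[k] = sum_n x[n] * exp(-2j*pi*k*n/N)."""
    x = np.asarray(x, dtype=complex)
    n = np.arange(x.size)      # sample index (row)
    k = n.reshape(-1, 1)       # frequency index (column)
    return np.exp(-2j * np.pi * k * n / x.size) @ x

def idft(X):
    """Inverse DFT: x[n] = (1/N) * sum_k X[k] * exp(+2j*pi*k*n/N)."""
    X = np.asarray(X, dtype=complex)
    k = np.arange(X.size)      # frequency index (row)
    n = k.reshape(-1, 1)       # sample index (column)
    return np.exp(2j * np.pi * n * k / X.size) @ X / X.size

# Assumed test signal: 64 samples holding 3 and 10 full cycles per window.
N = 64
t = np.arange(N) / N
signal = np.sin(2 * np.pi * 3 * t) + 0.5 * np.sin(2 * np.pi * 10 * t)

spectrum = dft(signal)

# Indices of the two strongest non-negative frequency bins.
half = np.abs(spectrum[: N // 2])
peaks = sorted(int(i) for i in np.argsort(half)[-2:])

# Going back to the time domain should recover the original samples.
reconstructed = idft(spectrum).real

print("dominant frequency bins:", peaks)
print("round trip OK:", np.allclose(reconstructed, signal))
```

The spectrum concentrates its energy in the bins that match the two sinusoids, which is the sense in which the transform reveals structure in the data. In practice `np.fft.fft` and `np.fft.ifft` compute the same quantities far more efficiently.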
What's included
14 videos•8 readings•5 assignments
14 videos•Total 85 minutes
Meet Your Instructor•3 minutes
Linear SVM•11 minutes
Visualize Linear•8 minutes
Radial Basis Function (RBF)•6 minutes
RBF Kernel•4 minutes
Visualize a RBF SVM•10 minutes
1D DFT•6 minutes
1D Inverse DFT•7 minutes
1D Basic Functions•5 minutes
Frequency and Time•6 minutes
2D DFT•7 minutes
2D Inverse DFT•3 minutes
2D Basic Functions•5 minutes
Frequency and Spatial•4 minutes
8 readings•Total 50 minutes
Course Updates and Accessibility Support•1 minute
Earn Academic Credit for your Work!•10 minutes
Course Support•10 minutes
Inside the Course•5 minutes
Assessment Expectations•10 minutes
AI Citation and Acknowledgement•10 minutes
Get the Workbook: SVM•2 minutes
Get the Workbook: Fourier 1D & 2D•2 minutes
5 assignments•Total 80 minutes
AI Policy Quiz•5 minutes
SVM and Fourier•30 minutes
Support Vector Machine (SVM)•15 minutes
Fourier 1D•15 minutes
Fourier 2D•15 minutes
Probability and RNN
Module 2•4 hours to complete
Module details
This module invites you to explore how probability theory and sequential modeling power modern AI systems. You’ll begin by examining how conditional and joint probabilities shape predictions in language and image models, and how the chain rule enables structured generative processes. Then, you’ll transition to recurrent neural networks (RNNs), learning how they handle sequential data through hidden states and feedback loops. You’ll compare RNNs to feedforward models, explore architectures like one-to-many and sequence-to-sequence, and address challenges like vanishing gradients. By the end, you’ll understand how probabilistic reasoning and temporal modeling combine to support tasks ranging from text generation to autoregressive image synthesis.
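As a rough sketch of the two ideas in this module, the snippet below computes a joint probability with the chain rule and runs a vanilla RNN over a short sequence. Everything in it is an assumption made for illustration: the toy vocabulary and its probabilities are invented, the chain rule is truncated to a first-order (bigram) model, and the weight shapes and random inputs are arbitrary.

```python
import numpy as np

# Hypothetical toy language model; all probabilities are made up.
start = {"the": 0.6, "a": 0.4}
cond = {
    "the": {"cat": 0.5, "dog": 0.5},
    "a": {"cat": 0.3, "dog": 0.7},
    "cat": {"sat": 1.0},
    "dog": {"sat": 1.0},
}

def joint_probability(tokens):
    """Chain rule under a bigram assumption:
    P(w1..wn) = P(w1) * prod_i P(w_i | w_{i-1})."""
    p = start[tokens[0]]
    for prev, cur in zip(tokens, tokens[1:]):
        p *= cond[prev][cur]
    return p

def rnn_forward(xs, W_xh, W_hh, b_h):
    """Vanilla RNN: h_t = tanh(W_xh @ x_t + W_hh @ h_{t-1} + b_h).
    The hidden state h carries context from one step to the next."""
    h = np.zeros(W_hh.shape[0])
    states = []
    for x in xs:
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
        states.append(h)
    return np.stack(states)

p = joint_probability(["the", "cat", "sat"])  # 0.6 * 0.5 * 1.0

rng = np.random.default_rng(0)
xs = rng.normal(size=(5, 3))           # 5 time steps, 3 features each
W_xh = rng.normal(size=(4, 3)) * 0.1   # input-to-hidden weights
W_hh = rng.normal(size=(4, 4)) * 0.1   # hidden-to-hidden (feedback) weights
b_h = np.zeros(4)
states = rnn_forward(xs, W_xh, W_hh, b_h)

print("P(the cat sat) =", p)
print("hidden states shape:", states.shape)
```

A full autoregressive model conditions each token on the entire prefix instead of only the previous token; the RNN hidden state is one way to summarize that prefix in a fixed-size vector.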
What's included
15 videos•2 readings•5 assignments
15 videos•Total 123 minutes
Probability in Language Models•10 minutes
Conditional Probabilities•9 minutes
The Chain Rule of Probabilities•11 minutes
Calculating Joint Probabilities•12 minutes
Pixel-Base Image Models•13 minutes
Autoregressive Image Model•16 minutes
Attention Mechanisms in Transformer Models•14 minutes
Batch vs Recurrent•4 minutes
MLP vs RNN•12 minutes
Many to One•4 minutes
One to Many•2 minutes
One to One•6 minutes
Sequence to Sequence•2 minutes
Deep RNN•5 minutes
Autoregressive RNN•3 minutes
2 readings•Total 4 minutes
Get the Workbook: Probability•2 minutes
Get the Workbook: RNN•2 minutes
5 assignments•Total 90 minutes
Probability and RNN•30 minutes
Probability Part One•15 minutes
Probability Part Two•15 minutes
RNN Part One•15 minutes
RNN Part Two•15 minutes
Transformer and ViT
Module 3•3 hours to complete
Module details
This module explores how attention-based architectures have reshaped the landscape of deep learning for both language and vision. You’ll begin by unpacking the mechanics of the Transformer, including self-attention, multi-head attention, and the encoder-decoder structure that enables parallel sequence modeling. Then, you’ll transition to Vision Transformers (ViTs), where images are tokenized and processed using the same principles that revolutionized NLP. Along the way, you’ll examine how normalization, positional encoding, and projection layers contribute to model performance. By the end, you’ll understand how Transformers and ViTs unify sequence and spatial reasoning in modern AI systems.
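The two mechanisms named above, image tokenization and self-attention, can be sketched as follows. This is a simplified illustration under stated assumptions: it uses a single attention head with random projection matrices and omits positional encoding, normalization, and the class token; the function names, the 8x8 image, and the patch size are chosen only for the example.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along one axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def image_to_tokens(img, patch):
    """Split an (H, W, C) image into flattened non-overlapping patches,
    giving one row ("token") per patch."""
    H, W, C = img.shape
    grid = img.reshape(H // patch, patch, W // patch, patch, C)
    return grid.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch * C)

def self_attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention over the rows of X."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # token-to-token similarity
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
img = rng.normal(size=(8, 8, 3))              # assumed tiny RGB image
tokens = image_to_tokens(img, patch=4)        # 4 patches, 48 values each

d_model = 16                                  # assumed embedding width
W_embed = rng.normal(size=(tokens.shape[1], d_model)) * 0.1
X = tokens @ W_embed                          # linear patch projection

W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(3))
out, weights = self_attention(X, W_q, W_k, W_v)

print("tokens:", tokens.shape, "attention output:", out.shape)
```

Because every token attends to every other token in one matrix product, the whole sequence is processed in parallel, in contrast to the step-by-step loop of an RNN.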
What's included
15 videos•2 readings•5 assignments
15 videos•Total 81 minutes
Batch vs Recurrent vs Attention•7 minutes
Attention + MLP•5 minutes
Dot-Product Self-Attention•4 minutes
QKV Self-Attention•4 minutes
Transformer Encoder•4 minutes
Self vs Cross Attention•5 minutes
Encoder and Decoder for Transformer•7 minutes
Decoder Output Layer•3 minutes
Image to Tokens•11 minutes
Normalization for ViT•4 minutes
Self-Attention for ViT•6 minutes
Multi-Head Attention•9 minutes
MLP Forward Feed•4 minutes
ViT Output Layer•5 minutes
Loss Gradient for ViT•4 minutes
2 readings•Total 4 minutes
Get the Workbook: Transformer•2 minutes
Get the Workbook: ViT•2 minutes
5 assignments•Total 90 minutes
Transformer and ViT•30 minutes
Transformer Part One•15 minutes
Transformer Part Two•15 minutes
ViT Part One•15 minutes
ViT Part Two•15 minutes
CLIP and Diffusion
Module 4•3 hours to complete
Module details
In this module, you’ll explore two transformative approaches in multimodal and generative AI. First, you’ll dive into CLIP, a model that learns a shared embedding space for images and text using contrastive pre-training. You’ll see how CLIP enables zero-shot classification by comparing image embeddings to textual descriptions, without needing labeled training data. Then, you’ll shift to diffusion models, which generate images through a gradual denoising process. You’ll learn how noise prediction, time conditioning, and reverse diffusion combine to produce high-quality samples. This module highlights how foundational models can bridge modalities and synthesize data with remarkable flexibility.
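The snippet below is a minimal sketch of both halves of this module, with random vectors standing in for real encoder outputs. Several details are assumptions not stated in the module description: the symmetric cross-entropy form of the contrastive loss, the temperature of 0.07, the DDPM-style forward noising formula, and the linear noise schedule. All function names are illustrative.

```python
import numpy as np

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def log_softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric cross-entropy over an N x N cosine-similarity matrix.
    Matching image-text pairs are assumed to sit on the diagonal."""
    logits = l2_normalize(img_emb) @ l2_normalize(txt_emb).T / temperature
    idx = np.arange(logits.shape[0])
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()
    return (loss_i2t + loss_t2i) / 2

def zero_shot_predict(img_emb, class_txt_emb):
    """Pick, for each image, the class description with highest similarity."""
    sims = l2_normalize(img_emb) @ l2_normalize(class_txt_emb).T
    return sims.argmax(axis=-1)

def add_noise(x0, t, alpha_bar, eps):
    """Assumed DDPM-style forward step:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1 - alpha_bar[t]) * eps

rng = np.random.default_rng(0)

# Stand-in embeddings: each "image" is a slightly perturbed copy of its text.
txt = rng.normal(size=(4, 16))
img = txt + 0.01 * rng.normal(size=(4, 16))
loss_aligned = clip_contrastive_loss(img, txt)
loss_shuffled = clip_contrastive_loss(img[::-1], txt)  # mismatched pairs
preds = zero_shot_predict(img, txt)

# Diffusion: noise a sample, then invert the step given the true noise.
betas = np.linspace(1e-4, 0.02, 1000)   # assumed linear schedule
alpha_bar = np.cumprod(1 - betas)
x0 = rng.normal(size=(8, 8))
eps = rng.normal(size=(8, 8))
t = 500
x_t = add_noise(x0, t, alpha_bar, eps)
x0_hat = (x_t - np.sqrt(1 - alpha_bar[t]) * eps) / np.sqrt(alpha_bar[t])

print("aligned loss < shuffled loss:", loss_aligned < loss_shuffled)
print("zero-shot predictions:", preds)
print("x0 recovered from true noise:", np.allclose(x0_hat, x0))
```

In a trained diffusion model the true `eps` is unknown at sampling time, so a network conditioned on `t` predicts it, and reverse diffusion applies that prediction repeatedly from pure noise down to `t = 0`.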
What's included
11 videos•2 readings•4 assignments
11 videos•Total 75 minutes
Batch of Pairs•6 minutes
Image Encoder (Batch)•6 minutes
Text Encoder (Batch)•10 minutes
Joint Embedding•5 minutes
Contrastive Pre-Training•13 minutes
Zero-Shot Image Classifier•6 minutes
Zero-Shot Image Prediction•7 minutes
Diffusion Introduction•5 minutes
Noise Prediction•6 minutes
Time Conditioning and Parallel Training•5 minutes
Reverse Diffusion•6 minutes
2 readings•Total 4 minutes
Get the Workbook: CLIP•2 minutes
Get the Workbook: Diffusion•2 minutes
4 assignments•Total 75 minutes
CLIP and Diffusion•30 minutes
CLIP Part One•15 minutes
CLIP Part Two•15 minutes
Diffusion•15 minutes
Earn a career certificate.
Add this credential to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.
Build toward a degree
This course is part of the following degree program(s) offered by University of Colorado Boulder. If you are admitted and enroll, your completed coursework may count toward your degree and your progress can transfer with you.¹
¹Successful application and enrollment are required. Eligibility requirements apply. Each institution determines the number of credits recognized for completing this content that may count toward degree requirements, taking into account any existing credits. Click on a specific course for more information.
Instructor
Instructor ratings
We asked all learners to give feedback on our instructors based on the quality of their teaching style.
CU Boulder is a dynamic community of scholars and learners on one of the most spectacular college campuses in the country. As one of 34 U.S. public institutions in the prestigious Association of American Universities (AAU), we have a proud tradition of academic excellence, with five Nobel laureates and more than 50 members of prestigious academic academies.
When will I have access to the lectures and assignments?
To access the course materials and assignments and to earn a Certificate, you will need to purchase the Certificate experience when you enroll in a course. You can try a Free Trial instead, or apply for Financial Aid. The course may offer 'Full Course, No Certificate' instead. This option lets you see all course materials, submit required assessments, and get a final grade. This also means that you will not be able to purchase a Certificate experience.
What will I get if I subscribe to this Specialization?
When you enroll in the course, you get access to all of the courses in the Specialization, and you earn a certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page - from there, you can print your Certificate or add it to your LinkedIn profile.
Is financial aid available?
Yes. In select learning programs, you can apply for financial aid or a scholarship if you can’t afford the enrollment fee. If financial aid or a scholarship is available for your learning program selection, you’ll find a link to apply on the description page.