Modern AI Models for Vision and Multimodal Understanding

This course is part of the Computer Vision Specialization

Instructor: Tom Yeh

3,741 already enrolled

4 modules

Gain insight into a topic and learn the fundamentals.

30 reviews

Advanced level

Recommended experience

Flexible schedule

1 week at 10 hours a week

Learn at your own pace

Build toward a degree

What you'll learn

Apply Nonlinear Support Vector Machines (NSVMs) and Fourier transforms to analyze and process visual data.
Use probabilistic reasoning and implement Recurrent Neural Networks (RNNs) to model temporal sequences and contextual dependencies in visual data.
Explain the principles of transformer architectures and how Vision Transformers (ViT) perform image classification and visual understanding tasks.
Implement CLIP for multimodal learning, and utilize diffusion models to generate high-fidelity images.

Skills you'll gain

Category: Embeddings
Category: Recurrent Neural Networks (RNNs)
Category: Generative Model Architectures
Category: Transfer Learning
Category: Digital Signal Processing
Category: Supervised Learning
Category: Machine Learning Methods

Tools you'll learn

Category: Classification Algorithms
Category: Vision Transformer (ViT)

Details to know

Shareable certificate

Add to your LinkedIn profile

Assessments

19 assignments

Taught in English

Build your subject-matter expertise

This course is part of the Computer Vision Specialization

When you enroll in this course, you'll also be enrolled in this Specialization.

Learn new concepts from industry experts
Gain a foundational understanding of a subject or tool
Develop job-relevant skills with hands-on projects
Earn a shareable career certificate

There are 4 modules in this course

Step into the frontier of artificial intelligence with this advanced course designed to explore the latest models powering visual and multimodal intelligence. From foundational mathematical tools to state-of-the-art architectures, you'll gain the skills to understand and build systems that interpret images, text, and more—just like today’s leading AI models.

You'll begin by discovering how Nonlinear Support Vector Machines (NSVMs) and Fourier transforms lay the groundwork for signal processing and pattern recognition in visual data. You'll then build a strong foundation in probabilistic reasoning and temporal modeling with RNNs, enabling AI systems to understand sequences and context. Next, you'll learn how transformer architectures revolutionize both language and vision tasks. Finally, you'll dive into multimodal learning with CLIP, which connects images and text, and explore diffusion models that generate high-fidelity images through iterative refinement.

This course is ideal for learners who want to go beyond traditional deep learning and explore the models shaping the future of AI. With a blend of theory, code, and real-world applications, you'll be equipped to tackle cutting-edge challenges in computer vision and multimodal AI.

This course can be taken for academic credit as part of CU Boulder’s MS in Data Science or MS in Computer Science degrees offered on the Coursera platform. These fully accredited graduate degrees offer targeted courses, short 8-week sessions, and pay-as-you-go tuition. Admission is based on performance in three preliminary courses, not academic history. CU degrees on Coursera are ideal for recent graduates or working professionals.

Learn more:
MS in Data Science: https://www.coursera.org/degrees/master-of-science-data-science-boulder
MS in Computer Science: https://coursera.org/degrees/ms-computer-science-boulder

Welcome to Modern AI Models for Vision and Multimodal Understanding, the third course in the Computer Vision specialization. In this first module, you’ll explore foundational mathematical tools used in modern AI models for vision and multimodal understanding. You’ll begin with Support Vector Machines (SVMs), learning how linear and radial basis function (RBF) kernels define decision boundaries and how support vectors influence classification. Then, you’ll dive into the Fourier Transform, starting with 1D signals and progressing to 2D applications. You’ll learn how to move between time/spatial and frequency domains using the Discrete Fourier Transform (DFT) and its inverse, and how these transformations reveal patterns and structures in data. By the end of this module, you’ll understand how SVMs and Fourier analysis contribute to feature extraction, signal decomposition, and model interpretability in AI systems.
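
To make the kernel idea above concrete, here is a minimal sketch in plain Python. It is not taken from the course workbook: the value of `gamma`, the sample points, and the support vectors and coefficients passed to `decision` are all hypothetical, chosen only to show how an RBF kernel turns distance into similarity and how support vectors combine into a decision value.

```python
import math

def linear_kernel(x, z):
    """Dot product: the similarity measure behind a linear SVM."""
    return sum(a * b for a, b in zip(x, z))

def rbf_kernel(x, z, gamma=0.5):
    """RBF kernel exp(-gamma * ||x - z||^2). gamma=0.5 is an arbitrary choice."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def decision(x, support_vectors, coeffs, bias, gamma=0.5):
    """f(x) = sum_i coeff_i * K(sv_i, x) + bias, where coeff_i stands for alpha_i * y_i."""
    return sum(c * rbf_kernel(sv, x, gamma)
               for sv, c in zip(support_vectors, coeffs)) + bias

# Hypothetical 2D points for illustration.
p = [1.0, 2.0]
q = [2.0, 0.0]
same = rbf_kernel(p, p)           # identical points: similarity 1.0
near = rbf_kernel(p, [1.0, 2.5])  # close points: similarity just below 1
far = rbf_kernel(p, q)            # distant points: similarity decays toward 0

# Hypothetical "trained" model: one support vector per class.
svs = [[0.0, 0.0], [3.0, 3.0]]
coeffs = [1.0, -1.0]
score_a = decision([0.2, 0.1], svs, coeffs, bias=0.0)  # near the positive support vector
score_b = decision([2.9, 3.0], svs, coeffs, bias=0.0)  # near the negative support vector
```

The sign of the decision value gives the predicted class, so only the support vectors (not the rest of the training set) shape the boundary.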

What's included

14 videos • 8 readings • 5 assignments

14 videos • Total 85 minutes

Meet Your Instructor • 3 minutes
Linear SVM • 11 minutes
Visualize Linear • 8 minutes
Radial Basis Function (RBF) • 6 minutes
RBF Kernel • 4 minutes
Visualize a RBF SVM • 10 minutes
1D DFT • 6 minutes
1D Inverse DFT • 7 minutes
1D Basic Functions • 5 minutes
Frequency and Time • 6 minutes
2D DFT • 7 minutes
2D Inverse DFT • 3 minutes
2D Basic Functions • 5 minutes
Frequency and Spatial • 4 minutes

8 readings • Total 50 minutes

Course Updates and Accessibility Support • 1 minute
Earn Academic Credit for your Work! • 10 minutes
Course Support • 10 minutes
Inside the Course • 5 minutes
Assessment Expectations • 10 minutes
AI Citation and Acknowledgement • 10 minutes
Get the Workbook: SVM • 2 minutes
Get the Workbook: Fourier 1D & 2D • 2 minutes

5 assignments • Total 80 minutes

Support Vector Machine (SVM) • 15 minutes
Fourier 1D • 15 minutes
Fourier 2D • 15 minutes
AI Policy Quiz • 5 minutes
SVM and Fourier • 30 minutes
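
As a companion to the Fourier topics in this module, the following sketch shows a 1D Discrete Fourier Transform and its inverse written directly from their textbook definitions. It is an illustration under stated assumptions, not the course's workbook code: the function names and the 8-sample cosine are hypothetical, and a practical implementation would normally use an FFT library instead of these O(N²) sums.

```python
import cmath
import math

def dft(signal):
    """X[k] = sum_n x[n] * exp(-2*pi*i*k*n/N)."""
    n_samples = len(signal)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * n / n_samples)
            for n, x in enumerate(signal))
        for k in range(n_samples)
    ]

def idft(spectrum):
    """x[n] = (1/N) * sum_k X[k] * exp(+2*pi*i*k*n/N)."""
    n_samples = len(spectrum)
    return [
        sum(coef * cmath.exp(2j * math.pi * k * n / n_samples)
            for k, coef in enumerate(spectrum)) / n_samples
        for n in range(n_samples)
    ]

# One full cycle of a cosine sampled at 8 points (illustrative input).
signal = [math.cos(2 * math.pi * n / 8) for n in range(8)]
spectrum = dft(signal)      # energy concentrates in bins 1 and 7
recovered = idft(spectrum)  # round trip back to the time domain
```

A 2D DFT follows the same pattern: apply the 1D transform along every row, then along every column of the result.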

This module invites you to explore how probability theory and sequential modeling power modern AI systems. You’ll begin by examining how conditional and joint probabilities shape predictions in language and image models, and how the chain rule enables structured generative processes. Then, you’ll transition to recurrent neural networks (RNNs), learning how they handle sequential data through hidden states and feedback loops. You’ll compare RNNs to feedforward models, explore architectures like one-to-many and sequence-to-sequence, and address challenges like vanishing gradients. By the end, you’ll understand how probabilistic reasoning and temporal modeling combine to support tasks ranging from text generation to autoregressive image synthesis.
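
The chain rule mentioned above factors a joint probability into a product of conditionals, P(w1, w2, w3) = P(w1) · P(w2 | w1) · P(w3 | w1, w2). The sketch below is a toy illustration with made-up probability tables. As a simplifying assumption it conditions each word only on the previous one (a first-order model) instead of on the full history.

```python
# Hypothetical probability tables for a toy language model.
p_first = {"the": 0.6, "a": 0.4}  # P(first word)
p_next = {                        # P(next word | previous word)
    "the": {"cat": 0.5, "dog": 0.5},
    "a": {"cat": 0.3, "dog": 0.7},
    "cat": {"sat": 0.8, "ran": 0.2},
    "dog": {"sat": 0.4, "ran": 0.6},
}

def joint_probability(words):
    """Multiply the conditionals left to right, as the chain rule prescribes."""
    prob = p_first[words[0]]
    for prev, cur in zip(words, words[1:]):
        prob *= p_next[prev][cur]
    return prob

# 0.6 * 0.5 * 0.8
p_sentence = joint_probability(["the", "cat", "sat"])
```

Generating one word (or one pixel) at a time from such conditionals is what makes a model autoregressive.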

What's included

15 videos • 2 readings • 5 assignments

15 videos • Total 123 minutes

Probability in Language Models • 10 minutes
Conditional Probabilities • 9 minutes
The Chain Rule of Probabilities • 11 minutes
Calculating Joint Probabilities • 12 minutes
Pixel-Base Image Models • 13 minutes
Autoregressive Image Model • 16 minutes
Attention Mechanisms in Transformer Models • 14 minutes
Batch vs Recurrent • 4 minutes
MLP vs RNN • 12 minutes
Many to One • 4 minutes
One to Many • 2 minutes
One to One • 6 minutes
Sequence to Sequence • 2 minutes
Deep RNN • 5 minutes
Autoregressive RNN • 3 minutes

2 readings • Total 4 minutes

Get the Workbook: Probability • 2 minutes
Get the Workbook: RNN • 2 minutes

5 assignments • Total 90 minutes

Probability Part One • 15 minutes
Probability Part Two • 15 minutes
RNN Part One • 15 minutes
RNN Part Two • 15 minutes
Probability and RNN • 30 minutes
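
To illustrate the hidden state and feedback loop covered in this module's RNN videos, here is a minimal sketch of a vanilla recurrent step, h_t = tanh(W_xh · x_t + W_hh · h_{t-1} + b). The weights, sizes, and function names are hypothetical and hand-picked. Nothing here is trained, and the course workbook may use different notation.

```python
import math

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

def rnn_step(x_t, h_prev, w_xh, w_hh, bias):
    """One recurrent update: h_t = tanh(W_xh x_t + W_hh h_prev + b)."""
    from_input = matvec(w_xh, x_t)
    from_state = matvec(w_hh, h_prev)
    return [math.tanh(a + b + c) for a, b, c in zip(from_input, from_state, bias)]

# Hypothetical weights: 1 input feature, 2 hidden units.
w_xh = [[0.5], [-0.3]]
w_hh = [[0.1, 0.2], [0.0, 0.4]]
bias = [0.0, 0.1]

def run_rnn(sequence):
    """Feed the sequence one element at a time, carrying the hidden state forward."""
    h = [0.0, 0.0]
    states = []
    for x_t in sequence:
        h = rnn_step(x_t, h, w_xh, w_hh, bias)
        states.append(h)
    return states

states = run_rnn([[1.0], [0.0], [-1.0]])
final_state = states[-1]  # a many-to-one model would read out only this state
```

Because each state depends on the previous one, reordering the inputs changes the result, which a feedforward model applied to each element independently cannot capture.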

This module explores how attention-based architectures have reshaped the landscape of deep learning for both language and vision. You’ll begin by unpacking the mechanics of the Transformer, including self-attention, multi-head attention, and the encoder-decoder structure that enables parallel sequence modeling. Then, you’ll transition to Vision Transformers (ViTs), where images are tokenized and processed using the same principles that revolutionized NLP. Along the way, you’ll examine how normalization, positional encoding, and projection layers contribute to model performance. By the end, you’ll understand how Transformers and ViTs unify sequence and spatial reasoning in modern AI systems.
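
The self-attention mechanism described above can be sketched in a few lines. This version follows the standard scaled dot-product formulation, softmax(Q·Kᵀ / √d_k) · V, and is only an illustration: as a simplifying assumption the tokens themselves serve as queries, keys, and values (a real Transformer first applies learned Q, K, and V projections), and the three 2D tokens are made up.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(queries, keys, values):
    """Return (outputs, weights) for scaled dot-product attention."""
    d_k = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in keys]
        w = softmax(scores)
        weights.append(w)
        # Output is the attention-weighted average of the value vectors.
        outputs.append([sum(wi * v[j] for wi, v in zip(w, values))
                        for j in range(len(values[0]))])
    return outputs, weights

# Hypothetical token embeddings; learned projections are omitted for brevity.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens, tokens, tokens)
```

Every row of `weights` sums to 1, and all rows can be computed independently, which is what allows attention to process a whole sequence in parallel.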

What's included

15 videos • 2 readings • 5 assignments

15 videos • Total 81 minutes

Batch vs Recurrent vs Attention • 7 minutes
Attention + MLP • 5 minutes
Dot-Product Self-Attention • 4 minutes
QKV Self-Attention • 4 minutes
Transformer Encoder • 4 minutes
Self vs Cross Attention • 5 minutes
Encoder and Decoder for Transformer • 7 minutes
Decoder Output Layer • 3 minutes
Image to Tokens • 11 minutes
Normalization for ViT • 4 minutes
Self-Attention for ViT • 6 minutes
Multi-Head Attention • 9 minutes
MLP Forward Feed • 4 minutes
ViT Output Layer • 5 minutes
Loss Gradient for ViT • 4 minutes

2 readings • Total 4 minutes

Get the Workbook: Transformer • 2 minutes
Get the Workbook: ViT • 2 minutes

5 assignments • Total 90 minutes

Transformer Part One • 15 minutes
Transformer Part Two • 15 minutes
ViT Part One • 15 minutes
ViT Part Two • 15 minutes
Transformer and ViT • 30 minutes
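
The "image to tokens" step of a Vision Transformer covered in this module can be sketched as follows: the image is cut into non-overlapping square patches, and each patch is flattened into one vector. This is a minimal single-channel illustration with a hypothetical helper name and a made-up 4×4 image. In a full ViT each flattened patch would then pass through a learned projection and receive a positional encoding, which this sketch leaves out.

```python
def image_to_patches(image, patch_size):
    """Split an H x W image (list of rows) into flattened patches, in row-major order."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Flatten one patch_size x patch_size block into a single vector.
            patch = [image[r][c]
                     for r in range(top, top + patch_size)
                     for c in range(left, left + patch_size)]
            patches.append(patch)
    return patches

# Hypothetical 4 x 4 grayscale image whose pixel values are 0..15.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
tokens = image_to_patches(image, patch_size=2)  # 4 tokens, each of length 4
```

The resulting list of vectors plays the same role as a sequence of word embeddings, so the attention machinery can be reused unchanged.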

In this module, you’ll explore two transformative approaches in multimodal and generative AI. First, you’ll dive into CLIP, a model that learns a shared embedding space for images and text using contrastive pre-training on image-text pairs. You’ll see how CLIP enables zero-shot classification by comparing image embeddings to textual descriptions, without needing labeled training data for the target task. Then, you’ll shift to diffusion models, which generate images through a gradual denoising process. You’ll learn how noise prediction, time conditioning, and reverse diffusion combine to produce high-quality samples. This module highlights how foundation models can bridge modalities and synthesize data with remarkable flexibility.
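
Zero-shot classification as described above reduces to a similarity search in the shared embedding space. The sketch below shows only that last step. The embeddings are hand-written stand-ins for what CLIP's image and text encoders would produce, and the `scale` factor and prompt strings are illustrative assumptions, not values from the course or from any released model.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def zero_shot_classify(image_embedding, text_embeddings, scale=10.0):
    """Pick the prompt whose embedding is most similar to the image embedding."""
    prompts = list(text_embeddings)
    sims = [cosine_similarity(image_embedding, text_embeddings[p]) for p in prompts]
    probs = softmax([scale * s for s in sims])  # scale sharpens the distribution
    best = max(range(len(prompts)), key=lambda i: probs[i])
    return prompts[best], dict(zip(prompts, probs))

# Hypothetical embeddings standing in for real encoder outputs.
text_embeddings = {
    "a photo of a cat": [0.9, 0.1, 0.0],
    "a photo of a dog": [0.1, 0.9, 0.0],
    "a photo of a car": [0.0, 0.1, 0.9],
}
image_embedding = [0.8, 0.3, 0.1]
label, probs = zero_shot_classify(image_embedding, text_embeddings)
```

Changing the set of classes only requires writing new prompts, which is why no task-specific retraining is needed.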

What's included

11 videos • 2 readings • 4 assignments

11 videos • Total 75 minutes

Batch of Pairs • 6 minutes
Image Encoder (Batch) • 6 minutes
Text Encoder (Batch) • 10 minutes
Joint Embedding • 5 minutes
Contrastive Pre-Training • 13 minutes
Zero-Shot Image Classifier • 6 minutes
Zero-Shot Image Prediction • 7 minutes
Diffusion Introduction • 5 minutes
Noise Prediction • 6 minutes
Time Conditioning and Parallel Training • 5 minutes
Reverse Diffusion • 6 minutes

2 readings • Total 4 minutes

Get the Workbook: CLIP • 2 minutes
Get the Workbook: Diffusion • 2 minutes

4 assignments • Total 75 minutes

CLIP Part One • 15 minutes
CLIP Part Two • 15 minutes
Diffusion • 15 minutes
CLIP and Diffusion • 30 minutes
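
For the diffusion topics in this module, the following sketch illustrates the relationship that noise prediction relies on. It uses the closed-form noising step common in denoising diffusion models, x_t = √ᾱ_t · x_0 + √(1 − ᾱ_t) · ε, with a linear beta schedule. The schedule endpoints, step count, and function names are assumptions for illustration, and there is no trained network here: the true noise stands in for a perfect prediction.

```python
import math
import random

def make_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for an assumed linear beta schedule."""
    alpha_bars, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def add_noise(x0, alpha_bar, rng):
    """Forward process: jump straight from clean data x0 to the noisy sample x_t."""
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(alpha_bar) * x + math.sqrt(1.0 - alpha_bar) * e
          for x, e in zip(x0, noise)]
    return xt, noise

def recover_x0(xt, predicted_noise, alpha_bar):
    """Invert the forward step, given a prediction of the noise that was added."""
    return [(x - math.sqrt(1.0 - alpha_bar) * e) / math.sqrt(alpha_bar)
            for x, e in zip(xt, predicted_noise)]

rng = random.Random(0)  # fixed seed so the run is reproducible
alpha_bars = make_alpha_bars(1000)
x0 = [0.5, -1.0, 0.25, 0.0]  # hypothetical "image" of four pixel values
xt, noise = add_noise(x0, alpha_bars[500], rng)
x0_estimate = recover_x0(xt, noise, alpha_bars[500])  # exact when the noise is known
```

In training, a network conditioned on the time step learns to predict `noise` from `xt`. Reverse diffusion then applies such predictions step by step, starting from pure noise.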

Earn a career certificate

Add this credential to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.

Build toward a degree

This course is part of the following degree program(s) offered by University of Colorado Boulder. If you are admitted and enroll, your completed coursework may count toward your degree and your progress can transfer with you.

Instructor

Instructor ratings

(7 ratings)

Tom Yeh

University of Colorado Boulder

4 courses • 22,359 learners

Offered by

University of Colorado Boulder

Frequently asked questions

To access course materials, assignments, and earn a Certificate, you'll need to purchase the Certificate experience when you enroll in a course. Eligible learners may also have the option to start with a Free Trial. Some courses may also offer a Full Course, No Certificate option. This lets you access course materials, submit required assessments, and receive a final grade, but you won't be able to earn or purchase a Certificate.

When you enroll in the course, you get access to all of the courses in the Specialization, and you earn a certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page - from there, you can print your Certificate or add it to your LinkedIn profile.

Yes. In select learning programs, you can apply for financial aid or a scholarship if you can’t afford the enrollment fee. If financial aid or a scholarship is available for your learning program selection, you’ll find a link to apply on the description page.