Multimodal and cross-modal AI integrations

This course is part of the Microsoft Generative AI Engineering Professional Certificate

Instructor: Microsoft

4 modules

Gain insight into a topic and learn the fundamentals.

Intermediate level

2 weeks to complete, at under 10 hours per week

Flexible schedule

Learn at your own pace

Skills you'll gain

Computer Vision
AI Integrations
Multimodal Prompts
Prompt Patterns
Image Analysis
LLM Application
Application Design
Natural Language Processing
Artificial Intelligence
Model Optimization

Tools you'll learn

Prompt Engineering
AI Workflows
Generative AI
OpenAI API
AI Orchestration
Microsoft Azure

Details to know

Shareable certificate: add it to your LinkedIn profile

Recently updated: January 2026

Assessments: 24 assignments

Taught in English

91% of learners achieved a positive career outcome

Build your expertise in Software Development

This course is part of the Microsoft Generative AI Engineering Professional Certificate. When you enroll in this course, you are also enrolled in the Professional Certificate.

Learn new concepts from industry experts
Gain a foundational understanding of a subject or tool
Develop job-relevant skills with hands-on projects
Earn a shareable career certificate from Microsoft

There are 4 modules in this course

Learn to build AI that sees, hears, and understands the world in an integrated way. This course takes you beyond single-modality models, teaching you to architect applications that connect different data types like text, images, and speech.

Module 1

This module introduces the foundational concepts of multimodal AI. You will learn the architectural patterns for combining different AI components, such as text and image models, and progress from basic integration to building complex systems that can reason across multiple data types.

Important Notice on the Azure Interface: The screencast videos and screenshots were last updated in late 2025. Please be aware that Microsoft may have updated the Azure interface since then. If the steps shown in the course materials look different from your current Azure environment, please follow the most up-to-date interface, as the underlying concepts and learning objectives remain the same.

What's included

4 videos, 9 readings, 7 assignments

4 videos (18 minutes total)

Introduction to Microsoft Generative AI engineering certification (4 minutes)
Introduction to multimodal and cross-modal integrations course (3 minutes)
Understanding multimodal AI (5 minutes)
Advanced multimodal applications (5 minutes)

9 readings (95 minutes total)

Course syllabus and recommended background (5 minutes)
Components of multimodal AI setup (15 minutes)
Visualizing a multimodal workflow (15 minutes)
Architectural choices in multimodal AI: Single model vs. chained pipelines (10 minutes)
Analyzing your first multimodal integration (10 minutes)
Advanced integration strategies and use cases (10 minutes)
Insights on advanced multimodal AI (10 minutes)
Case study: Designing a multimodal product search (10 minutes)
Module 1 summary: From architectural theory to practical integration (10 minutes)

7 assignments (195 minutes total)

First steps with a true multimodal model (15 minutes)
Building your first multimodal pipeline (30 minutes)
Multimodal integration: Practice Quiz (30 minutes)
Building a multimodal system (30 minutes)
Architecting a complex multimodal solution (30 minutes)
Advanced multimodal skills: Practice Quiz (30 minutes)
Module 1 evaluation: Graded Quiz (30 minutes)

Module 2

This module provides a deep dive into the popular and creative task of generating images from text descriptions. You will explore the models that power this technology, like DALL·E, and learn both basic and advanced prompting techniques to craft and refine specific, high-quality visual outputs.

What's included

5 videos, 5 readings, 5 assignments

5 videos (19 minutes total)

Module 2 introduction: From words to worlds with text-to-image models (6 minutes)
From text to image in practice (4 minutes)
Text-to-image model comparisons (3 minutes)
Mastering text-to-image control (3 minutes)
Module 2 summary: From architecture to artistic control (3 minutes)

5 readings (50 minutes total)

Exploration of text-to-image practices (10 minutes)
Insights from text-to-image applications (10 minutes)
Advanced text-to-image techniques (10 minutes)
Advanced text-to-image insights (10 minutes)
Case study: A creative workflow for a marketing campaign (10 minutes)

5 assignments (180 minutes total)

Generating and refining images with text-to-image prompts (30 minutes)
Text-to-image skills: Practice Quiz (30 minutes)
Synthesizing advanced text-to-image workflows (60 minutes)
Solving text-to-image challenges: Practice Quiz (30 minutes)
Module 2 evaluation: Graded Quiz (30 minutes)

Module 3

This module focuses on practical implementation using a powerful, specialized tool. You will use Azure AI Vision to build and optimize cross-modal applications like image captioning and visual search. You'll learn how this single service can analyze visual content to generate rich textual descriptions and extract embedded text (OCR), providing the core components for sophisticated multimodal solutions.

What's included

7 videos, 6 readings, 7 assignments

7 videos (28 minutes total)

An overview of the Azure AI services toolkit (5 minutes)
Module 3 introduction: The multiple applications of Azure AI Vision (3 minutes)
Bringing sight to your applications with Azure AI Vision (5 minutes)
Getting started with Azure AI Vision (4 minutes)
Exploring cross-modal features in Vision Studio (4 minutes)
Refining cross-modal applications (6 minutes)
Module 3 summary: From a single feature to a complete vision solution (2 minutes)

6 readings (60 minutes total)

Prototyping vs. production: The role of Vision Studio (10 minutes)
Cross-modal AI implementation insights (10 minutes)
Interpreting OCR results with the SDK (10 minutes)
Advanced strategies for cross-modal AI (10 minutes)
Optimizing multimodal workflows (10 minutes)
Case study: Building an automated inventory checker (10 minutes)

7 assignments (255 minutes total)

Exploring cross-modal techniques (30 minutes)
Extract text from images (60 minutes)
Cross-modal techniques quiz: Practice Quiz (30 minutes)
Chaining vision skills with the Python SDK (30 minutes)
Extending a multimodal application (45 minutes)
Advanced cross-modal skills: Practice Quiz (30 minutes)
Module 3 evaluation: Graded Quiz (30 minutes)

Module 4

This capstone module builds on your expertise in Azure AI Vision. You will learn to integrate your vision applications with other Azure AI services, such as Language and Speech, to create comprehensive, end-to-end solutions. The focus is on orchestrating these distinct services into an application that solves a real-world business problem, demonstrating your ability to design and build a complete multimodal system from the ground up.

What's included

6 videos, 5 readings, 5 assignments

6 videos (26 minutes total)

Module 4 introduction: Building an end-to-end solution (3 minutes)
Orchestrating Azure AI services: A demonstration (6 minutes)
Setting up your environment for integration (6 minutes)
Demonstrating text-to-speech with the SDK (6 minutes)
Module 4 summary: Orchestrating a full AI solution (2 minutes)
Course summary (3 minutes)

5 readings (60 minutes total)

Integrating Azure AI services (15 minutes)
Managing multimodal workflows (10 minutes)
Adding speech to your application (15 minutes)
Analyzing your end-to-end application (10 minutes)
Production considerations for multimodal apps (10 minutes)

5 assignments (210 minutes total)

Integrating Vision with the language service (60 minutes)
Designing multimodal workflows: Practice Quiz (30 minutes)
Building an end-to-end multimodal application (60 minutes)
Analyzing multimodal solutions: Practice Quiz (30 minutes)
Module 4 evaluation: Graded Quiz (30 minutes)

Earn a career certificate

Add this credential to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.

Instructor

Microsoft
326 courses, 2,559,837 learners

Offered by Microsoft

Explore more from Software Development

Pragmatic AI Labs: Multi-modal AI (course)
IBM: Build Multimodal Generative AI Applications (course)
Coursera: End-to-End Multimodal AI: Fine-Tuning, Fusion, and MLOps (course)
Coursera: Architect Multimodal AI Solutions End-to-End (course)

Frequently asked questions

To access the course materials and assignments and to earn a Certificate, you will need to purchase the Certificate experience when you enroll in a course. You can try a Free Trial instead, or apply for Financial Aid. The course may offer 'Full Course, No Certificate' instead. This option lets you see all course materials, submit required assessments, and get a final grade. This also means that you will not be able to purchase a Certificate experience.

When you enroll in the course, you get access to all of the courses in the Certificate, and you earn a certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page - from there, you can print your Certificate or add it to your LinkedIn profile.