Multimodal and cross-modal AI integrations

This course is part of the Microsoft Generative AI Engineering Professional Certificate

Instructor: Microsoft

4 modules

Gain insight into a topic and learn the fundamentals.

Intermediate level

Recommended experience

2 weeks to complete

at 10 hours a week

Flexible schedule

Learn at your own pace

Skills you'll gain

Computer Vision
AI Integrations
Multimodal Prompts
Prompt Patterns
Image Analysis
LLM Application
Application Design
Natural Language Processing
Artificial Intelligence
Model Optimization

Tools you'll learn

Prompt Engineering
AI Workflows
Generative AI
OpenAI API
AI Orchestration
Microsoft Azure

Details to know

Shareable certificate

Add to your LinkedIn profile

Recently updated!

January 2026

Assessments

24 assignments

Taught in English

91% of learners achieved a positive career outcome

Build your expertise in Software Development

This course is part of the Microsoft Generative AI Engineering Professional Certificate

When you enroll in this course, you are also enrolled in this Professional Certificate.

Learn new concepts from industry experts
Gain a foundational understanding of a subject or tool
Develop job-relevant skills with hands-on projects
Earn a shareable career certificate from Microsoft

There are 4 modules in this course

Learn to build AI that sees, hears, and understands the world in an integrated way. This course takes you beyond single-modality models, teaching you to architect applications that connect different data types like text, images, and speech.

This module introduces the foundational concepts of multimodal AI. You will learn the architectural patterns for combining different AI components, such as text and image models, and progress from basic integration to building complex systems that can reason across multiple data types.

Important Notice on the Azure Interface: The screencast videos and screenshots were last updated in late 2025. Please be aware that Microsoft may have updated the Azure interface since then. If the steps shown in the course materials look different from your current Azure environment, please follow the most up-to-date interface, as the underlying concepts and learning objectives remain the same.
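As a rough illustration of the architectural choice this first module covers (a single multimodal model versus a chained pipeline), the sketch below puts both behind the same `answer` method. It is not taken from the course materials: the class names and the `fake_*` functions are hypothetical stand-ins for real model calls, chosen so the example runs without any cloud service.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ChainedPipeline:
    """A vision model captions the image, then a text model answers from the caption."""
    caption_image: Callable[[bytes], str]
    answer_from_text: Callable[[str, str], str]

    def answer(self, image: bytes, question: str) -> str:
        caption = self.caption_image(image)  # step 1: image -> text
        return self.answer_from_text(caption, question)  # step 2: text -> answer


@dataclass
class SingleModel:
    """One natively multimodal model receives the image and the question together."""
    ask: Callable[[bytes, str], str]

    def answer(self, image: bytes, question: str) -> str:
        return self.ask(image, question)


# Hypothetical stand-ins for real model calls (assumptions for illustration only).
def fake_captioner(image: bytes) -> str:
    return f"an image of {len(image)} bytes"


def fake_text_model(context: str, question: str) -> str:
    return f"Q: {question} | context: {context}"


def fake_multimodal_model(image: bytes, question: str) -> str:
    return f"Q: {question} | pixels seen: {len(image)}"


chained = ChainedPipeline(fake_captioner, fake_text_model)
single = SingleModel(fake_multimodal_model)

for system in (chained, single):
    print(system.answer(b"abc", "What is shown?"))
```

Because both classes expose the same method, calling code can swap one architecture for the other. The trade-off the sketch makes visible is that the chained version only passes the caption forward, so anything the captioner omits is lost to the second step.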

Included

4 videos, 9 readings, 7 assignments

4 videos (total 18 minutes)

Introduction to Microsoft Generative AI engineering certification (4 minutes)
Introduction to multimodal and cross-modal integrations course (3 minutes)
Understanding multimodal AI (5 minutes)
Advanced multimodal applications (5 minutes)

9 readings (total 95 minutes)

Course syllabus and recommended background (5 minutes)
Components of multimodal AI setup (15 minutes)
Visualizing a multimodal workflow (15 minutes)
Architectural choices in multimodal AI: Single model vs. chained pipelines (10 minutes)
Analyzing your first multimodal integration (10 minutes)
Advanced integration strategies and use cases (10 minutes)
Insights on advanced multimodal AI (10 minutes)
Case study: Designing a multimodal product search (10 minutes)
Module 1 summary: From architectural theory to practical integration (10 minutes)

7 assignments (total 195 minutes)

Module 1 evaluation: Graded Quiz (30 minutes)
First steps with a true multimodal model (15 minutes)
Building your first multimodal pipeline (30 minutes)
Multimodal integration: Practice Quiz (30 minutes)
Building a multimodal system (30 minutes)
Architecting a complex multimodal solution (30 minutes)
Advanced multimodal skills: Practice Quiz (30 minutes)

This module provides a deep dive into the popular and creative task of generating images from text descriptions. You will explore the models that power this technology, like DALL·E, and learn both basic and advanced prompting techniques to craft and refine specific, high-quality visual outputs.

Included

5 videos, 5 readings, 5 assignments

5 videos (total 19 minutes)

Module 2 introduction: From words to worlds with text-to-image models (6 minutes)
From text to image in practice (4 minutes)
Text-to-image model comparisons (3 minutes)
Mastering text-to-image control (3 minutes)
Module 2 summary: From architecture to artistic control (3 minutes)

5 readings (total 50 minutes)

Exploration of text-to-image practices (10 minutes)
Insights from text-to-image applications (10 minutes)
Advanced text-to-image techniques (10 minutes)
Advanced text-to-image insights (10 minutes)
Case study: A creative workflow for a marketing campaign (10 minutes)

5 assignments (total 180 minutes)

Module 2 evaluation: Graded Quiz (30 minutes)
Generating and refining images with text-to-image prompts (30 minutes)
Text-to-image skills: Practice Quiz (30 minutes)
Synthesizing advanced text-to-image workflows (60 minutes)
Solving text-to-image challenges: Practice Quiz (30 minutes)

This module focuses on practical implementation using a powerful, specialized tool. You will leverage the features of Azure AI Vision to build and optimize cross-modal applications like image captioning and visual search. You'll learn how this single service can analyze visual content to generate rich textual descriptions and extract embedded text (OCR), providing the core components for sophisticated multimodal solutions.
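To make the captioning, OCR, and visual search ideas in this module description concrete, here is a minimal sketch that turns one image-analysis result into searchable text. It does not call Azure: the dictionary layout (`captionResult`, `readResult` with `blocks` and `lines`) is an assumed shape for illustration and should be checked against the service's current response format, and the helper names are hypothetical.

```python
def extract_text_lines(analysis: dict) -> list[str]:
    """Flatten the OCR part of an analysis result into a list of text lines."""
    blocks = analysis.get("readResult", {}).get("blocks", [])
    return [line["text"] for block in blocks for line in block.get("lines", [])]


def build_search_text(analysis: dict) -> str:
    """Combine the generated caption and the OCR lines into one lowercase string."""
    caption = analysis.get("captionResult", {}).get("text", "")
    return " ".join([caption, *extract_text_lines(analysis)]).strip().lower()


def visual_search(index: dict[str, str], query: str) -> list[str]:
    """Return image names whose search text contains every query term."""
    terms = query.lower().split()
    return sorted(name for name, text in index.items() if all(t in text for t in terms))


# Assumed result shape for one image (hypothetical sample data).
sample_analysis = {
    "captionResult": {"text": "a cardboard box on a shelf", "confidence": 0.8},
    "readResult": {"blocks": [{"lines": [{"text": "SKU 4471"}, {"text": "FRAGILE"}]}]},
}

index = {"shelf.jpg": build_search_text(sample_analysis)}
print(visual_search(index, "fragile box"))
```

The query matches here only because the caption contributes "box" and the OCR text contributes "fragile", which is the sense in which the two features act as complementary components of a visual search. Substring matching is a deliberate simplification; a real system would likely use a proper text or vector index.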

Included

7 videos, 6 readings, 7 assignments

7 videos (total 28 minutes)

An overview of the Azure AI services toolkit (5 minutes)
Module 3 introduction: The multiple applications of Azure AI Vision (3 minutes)
Bringing sight to your applications with Azure AI Vision (5 minutes)
Getting started with Azure AI Vision (4 minutes)
Exploring cross-modal features in Vision Studio (4 minutes)
Refining cross-modal applications (6 minutes)
Module 3 summary: From a single feature to a complete vision solution (2 minutes)

6 readings (total 60 minutes)

Prototyping vs. production: The role of Vision Studio (10 minutes)
Cross-modal AI implementation insights (10 minutes)
Interpreting OCR results with the SDK (10 minutes)
Advanced strategies for cross-modal AI (10 minutes)
Optimizing multimodal workflows (10 minutes)
Case study: Building an automated inventory checker (10 minutes)

7 assignments (total 255 minutes)

Module 3 evaluation: Graded Quiz (30 minutes)
Exploring cross-modal techniques (30 minutes)
Extract text from images (60 minutes)
Cross-modal techniques quiz: Practice Quiz (30 minutes)
Chaining vision skills with the Python SDK (30 minutes)
Extending a multimodal application (45 minutes)
Advanced cross-modal skills: Practice Quiz (30 minutes)

This capstone module builds upon your deep expertise in Azure AI Vision. You will learn to integrate your vision applications with other powerful Azure AI Services, such as Language and Speech, to create comprehensive, end-to-end solutions. The focus will be on orchestrating these distinct services to develop a sophisticated application that solves a real-world business problem, demonstrating your ability to design and build a complete multimodal system from the ground up.
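One way to picture the orchestration of Vision, Language, and Speech described above is a small runner that passes a shared context through named steps. This is only a sketch under stated assumptions: `run_workflow` and the three step functions are hypothetical, and each step is a local stub standing in for a real service call.

```python
from typing import Callable

Step = Callable[[dict], dict]


def run_workflow(steps: list[tuple[str, Step]], context: dict) -> dict:
    """Run steps in order; each reads the context and returns new keys to merge in."""
    trace = []
    for name, step in steps:
        context = {**context, **step(context)}
        trace.append(name)  # record which steps completed, useful when debugging
    return {**context, "trace": trace}


# Hypothetical stubs standing in for the Vision, Language, and Speech services.
def vision_step(ctx: dict) -> dict:
    return {"extracted_text": f"text read from {ctx['image_name']}"}


def language_step(ctx: dict) -> dict:
    return {"summary": ctx["extracted_text"].upper()}


def speech_step(ctx: dict) -> dict:
    return {"audio": ctx["summary"].encode("utf-8")}


result = run_workflow(
    [("vision", vision_step), ("language", language_step), ("speech", speech_step)],
    {"image_name": "receipt.png"},
)
print(result["trace"], result["audio"])
```

Keeping each service behind a function with the same signature means a step can be replaced or reordered without touching the runner, and the trace shows how far a run got if a later step fails.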

Included

6 videos, 5 readings, 5 assignments

6 videos (total 26 minutes)

Module 4 introduction: Building an end-to-end solution (3 minutes)
Orchestrating Azure AI services: A demonstration (6 minutes)
Setting up your environment for integration (6 minutes)
Demonstrating text-to-speech with the SDK (6 minutes)
Module 4 summary: Orchestrating a full AI solution (2 minutes)
Course summary (3 minutes)

5 readings (total 60 minutes)

Integrating Azure AI services (15 minutes)
Managing multimodal workflows (10 minutes)
Adding speech to your application (15 minutes)
Analyzing your end-to-end application (10 minutes)
Production considerations for multimodal apps (10 minutes)

5 assignments (total 210 minutes)

Module 4 evaluation: Graded Quiz (30 minutes)
Integrating Vision with the language service (60 minutes)
Designing multimodal workflows: Practice Quiz (30 minutes)
Building an end-to-end multimodal application (60 minutes)
Analyzing multimodal solutions: Practice Quiz (30 minutes)

Earn a career certificate

Add this credential to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.

Instructor

Microsoft

326 courses, 2,558,011 learners

Offered by

Microsoft

Frequently Asked Questions

To access the course materials, assignments and to earn a Certificate, you will need to purchase the Certificate experience when you enroll in a course. You can try a Free Trial instead, or apply for Financial Aid. The course may offer 'Full Course, No Certificate' instead. This option lets you see all course materials, submit required assessments, and get a final grade. This also means that you will not be able to purchase a Certificate experience.

When you enroll in the course, you get access to all of the courses in the Certificate, and you earn a certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page - from there, you can print your Certificate or add it to your LinkedIn profile.