Learn how to build multimodal search and RAG systems. RAG systems enhance an LLM by incorporating proprietary data into the prompt context. RAG applications typically use text documents, but what if the desired context includes multimedia such as images, audio, and video? This course covers the technical aspects of implementing RAG with multimodal data.



Building Multimodal Search and RAG

Instructor: Sebastian Witalec
5,399 already enrolled
(35 reviews)
What you'll learn
Learn how contrastive learning produces modality-independent embeddings that enable seamless any-to-any retrieval.
Build multimodal RAG systems that retrieve multimodal context and reason over it to generate more relevant answers.
Implement industry applications of multimodal search and build multi-vector recommender systems.
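To make "any-to-any retrieval" concrete, here is a minimal sketch in Python. It assumes a contrastively trained model has already mapped every item into one shared vector space. The three-dimensional vectors, the file names, and the `search` helper are hypothetical placeholders written by hand for illustration; they are not taken from the course materials or from any real model.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Hypothetical embeddings: in a real system these would come from a
# multimodal encoder, not from hand-written numbers.
INDEX = [
    ("text", "a dog playing fetch", [0.9, 0.1, 0.0]),
    ("image", "dog.jpg", [0.8, 0.2, 0.1]),
    ("audio", "barking.wav", [0.7, 0.3, 0.0]),
    ("text", "a jet taking off", [0.0, 0.1, 0.9]),
    ("video", "airplane.mp4", [0.1, 0.0, 0.8]),
]

def search(query_vec, k=3):
    """Return the k nearest items, regardless of their modality."""
    ranked = sorted(INDEX, key=lambda item: cosine(query_vec, item[2]), reverse=True)
    return [(modality, name) for modality, name, _ in ranked[:k]]

# The query vector could be the embedding of a photo, a sentence, or a sound clip.
results = search([0.85, 0.15, 0.05])
for modality, name in results:
    print(modality, name)
```

Because all modalities share one space, a single similarity function ranks text, images, audio, and video together. In this toy index the dog-related items from three different modalities rank above the airplane-related ones.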

Learn, practice, and apply job-ready skills in less than 2 hours
- Receive training from industry experts
- Gain hands-on experience solving real-world job tasks

How you'll learn
Hands-on, project-based learning
Practice new skills by completing job-related tasks with step-by-step instructions.
No downloads or installation required
Access the tools and resources you need in a cloud environment.
Available only on desktop
This project is designed for laptops or desktop computers with a reliable Internet connection, not mobile devices.