AI Orchestration: From local models to cloud

This course is part of the AI Tooling Specialization

Instructor: Alfredo Deza

4 modules

Gain insight into a topic and learn the fundamentals.

Beginner level

Recommended experience

5 hours to complete

Flexible schedule

Learn at your own pace

What you'll learn

Build a prompt engineering pyramid from basic prompts to chain-of-thought reasoning in Rust, and evaluate decision factors for local vs cloud
Set up local AI infrastructure with Ollama, llamafile, aprender and Rust Candle GPU compilation, plus caching and RAG optimization strategies
Configure a production AI workstation with tmux, nvidia-smi, and Zenith, and integrate cloud workflows with AWS Spot, Hugging Face, and GitHub AI

Details to know

Shareable certificate

Add to your LinkedIn profile

Build your subject-matter expertise

This course is part of the AI Tooling Specialization

When you enroll in this course, you'll also be enrolled in this Specialization.

Learn new concepts from industry experts
Gain a foundational understanding of a subject or tool
Develop job-relevant skills with hands-on projects
Earn a shareable career certificate

There are 4 modules in this course

Learn to orchestrate AI systems across local and cloud environments through hands-on infrastructure setup, model deployment, and workflow integration. You will build a prompt engineering pyramid from basic prompts to chain-of-thought reasoning implemented in Rust, then evaluate six decision factors for choosing between local and cloud models, including latency, throughput, cost, and privacy.

The course covers local AI infrastructure in depth: running Ollama with custom Modelfiles for task-specific assistants, deploying llamafile for zero-dependency portable inference, compiling Rust Candle with CUDA for GPU-accelerated local inference, and optimizing local RAG with caching strategies. You will configure a complete AI workstation with tmux for session management, nvidia-smi and Zenith for GPU monitoring, and NVIDIA GPU optimization.

The final module covers cloud workflows, including AWS Spot instances for cost-effective GPU compute, Hugging Face model discovery and download, and GitHub AI models integration. By completing this course, you will be able to set up local AI infrastructure, deploy models across local and cloud environments, and design orchestration workflows that balance cost, privacy, and performance.

A comprehensive course covering prompt engineering with chain-of-thought reasoning, local inference runtimes (Ollama, llamafile, Candle), GPU workstation configuration, and cost-optimized cloud deployment with AWS Spot instances.
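As a rough illustration of what a chain-of-thought prompt built in Rust might look like, the sketch below assembles a task, a list of numbered reasoning steps, and a final-answer instruction into one string. The function name `build_cot_prompt` and the prompt wording are assumptions made for this example; they are not taken from the course materials.

```rust
/// Builds a chain-of-thought prompt: the task, numbered reasoning steps,
/// then an instruction to put the final answer on its own line.
/// Hypothetical helper, not part of any library or of the course code.
fn build_cot_prompt(task: &str, steps: &[&str]) -> String {
    let mut prompt = format!("Task: {task}\n\nThink step by step:\n");
    for (i, step) in steps.iter().enumerate() {
        // Number the steps starting at 1 so the model sees an ordered plan.
        prompt.push_str(&format!("{}. {}\n", i + 1, step));
    }
    prompt.push_str("\nFinish with a line starting with 'Answer:'.");
    prompt
}

fn main() {
    let prompt = build_cot_prompt(
        "Explain why this Rust function fails to compile.",
        &[
            "Restate the problem.",
            "Identify the conflicting borrow.",
            "Propose a fix.",
        ],
    );
    // The resulting string can be sent to any local or cloud model.
    println!("{prompt}");
}
```

Keeping the prompt assembly in a plain function makes it easy to reuse the same reasoning scaffold across different runtimes.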

What's included

7 videos, 2 readings, 1 assignment

7 videos (total 29 minutes)

Course intro (2 minutes)
Course overview (2 minutes)
AI orchestration overview (8 minutes)
Prompt engineering pyramid (3 minutes)
Chain of thought prompt Rust (4 minutes)
Chain of thought Rust prompt demo (6 minutes)
Explaining chain of thought Rust prompt (3 minutes)

2 readings (total 2 minutes)

Key Terms: Course (1 minute)
Key Terms: Prompt Engineering Pyramid (1 minute)

1 assignment (total 5 minutes)

Orchestration Fundamentals (5 minutes)

Covers local vs cloud model tradeoffs, caching strategies, local RAG optimization, Ollama with custom Modelfiles, llamafile portable deployment, and Candle GPU-accelerated Rust inference.
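One simple caching strategy for local inference is to memoize responses by model and prompt, so a repeated request never reaches the model. The sketch below is a minimal in-memory version; `ResponseCache`, its fields, and the stand-in model closure are all hypothetical names for illustration, and a real setup would likely add eviction and persistence.

```rust
use std::collections::HashMap;

/// In-memory response cache keyed by (model, normalized prompt).
/// Hypothetical sketch: no eviction, no persistence.
struct ResponseCache {
    entries: HashMap<(String, String), String>,
    hits: u64,
    misses: u64,
}

impl ResponseCache {
    fn new() -> Self {
        Self { entries: HashMap::new(), hits: 0, misses: 0 }
    }

    /// Returns the cached response if present; otherwise calls `generate`
    /// with the trimmed prompt and stores the result.
    fn get_or_generate<F>(&mut self, model: &str, prompt: &str, generate: F) -> String
    where
        F: FnOnce(&str) -> String,
    {
        // Trim whitespace so trivially different prompts share one entry.
        let normalized = prompt.trim();
        let key = (model.to_string(), normalized.to_string());
        if let Some(cached) = self.entries.get(&key) {
            self.hits += 1;
            return cached.clone();
        }
        self.misses += 1;
        let response = generate(normalized);
        self.entries.insert(key, response.clone());
        response
    }
}

fn main() {
    let mut cache = ResponseCache::new();
    // Stand-in for a real model call (assumption for this example).
    let fake_model = |p: &str| format!("echo: {p}");
    let first = cache.get_or_generate("local-model", "What is a borrow?", fake_model);
    let second = cache.get_or_generate("local-model", "What is a borrow? ", fake_model);
    println!("{first} / {second} (hits={}, misses={})", cache.hits, cache.misses);
}
```

Exact-match caching only helps with repeated prompts; it does not cover paraphrased requests.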

What's included

9 videos, 3 readings, 1 assignment

9 videos (total 41 minutes)

Ollama local demo (7 minutes)
Ollama Modelfile Rust debugger (7 minutes)
Ollama arch (2 minutes)
Local vs. cloud models (5 minutes)
Caching for AI (4 minutes)
Optimizing local RAG (5 minutes)
Llamafile getting started Gemma (4 minutes)
Llamafile simple (3 minutes)
Compiling Rust candle GPU (5 minutes)

3 readings (total 30 minutes)

Key Terms: Ollama Local Demo (10 minutes)
Key Terms: Local vs. Cloud Models (10 minutes)
Key Terms: Llamafile: Getting Started with Gemma (10 minutes)

1 assignment (total 5 minutes)

Quiz: Local AI Infrastructure (5 minutes)

Covers tmux session management, nvidia-smi and Zenith GPU monitoring, local workstation orchestration, AWS Spot instance deployment, Hugging Face and GitHub AI model workflows, and Rust project structure.
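For GPU monitoring, `nvidia-smi --query-gpu=utilization.gpu,memory.used,memory.total --format=csv,noheader,nounits` prints one comma-separated line per GPU, which is straightforward to parse from Rust. The sketch below parses a single such line; the `GpuSample` struct and `parse_gpu_line` function are hypothetical names, and the sample line in `main` is made up rather than captured from real hardware.

```rust
/// One reading from nvidia-smi's CSV query output.
/// Hypothetical type for this example.
#[derive(Debug, PartialEq)]
struct GpuSample {
    utilization_pct: u32,
    memory_used_mib: u64,
    memory_total_mib: u64,
}

/// Parses a line such as "37, 2048, 24576".
/// Returns None if the field count is wrong or a field is not a number
/// (for example "[N/A]" on GPUs that do not report a value).
fn parse_gpu_line(line: &str) -> Option<GpuSample> {
    let fields: Vec<&str> = line.split(',').map(str::trim).collect();
    if fields.len() != 3 {
        return None;
    }
    Some(GpuSample {
        utilization_pct: fields[0].parse().ok()?,
        memory_used_mib: fields[1].parse().ok()?,
        memory_total_mib: fields[2].parse().ok()?,
    })
}

fn main() {
    // Made-up sample line, not output from a real GPU.
    let sample = "37, 2048, 24576";
    match parse_gpu_line(sample) {
        Some(s) => println!(
            "{}% utilization, {}/{} MiB",
            s.utilization_pct, s.memory_used_mib, s.memory_total_mib
        ),
        None => eprintln!("unrecognized line: {sample}"),
    }
}
```

In practice the line would come from running the command with `std::process::Command` and reading its standard output.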

What's included

11 videos, 3 readings, 1 assignment

11 videos (total 47 minutes)

AWS spot deploy ML (4 minutes)
Hugging Face workflow models (3 minutes)
GitHub AI models workflow (3 minutes)
Rust Hello World project structure (2 minutes)
Using tmux on Linux (11 minutes)
Using NVIDIA SMI (5 minutes)
Using Zenith GPU monitoring (5 minutes)
AI orchestration local workstation (5 minutes)
Technical training approaches (5 minutes)
Effective AI engineering learning (3 minutes)
Course conclusion (3 minutes)

3 readings (total 30 minutes)

Key Terms: AWS Spot Deploy ML (10 minutes)
Key Terms: Using tmux on Linux (10 minutes)
Key Terms: Technical Training Approaches (10 minutes)

1 assignment (total 30 minutes)

Quiz: Workstation and Cloud Workflows (30 minutes)

Head-to-head comparison of Ollama vs `apr` ([paiml/aprender](https://github.com/paiml/aprender)) running Qwen2.5-Coder-1.5B on the same prompt suite, same hardware. Build a chain-of-thought routing engine that selects runtimes based on task complexity and validation requirements, with cost analysis spanning local workstations, Spot instances, and Bedrock.
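A routing engine of this kind can be pictured as a function from task properties to a runtime. The sketch below is only a shape for that idea: the `Task` fields, the `CLOUD_THRESHOLD` value, and especially the mapping from inputs to runtimes are arbitrary assumptions chosen for illustration. They do not describe the course's engine or make any claim about what Ollama, `apr`, or Bedrock are best suited for.

```rust
/// Candidate runtimes named in the module description.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Runtime {
    Ollama,
    Apr,
    Bedrock,
}

/// Hypothetical task description: a 0-10 complexity score and a flag
/// for whether the output must pass a validation step.
struct Task {
    complexity: u8,
    needs_validation: bool,
}

/// Arbitrary cut-off above which a task is sent to the cloud (assumption).
const CLOUD_THRESHOLD: u8 = 7;

/// Illustrative routing rule only; the mapping is invented for this sketch.
fn route(task: &Task) -> Runtime {
    if task.complexity >= CLOUD_THRESHOLD {
        Runtime::Bedrock
    } else if task.needs_validation {
        Runtime::Apr
    } else {
        Runtime::Ollama
    }
}

fn main() {
    let tasks = [
        Task { complexity: 2, needs_validation: false },
        Task { complexity: 4, needs_validation: true },
        Task { complexity: 9, needs_validation: false },
    ];
    for task in &tasks {
        println!(
            "complexity={} validation={} -> {:?}",
            task.complexity,
            task.needs_validation,
            route(task)
        );
    }
}
```

Keeping the rule in one pure function makes it simple to swap in measured thresholds once a head-to-head comparison has produced real numbers.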

What's included

4 readings, 1 assignment

Earn a career certificate

Add this credential to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.

Instructors

Alfredo Deza

Pragmatic AI Labs

35 courses, 3,356 learners

Offered by

Pragmatic AI Labs

Frequently asked questions

To access course materials, assignments, and earn a Certificate, you'll need to purchase the Certificate experience when you enroll in a course. Eligible learners may also have the option to start with a Free Trial. Some courses may also offer a Full Course, No Certificate option. This lets you access course materials, submit required assessments, and receive a final grade, but you won't be able to earn or purchase a Certificate.

When you enroll in the course, you get access to all of the courses in the Specialization, and you earn a certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page - from there, you can print your Certificate or add it to your LinkedIn profile.

Yes. In select learning programs, you can apply for financial aid or a scholarship if you can’t afford the enrollment fee. If financial aid or a scholarship is available for your learning program selection, you’ll find a link to apply on the description page.