AI Orchestration: From local models to cloud

This course is part of AI Tooling Specialization

Instructors: Alfredo Deza

Access provided by Xavier School of Management, XLRI

4 modules

Gain insight into a topic and learn the fundamentals.

Beginner level

Recommended experience

5 hours to complete

Flexible schedule

Learn at your own pace

4 modules

Gain insight into a topic and learn the fundamentals.

Beginner level

Recommended experience

5 hours to complete

Flexible schedule

Learn at your own pace

What you'll learn

Build a prompt engineering pyramid from basic prompts to chain-of-thought reasoning in Rust, and evaluate decision factors for local vs cloud
Set up local AI infrastructure with Ollama, llamafile, aprender and Rust Candle GPU compilation, plus caching and RAG optimization strategies
Configure a production AI workstation with tmux, nvidia-smi, and Zenith, and integrate cloud workflows with AWS Spot, Hugging Face, and GitHub AI

Details to know

Shareable certificate

Add to your LinkedIn profile

Assessments

4 assignments

Taught in English

See how employees at top companies are mastering in-demand skills

Learn more about Coursera for Business

logos of Petrobras, TATA, Danone, Capgemini, P&G and L'Oreal

Build your subject-matter expertise

This course is part of the AI Tooling Specialization

When you enroll in this course, you'll also be enrolled in this Specialization.

Learn new concepts from industry experts
Gain a foundational understanding of a subject or tool
Develop job-relevant skills with hands-on projects
Earn a shareable career certificate

There are 4 modules in this course

Learn to orchestrate AI systems across local and cloud environments through hands-on infrastructure setup, model deployment, and workflow integration. You will build a prompt engineering pyramid from basic prompts to chain-of-thought reasoning implemented in Rust, then evaluate six decision factors for choosing between local and cloud models including latency, throughput, cost, and privacy. The course covers local AI infrastructure in depth: running Ollama with custom Modelfiles for task-specific assistants, deploying llamafile for zero-dependency portable inference, compiling Rust Candle with CUDA for GPU-accelerated local inference, and optimizing local RAG with caching strategies. You will configure a complete AI workstation with tmux for session management, nvidia-smi and Zenith for GPU monitoring, and NVIDIA GPU optimization. The final module covers cloud workflows including AWS Spot instances for cost-effective GPU compute, Hugging Face model discovery and download, and GitHub AI models integration. By completing this course, you will be able to set up local AI infrastructure, deploy models across local and cloud environments, and design orchestration workflows that balance cost, privacy, and performance.

A comprehensive course covering prompt engineering with chain-of-thought reasoning, local inference runtimes (Ollama, llamafile, Candle), GPU workstation configuration, and cost-optimized cloud deployment with AWS Spot instances.

What's included

7 videos3 readings1 assignment

7 videosTotal 29 minutes

Course intro2 minutes
Course overview2 minutes
AI orchestration overview8 minutes
Prompt engineering pyramid3 minutes
Chain of thought prompt Rust4 minutes
Chain of thought Rust prompt demo6 minutes
Explaining chain of thought Rust prompt3 minutes

3 readingsTotal 30 minutes

About This Course10 minutes
Key Terms: Course10 minutes
Key Terms: Prompt Engineering Pyramid10 minutes

1 assignmentTotal 5 minutes

Orchestration Fundamentals5 minutes

Covers local vs cloud model tradeoffs, caching strategies, local RAG optimization, Ollama with custom Modelfiles, llamafile portable deployment, and Candle GPU-accelerated Rust inference.

What's included

9 videos3 readings1 assignment

9 videosTotal 41 minutes

Ollama local demo7 minutes
Ollama Modelfile Rust debugger7 minutes
Ollama arch2 minutes
Local vs. cloud models5 minutes
Caching for AI4 minutes
Optimizing local RAG5 minutes
Llamafile getting started Gemma4 minutes
Llamafile simple3 minutes
Compiling Rust candle GPU5 minutes

3 readingsTotal 30 minutes

Key Terms: Ollama Local Demo10 minutes
Key Terms: Local vs. Cloud Models10 minutes
Key Terms: Llamafile: Getting Started with Gemma10 minutes

1 assignmentTotal 5 minutes

Quiz: Local AI Infrastructure5 minutes

Covers tmux session management, nvidia-smi and Zenith GPU monitoring, local workstation orchestration, AWS Spot instance deployment, Hugging Face and GitHub AI model workflows, and Rust project structure.

What's included

11 videos3 readings1 assignment

11 videosTotal 47 minutes

AWS spot deploy ML4 minutes
Hugging Face workflow models3 minutes
GitHub AI models workflow3 minutes
Rust Hello World project structure2 minutes
Using tmux on Linux11 minutes
Using NVIDIA SMI5 minutes
Using Zenith GPU monitoring5 minutes
AI orchestration local workstation5 minutes
Technical training approaches5 minutes
Effective AI engineering learning3 minutes
Course conclusion3 minutes

3 readingsTotal 30 minutes

Key Terms: AWS Spot Deploy ML10 minutes
Key Terms: Using tmux on Linux10 minutes
Key Terms: Technical Training Approaches10 minutes

1 assignmentTotal 30 minutes

Quiz: Workstation and Cloud Workflows30 minutes

Head-to-head comparison of Ollama vs `apr` ([paiml/aprender](https://github.com/paiml/aprender)) running Qwen2.5-Coder-1.5B on the same prompt suite, same hardware. Build a chain-of-thought routing engine that selects runtimes based on task complexity and validation requirements, with cost analysis spanning local workstations, Spot instances, and Bedrock.