Learn to deploy ML models to production using the Sovereign Rust Stack—a pure Rust implementation with zero Python runtime dependencies. This hands-on course teaches you to work with three critical model formats (GGUF, SafeTensors, APR), implement MLOps pipelines with CI/CD and observability, and deploy models across GPU, CPU, WebAssembly, and edge targets.

Production ML with Hugging Face
This course is part of Next-Gen AI Development with Hugging Face Specialization

Instructor: Noah Gift
What you'll learn
Convert and deploy ML models across GGUF, SafeTensors, and APR formats for GPU, CPU, and browser targets
Details to know
4 assignments
February 2026

There are 4 modules in this course
Understanding ML model formats and the Sovereign AI Stack. Learn GGUF, SafeTensors, and APR formats for different deployment targets.
What's included
6 videos · 5 readings · 1 assignment
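Since the first module centers on telling the three formats apart, here is a minimal, dependency-free Rust sketch of format detection by file header. It relies on two documented facts: GGUF files begin with the ASCII magic `GGUF`, and SafeTensors files begin with an 8-byte little-endian length followed by a JSON header. The `sniff_format` function and the APR omission are illustrative assumptions, not course-provided code.

```rust
// Hypothetical format sniffer for the formats covered in this module.
// GGUF: opens with the 4-byte ASCII magic "GGUF".
// SafeTensors: opens with an 8-byte little-endian u64 (JSON header length),
// followed by the JSON header itself, whose first byte is '{'.
// APR detection is omitted here because its layout is course-specific.

#[derive(Debug, PartialEq)]
enum ModelFormat {
    Gguf,
    SafeTensors,
    Unknown,
}

fn sniff_format(bytes: &[u8]) -> ModelFormat {
    if bytes.starts_with(b"GGUF") {
        return ModelFormat::Gguf;
    }
    if bytes.len() >= 9 {
        let header_len = u64::from_le_bytes(bytes[..8].try_into().unwrap());
        // Plausibility check: a non-empty JSON header must follow the length.
        if header_len > 0 && bytes[8] == b'{' {
            return ModelFormat::SafeTensors;
        }
    }
    ModelFormat::Unknown
}

fn main() {
    // Synthetic headers, not real model files.
    let gguf = b"GGUF\x03\x00\x00\x00";
    let mut st = 2u64.to_le_bytes().to_vec();
    st.extend_from_slice(b"{}");
    assert_eq!(sniff_format(gguf), ModelFormat::Gguf);
    assert_eq!(sniff_format(&st), ModelFormat::SafeTensors);
    assert_eq!(sniff_format(b"xx"), ModelFormat::Unknown);
    println!("ok");
}
```

In a real loader the same sniffing step would decide which parser to hand the file to before any weights are mapped into memory.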
Production infrastructure for ML systems. This module covers the essential MLOps practices needed to deploy and maintain ML models in production environments. Learn how to implement CI/CD pipelines specifically designed for ML workflows, set up comprehensive observability with logs, metrics, and traces, apply cryptographic model signing for supply chain security, and choose optimal deployment patterns based on your infrastructure requirements.
What's included
8 videos · 3 readings · 1 assignment
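The supply-chain idea in this module, verifying that a model artifact has not been tampered with between signing and deployment, can be sketched in a few lines. This is a simplified stand-in: production pipelines use cryptographic digests (e.g. SHA-256) and real signatures (e.g. Ed25519), while this sketch uses std's `DefaultHasher` purely so the example has no dependencies. The `digest`/`verify` names are illustrative, not course APIs.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Simplified integrity check: compute a digest of the artifact bytes and
// compare it to the value recorded at build/sign time. Real systems would
// use SHA-256 plus a signature over the digest, not DefaultHasher.
fn digest(artifact: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    artifact.hash(&mut h);
    h.finish()
}

fn verify(artifact: &[u8], expected: u64) -> bool {
    digest(artifact) == expected
}

fn main() {
    let model = b"model weights...";
    let recorded = digest(model);            // recorded in a signed manifest
    assert!(verify(model, recorded));        // untampered artifact passes
    assert!(!verify(b"tampered", recorded)); // any modification fails
    println!("verified");
}
```

The same check slots naturally into a CI/CD pipeline as a gate: the deploy step refuses any artifact whose digest does not match the manifest.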
Real-world projects built with the Sovereign AI Stack. This module demonstrates practical applications through three production projects: Depyler (a Python-to-Rust transpiler with self-improving ML), Whisper.apr (speech-to-text in browser and CLI), and the APR ecosystem tools. Learn how to build self-improving systems using compiler-in-the-loop training, deploy speech recognition to resource-constrained environments, and leverage the full APR toolchain for model conversion and inference.
What's included
11 videos · 3 readings · 1 assignment
Final project deploying Qwen2.5-Coder-0.5B across all three model formats. Students demonstrate mastery of format conversion, CLI deployment, server deployment, and performance benchmarking.
What's included
1 reading · 1 assignment
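The benchmarking portion of the final project amounts to timing inference under each format and deployment path. A minimal harness using only `std::time::Instant` might look like this; `run_inference` is a hypothetical stand-in for invoking a loaded Qwen2.5-Coder-0.5B model, replaced here with deterministic busy work so the sketch is runnable.

```rust
use std::time::Instant;

// Hypothetical stand-in for model inference; in the project this would
// call into the GGUF/SafeTensors/APR runtime under test.
fn run_inference(input: &str) -> usize {
    input.bytes().map(|b| b as usize).sum()
}

// Run `warmup` untimed iterations, then report mean latency in ms over
// `iters` timed iterations.
fn bench<F: Fn()>(warmup: u32, iters: u32, f: F) -> f64 {
    for _ in 0..warmup {
        f();
    }
    let start = Instant::now();
    for _ in 0..iters {
        f();
    }
    start.elapsed().as_secs_f64() * 1e3 / iters as f64
}

fn main() {
    let prompt = "fn add(a: i32, b: i32) -> i32";
    let ms = bench(10, 100, || {
        run_inference(prompt);
    });
    assert!(ms >= 0.0);
    println!("mean latency: {:.4} ms", ms);
}
```

Running the same harness once per format gives the comparable numbers the assignment asks for; adding percentile latencies and throughput would be a natural extension.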
Earn a career certificate
Add this credential to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.

