Learn to deploy ML models to production using the Sovereign Rust Stack—a pure Rust implementation with zero Python runtime dependencies. This hands-on course teaches you to work with three critical model formats (GGUF, SafeTensors, APR), implement MLOps pipelines with CI/CD and observability, and deploy models across GPU, CPU, WebAssembly, and edge targets.

Production ML with Hugging Face
This course is part of Next-Gen AI Development with Hugging Face Specialization

Instructor: Noah Gift
What you'll learn
Convert and deploy ML models across GGUF, SafeTensors, and APR formats for GPU, CPU, and browser targets
Details to know
4 assignments
February 2026

There are 4 modules in this course
Understanding ML model formats and the Sovereign AI Stack. Learn GGUF, SafeTensors, and APR formats for different deployment targets.
What's included
6 videos · 5 readings · 1 assignment
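Since the first module centers on telling the three formats apart, here is a minimal, dependency-free Rust sketch of format detection by file header. It relies on two documented facts: GGUF files begin with the ASCII magic `GGUF`, and SafeTensors files begin with an 8-byte little-endian length followed by a JSON header. The `sniff_format` function and the APR omission are illustrative assumptions, not course-provided code.

```rust
// Hypothetical format sniffer for the formats covered in this module.
// GGUF: opens with the 4-byte ASCII magic "GGUF".
// SafeTensors: opens with an 8-byte little-endian u64 (JSON header length),
// followed by the JSON header itself, whose first byte is '{'.
// APR detection is omitted here because its layout is course-specific.

#[derive(Debug, PartialEq)]
enum ModelFormat {
    Gguf,
    SafeTensors,
    Unknown,
}

fn sniff_format(bytes: &[u8]) -> ModelFormat {
    if bytes.starts_with(b"GGUF") {
        return ModelFormat::Gguf;
    }
    if bytes.len() >= 9 {
        let header_len = u64::from_le_bytes(bytes[..8].try_into().unwrap());
        // Plausibility check: a non-empty JSON header must follow the length.
        if header_len > 0 && bytes[8] == b'{' {
            return ModelFormat::SafeTensors;
        }
    }
    ModelFormat::Unknown
}

fn main() {
    // Synthetic headers, not real model files.
    let gguf = b"GGUF\x03\x00\x00\x00";
    let mut st = 2u64.to_le_bytes().to_vec();
    st.extend_from_slice(b"{}");
    assert_eq!(sniff_format(gguf), ModelFormat::Gguf);
    assert_eq!(sniff_format(&st), ModelFormat::SafeTensors);
    assert_eq!(sniff_format(b"xx"), ModelFormat::Unknown);
    println!("ok");
}
```

In a real loader the same sniffing step would decide which parser to hand the file to before any weights are mapped into memory.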
Production infrastructure for ML systems. This module covers the essential MLOps practices needed to deploy and maintain ML models in production environments. Learn how to implement CI/CD pipelines specifically designed for ML workflows, set up comprehensive observability with logs, metrics, and traces, apply cryptographic model signing for supply chain security, and choose optimal deployment patterns based on your infrastructure requirements.
What's included
8 videos · 3 readings · 1 assignment
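The supply-chain idea in this module, verifying that a model artifact has not been tampered with between signing and deployment, can be sketched in a few lines. This is a simplified stand-in: production pipelines use cryptographic digests (e.g. SHA-256) and real signatures (e.g. Ed25519), while this sketch uses std's `DefaultHasher` purely so the example has no dependencies. The `digest`/`verify` names are illustrative, not course APIs.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Simplified integrity check: compute a digest of the artifact bytes and
// compare it to the value recorded at build/sign time. Real systems would
// use SHA-256 plus a signature over the digest, not DefaultHasher.
fn digest(artifact: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    artifact.hash(&mut h);
    h.finish()
}

fn verify(artifact: &[u8], expected: u64) -> bool {
    digest(artifact) == expected
}

fn main() {
    let model = b"model weights...";
    let recorded = digest(model);            // recorded in a signed manifest
    assert!(verify(model, recorded));        // untampered artifact passes
    assert!(!verify(b"tampered", recorded)); // any modification fails
    println!("verified");
}
```

The same check slots naturally into a CI/CD pipeline as a gate: the deploy step refuses any artifact whose digest does not match the manifest.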
Real-world projects built with the Sovereign AI Stack. This module demonstrates practical applications through three production projects: Depyler (a Python-to-Rust transpiler with self-improving ML), Whisper.apr (speech-to-text in browser and CLI), and the APR ecosystem tools. Learn how to build self-improving systems using compiler-in-the-loop training, deploy speech recognition to resource-constrained environments, and leverage the full APR toolchain for model conversion and inference.
What's included
11 videos · 3 readings · 1 assignment
Final project deploying Qwen2.5-Coder-0.5B across all three model formats. Students demonstrate mastery of format conversion, CLI deployment, server deployment, and performance benchmarking.
What's included
1 reading · 1 assignment
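The benchmarking portion of the final project amounts to timing inference under each format and deployment path. A minimal harness using only `std::time::Instant` might look like this; `run_inference` is a hypothetical stand-in for invoking a loaded Qwen2.5-Coder-0.5B model, replaced here with deterministic busy work so the sketch is runnable.

```rust
use std::time::Instant;

// Hypothetical stand-in for model inference; in the project this would
// call into the GGUF/SafeTensors/APR runtime under test.
fn run_inference(input: &str) -> usize {
    input.bytes().map(|b| b as usize).sum()
}

// Run `warmup` untimed iterations, then report mean latency in ms over
// `iters` timed iterations.
fn bench<F: Fn()>(warmup: u32, iters: u32, f: F) -> f64 {
    for _ in 0..warmup {
        f();
    }
    let start = Instant::now();
    for _ in 0..iters {
        f();
    }
    start.elapsed().as_secs_f64() * 1e3 / iters as f64
}

fn main() {
    let prompt = "fn add(a: i32, b: i32) -> i32";
    let ms = bench(10, 100, || {
        run_inference(prompt);
    });
    assert!(ms >= 0.0);
    println!("mean latency: {:.4} ms", ms);
}
```

Running the same harness once per format gives the comparable numbers the assignment asks for; adding percentile latencies and throughput would be a natural extension.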
Earn a career certificate
Add this credential to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.

