IBM
Generative AI Advance Fine-Tuning for LLMs

Joseph Santarcangelo
Ashutosh Sagar
Wojciech 'Victor' Fulmyk

Instructors: Joseph Santarcangelo

14,552 already enrolled

Gain insight into a topic and learn the fundamentals.
4.3

(106 reviews)

Intermediate level

Recommended experience

9 hours to complete
Flexible schedule
Learn at your own pace

What you'll learn

  • In-demand generative AI engineering skills in fine-tuning LLMs that employers are actively seeking

  • Instruction tuning and reward modeling using Hugging Face, plus understanding LLMs as policies and applying RLHF techniques

  • Direct preference optimization (DPO) with Hugging Face, including the role of the partition function and how to define the optimal solution to the DPO problem

  • Using proximal policy optimization (PPO) with Hugging Face to build scoring functions and tokenize datasets for fine-tuning

Details to know

Shareable certificate

Add to your LinkedIn profile

Assessments

5 assignments

Taught in English

Build your subject-matter expertise

This course is available as part of
When you enroll in this course, you'll also be asked to select a specific program.
  • Learn new concepts from industry experts
  • Gain a foundational understanding of a subject or tool
  • Develop job-relevant skills with hands-on projects
  • Earn a shareable career certificate

There are 2 modules in this course

In this module, you will explore advanced techniques for fine-tuning large language models (LLMs) through instruction tuning and reward modeling. You’ll begin by defining instruction tuning and learning its process, including dataset loading, text generation pipelines, and training arguments using Hugging Face. You’ll then delve into reward modeling, where you’ll preprocess datasets, apply low-rank adaptation (LoRA) configurations, and quantify response quality to guide model optimization and align the model with human preferences. You’ll also describe and use reward trainers and reward model loss functions. In addition, the hands-on labs will reinforce your learning with practical experience in instruction tuning and reward modeling, so that you can customize LLMs for targeted tasks.
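
As an illustration of the reward model loss functions mentioned above, the following is a minimal sketch of a commonly used pairwise loss, in which the reward model is penalized when it scores the rejected response above the chosen one. The course description does not state which formulation it uses, so this is an assumption; the function name `pairwise_reward_loss` and the example scores are hypothetical, and the sketch uses plain Python rather than any Hugging Face API.

```python
import math


def pairwise_reward_loss(chosen_rewards, rejected_rewards):
    """Mean of -log(sigmoid(r_chosen - r_rejected)) over preference pairs.

    Assumption: each pair holds the scalar scores a reward model gave to
    the human-preferred response and to the rejected response.
    """
    if len(chosen_rewards) != len(rejected_rewards):
        raise ValueError("chosen and rejected rewards must pair up one-to-one")
    if not chosen_rewards:
        raise ValueError("at least one preference pair is required")

    losses = []
    for r_chosen, r_rejected in zip(chosen_rewards, rejected_rewards):
        margin = r_chosen - r_rejected
        # -log(sigmoid(m)) equals log(1 + exp(-m)); written in a
        # numerically stable form so large margins do not overflow.
        losses.append(max(-margin, 0.0) + math.log1p(math.exp(-abs(margin))))
    return sum(losses) / len(losses)


# Hypothetical scores: the loss is lower when chosen responses score higher.
good_ranking = pairwise_reward_loss([2.0, 1.5], [0.0, -0.5])
bad_ranking = pairwise_reward_loss([0.0, -0.5], [2.0, 1.5])
print(good_ranking < bad_ranking)
```

The loss depends only on the difference between the two scores, so it trains the model to rank responses rather than to hit any absolute reward value.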

What's included

6 videos, 4 readings, 2 assignments, 2 app items, 3 plugins

In this module, you will explore advanced techniques for fine-tuning large language models (LLMs) using reinforcement learning from human feedback (RLHF), proximal policy optimization (PPO), and direct preference optimization (DPO). You’ll begin by describing how LLMs function as probability distributions and how these can be transformed into policies that generate responses based on input text. You’ll examine the relationship between policies and language models as a function of parameters, such as omega, and how rewards can be calculated using human feedback. This includes training on response samples, evaluating agent performance, and defining scoring functions for tasks like sentiment analysis using PPO. You’ll also be able to explain PPO configuration, learning rates, and the PPO trainer’s role in optimizing chatbot responses using Hugging Face tools. The module further introduces DPO, a more direct and efficient way to align models with human preferences. While complex topics like PPO and reinforcement learning are introduced, you are not expected to understand them in depth for this course. The hands-on labs in this module will allow you to practice applying RLHF and DPO. To support your learning, a cheat sheet and glossary are included for quick reference.
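
To make the DPO idea above more concrete, here is a minimal sketch of the standard DPO loss for a single preference pair, assuming the sequence log-probabilities of the chosen and rejected responses have already been computed under both the policy being trained and a frozen reference model. The function names, the example log-probabilities, and the default `beta=0.1` are illustrative assumptions, not taken from the course materials, and no Hugging Face API is used.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    return -(max(-x, 0.0) + math.log1p(math.exp(-abs(x))))


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair of responses.

    Each argument is a summed log-probability of a full response.
    beta (assumed value) scales how strongly the policy is pushed
    away from the reference model.
    """
    # Log-ratio of policy to reference for each response.
    chosen_log_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_log_ratio = policy_rejected_logp - ref_rejected_logp
    # The loss falls as the policy raises the chosen response's
    # probability, relative to the reference, more than the rejected one's.
    return -log_sigmoid(beta * (chosen_log_ratio - rejected_log_ratio))


# Hypothetical log-probabilities for one preference pair.
unchanged = dpo_loss(-5.0, -7.0, -5.0, -7.0)  # policy equals reference
improved = dpo_loss(-4.0, -8.0, -5.0, -7.0)   # policy favors chosen more
print(improved < unchanged)
```

Only the difference between the two log-ratios enters the loss. This is why the partition function from the RLHF objective does not need to be computed: it is the same for both responses to a prompt and cancels in the subtraction.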

What's included

10 videos, 5 readings, 3 assignments, 2 app items, 4 plugins

Earn a career certificate

Add this credential to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.

Instructors

Instructor ratings
3.7 (13 ratings)
Joseph Santarcangelo
IBM
36 courses, 2,190,688 learners

Offered by

IBM

Learner reviews

4.3

106 reviews

  • 5 stars: 73.14%

  • 4 stars: 8.33%

  • 3 stars: 4.62%

  • 2 stars: 5.55%

  • 1 star: 8.33%

Showing 3 of 106

RN
5

Reviewed on Mar 10, 2025

GP
5

Reviewed on Mar 10, 2025

SG
5

Reviewed on Aug 20, 2025
