LLM Engineer’s Handbook

Instructor: Packt - Course Instructors

11 modules

Gain insight into a topic and learn the fundamentals.

Beginner level

Recommended experience

2 weeks to complete

at 10 hours a week

Flexible schedule

Learn at your own pace

What you'll learn

Design and manage effective LLM training and deployment pipelines.
Implement supervised fine-tuning and evaluate LLM performance.
Deploy scalable, end-to-end LLM applications using cloud tools.

Skills you'll gain

Tools you'll learn

Model Deployment

Details to know

Shareable certificate

Add to your LinkedIn profile

Assessments

11 assignments

Taught in English

There are 11 modules in this course

In this comprehensive course, you will explore the intricate world of Large Language Models (LLMs) and gain the skills to design, train, and deploy them using cutting-edge MLOps practices. LLMs are revolutionizing the AI landscape, and understanding how to develop and manage them is essential for AI professionals.

This course is designed to help you not only grasp the core concepts behind LLMs but also give you hands-on experience to build production-grade LLM systems. You'll learn how to create scalable, efficient LLM systems from scratch, focusing on real-world applications that will make you stand out in the AI industry. What sets this course apart is its combination of in-depth theoretical insights and real-world, practical applications. You'll move beyond basic knowledge to master LLM architecture, supervised fine-tuning, and deployment on cloud platforms, ensuring that you’re fully equipped to build robust, production-ready systems. This course is ideal for AI engineers, NLP professionals, and anyone looking to deepen their expertise in LLM engineering. A basic understanding of LLMs, Python, and cloud platforms like AWS is recommended for optimal learning.

In this section, we delve into the concept and architecture of the LLM Twin, an AI model that mimics a person's writing style and personality. We discuss its significance, its benefits over generic chatbots, and the planning process for creating an effective LLM product. We also examine the design of the feature, training, and inference pipelines that structure a robust ML system.

What's included

2 videos, 3 readings, 1 assignment

In this section, we introduce the essential tools needed for the course, particularly for the LLM Twin project. We provide an overview of the tech stack, cover installation procedures for Python and its ecosystem, dependency management with Poetry, and task execution using Poe the Poet. This section also provides insights into MLOps and LLMOps tooling, including ZenML and Hugging Face, and explains their roles in the project. Finally, we guide users in setting up an AWS account, focusing on SageMaker for deploying ML models.

What's included

1 video, 2 readings, 1 assignment

In this section, we delve into the LLM Twin project by designing a data collection pipeline for gathering raw data essential for LLM use cases, such as fine-tuning and inference. We'll focus on implementing an ETL pipeline that aggregates data from platforms like Medium and GitHub into a MongoDB data warehouse, thus simulating real-world machine learning project scenarios.

What's included

1 video, 4 readings, 1 assignment

In this section, we explore the Retrieval-Augmented Generation (RAG) feature pipeline. RAG is a crucial technique for supplying custom data to large language models at query time without constant fine-tuning. We introduce the fundamental components of a naive RAG system, such as chunking, embedding, and vector databases. We also delve into the LLM Twin's RAG feature pipeline architecture, applying theoretical concepts through practical implementation, and discuss why RAG matters for addressing issues like model hallucinations and outdated data. This section also covers advanced RAG techniques and the role of batch pipelines in keeping data in sync for improved accuracy.
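The three components of a naive RAG system named above (chunking, embedding, and a vector database) can be illustrated with a minimal sketch. This is not the course's implementation: the bag-of-words `embed` function is a toy stand-in for a real embedding model, and `chunk_text` and `InMemoryVectorStore` are hypothetical names chosen only for illustration.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, chunk_size: int = 8, overlap: int = 2) -> list[str]:
    """Split text into overlapping chunks of at most `chunk_size` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def embed(text: str) -> Counter:
    """Toy embedding (assumption): word counts instead of a learned vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database."""

    def __init__(self) -> None:
        self._rows: list[tuple[str, Counter]] = []

    def add(self, chunk: str) -> None:
        self._rows.append((chunk, embed(chunk)))

    def search(self, query: str, k: int = 1) -> list[str]:
        """Return the k stored chunks most similar to the query."""
        query_vector = embed(query)
        ranked = sorted(
            self._rows,
            key=lambda row: cosine(query_vector, row[1]),
            reverse=True,
        )
        return [chunk for chunk, _ in ranked[:k]]


store = InMemoryVectorStore()
store.add("Poetry manages Python dependencies")
store.add("MongoDB stores the raw documents")
retrieved = store.search("where are raw documents stored", k=1)
```

In a real pipeline, the retrieved chunks would be placed into the prompt sent to the LLM, which is how RAG grounds answers in custom data without retraining the model.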

What's included

1 video, 7 readings, 1 assignment

1 video (total 1 minute)

RAG Feature Pipeline - Overview Video (1 minute)

7 readings (total 170 minutes)

Introduction (10 minutes)
What are Embeddings? (30 minutes)
DB Operations (10 minutes)
Exploring the LLM Twin’s RAG Feature Pipeline Architecture (30 minutes)
Change data capture: syncing the data warehouse and feature store (30 minutes)
Querying the Data Warehouse (30 minutes)
OVM (30 minutes)

1 assignment (total 10 minutes)

Advanced Concepts in Retrieval-Augmented Generation (RAG) (10 minutes)

In this section, we will explore the process of Supervised Fine-Tuning (SFT) for Large Language Models (LLMs). We'll delve into the creation of instruction datasets and how they are used to refine LLMs for specific tasks. This section covers the steps involved in crafting these datasets, the importance of data quality, and presents various techniques and strategies for enhancing the fine-tuning process. Our focus will be on transforming general-purpose models into specialized assistants through SFT, enabling them to provide more coherent and relevant responses.

What's included

1 video, 7 readings, 1 assignment

1 video (total 1 minute)

Supervised Fine-Tuning - Overview Video (1 minute)

7 readings (total 150 minutes)

Introduction (10 minutes)
Data Deduplication (30 minutes)
Data Generation (10 minutes)
Creating Our Own Instruction Dataset (30 minutes)
Exploring SFT and its Techniques (30 minutes)
Training Parameters (10 minutes)
Fine-tuning in Practice (30 minutes)

1 assignment (total 10 minutes)

Advanced Techniques in Language Model Fine-Tuning (10 minutes)

In this section, we delve into the realms of preference alignment, discussing how Direct Preference Optimization (DPO) can fine-tune language models to better align with human preferences. We elaborate on creating and evaluating preference datasets, ensuring our models capture nuanced human interactions.
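To make the idea of preference alignment concrete, here is a minimal sketch of the standard DPO loss for a single preference pair. It assumes you already have the summed log-probabilities of the chosen and rejected answers under both the model being trained and a frozen reference model. The function name `dpo_loss` and the default `beta` value are illustrative choices, not taken from the course material.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """DPO loss for one (chosen, rejected) pair: -log(sigmoid(margin))."""
    # Implicit rewards: how much more likely each answer is under the
    # trained policy than under the frozen reference model.
    chosen_reward = policy_chosen_logp - ref_chosen_logp
    rejected_reward = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_reward - rejected_reward)
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch for numerical stability.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# The loss drops below log(2) once the policy favors the chosen answer
# more strongly than the reference model does.
baseline = dpo_loss(-5.0, -7.0, -5.0, -7.0)
improved = dpo_loss(-4.0, -8.0, -5.0, -7.0)
```

The `beta` parameter controls how far the trained model may drift from the reference model: larger values penalize a wrong preference ordering more sharply.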

What's included

1 video, 4 readings, 1 assignment

In this section, we delve into the evaluation of large language models (LLMs), addressing various evaluation methods and their significance. We cover general-purpose, domain-specific, and task-specific evaluations, highlighting the unique challenges each presents. Additionally, we explore how to evaluate retrieval-augmented generation (RAG) pipelines and introduce tools like Ragas and ARES for comprehensive LLM assessment.

What's included

1 video, 3 readings, 1 assignment

In this section, we dive into inference optimization for large language models, a crucial step given their heavy computational and memory demands. From reducing latency to improving throughput and minimizing memory usage, we examine how specialized hardware and optimization techniques can speed up model output. These techniques lead to more efficient deployments, whether for fast-response tasks like code completion or for batch jobs like document generation.

What's included

1 video, 3 readings, 1 assignment

In this section, we explore the construction and implementation of a RAG inference pipeline, starting from understanding its architecture to implementing key modules such as retrieval, prompt creation, and interaction with the LLM. We introduce methods for optimizing retrieval processes like query expansion and self-querying while utilizing OpenAI's API, and integrate these techniques into a comprehensive retrieval module. We'll conclude by assembling these elements into a cohesive inference pipeline and preparing for further deployment steps.

What's included

1 video, 5 readings, 1 assignment

1 video (total 1 minute)

RAG Inference Pipeline - Overview Video (1 minute)

5 readings (total 130 minutes)

Introduction (30 minutes)
Self-querying (30 minutes)
Advanced RAG Post-retrieval Optimization: Reranking (10 minutes)
Implementing the LLM Twin's RAG Inference Pipeline (30 minutes)
Bringing Everything Together into the RAG Inference Pipeline (30 minutes)

1 assignment (total 10 minutes)

Advanced RAG Pipeline Implementation (10 minutes)

In this section, we focus on deploying the inference pipeline for large language models (LLMs) in ML applications, ensuring models are accessible and efficient for end users. We'll cover deployment strategies, architectural decisions, and optimization techniques to address challenges like computing power and feature access.

What's included

1 video, 5 readings, 1 assignment

1 video (total 1 minute)

Inference Pipeline Deployment - Overview Video (1 minute)

5 readings (total 110 minutes)

Introduction (10 minutes)
Monolithic versus Microservices Architecture in Model Serving (10 minutes)
Exploring the LLM Twin’s Inference Pipeline Deployment Strategy (30 minutes)
Deploying the LLM Twin model to AWS SageMaker (30 minutes)
Calling the AWS SageMaker Inference Endpoint (30 minutes)

1 assignment (total 10 minutes)

Modern ML Model Deployment (10 minutes)

In this section, we dive into the intricacies of MLOps and LLMOps, exploring their roles in automating machine learning processes and handling large language models. We will cover their origins in DevOps, highlight the unique challenges LLMOps addresses, such as prompt management and scaling issues, and illustrate the practical steps for deploying these systems efficiently. The section also includes discussions on the transition from manual deployment to cloud-based solutions, emphasizing the advantages of CI/CD pipelines and Dockerization in executing and managing models at scale.