AI in Education

Reinforcement Learning with Human Feedback (RLHF), Clearly Explained!!!

statquest · May 5, 2025 · 18:02 · ai_ml_education

Summary

This video explains Reinforcement Learning with Human Feedback (RLHF), a technique used to align Large Language Models (LLMs) like ChatGPT with how people actually want to use them. It walks through pre-training, supervised fine-tuning, and RLHF, and shows how these steps teach an LLM to generate polite and useful responses. It is useful for educators and students who want a deeper technical understanding of how the AI tools that are increasingly relevant in education are developed and how they function.

Description

Generative Large Language Models, like ChatGPT and DeepSeek, are trained on massive text-based datasets, like the entire Wikipedia. However, this training alone fails to teach the models how to generate polite and useful responses to your prompts. Thus, LLMs rely on Supervised Fine-Tuning and Reinforcement Learning with Human Feedback (RLHF) to align the models to how we actually want to use them. This StatQuest explains every step in training an LLM, with special attention to how RLHF is done.

NOTE: This video is based on the original manuscript for Instruct-GPT: https://arxiv.org/abs/2203.02155

Also, you should check out Serrano Academy if you can: https://www.youtube.com/@SerranoAcademy

If you'd like to support StatQuest, please consider...
Patreon: https://www.patreon.com/statquest
...or...
YouTube Membership: https://www.youtube.com/channel/UCtYLUTtgS3k1Fg4y5tAhLbw/join
...buying a book, a study guide, a t-shirt or hoodie, or a song from the StatQuest store...
https://statquest.org/statquest-store/
...or just donating to StatQuest!
paypal: https://www.paypal.me/statquest
venmo: @JoshStarmer

Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on twitter: https://twitter.com/joshuastarmer

0:00 Awesome song and introduction
2:25 Pre-Training an LLM
5:06 Supervised Fine-Tuning
7:35 Reinforcement Learning with Human Feedback (RLHF)
10:07 RLHF - training the reward model
15:02 RLHF - using the reward model

#StatQuest
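For readers who want a concrete feel for the "training the reward model" step mentioned in the description, here is a minimal sketch of a pairwise preference loss of the kind described in the Instruct-GPT manuscript: the reward model is penalized when it scores the response a human rejected higher than the response the human preferred. The function name and the example scores are hypothetical, chosen only for illustration, and the sketch handles a single pair rather than averaging over all ranked pairs for a prompt. It is not taken from the video.

```python
import math


def pairwise_reward_loss(reward_preferred: float, reward_rejected: float) -> float:
    """Loss for one human comparison: -log(sigmoid(r_preferred - r_rejected)).

    Hypothetical helper for illustration. The two arguments are the scalar
    scores a reward model assigned to the preferred and rejected responses.
    """
    diff = reward_preferred - reward_rejected
    # -log(sigmoid(diff)) equals log(1 + exp(-diff)); branch on the sign of
    # diff so that exp() never receives a large positive argument.
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


# Made-up scores: the loss is small when the model agrees with the human
# ranking and large when it disagrees.
agrees = pairwise_reward_loss(reward_preferred=2.0, reward_rejected=-1.0)
disagrees = pairwise_reward_loss(reward_preferred=-1.0, reward_rejected=2.0)
print(f"agrees with human: {agrees:.4f}, disagrees: {disagrees:.4f}")
```

Minimizing this loss over many human comparisons pushes the reward model to score preferred responses higher, which is what lets it stand in for human judgment in the later "using the reward model" step.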

More Videos

How AI works in Super Simple Terms!!!

Reinforcement Learning with Neural Networks: Mathematical Details

Reinforcement Learning with Neural Networks: Essential Concepts

Reinforcement Learning: Essential Concepts

Encoder-Only Transformers (like BERT) for RAG, Clearly Explained!!!

Luis Serrano + Josh Starmer Q&A Livestream!!!