Supervised Finetuning for Reasoning Models (From Dataset to Deployment)
Finetuning Sessions · Lab 2 / 8
Welcome to Lab 2 of the Finetuning Sessions!
In today's lab, we're going to run our very first finetuning experiment — specifically, full finetuning.
Now, to be completely honest, this is a technique that's rarely used in real-world practice 😅
Most of the time, we rely on LoRA or QLoRA (Weeks 3 and 4) for efficiency and scalability. But we believe understanding full finetuning is still valuable, and that's exactly why we want to show it to you!
📕 If you haven't read Lesson 2's article, make sure to review it before going forward!
We've structured this lab into three main sections.
First, we'll dive into chat templates and their critical role in training large language models. We'll use a Google Colab notebook to walk through the key ideas and make everything concrete and practical.
In the second section, we'll explore the dataset we're using in this lab — the same one we'll continue working with throughout our finetuning sessions.
This is a synthetic dataset we created from the YouTube Commons dataset. What makes it especially interesting is the distillation process we applied to generate it.
Finally, in the last section, we'll give you an overview of the code used to train the models. And yes — we're not training just one model, but two.
We'll also cover how to deploy them so you can interact with them directly. On top of that, we'll analyze the training and evaluation losses to better understand how the models actually learned and how well the training process went.
Now that we have a clear roadmap, let's get started! 👇






