Discussion about this post

Ankana Mukherjee

Well-informed article 👏 See you Friday 😃

tanzeel

This document is gold; it actually covers the less-talked-about nuances in the AI world.

1. Why packing is not suited to SFT, and how Unsloth and FlashAttention-2 resolve it

2. tokenizer_config.json holds the chat template (I remember always searching the docs for it for every model)

3. The Think SFT explanation of taking shortcuts (the diagrams are gold here)

4. The system/user tokens don't get trained on, because their labels are set to -100, which PyTorch's cross-entropy loss ignores

5. Grouped batching in SFT

6. Agentic SFT (would love to see it in a lab session)
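Point 4 can be illustrated with a minimal sketch. It assumes a single example already split into hypothetical `prompt_ids` (system/user tokens) and `response_ids` (assistant tokens), and that each logit row is already aligned with the label it predicts (real causal-LM training shifts labels by one position; that shift is not shown). The only detail taken from PyTorch is that `torch.nn.CrossEntropyLoss` uses `ignore_index=-100` by default; the loss below is a plain-Python re-implementation of that masking idea, not PyTorch's code.

```python
import math

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_labels(prompt_ids, response_ids):
    """Concatenate prompt and response; mask every prompt position with -100.

    prompt_ids / response_ids are hypothetical token-id lists for one example.
    """
    input_ids = list(prompt_ids) + list(response_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    return input_ids, labels


def masked_cross_entropy(logits, labels):
    """Mean negative log-likelihood over positions whose label is not -100.

    logits: one list of vocabulary scores per position (assumed already
            aligned with labels, i.e. the causal shift was done elsewhere)
    labels: target token id per position; -100 means "skip this position"
    """
    total, count = 0.0, 0
    for scores, target in zip(logits, labels):
        if target == IGNORE_INDEX:
            continue  # prompt token: adds nothing to the loss, so no gradient
        # numerically stable log-sum-exp over the vocabulary
        m = max(scores)
        log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_sum_exp - scores[target]
        count += 1
    # guard against division by zero when every position is masked
    return total / count if count else 0.0
```

For example, `build_labels([1, 2, 3], [4, 5])` gives labels `[-100, -100, -100, 4, 5]`, so changing the scores at the three prompt positions leaves the loss unchanged: only the response tokens are trained on.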
