Vision Language Models

Advancing Vision-Language Models: New Alignment Methods in TRL

Beyond DPO: New Multimodal Alignment Methods in TRL

Vision-Language Models (VLMs) are becoming more capable, but aligning them with human preferences remains crucial. Previously, we introduced Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO) for VLMs in TRL. Now, we’re pushing further with three advanced alignment techniques:

Mixed Preference Optimization (MPO): Combines DPO, SFT, and quality loss for better reasoning
Group Relative Policy…
