02 CMU · MMML
Fine-tuning LLaVA for Web Agents
2024
Lede
Group project at CMU. We took LLaVA, fine-tuned it on VisualWebBench, and pushed the open-model score up.
Detail
MMML capstone at CMU. The starting point was a baseline LLaVA checkpoint that performed poorly on web-grounded tasks. Most of the work was unglamorous: cleaning and augmenting the VisualWebBench training data, picking a LoRA configuration that fit on a single A100, and running enough ablations to know which screenshots actually moved the metric. The interesting finding was that visual cues mattered less than we expected, and bounding-box hints mattered more.
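A rough sketch of why a LoRA configuration fits on a single A100 when full fine-tuning would not: LoRA freezes each weight matrix W and trains only a low-rank update B·A, so the trainable parameter count scales with the rank r rather than the full matrix size. The dimensions and rank below are illustrative assumptions, not the project's actual LLaVA shapes or chosen configuration.

```python
# Minimal sketch of LoRA's parameter savings, assuming a square
# d x d weight matrix and a rank-r adapter (both values hypothetical).

def full_param_count(d_in: int, d_out: int) -> int:
    """Trainable parameters if the weight matrix W itself is updated."""
    return d_in * d_out

def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters for a rank-r LoRA adapter:
    A is (r x d_in) and B is (d_out x r), so r * (d_in + d_out)."""
    return r * (d_in + d_out)

if __name__ == "__main__":
    d = 4096  # assumed hidden size, typical for a 7B-class model
    r = 16    # assumed LoRA rank; the real value was picked by ablation
    full = full_param_count(d, d)
    lora = lora_param_count(d, d, r)
    # For one 4096x4096 matrix: 16,777,216 vs 131,072 trainable
    # parameters, i.e. the adapter is under 1% of the full update.
    print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.2%}")
```

The same per-matrix saving applies across every attention and projection layer the adapter targets, which is what keeps optimizer state and gradients within a single GPU's memory.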
Stack
- LLaVA
- LoRA
- PyTorch
- VisualWebBench