From Sensing to Reasoning

This work was conducted during my internship at Lawrence Livermore National Laboratory (LLNL) and was selected as a Top Presenter. It was highlighted by LLNL Top News and Purdue ECE Top News; see also the Top Presenter feature. A manuscript is currently in preparation for submission to Digital Discovery. Link to the Poster

Abstract

We evaluate multimodal large language models (LLMs) as protocol-aware “reasoning copilots” for self-driving laboratories (SDLs). Open-source model families (e.g., Llama, Granite, Gemma, Hermes, LLaVA) and proprietary GPT models are benchmarked across image-based readiness checks, standard lab tasks, infeasible actions, and adversarial instructions. GPT models lead on perception, accurately detecting transparent vessels and counting objects, yet no model exceeds 80% overall accuracy under protocol and safety constraints; in several real-world reasoning scenarios, compact open-source models (2–3B parameters) match or surpass GPT performance. These results reveal persistent gaps in fusing multimodal signals with SOP semantics and in reliable, real-time decision-making. We propose a practical path forward: protocol-aware prompting, rigorous safety stress tests, action logging, and closed-loop evaluation, positioning LLMs as assistive automators with expert fallbacks, rather than autonomous controllers, to accelerate experimental science safely and effectively.