From Sensing to Reasoning
This work was conducted during my internship at Lawrence Livermore National Laboratory (LLNL) and was selected as a Top Presenter. It was highlighted by LLNL Top News and Purdue ECE Top News; see also the Top Presenter feature. A manuscript is in preparation for submission to Digital Discovery. Link to the Poster
Abstract
We evaluate multimodal large language models (LLMs) as protocol-aware "reasoning copilots" for self-driving laboratories (SDLs). Open-source families (e.g., Llama, Granite, Gemma, Hermes, LLaVA) and proprietary GPT models are benchmarked across image-based readiness checks, standard lab tasks, infeasible actions, and adversarial instructions. GPT models lead on perception, accurately detecting transparent vessels and counting objects, but no model exceeds 80% overall accuracy under protocol and safety constraints; in several real-world reasoning scenarios, compact open-source models (2–3B parameters) match or surpass GPT performance. These results reveal persistent gaps in fusing multimodal signals with SOP semantics and in reliable, real-time decision-making. We propose a practical path forward: protocol-aware prompting, rigorous safety stress-tests, action logging, and closed-loop evaluation, positioning LLMs as assistive automators with expert fallbacks, rather than autonomous controllers, to accelerate experimental science safely and effectively.