Overview

Recent years have seen significant progress in robot learning, resulting in robot policies that are more reliable and can be deployed across many scenes, tasks, and robot embodiments. This increase in capability calls for rethinking the robot development lifecycle of design, evaluation, and deployment. The traditional lifecycle involves designing a method that raises the evaluation score on a handful of tasks at the researcher's own institution. As robot policies become more capable, a more scalable, comprehensive, and reproducible evaluation framework is needed, and the lifecycle itself should be treated as a first-class problem alongside policy design. This workshop aims to address this gap by opening discussion on:

  • What are good evaluation protocols and methods for robot learning?
  • How can we make robot evaluation more reproducible and scalable, and less expensive?
  • How do we monitor robot status during deployment and ensure safety and performance?
  • How can research on safety and evaluation outside of robotics inspire that of robotics?

Schedule

Speakers

Panelists

Accepted Presentations

  • VLA-OS: Structuring and Dissecting Planning Representations and Paradigms in Vision-Language-Action Models
    Chongkai Gao, Zixuan Liu, Zhenghao Chi, Junshan Huang, Xin Fei, Yiwen Hou, Yuxuan Zhang, Yudi Lin, Zhirui Fang, Zeyu Jiang, Lin Shao
  • Test-Time Scaling of Vision-Language-Action Models via Self-Certainty
    Xu Luo, Jiaying Yang, Zehang Bai, Junlin Xie, Ji Zhang, Lianli Gao, Jingkuan Song
  • Evaluating Manipulation Policies in Clutter
    Amir Rasouli, Montgomery Alban
  • RoboEval: Where Robotic Manipulation Meets Structured and Scalable Evaluation
    Yi Ru Wang, Carter Ung, Grant Tannert, Jiafei Duan, Josephine Li, Amy Le, Markus Grotz, Rishabh Oswal, Wilbert Pumacay, Yuquan Deng, Ranjay Krishna, Dieter Fox, Siddhartha Srinivasa
  • Identity-Conditioned Preference-Aware Table Tidying with LLM-in-the-Loop
    Bojun Long, Zhenhao Guo, Fan Zhu
  • Occlusion-robust Pose Estimation for Multi-Robot Systems via Geometric-aware Diffusion Matching
    Suyoung Kang, Rishav Dutta, Peng Gao, Hao Zhang
  • Can We Detect Failures Without Failure Data? Uncertainty-Aware Runtime Failure Detection for Imitation Learning Policies
    Chen Xu, Tony Khuong Nguyen, Emma Dixon, Christopher Rodriguez, Patrick Miller, Robert Lee, Paarth Shah, Rares Andrei Ambrus, Haruki Nishimura, Masha Itkina
  • AURA: Autonomous Upskilling with Retrieval-Augmented Agents
    Alvin Zhu, Yusuke Tanaka, Andrew Goldberg, Dennis Hong
  • N2M: Bridging Navigation and Manipulation by Learning Pose Preference from Rollout
    Kaixin Chai, Hyunjun Lee, Joseph J Lim
  • Reliable and Scalable Robot Policy Evaluation with Imperfect Simulators
    Apurva Badithela, David Snyder, Lihan Zha, Joseph Mikhail, Matthew O'Kelly, Anushri Dixit, Anirudha Majumdar
  • Score the Steps, Not Just the Goal: VLM-Based Subgoal Evaluation for Robotic Manipulation
    Ramy ElMallah, Krish Chhajer, Chi-Guhn Lee
  • Benchmarking Affordance Generalization with BusyBox
    Dean Fortier, Timothy Adamson, Tess Hellebrekers, Teresa LaScala, Kofi Ennin, Michael Murray, Andrey Kolobov, Galen Mullins
  • SPUR: Scaling Reward Learning from Human Demonstrations
    Anthony Liang, Yigit Korkmaz, Jiahui Zhang, Jesse Zhang, Abrar Anwar, Sidhant Kaushik, Yufei Wang, Yu Xiang, David Held, Dieter Fox, Abhishek Gupta, Stephen Tu, Erdem Biyik

Organizers

Advisory Committee

Contact

Please send any questions by email to abrar.anwar@usc.edu