Recent years have seen significant progress in robot learning, resulting in robot policies that are more reliable and can be deployed across many scenes, tasks, and robot embodiments. This increase in capability calls for rethinking the robot development lifecycle of design, evaluation, and deployment. The traditional lifecycle involves designing a method to raise the evaluation score on a handful of tasks at the researcher's own institution. As policies grow more capable, a more scalable, comprehensive, and reproducible evaluation framework is needed, and the lifecycle itself should be treated as a first-class problem alongside policy design. This workshop aims to address this gap by opening discussion on: