Zhutian (Skye) Yang

(PhD Thesis) Learning to Solve Long-Horizon Manipulation Problems

Zhutian Yang

Thesis Committee: Leslie Pack Kaelbling, Tomás Lozano-Pérez, Caelan Reed Garrett, Danfei Xu

Paper | Talk

Guiding Long-Horizon Task and Motion Planning with Vision Language Models

Zhutian Yang, Caelan Reed Garrett, Leslie Pack Kaelbling, Tomás Lozano-Pérez, and Dieter Fox

ICRA 2025; CoRL 2024 LangRob Workshop (Spotlight)

TLDR: Pretrained VLMs make mistakes when predicting robot actions from open-ended language goals, so we use VLMs to break long-horizon goals down into subgoals, which are then solved by TAMP in an iterative replanning system. The system solves problems that involve interactions with 20+ objects and require 30-50 actions to complete.

Paper | Project Page | Code | Bibtex | Talk

Combining Planning and Diffusion for Mobility with Unknown Dynamics

Yajvan Ravan, Zhutian Yang, Tao Chen, Leslie Pack Kaelbling, and Tomás Lozano-Pérez

In Submission

TLDR: Rearranging large objects with unpredictable dynamics, such as a chair, is hard because the relative pose between the robot and the object keeps changing. Diffusion policies that output global robot configurations struggle to generalize to new initial and goal conditions, or to new environments and objects. So we use motion planning to generate waypoints that guide a local diffusion policy, which is trained to achieve relative movements of the chair.

Paper | Project Page | Bibtex

Compositional Diffusion-Based Continuous Constraint Solvers

Zhutian Yang, Jiayuan Mao, Yilun Du, Jiajun Wu, Joshua Brett Tenenbaum, Tomás Lozano-Pérez, and Leslie Pack Kaelbling

CoRL 2023

TLDR: Multi-step manipulation problems involve many collision-avoidance, physical-stability, and culture-defined spatial constraints. Conventional methods usually solve them by sampling followed by rejection, which is too slow. Instead, we find global solutions through diffusion-based optimization, using diffusion models trained for each constraint type.

Sequence-Based Plan Feasibility Prediction for Efficient Task and Motion Planning

Zhutian Yang, Caelan Reed Garrett, Leslie Pack Kaelbling, Tomás Lozano-Pérez, and Dieter Fox

RSS 2023

TLDR: In long-horizon mobile manipulation problems set in complex environments with many articulated and movable obstacles, task and motion planners spend most of their computation on motion planning problems that have no solution. So we train a plan feasibility prediction model that quickly sorts candidate plans by their likelihood of success, using visual and language features of the problem, which cuts planning time by 50-80%.

🔥 We won Best Paper Runner-Up at the CoRL 2022 Workshop on Learning, Perception, and Abstraction for Long-Horizon Planning

Let’s Handle It: Generalizable Manipulation of Articulated Objects

Zhutian Yang and Aidan Curtis

ICLR 2022 Workshop on Generalizable Policy Learning in the Physical World (Spotlight)

🔥 We won 2nd place in the ManiSkill Challenge 2022 Robotics Track

Paper

Flexibly Instructable Robots

Zhutian Yang, Patrick Henry Winston, and David Hsu

Undergraduate thesis work; also appeared in Advances in Cognitive Systems 2019 and DSpace@MIT

Paper | Video Demo
