Tong Zhang 张彤

I am a Ph.D. student at the Institute for Interdisciplinary Information Sciences (IIIS, headed by Turing Award winner Prof. Andrew Chi-Chih Yao), Tsinghua University. I am fortunate to be advised by Prof. Yang Gao. Previously, I received my bachelor's degree from the Department of Electronic Engineering at Tsinghua University.

My primary research interest is Embodied AI, which lies at the intersection of Artificial Intelligence and Robotics. I focus on topics such as robotic manipulation, 3D vision, representation learning, and sample efficiency. Recently, I have become particularly interested in humanoid whole-body control.

Email  /  Google Scholar  /  GitHub

News

  • [2024.09] Three papers (SGRv2, RLFP, and General Flow) were accepted at CoRL 2024.
  • [2023.12] Invited oral presentation at DAI 2023.
  • [2023.10] Invited talk at RLChina.
  • [2023.08] One paper (SGR) was accepted at CoRL 2023.
  • [2023.04] One paper (SGN) was accepted at the CVPR 2023 Workshop on 3D Vision and Robotics.

Publications

    Leveraging Locality to Boost Sample Efficiency in Robotic Manipulation
    Tong Zhang, Yingdong Hu, Jiacheng You, Yang Gao
    CoRL, 2024
    project page / arXiv / code / X summary

    We introduce SGRv2, an imitation learning framework that enhances sample efficiency through improved visual and action representations. Central to the design of SGRv2 is the incorporation of a critical inductive bias: action locality, which posits that a robot's actions are predominantly influenced by the target object and its interactions with the local environment.

    Reinforcement Learning with Foundation Priors: Let the Embodied Agent Efficiently Learn on Its Own
    Weirui Ye, Yunsheng Zhang, Haoyang Weng, Xianfan Gu, Shengjie Wang, Tong Zhang, Mengchen Wang, Pieter Abbeel, Yang Gao
    CoRL, 2024 (Oral Presentation)

    We propose Reinforcement Learning with Foundation Priors (RLFP) to utilize guidance and feedback from policy, value, and success-reward foundation models. Within this framework, we introduce the Foundation-guided Actor-Critic (FAC) algorithm, which enables embodied agents to explore more efficiently with automatic reward functions.

    General Flow as Foundation Affordance for Scalable Robot Learning
    Chengbo Yuan, Chuan Wen, Tong Zhang, Yang Gao
    CoRL, 2024
    project page / arXiv / code

    We build a 3D flow prediction model directly from large-scale RGBD human video datasets. Based on this model, we achieve stable zero-shot human-to-robot skill transfer in the real world.

    Look Before You Leap: Unveiling the Power of GPT-4V in Robotic Vision-Language Planning
    Yingdong Hu*, Fanqi Lin*, Tong Zhang, Li Yi, Yang Gao
    arXiv, 2023
    project page / arXiv

    We introduce ViLa, a novel approach for long-horizon robotic planning that leverages GPT-4V to generate a sequence of actionable steps. ViLa empowers robots to execute complex tasks with a profound understanding of the visual world.

    A Universal Semantic-Geometric Representation for Robotic Manipulation
    Tong Zhang*, Yingdong Hu*, Hanchen Cui, Hang Zhao, Yang Gao
    CoRL, 2023
    CVPR Workshop on 3D Vision and Robotics, 2023
    project page / arXiv / code

    We present Semantic-Geometric Representation (SGR), a universal perception module for robotics that leverages the rich semantic information of large-scale pre-trained 2D models and inherits the merits of 3D spatial reasoning.