Yi Wei

I am a fifth-year PhD student in the Intelligent Vision Group (IVG), Department of Automation, Tsinghua University, advised by Prof. Jiwen Lu. My research interests lie in 3D vision, with a focus on 3D scene understanding and 3D reconstruction. I hope my research can benefit industrial applications.

Prior to that, I received my Bachelor's degree from the Department of Electronic Engineering, Tsinghua University, in 2019 (ranked 6/245). I have also spent time at DeePhi Tech (Xilinx), SenseTime, Microsoft Research Asia, XPeng, ByteDance, PhiGent Robotics, Gaussian Robotics, and Apple.

Email / Google Scholar / GitHub / Twitter / Curriculum Vitae


  • 2024-02: One paper on 3D AIGC is accepted to CVPR 2024.
  • 2023-07: Two papers on occupancy prediction are accepted to ICCV 2023.
  • 2023-07: The journal version of PV-RAFT is accepted to T-PAMI.
  • 2023-04: The journal version of NerfingMVS is accepted to T-PAMI.
  • 2023-03: I am a recipient of the 2023 Apple Scholars in AI/ML PhD fellowship.
  • 2022-09: One paper on self-supervised multi-camera depth estimation is accepted to CoRL 2022.
  • 2022-07: One paper on LiDAR-based 3D object detection is accepted to ECCV 2022.
  • 2022-06: One paper on robotic exploration is accepted to IROS 2022.
  • 2021-07: Three papers (including 1 oral) on NeRF, depth estimation and 3D pretraining are accepted to ICCV 2021.
  • 2021-03: One paper on 3D scene flow estimation is accepted to CVPR 2021.
  • 2021-03: One paper on weakly supervised 3D detection is accepted to ICRA 2021.

Selected Publications

    * indicates equal contribution

    OccNeRF: Self-Supervised Multi-Camera Occupancy Prediction with Neural Radiance Fields
    Chubin Zhang*, Juncheng Yan*, Yi Wei*, Jiaxin Li, Li Liu, Yansong Tang, Yueqi Duan, Jiwen Lu
    arXiv, 2023
    [Project page] [arXiv] [Code]

    We propose OccNeRF, a method for self-supervised multi-camera occupancy prediction that adopts parameterized occupancy fields, a multi-frame photometric loss, and open-vocabulary 2D segmentation.

    Sherpa3D: Boosting High-Fidelity Text-to-3D Generation via Coarse 3D Prior
    Fangfu Liu, Diankun Wu, Yi Wei, Yongming Rao, Yueqi Duan
    IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024
    [Project page] [arXiv] [Code]

    We propose Sherpa3D, a new text-to-3D framework that achieves high fidelity, generalizability, and geometric consistency simultaneously.

    SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving
    Yi Wei*, Linqing Zhao*, Wenzhao Zheng, Zheng Zhu, Jie Zhou, Jiwen Lu
    IEEE International Conference on Computer Vision (ICCV), 2023
    [Project page] [arXiv] [Code]

    We propose SurroundOcc, a method that predicts volumetric occupancy from multi-camera images and generates dense occupancy ground truth from sparse LiDAR points.

    OpenOccupancy: A Large Scale Benchmark for Surrounding Semantic Occupancy Perception
    Xiaofeng Wang*, Zheng Zhu*, Wenbo Xu*, Yunpeng Zhang, Yi Wei, Xu Chi, Yun Ye, Dalong Du, Jiwen Lu, Xingang Wang
    IEEE International Conference on Computer Vision (ICCV), 2023
    [arXiv] [Code]

    Toward comprehensive benchmarking of surrounding perception algorithms, we propose OpenOccupancy, the first surrounding semantic occupancy perception benchmark.

    3D Point-Voxel Correlation Fields for Scene Flow Estimation
    Ziyi Wang*, Yi Wei*, Yongming Rao, Jie Zhou, Jiwen Lu
    IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI, IF: 24.31), 2023
    [Paper] [Code]

    We propose Deformable PV-RAFT, where the Spatial Deformation deforms the voxelized neighborhood, and the Temporal Deformation controls the iterative update process.

    Depth-Guided Optimization of Neural Radiance Fields for Indoor Multi-View Stereo
    Yi Wei, Shaohui Liu, Jie Zhou, Jiwen Lu
    IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI, IF: 24.31), 2023
    [Paper] [Code]

    Beyond NerfingMVS, we further present NerfingMVS++, which introduces a coarse-to-fine depth-prior training strategy to directly utilize sparse SfM points, and replaces uniform sampling with Gaussian sampling to boost performance.

    LiDAR Distillation: Bridging the Beam-Induced Domain Gap for 3D Object Detection
    Yi Wei, Zibu Wei, Yongming Rao, Jiaxin Li, Jiwen Lu, Jie Zhou
    European Conference on Computer Vision (ECCV), 2022
    [arXiv] [Code] [Chinese explainer]

    We propose LiDAR Distillation to bridge the domain gap induced by different LiDAR beams for 3D object detection.

    SurroundDepth: Entangling Surrounding Views for Self-Supervised Multi-Camera Depth Estimation
    Yi Wei*, Linqing Zhao*, Wenzhao Zheng, Zheng Zhu, Yongming Rao, Guan Huang, Jiwen Lu, Jie Zhou
    Conference on Robot Learning (CoRL), 2022
    [Project page] [arXiv] [Code] [Chinese explainer]

    We propose SurroundDepth, a method that incorporates information from multiple surrounding views to predict scale-aware depth maps across cameras.

    NerfingMVS: Guided Optimization of Neural Radiance Fields for Indoor Multi-view Stereo
    Yi Wei, Shaohui Liu, Yongming Rao, Wang Zhao, Jiwen Lu, Jie Zhou
    IEEE International Conference on Computer Vision (ICCV), 2021, Oral Presentation
    [Project page] [arXiv] [Code] [Video] [Chinese explainer]

    We present a new multi-view depth estimation method that utilizes both conventional SfM reconstruction and learning-based priors over the recently proposed neural radiance fields (NeRF).

    A Confidence-based Iterative Solver of Depths and Surface Normals for Deep Multi-view Stereo
    Wang Zhao*, Shaohui Liu*, Yi Wei, Hengkai Guo, Yong-jin Liu
    IEEE International Conference on Computer Vision (ICCV), 2021
    [Project page] [arXiv] [Code]

    We propose a novel solver that iteratively solves for per-view depth maps and normal maps by optimizing an energy potential based on the locally planar assumption.

    PV-RAFT: Point-Voxel Correlation Fields for Scene Flow Estimation of Point Clouds
    Yi Wei*, Ziyi Wang*, Yongming Rao*, Jiwen Lu, Jie Zhou
    IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021
    [arXiv] [Code] [Video]

    We present point-voxel correlation fields for 3D scene flow estimation, which carry over the high performance of RAFT and provide a way to build structured all-pairs correlation fields for unstructured point clouds.

    FGR: Frustum-Aware Geometric Reasoning for Weakly Supervised 3D Vehicle Detection
    Yi Wei, Shang Su, Jiwen Lu, Jie Zhou
    IEEE International Conference on Robotics and Automation (ICRA), 2021
    [arXiv] [Code] [Video]

    We propose a weakly supervised 3D detection method that uses no 3D labels and consists of two stages: coarse 3D segmentation and 3D bounding box estimation.

    Conditional Single-view Shape Generation for Multi-view Stereo Reconstruction
    Yi Wei*, Shaohui Liu*, Wang Zhao*, Jiwen Lu, Jie Zhou
    IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019
    [Project] [arXiv] [Code]

    We present a new perspective on image-based shape generation. Unlike most single-view methods, which are sometimes insufficient to determine a single ground-truth shape because the back part is occluded, our method leverages multi-view consistency for 3D reconstruction.

    Quantization Mimic: Towards Very Tiny CNN for Object Detection
    Yi Wei, Xinyu Pan, Hongwei Qin, Junjie Yan
    European Conference on Computer Vision (ECCV), 2018

    We propose a simple and general framework for training very tiny CNNs for object detection. Our method leverages the fact that mimic and quantization can facilitate each other.

    Two-Stream Binocular Network: Accurate Near Field Finger Detection Based on Binocular Images
    Yi Wei, Guijin Wang, Cairong Zhang, Hengkai Guo, Xinghao Chen, Huazhong Yang
    IEEE Visual Communications and Image Processing (VCIP), 2017 (Best Student Paper Award)

    We propose the Two-Stream Binocular Network (TSBnet) to detect fingertips from binocular images. Unlike previous depth-based methods, we directly regress the 3D positions of fingertips from left and right images.

    Apple
    AI/ML Group, Research Intern
    Topic: 3D AIGC
    Gaussian Robotics
    Gaussian-Tsinghua Joint Laboratory, Project Leader
    Topic: Sensor calibration, Drivable space detection, LiDAR-based 3D object detection, Depth estimation, 3D reconstruction
    ByteDance
    SLAM & 3D Vision Group, Engineer & Research Intern
    Topic: Sky AR, Advertisement AR, Self-supervised depth estimation, Plane-assisted multi-view stereo, Multiple plane detection
    XPeng
    LiDAR Group, Engineer Intern
    Topic: LiDAR-based 3D object detection, LiDAR-based model quantization
    MSRA
    Intelligent Multimedia Group, Research Intern
    Topic: Multi-view hand pose estimation
    SenseTime
    Video Intelligence Group, Engineer & Research Intern
    Topic: Model compression
    DeePhi
    Engineer Intern
    Topic: Real-time object detection

Honors and Awards

  • 2023 Apple Scholar / 苹果学者奖学金 (22 people in the world, 2 people in China)
  • 2023 Ubiquant Scholar / 九坤奖学金
  • 2021 National Scholarship / 国家奖学金
  • 2019 Beijing Outstanding Graduate / 北京市优秀毕业生
  • 2018 Caixiong Scholarship / 清华科创类专项奖 (10 people in Tsinghua)
  • 2018 Baogang Outstanding Scholarship / 宝钢优秀学生特等奖 (1 person in Tsinghua)
  • 2017 National Scholarship / 国家奖学金
  • 2017 Qualcomm Scholarship / 高通奖学金 (30 people in Tsinghua)
  • 2017 SenseTime Scholarship / 商汤奖学金 (30 people in China)

Academic Services

  • Conference Reviewer / Program Committee Member: CVPR 2024, ICCV 2023, ICRA 2023, ECCV 2022, CVPR 2022, ICCV 2021, CVPR 2021, ICIP 2021, WACV 2021, ACCV 2020, CVPR 2020, ICIP 2019
  • Journal Reviewer: T-PAMI, T-IP, T-MM, T-CSVT

    © Yi Wei | Last updated: Dec 21, 2023