News

  • Three papers have been accepted to CVPR 2025.
  • Our text-to-image model Infinity, based on VAR, is now available with code and a demo.
  • Our paper VAR wins the Best Paper Award at NeurIPS 2024.
  • We propose a new image generation paradigm, VAR | Report.
  • 15 papers were accepted in 2023 (PAMI 3, CVPR 4, ICCV 3, ICLR 2, NeurIPS 1, and 2 papers in SIGIR and ACM MM).
  • Our ByteTrack ranks 1st among the most influential papers of ECCV 2022.
  • 15 papers were accepted in 2022 (TIP 2, CVPR 3, ECCV 5, NeurIPS 3, and 2 papers in ICLR and AAAI).
  • Our global team wins first prize in the Trusted Media Challenge (TMC) on combating deepfakes [report].
  • Sparse R-CNN is accepted by CVPR 2021 and integrated by Detectron2, MMDetection and PaddlePaddle.
  • Our paper Controllable Orthogonalization is selected as a Best Paper Award nominee at CVPR 2020.

Selected Publications [Full List]

    HumanDiT: Pose-Guided Diffusion Transformer for Long-form Human Motion Video Generation

    Qijun Gan, Yi Ren, Chen Zhang, Zhenhui Ye, Pan Xie, Xiang Yin, Zehuan Yuan, Bingyue Peng and Jianke Zhu

    [arxiv paper] [project]

    HLLM: Enhancing sequential recommendations via hierarchical large language models for item and user modeling

    Junyi Chen, Lu Chi, Bingyue Peng, Zehuan Yuan

    [arxiv paper] [code]

    Infinity: Scaling bitwise autoregressive modeling for high-resolution image synthesis

    Jian Han, Jinlai Liu, Yi Jiang, Bin Yan, Yuqi Zhang, Zehuan Yuan, Bingyue Peng, Xiaobing Liu

    CVPR 2025 [Official VAR Text2Image Model]

    [paper] [project]

    Goku: Flow Based Video Generative Foundation Models

    Shoufa Chen, Chongjian Ge, Yuqi Zhang, Yida Zhang, Fengda Zhu, Hao Yang, Hongxiang Hao, Hui Wu, Zhichao Lai, Yifei Hu, Ting-Che Lin, Shilong Zhang, Fu Li, Chuan Li, Xing Wang, Yanghua Peng, Peize Sun, Ping Luo, Yi Jiang, Zehuan Yuan, Bingyue Peng, Xiaobing Liu

    CVPR 2025

    [paper] [project]

    TokenFlow: Unified image tokenizer for multimodal understanding and generation

    Liao Qu, Huichao Zhang, Yiheng Liu, Xu Wang, Yi Jiang, Yiming Gao, Hu Ye, Daniel K Du, Zehuan Yuan, Xinglong Wu

    CVPR 2025

    [paper] [code]

    Visual autoregressive modeling: Scalable image generation via next-scale prediction

    Keyu Tian, Yi Jiang, Zehuan Yuan, Bingyue Peng, Liwei Wang

    NeurIPS 2024 [Best Paper Award]

    [paper] [code]

    Autoregressive model beats diffusion: LLaMA for scalable image generation

    Peize Sun, Yi Jiang, Shoufa Chen, Shilong Zhang, Bingyue Peng, Ping Luo, Zehuan Yuan

    [arxiv paper] [code]

    Groma: Localized visual tokenization for grounding multimodal large language models

    Chuofan Ma, Yi Jiang, Jiannan Wu, Zehuan Yuan, Xiaojuan Qi

    ECCV 2024

    [paper] [code]

    Sparse R-CNN: End-to-end object detection with learnable proposals

    Peize Sun, Rufeng Zhang, Yi Jiang, Tao Kong, Chenfeng Xu, Wei Zhan, Masayoshi Tomizuka, Lei Li, Zehuan Yuan, Changhu Wang, Ping Luo

    TPAMI 2023

    [paper] [code]

    ByteTrack: Multi-object tracking by associating every detection box

    Yifu Zhang, Peize Sun, Yi Jiang, Dongdong Yu, Fucheng Weng, Zehuan Yuan, Ping Luo, Wenyu Liu, Xinggang Wang

    ECCV 2022 [Most influential paper of ECCV 2022, ranked 1st]

    [paper] [code]

    TransTrack: Multiple object tracking with transformer

    Peize Sun, Jinkun Cao, Yi Jiang, Rufeng Zhang, Enze Xie, Zehuan Yuan, Changhu Wang, Ping Luo

    [arxiv paper] [code]

    Focal and global knowledge distillation for detectors

    Zhendong Yang, Zhe Li, Xiaohu Jiang, Yuan Gong, Zehuan Yuan, Danpei Zhao, Chun Yuan

    CVPR 2022

    [paper]