News

  • We open-sourced Bernini, a unified visual generation and editing model with performance on par with leading closed-source models.
  • We introduce GRN, a novel visual generation paradigm.
  • We release Waver, a strong video generation model that ranks in the top 3 in the Artificial Analysis Arena.
  • Our text-to-image model Infinity, based on VAR, is now available with code and a demo.
  • Our paper VAR wins the Best Paper Award at NeurIPS 2024.
  • We propose VAR, a new image generation paradigm [report]
  • 15 papers were accepted in 2023 (PAMI 3, CVPR 4, ICCV 3, ICLR 2, NeurIPS 1, and 2 papers in SIGIR and ACM MM).
  • Our ByteTrack ranks 1st among the most influential papers in ECCV 2022.
  • 15 papers were accepted in 2022 (TIP 2, CVPR 3, ECCV 5, NeurIPS 3, and 2 papers in ICLR and AAAI).
  • Our global team wins the first prize in the Trusted Media Challenge (TMC) combatting deepfakes [report]
  • Sparse R-CNN is accepted to CVPR 2021 and integrated into Detectron2, MMDetection, and PaddlePaddle.
  • Our paper Controllable Orthogonalization was selected as a Best Paper Award candidate at CVPR 2020.

Selected Publications [Full List]

    Bernini: Latent Semantic Planning for Video Diffusion

    Chenchen Liu, Junyi Chen, Lei Li, Lu Chi, Mingzhen Sun, Zhuoying Li, Yi Fu, Ruoyu Guo, Yiheng Wu, Ge Bai, Zehuan Yuan

    State-of-the-art open-source models for multimodal generation and editing

    [arxiv paper] [project] [models]

    GRN: Generative Refinement Networks for Visual Synthesis

    Jian Han, Jinlai Liu, Jiahuan Wang, Bingyue Peng, Zehuan Yuan

    A novel visual synthesis paradigm to support adaptive generation

    [arxiv paper] [project]

    ALIVE: Animate Your World with Lifelike Audio-Video Generation

    Ying Guo, Qijun Gan, Yifu Zhang, Jinlai Liu, Yifei Hu, Pan Xie, Dongjun Qian, Yu Zhang, Ruiqi Li, Yuqi Zhang, Ruibiao Lu, Xiaofeng Mei, Bo Han, Xiang Yin, Bingyue Peng, Zehuan Yuan

    [arxiv paper] [project]

    Waver: Wave Your Way to Lifelike Video Generation

    Yifu Zhang, Hao Yang, Yuqi Zhang, Yifei Hu, Fengda Zhu, Chuang Lin, Xiaofeng Mei, Yi Jiang, Bingyue Peng, Zehuan Yuan

    Ranks in the top 3 on the T2V & I2V arena leaderboards

    [arxiv paper] [project]

    HLLM: Enhancing sequential recommendations via hierarchical large language models for item and user modeling

    Junyi Chen, Lu Chi, Bingyue Peng, Zehuan Yuan

    [arxiv paper] [code]

    Infinity: Scaling bitwise autoregressive modeling for high-resolution image synthesis

    Jian Han, Jinlai Liu, Yi Jiang, Bin Yan, Yuqi Zhang, Zehuan Yuan, Bingyue Peng, Xiaobing Liu

    CVPR 2025 [Official VAR Text2Image Model]

    [paper] [project]

    Goku: Flow Based Video Generative Foundation Models

    Shoufa Chen, Chongjian Ge, Yuqi Zhang, Yida Zhang, Fengda Zhu, Hao Yang, Hongxiang Hao, Hui Wu, Zhichao Lai, Yifei Hu, Ting-Che Lin, Shilong Zhang, Fu Li, Chuan Li, Xing Wang, Yanghua Peng, Peize Sun, Ping Luo, Yi Jiang, Zehuan Yuan, Bingyue Peng, Xiaobing Liu

    CVPR 2025

    [paper] [project]

    TokenFlow: Unified image tokenizer for multimodal understanding and generation

    Liao Qu, Huichao Zhang, Yiheng Liu, Xu Wang, Yi Jiang, Yiming Gao, Hu Ye, Daniel K Du, Zehuan Yuan, Xinglong Wu

    CVPR 2025

    [paper] [code]

    Visual autoregressive modeling: Scalable image generation via next-scale prediction

    Keyu Tian, Yi Jiang, Zehuan Yuan, Bingyue Peng, Liwei Wang

    NeurIPS 2024 [Best Paper Award]

    [paper] [code]

    Autoregressive model beats diffusion: Llama for scalable image generation

    Peize Sun, Yi Jiang, Shoufa Chen, Shilong Zhang, Bingyue Peng, Ping Luo, Zehuan Yuan

    [arxiv paper] [code]

    Groma: Localized visual tokenization for grounding multimodal large language models

    Chuofan Ma, Yi Jiang, Jiannan Wu, Zehuan Yuan, Xiaojuan Qi

    ECCV 2024

    [paper] [code]

    Sparse R-CNN: End-to-end object detection with learnable proposals

    Peize Sun, Rufeng Zhang, Yi Jiang, Tao Kong, Chenfeng Xu, Wei Zhan, Masayoshi Tomizuka, Lei Li, Zehuan Yuan, Changhu Wang, Ping Luo

    TPAMI 2023

    [paper] [code]

    ByteTrack: Multi-object tracking by associating every detection box

    Yifu Zhang, Peize Sun, Yi Jiang, Dongdong Yu, Fucheng Weng, Zehuan Yuan, Ping Luo, Wenyu Liu, Xinggang Wang

    ECCV 2022 [Most influential paper in ECCV 2022 (rank 1st)]

    [paper] [code]