Peilin Cai’s Personal Website

Last updated: November 30, 2025

About

I am a master's student researcher focused on computer vision (CV), large language models (LLMs), and multimodal generation. At USC’s Graphics & Vision Lab (GVL; advisor: Prof. Yue Wang), my work centers on 3D reconstruction under sparse observations, controllable generative rendering, and embodied navigation.

More broadly, my research is driven by a simple goal: to enable models that both perceive and generate the real world in a scalable, physically grounded way. From structured 3D reconstruction during my undergraduate years at Wuhan University to my current work on large language models, world models, and embodied agents at USC, a recurring theme has been bridging raw sensory data with structured representations of scenes, actions, and goals. Looking ahead to a PhD, I hope to develop unified models of perception and generation for lifelong scene understanding and robust, context-aware agents.

I carried out two research projects of great personal significance at Prof. Yue Zhao’s FORTIS Lab: SecDOOD (ICCV 2025 Poster) and PERSONABENCH (NeurIPS 2025 MTI-LLM Workshop Spotlight). The former proposes a secure on-device OOD detection framework that requires no gradient backpropagation, offering insights for deploying personalized large models on edge devices; the latter introduces the first benchmark for evaluating the personalization capabilities of LLMs in multi-turn conversational settings. I am deeply grateful to Prof. Yue Zhao and the senior PhD students in the lab for their support.

At GVL, I developed The Earth Simulator, a street-view world model that turns a handful of raw, pose-free images into long-horizon, camera-controllable exploration videos grounded in 3D geometry. By combining a persistent 3D Gaussian spatial memory with a generative video model, we aim to preserve real-world structure while achieving photorealistic, temporally stable rollouts from sparse, in-the-wild driving footage rather than costly calibrated data collection. Source code and preprints are on the way!

I have strong coding skills, a solid background in computer vision and natural language processing, and extensive experience training, deploying, and running inference with LLMs and VLMs. I am also honing my robotics research skills at GVL. If you are interested in collaborating, please feel free to reach out. My preferred email is peilinca@usc.edu.

Current State:

In my third semester of the M.S. in Computer Science program at the University of Southern California.
Currently working on two projects that will be submitted for publication.
Seeking PhD opportunities.

Publications

In Submission: The Earth Simulator: Street View World Modeling with 3D Gaussian Memory and Camera Control; Peilin Cai, Weiduo Yuan, Sicheng He, Cho-Ying Wu, David Paz, Hengyuan Zhang, Yuliang Guo, Xinyu Huang, Liu Ren, Jiageng Mao, Yue Wang; in Submission, 2025

ICCV 2025 Poster: Secure On-Device Video OOD Detection Without Backpropagation; Shawn Li, Peilin Cai, Yuxiao Zhou, Zhiyu Ni, Renjie Liang, You Qin, Yi Nian, Zhengzhong Tu, Xiyang Hu, Yue Zhao; in International Conference on Computer Vision, 2025

NeurIPS 2025 MTI-LLM Workshop Spotlight (Top 5%): A Personalized Conversational Benchmark: Towards Simulating Personalized Conversations; Li Li, Peilin Cai, Ryan A. Rossi, Franck Dernoncourt, Branislav Kveton, Junda Wu, Tong Yu, Linxin Song, Tiankai Yang, Yuehan Qin, Nesreen K. Ahmed, Samyadeep Basu, Subhojyoti Mukherjee, Ruiyi Zhang, Zhengmian Hu, Bo Ni, Yuxiao Zhou, Zichao Wang, Yue Huang, Yu Wang, Xiangliang Zhang, Philip S. Yu, Xiyang Hu, Yue Zhao; arXiv preprint, 2025

CV

English CV | Chinese CV (中文简历)