
Junyi Wu 吴俊逸

I am a junior undergraduate student (2023.9 - Present) in the Department of EE, Shanghai Jiao Tong University (SJTU). I have worked closely with Prof. Guohao Dai and Prof. Yulun Zhang. My research interests lie in the acceleration and compression of LLM, DiT, and dLLM models through techniques such as CUDA kernel optimization, quantization, and tailored algorithmic methods.

Outside of my primary research, I am interested in trading and am fascinated by both quantitative and subjective trading. I have interned at hedge funds, doing quantitative research.

Please feel free to reach out via email or WeChat if you would like to connect further.

News

Sep 02, 2025 Our paper BalanceGS was accepted by ASP-DAC 2026.
Jun 26, 2025 Our paper QuantCache was accepted by ICCV 2025.
Mar 11, 2025 One paper was accepted by ISCA 2025.

Publications

Equal Contribution, * Corresponding Author(s)


  1. ISCA
    SpecEE: Accelerating Large Language Model Inference with Speculative Early Exiting
    Jiaming Xu, Jiayi Pan, Yongkang Zhou, Siming Chen, Jinhao Li, Yaoxiu Lian, Junyi Wu, and Guohao Dai*
    In International Symposium on Computer Architecture, 2025
  2. ICCV
    QuantCache: Adaptive Importance-Guided Quantization with Hierarchical Latent and Layer Caching for Video Generation
    In International Conference on Computer Vision, 2025
  3. ASP-DAC
    BalanceGS: Adaptive Workload Optimization for High-Fidelity 3D Gaussian Splatting on GPU
    In 31st Asia and South Pacific Design Automation Conference, 2026
  4. arXiv
    DVD-Quant: Data-free Video Diffusion Transformers Quantization
    arXiv preprint arXiv:2505.18663, 2025
    Under Review