publications

publications by category in reverse chronological order.

2025

  1. ICLR
    TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention
    Lijie Yang*, Zhihao Zhang*, Zhuofu Chen, Zikun Li, and Zhihao Jia
    To appear at the International Conference on Learning Representations, 2025

2024

  1. LCN
    Blocking-Waived Estimation: Improving the Worst-Case End-To-End Delay Analysis in Switched Ethernet
    Lijie Yang, Théo Docquier, Ludovic Thomas, and Ye-Qiong Song
    In Proceedings of Local Computer Networks, 2024
  2. ICML
    Accelerating Retrieval-augmented Language Model Serving with Speculation
    Zhihao Zhang, Alan Zhu, Lijie Yang, Yihua Xu, Lanting Li, Phitchaya Mangpo Phothilimthana, and Zhihao Jia
    In Proceedings of International Conference on Machine Learning, 2024
  3. ASPLOS
    SpecInfer: Accelerating Large Language Model Serving with Tree-based Speculative Inference and Verification
    Xupeng Miao*, Gabriele Oliaro*, Zhihao Zhang*, Xinhao Cheng*, Zeyu Wang, Zhengxin Zhang, Rae Ying Yee Wong, Alan Zhu, Lijie Yang, and 6 more authors
    In Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3, Apr 2024

2023

  1. HAL
    Technical Report: Worst-case Delay Analysis: a Simulation-based Comparison between Flow Aggregation and CPA
    Lijie Yang
    Jan 2023