publications

Publications by category in reverse chronological order.

2026

Axe: A Simple Unified Layout Abstraction for Machine Learning Compilers

Bohan Hou, Hongyi Jin, Guanjie Wang, Jinqi Chen, Yaxing Cai, Lijie Yang, Zihao Ye, Yaoyao Ding, Ruihang Lai, and 1 more author

2026

CaveAgent: Transforming LLMs into Stateful Runtime Operators

Maohao Ran, Zhenglin Wan, Cooper Lin, Yanting Zhang, Hongyu Xin, Hongwei Fan, Yibo Xu, Beier Luo, Yaxin Zhou, and 13 more authors

2026

2025

arXiv

Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning

Hongyin Luo, Nathaniel Morgan, Tina Li, Derek Zhao, Ai Vy Ngo, Philip Schroeder, Lijie Yang, Assaf Ben-Kish, Jack O’Brien, and 1 more author

2025

arXiv

Less Is More: Training-Free Sparse Attention with Global Locality for Efficient Reasoning

Lijie Yang*, Zhihao Zhang*, Arti Jain, Shijie Cao, Baihong Yuan, Yiwei Chen, Zhihao Jia, and Ravi Netravali

2025

IoT

From Machine Learning-Based to LLM-Enhanced: An Application-Focused Analysis of How Social IoT Benefits from LLMs

Lijie Yang and Runbo Su

IEEE Internet of Things Journal, Apr 2025

ICLR

TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention

Lijie Yang*, Zhihao Zhang*, Zhuofu Chen, Zikun Li, and Zhihao Jia

In Proceedings of the International Conference on Learning Representations, Apr 2025


2024

LCN

Blocking-Waived Estimation: Improving the Worst-Case End-To-End Delay Analysis in Switched Ethernet

Lijie Yang, Théo Docquier, Ludovic Thomas, and Ye-Qiong Song

In Proceedings of Local Computer Networks, Apr 2024

ICML

Accelerating Retrieval-augmented Language Model Serving with Speculation

Zhihao Zhang, Alan Zhu, Lijie Yang, Yihua Xu, Lanting Li, Phitchaya Mangpo Phothilimthana, and Zhihao Jia

In Proceedings of the International Conference on Machine Learning, Apr 2024

ASPLOS

SpecInfer: Accelerating Large Language Model Serving with Tree-based Speculative Inference and Verification

Xupeng Miao*, Gabriele Oliaro*, Zhihao Zhang*, Xinhao Cheng*, Zeyu Wang, Zhengxin Zhang, Rae Ying Yee Wong, Alan Zhu, Lijie Yang, and 6 more authors

In Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3, Apr 2024


2023

HAL

Technical Report: Worst-case Delay Analysis: a Simulation-based Comparison between Flow Aggregation and CPA

Lijie Yang

Jan 2023
