Lijie (Derrick) Yang

I am a CS PhD student at Princeton University, fortunate to be advised by Prof. Ravi Netravali and Prof. Tri Dao. I obtained my bachelor's degree in Computer Science from Carnegie Mellon University, where I was advised by Prof. Zhihao Jia and worked closely with Prof. Tianqi Chen.

Research Interests: My work focuses on building efficient deep learning systems through hardware-software co-design. I’m particularly passionate about exploring the potential of state-of-the-art AI models, such as language models, on reasoning and long-context tasks.

If you’re interested in discussing research, exploring collaboration opportunities, or getting informal advice about PhD applications, I’d love to connect. Don’t hesitate to reach out to me at ly3223@princeton.edu!

news

Mar 06, 2026	My Google Scholar profile has been migrated to this link.
Feb 15, 2026	Axe, a hardware-aware multi-granularity layout abstraction, and CaveAgent, a dual-stream context architecture for agentic workflows, are on arXiv!
Jan 27, 2026	Event Tensor is accepted to MLSys 2026!
Oct 30, 2025	Glad to be selected as a Top Reviewer for NeurIPS 2025!
Aug 13, 2025	Our paper on sparse attention for efficient reasoning, LessIsMore, is on arXiv!
Jul 21, 2025	Graduated from CMU; more than excited to start my PhD at Princeton University!
May 15, 2025	Honored to receive the Allen Newell Award for Research Excellence, Honorable Mention!
Jan 20, 2025	TidalDecode is accepted to ICLR 2025, see you in Singapore!
Nov 14, 2024	Gave a talk at CMU Catalyst Lab on TidalDecode
Oct 08, 2024	Our project on sparse attention for long-context models, TidalDecode, is on arXiv!
Aug 24, 2024	Honored to be an early inductee into Phi Beta Kappa (ΦΒΚ), Class of 2025!
Jul 03, 2024	BWE is accepted to LCN 2024!
May 01, 2024	RalmSpec is accepted to ICML 2024!
Mar 03, 2024	SpecInfer is accepted to ASPLOS 2024!

selected publications

arXiv

Less Is More: Training-Free Sparse Attention with Global Locality for Efficient Reasoning

Lijie Yang*, Zhihao Zhang*, Arti Jain, Shijie Cao, Baihong Yuan, Yiwei Chen, Zhihao Jia, and Ravi Netravali

2025

Code Website
ICLR

TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention

Lijie Yang*, Zhihao Zhang*, Zhuofu Chen, Zikun Li, and Zhihao Jia

In Proceedings of International Conference on Learning Representations, 2025

Code Website
LCN

Blocking-Waived Estimation: Improving the Worst-Case End-To-End Delay Analysis in Switched Ethernet

Lijie Yang, Théo Docquier, Ludovic Thomas, and Ye-Qiong Song

In Proceedings of Local Computer Networks, 2024

Code Website
ICML

Accelerating Retrieval-augmented Language Model Serving with Speculation

Zhihao Zhang, Alan Zhu, Lijie Yang, Yihua Xu, Lanting Li, Phitchaya Mangpo Phothilimthana, and Zhihao Jia

In Proceedings of International Conference on Machine Learning, 2024

Code Website
ASPLOS

SpecInfer: Accelerating Large Language Model Serving with Tree-based Speculative Inference and Verification

Xupeng Miao*, Gabriele Oliaro*, Zhihao Zhang*, Xinhao Cheng*, Zeyu Wang, Zhengxin Zhang, Rae Ying Yee Wong, Alan Zhu, Lijie Yang, and 6 more authors

In Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3, Apr 2024

Code Website