Education
Pre-Doc MS in Computer Science
University of Chicago, Sep. 2024 - Dec. 2025 (expected)
Semester Exchange in Computer Science
EPFL (École Polytechnique Fédérale de Lausanne), Sep. 2023 - Feb. 2024
B.E. in Information Engineering
Shanghai Jiao Tong University, Sep. 2020 - Jun. 2024 · Ranking: top 5%
Publications
LMCache: An Efficient KV Cache Layer for Enterprise-Scale LLM Inference
Yihua Cheng*, Yuhan Liu*, Jiayi Yao*, Yuwei An, Xiaokun Chen, Shaoting Feng, Yuyang Huang, Samuel Shen, Kuntai Du, Junchen Jiang
arXiv
DroidSpeak: KV Cache Sharing for Cross-LLM Communication and Multi-LLM Serving
Yuhan Liu, Yuyang Huang, Jiayi Yao, Shaoting Feng, Zhuohan Gu, Kuntai Du, Hanchen Li, Yihua Cheng, Junchen Jiang, Shan Lu, Madan Musuvathi, Esha Choukse
NSDI’26
AdaptCache: KV Cache Native Storage Hierarchy for Low-Delay and High-Quality Language Model Serving
Shaoting Feng, Hanchen Li, Kuntai Du, Zhuohan Gu, Yuhan Liu, Jiayi Yao, Siddhant Ray, Samuel Shen, Yihua Cheng, Ganesh Ananthanarayanan, Junchen Jiang
BigMem'25 (SOSP workshop)
METIS: Fast Quality-Aware RAG Systems with Configuration Adaptation
Siddhant Ray, Rui Pan, Zhuohan Gu, Kuntai Du, Shaoting Feng, Ganesh Ananthanarayanan, Ravi Netravali, Junchen Jiang
SOSP’25
Grouping Algorithms for Optimal Configuration of Virtual Links in AFDX
Shaoting Feng, Quanquan Peng, Qinya Li, Fan Wu, Guihai Chen
JCST’25
GIPUT: Maximizing Photo Coverage Efficiency for UAV Trajectory
Shaoting Feng, Qinya Li, Yaodong Yang, Fan Wu, Guihai Chen
APWeb-WAIM'24
Presentations
Run Multi-Modality Models with LMCache
- Online
GIPUT: Maximizing Photo Coverage Efficiency for UAV Trajectory
- APWeb-WAIM 2024 [slides], Aug. 2024
Experience
University of Chicago - Graduate Student
September 2024 - Present
- Research on KV cache optimization in long-context and RAG scenarios.
- Core contributor and maintainer of open-source LLM serving projects: LMCache, vLLM Production Stack, and LMBenchmark.
TensorMesh, Inc. - Summer Intern
June 2025 - September 2025
- Core feature development for LMCache
- Disaggregated Prefill (PD) with LMCache: more than 20× faster KV cache transmission than native vLLM PD.
- Eviction-based CPU offloading: 2.29× TTFT improvement over public vLLM and 2.79× over public vLLM + LMCache.
- Multimodal support: covers every multimodal model that vLLM supports and speeds up inference by up to 5.49×.
- Core feature development for vLLM Production Stack
- Fault tolerance: if a pod fails during inference, the request seamlessly continues on another pod, backed by KV cache transfer and smart routing, with no user-visible disruption.
- Community management for LMCache and vLLM Production Stack: reviewing PRs, resolving issues, designing CI/CD pipelines, and hosting community meetings.
University of Pennsylvania - Student Intern
June 2023 - September 2023
- Developed a practical fairness metric that quantifies packet-level deviations from user-specific baselines.
- Implemented and validated the proposed metric using the ns-3 simulator, demonstrating its effectiveness in dynamic data center network scenarios.
- See the slides and PDF (thesis in Chinese).
Awards
MPCS Merit-Based Scholarship
Issued by UChicago Pre-Doctoral MS Program · Sep 2024
Dennis C. C. Chan Scholarship
Issued by Shanghai Jiao Tong University · Dec 2023
Awarded to 6 outstanding undergraduate students across the university.
Shanghai Government Scholarship
Issued by Shanghai Municipal Education Commission · Dec 2022
Awarded to 0.175% of undergraduate and associate-degree students in Shanghai.
Shanghai Jiao Tong University Outstanding Scholarship
Issued by Shanghai Jiao Tong University · Dec 2021-2023
Awarded to 10% of undergraduate students across the university.
