Education


Publications

LMCache: An Efficient KV Cache Layer for Enterprise-Scale LLM Inference

Yihua Cheng*, Yuhan Liu*, Jiayi Yao*, Yuwei An, Xiaokun Chen, Shaoting Feng, Yuyang Huang, Samuel Shen, Kuntai Du, Junchen Jiang

arXiv

pdf | code


DroidSpeak: KV Cache Sharing for Cross-LLM Communication and Multi-LLM Serving

Yuhan Liu, Yuyang Huang, Jiayi Yao, Shaoting Feng, Zhuohan Gu, Kuntai Du, Hanchen Li, Yihua Cheng, Junchen Jiang, Shan Lu, Madan Musuvathi, Esha Choukse

NSDI’26

pdf


AdaptCache: KV Cache Native Storage Hierarchy for Low-Delay and High-Quality Language Model Serving

Shaoting Feng, Hanchen Li, Kuntai Du, Zhuohan Gu, Yuhan Liu, Jiayi Yao, Siddhant Ray, Samuel Shen, Yihua Cheng, Ganesh Ananthanarayanan, Junchen Jiang

SOSP workshop BigMem’25

pdf | code | slides


METIS: Fast Quality-Aware RAG Systems with Configuration Adaptation

Siddhant Ray, Rui Pan, Zhuohan Gu, Kuntai Du, Shaoting Feng, Ganesh Ananthanarayanan, Ravi Netravali, Junchen Jiang

SOSP’25

pdf | poster


Grouping Algorithms for Optimal Configuration of Virtual Links in AFDX

Shaoting Feng, Quanquan Peng, Qinya Li, Fan Wu, Guihai Chen

JCST’25

pdf | code | slides


GIPUT: Maximizing Photo Coverage Efficiency for UAV Trajectory

Shaoting Feng, Qinya Li, Yaodong Yang, Fan Wu, Guihai Chen

APWeb‑WAIM’24

pdf | code


Presentations

Run Multi-Modality Models with LMCache

  • SIGCOMM 2025 Full-day Tutorial: Networking for Stateful LLM Inference [slides] [video], Sep. 2025
Online


GIPUT: Maximizing Photo Coverage Efficiency for UAV Trajectory

  • APWeb-WAIM 2024 [slides], Aug. 2024
Jinhua, Zhejiang, China


Experience

TensorMesh, Inc. - Summer Intern

June 2025 - September 2025

  • Core feature development for LMCache
  • Disaggregated prefill (PD) with LMCache: more than 20× faster KV cache transmission than native vLLM PD.
  • Eviction-based CPU offloading: 2.29× TTFT improvement over public vLLM; 2.79× TTFT improvement over public vLLM + LMCache.
  • Multimodal support: covers all multimodal models available in vLLM and speeds up inference by up to 5.49×.
  • Core feature development for the vLLM production stack
  • Fault tolerance: if a pod fails during inference, the request seamlessly continues on another pod—backed by KV-cache transfer and smart routing—with no user-visible disruption.
  • Community management for LMCache and the vLLM production stack: reviewing PRs, resolving issues, designing CI/CD pipelines, and hosting community meetings.

University of Pennsylvania - Student Intern

June 2023 - September 2023

Advised by Prof. Vincent Liu and Dr. Liangcheng Yu
  • Developed a practical fairness metric that quantifies packet-level deviations from user-specific baselines.
  • Implemented and validated the proposed metric using the ns-3 simulator, demonstrating its effectiveness in dynamic data center network scenarios.
  • Materials: slides and pdf (thesis in Chinese).

Awards

MPCS Merit-Based Scholarship

Issued by UChicago Pre-Doctoral MS Program · Sep 2024


Dennis C. C. Chan Scholarship

Issued by Shanghai Jiao Tong University · Dec 2023

Awarded to six outstanding undergraduate students across the university.


Shanghai Government Scholarship

Issued by Shanghai Municipal Education Commission · Dec 2022

Awarded to the top 0.175% of undergraduate and associate-degree students in Shanghai.


Shanghai Jiao Tong University Outstanding Scholarship

Issued by Shanghai Jiao Tong University · Dec 2021-2023

Awarded to the top 10% of undergraduate students across the university.