Publications:
DroidSpeak: KV Cache Sharing for Cross-LLM Communication and Multi-LLM Serving
NSDI’26
The first distributed LLM inference system to enable KV cache reuse across nodes serving different LLMs.
METIS: Fast Quality-Aware RAG Systems with Configuration Adaptation
SOSP’25
The first RAG system that jointly schedules queries and adapts each query's key RAG configurations, such as the number of retrieved text chunks and the synthesis method.
Grouping Algorithms for Optimal Configuration of Virtual Links in AFDX
JCST’25
Scalable algorithms for assigning messages to Avionics Full-Duplex Switched Ethernet (AFDX) virtual links, reducing bandwidth usage by 24.7%.
GIPUT: Maximizing Photo Coverage Efficiency for UAV Trajectory
APWeb‑WAIM’24
GIPUT accurately models objects with realistic shapes, enabling precise computation of UAV photo coverage, which is essential for learning optimal UAV trajectories.
Open-source projects I maintain:
LMCache
The first and, so far, the most efficient open-source KV caching solution: it extracts and stores KV caches generated by modern LLM engines (vLLM and SGLang) and shares them across engines and queries.
vLLM production stack
vLLM’s reference system for Kubernetes-native, cluster-wide deployment, with community-driven performance optimizations.
LMBenchmark
Systematic and comprehensive benchmarks for LLM systems.
Past Projects:
Network MAC Layer Implementation for a LoRa Development Board
Implemented a robust MAC layer for LoRa communication, optimized for long range, low power, and interference resilience. Designed a complete software stack for a software-defined radio platform, integrating MCU control with features such as timeout-based retransmission and duty-cycle sleep management.