Publications:

Open-source projects that I'm maintaining:

LMCache

The first, and so far the most efficient, open-source KV caching solution. It extracts and stores the KV caches generated by modern LLM engines (vLLM and SGLang) and shares them across engines and queries.

vLLM production stack

vLLM’s reference system for K8s-native, cluster-wide deployment, with community-driven performance optimizations.

LMBenchmark

Systematic and comprehensive benchmarks for LLM systems.

Past Projects:

Network MAC Layer Implementation for LoRa Development Board

Implemented a robust MAC layer for the LoRa communication protocol, optimized for long range, low power consumption, and resistance to interference. Designed a complete software stack for a software-defined radio platform, integrating MCU control with features such as timeout-based retransmission and duty-cycle sleep management.