Projects

Publications:

DroidSpeak: KV Cache Sharing for Cross-LLM Communication and Multi-LLM Serving

Preprint

DroidSpeak accelerates multi-LLM pipelines by sharing KV caches across fine-tuned models, cutting latency by up to 2.6× and raising throughput by 3× with negligible accuracy loss.

RAGServe: Fast Quality-Aware RAG Systems with Configuration Adaptation

Preprint

RAGServe speeds up RAG by pruning and adapting per-query configurations on the fly, cutting latency by up to 2.5× without sacrificing quality.

Grouping Algorithms for Optimal Configuration of Virtual Links in AFDX

JCST’25

Scalable, bandwidth-preserving algorithms for AFDX virtual links that optimize message allocations.

GIPUT: Maximizing Photo Coverage Efficiency for UAV Trajectory

APWeb‑WAIM’24

GIPUT models objects with realistic shapes, enabling UAVs to learn optimal trajectories while considering photo coverage, energy consumption, and bandwidth utilization. It achieves twice the efficiency of state-of-the-art algorithms.

Open-source projects that I'm maintaining:

LMCache

September, 2024

The first open-source Knowledge Delivery Network (KDN), making LLM applications up to 8× faster at 8× lower cost.

vLLM production stack

February, 2025

Scale from a single vLLM instance to a distributed vLLM deployment without changing any application code.

LMBenchmark

April, 2025

Systematic and comprehensive benchmarks for LLM systems.

Past Projects:

Network MAC Layer Implementation for LoRa Development Board

June, 2023

Developed and implemented a robust MAC layer for the LoRa communication protocol, optimizing for long-range, low-power, and anti-interference performance. Designed a complete software stack integrating MCU control and advanced features like timeout retransmission and duty cycle sleep management for a software-defined radio platform.
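As a rough illustration of the timeout-retransmission idea mentioned above, here is a minimal stop-and-wait sketch in Python. It is not the project's firmware: the function names, the `b"ACK"` payload, the retry limit, and the simulated lossy link are all assumptions made for this example.

```python
from typing import Callable, Iterable, Optional


def send_with_retransmission(
    frame: bytes,
    transmit: Callable[[bytes], Optional[bytes]],
    max_retries: int = 3,
) -> int:
    """Send one frame, retransmitting whenever no ACK arrives.

    `transmit` (hypothetical) sends the frame and returns the ACK bytes,
    or None if the receive timeout expired. Returns the number of attempts
    used; raises TimeoutError once every attempt has failed.
    """
    for attempt in range(1, max_retries + 2):  # first try + max_retries
        ack = transmit(frame)
        if ack == b"ACK":
            return attempt
    raise TimeoutError(f"no ACK after {max_retries + 1} attempts")


def make_lossy_link(loss_pattern: Iterable[bool]) -> Callable[[bytes], Optional[bytes]]:
    """Build a fake radio link: True in loss_pattern means the frame is lost."""
    drops = iter(loss_pattern)

    def transmit(frame: bytes) -> Optional[bytes]:
        # A lost frame is modelled as a timeout (None); otherwise an ACK returns.
        return None if next(drops, False) else b"ACK"

    return transmit


# The first attempt is lost, the second is acknowledged.
print(send_with_retransmission(b"hello", make_lossy_link([True, False])))  # 2
```

On a real device the `transmit` callback would wrap the radio driver and a hardware timer, and the node would return to its duty-cycle sleep state once the ACK arrives or the retries are exhausted.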

Shaoting Feng
