Publications:
DroidSpeak: KV Cache Sharing for Cross-LLM Communication and Multi-LLM Serving
Preprint
DroidSpeak turbocharges multi-LLM pipelines by sharing KV caches across fine-tuned models, slashing latency by up to 2.6× and boosting throughput by 3× with negligible accuracy loss.
RAGServe: Fast Quality-Aware RAG Systems with Configuration Adaptation
Preprint
RAGServe turbocharges RAG by pruning and adapting per-query configurations on the fly, slashing latency by up to 2.5× without sacrificing quality.
Grouping Algorithms for Optimal Configuration of Virtual Links in AFDX
JCST’25
Scalable, bandwidth-preserving algorithms for AFDX virtual links that optimize message allocations.
GIPUT: Maximizing Photo Coverage Efficiency for UAV Trajectory
APWeb‑WAIM’24
GIPUT models objects with realistic shapes, enabling UAVs to learn optimal trajectories while considering photo coverage, energy consumption, and bandwidth utilization. It achieves twice the efficiency of state-of-the-art algorithms.
Open-source projects that I'm maintaining:
LMCache
The first open-source Knowledge Delivery Network (KDN), accelerating LLM applications by up to 8x at 8x lower cost.
vLLM production stack
Scale from a single vLLM instance to a distributed vLLM deployment without changing any application code.
LMBenchmark
Systematic and comprehensive benchmarks for LLM systems.
Past Projects:
Network MAC Layer Implementation for LoRa Development Board
Developed and implemented a robust MAC layer for the LoRa communication protocol, optimizing for long-range, low-power, and anti-interference performance. Designed a complete software stack for a software-defined radio platform, integrating MCU control with advanced features such as timeout retransmission and duty-cycle sleep management.