LLM Agents · Algorithm / System Co-Design

Hao Kang 康浩

I aim to build LLM agent systems and algorithms that evolve with their environments.

Agentic inference · LLM serving · KV-cache systems · CUDA kernels
4.0x agentic serving throughput
2.37x LLM attention throughput

About

Researcher in machine learning systems

I am a PhD student at the Georgia Institute of Technology, advised by Prof. Tushar Krishna. I am currently visiting MIT, where I am advised by Prof. Song Han and work closely with Han Cai.

Earlier in my PhD, I focused on efficient quantization and sparsity for LLMs to reduce serving cost, learning CUDA kernel development through these projects. I now work on agentic LLM efficiency and system-algorithm co-design. I believe future AI systems will be adaptive, learning from environment, algorithm, and workload feedback.

Prior to Georgia Tech, I worked with Prof. Baharan Mirzasoleiman at UCLA on efficient machine learning from massive datasets. I received my B.Eng. in Computer Science from Zhejiang University in 2023.

Selected Papers

Agents, inference, and efficient LLM systems

MLSys 2025 · Attention systems

TurboAttention: Efficient Attention Approximation For High Throughput LLMs

Hao Kang, Srikant Bharadwaj, James Hensman, Tushar Krishna, Victor Ruhle, Saravan Rajmohan

Combines FlashQ, a headwise KV-cache and activation quantization scheme, with SAS, a dequantization-free softmax approximation, achieving 1.2-1.8x faster attention and up to 2.37x higher throughput than FP16.
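A minimal sketch of what per-head (headwise) KV-cache quantization could look like, assuming a symmetric INT8 scheme with one scale per head; the function names, tensor layout, and bit width are illustrative assumptions, not TurboAttention's actual kernels.

```python
import torch

def quantize_kv_headwise(kv: torch.Tensor, num_bits: int = 8):
    """Symmetric per-head quantization of a KV-cache tensor.

    kv: [batch, heads, seq_len, head_dim] in FP16/FP32.
    Returns INT8 values plus one floating-point scale per (batch, head).
    """
    qmax = 2 ** (num_bits - 1) - 1                      # e.g. 127 for INT8
    # One scale per head: max absolute value over seq_len and head_dim.
    scale = kv.abs().amax(dim=(-2, -1), keepdim=True) / qmax
    scale = scale.clamp(min=1e-8)                       # avoid division by zero
    q = torch.clamp(torch.round(kv / scale), -qmax - 1, qmax).to(torch.int8)
    return q, scale

def dequantize_kv(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Recover an approximate floating-point tensor from values and scales."""
    return q.float() * scale

# Usage: quantize a toy KV cache and check the reconstruction error.
kv = torch.randn(1, 8, 128, 64)                         # [batch, heads, seq, dim]
q, scale = quantize_kv_headwise(kv)
err = (dequantize_kv(q, scale) - kv).abs().max()
print(f"max abs reconstruction error: {err.item():.4f}")
```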

Projects

Research projects and tools

Epipe

Efficient pipeline parallelism with activation compression, built on GPipe and low-rank approximation to reduce activation-transfer bandwidth.
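A minimal sketch of low-rank activation compression at a pipeline-stage boundary, assuming a truncated SVD on a 2-D activation matrix; the function names and rank choice are illustrative assumptions, not Epipe's actual implementation.

```python
import torch

def compress_activation(x: torch.Tensor, rank: int):
    """Truncated-SVD compression of a 2-D activation [tokens, hidden].

    Sending the rank-r factors instead of x reduces transfer volume
    from tokens * hidden to (tokens + hidden) * rank values.
    """
    U, S, Vh = torch.linalg.svd(x, full_matrices=False)
    return U[:, :rank] * S[:rank], Vh[:rank, :]

def decompress_activation(us: torch.Tensor, vh: torch.Tensor) -> torch.Tensor:
    """Rebuild the approximate activation on the receiving pipeline stage."""
    return us @ vh

# Usage: compress a toy activation and measure size reduction and error.
x = torch.randn(512, 1024)                  # [micro-batch tokens, hidden size]
us, vh = compress_activation(x, rank=64)
x_hat = decompress_activation(us, vh)
rel_err = (x - x_hat).norm() / x.norm()
print(f"{x.numel()} -> {us.numel() + vh.numel()} values, rel err {rel_err.item():.3f}")
```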

More information