Hao Kang (康浩)

I am a PhD student at the Georgia Institute of Technology, advised by Prof. Tushar Krishna.

Prior to GT, I was fortunate to work with Prof. Baharan Mirzasoleiman at UCLA on efficient machine learning from massive datasets. At MIT, I worked with Prof. Song Han on efficient machine learning for edge devices. I received my B.Eng. in Computer Science from Zhejiang University in 2023.

Email  /  CV  /  GitHub

Research Interests

I am interested in efficient machine learning and systems, with experience at the intersection of both fields. I use low-rank approximation and compression algorithms to accelerate machine learning models, especially LLMs. I also design efficient systems, such as inference and fine-tuning schedulers, to accelerate the training and inference process.

Published Papers
GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM
arXiv

Hao Kang*, Qingru Zhang*, Souvik Kundu, Geonhwa Jeong, Zaoxing Liu, Tushar Krishna, Tuo Zhao
We propose a novel compression algorithm for the KV cache in large language model inference. It achieves a near-lossless compression ratio, a 2x speedup, and a 2x peak memory saving at inference time.

KV Cache Optimizations for Large Language Model Inference

In review at MLSys 2024

Towards Sustainable Learning: Coresets for Data-efficient Deep Learning
ICML 2023

Yu Yang, Hao Kang, Baharan Mirzasoleiman
We design a dataset distillation algorithm based on submodular functions and batch SGD that distills a small dataset from a large one. The small dataset can be used to train a model whose performance is similar to that of a model trained on the full dataset.

Research Projects and Tools
torchanalyse.
A model profiling tool based on TVM and Maestro (thanks for the help, Abhi!). It profiles a model and reports the FLOPs, memory usage, and latency of each layer, as well as the FLOPs of each operator in the model.
More Information

Epipe: Efficient Pipeline Parallelism with Compression Algorithms.
A research project based on GPipe and low-rank approximation that reduces the bandwidth of activation transfers during cloud-based training (a generic sketch of the low-rank idea is shown below).
More Information
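The sketch below is a hypothetical illustration of the underlying idea, not Epipe's actual code: an activation tensor is factored with a truncated SVD before crossing the pipeline boundary, and an approximation is reconstructed on the other side.

```python
# Hypothetical sketch of low-rank activation compression (not Epipe's actual API).
import torch

def compress(activation: torch.Tensor, rank: int):
    # torch.svd_lowrank returns U (m x q), S (q,), V (n x q) with A ~= U diag(S) V^T
    U, S, V = torch.svd_lowrank(activation, q=rank)
    return U, S, V  # transmit these factors instead of the full activation

def decompress(U: torch.Tensor, S: torch.Tensor, V: torch.Tensor) -> torch.Tensor:
    return U @ torch.diag(S) @ V.T  # approximate reconstruction

x = torch.randn(512, 1024)          # a flattened activation, e.g. (tokens, hidden)
U, S, V = compress(x, rank=32)
x_hat = decompress(U, S, V)
sent = U.numel() + S.numel() + V.numel()
print(f"elements transferred: {sent} vs. {x.numel()}")
# Random tensors are not low-rank, so the reconstruction error here is large;
# real activations are typically much closer to low-rank, which is what makes
# this kind of compression attractive for reducing transfer bandwidth.
```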

THOP: PyTorch-OpCounter.
A Python third-party library that counts the FLOPs of models (PyTorch, JIT, ONNX). It already has 4k stars! I wrote the ONNX counter (a minimal usage example is shown below).
More Information
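A minimal usage sketch of THOP on a PyTorch model; the model choice and input shape are illustrative.

```python
# Count multiply-accumulates (MACs) and parameters of a torchvision model with THOP.
import torch
from thop import profile
from torchvision.models import resnet18

model = resnet18()
dummy_input = torch.randn(1, 3, 224, 224)  # one 224x224 RGB image
macs, params = profile(model, inputs=(dummy_input,))
print(f"MACs: {macs:.3e}, Params: {params:.3e}")
```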