Resources I Have for MLSys
This is a guide to the MLSys resources I know. I’ll give a brief introduction to each one and list its link here. The resources include books, papers, and notes I wrote.
Books
AI System
This book is more about the hardware, so I think it’s aimed a little more at ECE students. I haven’t read it all yet, but you can find some useful topics in it, such as the introduction to Nvidia GPUs, Tensor Cores, streaming multiprocessors, and how the GPU actually accelerates computation. It’s intro-level, so you don’t need any background. However, it covers a lot of history, which doesn’t seem that useful, so I think you can go straight to the chapter you need instead of reading it from beginning to end.
Another reason I recommend this book is that it’s open source, and you can contribute to it. I’m actually one of the top 10 contributors. It’s also a really good project for practicing your git skills in a real-world work environment.
The book isn’t polished yet, so you may see some errors in the web version. The way I read it is to git clone it to my local machine and use the VS Code preview to read the .md files.
Here is the link: https://github.com/chenzomi12/aisystem
AI Infrastructure
This is another book from the same author, and I think its content is what we systems people really do. It’s less about the hardware and focuses more on distributed training and inference systems.
Here is the link: https://github.com/chenzomi12/AIInfra
Papers
I think my notes for the papers will be here, but I only created the blog site recently, so some of my earlier notes aren’t on it yet. Where I do have notes, I list the corresponding link below. If you see something interesting and it happens to have a link to my notes, you can have a look.
Inference System
Orca
This is the paper that Prof. Ma gave us. I think no one uses it anymore, but you can learn some basic concepts from it. It’s also one of the baselines that vLLM compares against.
The paper: https://www.usenix.org/system/files/osdi22-yu.pdf
My notes: https://s-tanley.github.io/blogs/2025/05/15/Orca/
vLLM
It’s one of the most popular inference serving systems nowadays. Its core idea is PagedAttention.
The paper: https://arxiv.org/abs/2309.06180
My notes: https://s-tanley.github.io/blogs/2025/05/27/vLLM/
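To make the PagedAttention idea a bit more concrete, here is a toy sketch of just the bookkeeping side, in plain Python. The assumption is the usual description of the technique: the KV cache is split into fixed-size blocks, and each sequence keeps a block table that maps its logical blocks to physical blocks, so its cache doesn’t need to be contiguous. The names `BlockAllocator`, `Sequence`, and `slot_of` are made up for illustration and are not vLLM’s API.

```python
class BlockAllocator:
    """Toy pool of fixed-size physical KV-cache blocks (hypothetical, not vLLM's API)."""

    def __init__(self, num_blocks: int, block_size: int):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))

    def allocate(self) -> int:
        if not self.free_blocks:
            raise MemoryError("no free KV-cache blocks")
        return self.free_blocks.pop(0)

    def free(self, block_id: int) -> None:
        self.free_blocks.append(block_id)


class Sequence:
    """One request; its block table maps logical block index -> physical block id."""

    def __init__(self, allocator: BlockAllocator):
        self.allocator = allocator
        self.block_table: list[int] = []
        self.num_tokens = 0

    def append_token(self) -> None:
        # Grab a new physical block only when every current block is full.
        if self.num_tokens == len(self.block_table) * self.allocator.block_size:
            self.block_table.append(self.allocator.allocate())
        self.num_tokens += 1

    def slot_of(self, token_idx: int) -> tuple[int, int]:
        """Return (physical block id, offset inside that block) for a token."""
        if not 0 <= token_idx < self.num_tokens:
            raise IndexError(token_idx)
        size = self.allocator.block_size
        return self.block_table[token_idx // size], token_idx % size

    def release(self) -> None:
        # Hand every block back to the pool when the request finishes.
        for block_id in self.block_table:
            self.allocator.free(block_id)
        self.block_table.clear()
        self.num_tokens = 0


# Two sequences growing in an interleaved order share one pool.
allocator = BlockAllocator(num_blocks=4, block_size=2)
seq_a, seq_b = Sequence(allocator), Sequence(allocator)
seq_a.append_token()
seq_b.append_token()
seq_a.append_token()
seq_a.append_token()

print(seq_a.block_table)  # physical blocks of seq_a, not contiguous
print(seq_b.block_table)
print(seq_a.slot_of(2))   # where seq_a's third token lives
```

This only shows the memory-management half. The real system also has an attention kernel that reads keys and values through the block table, which this sketch leaves out.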
SGLang
It’s the other most popular inference serving system, and I think most people use SGLang instead of vLLM now.
The paper: https://arxiv.org/abs/2312.07104
Music
ReaLchords
The paper: https://openreview.net/pdf?id=mUVydzrkgz
Pop Music Transformer (REMI)
The paper: https://arxiv.org/abs/2002.00212
Structured Multi-Track Accompaniment Arrangement
The paper: https://arxiv.org/abs/2310.16334
MoE
For now, these are papers recommended by another professor.
You can also see my notes here. I have also written a report, and I think it’s kind of concise.
MoE-Infinity
The paper: http://arxiv.org/abs/2401.14361
ProMoE
The paper: https://doi.org/10.48550/arXiv.2410.22134
TS-MoE
The paper: https://doi.org/10.48550/arXiv.2206.00277
DeepSpeed-FastGen
The paper: https://doi.org/10.48550/arXiv.2401.08671
DeepUM
The paper: https://doi.org/10.1145/3575693.3575736
GShard
The paper: https://doi.org/10.48550/arXiv.2006.16668
MC-SMoE
The paper: https://doi.org/10.48550/arXiv.2310.01334
Mu Li’s recommendation
Mu Li is a really great researcher, and he has published lots of useful resources online, which you can access through Bilibili or YouTube.
His paper-reading collection is really useful. It’s like my 101 course for doing research.
He also has a GitHub repo for this list.
I also casually take some notes when I watch the videos, but most of them are really bad. I mean, I don’t think even I will read them again. Anyway, I created a repo for my notes, in case you are interested.