Stanley Zheng
Stanley's Blog

Coding Diary(2025-6-12)
Created 2025-06-12 | Coding Blogs
Issue 1: pip install / module not found. I am trying to set up the music transformer model. When we run pip install -r requirements.txt, we encounter an error, which we ignore for now. Then, when we run python extract_mid.py, we hit another error. Running pip install mido may not fix it, because the pip you are invoking may not be the pip that belongs to your virtual environment. In my venv setup, pip was not the right command and I had to use pip3 instead. So,...
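One way to check which interpreter and which pip are actually in play is sketched below. This is a minimal sketch, not the fix from the post: the helper names `in_virtualenv` and `pip_install_command` are hypothetical, and the idea is simply that running pip through the current interpreter (`python -m pip`) installs into the same environment that will later import the module.

```python
import sys


def in_virtualenv():
    """Return True if this interpreter is running inside a venv.

    Inside a venv, sys.prefix points at the environment while
    sys.base_prefix points at the base Python installation.
    """
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def pip_install_command(package):
    """Build a pip command bound to the interpreter running this script.

    Hypothetical helper: using `<this python> -m pip` avoids picking up
    a different `pip` executable that happens to be first on PATH.
    """
    return [sys.executable, "-m", "pip", "install", package]


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("inside a venv:", in_virtualenv())
    print("install with:", " ".join(pip_install_command("mido")))
```

If the printed interpreter path is not inside the virtual environment, the environment is probably not activated in the current shell.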
Coding Diary(2025-6-19)
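Created 2025-06-19 | Coding Blogs
This week went by really fast. I actually learned a few things, and model training has started. On the systems side, there is still basically no progress on SGLang, but I now have a rough understanding of the architecture of the music transformer we are using. Our mentor wants us to draw a model diagram starting from the operator structure; I have not found the time to do it yet. On the model side there has been a lot of progress: we processed our new, huge dataset, Aria. Right now we use the simplest method for separating melody and accompaniment, and later we may try others, such as skyline. On the dataset produced by this simplest separation method we have already trained the model twice, once without the interleave_pos parameter and once with it. This parameter roughly means sequence interleaving: one acc and one mel alternate, and setting the parameter tells the model that the training data is interleaved. The systems direction is progressing slowly because, first, we all need to get familiar with the model, otherwise we cannot do the systems work. Second, we ran into some problems during training and spent a long time tuning the training parameters. Third, there was the installation of some tools, such as this...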
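To make the idea of interleaved sequences concrete, here is a small illustration. It is only a sketch under assumptions: the function name `interleave`, the `("acc", ...)` / `("mel", ...)` tagging, and the acc-before-mel order are hypothetical, and this is not the actual implementation behind the interleave_pos parameter.

```python
from itertools import zip_longest

# Sentinel marking "no token left" so real tokens can be any value.
_MISSING = object()


def interleave(mel_tokens, acc_tokens):
    """Alternate accompaniment and melody tokens into one sequence.

    Each output item is tagged with its track so a model (or a reader)
    can tell which stream a token came from. If one track is longer,
    its remaining tokens are appended in order.
    """
    merged = []
    for mel, acc in zip_longest(mel_tokens, acc_tokens, fillvalue=_MISSING):
        if acc is not _MISSING:
            merged.append(("acc", acc))
        if mel is not _MISSING:
            merged.append(("mel", mel))
    return merged


if __name__ == "__main__":
    for item in interleave(["m1", "m2"], ["a1", "a2", "a3"]):
        print(item)
```

The track tags play a role loosely similar to what the post describes for interleave_pos: they give the model an explicit signal that the sequence alternates between two streams.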
Nsight System & Nsight Compute
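Created 2025-06-15 | Tech Blogs
An introduction to Nsight Systems and Nsight Compute, with a summary of common commands. Introduction Core positioning: one looks at the "forest", the other at the "trees". This is the most important point for understanding the two tools. Nsight Systems (nsys): a system-level profiler that looks at the "forest". It monitors your whole application and tells you how the CPU and GPU interact and which large modules the time is spent in (for example, which CUDA kernel takes the longest, or how long data copies take). Nsight Compute (ncu): a kernel-level profiler that looks at the "trees". Once nsys has helped you find the "tree" most worth optimizing (the longest-running kernel), ncu goes inside that tree and analyzes its details (for example: how well are the compute units utilized? How efficient is memory access? Which line of code slows things down?). A vivid analogy: nsys is like a city traffic map; it tells you which main road (kernel) is congested. ncu is like an engine diagnostic tool; it analyzes the car stuck on that road and tells you exactly what is wrong with its engine, fuel system, or wiring. Comparison summary table Aspect NVIDIA...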
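The forest-then-trees workflow can be sketched as two command lines, built here in Python so the pieces are easy to see. This is a minimal sketch under assumptions: the helper names, report names, and the example kernel name are hypothetical, and the flags shown (`nsys profile -o`, `--trace`, `ncu --set full`, `-o`, `--kernel-name`) should be checked against the installed tool versions.

```python
import shlex


def nsys_command(app_cmd, report="nsys_report"):
    """System-level pass: trace CUDA and NVTX activity for the whole run."""
    return ["nsys", "profile", "-o", report, "--trace=cuda,nvtx"] + list(app_cmd)


def ncu_command(app_cmd, report="ncu_report", kernel_name=None):
    """Kernel-level pass: collect the full metric set, optionally for one kernel."""
    cmd = ["ncu", "--set", "full", "-o", report]
    if kernel_name:
        # Restrict collection to the kernel that nsys flagged as the hotspot.
        cmd += ["--kernel-name", kernel_name]
    return cmd + list(app_cmd)


if __name__ == "__main__":
    app = ["python", "train.py"]  # hypothetical application command
    print(shlex.join(nsys_command(app)))
    print(shlex.join(ncu_command(app, kernel_name="my_hot_kernel")))
```

The script only prints the commands; running them requires the NVIDIA tools to be installed and on PATH.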
Reading Notes for vLLM
Created 2025-05-27 | Reading Paper
This is my reading note for the paper Efficient Memory Management for Large Language Model Serving with PagedAttention. vLLM is one of the most popular open-source inference serving systems today; the two mainstream ones I know of are vLLM and SGLang. Summary Abstract & Introduction & Background This paper presents vLLM, a serving system for inference. The most important idea it proposes is PagedAttention. vLLM with the...
Resource I Have for MLSys
Created 2025-06-10 | Research Blogs
This is a guide page for the MLSys resources I know. I give a brief introduction to each of them and list the links here. The resources include books, papers, and notes I wrote. Books AI System This book is more about the hardware, and I think it leans toward ECE students. I haven't read all of it yet, but you can find some useful topics here, such as an introduction to NVIDIA GPUs, Tensor Cores, streaming multiprocessors, and how the GPU actually do to...
Reading Notes for SGLang
Created 2025-05-27 | Reading Paper
Reading Notes for Orca
Created 2025-05-15 | Reading Paper
This is my reading note for the paper ORCA: A Distributed Serving System for Transformer-Based Generative Models, published at OSDI 2022. Almost all the authors come from South Korea, and this is the first time I have read a paper by Korean authors. Summary Abstract & Introduction & Background The paper focuses on inference serving. The authors point out that existing systems are not good enough for transformer-based models, so they propose a new method...
Transformer
Created 2025-04-09 | Research Blogs
This blog post covers several important concepts in the Transformer: attention, multi-head attention, self-attention & cross-attention, and encoder & decoder. The classic single-head attention formula: $\text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V$ Multi-head attention (commonly used in the Transformer): $\text{MultiHead}(Q,K,V) = \text{Concat}\left(\text{softmax}\left(\frac{QW_i^Q (KW_i^K)^T}{\sqrt{d_k}}\right) VW_i^V\right)_{i=1}^{h} W^O$...
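The single-head formula can be traced step by step with a small pure-Python sketch. It uses nested lists instead of a tensor library so it runs without dependencies; the helper names and the toy Q, K, V values are illustrative assumptions, not taken from the post.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def matmul(a, b):
    """Multiply matrices given as lists of rows."""
    b_cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


def attention(q, k, v):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(k[0])
    scores = matmul(q, transpose(k))
    scaled = [[s / math.sqrt(d_k) for s in row] for row in scores]
    weights = [softmax(row) for row in scaled]  # each row sums to 1
    return matmul(weights, v), weights


if __name__ == "__main__":
    Q = [[1.0, 0.0], [0.0, 1.0]]
    K = [[1.0, 0.0], [0.0, 1.0]]
    V = [[1.0, 2.0], [3.0, 4.0]]
    output, weights = attention(Q, K, V)
    print("weights:", weights)
    print("output:", output)
```

Each query row attends most strongly to the key it matches, so the output rows are weighted averages of the rows of V that lean toward the matching value.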
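Summary of Linux Commands for Checking Hardware Information
Created 2025-05-22 | Tech Blogs
This document summarizes common commands for viewing information about the hardware components of a Linux system. These commands are normally run in a terminal. CPU information 🧠 lscpu: shows the CPU architecture, core count, thread count, speed, cache sizes, and other details. lscpu cat /proc/cpuinfo: shows more detailed low-level CPU information, with one entry per logical core. cat /proc/cpuinfo View processor details with dmidecode (usually requires sudo): sudo dmidecode -t processor Memory (RAM) information 💾 free -h: shows total, used, and available memory and swap usage in human-readable form. free -h cat /proc/meminfo: shows detailed memory usage and kernel statistics. cat /proc/meminfo sudo dmidecode -t memory or sudo dmidecode -t 17: shows details for each physical memory module, such as manufacturer, model, serial number, capacity, speed, type (DDR4), rank, and whether ECC is supported. sudo dmidecode -t...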
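When the same information is needed inside a script, the text of /proc/meminfo can be parsed directly. The following is a minimal sketch, assuming the usual "Key: value kB" line layout; the function name `parse_meminfo` and the sample text are hypothetical, and the sample values are made up for illustration.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most fields are reported in kB; a few (such as HugePages_Total)
    are plain counts, so the unit is intentionally not stored here.
    """
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if not sep or not parts:
            continue  # skip blank or malformed lines
        try:
            info[key.strip()] = int(parts[0])
        except ValueError:
            continue  # skip lines whose value is not an integer
    return info


if __name__ == "__main__":
    path = "/proc/meminfo"
    if os.path.exists(path):  # only present on Linux
        with open(path) as handle:
            mem = parse_meminfo(handle.read())
        print("MemTotal (GiB):", round(mem["MemTotal"] / 1024 ** 2, 2))
    else:
        sample = "MemTotal:       16384000 kB\nMemAvailable:    8192000 kB\n"
        print(parse_meminfo(sample))
```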
Hi, I am Stanley. I am currently a CS student at the University of Wisconsin-Madison.
©2019 - 2026 By Stanley Zheng