
17:19 · June 9, 2026 · Tuesday

I wrote a new blog post: A system programmer's guide to LLM inference

Hope you all enjoy it!

==========

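I have been writing about LLM inference lately because I wanted to figure out something I have never understood: what exactly makes Nvidia so good, and what exactly is so good about CUDA.

Technically speaking, most of what Nvidia builds doesn't have much taste; it all gives off a "working but ugly" feeling. CUDA kernels may once have been a moat, but these days you can have an LLM optimize them, and that works very effectively. I also don't think performance is all that hard.

But Nvidia's strength lies in its management's ability to execute. We tend to underestimate bad ideas with good execution: they can pick a path and stick to it persistently, and combined with the resources and hype they now have, they can make a splash in almost anything.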

Xiangpeng’s blog

A system programmer’s guide to LLM inference – Xiangpeng’s blog

Let’s build a local LLM inference engine in Rust with no dependencies.