顶级 AI 公司人才学习路径 / Top AI Talent Pathways

Part 1 · 共性研究

Part 1 · Commonalities

一、学校分布的三条带

1. Three school clusters

研究科学家路径上"门票学校"不到 15 所；Infra 路径上学校权重显著下降，"做过的事"权重上升。

For the research-scientist track, fewer than 15 schools issue most of the "tickets". For the infra track, school weight drops sharply — what you've shipped weighs more.

🇺🇸 美 / 加 · US / Canada

深度学习正统

Deep-learning orthodoxy

Toronto · Sutskever（Hinton 学生）、Karpathy 本科
Stanford · Karpathy PhD、Tri Dao、Jared Kaplan
Berkeley · Schulman（Abbeel）、Aravind Srinivas
CMU · 杨植麟 PhD
Princeton · Dario Amodei 生物物理
Caltech / MIT · Schulman 本科、何恺明 2024 任教

Toronto · Sutskever (Hinton's student), Karpathy undergrad
Stanford · Karpathy PhD, Tri Dao, Jared Kaplan
Berkeley · Schulman (Abbeel), Aravind Srinivas
CMU · Zhilin Yang PhD
Princeton · Dario Amodei (biophysics)
Caltech / MIT · Schulman undergrad; Kaiming He on faculty since 2024

🇪🇺 欧洲 Europe

古典学院体系

Classical academy lineage

剑桥 + UCL · Hassabis CS 本 + 认知神经 PhD
X + ENS · Mistral 三人组（Mensch / Lample / Lacroix）全部出身
ETH Zürich · 大量 FAIR / DeepMind 研究员

Cambridge + UCL · Hassabis (CS undergrad + cognitive-neuro PhD)
X + ENS · The Mistral trio (Mensch / Lample / Lacroix) all from here
ETH Zürich · Heavy presence at FAIR / DeepMind

🇨🇳 中国 China

本土主力学校

Domestic powerhouses

清华 · 何恺明基科、杨植麟、唐杰
CUHK · 汤晓鸥系（何恺明 PhD → 商汤一代）
浙大 · 梁文锋（异类：量化 infra 转 AI）
北大 / 上交 ACM / 中科大少年班

Tsinghua · Kaiming He (Foundation Class), Zhilin Yang, Jie Tang
CUHK · Xiaoou Tang's lab (Kaiming He's PhD → the SenseTime generation)
Zhejiang U. · Liang Wenfeng (outlier: quant-infra to AI)
PKU / SJTU ACM Class / USTC School of the Gifted Young

结论 Takeaway

研究科学家路径"门票学校"不到 15 所；研究工程师 / Infra 路径上学校权重显著下降，"做过的事"权重上升。

Fewer than 15 schools dominate the research-scientist pipeline. On the infra track, school weight drops sharply and shipped work outweighs pedigree.

二、专业背景

2. Undergrad majors

物理 / 数学背景在"从零搭新范式"上有结构性优势；纯 CS 在"把现有范式做到极致"上更熟练。

Physics/math backgrounds have a structural edge at building new paradigms; pure CS excels at pushing existing ones to their limit.

物理 → ML

Physics → ML

Anthropic 核心 · Kaplan / Schulman / Dario

Anthropic core · Kaplan / Schulman / Dario

数学 → ML

Math → ML

Sutskever / Tri Dao / Shazeer

CS 主线

CS mainline

最大头：Karpathy / 何恺明 / Hassabis / Mensch

Biggest cohort: Karpathy / He / Hassabis / Mensch

EE / 信号

EE / Signals

梁文锋 / 孙剑（infra / 视觉）

Liang Wenfeng / Jian Sun (infra / vision)

神经科学

Neuroscience

Hassabis（DeepMind 灵魂）

Hassabis (DeepMind's soul)

Kaplan 的 Scaling Laws (2020) 本质是统计物理思维（Wilson RG、有限尺度标度），物理 PhD 训练出的"找 universal scaling"直觉对路。这条路径在 Anthropic 密度极高。

Kaplan's Scaling Laws (2020) is essentially statistical-physics thinking (Wilson RG, finite-size scaling). The physics-PhD instinct for "finding universal scaling" maps directly onto frontier ML. The density of this lineage at Anthropic is extreme.
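The "universal scaling" instinct amounts to checking whether loss falls on a straight line in log-log space. The sketch below fits a pure power law L(N) = a · N^(-alpha) to synthetic points. The constants `TRUE_A` and `TRUE_ALPHA` and the helper `fit_power_law` are made up for illustration and are not values or code from the paper.

```python
import math

def fit_power_law(sizes, losses):
    """Least-squares fit of log L = log a - alpha * log N. Returns (a, alpha)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(l) for l in losses]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope

# Synthetic data: illustrative constants only (assumption, not measured values).
TRUE_A, TRUE_ALPHA = 10.0, 0.08
sizes = [1e6, 1e7, 1e8, 1e9]
losses = [TRUE_A * n ** (-TRUE_ALPHA) for n in sizes]

a, alpha = fit_power_law(sizes, losses)
print(f"a = {a:.3f}, alpha = {alpha:.3f}")
```

On real training runs the points are noisy and the interesting question is where the straight line stops holding, which is where the finite-size-scaling intuition comes in.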

判断 Call

物理 / 数学背景在"从零搭新范式"（scaling laws、Mamba、新架构）上有结构性优势；纯 CS 在"把现有范式做到极致"（系统、ranking、infra）上更熟练。两者不可替代。

Physics/math people have a structural edge at building new paradigms from scratch (scaling laws, Mamba, novel architectures). Pure-CS people are better at perfecting existing paradigms (systems, ranking, infra). Neither can substitute for the other.

三、竞赛经历

3. Competitions

不同公司偏好不同竞赛——这是简历筛选的隐性硬通货。

Different labs favour different competitions — this is the resume's hidden hard currency.

Putnam · 研究 Research

Noam Shazeer 是 Putnam Fellow（前五）。Google Brain / Anthropic senior 里 Putnam 出现频率极高。

Noam Shazeer is a Putnam Fellow (top five). Putnam results show up very frequently among senior staff at Google Brain and Anthropic.

IMO / IPhO · 研究 Research

DeepMind 大量招（AlphaProof / AlphaGeometry 团队尤其）。Anthropic、OpenAI、xAI 都偏好。

DeepMind hires heavily from this pool (especially for the AlphaProof / AlphaGeometry teams). Anthropic, OpenAI, and xAI all favour it too.

ICPC + CF 2200+ Infra

中国 + 东欧 infra 岗密度高。字节 Seed、月之暗面、DeepSeek 招聘的实际信号。

Dense among Chinese + Eastern-European infra hires. The actual hiring signal at ByteDance Seed, Moonshot, DeepSeek.

Kaggle · 应用 Applied

产品 / 应用 ML 权重高，前沿研究权重低。Tesla 早期、各 fintech 偏好。

Carries heavy weight in product / applied ML and little weight in frontier research. Favoured by early Tesla AI and by fintechs.

粗略偏好：Anthropic 偏数学 / 物理奥赛 + 理论品味；OpenAI 早期偏 Putnam + 工程；DeepMind 偏 IMO + 学术 PhD；DeepSeek / Moonshot / MiniMax 偏 ICPC + Codeforces + 顶会一作。

Rough preferences: Anthropic — math/physics olympiads + theoretical taste; early OpenAI — Putnam + engineering; DeepMind — IMO + academic PhD; DeepSeek / Moonshot / MiniMax — ICPC + Codeforces + first-author top-tier papers.

四、PhD 是否必须

4. Is a PhD required?

三种角色，三种答案。

Three roles, three different answers.

必须

Required

研究科学家

Research Scientist

pre-training、新架构、对齐基础理论。样本里 14/15 是 PhD。

Pre-training, novel architectures, alignment foundations. 14/15 of the sample hold PhDs.

不必要

Not needed

研究工程师 / Infra

Research Engineer / Infra

vLLM / Megatron / CUDA kernel。Shazeer、Jeff Dean、梁文锋全无 ML PhD。

vLLM / Megatron / CUDA kernels. Shazeer, Jeff Dean, Liang Wenfeng — none have an ML PhD.

奢侈品

Luxury

产品 / 应用层

Product / Applied

完全不需要 PhD，应用直觉与产品判断更值钱。

PhD entirely unnecessary; product intuition and judgment are worth more.

学生密度最高的几个 lab

Highest-density advisor lineages

Hinton（Toronto / Google）→ Sutskever、Krizhevsky、Graves
Abbeel（Berkeley）→ Schulman、Chelsea Finn、Peter Chen
李飞飞（Stanford）→ Karpathy、Justin Johnson、Jim Fan
Christopher Ré（Stanford）→ Tri Dao、Albert Gu（Mamba 系全员）
Salakhutdinov（CMU）→ 杨植麟与大量 NLP 中国学生
汤晓鸥（CUHK）→ 何恺明与商汤一代
朱军（清华）→ 生数科技、Diffusion 中国阵营
LeCun（NYU / FAIR）、Bengio（Mila）自成生态

Hinton (Toronto / Google) → Sutskever, Krizhevsky, Graves
Abbeel (Berkeley) → Schulman, Chelsea Finn, Peter Chen
Fei-Fei Li (Stanford) → Karpathy, Justin Johnson, Jim Fan
Christopher Ré (Stanford) → Tri Dao, Albert Gu (the Mamba lineage)
Salakhutdinov (CMU) → Zhilin Yang and many Chinese NLP students
Xiaoou Tang (CUHK) → Kaiming He and the SenseTime generation
Jun Zhu (Tsinghua) → Shengshu, Chinese diffusion school
LeCun (NYU / FAIR), Bengio (Mila) — self-contained ecosystems

这些 lab 的师承血统在 hiring 时是隐性硬通货。

Lineage from these labs is the hidden hard currency in hiring.

五、早期实习与项目

5. Internships & projects

几乎所有 90 后样本都有至少一段大厂研究院实习。

Almost every 90s-born researcher in the sample did at least one big-lab internship.

实习圣杯

Internship grail

Google Brain · DeepMind · FAIR · MSR · OpenAI · Anthropic

Aravind Srinivas 在 OpenAI、DeepMind、Google 都实习过 → 回 OpenAI → 创 Perplexity，是教科书路径。杨植麟在 Google Brain、FAIR 都有实习。Mistral 三人组全部 DeepMind / FAIR 出身。

Aravind Srinivas interned at OpenAI, DeepMind, and Google → returned to OpenAI → founded Perplexity — the textbook trajectory. Zhilin Yang interned at Brain and FAIR. The Mistral trio all came from DeepMind / FAIR.

Residency · 无 PhD 进研究岗的官方后门

Residency · Official back door to research without a PhD

Anthropic Fellows（最值钱）· OpenAI Residency · Google AI Residency · Meta FAIR Residency

Anthropic Fellows (most valuable) · OpenAI Residency · Google AI Residency · Meta FAIR Residency

开源贡献 · 隐性招聘渠道

Open source · The shadow hiring channel

nanoGPT / llm.c · FlashAttention / Mamba · vLLM · SGLang · HuggingFace transformers · PyTorch core

这些 repo 的 top 50 contributor 名单基本是各大厂招聘短名单。

The top-50-contributor list of these repos is essentially the short list every frontier lab is recruiting from.

六、技能侧重

6. Skill emphasis

不同细分方向对数学和系统的要求差异极大。

Math and systems demands vary dramatically across sub-tracks.

方向	数学权重	系统 / CUDA	备注
Pre-training algorithm	高	中	Kaplan 系，物理直觉重要
Post-training / RLHF	中	中	Schulman 系
新架构（Mamba / MoE）	高	高	Tri Dao 范本，IO-aware
Training infra	低	极高	Jeff Dean / Noam / 梁文锋
Inference infra	低	极高	vLLM / SGLang，系统出身吃香
Agents	中	中	产品直觉 > 数学
Multimodal	中	中	视觉 / 语音传统
Evals / safety	中	低	写作 + 实验设计

Track	Math weight	Systems / CUDA	Notes
Pre-training algorithm	High	Mid	Kaplan lineage; physics intuition matters
Post-training / RLHF	Mid	Mid	Schulman lineage
Novel architectures (Mamba / MoE)	High	High	Tri Dao archetype, IO-aware
Training infra	Low	Extreme	Jeff Dean / Noam / Liang Wenfeng
Inference infra	Low	Extreme	vLLM / SGLang; systems people thrive
Agents	Mid	Mid	Product intuition > math
Multimodal	Mid	Mid	Vision / speech tradition
Evals / safety	Mid	Low	Writing + experimental design

七、趋势变化

7. Trend shift

从学术派到 infra 派，从研究院到工程师。

From academic lineage to infra muscle, from research labs to engineers.

2015 – 2020

研究院模式

The research-lab era

学术派主导，PhD + 顶会一作 = 入场券。CV / NLP 各做各的，单卡 / 8 卡跑实验。

Academic lineage dominates. PhD + first-author top-tier paper = ticket. CV and NLP run in parallel; single-GPU / 8-GPU experiments.

2020 后 · Scaling Laws

Post-2020 · Scaling Laws

Infra 重度倾斜

Infra-heavy tilt

一个能把 7B 训练效率 +20% 的工程师，价值超过十篇 NeurIPS。Noam Shazeer 在 Google 内部据传拿到资深 VP 级薪酬就是信号。

An engineer who improves 7B-model training efficiency by 20% is worth more than ten NeurIPS papers. Noam Shazeer reportedly drew senior-VP-level compensation at Google — a clear signal.

2023 后 · GPT-4

Post-2023 · GPT-4

新蓝海打开

A new blue ocean

post-training（RLHF / RLAIF / RLVR）+ data quality + evals 成为新蓝海，吸纳大批从应用层转入的人。

Post-training (RLHF / RLAIF / RLVR), data quality, and evals open up — absorbing people pivoting in from the application layer.

2024 后 · DeepSeek 时刻

Post-2024 · The DeepSeek moment

非传统出身证明力

Non-traditional backgrounds prove themselves

DeepSeek 证明非传统 ML 出身（量化 infra）也能 SOTA。但前提是十年自建 GPU 集群 + 高强度 infra 工程能力，不是"小作坊逆袭"故事。

DeepSeek proves non-traditional ML backgrounds (quant infra) can hit SOTA. But the precondition is a decade of self-built GPU clusters and heavy infra muscle — not a "small-shop underdog" story.

Part 2 · 路径建议

Part 2 · Path recommendations

给年轻人的三条路径

Three paths for young aspirants

三条路的最优学习路线不同，不要搞混。

The optimal learning route differs across the three — do not conflate them.

研究科学家

Research Scientist

想做 scaling、新架构、对齐基础理论

For scaling, novel architectures, alignment foundations

高中 / 本科阶段

High-school / Undergrad

国家：首选美本，或国内顶尖 + 美研。纯本土路径在前沿研究岗的天花板目前仍明显低于美研路径——不是智商问题，是 lab 师承和合作网络。
学校：MIT、Stanford、CMU、Berkeley、Princeton、Caltech、Toronto；国内清华基科 / 姚班、北大图灵班、中科大少年班、上交 ACM 班。
专业：数学 + CS 双修，或物理 + CS 双修。不要只读"AI 专业"——AI 课程半年过时，数学 / 物理底子十年不过时。
竞赛：IMO / IPhO / Putnam 选一打到金牌或前 100。这是 PhD 申请最硬的通货之一。
项目：大三前复现 nanoGPT；大三做一个能投 workshop 的小工作；大四争取一段 MSR / Google / DeepMind 实习。

Country: US undergrad first; or top Chinese undergrad + US grad. The ceiling of a purely-domestic path on frontier research roles remains visibly lower — not for IQ reasons, but for lab lineage and collaboration networks.
Schools: MIT, Stanford, CMU, Berkeley, Princeton, Caltech, Toronto. In China: Tsinghua Foundation Class / Yao Class, PKU Turing Class, USTC School of the Gifted Young, SJTU ACM Class.
Major: Math + CS double, or Physics + CS double. Do not chase "AI majors" alone — AI course content goes stale in 6 months; math/physics fundamentals last a decade.
Competitions: pick one of IMO / IPhO / Putnam and reach gold-medal or top-100 level. This is one of the hardest currencies for PhD admissions.
Projects: replicate nanoGPT before junior year; in junior year, produce a small piece of work good enough for a workshop submission; in senior year, lock in an MSR / Google / DeepMind internship.

已本科 CS / 数学

After CS / Math undergrad

是否读 PhD：是。这条路径上 PhD 不是可选项。
申 lab 优先级：Christopher Ré、Percy Liang、Chelsea Finn、Sergey Levine、Yejin Choi、Tatsu Hashimoto；加拿大 Yoshua Bengio（Mila）；欧洲 Max Welling；国内朱军、孙茂松、刘知远。
Residency 备选：Anthropic Fellows（最值钱）、OpenAI Residency、Google AI Residency、Meta FAIR Residency。
Side project：复现 Chinchilla scaling 曲线（小尺度即可）；为 vLLM / SGLang 贡献一个 sampler；做一篇 mechanistic interpretability 复现（Anthropic 那条线在招人）。

PhD? Yes. On this path it is not optional.
Top labs to target: Christopher Ré, Percy Liang, Chelsea Finn, Sergey Levine, Yejin Choi, Tatsu Hashimoto; in Canada: Yoshua Bengio (Mila); in Europe: Max Welling; in China: Jun Zhu, Maosong Sun, Zhiyuan Liu.
Residency fallback: Anthropic Fellows (most valuable), OpenAI Residency, Google AI Residency, Meta FAIR Residency.
Side projects: replicate Chinchilla scaling curves (small scale is fine); contribute a sampler to vLLM / SGLang; reproduce a mechanistic-interpretability paper (the Anthropic lineage is hiring on this).
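To give a sense of the scale of "contribute a sampler", here is a minimal nucleus (top-p) sampler in plain Python. It is a sketch of the technique only: the function names are hypothetical and it does not follow the sampler interface of vLLM or SGLang, which operate on batched GPU tensors.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp((x - peak) / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_sample(logits, p=0.9, temperature=1.0, rng=random):
    """Sample a token id from the smallest set of tokens whose mass reaches p."""
    probs = softmax(logits, temperature)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for token_id in order:
        kept.append(token_id)
        cumulative += probs[token_id]
        if cumulative >= p:
            break
    # random.choices renormalises the weights of the kept tokens.
    weights = [probs[i] for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]

token = top_p_sample([2.0, 1.0, 0.5, -1.0], p=0.9)
```

A production contribution would additionally need batching, per-request parameters, and tests against the existing samplers, which is where most of the review effort tends to go.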

研究工程师 / Infra

Research Engineer / Infra

想做训练框架、推理优化、CUDA

For training frameworks, inference optimisation, CUDA

高中 / 本科

High-school / Undergrad

国家：中国本土在这条路上占优。DeepSeek、Moonshot、字节 Seed、阿里 Qwen 都在疯抢 infra。
学校：清华 / 上交 ACM / 中科大 / 浙大 / 哈工大；美国 CMU / UIUC / Berkeley 系统方向。
专业：CS（系统方向）+ 数学辅修。
竞赛：ICPC 区域奖牌 + Codeforces 2200+ 比任何论文都管用。
项目：写 CUDA kernel（Triton、CUTLASS 都行）；给 PyTorch / vLLM / SGLang / TransformerEngine / Megatron 提 PR；自己用 4 张 4090 训一个 1B 模型并 blog 出来。

Country: domestic China has the structural edge here. DeepSeek, Moonshot, ByteDance Seed, Alibaba Qwen are all aggressively poaching infra talent.
Schools: Tsinghua / SJTU ACM / USTC / Zhejiang U. / HIT; in the US — CMU / UIUC / Berkeley systems.
Major: CS (systems track) + Math minor.
Competitions: ICPC regional medal + Codeforces 2200+ beats any paper.
Projects: write CUDA kernels (Triton, CUTLASS — either is fine); contribute PRs to PyTorch / vLLM / SGLang / TransformerEngine / Megatron; train a 1B model on four RTX 4090s and blog it.
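A quick back-of-envelope check shows why a 1B model on four RTX 4090s (24 GB each) is feasible. The sketch assumes mixed-precision Adam at roughly 16 bytes per parameter (2 for half-precision weights, 2 for gradients, 12 for the fp32 master copy and two Adam moments) and ignores activations and framework overhead. The function name and defaults are illustrative.

```python
def model_state_gib(n_params, bytes_per_param=16, n_shards=1):
    """Memory for weights + gradients + optimizer state per shard, in GiB.

    bytes_per_param=16 is an assumption (mixed-precision Adam);
    activations and framework overhead are not included.
    """
    return n_params * bytes_per_param / n_shards / 2**30

unsharded = model_state_gib(1e9)             # about 14.9 GiB on every GPU
sharded = model_state_gib(1e9, n_shards=4)   # about 3.7 GiB per GPU if fully sharded
print(f"unsharded: {unsharded:.1f} GiB, sharded x4: {sharded:.1f} GiB")
```

Under these assumptions the model states fit even without sharding, so the practical limits are activation memory, batch size, and interconnect bandwidth between consumer cards.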

已本科 CS / 数学

After CS / Math undergrad

是否读 PhD：不必要，甚至应该跳过。一年的 vLLM commit 比三年水 PhD 价值大。
直接进字节 Seed / DeepSeek / Moonshot / Qwen / Anthropic infra / xAI infra。
关键技能栈：NCCL、FSDP、TP/PP/EP、CUDA Graphs、PagedAttention、Triton、编译器（torch.compile / TVM）。
Side project：写一个 MoE 分布式训练的最小实现并开源；做一个 FP8 训练数值稳定性 report。

PhD? Not needed; arguably you should skip it. One year of meaningful vLLM commits is worth more than three years of a mediocre PhD.
Go directly to ByteDance Seed / DeepSeek / Moonshot / Qwen / Anthropic infra / xAI infra.
Stack: NCCL, FSDP, TP/PP/EP, CUDA Graphs, PagedAttention, Triton, compilers (torch.compile / TVM).
Side projects: write a minimal MoE distributed-training implementation and open-source it; produce an FP8 training numerical-stability report.
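The core of a "minimal MoE" is the routing step. The sketch below shows top-k gating for a single token in one process, with experts as plain callables. It is not a distributed implementation: expert parallelism would place experts on different devices and exchange tokens via all-to-all. All names here are hypothetical.

```python
import math

def top_k_gate(router_logits, k=2):
    """Pick the k highest-scoring experts; softmax over the selected logits only."""
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

def moe_forward(x, experts, router_logits, k=2):
    """Weighted sum of the selected experts' outputs for one token vector x."""
    out = [0.0] * len(x)
    for expert_id, weight in top_k_gate(router_logits, k):
        y = experts[expert_id](x)
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out

# Toy experts that just scale the input (stand-ins for MLP blocks).
experts = [lambda x, s=s: [s * v for v in x] for s in (1.0, 2.0, 3.0, 4.0)]
gates = top_k_gate([0.0, 3.0, -1.0, 2.0], k=2)
output = moe_forward([1.0, 1.0], experts, [0.0, 3.0, -1.0, 2.0], k=2)
```

A training-ready version would also need a load-balancing term and capacity handling, which are the parts that make the distributed case hard.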

已工作想转入

Lateral entrants from industry

应用 / 产品 / evals / data 的切入路线

Entry routes via applied / product / evals / data

切入点排序

Entry points, ranked

Evals 工程师：门槛最低、最缺人。会写 Python + 有领域知识（医疗、法律、金融、教育）就能切。Anthropic、OpenAI、Scale AI 都在大规模招。
Data quality / annotation pipeline：数据工程 + 一点 LLM 经验。Surge、Scale、Snorkel 系。
Infra 应用工程：SRE + 懂 GPU 调度，比从 ML 转 infra 反而容易。
产品层 / agent wrapper：Cursor、Devin、Perplexity 这类。要会做产品判断 + prompt + eval 循环。
垂直行业 fine-tune + 评测：对原行业 know-how 是杠杆。

Evals engineer: lowest barrier, highest unmet demand. Python + domain knowledge (medicine, law, finance, education) is enough to break in. Anthropic, OpenAI, Scale AI are hiring at scale.
Data quality / annotation pipeline: data engineering + some LLM exposure. The Surge / Scale / Snorkel cluster.
Infra-adjacent engineering: SRE + GPU scheduling — easier than crossing in from ML to infra.
Product layer / agent wrappers: Cursor, Devin, Perplexity. You need product judgment + prompt + eval loops.
Vertical fine-tune + eval: your prior industry know-how is leverage.

不建议 Not recommended

试图自学三个月就去抢 pre-training 岗。那个市场对自学者关闭。

Attempting to self-study for three months and compete for pre-training roles. That market is closed to self-learners.

Part 3 · 非主流判断

Part 3 · Contrarian calls

四条非主流判断

Four contrarian calls

这些是我的明确观点，不是行业共识。

These are my own claims, not industry consensus.

PhD 不是必需品，但"PhD 替代品"门槛同样高

A PhD isn't required, but "PhD substitutes" set the same bar

要么是 Putnam / IMO 级竞赛，要么是 vLLM / FlashAttention 级开源贡献。中间地带（普通硕士 + 几个 Kaggle 银牌）现在最难。

Either Putnam / IMO-tier competition pedigree, or vLLM / FlashAttention-tier open-source contributions. The middle ground (a generic master's + a few Kaggle silvers) is the hardest spot to be in right now.

美本 / 美研在研究路径上结构性占优，中国本土在 Infra 路径上结构性占优

US schooling is structurally favoured for research; domestic China for infra

研究端的优势来自 lab 师承网络；infra 端的优势来自算力市场和工程文化。两条路要分开优化。

The research-side advantage comes from lab lineage and collaboration networks. The infra-side advantage comes from compute markets and engineering culture. Optimise the two paths separately.

物理 / 数学 PhD 转 ML 的红利期还有约 5 年

The physics/math PhD-to-ML window has about 5 years left

因为 scaling / 新架构方向仍在出新范式；等范式稳定后，CS 系统派会重新占优。

Because scaling / novel-architecture directions are still producing new paradigms. Once the paradigms stabilise, the CS-systems school will reclaim the edge.

梁文锋路径不可复制

The Liang Wenfeng path is not reproducible

他成功的前提是十年量化 infra 积累 + 自有 GPU 集群。年轻人模仿"绕开 PhD 直接做大模型"会失败，因为缺少他那十年的 infra 复利。

His precondition was a decade of quant-infra compounding plus a self-owned GPU cluster. Young people imitating "skip the PhD, jump straight to LLMs" will fail because they lack his decade of infra compounding.

Part 4 · 概念解释

Part 4 · Concept glossary

概念解释

Concept glossary

vLLM commit、Evals 工程师、DeepSeek 团队画像反推。

vLLM commits, evals engineers, and a reverse-engineered profile of the DeepSeek team.

⚡

"一年 vLLM commit" 是什么意思

What "a year of vLLM commits" means

vLLM 是 2023 年 Berkeley Sky Lab（Woosuk Kwon、Zhuohan Li）开源的 LLM 推理引擎，核心创新是 PagedAttention——把操作系统虚拟内存的分页思想搬到 KV cache。现在和 SGLang、TensorRT-LLM、llama.cpp 并列事实标准。

vLLM is an LLM inference engine open-sourced by Berkeley Sky Lab in 2023 (Woosuk Kwon, Zhuohan Li). Its core innovation is PagedAttention — porting OS virtual-memory paging onto KV cache. Today it stands as a de facto standard alongside SGLang, TensorRT-LLM, and llama.cpp.
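The paging idea can be illustrated with a toy block allocator: each sequence owns a block table mapping logical token positions to fixed-size physical blocks, so memory is claimed one block at a time instead of being reserved for the maximum length. This is bookkeeping only, with no tensors, and it is not vLLM's code; the class and method names are hypothetical.

```python
class PagedKVCache:
    """Toy block-table bookkeeping for a paged KV cache (illustration only)."""

    def __init__(self, num_blocks, block_size):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))
        self.block_tables = {}  # seq_id -> list of physical block ids
        self.seq_lens = {}      # seq_id -> number of tokens stored

    def append_token(self, seq_id):
        """Reserve a slot for one more token. Returns (physical_block, offset)."""
        table = self.block_tables.setdefault(seq_id, [])
        length = self.seq_lens.get(seq_id, 0)
        if length % self.block_size == 0:  # first token, or last block is full
            if not self.free_blocks:
                raise MemoryError("no free KV blocks")
            table.append(self.free_blocks.pop())
        self.seq_lens[seq_id] = length + 1
        return table[-1], length % self.block_size

    def free(self, seq_id):
        """Return all of a finished sequence's blocks to the free pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.seq_lens.pop(seq_id, None)

cache = PagedKVCache(num_blocks=4, block_size=2)
slots = [cache.append_token("a") for _ in range(3)]  # 3 tokens -> 2 blocks
```

Because blocks need not be contiguous, at most one partially filled block per sequence is wasted, which is the fragmentation argument behind the design.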

"一年 vLLM commit"是简写，指持续 12 个月以上、有实质性贡献（不是改 typo）的开源工作。它值钱的原因：

"A year of vLLM commits" is shorthand for 12+ months of sustained, substantive contributions (not typo fixes). It's valuable because:

公开可验证：PR、代码质量、review 记录全部可查，比简历可信度高一个数量级。
接触真实生产系统：连续批处理、KV cache 管理、speculative decoding、FP8、MoE inference、TP / PP 调度——闭门写不出来。
直接进入招聘视野：core team 和 top 50 contributor 基本被 NVIDIA、Anthropic、OpenAI、xAI、Together、Anyscale、Red Hat（收购 Neural Magic）瓜分。
同质等价物：SGLang、TensorRT-LLM、llama.cpp、MLX、HuggingFace transformers core。

Publicly verifiable: PRs, code quality, and review history are all auditable — an order of magnitude more credible than a resume.
Forces contact with real production systems: continuous batching, KV cache management, speculative decoding, FP8, MoE inference, TP / PP scheduling — none of it can be reproduced in a vacuum.
Direct path into hiring pipelines: core team and top-50 contributors are essentially split among NVIDIA, Anthropic, OpenAI, xAI, Together, Anyscale, and Red Hat (which acquired Neural Magic).
Equivalents: SGLang, TensorRT-LLM, llama.cpp, MLX, HuggingFace transformers core.

"实质性"的颗粒度：加一个新模型架构、写一个 fused kernel、修一个 TP edge case、实现一个 sampler、做 FP8 数值稳定性 patch。README 改字不算。

The granularity of "substantive": adding a new model architecture, writing a fused kernel, fixing a TP edge case, implementing a sampler, patching FP8 numerical stability. README typo fixes don't count.

🎯

Evals 工程师

Evals engineer

Evals = evaluations。不是建模，是测量。

Evals = evaluations. Not modelling — measurement.

工作内容

What the work involves

设计 benchmark（MMLU、GPQA、SWE-bench、AIME、ARC-AGI 这类）
写 harness（UK AISI 的 Inspect、EleutherAI 的 lm-eval-harness、OpenAI 的 simple-evals）
领域 evals：医疗、法律、代码、agentic（METR 的 RE-Bench、Apollo 的 sandbagging eval）
危险能力红队：生化、网络攻击、自主复制——直接挂在 Anthropic RSP / OpenAI Preparedness 框架上，决定模型能不能发布
生产侧 online evals + regression 监控

Designing benchmarks (MMLU, GPQA, SWE-bench, AIME, ARC-AGI, etc.)
Writing harnesses (UK AISI's Inspect, EleutherAI's lm-eval-harness, OpenAI's simple-evals)
Domain evals: medical, legal, code, agentic (METR's RE-Bench, Apollo's sandbagging eval)
Dangerous-capability red-teaming: bio/chem, cyber-offence, autonomous replication — wired directly into Anthropic RSP / OpenAI Preparedness frameworks; determines whether a model ships
Production-side online evals + regression monitoring

雇主

Employers

Anthropic（团队最大，50+）· OpenAI Preparedness · METR · Apollo Research · UK AISI · Scale AI 红队

Anthropic (largest team, 50+) · OpenAI Preparedness · METR · Apollo Research · UK AISI · Scale AI red team

"门槛低却缺人"的三个原因

Why "low barrier yet under-staffed"

真正的瓶颈是领域知识 + 实验严谨度 + 写作清晰，不是 ML 理论。会写 Python 的医生 / 律师 / 生物学家比纯 CS 毕业生更值钱。
ML 圈传统认为 evals 不 prestigious，researcher 不愿做——但 RSP 出来后地位飙升。
统计功底（采样、置信区间、多重比较、IRR）很多 ML 工程师反而不熟。

The real bottleneck is domain knowledge + experimental rigour + crisp writing, not ML theory. A Python-fluent physician / lawyer / biologist is worth more than a generic CS grad.
Traditionally the ML field considered evals non-prestigious; researchers avoided it — but status jumped sharply after RSPs landed.
Statistical fundamentals (sampling, confidence intervals, multiple comparisons, inter-rater reliability) are oddly weak among many ML engineers.
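As an example of the statistical basics in question, a benchmark score is a proportion and should be reported with an interval. The sketch below computes a Wilson score interval for accuracy, assuming items are independent and scored pass/fail. The function name is illustrative.

```python
import math

def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a pass rate; z=1.96 gives roughly 95% coverage."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half

low, high = wilson_interval(80, 100)
print(f"80/100 correct -> 95% CI roughly [{low:.3f}, {high:.3f}]")
```

On a 100-item eval the interval spans about 15 points, so a two-point gap between models on such a benchmark is usually not evidence of a real difference.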

下游路径：evals → safety researcher、→ AI governance / policy、→ 产品 PM。

Downstream paths: evals → safety researcher, → AI governance / policy, → product PM.

🔍

DeepSeek 工程师画像反推

Reverse-engineering the DeepSeek engineer profile

公开信源：V2 / V3 / R1 论文作者名单、《暗涌》《揭秘 DeepSeek》专访、36kr、知乎离职片段、幻方早期 JD。

Public sources: V2 / V3 / R1 paper author lists, the two Anyong interviews, the "Inside DeepSeek" feature, 36kr, Zhihu post-departure threads, early High-Flyer JDs.

构成

Composition

学校：清华、北大、浙大、上交、中科大、复旦为主体。几乎全本土培养，没有美研主力。
学历：硕士占多数，PhD 是少数派——和 Anthropic / OpenAI 完全相反。
年龄：97 / 98 / 99 后比例极高。多个核心作者是应届或工作 1–3 年。

Schools: Tsinghua, PKU, Zhejiang U., SJTU, USTC, Fudan dominate. Almost entirely domestically trained, no US-grad core.
Degrees: master's majority, PhDs are the minority — the opposite of Anthropic / OpenAI.
Age: the cohort born 1997–1999 is heavily overrepresented. Several core authors are new grads or 1–3 years into their careers.

两支前职业

Two prior career streams

幻方量化内部转岗（最重要的一支）——原本写高频交易系统，熟悉低延迟、CUDA、NVLink、自建集群运维。
高校直招——竞赛背景偏多，ICPC / 信息学奥赛 / 数学竞赛。

Internal transfers from High-Flyer Quant (the most important stream) — formerly building HFT systems, fluent in low latency, CUDA, NVLink, self-managed clusters.
Direct campus hires — heavy competition background, ICPC / informatics olympiads / math contests.

不招的人（来自访谈）

Who they don't hire (per interviews)

BAT 老员工
海归 senior researcher
"有成功 ML 经验"的人

Veteran BAT (Baidu/Alibaba/Tencent) employees
Returnee senior researchers
People with "successful ML track records"

梁文锋原话："认知比经验重要"——是 Anthropic 式 hiring 的反面极端。

In Liang Wenfeng's own words: "Insight matters more than experience" — the polar opposite of Anthropic-style hiring.

组织反推

Org structure inferred

扁平，没有 director / principal 阶梯
算力不限——上万张 H800，研究员有"无限算力"幻觉
发论文不是 KPI，是招人和定位手段
工资行业 top（应届顶尖 200 万+ RMB base），无大厂层级政治

Flat — no director / principal ladder
Compute is uncapped: upwards of ten thousand H800s, so researchers experience an "infinite compute" illusion
Publishing is not a KPI; it's a recruiting and positioning tool
Top-of-industry pay (top new-grad ¥2M+ base), no big-tech ladder politics

技能反推（从公开成果反推必备能力）

Skills inferred from public output

MLA（Multi-head Latent Attention）：架构创新，懂 attention 内部数学
DeepSeekMoE + 细粒度专家：MoE 系统工程
FP8 混合精度训练：底层数值 + CUDA
DualPipe + 自写 all-to-all 通信 kernel：硬核系统，已触到 NVIDIA 工程师领域
GRPO：把 PPO 简化但保持 RL 稳定，理论嗅觉
R1-Zero 的纯 RL 路线：敢做大胆实验，且有算力支撑

MLA (Multi-head Latent Attention): architectural innovation, deep grasp of attention internals
DeepSeekMoE + fine-grained experts: MoE systems engineering
FP8 mixed-precision training: low-level numerics + CUDA
DualPipe + custom all-to-all comms kernel: hardcore systems work, already brushing NVIDIA-engineer territory
GRPO: simplifying PPO while preserving RL stability — theoretical taste
R1-Zero's pure-RL route: willingness to run bold experiments, backed by compute
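The GRPO simplification is easiest to see in the advantage computation: instead of a learned value network as in PPO, each sampled response is scored relative to the other responses drawn for the same prompt. The sketch below shows only that normalisation step. The use of the population standard deviation and the `eps` term are assumptions made here for numerical safety, and the policy-gradient update itself is omitted.

```python
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """Normalise each reward against its own sampling group: (r - mean) / std."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # assumption: population std over the group
    return [(r - mean) / (std + eps) for r in rewards]

# Four sampled answers to one prompt, scored 1.0 if verified correct, else 0.0.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

If every response in a group gets the same reward, all advantages are zero and the prompt contributes no gradient, which is one reason prompt difficulty matters for this style of training.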

真正启发 The real lesson

DeepSeek 不是"年轻人逆袭"故事，而是"量化资本 + 自建算力 + 反主流 hiring + 工程师文化"的组合拳。

DeepSeek is not a "young-people-against-the-odds" story. It's a combination punch: quant capital + self-built compute + counter-consensus hiring + engineering culture.

年轻人能学的：早期囤系统能力（CUDA、分布式、低延迟），不要早期囤 ML 论文数。

What young people can actually copy: front-load systems capability (CUDA, distributed, low latency); do not front-load ML paper count.

但复制路径需要资本前置——这是它和 OpenAI 早期"几个天才靠论文起家"最大的不同，也是为什么国内其他六小虎走不通这条路：他们没有一个已经赚到钱的量化母体提供十年算力复利。从博弈论看，DeepSeek 是资本 + 人才耦合策略的胜利，而不是单独的人才策略——所以"模仿 DeepSeek 的 hiring 方式"而没有匹配的算力底座，是注定失败的局部模仿。

But replicating the path requires capital upfront — and that's the biggest difference from early OpenAI ("a few geniuses starting from papers"). It's also why the other Chinese "six little tigers" can't follow this route: none of them has a monetised quant parent supplying a decade of compounding compute. From a game-theoretic view, DeepSeek is a victory of coupled capital + talent strategy, not a pure talent strategy. Imitating DeepSeek's hiring approach without the matching compute base is a partial mimicry that is structurally guaranteed to fail.