1. Three school clusters
On the research-scientist track, fewer than 15 schools issue most of the "tickets". On the infra track, school weight drops sharply and what you have shipped counts for more.
Deep-learning orthodoxy
- Toronto · Sutskever (Hinton's student), Karpathy undergrad
- Stanford · Karpathy PhD, Tri Dao, Jared Kaplan
- Berkeley · Schulman (Abbeel), Aravind Srinivas
- CMU · Zhilin Yang PhD
- Princeton · Dario Amodei (biophysics)
- Caltech / MIT · Schulman undergrad (Caltech); Kaiming He on the MIT faculty since 2024
Classical academy lineage
- Cambridge + UCL · Hassabis (CS undergrad at Cambridge, cognitive-neuroscience PhD at UCL)
- École Polytechnique (X) + ENS · The Mistral trio (Mensch / Lample / Lacroix) all came through these schools
- ETH Zürich · A large pool of FAIR / DeepMind researchers
Domestic powerhouses (China)
- Tsinghua · Kaiming He (Fundamental Science Class), Zhilin Yang, Jie Tang
- CUHK · Xiaoou Tang's lab (He's PhD advisor → SenseTime generation)
- Zhejiang U. · Liang Wenfeng (outlier: from quant infra to AI)
- PKU / SJTU ACM Class / USTC Special Class for the Gifted Young
Fewer than 15 schools dominate the research-scientist pipeline. On the infra track, school weight drops sharply and shipped work outweighs pedigree.
2. Undergrad majors
Physics/math backgrounds have a structural edge at building new paradigms; pure CS excels at pushing existing ones to their limit.
Kaplan's Scaling Laws paper (2020) is essentially statistical-physics thinking (Wilson's renormalization group, finite-size scaling). The instinct a physics PhD trains, looking for universal scaling behaviour, maps directly onto frontier ML. This lineage is unusually dense at Anthropic.
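"Finding the universal scaling" has a very concrete core: a power law is a straight line in log-log space, so its exponent is a slope. The sketch below assumes Kaplan's parameter-count form L(N) = (N_c / N)^α_N and recovers α_N and N_c by ordinary least squares on the logs. The data points are synthetic and noise-free, generated from the constants reported by Kaplan et al. (α_N ≈ 0.076, N_c ≈ 8.8e13); they are not measurements.

```python
import math

def fit_power_law(ns, losses):
    """Fit loss = (n_c / n) ** alpha by least squares in log-log space.

    log L = alpha * log(n_c) - alpha * log(n): a line with slope -alpha.
    Returns (alpha, n_c).
    """
    xs = [math.log(n) for n in ns]
    ys = [math.log(l) for l in losses]
    k = len(xs)
    mean_x = sum(xs) / k
    mean_y = sum(ys) / k
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x   # equals alpha * log(n_c)
    alpha = -slope
    n_c = math.exp(intercept / alpha)
    return alpha, n_c

# Synthetic points from the illustrative law (not real training runs).
ALPHA_TRUE, NC_TRUE = 0.076, 8.8e13
sizes = [1e6, 1e7, 1e8, 1e9, 1e10]
loss = [(NC_TRUE / n) ** ALPHA_TRUE for n in sizes]

alpha, n_c = fit_power_law(sizes, loss)
print(f"alpha ≈ {alpha:.3f}, N_c ≈ {n_c:.2e}")
```

Real runs add noise and usually an irreducible-loss floor, which bends the curve away from a pure line; the point here is only that the exponent is read off as a slope.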
Physics/math people have a structural edge at building new paradigms from scratch (scaling laws, Mamba, novel architectures). Pure-CS people are better at perfecting existing paradigms (systems, ranking, infra). Neither is replaceable.
3. Competitions
Different labs favour different competitions; this is a hidden hard currency in resume screening.
Noam Shazeer is a Putnam Fellow (top five). Putnam results appear remarkably often among senior staff at Google Brain and Anthropic.
IMO: DeepMind hires heavily from this pool, especially for the AlphaProof / AlphaGeometry teams. Anthropic, OpenAI, and xAI all favour it as well.
ICPC / Codeforces: dense among Chinese and Eastern-European infra hires, and the signal ByteDance Seed, Moonshot, and DeepSeek actually hire on.
Kaggle: weighted heavily for product / applied ML, lightly for frontier research. Favoured by early Tesla AI and by fintechs.
Rough preferences: Anthropic — math/physics olympiads + theoretical taste; early OpenAI — Putnam + engineering; DeepMind — IMO + academic PhD; DeepSeek / Moonshot / MiniMax — ICPC + Codeforces + first-author top-tier papers.
4. Is a PhD required?
Three roles, three different answers.
Highest-density advisor lineages
- Hinton (Toronto / Google) → Sutskever, Krizhevsky, Graves
- Abbeel (Berkeley) → Schulman, Chelsea Finn, Peter Chen
- Fei-Fei Li (Stanford) → Karpathy, Justin Johnson, Jim Fan
- Christopher Ré (Stanford) → Tri Dao, Albert Gu (the Mamba lineage)
- Salakhutdinov (CMU) → Zhilin Yang and many Chinese NLP students
- Xiaoou Tang (CUHK) → Kaiming He and the SenseTime generation
- Jun Zhu (Tsinghua) → Zhipu core, Chinese diffusion school
- LeCun (NYU / FAIR), Bengio (Mila) — self-contained ecosystems
Lineage from these labs is a hidden hard currency in hiring.
5. Internships & projects
Almost every researcher in the sample born in the 1990s did at least one research internship at a big lab.
Internship grail
Aravind Srinivas interned at OpenAI, DeepMind, and Google, returned to OpenAI, then founded Perplexity: the textbook trajectory. Zhilin Yang interned at Google Brain and FAIR. The Mistral trio all came from DeepMind / FAIR.
Residency · Official back door to research without a PhD
Open source · The shadow hiring channel
The top-50-contributor list of these repos is essentially the short list every frontier lab is recruiting from.
6. Skill emphasis
Math and systems demands vary dramatically across sub-tracks.
| Track | Math weight | Systems / CUDA | Notes |
|---|---|---|---|
| Pre-training algorithm | High | Mid | Kaplan lineage; physics intuition matters |
| Post-training / RLHF | Mid | Mid | Schulman lineage |
| Novel architectures (Mamba / MoE) | High | High | Tri Dao archetype, IO-aware |
| Training infra | Low | Very high | Jeff Dean / Noam Shazeer / Liang Wenfeng |
| Inference infra | Low | Very high | vLLM / SGLang; systems people thrive |
| Agents | Mid | Mid | Product intuition > math |
| Multimodal | Mid | Mid | Vision / speech tradition |
| Evals / safety | Mid | Low | Writing + experimental design |
7. Trend shift
From academic lineage to infra muscle, from research labs to engineers.
The research-lab era
Academic lineage dominates. PhD + first-author top-tier paper = ticket. CV and NLP run in parallel; single-GPU / 8-GPU experiments.
Infra-heavy tilt
An engineer who improves 7B-model training efficiency by 20% is worth more than ten NeurIPS papers. Noam Shazeer reportedly drew senior-VP-level compensation at Google — a clear signal.
A new blue ocean
Post-training (RLHF / RLAIF / RLVR), data quality, and evals have become the new blue ocean, absorbing many people pivoting in from the application layer.
Non-traditional backgrounds prove themselves
DeepSeek shows that people from non-traditional ML backgrounds (quant infra) can reach SOTA. But the precondition was a decade of self-built GPU clusters and heavy infra engineering muscle; this is not a "small workshop beats the giants" story.
Three paths for young aspirants
The optimal learning route differs across the three — do not conflate them.
Research Scientist
For scaling, novel architectures, alignment foundations
High-school / Undergrad
- Country: US undergrad first; otherwise a top Chinese undergrad plus US grad school. A purely domestic path still hits a visibly lower ceiling on frontier research roles. This is not about IQ; it is about lab lineage and collaboration networks.
- Schools: MIT, Stanford, CMU, Berkeley, Princeton, Caltech, Toronto. In China: Tsinghua Fundamental Science Class / Yao Class, PKU Turing Class, USTC Special Class for the Gifted Young, SJTU ACM Class.
- Major: a Math + CS or Physics + CS double major. Do not settle for an "AI major" alone: AI course content goes stale in six months, while math/physics fundamentals last a decade.
- Competitions: pick one of IMO / IPhO / Putnam and push it to gold-medal or top-100 level. This is one of the hardest currencies in PhD admissions.
- Projects: replicate nanoGPT before junior year; produce a small workshop-publishable piece of work in junior year; land an MSR / Google / DeepMind internship in senior year.
After CS / Math undergrad
- PhD? Yes. On this path it is not optional.
- Top labs to target: Christopher Ré, Percy Liang, Chelsea Finn, Sergey Levine, Yejin Choi, Tatsu Hashimoto; in Europe — Yoshua Bengio, Max Welling; in China — Jun Zhu, Maosong Sun, Zhiyuan Liu.
- Residency fallback: Anthropic Fellows (most valuable), OpenAI Residency, Google AI Residency, Meta FAIR Residency.
- Side projects: replicate the Chinchilla scaling curves (small scale is fine); contribute a sampler to vLLM / SGLang; reproduce a mechanistic-interpretability paper (Anthropic's interpretability line is actively hiring).
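Before replicating the Chinchilla curves, it helps to know the back-of-envelope allocation they imply. A minimal sketch, assuming two common approximations: training compute C ≈ 6·N·D FLOPs, and the Chinchilla rule of thumb of roughly 20 training tokens per parameter (D ≈ 20·N). Both are heuristics, not the paper's fitted parametric loss.

```python
import math

TOKENS_PER_PARAM = 20       # Chinchilla rule of thumb (approximate)
FLOPS_PER_PARAM_TOKEN = 6   # forward + backward ≈ 6 FLOPs per parameter per token

def compute_optimal(flops_budget):
    """Return (params, tokens) that spend `flops_budget` under D = 20 * N.

    C = 6 * N * D = 6 * 20 * N**2  =>  N = sqrt(C / 120).
    """
    n = math.sqrt(flops_budget / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))
    d = TOKENS_PER_PARAM * n
    return n, d

# The budget of a 70B-parameter model trained on 1.4T tokens (Chinchilla's own size).
budget = 6 * 70e9 * 1.4e12
n, d = compute_optimal(budget)
print(f"N ≈ {n:.3g} params, D ≈ {d:.3g} tokens")
```

Feeding in Chinchilla's own budget returns 70B parameters and 1.4T tokens, which is a sanity check rather than a result; an actual replication fits the loss surface from many small runs and then checks whether the optimum lands near this ratio.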
Research Engineer / Infra
For training frameworks, inference optimisation, CUDA
High-school / Undergrad
- Country: domestic China has the structural edge here. DeepSeek, Moonshot, ByteDance Seed, Alibaba Qwen are all aggressively poaching infra talent.
- Schools: Tsinghua / SJTU ACM / USTC / Zhejiang U. / HIT; in the US — CMU / UIUC / Berkeley systems.
- Major: CS (systems track) + Math minor.
- Competitions: ICPC regional medal + Codeforces 2200+ beats any paper.
- Projects: write CUDA kernels (Triton, CUTLASS — either is fine); contribute PRs to PyTorch / vLLM / SGLang / TransformerEngine / Megatron; train a 1B model on four RTX 4090s and blog it.
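Whether a 1B model even fits on four RTX 4090s (24 GB each) is a short calculation. A sketch, assuming mixed-precision Adam at the commonly cited 16 bytes per parameter (fp16 weights and gradients, plus fp32 master weights and two fp32 Adam moments, as tallied in the ZeRO paper) and ignoring activations, which depend on batch size, sequence length, and checkpointing.

```python
# Bytes per parameter for mixed-precision Adam (ZeRO-style accounting).
BYTES_PER_PARAM = {
    "fp16 weights": 2,
    "fp16 gradients": 2,
    "fp32 master weights": 4,
    "fp32 Adam first moment": 4,
    "fp32 Adam second moment": 4,
}

def model_state_gb(n_params, n_gpus=1, fully_sharded=False):
    """Per-GPU memory (GB = 1e9 bytes) for weights + grads + optimizer state.

    fully_sharded=True models FSDP / ZeRO-3, which splits all of it across GPUs;
    otherwise every GPU holds a full replica (plain data parallelism).
    Activations are NOT included.
    """
    total = n_params * sum(BYTES_PER_PARAM.values())
    per_gpu = total / n_gpus if fully_sharded else total
    return per_gpu / 1e9

GPU_MEM_GB = 24  # RTX 4090
replicated = model_state_gb(1e9, n_gpus=4)
sharded = model_state_gb(1e9, n_gpus=4, fully_sharded=True)
print(f"DDP (replicated): {replicated:.0f} GB/GPU of {GPU_MEM_GB}")
print(f"FSDP (sharded):   {sharded:.0f} GB/GPU of {GPU_MEM_GB}")
```

Plain data parallelism leaves roughly 8 GB per card for activations before framework overhead; full sharding frees most of the card but adds communication, which on 4090s goes over PCIe because the card has no NVLink.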
After CS / Math undergrad
- PhD? Not needed; arguably you should skip it. One year of meaningful vLLM commits is worth more than three years of a mediocre PhD.
- Go directly to ByteDance Seed / DeepSeek / Moonshot / Qwen / Anthropic infra / xAI infra.
- Stack: NCCL, FSDP, TP/PP/EP, CUDA Graphs, PagedAttention, Triton, compilers (torch.compile / TVM).
- Side projects: write a minimal MoE distributed-training implementation and open-source it; produce an FP8 training numerical-stability report.
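The heart of a minimal MoE implementation is the router. A framework-free sketch, assuming softmax top-k gating with the selected weights renormalised to sum to 1 (one common variant; DeepSeekMoE and others add shared experts, other affinity functions, and load-balancing terms not shown here). The per-expert token counts it returns are what an expert-parallel all-to-all would actually move between GPUs.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route_top_k(router_logits, k):
    """Top-k gating for a batch of tokens.

    router_logits: list of per-token lists, one logit per expert.
    Returns (assignments, load):
      assignments[t] = [(expert_id, weight), ...], weights summing to 1
      load[e]        = number of tokens sent to expert e
    """
    n_experts = len(router_logits[0])
    load = [0] * n_experts
    assignments = []
    for logits in router_logits:
        probs = softmax(logits)
        top = sorted(range(n_experts), key=lambda e: probs[e], reverse=True)[:k]
        norm = sum(probs[e] for e in top)
        assignments.append([(e, probs[e] / norm) for e in top])
        for e in top:
            load[e] += 1
    return assignments, load

# Three tokens, four experts; each token goes to its top-2 experts.
logits = [
    [2.0, 1.0, 0.1, -1.0],
    [0.0, 3.0, 2.5, 0.0],
    [1.5, -0.5, 0.2, 1.4],
]
assignments, load = route_top_k(logits, k=2)
print(load)  # tokens per expert → [2, 2, 1, 1]
```

An uneven `load` is exactly the imbalance that auxiliary balancing losses (or DeepSeek-V3's bias-based balancing) try to correct, since the busiest expert sets the step time.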
Lateral entrants from industry
Entry routes via applied / product / evals / data
Entry points, ranked
- Evals engineer: lowest barrier, highest unmet demand. Python + domain knowledge (medicine, law, finance, education) is enough to break in. Anthropic, OpenAI, Scale AI are hiring at scale.
- Data quality / annotation pipeline: data engineering + some LLM exposure. The Surge / Scale / Snorkel cluster.
- Infra-adjacent engineering: SRE + GPU scheduling — easier than crossing in from ML to infra.
- Product layer / agent wrappers: Cursor, Devin, Perplexity. You need product judgment + prompt + eval loops.
- Vertical fine-tune + eval: your prior industry know-how is leverage.
Don't self-study for three months and then compete for pre-training roles. That market is closed to self-learners.
Four contrarian calls
These are my own claims, not industry consensus.
Either Putnam / IMO-tier competition pedigree, or vLLM / FlashAttention-tier open-source contributions. The middle ground (a generic master's + a few Kaggle silvers) is the hardest spot to be in right now.
The research-side advantage comes from lab lineage and collaboration networks. The infra-side advantage comes from compute markets and engineering culture. Optimise the two paths separately.
Because scaling / novel-architecture directions are still producing new paradigms. Once the paradigms stabilise, the CS-systems school will reclaim the edge.
His precondition was a decade of quant-infra compounding plus a self-owned GPU cluster. Young people imitating "skip the PhD, jump straight to LLMs" will fail because they lack his decade of infra compounding.
Concept glossary
vLLM commits, evals engineers, and a reverse-engineered profile of the DeepSeek team.
What "a year of vLLM commits" means
vLLM is an LLM inference engine open-sourced in 2023 by UC Berkeley's Sky Computing Lab (Woosuk Kwon, Zhuohan Li). Its core innovation is PagedAttention, which carries the paging idea from OS virtual memory over to the KV cache. Today it is a de facto standard alongside SGLang, TensorRT-LLM, and llama.cpp.
"A year of vLLM commits" is shorthand for 12+ months of sustained, substantive open-source contributions (not typo fixes). It's valuable because:
- Publicly verifiable: PRs, code quality, and review history are all auditable — an order of magnitude more credible than a resume.
- Forces contact with real production systems: continuous batching, KV cache management, speculative decoding, FP8, MoE inference, TP / PP scheduling — none of it can be reproduced in a vacuum.
- Direct path into hiring pipelines: core team and top-50 contributors are essentially split among NVIDIA, Anthropic, OpenAI, xAI, Together, Anyscale, and Red Hat (which acquired Neural Magic).
- Equivalents: SGLang, TensorRT-LLM, llama.cpp, MLX, HuggingFace transformers core.
The granularity of "substantive": adding a new model architecture, writing a fused kernel, fixing a TP edge case, implementing a sampler, patching FP8 numerical stability. README typo fixes don't count.
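The PagedAttention idea mentioned above is easiest to see in a toy allocator with no GPU code at all. A sketch, assuming a fixed block size of 16 tokens and one shared pool of physical blocks; the class and method names are hypothetical and are not vLLM's internals. Each sequence keeps a block table mapping its logical blocks to physical ones, and a new physical block is claimed only when the last one is full, so waste is bounded by one partially filled block per sequence.

```python
class ToyBlockManager:
    """Toy PagedAttention-style KV cache allocator (hypothetical, illustrative only)."""

    def __init__(self, num_physical_blocks, block_size=16):
        self.block_size = block_size
        self.free_blocks = list(range(num_physical_blocks))
        self.block_tables = {}  # seq_id -> list of physical block ids
        self.seq_lens = {}      # seq_id -> tokens stored so far

    def append_token(self, seq_id):
        """Reserve KV space for one more token; allocate a block only if needed."""
        table = self.block_tables.setdefault(seq_id, [])
        length = self.seq_lens.get(seq_id, 0)
        if length % self.block_size == 0:  # no block yet, or last block full
            if not self.free_blocks:
                raise MemoryError("KV cache exhausted; a real engine would preempt")
            table.append(self.free_blocks.pop())
        self.seq_lens[seq_id] = length + 1

    def physical_slot(self, seq_id, token_idx):
        """Translate a logical token position to (physical_block, offset)."""
        block = self.block_tables[seq_id][token_idx // self.block_size]
        return block, token_idx % self.block_size

    def free(self, seq_id):
        """Return a finished sequence's blocks to the pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id))
        self.seq_lens.pop(seq_id)

mgr = ToyBlockManager(num_physical_blocks=8, block_size=16)
for _ in range(20):  # 20 tokens -> 2 blocks, the second only 4/16 full
    mgr.append_token("a")
print(mgr.block_tables["a"], mgr.physical_slot("a", 17))  # → [7, 6] (6, 1)
```

The physical blocks need not be contiguous, which is the whole point: the attention kernel follows the block table instead of assuming one contiguous KV buffer per sequence. Prefix sharing and copy-on-write, which the real engine also builds on this indirection, are left out.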
Evals engineer
Evals = evaluations. Not modelling — measurement.
What the work involves
- Designing benchmarks (MMLU, GPQA, SWE-bench, AIME, ARC-AGI, etc.)
- Writing harnesses (Anthropic's Inspect, EleutherAI's lm-eval-harness, OpenAI's simple-evals)
- Domain evals: medical, legal, code, agentic (METR's RE-Bench, Apollo's sandbagging eval)
- Dangerous-capability red-teaming: bio/chem, cyber-offence, autonomous replication. This work is wired directly into Anthropic's Responsible Scaling Policy (RSP) and OpenAI's Preparedness Framework, and it determines whether a model ships
- Production-side online evals + regression monitoring
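Stripped to its core, a harness is a loop: run the model on each sample, score the output, aggregate, and keep per-sample records for inspection. A minimal sketch with a hypothetical `model` callable and exact-match scoring; real harnesses such as Inspect or lm-eval-harness add prompt templating, few-shot handling, model-graded scorers, and logging, and their APIs do not look like this.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Sample:
    prompt: str
    target: str

def normalise(text: str) -> str:
    """Case- and whitespace-insensitive comparison key."""
    return " ".join(text.strip().lower().split())

def run_eval(model: Callable[[str], str], dataset: List[Sample]) -> dict:
    """Score `model` on `dataset` with exact match; keep per-sample records."""
    records = []
    for s in dataset:
        output = model(s.prompt)
        records.append({
            "prompt": s.prompt,
            "output": output,
            "correct": normalise(output) == normalise(s.target),
        })
    n_correct = sum(r["correct"] for r in records)
    return {"accuracy": n_correct / len(records), "n": len(records), "records": records}

# Hypothetical stand-in for a model endpoint.
def toy_model(prompt: str) -> str:
    return {"2+2=": "4", "Capital of France?": " paris "}.get(prompt, "unknown")

data = [
    Sample("2+2=", "4"),
    Sample("Capital of France?", "Paris"),
    Sample("Largest planet?", "Jupiter"),
]
report = run_eval(toy_model, data)
print(report["accuracy"], report["n"])
```

Most of the real work sits in the two functions that look trivial here: what counts as a correct answer (`normalise` and the scorer) and which samples go into `dataset`.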
Employers
Why "low barrier yet under-staffed"
- The real bottleneck is domain knowledge + experimental rigour + crisp writing, not ML theory. A Python-fluent physician / lawyer / biologist is worth more than a generic CS grad.
- Traditionally the ML field considered evals non-prestigious; researchers avoided it — but status jumped sharply after RSPs landed.
- Statistical fundamentals (sampling, confidence intervals, multiple comparisons, inter-rater reliability) are oddly weak among many ML engineers.
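A concrete case of that statistical gap is reporting an accuracy with no uncertainty attached. A sketch of the Wilson score interval for a pass rate, a standard binomial interval that behaves better than the plain normal approximation near 0% or 100%; it assumes independent samples, and z = 1.96 corresponds to roughly 95% coverage.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (≈95% for z=1.96)."""
    if n == 0:
        raise ValueError("need at least one sample")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

# 170/200 on one benchmark: 85% accuracy, but how sure are we?
low, high = wilson_interval(170, 200)
print(f"85.0% accuracy, 95% CI ≈ [{low:.1%}, {high:.1%}]")
```

With 200 items the interval is roughly ±5 points wide, so a gap of a couple of points between two models on a benchmark this size should not be read as significant without a proper (ideally paired) test.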
Downstream paths: evals → safety researcher, → AI governance / policy, → product PM.
Reverse-engineering the DeepSeek engineer profile
Public sources: V2 / V3 / R1 paper author lists, the two Anyong interviews, the "Inside DeepSeek" feature, 36kr, Zhihu post-departure threads, early High-Flyer JDs.
Composition
- Schools: Tsinghua, PKU, Zhejiang U., SJTU, USTC, Fudan dominate. Almost entirely domestically trained, no US-grad core.
- Degrees: master's majority, PhDs are the minority — the opposite of Anthropic / OpenAI.
- Age: born 1997–1999 cohort overrepresented. Several core authors are new grads or 1–3 years in.
Two prior career streams
- Internal transfers from High-Flyer Quant (the most important stream) — formerly building HFT systems, fluent in low latency, CUDA, NVLink, self-managed clusters.
- Direct campus hires — heavy competition background, ICPC / informatics olympiads / math contests.
Who they don't hire (per interviews)
- Veteran BAT (Baidu/Alibaba/Tencent) employees
- Returnee senior researchers
- People with "successful ML track records"
In Liang Wenfeng's own words: "Insight matters more than experience" — the polar opposite of Anthropic-style hiring.
Org structure inferred
- Flat — no director / principal ladder
- Compute is uncapped — tens of thousands of H800s, researchers experience an "infinite compute" illusion
- Publishing is not a KPI; it's a recruiting and positioning tool
- Top-of-industry pay (top new-grad ¥2M+ base), no big-tech ladder politics
Skills inferred from public output
- MLA (Multi-head Latent Attention): architectural innovation, deep grasp of attention internals
- DeepSeekMoE + fine-grained experts: MoE systems engineering
- FP8 mixed-precision training: low-level numerics + CUDA
- DualPipe + custom all-to-all comms kernel: hardcore systems work, already brushing NVIDIA-engineer territory
- GRPO: simplifying PPO while preserving RL stability — theoretical taste
- R1-Zero's pure-RL route: willingness to run bold experiments, backed by compute
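GRPO's central simplification fits in a few lines: instead of a learned value function (PPO's critic), it samples a group of completions per prompt and uses the group itself as the baseline, standardising each reward within the group. A sketch of just that advantage step, assuming scalar outcome rewards (e.g. 1 if the final answer verifies, else 0); the clipped policy-gradient objective and KL penalty that consume these advantages are not shown, and whether the sample or population standard deviation is used varies between implementations.

```python
import statistics

def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages for one prompt's group of sampled completions.

    A_i = (r_i - mean(r)) / (std(r) + eps). No learned critic is involved:
    the group mean is the baseline. Uses the population std here.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Eight samples for one math prompt; reward 1 if the final answer verifies.
rewards = [1, 0, 0, 1, 0, 0, 0, 1]
adv = group_relative_advantages(rewards)
print([round(a, 3) for a in adv])
```

A group in which every sample gets the same reward produces zero advantage everywhere, so it contributes nothing to the policy-gradient term; this is one reason prompt difficulty matters when assembling RL data.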
DeepSeek is not a "young people against the odds" story. It is a combination: quant capital + self-built compute + counter-consensus hiring + engineering culture.
What young people can actually copy: front-load systems capability (CUDA, distributed, low latency); do not front-load ML paper count.
But replicating the path requires capital upfront — and that's the biggest difference from early OpenAI ("a few geniuses starting from papers"). It's also why the other Chinese "six little tigers" can't follow this route: none of them has a monetised quant parent supplying a decade of compounding compute. From a game-theoretic view, DeepSeek is a victory of coupled capital + talent strategy, not a pure talent strategy. Imitating DeepSeek's hiring approach without the matching compute base is a partial mimicry that is structurally guaranteed to fail.