Newbie question: Is the infrastructure cost of AI partly related to inefficient coding languages like Python or PyTorch? Is efficient coding (which we glimpsed but didn't really understand with DeepSeek) something they do in China but not the USA?

My take is that Python and PyTorch both rely on well-optimized, application-specific and hardware-specific packages that are tuned in the underlying native "dataflow assembly code" (CUDA for NVIDIA, ROCm for AMD, etc.). DeepSeek identified some great areas for optimization - the model itself (multi-head latent attention), model partitioning (prefill/decode disaggregation), and system-level communication - and improved them mostly at that low level (not via the Python packages).
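To illustrate the first point, here is a minimal sketch of why the Python layer rarely dominates cost: one Python call dispatches all of the heavy math to native kernels (cuBLAS/cuDNN via CUDA on NVIDIA hardware). The sizes are arbitrary, and it assumes PyTorch is installed; it falls back to CPU if no CUDA device is present.

```python
# One Python call; all of the work happens in native code underneath.
import time
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
a = torch.randn(2048, 2048, device=device)
b = torch.randn(2048, 2048, device=device)

# Warm up, then time the native-kernel path (cuBLAS on NVIDIA GPUs).
torch.matmul(a, b)
if device == "cuda":
    torch.cuda.synchronize()

start = time.perf_counter()
c = torch.matmul(a, b)
if device == "cuda":
    torch.cuda.synchronize()
print(f"{device} matmul: {time.perf_counter() - start:.4f} s")

# A pure-Python triple loop over the same matrices would take minutes or more;
# here the interpreter's contribution is a single function-call dispatch.
```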
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. DeepSeek-V2 adopts innovative...
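To make the "21B activated out of 236B total" idea concrete, here is a toy top-k mixture-of-experts routing layer. The sizes, expert count, and layer structure are made up for illustration and are not DeepSeek-V2's actual architecture; the point is just that each token only touches a small subset of the parameters.

```python
# Toy top-k MoE routing: each token runs through only its top-k experts,
# so most expert parameters are untouched for that token.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoE(nn.Module):
    def __init__(self, d_model=64, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):  # x: (tokens, d_model)
        scores = F.softmax(self.router(x), dim=-1)
        weights, idx = scores.topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k:k + 1] * expert(x[mask])
        return out

tokens = torch.randn(16, 64)
print(ToyMoE()(tokens).shape)  # torch.Size([16, 64])
```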
I don't know about other HW suppliers, but NVIDIA has created new system infrastructure in Dynamo and worked on the three main model-serving engines that support PyTorch (SGLang, vLLM, and TensorRT-LLM) to implement these DeepSeek optimizations, and more, in the underlying packages and architecture for inference serving on NVIDIA hardware.
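As one example of what "accessible to all" looks like from the Python side, here is a hedged vLLM sketch. The model id, flags, and prompt are placeholders, the exact options depend on your hardware and vLLM version, and the point is only that the Python caller stays a few lines while the attention kernels, KV-cache management, and batching run in the engine's native code.

```python
# Minimal sketch of consuming engine-level optimizations from Python via vLLM.
from vllm import LLM, SamplingParams

llm = LLM(
    model="deepseek-ai/DeepSeek-V2-Lite",  # example id; use the model you serve
    trust_remote_code=True,                # DeepSeek checkpoints ship custom code
    tensor_parallel_size=1,                # raise for multi-GPU serving
)
params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(
    ["Explain prefill/decode disaggregation in one sentence."], params
)
print(outputs[0].outputs[0].text)
```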
So Python and PyTorch need help to be fast and efficient with new models and hardware. DeepSeek mostly optimized at a low level, but NVIDIA and associated developers followed up to make those optimizations accessible to all, mostly through open source.
