This is far too simplistic a way of looking at it, IMO. TSMC's mobile process and HPC are not the same thing. A phone SoC needs ultra-low leakage to maximize battery life, and it pays for that with higher active power consumption. HPC wants the best performance within a given power envelope. Intel client CPUs live in their own little world of chasing the highest performance at any cost. A Xeon, on the other hand, runs its cores at extremely low voltages because, as you said, that lets you slap down more cores within your TDP and get a bigger performance boost than scaling frequency alone. As for your comment about "slow, dense, more efficient", I would argue there is nothing "new" about it. This was the driving factor behind computers moving from tubes to transistors and later to ICs, and it is why AMD/Intel integrated more functionality (IMC, graphics, IO controllers) onto their CPUs, etc.
Taking a look at NVIDIA, the current poster boy of HPC, they don't practice "slow" with their transistors. If HPC were only a matter of lowering power, NVIDIA would operate at Vmin and run below their fmax at Vmin to minimize active power. Yet they don't do this. Instead they push frequency pretty hard for a GPU, taking care that the gain from a frequency increase is not outweighed by the resulting loss in core count. It seems like they find the voltage and core count configuration that gives the best performance in a power envelope and then crank frequency as high as it will go at that voltage. For the smaller GPU dies that means higher frequencies, while the bigger dies spend more of their power budget on cores, which lowers the voltage they have to operate at.
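To make that tradeoff concrete, here is a rough toy sketch in Python. Everything in it is an assumption for illustration only: the linear fmax-vs-voltage curve, the C*V^2*f plus leakage power model, the constants, and the function names (`fmax_ghz`, `core_power_w`, `best_config`) are made up and have nothing to do with real NVIDIA or foundry data. It just searches over voltage, runs every core at the maximum frequency for that voltage, fits as many cores as the power budget and die allow, and keeps the configuration with the most total throughput.

```python
# Toy model only: all constants and curve shapes below are illustrative
# assumptions, not measured silicon data.

def fmax_ghz(v, vt=0.35, k=4.0):
    """Assumed max frequency (GHz) at supply voltage v: linear above a threshold."""
    return max(0.0, k * (v - vt))


def core_power_w(v, f_ghz, c_eff=1.0, leak=0.2):
    """Assumed per-core power (W): dynamic C*V^2*f plus leakage proportional to V."""
    return c_eff * v * v * f_ghz + leak * v


def best_config(power_budget_w, max_cores, voltages):
    """Return (throughput, voltage, cores, freq_ghz) maximizing cores * freq.

    max_cores stands in for die size: a bigger die can hold more cores.
    Each core runs at fmax for the chosen voltage.
    """
    best = None
    for v in voltages:
        f = fmax_ghz(v)
        if f <= 0.0:
            continue
        p = core_power_w(v, f)
        # Cores are limited by both the power budget and the die area.
        n = min(max_cores, int(power_budget_w // p))
        if n == 0:
            continue
        perf = n * f
        if best is None or perf > best[0]:
            best = (perf, v, n, f)
    return best


# Candidate operating voltages from 0.60 V to 1.10 V in 50 mV steps.
VOLTAGES = [round(0.60 + 0.05 * i, 2) for i in range(11)]

if __name__ == "__main__":
    for label, max_cores in (("small die", 20), ("big die", 80)):
        perf, v, n, f = best_config(40.0, max_cores, VOLTAGES)
        print(f"{label}: {n} cores at {v:.2f} V, {f:.1f} GHz, throughput {perf:.1f}")
```

With these made-up numbers and the same power budget, the small die ends up area-limited and spends its spare power on voltage and frequency, while the big die ends up power-limited and drops its voltage to fit more cores, which is the pattern described above. The exact crossover depends entirely on the assumed curves.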
That is on a 3-fin to 3-fin logic cell basis. TSMC's N5 2-fin cell is 210h if memory serves, and it is a bit denser than the Intel 4 HP library.