That's what I meant when I said lower C isn't going to save us: capacitance isn't dropping very rapidly. Even lower-k dielectrics are unlikely, and wires are getting more densely packed, which pushes coupling capacitance up. Gate capacitance is dropping a bit, as smaller transistors give more drive, but again the gain is small. The real channel length has been around 17nm ever since N7; N3 is almost the same, and N2 with GAA might reduce it by a couple of nm (not sure). The biggest gains with each node now are in density, and these come mainly from DTCO (design-technology co-optimization) rather than raw pitch shrinks, while the power savings are gradually getting smaller with each node.
Operating voltage is the only thing left that can drastically reduce power, especially because dynamic power goes as the square of Vdd (P ∝ CV²f), but you give away speed in return. If power is the #1 priority and area/chip cost is less of a concern, the optimum operating voltage is already below 0.5V. You can get big power savings this way, but the reduced speed might mean halving the clock rate and doubling the silicon area (double the parallelism) to keep the same throughput, and few applications can afford to do this.
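To put numbers on the square law, here's a minimal sketch using the standard switching-power approximation P = C·Vdd²·f (activity factor folded into C). The helper names are made up for illustration, and the 0.75 V nominal point, the 0.5 V low-voltage point, and the assumption that the lower voltage exactly halves the clock are all hypothetical. Leakage is ignored; since leakage grows with the doubled area and becomes a bigger share of total power at low Vdd, the real saving is smaller than this ideal figure.

```python
def dynamic_power(c_eff: float, vdd: float, freq: float) -> float:
    """Switching power P = C_eff * Vdd^2 * f (activity factor folded into C_eff)."""
    return c_eff * vdd ** 2 * freq


def parallel_low_vdd_ratio(v_nom: float, v_low: float, slowdown: float) -> float:
    """Power of `slowdown` parallel copies at v_low and f/slowdown, relative to
    one copy at v_nom and full clock. Total throughput is the same in both cases.
    Leakage is ignored (hypothetical idealized model)."""
    baseline = dynamic_power(1.0, v_nom, 1.0)
    # Duplicating the hardware multiplies the switched capacitance by `slowdown`,
    # while each copy runs at 1/slowdown of the original clock.
    scaled = dynamic_power(slowdown * 1.0, v_low, 1.0 / slowdown)
    return scaled / baseline


# Square law: a 10% cut in C saves 10%, a 10% cut in Vdd saves about 19%.
print(f"{dynamic_power(0.9, 1.0, 1.0):.2f}")  # 0.90
print(f"{dynamic_power(1.0, 0.9, 1.0):.2f}")  # 0.81
# Hypothetical points: 0.75 V nominal vs 0.5 V at half clock with 2x area.
print(f"{parallel_low_vdd_ratio(0.75, 0.5, 2):.3f}")  # 0.444
```

With two copies at half clock, the 2x capacitance and 0.5x frequency cancel, so the ratio reduces to (V_low/V_nom)², about 0.44 here; that is where the "big power savings" come from, paid for in area.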
You can also use adaptive Vdd to maintain the same speed across process corners and temperature: lower Vdd for fast hot chips, higher Vdd for slow cold chips. This also greatly reduces the chip-to-chip power variation (especially the worst case), but again many systems can't afford the extra complexity (and power supplies) it needs.
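As a rough illustration, here's a sketch of per-chip adaptive Vdd. It assumes an alpha-power-law delay model (f ∝ (Vdd - Vt)^α / Vdd), exponential subthreshold leakage, and process corners represented only as threshold-voltage shifts; temperature is not modeled explicitly, and every constant (ALPHA, VT_TYP, LEAK_TYP, SUBTHRESH_SLOPE and so on) is a hypothetical normalized value, not data from any real process. Each chip gets the lowest Vdd that still meets the target clock, and this is compared against a single fixed rail sized for the slowest chip.

```python
import math

# Hypothetical, normalized model parameters (illustrative only, not fitted to any process).
ALPHA = 1.3             # velocity-saturation exponent in the alpha-power delay law
VT_TYP = 0.30           # typical-corner threshold voltage (V)
V_NOM = 0.75            # Vdd at which the typical corner just meets the target clock
F_TARGET = 1.0          # normalized target clock frequency
LEAK_TYP = 0.15         # normalized leakage current of a typical-corner chip
SUBTHRESH_SLOPE = 0.04  # n*kT/q in volts; sets how fast leakage grows as Vt drops

# Calibrate the delay constant so the typical corner runs at F_TARGET at V_NOM.
K_FREQ = F_TARGET * V_NOM / (V_NOM - VT_TYP) ** ALPHA

# Process corners modeled purely as threshold-voltage shifts.
CORNERS = {"fast": 0.25, "typical": 0.30, "slow": 0.35}


def f_max(vdd: float, vt: float) -> float:
    """Alpha-power-law max clock: f ~ (Vdd - Vt)^alpha / Vdd (zero at or below Vt)."""
    if vdd <= vt:
        return 0.0
    return K_FREQ * (vdd - vt) ** ALPHA / vdd


def min_vdd(vt: float, f_target: float = F_TARGET) -> float:
    """Lowest Vdd meeting f_target, by bisection (f_max rises with Vdd for alpha >= 1)."""
    lo, hi = vt, 2.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if f_max(mid, vt) >= f_target:
            hi = mid
        else:
            lo = mid
    return hi


def chip_power(vdd: float, vt: float, freq: float = F_TARGET) -> float:
    """Dynamic C*V^2*f (C normalized to 1) plus subthreshold leakage Vdd*I_leak."""
    dynamic = vdd ** 2 * freq
    i_leak = LEAK_TYP * math.exp(-(vt - VT_TYP) / SUBTHRESH_SLOPE)
    return dynamic + vdd * i_leak


def compare():
    """Per-corner adaptive Vdd, plus chip power under a fixed rail and under adaptive Vdd."""
    adaptive_vdd = {name: min_vdd(vt) for name, vt in CORNERS.items()}
    fixed_vdd = max(adaptive_vdd.values())  # one rail must cover the slowest chip
    fixed = {n: chip_power(fixed_vdd, vt) for n, vt in CORNERS.items()}
    adaptive = {n: chip_power(adaptive_vdd[n], vt) for n, vt in CORNERS.items()}
    return adaptive_vdd, fixed, adaptive


if __name__ == "__main__":
    vdds, fixed, adaptive = compare()
    for name in CORNERS:
        print(f"{name:8s} Vdd={vdds[name]:.3f} V  "
              f"fixed P={fixed[name]:.3f}  adaptive P={adaptive[name]:.3f}")
    for label, p in (("fixed", fixed), ("adaptive", adaptive)):
        print(f"{label:8s} worst={max(p.values()):.3f} "
              f"spread={max(p.values()) - min(p.values()):.3f}")
```

With these made-up numbers the fast corner runs at roughly 0.66 V instead of 0.84 V, and both the worst-case power and the corner-to-corner spread come out lower than with the fixed rail. In this model the fixed-rail spread comes entirely from leakage (every chip has the same Vdd and clock, so the same dynamic power), and adaptive Vdd helps because the leaky fast chips are no longer forced onto the slow chips' voltage.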