Sapphire Rapids does have 4 CPU dies (no I/O die) butted together, but they're still pretty big, around 350mm2 each going by the delidded photo here:
www.extremetech.com
With 56 cores total (14 per chiplet) this is about 1400mm2 total, which is 25mm2 of 10nm per core. AMD Milan has something around 640mm2 of CPUs split over 8 x 8-core chiplets (80mm2 of 7nm each, 10mm2 per core) plus a 420mm2 14nm I/O die (7mm2 per core), and 14nm is a lot cheaper per mm2 -- probably about half the cost of 7nm. So total silicon area for AMD is about 25% less than for Intel -- and for 64 cores, not 56, so about 35% less per core, and about 40% of AMD's area is the cheaper 14nm.
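For anyone who wants to check the arithmetic, here is a quick sketch in Python. The die areas are the rough estimates quoted above (eyeballed from the photo and public figures), not measured values, and the variable names are just mine.

```python
# Rough die areas in mm^2, taken from the estimates in this post (not measured).
INTEL_DIES, INTEL_DIE_AREA, INTEL_CORES = 4, 350, 56
AMD_CCDS, AMD_CCD_AREA, AMD_IOD_AREA, AMD_CORES = 8, 80, 420, 64

intel_total = INTEL_DIES * INTEL_DIE_AREA   # all 10nm
amd_cpu = AMD_CCDS * AMD_CCD_AREA           # 7nm CPU chiplets
amd_total = amd_cpu + AMD_IOD_AREA          # 7nm chiplets + 14nm I/O die

intel_per_core = intel_total / INTEL_CORES
amd_per_core = amd_total / AMD_CORES

# Fractions used in the comparison
total_saving = 1 - amd_total / intel_total           # AMD total area vs Intel
per_core_saving = 1 - amd_per_core / intel_per_core  # AMD area per core vs Intel
iod_share = AMD_IOD_AREA / amd_total                 # share of AMD area on 14nm

print(f"Intel: {intel_total} mm2 total, {intel_per_core:.1f} mm2/core")
print(f"AMD:   {amd_total} mm2 total, {amd_per_core:.1f} mm2/core")
print(f"AMD total area saving:    {total_saving:.0%}")
print(f"AMD per-core area saving: {per_core_saving:.0%}")
print(f"Share of AMD area on 14nm: {iod_share:.0%}")
```

The exact percentages come out a point or so away from the rounded numbers in the text, which is well inside the error of the die-size estimates.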
Add this to the lower yield for Intel (CPU dies are 350mm2 in lower-yielding Intel 10nm compared to 80mm2 in higher-yielding TSMC 7nm), and this suggests at least a 2x die cost advantage for AMD.
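To put a rough number on the yield effect, here is a sketch using the simple Poisson yield model (yield = exp(-defect density x die area)). The defect densities are placeholders I made up for illustration, not published figures for either fab. I've also assumed Intel 10nm costs the same per mm2 as TSMC 7nm and that 14nm costs half as much, as guessed above. Packaging cost, wafer-edge losses and salvaging partly-good dies are all ignored.

```python
import math

def poisson_yield(area_mm2, d0_per_cm2):
    """Fraction of good dies under the Poisson model; 100 mm^2 = 1 cm^2."""
    return math.exp(-d0_per_cm2 * area_mm2 / 100.0)

def good_die_cost(area_mm2, d0_per_cm2, cost_per_mm2):
    """Silicon cost of one good die in arbitrary units (area cost / yield)."""
    return area_mm2 * cost_per_mm2 / poisson_yield(area_mm2, d0_per_cm2)

def cost_per_core(d0_intel10, d0_tsmc7, d0_14nm):
    """Return (Intel, AMD) silicon cost per core in arbitrary units.

    Assumed relative costs: 10nm and 7nm = 1.0 per mm^2, 14nm = 0.5 per mm^2.
    Die areas and core counts are the estimates from this post.
    """
    intel = 4 * good_die_cost(350, d0_intel10, 1.0)
    amd = 8 * good_die_cost(80, d0_tsmc7, 1.0) + good_die_cost(420, d0_14nm, 0.5)
    return intel / 56, amd / 64

# Hypothetical defect densities in defects/cm^2 (illustrative guesses only).
for d0_intel in (0.1, 0.2):
    intel, amd = cost_per_core(d0_intel, 0.1, 0.1)
    print(f"Intel D0={d0_intel}: Intel/AMD cost per core = {intel / amd:.2f}x")
```

With these placeholder numbers the ratio is already above 2x when both fabs have the same defect density, and it grows as Intel's defect density gets worse, because the bigger die is hit much harder by each extra defect.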
Intel can't easily deal with this by using more, smaller CPU chiplets. Their inter-die I/O is very short (low latency) on the edges that butt together, and this only works for a 2x2 array -- they've basically sawn one enormous CPU chip into 4 big pieces. To use more, smaller chiplets they would need a complete re-architecture with longer, higher-latency links between chips, in other words a proper inter-CPU network -- probably via an I/O die like AMD.
In other words, to compete with AMD at TSMC (or even in their own fab) they need an AMD-style architecture, not an Intel-style one...
Intel’s 10nm Sapphire Rapids CPU Delidded, Photographed | ExtremeTech