Last week Xilinx announced that it has shipped the 20nm Kintex UltraScale KU115 device, and I quote:
“Xilinx has produced a 20nm FPGA for data center acceleration called Kintex UltraScale KU115 FPGA.
The chips deliver up to:
· 1.16M logic cells,
· 5,520 optimized DSP slices,
· 76 Mbits of block RAM,
· 16.3Gbps backplane-capable transceivers,
· PCIe Gen3 hard blocks,
· integrated 100Gb/s Ethernet MAC and 150Gb/s Interlaken IP cores, and
· DDR4 memory interfaces operating at 2,400 Mb/s
· 2 Flux Capacitor Cores (1.21 Jiga Watts per core for time travel)”. Just checking if you’re still reading.
Two areas I would like to draw your attention to are:
1) The DSP slices: count (5,520), speed (741 MHz), and multiplier width (27×18 bits)
2) The GTs (gigabit transceivers): speed (16.3 Gbps) and count (64)
I think I need to remind us that only about 12 years ago, the Virtex-II Pro50 had 232 DSP. We are so spoiled today. 5,520 DSP? Yikes. Just think about the amount of work this one FPGA can do. It is definitely not midrange, and the FPGA blob strikes again: this one Xilinx FPGA just ate a whole rack of 6U cards from 10 years ago.
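To put rough numbers on "the amount of work", here is a back-of-the-envelope sketch using only the figures quoted above. It assumes every DSP slice completes one multiply-accumulate per clock at the full 741 MHz, which is a best case and not something a real design is guaranteed to reach. The function and constant names are mine, purely for illustration.

```python
# Figures taken from the announcement and the list above.
DSP_SLICES = 5520
DSP_CLOCK_HZ = 741_000_000
GT_COUNT = 64
GT_LINE_RATE_BPS = 16_300_000_000
VIRTEX2_PRO50_DSP = 232


def peak_macs_per_second(slices, clock_hz, macs_per_slice_per_cycle=1):
    """Ideal MAC rate: every slice busy on every clock cycle (assumption)."""
    return slices * clock_hz * macs_per_slice_per_cycle


def aggregate_gt_bps(gt_count, line_rate_bps):
    """Raw line rate summed over all transceivers, one direction."""
    return gt_count * line_rate_bps


if __name__ == "__main__":
    one_mac = peak_macs_per_second(DSP_SLICES, DSP_CLOCK_HZ)
    two_mac = peak_macs_per_second(DSP_SLICES, DSP_CLOCK_HZ, 2)
    print(f"1 MAC/slice/cycle: {one_mac / 1e12:.2f} TMAC/s")
    print(f"2 MAC/slice/cycle: {two_mac / 1e12:.2f} TMAC/s")
    print(f"DSP count vs Virtex-II Pro50: {DSP_SLICES / VIRTEX2_PRO50_DSP:.1f}x")
    print(f"Raw GT bandwidth: {aggregate_gt_bps(GT_COUNT, GT_LINE_RATE_BPS) / 1e12:.2f} Tb/s")
```

The plain product comes out at about 4.09 TMAC/s. The 8.2 TMACs figure quoted later in this post is roughly twice that, which would be consistent with counting two multiply-accumulates per slice per cycle (for example, a symmetric filter that uses the pre-adder), but treat that reading as my assumption and check the data sheet. The GT number is raw line rate and ignores encoding and protocol overhead.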
Just as important as DSP is moving data on and off the FPGA via what I will call the ‘Serial Revolution’. Serial interfaces do make sense, right? Over the years we have witnessed the progression TTL/CMOS –> LVDS/DDR –> GTs: less IO, more bandwidth and lower power. OK, yes, there is higher latency going serial, which can be overcome with faster FPGA clocks and some architecture changes. The Serial Revolution is not only striking the Xilinx FPGAs; it is also allowing data converter companies like TI and ADI to reduce the IO demands of high-speed, high-bandwidth data converters by using JESD204B over GTs. This means a data converter that needed 64+ LVDS IO can be reduced to 5 lanes of JESD204B. For example, if you had a digital receiver design 5 years ago, you were hard pressed to get 2 ADCs running at over a gigasample per second to feed one FPGA. Not anymore: you can easily connect 8-12 of these puppies, depending on the system requirements.
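If you want to see where lane counts like that come from, here is a minimal sketch of the usual JESD204B line-rate arithmetic: sample bits times sample rate, times 10/8 for the 8b/10b encoding, divided across lanes. It assumes the 12.5 Gbps per-lane ceiling of JESD204B and ignores the restrictions a particular converter places on allowed lane counts. The function names and the example converters are hypothetical, not taken from any TI or ADI part.

```python
JESD204B_MAX_LANE_BPS = 12_500_000_000  # per-lane ceiling in JESD204B


def total_line_rate_bps(converters, bits_per_sample, sample_rate_hz):
    """Total serial bits per second after 8b/10b encoding.

    bits_per_sample is the transported sample width (N' in JESD204B terms),
    i.e. the converter resolution plus any control or tail bits.
    """
    return converters * bits_per_sample * sample_rate_hz * 10 // 8


def min_lanes(converters, bits_per_sample, sample_rate_hz,
              max_lane_bps=JESD204B_MAX_LANE_BPS):
    """Smallest lane count that keeps each lane at or below max_lane_bps."""
    total = total_line_rate_bps(converters, bits_per_sample, sample_rate_hz)
    return -(-total // max_lane_bps)  # integer ceiling division


if __name__ == "__main__":
    # Hypothetical single-channel ADCs with 16-bit transported samples.
    for rate_hz in (1_000_000_000, 2_500_000_000, 3_000_000_000):
        lanes = min_lanes(1, 16, rate_hz)
        print(f"{rate_hz / 1e9:.1f} GSPS -> at least {lanes} lane(s)")
```

A real link also has to pick a lane count the converter and the FPGA IP both support, so treat the result as a lower bound and not a link configuration.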
Not only can you use JESD204B, but Hybrid Memory Cube (HMC) as well. By the way, Xilinx 28nm handles HMC and JESD204B too, which is why it is important to look at Xilinx not as a single technology node but as a portfolio of solutions. You may not need 5,520 DSP or 64 GTs. See how Xilinx fits into your design; it does not always have to be the newest, fastest process, though it is tempting. The table below captures the trade-offs of Xilinx 7 Series compared with Xilinx UltraScale. Also remember that the 20nm UltraScale has a new DSP, which means that when performing complex multiplies you need only half the DSP in 20nm compared with the 28nm 7 Series. That is very important and very powerful, especially if your design uses complex arithmetic.
The Xilinx DSP in the 20nm family are the world’s fastest, widest, densest DSP. Raw fixed-point arithmetic will give you about 8.2 TMACs, and floating point about 1.3 TFLOPs. So do you think it is wise and efficient to program this puppy the old-fashioned VHDL/Verilog way? You can, but for real productivity, Vivado HLS will really lighten your load by enabling you to code the design in C/C++. This means real portable libraries, faster simulation times and fewer errors at system integration. A white paper that I wrote, WP452, highlights the power of HLS by tackling one of the hardest problems to solve in silicon: complex floating-point matrix inversion. At 8.2 TMACs and 1.3 TFLOPs, Xilinx once again has shown the world that it is the FPGA leader and, more than that, an EDA leader as well as an SoC company. So I encourage you to familiarize yourself with the family of Xilinx FPGAs and see for yourself why Xilinx is the global FPGA leader.