Processor cores used in computers and smartphones have become impaired by their own complexity and can’t fully utilize future CMOS generations to increase their efficiency. Because transistor density and speed keep increasing, these big cores produce too much heat per mm² when they try to follow Moore’s law for both transistor count and frequency.
Every transistor switching event dissipates energy as heat. Both the size and the delay of transistors shrink with every generation, and if these gains are exploited, i.e. density or clock frequency is increased, the heat density rises. This has become a problem with the latest generations. If both maximum density and maximum frequency were used over the entire chip, the heat would destroy it. For a processor core it is of course desirable to use the maximum performance the technology allows, but this can be done on only a few percent of the chip area, and that percentage shrinks with every generation. The remainder has to be “dark silicon”, meaning that it has to have much lower activity.
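The relationship described above can be written down with the standard first-order model of CMOS switching power. This is a textbook approximation, not something taken from Imsys’ material, and the symbols are introduced here only for illustration: α is the fraction of transistors switching per cycle, C the switched capacitance per transistor, V_DD the supply voltage, f the clock frequency and n the number of transistors per unit area.

```latex
P_{\text{transistor}} \approx \alpha \, C \, V_{DD}^{2} \, f,
\qquad
\frac{P}{A} \approx n \, \alpha \, C \, V_{DD}^{2} \, f
```

Under this model, raising n or f raises the power density P/A unless something else falls. When supply voltage and capacitance no longer drop enough to compensate, the remaining knob is the activity factor α, which is what “dark silicon” amounts to.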
A radically new processor architecture, one that reduces the overhead of high-frequency switching, is needed in order to fully utilize the potential of future CMOS technology. Optimizing for energy efficiency, throughput, cost, code density, adaptability and scalability is a big challenge for the computer architect.
Imsys’ processor has a different, yet well-proven, fundamental design that doesn’t have the above-mentioned limitation and is therefore suited to the new situation in semiconductor technology. The core itself consists mainly of memory and has rich functionality, which lets it save energy through efficient use of its small, flexible arithmetic logic.
Almost the entire chip area, 97%, is memory. Energy efficiency has, for the first time and for the foreseeable future, become the most important characteristic of processors, big and small, and not only because of the heat problem described above.
A proof-of-concept chip, prototyping a tile of a many-core system, has been produced and verified. 97% of its transistors are used in memory blocks. It includes Imsys’ patented dual-core solution, in which the pair of cores occupies 40% less space and consumes 25% less power than two single cores while doubling performance. The chip is manufactured by UMC in the 65 nm LL process and draws 18 mA at 1.2 V and 350 MHz with both cores active. The cores share memories and a 5-port router for the on-chip grid network (NoC). Each core has enough local memory for its immediate needs, which brings down the load on the grid network. Memory management is handled by microcode, memory is interwoven with the processor, and no cache or memory controller is needed.
Simply placing 128 copies of this verified tile next to each other results in 256 cores, 42 MByte ROM and 25 MByte RAM on 320 mm² of silicon, consuming 2.8 W with all cores running at 350 MHz. The design can also be scaled down: with 14 nm technology, an area of 238 mm² could hold 4096 cores, 672 MByte ROM and 400 MByte RAM and consume 31 W at 1.6 GHz.
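The arithmetic behind these figures can be checked with a short sketch. The inputs are the numbers quoted in the article (18 mA at 1.2 V per tile, 128 tiles, and the stated totals for the 14 nm projection). The function name `summarize` and the derived metrics (power per core, energy per core per clock cycle, power density) are illustrative choices made here, not figures published by Imsys.

```python
# Back-of-the-envelope check of the tile scaling figures quoted in the article.
# All inputs are taken from the text; the derived metrics are illustrative.

def summarize(cores, power_w, clock_hz, area_mm2):
    """Return a few derived metrics for a many-core configuration."""
    return {
        "mw_per_core": 1e3 * power_w / cores,
        "pj_per_core_cycle": 1e12 * power_w / (cores * clock_hz),
        "w_per_mm2": power_w / area_mm2,
    }

# 65 nm proof-of-concept tile: 18 mA at 1.2 V with both cores active.
tile_power_w = 0.018 * 1.2            # about 21.6 mW per tile
tiles = 128
array_65nm_w = tiles * tile_power_w   # about 2.76 W, quoted as 2.8 W

cfg_65nm = summarize(cores=256, power_w=array_65nm_w,
                     clock_hz=350e6, area_mm2=320)

# 14 nm projection, using the totals stated in the article as-is.
cfg_14nm = summarize(cores=4096, power_w=31.0,
                     clock_hz=1.6e9, area_mm2=238)

if __name__ == "__main__":
    print(f"65 nm array power: {array_65nm_w:.2f} W")
    for name, cfg in (("65 nm", cfg_65nm), ("14 nm", cfg_14nm)):
        print(name, {k: round(v, 4) for k, v in cfg.items()})
```

On these inputs the per-tile power multiplied by 128 reproduces the quoted 2.8 W after rounding, and the projected 14 nm configuration spends considerably less energy per core per clock cycle than the 65 nm one, at a higher overall power density.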
The second core needs only half the power used by the first core. Each core has almost constant power consumption when active, and the heat it generates spreads across the adjacent memory areas. This allows a higher total power dissipation and simplifies cooling system design and power budgeting.
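The “half the power” statement can be related to the dual-core saving with a one-line calculation. The sketch below assumes, for illustration only, that the first core of a pair draws the same power as a stand-alone single core; the function and parameter names are hypothetical.

```python
def pair_power_saving(first_core_power=1.0, second_core_fraction=0.5):
    """Fractional power saving of a core pair versus two stand-alone cores.

    Assumes the first core of the pair draws the same power as a
    stand-alone core (an assumption made for this sketch).
    """
    pair = first_core_power * (1 + second_core_fraction)
    two_singles = 2 * first_core_power
    return 1 - pair / two_singles

if __name__ == "__main__":
    print(f"saving: {pair_power_saving():.0%}")
```

Under that assumption the pair consumes 1.5 units where two single cores would consume 2, a saving of 25%.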
Microcode, as opposed to logic gates, is compact and energy efficient. Imsys uses extensive microprogramming to implement a rich set of instructions, thereby reducing the number of cycles needed without energy-inefficient speculative activity or duplicated hardware logic. Each core has two instruction sets, one of which executes Java bytecode natively.
Microcode is also used for computationally intensive standard routines, such as crypto algorithms, which would otherwise be assembly-coded library routines or even special hardware blocks. Optimizing CPU-intensive tasks in microcode can reduce the execution time and energy consumption of hot spots by more than an order of magnitude compared to C code.
The rich instruction set optimized for the compiler reduces the memory needed for software and, just like the microcoded special algorithms, it reduces the number of clock cycles needed for execution. The reduced requirements for memory bandwidth and the flexible microprogram control allow the compact arithmetic unit to do useful work all the time.
This platform has a certified JVM and uses an RTOS kernel certified to the ISO 26262 safety standard for automotive applications. The development tools will be enhanced with the support enabled by the LLVM infrastructure. A new instruction set optimized for an LLVM backend has been developed and is being implemented in the coming hardware generation.