In the first part of this article, I talked about some of the key business aspects, along with technical aspects such as system performance, functionality, and IP integration, that drive the architecture of an SoC so that it can be optimized and realized economically. In this part, let’s dive into some more aspects that are needed to make your SoC robust enough to survive in today’s global environment.
Hardware, Software, and Embedded Software: Today, before you architect an SoC, you have to think about how it will be driven by the software systems sitting on it and make provisions for those. The SoC has to account for interfaces with other multimedia, graphics, networking, and connectivity devices and software. Accordingly, it has to be targeted at particular applications such as IoT, wearables, medical, automotive, and so on.
[GENIVI Software Architecture, courtesy Mentor Graphics, GENIVI]
Above is an example of a Linux-based open-source software architecture promoted by GENIVI in the automotive space. This architecture is supported on the reference boards available from Renesas, Texas Instruments, and Freescale.
The embedded software, again, is an essential part of any software for an SoC. It is heavily dependent on the underlying hardware and has to be designed according to the application. As we move up from the hardware to the application level, the data passes through the Hardware Abstraction Layer (HAL) and its APIs, the OS and its APIs, the communication middleware and its APIs, and finally the application software itself. The embedded software needs to be optimized by exploiting the characteristics of the underlying hardware, and re-used across similar systems as far as possible.
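The layering described above can be illustrated with a minimal sketch. Everything here is hypothetical: the register address, the raw value, the 1/16 degree scaling, and all class names are assumptions made for illustration, not part of GENIVI or any vendor's stack. The point is only that each layer talks to the one below through a narrow API, so the hardware-specific part stays confined to the HAL.

```python
class SimulatedHal:
    """HAL: the only layer that touches (simulated) hardware registers."""
    TEMP_REG = 0x10  # hypothetical register address

    def __init__(self):
        # Hypothetical raw reading in units of 1/16 degree Celsius.
        self._regs = {self.TEMP_REG: 0x0190}

    def read_reg(self, addr):
        return self._regs[addr]


class TemperatureDriver:
    """OS/driver level: turns a raw register value into engineering units."""

    def __init__(self, hal):
        self._hal = hal

    def read_celsius(self):
        return self._hal.read_reg(SimulatedHal.TEMP_REG) / 16.0


class TelemetryMiddleware:
    """Communication middleware: packages readings as messages."""

    def __init__(self, driver):
        self._driver = driver

    def message(self):
        return {"sensor": "temp", "celsius": self._driver.read_celsius()}


def application(middleware, limit_celsius=30.0):
    """Application level: acts on messages, knows nothing about registers."""
    msg = middleware.message()
    return "ALERT" if msg["celsius"] > limit_celsius else "OK"


stack = TelemetryMiddleware(TemperatureDriver(SimulatedHal()))
status = application(stack)
```

Porting such a stack to a different chip would, in this sketch, mean replacing only SimulatedHal, which is the kind of re-use the layering is meant to enable.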
The idea is to architect a complete system including hardware, software, and embedded software; i.e. a complete ecosystem. A company that can do that and supply a complete SoC with authority can be the winner. Why is Apple the most valuable company today? One may criticize it for not being open, but the truth is that Apple is good at both hardware and software. Google and Microsoft are good at software, but not hardware. Google does great innovation in hardware, but has not been able to carry it through to practice; it has not been able to cash in on Google Glass, for example. Samsung is good at hardware, but not at software; they have realized that they need to build their own mobile OS, but have a long way to go.
An SoC is not a hardware chip anymore; both software and hardware are its core components. Hence to develop an effective SoC, one has to have core competence in both software and hardware.
Connectivity: In today’s highly connected world, operating at frequencies ranging from baseband to RF in different segments, the right connectivity solution to work with the SoC has to be investigated and planned at the architecture stage. Moreover, there are various protocols such as Wi-Fi, Bluetooth, ZigBee, and so on in the wireless domain, and several M2M protocols in the IoT space. Provisions for the right band and the right kinds of protocols have to be made during the SoC architecture stage. A versatile SoC with a programmable microcontroller, appropriate memory and connector support, and support for the complete frequency range applicable to a particular segment can become far more successful than an SoC that connects with only some portion of the intended devices in that segment.
Security: In an IoT world, even a small chip in a device located in a remote corner of the world can be globally connected through M2M connections and the internet. In this multi-point network, every point-to-point connection can be vulnerable to cyber attack. Security has to be dealt with at the hardware as well as the software level for the whole of the interconnection infrastructure. The data has to be protected at the device level and its authenticity maintained during transfer. At the device level, access to the data has to be restricted to authorized individuals, and that has to be done at the source in the hardware, i.e. your SoC. Moreover, an SoC or an IP also has to be protected from being reverse engineered, duplicated, or counterfeited. There are several methods which can be employed to secure SoCs. For example, PUFs (Physically Unclonable Functions) can be instantiated on the chip for device authentication and for providing secure keys. PUFs can be successfully used to protect and secure memories, smartcards, USB devices, and other mobile devices.
[PUF: Protecting Smartcard ICs. Source NXP]
NXP has successfully used PUFs to protect next-generation smartcard ICs. During production or personalization, the IC measures its PUF environment and stores this unique measurement. From then on the IC can repeat the measurement as and when required to check if the environment has changed, thus protecting the card.
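The measure, store, and re-check flow described above can be sketched as a small simulation. This is not NXP's implementation: the PUF is modeled as a seeded random bit pattern with a little readout noise, and the function names, the 128-bit length, the 2% noise, and the 15% acceptance threshold are all assumptions chosen for illustration. Practical designs typically add error correction on top of the raw response rather than relying on a simple distance threshold.

```python
import random


def measure_puf(device_seed, n_bits=128, noise=0.02, rng=None):
    """Simulate one noisy PUF readout.

    device_seed stands in for the physical fingerprint of one chip;
    noise is the chance that any single bit flips on a given readout.
    """
    rng = rng or random.Random()
    fingerprint = random.Random(device_seed)
    bits = [fingerprint.getrandbits(1) for _ in range(n_bits)]
    return [b ^ (rng.random() < noise) for b in bits]


def hamming_distance(a, b):
    """Number of positions at which two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


def environment_unchanged(enrolled, fresh, max_fraction=0.15):
    """Accept the fresh readout if it is close enough to the enrolled one."""
    return hamming_distance(enrolled, fresh) <= max_fraction * len(enrolled)


# Enrollment (production or personalization): measure once and store.
enrolled = measure_puf(device_seed=1234)

# Later, in the field: repeat the measurement and compare.
same_chip = environment_unchanged(enrolled, measure_puf(device_seed=1234))
other_chip = environment_unchanged(enrolled, measure_puf(device_seed=9999))
```

A readout from the same simulated device differs from the enrolled one in only a few bits, while a different device differs in roughly half of them, which is why a distance threshold can tell the two apart.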
Side Channel Attack (SCA), which is non-invasive, is a new kind of threat for ICs. In such an attack, information can be extracted from a chip based on its power profile, electromagnetic emissions, or even timing behavior. Such attacks have to be prevented by deploying security at the physical level, perhaps down at the leaf cells of the design. As an example, by making sure that no variation in power can be detected during circuit operation, one can secure the chip against SCA based on power variation during switching of the circuit. Newer methods are evolving to address security at the SoC level; several terms are in use today such as TRNG (True Random Number Generators), Root-of-Trust, watermarking, and so on. Several methods are also being used to detect hardware Trojans.
The point is that for an SoC designed for certain applications or environments, its security aspects must be thought through during the architecture stage and implemented from the base level.
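The timing variant of this threat has a simple software analogue that shows the principle. The sketch below is an illustration only and does not address power or electromagnetic leakage, which need the physical-level measures discussed above. The function names are hypothetical; hmac.compare_digest is the Python standard library call intended for timing-safe comparison.

```python
import hmac


def naive_equal(secret, guess):
    """Leaky comparison: returns at the first mismatching byte, so the
    running time reveals how many leading bytes of the guess are right."""
    if len(secret) != len(guess):
        return False
    for s, g in zip(secret, guess):
        if s != g:
            return False
    return True


def constant_time_equal(secret, guess):
    """Comparison designed so that its timing does not depend on where
    the first mismatch occurs."""
    return hmac.compare_digest(secret, guess)


key = b"\x13\x37\xc0\xde"
accepted = constant_time_equal(key, b"\x13\x37\xc0\xde")
rejected = constant_time_equal(key, b"\x13\x37\x00\x00")
```

Both functions return the same answers; the difference lies only in what an attacker measuring execution time can learn, which is the essence of a side channel.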
Reliability: In SoCs at advanced process nodes (20nm and below), where transistors operate with extremely low noise margins and a single SoC takes care of several functions, reliability cannot be ignored or deferred until after functional implementation. In the pursuit of PPA optimization, reliability is often overlooked. High performance at high frequency consumes high power and can become a source of heat, leading to electromigration and other complications. If the heat is not estimated and rated, and provisions are not made for its dissipation, it can shorten the life of the device sooner rather than later. There are tools available to estimate power dissipation, noise, and reliability. Although a CPU’s real power dissipation can be computed, the figure to use for determining temperature ranges and designing appropriate cooling systems, where needed, is the TDP (Thermal Design Power). Recently, Intel designed its Core M processor at the 14nm technology node with a TDP range between 3.5W (at a scaled-down frequency of 600 MHz) and 6W (at a scaled-up frequency of 1.4 GHz), and a nominal value of 4.5W. Interestingly, Apple used Intel’s Core M for its new MacBook and didn’t need to put a fan in it.
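To see how a TDP figure feeds into cooling decisions, a first-order estimate is the steady-state relation: junction temperature equals ambient temperature plus power times the junction-to-ambient thermal resistance. The sketch below applies it to the three Core M TDP points quoted above. The 35 °C ambient and the 10 °C/W thermal resistance are purely hypothetical placeholders, not Intel or Apple data; real values depend on the package and the enclosure.

```python
def junction_temp_c(ambient_c, power_w, theta_ja_c_per_w):
    """First-order steady-state estimate: T_j = T_a + P * theta_JA."""
    return ambient_c + power_w * theta_ja_c_per_w


def max_power_w(tj_limit_c, ambient_c, theta_ja_c_per_w):
    """Largest sustained power that keeps T_j at or below the limit."""
    return (tj_limit_c - ambient_c) / theta_ja_c_per_w


AMBIENT_C = 35.0   # hypothetical temperature inside the enclosure
THETA_JA = 10.0    # hypothetical thermal resistance in deg C per watt

# TDP points quoted in the article: scaled-down, nominal, scaled-up.
estimates = {
    tdp: junction_temp_c(AMBIENT_C, tdp, THETA_JA)
    for tdp in (3.5, 4.5, 6.0)
}
```

Under these assumed numbers the estimate rises by 10 °C for every extra watt, which is why a configurable TDP gives a system designer room to trade frequency against passive cooling.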
Verifiability and Testability: Various DFT methodologies such as scan chain, built-in self test (BIST), MBIST, and so on are well known for making a design testable, and they have become usual practice. With the increase in design complexity, there is nowadays an emphasis on making a design verifiable as well. The focus is on a micro-architecture that simplifies the verification process. This includes synchronization of the design, minimization of cycle-based logic, ensuring that the design is CDC (Clock Domain Crossing) safe, avoiding complex interfaces, and so on. The Verification IP (VIP) should migrate easily between different design levels, e.g. RTL to gate level. A more verifiable design should also be easy to debug. For this, interfaces need to be defined formally and cleanly, and assertions need to be added at various places to verify conditions.
Serviceability: In a large SoC, different types of errors may occur at any point of time for various reasons; however, that should not require replacement of the whole SoC. There must be provisions in the SoC for easy diagnosis of problems and their correction. The components that are likely to fail should be easily identifiable and repairable or replaceable.
There can be soft errors as well as hard errors in an SoC. Memories such as DRAMs are very susceptible to errors because of the large amount of data stored and the extremely high rate of data transfer. These errors can be soft errors, hard errors, or retention errors.
Typically, ECC (Error Correcting Code) circuitry is used in the SoC to correct errors in the data. ECC can be used in various modes of operation depending on the location of the errors. For example, ECC scrubbing can be used to check the whole memory array and correct all single-bit errors. Memory sparing is another technique, which can instantly replace any failing memory with spare memory in the system. Similarly, there are other techniques such as data/address parity, cyclic redundancy checks (CRC), etc. which can be used in SoCs to address problems and maintain data integrity. Power-on-Self-Test (POST) and Built-in-Self-Test (BIST) are techniques which can detect hard errors. Hard errors cannot be corrected; they have to be mapped out of the usable area by the operating system.
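As a toy illustration of single-bit correction and scrubbing, the sketch below uses a Hamming(7,4) code: four data bits protected by three parity bits. This is chosen only because it is small enough to follow by hand. Production memory ECC normally works on much wider words and adds double-error detection, and the function names and list-based memory model here are assumptions made for the example.

```python
def encode(d):
    """Encode 4 data bits as a 7-bit Hamming codeword.

    Codeword layout (positions 1..7): p1 p2 d1 p4 d2 d3 d4.
    """
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4   # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4   # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def correct(word):
    """Return (corrected codeword, syndrome).

    A non-zero syndrome is the 1-based position of the flipped bit.
    """
    c = list(word)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4
    if syndrome:
        c[syndrome - 1] ^= 1
    return c, syndrome


def decode(word):
    """Correct the codeword if needed and extract the 4 data bits."""
    c, _ = correct(word)
    return [c[2], c[4], c[5], c[6]]


def scrub(memory):
    """Walk the whole array, repair single-bit errors in place,
    and return how many words needed correction."""
    repaired = 0
    for i, word in enumerate(memory):
        fixed, syndrome = correct(word)
        if syndrome:
            memory[i] = fixed
            repaired += 1
    return repaired


memory = [encode([1, 0, 1, 1]), encode([0, 1, 1, 0]), encode([1, 1, 1, 1])]
memory[0][4] ^= 1          # inject a soft error into word 0
memory[2][0] ^= 1          # inject a soft error into word 2
repaired = scrub(memory)   # both words are restored
```

A word with two flipped bits would be miscorrected by this code, which is one reason real ECC schemes add an extra parity bit to detect, rather than silently mishandle, double-bit errors.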
The idea is to make provisions in the SoC for appropriate methods and circuitry so that different types of components can be diagnosed and corrected as and when required.
Read the first part, “SoCs in New Context Look beyond PPA”, to learn about the factors discussed earlier. The two parts of this article provide a general overview of several factors which are important to consider in today’s SoCs, in contrast to earlier chips where only power, performance, and area mattered. Each factor is a complete field in the semiconductor industry in itself, and there are several companies providing solutions for each. It’s not possible for a single company to have solutions for all of these. Today’s SoCs require a collaborative approach: finding the best possible solutions from different companies specializing in these areas and then architecting them together in the best possible manner.
Pawan Kumar Fangaria
Founder & President at www.fangarias.com