ARM estimates that many SOCs designed today have over 200 IP components, and this number is only going to go up. The statistic comes from a recent white paper ARM published on the topic of system performance analysis. According to ARM, this creates a huge challenge in ensuring a system is designed with adequate performance margins; conversely, over-design comes with high costs in silicon area and power consumption. The white paper is the first in a series on how ARM is helping their customers model system performance at the earliest possible stages of the design process.
System performance in a SOC is in part determined by the following factors:
- Processor speed (CPU, GPU, Video, Display, etc.)
- Cache types and sizes
- Memory speed, efficiency and data width
- Effectiveness of IP integration
To facilitate their customers’ implementation projects, ARM has embarked on a program of developing and utilizing performance analysis methodologies on SOC designs. They are pursuing this internally as well as with customers. They see high value in making sure that performance analysis can be done effectively and systematically. ARM describes a multi-step process that they have developed.
As you might expect, the first iteration is done with a spreadsheet. This is where the most fundamental quantities, such as bandwidth and latency, are estimated for the first time. Immediately after this, ARM constructs a model of the system including the paths from the major IP blocks to memory. This model is exercised with verification IP to simulate traffic, first for single IP blocks and then for IP blocks in combination; this is where latency and traffic management are examined. If there are no fundamental bottlenecks and the system is dimensioned properly, the next level of analysis can begin.
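To make the spreadsheet phase concrete, here is a minimal back-of-envelope bandwidth check of the kind described above. All of the figures (the per-block demands, the LPDDR4-3200 memory configuration, the 70% efficiency factor) are illustrative assumptions, not numbers from ARM's white paper:

```python
# Spreadsheet-style first-pass bandwidth budget for a hypothetical SOC.
# Every figure below is an illustrative assumption.

# Assumed average bandwidth demand per master IP block, in GB/s.
demand_gbs = {
    "cpu_cluster": 4.0,
    "gpu": 8.0,
    "video_decode": 1.5,
    "display": 2.0,   # e.g. steady scan-out traffic
}

# Assumed memory system: two 32-bit LPDDR4 channels at 3200 MT/s.
channels = 2
bus_bits = 32
transfer_rate_mts = 3200
peak_gbs = channels * (bus_bits / 8) * transfer_rate_mts / 1000  # 25.6 GB/s

efficiency = 0.7                  # assumed achievable fraction of peak
usable_gbs = peak_gbs * efficiency

total_demand = sum(demand_gbs.values())
headroom = usable_gbs - total_demand

print(f"peak {peak_gbs:.1f} GB/s, usable {usable_gbs:.1f} GB/s")
print(f"demand {total_demand:.1f} GB/s, headroom {headroom:.1f} GB/s")
```

A negative headroom at this stage would signal a fundamental bottleneck before any detailed modeling is attempted, which is exactly the kind of early screening the spreadsheet pass is for.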
Key master IP such as CPUs and GPUs are run individually to see if they get enough data bus bandwidth with sufficiently low latency. Real-world effects such as competing traffic, contention for shared resources, and other system loads can be modeled for more insight. Code for these tests typically runs on bare metal to avoid the complexities of an operating system. Single- and multi-CPU interactions are examined at this stage, including ARM big.LITTLE technology. At this point, there probably are benchmarks or real-world test cases to take advantage of.
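A toy queueing model illustrates why the competing traffic mentioned above matters so much: the service times and utilization levels here are invented for illustration, and a simple M/M/1 approximation stands in for the far more detailed traffic modeling ARM describes.

```python
# Toy illustration of latency inflation under competing traffic.
# M/M/1 queue approximation; all numbers are illustrative assumptions.

def avg_latency_ns(service_ns: float, utilization: float) -> float:
    """Mean M/M/1 latency: service time divided by (1 - utilization)."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return service_ns / (1.0 - utilization)

# CPU memory request with the bus otherwise idle.
idle = avg_latency_ns(50.0, 0.0)      # 50 ns

# Same request while GPU and display traffic keep the bus 80% busy.
contended = avg_latency_ns(50.0, 0.8)  # 250 ns

print(f"idle: {idle:.0f} ns, contended: {contended:.0f} ns")
```

Even this crude model shows average latency growing nonlinearly as shared-resource utilization climbs, which is why a master that looks fine in isolation can miss its targets once the rest of the system is loaded.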
Video codec performance in the system is tested across the gamut of encode formats, bit rates, pixel formats, frame sizes, and so on. These are run in isolation and in combination with other IP components. Because this data is real-time, the penalty for delays is high: if part of a frame is late, the entire frame might need to be dropped. Stress testing at this phase is essential to proper system function later.
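The real-time penalty comes straight from the frame budget arithmetic. A short sketch, with the 60 fps target and the per-frame decode times being assumed example values:

```python
# Real-time budget check for a video pipeline; figures are illustrative.

fps = 60
frame_budget_ms = 1000.0 / fps          # ~16.67 ms available per frame

# Assumed per-frame decode times (ms), e.g. measured under rising load.
decode_times_ms = [9.8, 12.4, 15.9, 17.2, 14.1]

# Any frame that overruns its budget is dropped in its entirety.
dropped = [t for t in decode_times_ms if t > frame_budget_ms]

print(f"budget {frame_budget_ms:.2f} ms per frame, "
      f"dropped {len(dropped)} of {len(decode_times_ms)} frames")
```

Note the cliff-edge behavior: a frame that finishes at 17.2 ms is not slightly degraded, it is lost outright, which is why stress testing under combined loads is so important at this phase.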
Other ancillary functions are then layered into the system performance analysis. There may be DMA, DSP, security, communications, and other blocks that all need to be factored in. Modeling also shifts to include code running on OS software.
ARM runs these kinds of analyses internally to validate their IP and the performance levels that finished products incorporating it can achieve. Various levels of abstraction are used, everything from static analysis through RTL, gate level, FPGA, and silicon. Software simulation and emulation are both used as needed.
ARM also works closely with their licensees, sharing knowledge and experience from this kind of analysis. The goal is to work early in the process, anticipating system needs well in advance of the point of no return for design decisions. If you are interested in reading this white paper and the upcoming follow-on pieces, look here on the ARM website.
Read Other Articles by Tom Simon