Simulation and Analysis of Power and Thermal Management Policies

by Daniel Payne on 11-18-2014 at 10:00 pm

Earlier this month I blogged about Power Management Policies for Android Devices; this blog is part two of the series and delves into the details of using ESL-level tools for simulation and analysis. The motivation is to optimize a power management system during the early design phase, instead of waiting until RTL or logic synthesis to estimate power. RTL and gate-level power simulation often comes too late, and its speed and complexity are not amenable to simulating dynamic use cases with interactive active power management techniques. Docea Power has written a white paper, demonstrations and additional information on modeling power-thermal management policies. To request a demo or evaluation, just send an email or visit their web site.

The Docea approach to modeling and simulating a power management algorithm rests on four concepts:


  • A power model describing how each hardware block consumes power.
  • An RC thermal model of your system, where the coupled power and thermal models are solved with Aceplorer.
  • A set of real-life scenarios written as a set of processing tasks mapped to processing units, or as a sequence of steps over time.
  • Your power and thermal management algorithm.

    Modeling a Power Management Scheme

    With this approach the solver plays the scenario and stops at each timer tick to evaluate the system and read the thermal sensor values. Your power management algorithm then decides whether the solver should continue resolving the scenario with a different operating mode defined in your power mode table.
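The tick-driven loop described above can be sketched in a few lines of Python. This is not Aceplorer's interface. `run_scenario`, the `read_sensor` and `policy` callbacks, the mode names and the toy sensor are all hypothetical stand-ins. They only show how the solver hands control to a power management policy at every tick.

```python
from typing import Callable, List

def run_scenario(
    n_ticks: int,
    read_sensor: Callable[[int, str], float],  # hypothetical: (tick, mode) -> temperature in °C
    policy: Callable[[str, float], str],       # hypothetical: (mode, temperature) -> next mode
    initial_mode: str,
) -> List[str]:
    """Play a scenario tick by tick and let a policy choose the operating mode.

    At each timer tick the loop reads the thermal sensor for the current mode
    (standing in for evaluating the system) and asks the power management
    policy which mode to use for the next interval.
    """
    mode = initial_mode
    history: List[str] = []
    for tick in range(n_ticks):
        temperature = read_sensor(tick, mode)
        mode = policy(mode, temperature)
        history.append(mode)
    return history

# Toy stand-ins (illustrative only): "fast" reads hot, "slow" reads cool.
def toy_sensor(tick: int, mode: str) -> float:
    return 85.0 if mode == "fast" else 70.0

def toy_policy(mode: str, temperature: float) -> str:
    return "slow" if temperature > 80.0 else "fast"

print(run_scenario(4, toy_sensor, toy_policy, "fast"))  # ['slow', 'fast', 'slow', 'fast']
```

The toy policy flips mode on every tick because it uses a single threshold. A hysteresis band between two thresholds, like the one in the thermal policy, avoids that oscillation.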

    Consider an example SoC with the following characteristics:

    • Four cores modeled as Processing Units (PU), each having its own frequency and voltage.
    • Three components (peripherals and memory): USB Controller, SRAM, Standard Definition Display DAC
    • Voltage/frequency sources and interconnect

    System-level Power Model Schematic

    The PUs have parameters controlled in the scenario by tasks, where each task has a processing load and a priority level. Aceplorer includes a scheduler that prioritizes the tasks and then schedules them during simulation.
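The post does not describe Aceplorer's scheduler in detail. What follows is only a minimal sketch of non-preemptive priority scheduling on a single PU. It assumes that a lower number means higher priority and that a load is expressed in millions of cycles. The `Task` type, the task names and the loads are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Task:
    name: str
    load_mcycles: float  # processing load, in millions of cycles
    priority: int        # assumption: lower value = higher priority

def schedule(tasks: List[Task], freq_mhz: float) -> List[Tuple[str, float, float]]:
    """Non-preemptive priority scheduling of tasks on one PU.

    Returns (task name, start time, end time) with times in milliseconds.
    Tasks of equal priority keep their original order because sorted() is stable.
    """
    timeline = []
    t_ms = 0.0
    for task in sorted(tasks, key=lambda t: t.priority):
        duration_ms = task.load_mcycles / freq_mhz * 1000.0  # Mcycles / MHz = seconds
        timeline.append((task.name, t_ms, t_ms + duration_ms))
        t_ms += duration_ms
    return timeline

# Hypothetical tasks, run at the 1.2 GHz maximum frequency.
tasks = [Task("background_sync", 60.0, 2), Task("ui_render", 120.0, 0), Task("audio", 12.0, 1)]
for entry in schedule(tasks, freq_mhz=1200.0):
    print(entry)
```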

    Each PU can initiate traffic on the interconnect and to the memories. This simple example has only one memory instance; however, you can still model different memory types and optimize the memory configuration.

    Power management algorithms for this SoC include:

    • CPU power modes depend on their use rates and idle residencies
    • Thermal sensor values drive the thermal throttling algorithm
    • If all initiators are idle, then shared resources are set to low power mode

    The algorithm for use-rate based power management in our example is:


    Use rate and idle residency-based power manager
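The original algorithm appears only as a figure. Below is a hedged Python sketch of a use-rate-based governor in the spirit of an on-demand policy, not Docea's algorithm. The operating points other than 1.2 GHz, the 80 % and 30 % thresholds and the function names are assumptions. The use rate is measured over one 20 ms use-rate timer window.

```python
from typing import List

# Hypothetical operating points in MHz; only the 1200 MHz maximum comes from the post.
OPP_MHZ: List[int] = [300, 600, 900, 1200]

UP_THRESHOLD = 0.80    # assumption: jump to the top operating point above 80 % use rate
DOWN_THRESHOLD = 0.30  # assumption: step down one level below 30 % use rate

def use_rate(busy_ms: float, idle_ms: float) -> float:
    """Fraction of the sampling window the core spent executing tasks."""
    window = busy_ms + idle_ms
    return busy_ms / window if window > 0 else 0.0

def next_opp(current: int, busy_ms: float, idle_ms: float) -> int:
    """On-demand style decision, evaluated once per use-rate timer period."""
    rate = use_rate(busy_ms, idle_ms)
    if rate > UP_THRESHOLD:
        return len(OPP_MHZ) - 1  # race to the highest operating point
    if rate < DOWN_THRESHOLD and current > 0:
        return current - 1       # step down gradually
    return current               # otherwise hold the current point

# 20 ms windows: 18 ms busy (90 %) jumps to max; 4 ms busy (20 %) steps down.
print(OPP_MHZ[next_opp(1, busy_ms=18.0, idle_ms=2.0)])  # 1200
print(OPP_MHZ[next_opp(3, busy_ms=4.0, idle_ms=16.0)])  # 900
```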

    In a similar fashion we can define our thermal algorithm as:
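The thermal algorithm also appears only as a figure. A minimal per-core sketch is a hysteresis controller using the 80 °C and 75 °C thresholds given for the thermal management policy in this post. The operating-point list and the one-step-per-check behavior are assumptions, not taken from Docea.

```python
from typing import List

OPP_MHZ: List[int] = [300, 600, 900, 1200]  # hypothetical operating points in MHz

T_HIGH_C = 80.0  # reduce frequency above this junction temperature
T_LOW_C = 75.0   # raise frequency again below this junction temperature

def thermal_step(current: int, junction_temp_c: float) -> int:
    """Per-core thermal throttling with a 5 °C hysteresis band.

    Between T_LOW_C and T_HIGH_C the operating point is held, which
    keeps the core from oscillating around a single threshold.
    """
    if junction_temp_c > T_HIGH_C and current > 0:
        return current - 1
    if junction_temp_c < T_LOW_C and current < len(OPP_MHZ) - 1:
        return current + 1
    return current

# One thermal check per reading; frequencies go 1200, 900, 600, 600, 900 MHz.
opp = len(OPP_MHZ) - 1
for temp in [78.0, 82.0, 81.0, 77.0, 74.0]:
    opp = thermal_step(opp, temp)
    print(temp, OPP_MHZ[opp])
```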

    For the memory and interconnect blocks we decided that they go to low power modes when all initiators are in idle mode.
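The shared-resource rule is simple enough to state directly in code. In this sketch the state strings, the initiator names and the function name are hypothetical. The rule itself, low power only when every initiator is idle, comes from the text.

```python
from typing import Dict

def shared_resource_mode(initiator_states: Dict[str, str]) -> str:
    """Mode for the memory and interconnect given each initiator's state.

    Assumption: each state is a string such as "active" or "idle". The shared
    blocks may drop to low power only when every initiator is idle.
    """
    all_idle = all(state == "idle" for state in initiator_states.values())
    return "low_power" if all_idle else "on"

cores = {"core0": "idle", "core1": "idle", "core2": "active", "core3": "idle"}
print(shared_resource_mode(cores))  # on
cores["core2"] = "idle"
print(shared_resource_mode(cores))  # low_power
```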

    For each core in the processor cluster a power mode table is defined as:
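The table itself is not reproduced here. As a hedged illustration, a per-core power mode table can be held as plain data, with a first-order dynamic power estimate P = a·C·V²·f per mode. Only the 1.2 GHz @ 1.2 V point comes from the post. The other modes, the effective capacitance and the mode names are hypothetical placeholders. Leakage is ignored, so a clock-gated mode shows zero power here even though a real core still leaks.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass(frozen=True)
class PowerMode:
    freq_mhz: float
    vdd_v: float
    clock_gated: bool = False
    power_gated: bool = False

# Hypothetical per-core power mode table; only "max" (1.2 GHz @ 1.2 V) is from the post.
POWER_MODES: Dict[str, PowerMode] = {
    "max":        PowerMode(1200.0, 1.20),
    "nominal":    PowerMode(800.0, 1.00),
    "low":        PowerMode(400.0, 0.90),
    "clock_gate": PowerMode(0.0, 0.90, clock_gated=True),
    "power_gate": PowerMode(0.0, 0.00, power_gated=True),
}

C_EFF_NF = 1.0  # hypothetical effective switched capacitance, in nF

def dynamic_power_mw(mode: PowerMode, activity: float = 1.0) -> float:
    """First-order CMOS dynamic power, P = a * C * V^2 * f (leakage ignored)."""
    # nF * V^2 * MHz = 1e-9 * 1e6 W = mW
    return activity * C_EFF_NF * mode.vdd_v ** 2 * mode.freq_mhz

for name, mode in POWER_MODES.items():
    print(f"{name:<10} {mode.freq_mhz:>6.0f} MHz {mode.vdd_v:.2f} V {dynamic_power_mw(mode):8.1f} mW")
```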

    Two timers are defined: the use rate is checked every 20 ms and the temperature every 10 ms.
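A small sketch shows how the two periods interleave in simulated time. The handler names are hypothetical. So is the choice to list the thermal check first when both timers expire together. Stepping by the greatest common divisor of the periods keeps the sketch correct if either period changes.

```python
import math
from typing import List, Tuple

USE_RATE_PERIOD_MS = 20  # use rate checked every 20 ms
THERMAL_PERIOD_MS = 10   # temperature checked every 10 ms

def timer_events(duration_ms: int) -> List[Tuple[int, str]]:
    """(time, handler) pairs fired by the two periodic timers up to duration_ms.

    When both timers expire at the same instant, the thermal check is listed
    first (an assumption: thermal protection takes precedence).
    """
    step = math.gcd(USE_RATE_PERIOD_MS, THERMAL_PERIOD_MS)
    events = []
    for t in range(step, duration_ms + 1, step):
        if t % THERMAL_PERIOD_MS == 0:
            events.append((t, "thermal_check"))
        if t % USE_RATE_PERIOD_MS == 0:
            events.append((t, "use_rate_check"))
    return events

print(timer_events(40))
```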

    Scenarios are modeled visually in Aceplorer as a sequence of tasks, each mapped to a core (PU). Here’s the initialization step followed by four tasks, each running on its own core:
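Outside the tool, a scenario like this can be captured as plain data. The sketch below is hypothetical in several places: the step names, the loads and the assumption that the four tasks start together once initialization finishes. It shows how a policy that lowers a single core's frequency stretches the whole scenario, which is why task execution time works as a performance proxy.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Step:
    name: str
    core: int            # PU the step is mapped to
    load_mcycles: float  # processing load, in millions of cycles

# Hypothetical scenario: initialization, then four tasks on four cores.
INIT = Step("init", core=0, load_mcycles=24.0)
TASKS: List[Step] = [
    Step("task_a", core=0, load_mcycles=120.0),
    Step("task_b", core=1, load_mcycles=96.0),
    Step("task_c", core=2, load_mcycles=60.0),
    Step("task_d", core=3, load_mcycles=36.0),
]

def exec_ms(step: Step, freq_mhz: Dict[int, float]) -> float:
    """Execution time of one step on its core (Mcycles / MHz = seconds)."""
    return step.load_mcycles / freq_mhz[step.core] * 1000.0

def scenario_duration_ms(init: Step, tasks: List[Step], freq_mhz: Dict[int, float]) -> float:
    """Init runs first; the tasks then start together on their own cores."""
    return exec_ms(init, freq_mhz) + max(exec_ms(t, freq_mhz) for t in tasks)

full_speed = {core: 1200.0 for core in range(4)}
throttled = {**full_speed, 0: 600.0}  # e.g. core0 throttled to half frequency
print(scenario_duration_ms(INIT, TASKS, full_speed))  # about 120 ms
print(scenario_duration_ms(INIT, TASKS, throttled))   # about 240 ms
```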

    You can simulate and compare various power management schemes to see which is best for lowest power, operation within a thermal envelope or throughput, and to weigh the tradeoffs between battery life, performance, ergonomics, safety, reliability and thermal behavior.

    For comparison, four simulations were run: an unconstrained baseline at 1.2 GHz (the maximum operating frequency) and high voltage (1.2 V) with no power management, plus three power-thermal management policies: an on-demand governor, a hot-plug governor and a thermal management policy. The power for the unconstrained case and for each power management simulation is summarized below:

    In the unconstrained case the junction temperature of the processing unit core cluster could exceed the desired reliability limit, resulting in thermal runaway or shutdown after a prolonged period.


    CPU Core cluster junction temp (blue) and power (red)

    As a proxy for application performance, we can compare the task execution time under each policy: the thermal management, on-demand and hot-plug power-thermal management schemes.

    The thermal management algorithm uses simple frequency control. The on-demand policy model uses frequency control and clock gating. The hot-plug scheme additionally power gates a core when its idle residency exceeds 10 ms. For each core, the thermal management policy reduces frequency when the junction temperature exceeds 80 °C and increases it when the junction temperature drops below 75 °C. Broader ranges and other thresholds can also be applied for sensitivity analysis.
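The idle-state difference between the three policies can be sketched as a single decision function. The 10 ms hot-plug threshold comes from the text. The state names and the function name are hypothetical. Treating an idle core under the thermal policy as simply staying clocked is an interpretation of "simple frequency control", not a detail from the post.

```python
HOTPLUG_IDLE_MS = 10.0  # hot-plug power gates a core idle for more than 10 ms

def idle_state(policy: str, idle_residency_ms: float) -> str:
    """Low-power state of an idle core under each compared policy.

    thermal:   frequency control only, so the idle core stays clocked (assumption).
    on_demand: idle cores are clock gated.
    hot_plug:  clock gated, or power gated once idle residency exceeds 10 ms.
    """
    if policy == "thermal":
        return "clocked"
    if policy == "on_demand":
        return "clock_gated"
    if policy == "hot_plug":
        return "power_gated" if idle_residency_ms > HOTPLUG_IDLE_MS else "clock_gated"
    raise ValueError(f"unknown policy: {policy}")

for policy in ("thermal", "on_demand", "hot_plug"):
    print(policy, idle_state(policy, idle_residency_ms=12.0))
```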

    Use-rate and idle-residency power management, such as the on-demand or hot-plug algorithms, can maintain a thermal envelope and may also increase low power state residency to buy back thermal headroom. The on-demand and hot-plug policies may also deliver better application-specific performance than simple DVFS or a conservative fixed operating point.

    The figure below shows some of the report profiles obtained with the two power management algorithms on one of the processing units (core1). Similar differences appear on the other cores in these simulations.

    Conclusions:

    Although it can be difficult, it is important to evaluate Power Thermal Management (PTM) strategies early in the product development process, analyzing realistic use case scenarios with fast simulation speed and configurability. Getting meaningful results in a reasonable simulation time is the name of the game:

    • In the early stages of design, different HW choices and SW policies may apply: exploration and what-if analysis let you optimize the hardware (thermal-aware floor planning, power intent definition, packaging and assembly) and the software (governors, drivers).
    • There are a large number of use cases and corner cases to validate before delivering the code to a customer (different ambient temperatures, process corners and packaging enclosures).
    • Software changes may occur at a much faster pace, and on a different development timeline, than the hardware. Simulations running for hours or days on emulators and low-level design tools may be unacceptable to SW developers.

    What keeps current solutions from reaching reasonable power-thermal management simulation speed?

    • Power-thermal management policy development should include thermal monitor and sensor data. Computational Fluid Dynamics (CFD) models are large and take too long to run dynamic use cases driven by software execution.
    • Virtual Platforms do not have temperature feedback which is crucial to get any meaningful leakage results for demanding use cases on modern SoCs.

    Why is it possible with the Docea approach?

    • The power behavior of an IP is described with a model-based approach. All the power-related information is available: voltage tree, functional clock tree, IP power states, temperature dependence of current consumption, operating points, activity, load or traffic. Even the non-linear efficiency of voltage regulators can be modeled.
    • The thermal model: Docea proposes automatic generation of compact, multi-source RC thermal networks that can represent the multi-layer stack and the complete assembled module or system (multiple chips on a board, or a complete phone).
    • Docea Power’s solver takes into account the coupling between power and thermals. Because many thermal management decisions depend on the leakage power consumed at a given time, this coupling is a must-have for any realistic PTM simulation.
    • PTM strategies are algorithms describing when and how operating conditions (frequencies and voltages) change given the state of the system (use rate of processing cores, temperature of the die or of the case). Scenarios can be described as a sequence of processing loads. The processing depends on the operating conditions and the strategies.
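To illustrate why power-thermal coupling matters, here is a minimal one-node RC thermal model with temperature-dependent leakage, integrated with forward Euler. It is far simpler than a multi-source compact network and is not Docea's solver. Every parameter value is hypothetical, including the assumption that leakage doubles every 20 °C.

```python
from typing import List

R_TH = 10.0             # hypothetical junction-to-ambient thermal resistance, K/W
C_TH = 0.5              # hypothetical thermal capacitance, J/K (RC = 5 s)
T_AMB = 25.0            # ambient temperature, °C
P_DYN = 3.0             # hypothetical dynamic power, W
P_LEAK_25 = 0.3         # hypothetical leakage power at 25 °C, W
LEAK_DOUBLING_C = 20.0  # assumption: leakage doubles every 20 °C

def leakage_w(temp_c: float) -> float:
    """Exponential leakage model anchored at 25 °C."""
    return P_LEAK_25 * 2.0 ** ((temp_c - 25.0) / LEAK_DOUBLING_C)

def simulate(duration_s: float, dt_s: float = 0.01, coupled: bool = True) -> List[float]:
    """Forward-Euler integration of C dT/dt = P(T) - (T - T_amb) / R.

    With coupled=False the leakage is frozen at its 25 °C value, which is
    what a power model without temperature feedback would assume.
    """
    temp = T_AMB
    trace = []
    for _ in range(int(round(duration_s / dt_s))):
        p_leak = leakage_w(temp) if coupled else P_LEAK_25
        power = P_DYN + p_leak
        temp += dt_s * (power - (temp - T_AMB) / R_TH) / C_TH
        trace.append(temp)
    return trace

uncoupled = simulate(60.0, coupled=False)[-1]
coupled = simulate(60.0, coupled=True)[-1]
print(f"uncoupled: {uncoupled:.1f} °C, coupled: {coupled:.1f} °C")
```

With these numbers the uncoupled run settles near 58 °C and the coupled run near 68 °C. In this toy model, raising the leakage enough removes the equilibrium altogether, which is the thermal runaway risk described for the unconstrained case.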
