Andes Plays an ACE

by Paul McLellan on 07-16-2014 at 9:01 am

There is a perception that ARM is the only microprocessor game in town because of its strong position in many markets, especially mobile. In areas where the instruction set shows through, this is probably true: there is no rush to build smartphones where the application processor is something else. But even in a phone there are perhaps ten more processors where the instruction set doesn’t show through, since the user has no access to the code (Bluetooth, audio decode, and so on). For these processors the decision matrix is different. Power, cost and configurability are the important dimensions. As the dominant IP supplier of microprocessors, ARM is not going to adopt a strategy of being the lowest-priced supplier and commoditizing its own market. It turns out that they are not the lowest-power supplier either. And they are less interested in configurability than others, since it doesn’t play to their strength, which is that ARM is a standard.

Andes, which I like to describe as the biggest microprocessor company that you’ve never heard of, is a Taiwanese company that historically has done most of its business in Asia. (You should have heard of them by now: I’ve been writing about them since I first ran across them at the Linley Mobile Conference about 18 months ago.) But now they are moving into the US and already have several licensees.

Up until now, AndesCores have had two of the three attributes that users require: they are not as pricey as ARM and they are lower power than equivalent cores. What was missing was configurability. The range runs from simple, small, low-performance, very low-power cores up to high-performance cores with multi-stage pipelines which, while obviously not as low power as the slower cores, are still very power-efficient when measured in MIPS/W.

The reason that customization of the instruction set is so important is that functionality that used to be implemented in hardware (so Verilog or SystemVerilog) is increasingly moving into software for time-to-market and flexibility reasons. But running software implementations of many DSP and video-processing functions on a general-purpose microprocessor is too expensive in terms of power (and sometimes the performance is not enough). For example, MP3 decode on a general-purpose microprocessor consumes much more power than doing it on a core with the right additional instructions. And an LTE modem or many video-processing algorithms will fall short on performance on a general-purpose microprocessor even when the processor is running flat out. It now seems to be received wisdom that most of these “offload” functions are best implemented in a processor core optimized either for the specific algorithm or at least for the domain (e.g. video). This gives you 90% of the flexibility of pure software and 90% of the performance and power efficiency of a pure hardware implementation.


Now Andes have the EN801, which is the first extensible AndesCore. The extensibility comes partly from the way the core itself is configured using ACE, the Andes Custom Extension framework, and partly from the software environment used to do the configuration, called COPILOT, which, pushing acronyms to some sort of asymptotic limit, stands for Custom OPtimization Instruction deveLOpment Tools.


The EN801 is based on the highly efficient AndesCore N801, which has a 3-stage pipeline. The basic core remains important since it runs the 80% of the code that is rarely executed, with excellent power characteristics and reasonable performance. For the rest of the code, the inner loops and so on, additional instructions can be added to implement it very efficiently. Added instructions can be single-cycle or multi-cycle, interruptible or non-interruptible, with up to 3 reads from and 2 writes to the registers. When different instructions have some overlap in functionality, logic sharing is possible.


One example is building a finite impulse response (FIR) filter. Using pure C code and no instruction extensions, it takes 175 cycles. Adding a FIR instruction reduces this to 10 cycles. The pure C code also consumes 28 times as much power. Of course there is a cost in terms of added hardware: nearly 7K gates. But you can trade off area, power and performance to hit what you consider the sweet spot, which is one of the attractions of configurability.
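
To make the FIR comparison concrete, here is a minimal sketch of the kind of pure C inner loop that a custom instruction would replace. It is illustrative only: the function names, the 16-bit sample format and the tap count are my assumptions, not Andes code, and the article's figures do not say how many taps were used. The second function just does the arithmetic on the cycle counts quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One output sample of a FIR filter in plain C: y = sum(coeff[k] * x[k]).
 * Each tap costs a load pair, a multiply, an add and loop overhead on a
 * general-purpose core; this is the work a custom FIR instruction would
 * fold into far fewer cycles. Names and types here are hypothetical. */
int32_t fir_sample(const int16_t *coeff, const int16_t *x, size_t taps)
{
    assert(coeff != NULL && x != NULL && taps > 0);
    int32_t acc = 0;
    for (size_t k = 0; k < taps; k++) {
        acc += (int32_t)coeff[k] * (int32_t)x[k];
    }
    return acc;
}

/* Cycle-count speedup expressed in tenths (integer math only),
 * so a return value of 175 means 17.5x. */
unsigned speedup_tenths(unsigned base_cycles, unsigned ext_cycles)
{
    assert(ext_cycles > 0);
    return (base_cycles * 10u) / ext_cycles;
}
```

With the numbers quoted above, `speedup_tenths(175, 10)` returns 175, meaning the extended core needs 17.5 times fewer cycles. That is a smaller ratio than the 28 times power saving quoted for the same example.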

The Andes website is here.

