Cadence loads up on MACs for vision with CNNs

by Don Dingee on 05-02-2016 at 4:00 pm
Categories: Cadence, IP, Mobile

For vision DSP IP running convolutional neural networks (CNNs), a big driver of performance is increasing the bits processed per cycle with parallel multiply-accumulate (MAC) units. Tom Simon did a great job in recent posts of introducing CNNs at a high level, so I’ll look at what is architecturally behind Cadence’s latest announcement: the Tensilica Vision P6 DSP.

CNNs are essentially pattern matching and sifting engines. In a training procedure, a server farm works through a large labeled dataset of images and derives a set of coefficients, the weights used in each convolution. Once those coefficients are defined, they can be loaded into an embedded CNN engine that rapidly processes a new incoming image by sifting it through successive convolutional layers until the desired pattern is found.
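To make the inference side concrete, here is a minimal sketch of one convolutional layer in plain Python. The image, the 2x2 kernel, and the function names are all made up for illustration; a real network would load trained weights and stack many such layers, and nothing here reflects how the Vision P6 DSP schedules the work.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (stride 1, no padding).

    As in most CNN frameworks, this is technically cross-correlation:
    the kernel is not flipped before it is applied.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            acc = 0
            for i in range(kh):
                for j in range(kw):
                    # One multiply-accumulate (MAC) per kernel tap.
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out


def relu(feature_map):
    """Keep positive responses, zero out the rest."""
    return [[max(0, v) for v in row] for row in feature_map]


# Hypothetical 3x4 image: dark on the left, bright on the right.
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]
# Hypothetical "trained" weights that respond to a dark-to-bright vertical edge.
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = relu(conv2d_valid(image, kernel))
print(feature_map)  # [[0, 18, 0], [0, 18, 0]]
```

The only nonzero responses sit where the edge is, which is the "sifting" at work. Every output value costs one MAC per kernel tap, which is why the MAC count dominates the discussion that follows.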

However, architecturally speaking, looking at the entire incoming image with a CNN is still inefficient and computationally intensive. Most of an image is, well, boring – the actual information used in recognizing objects is contained in a few candidate regions of interest. A vision DSP can run more conventional algorithms to enhance the image and extract those regions, handing them over to the neural network side for quicker object recognition.
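As a rough illustration of that hand-off, the sketch below uses a simple brightness threshold as a stand-in for the "more conventional algorithms". The threshold, the toy image, and the function names are assumptions for the example; production pipelines use far more capable detectors.

```python
def bounding_box(image, threshold):
    """Return (top, left, bottom, right) around pixels above threshold, or None."""
    rows = [r for r, row in enumerate(image) if any(v > threshold for v in row)]
    cols = [c for c in range(len(image[0]))
            if any(row[c] > threshold for row in image)]
    if not rows:
        return None
    # bottom and right are exclusive, so they can be used directly in slices.
    return rows[0], cols[0], rows[-1] + 1, cols[-1] + 1


def crop(image, box):
    """Cut the region of interest out of the full frame."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


# Hypothetical 4x5 frame with one small "interesting" patch.
frame = [
    [0, 0, 0, 0, 0],
    [0, 0, 7, 8, 0],
    [0, 0, 9, 6, 0],
    [0, 0, 0, 0, 0],
]

box = bounding_box(frame, threshold=5)
roi = crop(frame, box)
fraction = (len(roi) * len(roi[0])) / (len(frame) * len(frame[0]))

print(box)       # (1, 2, 3, 4)
print(roi)       # [[7, 8], [9, 6]]
print(fraction)  # 0.2
```

In this toy frame the CNN would only need to look at a fifth of the pixels, which is the efficiency argument in miniature.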

That implies vision DSP IP needs to be good at both jobs, handling image processing and CNNs. Most DSPs have concentrated on the image processing side: more memory bandwidth, VLIW operations, floating point operations, deep pipelining, and more.

Leaps in memory bandwidth to keep the DSP fed are a good thing, but from there what makes a CNN perform well starts to differ. Image sensor data typically comes in with less than 16-bit resolution, and 8-bit coefficients are plenty wide for CNNs. Optimizing 8- and 16-bit operations and launching many operations on small data elements in a single cycle is the way to faster CNNs at the back end of the vision subsystem.
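A small sketch shows why narrow operands are workable. The weights are quantized to 8-bit integers, the multiplies happen on small integers, and only the accumulator needs to be wide. The scale factor and sample values are assumptions chosen for readability, not figures from Cadence.

```python
def saturate_int8(x):
    """Clamp to the signed 8-bit range."""
    return max(-128, min(127, x))


def quantize(values, scale):
    """Map floating-point weights to signed 8-bit integers."""
    return [saturate_int8(round(v / scale)) for v in values]


def mac_int8(pixels, weights):
    """Dot product of narrow operands with a wide accumulator.

    Each product of two 8-bit values fits in 16 bits; the running sum is
    kept wider (Python ints are unbounded, hardware would typically use
    a 24- or 32-bit accumulator).
    """
    acc = 0
    for p, w in zip(pixels, weights):
        acc += p * w
    return acc


SCALE = 0.01                          # hypothetical quantization step
weights_fp = [0.5, -0.25, 1.0]        # hypothetical trained coefficients
weights_q = quantize(weights_fp, SCALE)
pixels = [10, 20, 30]                 # hypothetical 8-bit pixel values

acc = mac_int8(pixels, weights_q)
print(weights_q)     # [50, -25, 100]
print(acc)           # 3000
print(acc * SCALE)   # close to the floating-point result of 30.0
```

The accumulated value (3000) no longer fits in 8 bits even though every input did, which is why the data elements can be small while the accumulators cannot.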

In the Vision P5 DSP, Cadence had the memory bandwidth and pipelining well handled. Just 7 months after the Vision P5 introduction, the opportunity for increasing CNN performance while maintaining image processing performance became clear. In the Vision P6 DSP, the big change is increasing the vector processing capability from 64 MACs to 256 MACs. Other enhancements include FP16 support in the optional 32-way SIMD vector floating point unit, and new custom instruction capability supporting CNNs.

The result is a massive 9728 bits processed per cycle, which Pulin Desai, Director of Product Marketing in the Imaging/Vision Group at Cadence, says is more than twice that of the current DSP IP competition. Cadence has the Vision P6 DSP targeted at 1.1 GHz in 16nm FF. Software written for the Vision P5 DSP will run as-is, and users can recompile to take advantage of the new Vision P6 DSP features.
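For a sense of scale, the back-of-the-envelope sketch below combines the 64 and 256 MAC figures with the 1.1 GHz target. The layer dimensions are hypothetical, and the cycle counts are ideal peaks that assume every MAC unit is busy on every cycle, with no memory stalls or control overhead. Real throughput will be lower.

```python
MACS_PER_CYCLE_P5 = 64     # from the article
MACS_PER_CYCLE_P6 = 256    # from the article
CLOCK_HZ = 1.1e9           # Vision P6 target clock from the article


def conv_layer_macs(out_h, out_w, out_ch, k_h, k_w, in_ch):
    """MACs needed for one convolutional layer."""
    return out_h * out_w * out_ch * k_h * k_w * in_ch


def ideal_cycles(total_macs, macs_per_cycle):
    """Ceiling division: cycles needed at 100% MAC utilization."""
    return -(-total_macs // macs_per_cycle)


# Hypothetical layer: 56x56 output, 64 output channels, 3x3 kernel, 64 input channels.
layer_macs = conv_layer_macs(56, 56, 64, 3, 3, 64)

cycles_p5 = ideal_cycles(layer_macs, MACS_PER_CYCLE_P5)
cycles_p6 = ideal_cycles(layer_macs, MACS_PER_CYCLE_P6)
peak_macs_per_second = MACS_PER_CYCLE_P6 * CLOCK_HZ

print(layer_macs)                   # 115605504
print(cycles_p5, cycles_p6)         # 1806336 451584
print(cycles_p5 / cycles_p6)        # 4.0
print(peak_macs_per_second / 1e9)   # peak GMAC/s, about 281.6
print(cycles_p6 / CLOCK_HZ * 1e3)   # ideal milliseconds for this layer
```

The 4x ratio in cycles follows directly from the MAC count. The article does not give a Vision P5 clock target, so the sketch compares cycles rather than wall-clock time between the two parts.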

Blending traditional image processing with neural networks in one IP block has a lot of merit, using the strengths of each approach in vision processing to improve performance and reduce power. Seeing a DSP vendor like Cadence go back and optimize 8- and 16-bit operations specifically for CNNs is an architectural twist that may have a large payoff, though at this point the only comparison we have is Cadence's own.

More from Cadence in their press release:

Cadence Announces New Tensilica Vision P6 DSP Targeting Embedded Neural Network Applications

Are CNNs the new battleground for embedded vision? We are seeing the DSP IP and the mobile GPU IP vendors all talking CNNs. Something to watch.
