Instance

Array
(
    [title] => Recent Forum Threads
    [title_url] => 
    [ignore_sticky] => 0
    [exclude_current] => 0
    [limit] => 10
    [sluglist] => ["jobs-dashboard"]
    [rw_opt] => Array
        (
            [widget_select] => 1
            [pageid_281769] => 1
            [pageid_281772] => 1
        )

    [display_widget_mobile] => 
    [rw_opt_exclude] => Array
        (
            [pageid_274493] => 1
            [cpt_podcast] => 1
            [cpta_podcast] => 1
            [category_16613] => 1
            [category_16631] => 1
            [taxonomy_series] => 1
            [pageid_354254] => 1
        )

    [node_id] => Array
        (
            [0] => 2
        )

)

Threads

Recent Article Comments

TSMC N3 Process Technology Wiki
It was down few months ago and yeah I miss the content I had hoped he would make new posts…

— siliconbruh999 on July 17, 2025
TSMC N3 Process Technology Wiki
Agreed, marketing numbers are always misleading. Customers know that which is why they rely on PDK derived numbers. Did you…

— Daniel Nenni on July 17, 2025
TSMC N3 Process Technology Wiki
1.7X density number is misleading as well if we compare the densest library between the improvement is only 1.56X. https://fuse.wikichip.org/news/7375/tsmc-n3-and-challenges-ahead/

— siliconbruh999 on July 17, 2025
Moore’s Law Wiki
Yes, I am trying to teach AI how to do semiconductor wikis and put the Wiki back in SemiWiki. Should…

— Daniel Nenni on July 14, 2025
TSMC N3 Process Technology Wiki
I am trying to teach AI to speak semiconductor wikis. The problem is the date of the references. A 2023…

— Daniel Nenni on July 14, 2025
TSMC N3 Process Technology Wiki
Hmm - what's the source for 0.015-0.016? -- this thread shows 0.0199 (N3B) and 0.021 (N3E) https://semiwiki.com/forum/threads/tsmc-officially-halts-sram-scaling.17223/ Perhaps this source…

— Xebec on July 14, 2025
Moore’s Law Wiki
Are these AI Generated? :)

— Xebec on July 14, 2025
TSMC N3 Process Technology Wiki
It should be 25-30% smaller? Process Node Typical SRAM Cell Size Density Improvement TSMC N5 ~0.021 µm² — TSMC N3…

— Daniel Nenni on July 14, 2025
TSMC N3 Process Technology Wiki
~1.6x denser vs. N5 SRAM I thought the scaling was more like 1.05X? (Various threads here on 'SRAM scaling dead…

— Xebec on July 14, 2025
Facing the Quantum Nature of EUV Lithography
This presentation considers 5 nm Gaussian acid blur: https://www.youtube.com/watch?v=MYLdE69RDBg

— Fred Chen on July 7, 2025

WP_Term Object
(
    [term_id] => 151
    [name] => General
    [slug] => general
    [term_group] => 0
    [term_taxonomy_id] => 151
    [taxonomy] => category
    [description] => 
    [parent] => 0
    [count] => 444
    [filter] => raw
    [cat_ID] => 151
    [category_count] => 444
    [category_description] => 
    [cat_name] => General
    [category_nicename] => general
    [category_parent] => 0
)

August 8, 2014 by Don Dingee

IBM thinks neural nets in chip with 4K cores

IBM thinks neural nets in chip with 4K cores
by Don Dingee on 08-08-2014 at 2:00 pm
Categories: Foundries, General

Neural networks have been the darlings of researchers since the 1940s, but have eluded practical hardware implementations on all but a small scale, or an enormous one given how many processing elements and interconnects are needed. To make significant brain-like decisions, one needs at least several thousand fairly capable cores and a massively configurable interconnect.

An example of how much fun this problem is surfaced a couple years ago, when Google created a software neural network with 16,000 nodes and turned it loose on 10 million YouTube videos looking for cats. (No, they weren’t looking for the text “I can haz cheezburger?” That would be cheating. They were training on pictures of randomly selected cat faces.)

One of the closest attempts at neural net hardware I’ve seen so far is based on Parallella, from Adepteva. The primary processor is a Xilinx Zynq with its FPGA fabric and a dual core ARM Cortex-A9, but there is also a 16 or 64 core Epiphany accelerator on board. Those cores are small, homegrown RISC designs running at 800 MHz with a mesh interconnect.

It’s not hard to envision a cluster of maybe 64 Parallellas working together on a neural net problem – but there still needs to be a software paradigm to program that many cores effectively. OpenCL provides a solid, scalable environment for that task, able to deal with distributed processors in a heterogenous network.

Reading through Nick Oppen’s blog gives a flavor of both the potential and the level of difficulty in such an exercise. Given enough time, programming acumen, and a few thousand bucks (Parallella boards start at $149) and space to set up 64 boards, it’s possible to get that working.

If you’re IBM, and have a bunch of really smart researchers, and cost and development schedule is not an issue, you try to put that on one chip (minus the ARM cores and a fully programmable FPGA – think simple cores and configurable fabric). The project, working under the auspices of the DARPA SyNAPSE initiative, involves both hardware and software.

The specs of the IBM TrueNorth chip just introduced are impressive: 1 million programmable neurons, 256 million programmable synapses, and 4096 neurosynaptic cores. The fascinating feature: there is no clock. The cores aren’t processors in the usual sense – they are event handlers, waiting dormant until a “spike” fires. Fabbed in Samsung 28nm, TrueNorth is IBM’s largest chip to date with 5.4B transistors, yet consumes less than 100mW.

As researcher Dharmendra Modha shares, this isn’t an engine to run compiled C/C++. IBM matched the hardware to the Compass software simulator, a cognitive computing approach introducing the concept of corelets. This is sort of the network-on-chip on steroids; it abstracts and encapsulates intra-network connectivity and intra-core physiology, exposing external inputs and outputs which users connect to.

IBM already has a 16-chip board running, and is targeting an unbelievable scale of 4096 chips in a rack – 4 billion neurons and 1 trillion synapses, in less than 4kW. They are also playing with a retinal camera from iniLabs which produces spikes instead of the traditional 2D imagery requiring DSP handling.

We in EDA are somewhat slaves to the C/C++ or Java programming paradigm, because there is so much software and hardware IP and experience out there. As the Internet of Things unfolds, there will be new programming methods – much as FORTRAN and Pascal waned for general purpose use, I think C/C++ will eventually be supplanted as the underlying architecture morphs.

Before that happens, developments like TrueNorth have to become a lot more cost effective, for sure; this is still research on a pretty intensive scale. However, did we imagine something as inexpensive as Parallella just a few years ago?

Share this post via:

Comments

0 Replies to “IBM thinks neural nets in chip with 4K cores”

You must register or log in to view/post comments.

TSMC N3 Process Technology Wiki
It was down few months ago and yeah I miss the content I had hoped he would make new posts…

— siliconbruh999 on July 17, 2025
TSMC N3 Process Technology Wiki
Agreed, marketing numbers are always misleading. Customers know that which is why they rely on PDK derived numbers. Did you…

— Daniel Nenni on July 17, 2025
TSMC N3 Process Technology Wiki
1.7X density number is misleading as well if we compare the densest library between the improvement is only 1.56X. https://fuse.wikichip.org/news/7375/tsmc-n3-and-challenges-ahead/

— siliconbruh999 on July 17, 2025
Moore’s Law Wiki
Yes, I am trying to teach AI how to do semiconductor wikis and put the Wiki back in SemiWiki. Should…

— Daniel Nenni on July 14, 2025
TSMC N3 Process Technology Wiki
I am trying to teach AI to speak semiconductor wikis. The problem is the date of the references. A 2023…

— Daniel Nenni on July 14, 2025
TSMC N3 Process Technology Wiki
Hmm - what's the source for 0.015-0.016? -- this thread shows 0.0199 (N3B) and 0.021 (N3E) https://semiwiki.com/forum/threads/tsmc-officially-halts-sram-scaling.17223/ Perhaps this source…

— Xebec on July 14, 2025
Moore’s Law Wiki
Are these AI Generated? :)

— Xebec on July 14, 2025
TSMC N3 Process Technology Wiki
It should be 25-30% smaller? Process Node Typical SRAM Cell Size Density Improvement TSMC N5 ~0.021 µm² — TSMC N3…

— Daniel Nenni on July 14, 2025

Search Semiwiki

Recent Forum Threads

Recent Article Comments

Recent Podcast Episodes

Comments

0 Replies to “IBM thinks neural nets in chip with 4K cores”

Recent Forum Threads

Recent Article Comments