January 4, 2018 by Bernard Murphy

How Deep Learning Works, Maybe

by Bernard Murphy on 01-04-2018 at 7:00 am
Categories: AI

Deep learning, modeled (loosely) on the way living neurons interact, has achieved amazing success in automating recognition tasks, from recognizing images, in some cases more accurately than we or even experts can, to recognizing speech and written text. The engineering behind this technology revolution continues to advance at a blistering pace, so much so that there are now bidding wars between the giants (Google, Facebook, Amazon, Microsoft et al.) for AI experts commanding superstar paychecks.

It might seem surprising then that we don’t really have a deep understanding of how deep learning works. I’m not talking about what you might call a mechanical understanding of neural nets; that we have down pretty well and we continue to improve through more hidden layers and techniques like sharpening and pooling. We understand how layers recognize features and how together these ultimately lead to recognition of objects. But we don’t have a good understanding of how recognition evolves in training and why ultimately it works as well as it does.

On reflection, this should not be surprising. Whenever technology advances rapidly, theory lags behind and catches up only as technology advances moderate. Some might wonder why we even need theory. We need it because all sustainable major advances eventually need a solid basis of theory if they are to have predictive power. Without that power, figuring out how to build even better solutions and knowing where the limits lie would all depend on trial and error, quickly becoming prohibitively expensive and undependable. Theoretical predictions still have to be tested (and adjusted) in practice but at least you know where to start.

Naftali Tishby of the Hebrew University of Jerusalem has developed an information theory of deep learning as a contribution to this domain, which seems like a pretty reasonable place to start. He makes the point that classical information theory is concerned only with accurate communication without an understanding of the semantics of what is communicated, whereas deep learning is all about the semantics (is this a dog or not a dog?). So an effective theory for deep learning, while following somewhat similar lines to Shannon’s theory, needs to look at loss of “relevant” information rather than loss of any information.
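For reference, this trade-off is usually written as the information bottleneck objective. The notation is an assumption brought in from the standard formulation rather than from this article: $X$ is the input, $Y$ is the label, $T$ is a layer's internal representation, and $\beta$ is a weight that sets how much relevance is worth relative to compression.

```latex
\min_{p(t \mid x)} \; I(X;T) \;-\; \beta \, I(T;Y)
```

Here $I(X;T)$ measures how much of the raw input the representation still carries, while $I(T;Y)$ measures how much it says about the label. Minimizing the first while preserving the second is what "losing only irrelevant information" means in this setting.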

The theory details get quite technical, but what is more immediately accessible are the implications for how deep learning evolves, especially as exposed by this team's work in studying many training experiments on a variety of networks. They tracked their information metric for each layer in a network and looked at how it evolves by epoch (a complete pass through the training data; multiple passes are typically made until the error rate is acceptable). Before the first epoch there is high information in the first (labeled) layer and very little in final layers.

As epochs proceed, information with respect to labelling rises rapidly by layer (fitting), until this reaches a transition. At this transition, the network has minimized error in classifying the training examples seen so far, but the interesting part happens in subsequent training. Here detection accuracy does not improve, but the number of bits in their input information metric (by layer) begins to drop. Tishby calls this compression; in effect, layers in the network are starting to drop information which is not relevant to the recognition problem. Put another way, during this phase, the network is learning to generalize, ignoring features in training examples which are not relevant to the object of interest.
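To make compression concrete, here is a minimal sketch on a toy problem. It is not Tishby's experimental setup: the four inputs, the `layer_information` helper and the two encodings are all hypothetical, chosen only to show that a representation can discard input bits without losing anything about the label.

```python
import math
from collections import defaultdict


def mutual_information(joint):
    """Mutual information in bits, given a dict {(a, b): probability}."""
    p_a = defaultdict(float)
    p_b = defaultdict(float)
    for (a, b), p in joint.items():
        p_a[a] += p
        p_b[b] += p
    return sum(
        p * math.log2(p / (p_a[a] * p_b[b]))
        for (a, b), p in joint.items()
        if p > 0
    )


# Hypothetical toy data: each input is (animal, background), all equally
# likely. The label is the animal; the background is irrelevant detail.
inputs = [("dog", "grass"), ("dog", "snow"), ("cat", "grass"), ("cat", "snow")]
p_x = 1 / len(inputs)


def layer_information(encode):
    """Return (I(X;T), I(T;Y)) for a deterministic representation T = encode(x)."""
    joint_xt = defaultdict(float)
    joint_ty = defaultdict(float)
    for x in inputs:
        t = encode(x)
        y = x[0]  # label = animal
        joint_xt[(x, t)] += p_x
        joint_ty[(t, y)] += p_x
    return mutual_information(joint_xt), mutual_information(joint_ty)


# A representation that keeps every input detail.
before = layer_information(lambda x: x)
# A "compressed" representation that drops the background.
after = layer_information(lambda x: x[0])

print("keeps everything:", before)
print("drops background:", after)
```

In this toy case the compressed representation carries one bit about the input instead of two, yet still carries the full one bit about the label. That is the pattern described above: input information falls while label-relevant information, and hence accuracy, stays put.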

The theory promises value not only in understanding this evolution but also in quantifying how much hidden layers accelerate the compression phase and in placing bounds on accuracy. Both are important in understanding how far this technique can be pushed and into what domains.

This is obviously not the last word on theory for deep learning (explaining unsupervised learning, for example, remains open) but it seems like an interesting start. A number of other researchers of note find this work at minimum intriguing and quite possibly an important breakthrough. Others are not so sure. In any case, it is by efforts like this that deeper understanding progresses, and we can certainly use more of that in this field. You can read more on this topic in this Wired article.
