Deep learning, modeled (loosely) on the way living neurons interact, has achieved amazing success in automating recognition tasks, from recognizing images (in some cases more accurately than human experts can) to recognizing speech and written text. The engineering behind this technology revolution continues to advance at a blistering pace, so much so that there are now bidding wars among the giants (Google, FB, Amazon, Microsoft et al.) for AI experts commanding superstar paychecks.
It might seem surprising, then, that we don’t really have a deep understanding of how deep learning works. I’m not talking about what you might call a mechanical understanding of neural nets; that we have down pretty well, and we continue to improve it through more hidden layers and techniques like sharpening and pooling. We understand how layers recognize features and how, together, these ultimately lead to recognition of objects. But we don’t have a good understanding of how recognition evolves during training, or why it ultimately works as well as it does.
On reflection, this should not be surprising. Whenever technology advances rapidly, theory lags behind and catches up only once the pace of advances moderates. Some might wonder why we even need theory. We need it because all sustainable major advances eventually require a solid basis of theory if they are to have predictive power. Without that power, figuring out how to build even better solutions, and knowing where the limits lie, would depend entirely on trial and error, quickly becoming prohibitively expensive and undependable. Theoretical predictions still have to be tested (and adjusted) in practice, but at least you know where to start.
Naftali Tishby of the Hebrew University of Jerusalem has developed an information theory of deep learning as a contribution to this domain, which seems like a pretty reasonable place to start. He makes the point that classical information theory is concerned only with accurate communication without an understanding of the semantics of what is communicated, whereas deep learning is all about the semantics (is this a dog or not a dog?). So an effective theory for deep learning, while following somewhat similar lines to Shannon’s theory, needs to look at loss of “relevant” information rather than loss of any information.
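For readers who want the flavor of the math, the information bottleneck idea this work builds on can be stated as a trade-off: compress the input X into a representation T while keeping as much information as possible about the label Y. A minimal sketch of that objective, in the standard notation of the bottleneck literature (β sets the balance between compression and relevance):

```latex
% Information bottleneck: choose a stochastic mapping p(t|x) that
% compresses X into T while preserving the information relevant to Y.
\min_{p(t \mid x)} \; I(X;T) \;-\; \beta\, I(T;Y)
% I(.;.) is Shannon mutual information; a larger \beta favors keeping
% label-relevant information over compressing the input.
```

Shannon’s theory is concerned only with terms like I(X;T); the second term, I(T;Y), is exactly the “relevant” information part.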
The details of the theory get quite technical, but what is more immediately accessible are the implications for how deep learning evolves, especially as exposed by this team’s work studying many training experiments on a variety of networks. They mapped their information metric, layer by layer, onto an information plane and looked at how this evolves by epoch (a complete pass through the training data; multiple passes are typically made until the error rate is acceptable). Before the first epoch there is high information in the first layer, nearest the input, and very little in the final layers.
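Concretely, each layer can be plotted as one point per epoch using two mutual-information coordinates (this is the standard information-plane bookkeeping, sketched here rather than the team’s exact estimator):

```latex
% Information-plane coordinates for hidden layer T_l at a given epoch:
\bigl(\, I(X; T_\ell),\; I(T_\ell; Y) \,\bigr)
% X is the input, Y the label; tracking these two numbers for every
% layer at every epoch traces out the training trajectory in the plane.
```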
As epochs proceed, information with respect to the label rises rapidly, layer by layer (fitting), until training reaches a transition. At this transition the network has minimized error in classifying the training examples seen so far, but the interesting part happens in subsequent training. Here detection accuracy does not improve, yet the number of bits in their input-side information metric (per layer) begins to drop. Tishby calls this compression; in effect, layers in the network are starting to discard information that is not relevant to the recognition problem. Put another way, during this phase the network is learning to generalize, ignoring features in the training examples that are not relevant to the object of interest.
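To make the measurement concrete, here is a rough sketch of how such information-plane points might be estimated by discretizing (binning) a layer’s activations, broadly in the spirit of the binning estimators used in this line of work. The function names, bin count, and the treatment of each training example as a distinct input are my own illustrative choices, not the team’s code:

```python
import numpy as np

def mutual_information_bits(a, b):
    """Mutual information I(A;B) in bits between two aligned arrays of integer codes."""
    a, b = np.asarray(a), np.asarray(b)
    joint = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(joint, (a, b), 1)          # contingency table of co-occurrence counts
    pxy = joint / joint.sum()            # joint distribution
    px = pxy.sum(axis=1, keepdims=True)  # marginal of A
    py = pxy.sum(axis=0, keepdims=True)  # marginal of B
    nz = pxy > 0
    return float((pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])).sum())

def information_plane_point(x_ids, y_labels, activations, n_bins=30):
    """Estimate (I(X;T), I(T;Y)) for one layer at one epoch by binning its activations.

    x_ids       -- integer id per training example (each input treated as distinct)
    y_labels    -- integer class label per example
    activations -- (n_examples, n_units) array of the layer's outputs
    """
    # Discretize every unit's activation into n_bins levels, then collapse each
    # example's binned vector into a single discrete "state" id, making T discrete.
    edges = np.linspace(activations.min(), activations.max(), n_bins)
    binned = np.digitize(activations, edges)
    _, t_ids = np.unique(binned, axis=0, return_inverse=True)
    t_ids = t_ids.ravel()
    return mutual_information_bits(x_ids, t_ids), mutual_information_bits(t_ids, y_labels)
```

Plot the returned pair for every layer after every epoch and, at least in the experiments described above, you would expect I(T;Y) to rise quickly during fitting and I(X;T) to fall during the later compression phase.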
The theory promises value not only in understanding this evolution but also in quantifying how hidden layers accelerate the compression phase and in setting bounds on accuracy, both of which are important in understanding how far this technique can be pushed and into what domains.
This is obviously not the last word on theory for deep learning (it does not yet explain unsupervised learning, for example), but it seems like an interesting start. A number of other researchers of note find this work at minimum intriguing and quite possibly an important breakthrough. Others are not so sure. In any case, it is through efforts like this that deeper understanding progresses, and we can certainly use more of that in this field. You can read more on this topic in this Wired article.