Big Data Lessons from the LHC

by Bernard Murphy on 07-20-2016 at 7:00 am

Big Data techniques have become important in many domains, not just to drive marketing strategies but also for semiconductor design, as evidenced by Ansys’ recent announcements around their use of Big Data analytics. And they should become even more important in the brave new world of the IoT. So it makes sense to look at an organization that is managing bigger data than anyone else in order to understand approaches we may need as we scale.

Before we think about measurement data, consider that CERN (the organization that hosts the LHC) uses Big Data analytics for control of the accelerator and instrumentation, independent of data gathering. Why? Because running an accelerator of this class is very complicated. You are accelerating charged particles to very close to the speed of light around a very-high-vacuum tube 27km in circumference, which takes many vacuum pumps, many cryogenic systems, many power controls and many sensors. And that’s just the main accelerator. Add to that the ion source that feeds the accelerator and control for multiple complex detectors, and you have a system more complex than any other I can imagine.

Managing all of that is first a giant sensor / actuator / feedback problem (like using IoT devices for maintenance on a massive scale) and second a Big Data problem, because the data gathered from those systems is necessarily massive (in one example, the cryo data alone runs to a billion records). Complexity is high enough that the system as a whole is in a fault state 37% of the available time. CERN decided that preventive maintenance is not enough to get maximum value out of the LHC, and since they want to plan for the next generation, which will be even bigger and more complex, they have worked with multiple partners to build a Big Data analytics system to better forecast potential problems before they happen.
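To make the predictive-analytics idea concrete, here is a minimal sketch of one common approach: flagging sensor readings that drift outside a rolling baseline before a hard fault occurs. The sensor values, window size and threshold below are all made up for illustration; nothing here reflects CERN's actual tooling.

```python
import statistics

def rolling_anomalies(readings, window=5, sigma=3.0):
    """Return indices of readings that deviate more than `sigma`
    standard deviations from the mean of the preceding `window` samples."""
    flagged = []
    for i in range(window, len(readings)):
        baseline = readings[i - window:i]
        mean = statistics.fmean(baseline)
        stdev = statistics.pstdev(baseline)
        if stdev > 0 and abs(readings[i] - mean) > sigma * stdev:
            flagged.append(i)
    return flagged

# Hypothetical cryogenic temperature trace with one developing excursion.
trace = [4.2, 4.21, 4.19, 4.2, 4.22, 4.21, 4.2, 5.1, 4.2, 4.21]
print(rolling_anomalies(trace))  # [7] — the spike at index 7 is flagged
```

A real system would of course use far richer models (trend forecasting, correlations across sensors), but the shape is the same: a continuous stream of measurements compared against an expected envelope, with alerts raised before a component actually fails.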

This is where IoT for maintenance is already moving – not just knowing when something is broken or scheduled for repair, but being able to do predictive analytics. Perhaps there will be synergies between the work being done at CERN and in other enterprises. Hopefully Oracle (which plays a big role in the CERN control systems) can exploit some of these synergies.

The control aspect is critically important, but when most of us think about Big Data and the LHC, we’re probably thinking about managing measurement data – the information that leads to new physics. The largest detector (ATLAS, pictured above) generates ~1 petabyte per second of data, far beyond levels you could consider storing. And the vast majority of that data is uninteresting anyway, because it contains only known collision events and the goal is to find new physics.

Filtering has to reduce an O(10⁹) event rate to O(10²) with a low probability of rejecting interesting events, which they accomplish using a series of specialized and massively pipelined triggers (traditional compute would be far too slow for the first stage of triggers). Only after this filtering is data sent on for further processing and storage. The parallel for the IoT world is that no, you can’t just ship all data to the cloud. You have to pre-filter and, depending on how much data your devices produce, you may have to pre-filter very aggressively using very sophisticated logic.
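The cascade idea can be sketched very simply: a cheap first-level cut rejects most events, and a costlier, more selective cut runs only on the survivors. The event structure, field names and thresholds below are invented for illustration and bear no relation to the actual LHC trigger logic.

```python
# Illustrative two-stage trigger cascade (all values hypothetical).

def level1(event):
    # Fast, coarse cut: keep only events with enough total energy.
    return event["energy"] > 50.0

def level2(event):
    # Slower, more selective cut: require at least two high-energy tracks.
    return sum(1 for t in event["tracks"] if t > 20.0) >= 2

def trigger_pipeline(events):
    # level2 only ever runs on events that survived level1.
    return [e for e in events if level1(e) and level2(e)]

events = [
    {"energy": 10.0,  "tracks": [5.0]},          # rejected at level 1
    {"energy": 80.0,  "tracks": [25.0, 3.0]},    # rejected at level 2
    {"energy": 120.0, "tracks": [30.0, 45.0]},   # kept
]
print(len(trigger_pipeline(events)))  # 1
```

In the real accelerator the first stage is custom hardware precisely because even this trivial per-event test, done in software, could not keep up with a 10⁹/s event rate.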

The data that survives filtering still amounts to ~30PB/year. This data falls in the Big Data class of “never throw it away”, since you don’t know in advance how it may be used in different analyses. So you want permanent storage, but what you may find interesting is that this is not on disk – it goes to a tape archive (who knew we still had tape?). In fact, they have ~100k processors writing at peaks of 20GB/s to 80 tape drives. The rationale for tape is that cost is still a lot lower than for disk, and power requirements are zero when a tape is not being accessed. And since users of the data generally don’t require instantaneous access across the whole dataset, performance is not an issue.
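A quick back-of-envelope check on the figures quoted above (illustrative arithmetic only) shows why this workload suits tape: the peak rate spread across the drive pool is well within what a single modern tape drive can stream sequentially, and the sustained average rate is far below the peak.

```python
# Back-of-envelope arithmetic from the numbers in the text.
PEAK_GBPS = 20   # peak write rate, GB/s
DRIVES = 80      # tape drives
YEARLY_PB = 30   # data retained per year, PB

per_drive_mb_s = PEAK_GBPS * 1000 / DRIVES      # MB/s per drive at peak
avg_gb_s = YEARLY_PB * 1e6 / (365 * 24 * 3600)  # average GB/s over a year

print(round(per_drive_mb_s))   # 250 MB/s per drive at peak
print(round(avg_gb_s, 2))      # ~0.95 GB/s sustained average
```

Sequential streaming at a few hundred MB/s per drive is exactly the access pattern tape handles well; it is random access where tape loses badly, which is why the metadata catalog described next lives online instead.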

On the other hand, you lose a lot of random-access flexibility with tape, so a catalog of metadata is stored online. Once you’ve found what you need, a tape robot will load the appropriate tapes. Could we ever see this for IoT cloud data (or the cloud in general)? There’s arguably a security issue in tapes you can carry away, but since the whole thing is managed by a robot, you might actually have better physical security around a tape vault than we see in conventional systems. Then again, maybe we’ll eventually see higher-density read-only storage advances (an upcoming blog) that will replace both disk and tape.

CERN Big Data is definitely far bigger and far more challenging than we are likely to see in the IoT for some time. Still, I find it interesting to look at how they handle data to get some idea of where we may eventually find ourselves. You can learn more about Big Data for control at the LHC HERE and Big Data for measurement HERE. For the truly dedicated, you can learn about how CERN does real-time filtering of measurement data HERE.
