Cerebras AI Inference Wins Demo of the Year Award at TSMC North America Technology Symposium

by Daniel Nenni on 12-07-2025 at 2:00 pm

Cerebras TSMC OIP 2025

This is a clear reminder of how important the semiconductor ecosystem is and how closely TSMC works with its customers. The TSMC Symposium started 30 years ago and I have been a part of it ever since. Attended by TSMC’s top customers and partners, it is the #1 semiconductor networking event of the year, absolutely.

Cerebras Systems, the pioneer in wafer-scale AI acceleration, today announced that its live demonstration of the CS-3 AI inference system received the prestigious Demo of the Year award at the 2025 TSMC North America Technology Symposium in Santa Clara.

The winning demonstration showcased the Cerebras CS-3, powered by the industry’s largest chip, the 4-trillion-transistor Wafer-Scale Engine 3 (WSE-3), delivering real-time, multi-modal inference on Meta’s Llama 3.1 405B model at over 1,800 tokens per second for a single user, and sustaining over 1,000 tokens per second even under heavy concurrent multi-user workloads. Running entirely in on-chip memory with no external DRAM bottlenecks, the CS-3 processed complex reasoning, vision-language, and long-context tasks with sub-200-millisecond latency, performance previously considered impossible at this scale.
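To put those throughput figures in perspective, here is a quick back-of-envelope conversion into per-token and per-response times. This is my own illustrative arithmetic; the 300-token response length is an assumed example, not a number from the announcement.

# Illustrative arithmetic: converting the quoted throughput figures
# into per-token and per-response times. The tokens/second numbers
# are quoted in the article; the response length is assumed.
single_user_tps = 1800   # tokens/second for a single user (quoted)
multi_user_tps = 1000    # sustained tokens/second under concurrent load (quoted)
response_tokens = 300    # assumed length of a typical chat response

print(f"Per-token time at 1,800 tok/s: {1000 / single_user_tps:.2f} ms")                # ~0.56 ms
print(f"300-token response at 1,800 tok/s: {response_tokens / single_user_tps:.2f} s")  # ~0.17 s
print(f"Same response at 1,000 tok/s: {response_tokens / multi_user_tps:.2f} s")        # 0.30 s

At those rates a full multi-paragraph answer streams back in a fraction of a second, which is why the demo felt conversational even on a 405-billion-parameter model.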

TSMC’s selection committee, composed of senior executives and technical fellows, cited three decisive factors:
  1. Unprecedented single-chip performance on frontier models without multi-node scaling
  2. True real-time interactivity on models larger than 400 billion parameters
  3. Seamless integration of TSMC’s most advanced 5 nm technology with Cerebras’ revolutionary wafer-scale architecture

During the live demo, the CS-3 simultaneously served dozens of concurrent users running Llama 3.1 405B with 128k context windows, answering sophisticated multi-turn questions, generating images from text prompts via integration with Flux.1, and performing real-time document analysis—all while maintaining conversational latency indistinguishable from smaller cloud-based models.

“Wafer-scale computing was considered impossible for fifty years, and together with TSMC we proved it could be done,” said Dhiraj Mallick, COO, Cerebras Systems. “Since that initial milestone, we’ve built an entire technology platform to run today’s most important AI workloads more than 20x faster than GPUs, transforming a semiconductor breakthrough into a product breakthrough used around the world.”

“At TSMC, we support customers of all sizes—from pioneering startups to established industry leaders—with industry-leading semiconductor manufacturing technologies and capacities, helping turn their transformative ideas into realities,” said Lucas Tsai, Vice President of Business Management, TSMC North America. “We are glad to work with industry innovators like Cerebras to enable their semiconductor success and drive advancements in AI.”

The CS-3’s memory fabric provides 21 petabytes per second of bandwidth and 44 gigabytes of on-chip SRAM—equivalent to the memory of over 3,000 GPUs—enabling entire 405B-parameter models to reside on a single processor. This eliminates the inter-GPU communication overhead that plagues traditional GPU clusters, resulting in dramatically lower latency and up to 20x higher throughput per dollar on large-model inference.
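A rough bandwidth roofline helps explain why keeping the weights in on-chip SRAM matters so much for single-user decode speed. The sketch below is my own back-of-envelope model, assuming 16-bit weights and a typical HBM figure for a single GPU; only the 21 PB/s fabric bandwidth comes from the article.

# Back-of-envelope decode roofline: generating each token requires
# streaming all model weights past the compute units once, so peak
# tokens/second is bounded by bandwidth / weight bytes.
params = 405e9             # Llama 3.1 405B parameters
weight_bytes = params * 2  # assumed 16-bit weights, ~810 GB per token pass

wafer_bw = 21e15           # 21 PB/s fabric bandwidth (quoted in the article)
hbm_bw = 3.35e12           # ~3.35 TB/s HBM bandwidth of one H100 (assumed)

print(f"Wafer-scale ceiling: {wafer_bw / weight_bytes:,.0f} tokens/s")  # ~25,926
print(f"Single-GPU ceiling:  {hbm_bw / weight_bytes:.1f} tokens/s")     # ~4.1

On this simple model the quoted 1,800 tokens per second sits comfortably under the wafer-scale ceiling, while a lone GPU is bandwidth-bound and a GPU cluster must split the model across devices and pay communication costs on every token.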

The recognition comes as enterprises increasingly demand cost-effective, low-latency access to frontier-scale models. Independent benchmarks published last month by Artificial Analysis confirmed the CS-3 as the fastest single-accelerator system for Llama 3.1 70B and 405B inference, outperforming NVIDIA H100 and Blackwell GPU clusters on both tokens-per-second and time-to-first-token metrics.

TSMC’s annual symposium attracts thousands of engineers and executives from across the semiconductor ecosystem. The Demo of the Year award has previously gone to groundbreaking advancements in 3 nm and 2 nm process technology; this year marks the first time an AI systems company has claimed the honor.

Cerebras is now shipping CS-3 systems to customers in healthcare, finance, government, and scientific research. The company also announced general availability of Cerebras Inference Cloud, offering developers instant API access to Llama 3.1 405B at speeds up to 1,800 tokens/second—the fastest publicly available inference for models of this scale.
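For developers, access through an inference cloud typically looks like any other hosted LLM endpoint. The sketch below assumes the service exposes an OpenAI-compatible API; the base URL, model identifier, and environment variable are illustrative assumptions rather than details from the announcement.

# Minimal sketch of streaming a completion from a hosted inference API
# using the OpenAI Python client. Endpoint, model name, and credential
# handling are assumptions; check the Cerebras Inference Cloud docs.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.cerebras.ai/v1",   # assumed OpenAI-compatible endpoint
    api_key=os.environ["CEREBRAS_API_KEY"],  # assumed credential env var
)

stream = client.chat.completions.create(
    model="llama3.1-405b",                   # assumed model identifier
    messages=[{"role": "user",
               "content": "Summarize wafer-scale inference in two sentences."}],
    stream=True,                             # stream to observe per-token latency
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)

Streaming mode makes the headline tokens-per-second figure directly visible: the rate at which chunks arrive is the effective decode throughput for your session.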

Bottom line: With this award from TSMC, Cerebras solidifies its position as the performance leader in generative AI inference, proving that wafer-scale computing has moved from bold vision to deployed reality.

Also Read:

TSMC Kumamoto: Pioneering Japan’s Semiconductor Revival

AI-Driven DRC Productivity Optimization: Revolutionizing Semiconductor Design

Exploring TSMC’s OIP Ecosystem Benefits

Breaking the Thermal Wall: TSMC Demonstrates Direct-to-Silicon Liquid Cooling on CoWoS®
