
Amazon's Graviton Chips, Is Amazon changing the game?

Arthur Hanson

Well-known member
Is Amazon going to change the computing world the way it changed retailing? Is this the start of a new trend in computing that lowers costs and raises performance for everyone, with Amazon bringing its massive scale to bear as it does in everything else? Any thoughts or comments appreciated. Will this change TSM's game as well, since they are making the Graviton chip? Could Amazon also change the software market by offering not only computing power but also programs on a pay-per-use basis?

 
The way I see it, the trend toward custom-designed chips will only continue, and it seriously endangers the likes of Intel and AMD long term. Not that they are in trouble soon, but I think in the future they will be. That's why I think you want to be paying attention to the EDA and foundry space, where the barriers to entry are prohibitive in both IP and cost. This is my opinion, of course, and I'm sure those who know vastly more about the space will have more nuanced insight, e.g., Daniel, Fred, etc.
 
So, do you think Amazon may offer a menu of specialized chips and software for different tasks in their data centers?
 
I just think the hyperscalers will try to build proprietary silicon when they can. Is it not true that Amazon, Google, Apple, and Tesla are already amassing significant resources and experience in building proprietary semiconductor software and systems? I'm not sure about Microsoft, if someone wants to correct me. Also, I am far from qualified to offer expertise on this subject; this is just a trend that I believe is more probable than not to accelerate. My opinion, and I'm glad to be critiqued on it.
 
So, do you think Amazon may offer a menu of specialized chips and software for different tasks in their data centers?
I believe he meant everyone is going to make their own ARM, RISC-V, and ASIC products. You are already seeing this with all of the non-traditional fabless companies (Apple, Google, FB, Amazon, Microsoft, Samsung, and a swarm of small AI and FPGA companies). I see nothing preventing pure chip companies like Intel, AMD, Nvidia, MediaTek, and Qualcomm from selling more units year after year. However, they will need to compete with all of these new players, who will reduce their share of semiconductor market growth.
 
I'm not sure about Microsoft, if someone wants to correct me.
I think they slowed down or stopped it for Azure (unless you count the co-developed, dual-purpose Xbox Series X chip as a proprietary design). However, they recently announced a new generation of laptop CPUs to succeed their previous Arm SoCs, and they are working hard to make ARM64 support on Windows less restrictive.
 
A quick Google search of SemiWiki finds "Graviton" mentioned 249 times, so yes, AWS and other systems companies are designing their own chips to accelerate domain-specific workloads, because CPUs and GPUs aren't efficient enough.
 
I see nothing preventing pure chip companies like Intel, AMD, Nvidia, MediaTek, and Qualcomm from selling more units year after year. However, they will need to compete with all of these new players, who will reduce their share of semiconductor market growth.
Your last point is important. Intel and AMD are at the greatest risk, IMO, especially in server CPUs. (Client CPUs are actually more difficult to design and develop, but less profitable.) Several trends are proceeding and accelerating simultaneously. First, the cloud computing companies and Apple have sufficient volumes, and high enough gross margins on their products and services, that they can justify designing custom products for themselves. The Nitro CPU family is just one example. Amazon has also designed a specialized processor for accelerating its cloud-based data warehousing service (Redshift), called AQUA, which uses a combination of Nitro CPUs and FPGA logic to perform query predicate selection and answer-set processing close to the storage pools.
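
For intuition about what "query predicate selection close to storage" buys, here is a minimal Python sketch of predicate pushdown. It is not the AQUA or Redshift interface (every name in it is hypothetical); it just shows the concept of filtering rows at the storage node so only matching rows cross the network.

# Hypothetical sketch of predicate pushdown; not the real AQUA/Redshift
# API. The idea: evaluate the filter at the storage node so only
# matching rows are shipped to the compute node.

def storage_node_scan(rows, predicate):
    # Runs on (or near) the storage pool: apply the predicate locally.
    return [row for row in rows if predicate(row)]

def query_without_pushdown(rows, predicate):
    shipped = list(rows)  # every row crosses the "network"
    return [row for row in shipped if predicate(row)]

def query_with_pushdown(rows, predicate):
    return storage_node_scan(rows, predicate)  # only matches cross

rows = [{"region": "us-east-1", "sales": i} for i in range(1_000_000)]
pred = lambda r: r["sales"] % 1000 == 0
assert query_with_pushdown(rows, pred) == query_without_pushdown(rows, pred)
# With pushdown, ~1,000 rows move over the network instead of 1,000,000.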

Google has done a video transcoder chip for YouTube that is greatly reducing its need for general-purpose CPUs. Google has also designed AI chips (TPUs), and has designed and deployed in its production networks its own all-optical network switching chips (using MEMS). It appears as if Google's strategy is to focus chip development on what it can't buy or partner for (like its partnership with Intel on Infrastructure Processing Units), as opposed to Amazon's more CPU-centric approach.

Of the big cloud companies, Microsoft is the laggard, and seems more interested in creatively deploying FPGAs than ASICs, but its applications (server virtualization, AI, etc.) still reduce the market for general-purpose CPUs.

Modern IP libraries also make in-house chip design a lot less labor-intensive than it was a decade or more ago.

The business case for in-house chip development at the cloud companies looks very enticing. While foundry pricing has a significant margin built in, the comparatively high marketing, sales, and product management costs which merchant chip companies must carry are virtually eliminated. For in-house chips the application engineers tell the chip designers what they want, and the chip designers define exactly what's needed; no marketing and sales teams are necessary. Intel and AMD have huge and expensive organizations for these functions. Eliminating unneeded capabilities and circuits, and maximizing what's important, can result in big advantages in cost and power consumption. I don't have any information on Amazon's fully loaded cost for Graviton CPUs, but I would be surprised if it wasn't at least 50% lower than the price they could get from Intel or AMD.
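
As a back-of-envelope illustration of that last claim, here is a toy cost model in Python. Every number in it is invented for illustration; none comes from Amazon, Intel, or AMD.

# Hypothetical in-house vs. merchant CPU cost comparison.
# All figures below are assumptions for illustration only.
units = 1_000_000          # CPUs deployed per year (assumed)
merchant_price = 2_000.0   # $ per CPU from a merchant vendor (assumed)
silicon_cost = 600.0       # $ per in-house CPU: wafer, package, test (assumed)
nre = 300e6                # $ one-time design/EDA/IP/mask cost (assumed)

in_house_unit_cost = silicon_cost + nre / units
savings = 1 - in_house_unit_cost / merchant_price
print(f"in-house cost per unit: ${in_house_unit_cost:,.0f}")  # $900
print(f"savings vs. merchant price: {savings:.0%}")           # 55%

With these made-up inputs the amortized NRE barely moves the per-unit cost, which is why the economics only work at hyperscaler volumes.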

The trend away from general-purpose CPUs and toward in-house chip design has been going on for years. The next real question for AMD and Intel is, how will the PC vendors compete with the Apple M-series CPUs? For now, it looks like Apple will take much of the high-margin PC and laptop sales and leave Lenovo, Dell, Asus, etc., with the low-margin stuff, as has happened with iPhones versus Android phones. Apple's lead is huge.
 
I doubt this. Most laptops are not purchased for raw performance, and most people don't want to switch to OSX.
 
Apple laptops are mostly not about performance because, as you say, most people are not sensitive to it. But far less heat, no fan noise, and lighter weight are attractions. I just had a relative show me his new MacBook Pro 16", and I have to say it was incredibly nice, though for ~$2,600 it ought to be nice. The display is simply awesome. When I questioned the price: "Compared to my HP it was worth every cent." It is difficult to judge the market at this juncture, because Apple is still having supply chain problems limiting Mac sales. I was just reading that in 3Q22 Macs outsold Lenovo. I never thought I'd see the day.
 
Most laptops are not purchased for raw performance, and most people don't want to switch to OSX.
OMG, that is so not true. If Apple computers were the same price as "regular" PCs, I bet they would outsell Windows machines in a heartbeat. I bought a Mac so I could rely on it. My Win 10 laptop at work crashes every week or two; my Mac mini at home has been running continuously for several months. I've owned it for two years, and I think I've had to reboot it twice in that time. Worth the price premium to be robust.
 
As someone who spends most of his time in Linux, occasionally hopping back to Windows for software compatibility reasons, I hear you, brother. However, good luck getting someone like my dad or sister to move to something that isn't exactly what they already know. The higher price is also what allows Apple to build a PC of higher quality than most of the PC fare. Combine this with the weaker enterprise software compatibility, and I don't see it happening.
 
OMG, that is so not true. If Apple computers were the same price as "regular" PCs, I bet they would outsell Windows machines in a heartbeat.
I think there's some truth to the part about people not wanting to switch to OSX from Windows. My wife was a long-time Mac person for her home computer, and being her personal IT guy forced me to learn it. I was always a Windows person, because for decades my work computers were Windows and that's what I was familiar with. Also, MSFT Office on OSX used to suck, but now it's equal. I still don't like Apple's AppleID limitations, complexity, and idiosyncrasies. When MSFT did the Win7-to-Win8 transition, they lost me for my home systems; I despise touch screens for laptops, and Win8 was touch-screen oriented. So I'm typing this on a Mac Mini, albeit an Intel i7-based system. Since I'm retired, I doubt I'll switch back. Corporate IT still likes Windows better; that's where the best management tools are and what their IT people are trained for, though my last work laptop was a high-end MacBook. And Windows laptops are usually much less expensive for similar capabilities.

I've got my eye on a new Mac Studio, but retirement and the stock market have made me cheap lately.

My wife's home computer is a Win11 Dell laptop, and her office computer is a Win10 desktop, so I'm still familiar with both environments. I still like Macs better.

My worthless prediction - Apple gets to 20% of laptop sales volumes in five years.
 
We've had only Linux machines (desktops and durable ThinkPad laptops) at my company for 18+ years now. No issues.
 
I bought a Mac so I could rely on it. My Win 10 laptop at work crashes every week or two; my Mac mini at home has been running continuously for several months.
I also use my MacBook as a daily driver. The user experience and reliability are worth the price premium to me. My last MacBook lasted me nine years, and I only got rid of it because it was old. Obviously for raw horsepower and more software options I use my desktop PC, but I still vastly prefer the day-to-day experience of OSX for most tasks. I'm hoping M3 and beyond prove to be impressive. I also forget to turn it off for months at a time; it just sits in sleep mode.
 
We call it bespoke silicon, but I'm not sure if that name will catch on.


The trillion-dollar cloud business is very competitive, and if you are using the same chips as your competition, how do you differentiate when power and performance are at an absolute premium?

The big cloud companies have been hiring chip people and making acquisitions, like Apple did when it decided to create the iProducts 20+ years ago. Now Apple is in control of its silicon and profiting from it greatly.

Having worked with IDMs and fabless chip companies most of my career, I can tell you the fabless systems companies (Apple, Google, Amazon, Microsoft, etc.) design chips much differently. When you sell complete systems or services (cloud), the cost of chip design is quite small in the scheme of things. Fabless companies and IDMs work on much tighter margins and limit design budgets for EDA, IP, manufacturing, etc.

The result is better chips in regard to performance, power, and area. If anyone thinks Intel, AMD, and Nvidia can compete with domain-specific chips coming from the fabless systems companies, you are wrong. The big challenge is the software ecosystem around those chips. Intel, AMD, and Nvidia have spent many years creating software ecosystems to keep their chips in play. Once the fabless systems companies get the software ecosystems in place for their silicon, there will be no stopping them, absolutely.
 
I've been saying it...

The big cloud companies have already solved most of the software ecosystem issues, because they already develop their own software stacks, including programming languages (for example, Go), operating systems (Apple is the best example, but so do Google, Microsoft (obviously), Oracle, and Amazon), network operating systems (SONiC, as an example), networks themselves (like Google Aquila), storage systems (Amazon S3, among others), numerous databases... I could go on and on. The re-verticalization, if I may badly coin a term, is well underway, just like in the 1960s-1980s. Personally, I think it's an exciting time for chip designers and application developers.
 
LAS VEGAS--(BUSINESS WIRE)--At AWS re:Invent, Amazon Web Services, Inc. (AWS), an Amazon.com, Inc. company (NASDAQ: AMZN), today announced three new Amazon Elastic Compute Cloud (Amazon EC2) instances powered by three new AWS-designed chips that offer customers even greater compute performance at a lower cost for a broad range of workloads. Hpc7g instances, powered by new AWS Graviton3E chips, offer up to 2x better floating-point performance compared to current generation C6gn instances and up to 20% higher performance compared to current generation Hpc6a instances, delivering the best price performance for high performance computing (HPC) workloads on AWS. C7gn instances, featuring new AWS Nitro Cards, offer up to 2x the network bandwidth and up to 50% higher packet-processing-per-second performance compared to current generation networking-optimized instances, delivering the highest network bandwidth, the highest packet rate performance, and the best price performance for network-intensive workloads. Inf2 instances, powered by new AWS Inferentia2 chips, are purpose built to run the largest deep learning models with up to 175 billion parameters and offer up to 4x the throughput and up to 10x lower latency compared to current-generation Inf1 instances, delivering the lowest latency at the lowest cost for machine learning (ML) inference on Amazon EC2.


AWS has a decade of experience designing chips developed for performance and scalability in the cloud at a lower cost. In that time, AWS has introduced specialized chip designs, which make it possible for customers to run even more demanding workloads with varying characteristics that require faster processing, higher memory capacity, faster storage input/output (I/O) and increased networking bandwidth. Since the introduction of the AWS Nitro System in 2013, AWS has developed multiple AWS-designed silicon innovations, including five generations of the Nitro System, three generations of Graviton chips optimized for performance and cost for a wide range of workloads, two generations of Inferentia chips for ML inference, and Trainium chips for ML training. AWS uses cloud-based electronic design automation as part of an agile development cycle for the design and verification of AWS-designed silicon, enabling teams to innovate faster and make chips available to customers more quickly. AWS has demonstrated that it can deliver a new chip based on a more modern, power-efficient silicon process at a predictable and rapid pace. With each successive chip, AWS delivers a step function improvement in performance, cost, and efficiency to the Amazon EC2 instances hosting them, giving customers even more choice of chip and instance combinations optimized for their unique workload requirements.

“Each generation of AWS-designed silicon—from Graviton to Trainium and Inferentia chips to Nitro Cards—offers increasing levels of performance, lower cost, and power efficiency for a diverse range of customer workloads,” said David Brown, vice president of Amazon EC2 at AWS. “That consistent delivery, combined with our customers’ abilities to achieve superior price performance using AWS silicon, drives our continued innovation. The Amazon EC2 instances we’re introducing today offer significant improvements for HPC, network-intensive, and ML inference workloads, giving customers even more instances to choose from to meet their specific needs.”

Hpc7g instances are purpose built to offer the best price performance for running HPC workloads at scale on Amazon EC2

Organizations across numerous sectors rely on HPC to solve their most complex academic, scientific, and business problems. Today, customers like AstraZeneca, Formula 1, and Maxar Technologies run conventional HPC workloads like genomics processing, computational fluid dynamics (CFD), and weather forecasting simulations on AWS to take advantage of the superior security, scalability, and elasticity it offers. Engineers, researchers, and scientists run their HPC workloads on Amazon EC2 network-optimized instances (e.g., C5n, R5n, M5n, and C6gn) that deliver virtually unlimited compute capacity and high levels of network bandwidth between servers that process and exchange data across thousands of cores. While the performance of these instances is sufficient for most HPC use cases today, emerging applications such as artificial intelligence (AI) and autonomous vehicles require HPC-optimized instances that can further scale to solve increasingly difficult problems and reduce the cost of HPC workloads, which can scale to tens of thousands of cores or more.
Hpc7g instances powered by new AWS Graviton3E processors offer the best price performance for customers’ HPC workloads (e.g., CFD, weather simulations, genomics, and molecular dynamics) on Amazon EC2. Hpc7g instances provide up to 2x better floating-point performance compared to current generation C6gn instances powered by Graviton2 processors and up to 20% higher performance compared to current generation Hpc6a instances, enabling customers to carry out complex calculations across HPC clusters up to tens of thousands of cores. Hpc7g instances also provide high-memory bandwidth and 200 Gbps of Elastic Fabric Adapter (EFA) network bandwidth to achieve faster time to results for HPC applications. Customers can use Hpc7g instances with AWS ParallelCluster, an open-source cluster management tool, to provision Hpc7g instances alongside other instance types, giving customers the flexibility to run different workload types within the same HPC cluster. For more information on HPC on AWS, visit aws.amazon.com/hpc.
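
As a concrete sketch, requesting one of these instances through the standard EC2 API could look like the following minimal boto3 snippet. The AMI ID and subnet are placeholders, and the instance size name is an assumption; check the AWS documentation for the actual Hpc7g offerings.

# Minimal boto3 sketch for launching an Hpc7g instance.
# ImageId, SubnetId, and the instance size are placeholders/assumptions.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",      # placeholder Arm64 AMI
    InstanceType="hpc7g.16xlarge",        # assumed size name
    MinCount=1,
    MaxCount=1,
    SubnetId="subnet-0123456789abcdef0",  # placeholder
)
print(response["Instances"][0]["InstanceId"])

In practice you would let AWS ParallelCluster provision nodes like this for you, with EFA enabled, rather than calling run_instances directly.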

C7gn instances offer the best performance for network-intensive workloads with higher networking bandwidth, greater packet rate performance, and lower latency

Customers use Amazon EC2 network-optimized instances to run their most demanding network-intensive workloads like network virtual appliances (e.g., firewalls, virtual routers, and load balancers) and data encryption. Customers need to scale the performance of these workloads to handle increasing network traffic in response to spikes in activity, or to decrease processing time to deliver a better experience to their end users. Today, customers use larger instance sizes to get more network throughput, deploying more compute resources than required, which increases costs. These customers need increased packet-per-second performance, higher network bandwidth, and faster cryptographic performance to reduce data processing times.

C7gn instances, featuring new AWS Nitro Cards powered by new, fifth generation Nitro chips with network acceleration, offer the highest network bandwidth and packet-processing performance across Amazon EC2 network-optimized instances, while using less power. Nitro Cards offload and accelerate I/O functions from the host CPU to specialized hardware to deliver practically all of an Amazon EC2 instance's resources to customer workloads for more consistent performance with lower CPU utilization. New AWS Nitro Cards enable C7gn instances to offer up to 2x the network bandwidth, up to 50% higher packet-processing-per-second performance, and reduced Elastic Fabric Adapter (EFA) network latency compared to current generation networking-optimized Amazon EC2 instances. C7gn instances deliver up to 25% better compute performance and up to 2x faster performance for cryptographic workloads compared to C6gn instances. Fifth generation Nitro Cards also offer 40% better performance per watt compared to fourth generation Nitro Cards, lowering power consumption for customer workloads. C7gn instances let customers scale both performance and throughput and reduce network latency to optimize the cost of their most demanding, network-intensive workloads on Amazon EC2. C7gn instances are available today in preview. To learn more about C7gn instances, visit aws.amazon.com/ec2/instance-types/c7g.

Inf2 instances are purpose-built to serve today’s most demanding deep learning model deployments, with support for distributed inference and stochastic rounding

In response to demand for better applications and even more tailored personalized experiences, data scientists and ML engineers are building larger, more complex deep learning models. For example, large language models (LLMs) with more than 100 billion parameters are increasingly prevalent, but they train on enormous amounts of data, driving unprecedented growth in compute requirements. While training receives a lot of attention, inference accounts for the majority of the complexity and cost of running machine learning in production (i.e., for every $1 spent on training, up to $9 is spent on inference), which can limit its use and stall customer innovation. Customers want to use state-of-the-art deep learning models in their applications at scale, but they are constrained by high compute costs. When AWS launched Inf1 instances in 2019, deep learning models had millions of parameters. Since then, the size and complexity of deep learning models have grown exponentially, with some deep learning models exceeding hundreds of billions of parameters, a 500x increase. Customers working on next-generation applications using the latest advancements in deep learning want cost-effective, energy-efficient hardware that supports low-latency, high-throughput inference, with flexible software that enables engineering teams to quickly deploy their latest innovations at scale.

Inf2 instances, powered by new Inferentia2 chips, support large deep learning models (e.g., LLMs, image generation, and automated speech detection) with up to 175 billion parameters, while delivering the lowest cost per inference on Amazon EC2. Inf2 is the first inference-optimized Amazon EC2 instance that supports distributed inference, a technique that spreads large models across several chips to deliver the best performance for deep learning models with more than 100 billion parameters. Inf2 instances support stochastic rounding, a way of rounding probabilistically that enables high performance and higher accuracy as compared to legacy rounding modes. Inf2 instances support a wide range of data types including CFP8, which improves throughput and reduces power per inference, and FP32, which boosts performance of modules that have not yet taken advantage of lower precision data types. Customers can get started with Inf2 instances using AWS Neuron, the unified software development kit (SDK) for ML inference. AWS Neuron is integrated with popular ML frameworks like PyTorch and TensorFlow to help customers deploy their existing models to Inf2 instances with minimal code changes. Since splitting large models across several chips requires fast inter-chip communication, Inf2 instances support AWS’s high-speed, intra-instance interconnect, NeuronLink, offering 192 GB/s of ring connectivity. Inf2 instances offer up to 4x the throughput and up to 10x lower latency compared to current-generation Inf1 instances, and they also offer up to 45% better performance per watt compared to GPU-based instances. Inf2 instances are available today in preview. To learn more about Inf2 instances, visit aws.amazon.com/ec2/instance-types/inf2.
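
For readers unfamiliar with stochastic rounding: instead of always rounding to the nearest value, you round up or down with probability proportional to the distance to each neighbor, so the rounding error is zero in expectation. Here is a toy Python model of the idea (not the Inferentia2 implementation):

# Toy stochastic rounding to integers; hardware applies the same idea
# to low-precision floating-point formats.
import math
import random

def stochastic_round(x: float) -> int:
    lo = math.floor(x)
    frac = x - lo                  # distance above the lower neighbor
    return lo + (1 if random.random() < frac else 0)

samples = [stochastic_round(2.3) for _ in range(100_000)]
print(sum(samples) / len(samples))  # ~2.3; round-to-nearest always gives 2

This matters for deep learning because millions of accumulated round-to-nearest errors can bias results, while stochastic rounding lets them cancel on average.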

The Water Institute is an independent, non-profit applied research organization that works across disciplines to advance science and develop integrated methods used to solve complex environmental and societal challenges. “The ability to make accurate, near-real-time numerical weather predictions to aid decision making is important to our clients. We’re excited to see Amazon EC2’s high performance computing offerings continue to evolve with the launch of Amazon EC2 Hpc7g instances,” said Zach Cobell, research engineer at The Water Institute. “With increased floating-point performance, higher efficiency using AWS Graviton3E processors, based on Arm architecture, and decreased inter-node latency using Elastic Fabric Adapter, we expect to continue to be able to deliver innovative and sustainable solutions across our computational portfolio.”

Arup is a global collective of designers, engineering and sustainability consultants, advisors and experts dedicated to sustainable development and to using imagination, technology and rigour to shape a better world. “We use AWS to run highly complex simulations to help our customers to build the next generation of high-rise buildings, stadiums, data-centres, and crucial infrastructure, along with assessing and providing insight into urban microclimates, global warming, and climate change that impacts the lives of so many people around the world,” said Dr. Sina Hassanli, senior engineer at Arup. “Our customers are constantly demanding faster, more accurate simulations at a lower cost to inform their designs at the early stages of development, and we are already anticipating how the introduction of Amazon EC2 Hpc7g instances with higher performance will help our customers innovate faster and more efficiently.”

HAProxy Technologies is the company behind HAProxy, the world’s fastest and most widely-used software load balancer. "HAProxy powers modern application delivery at any scale and in any environment, providing the utmost performance, observability, and security for some of the most popular websites in the world,” said Willy Tarreau, lead developer at HAProxy. “When HAProxy tested Amazon EC2 C6gn instances, we found unprecedented performance for a software load balancer. We are excited about the new C7gn instances with Graviton3E and fifth generation AWS Nitro Cards and the networking performance improvements they will bring to our customers.”

Aerospike Inc.'s real-time data platform is designed for organizations to build applications that fight fraud, enable global digital payments, deliver hyper-personalized user experiences to tens of millions of customers, and more. “The Aerospike Real-time Data Platform is a shared-nothing, multithreaded, multimodal data platform designed to operate efficiently on a cluster of server nodes, exploiting modern hardware and network technologies to drive reliably fast performance at sub-millisecond speeds across petabytes of data,” said Lenley Hensarling, chief product officer at Aerospike. “In our recent real-time database read tests, we were pleased to see a significant improvement in transactions per second on Amazon EC2 C7gn instances featuring new AWS Nitro Cards compared to C6gn instances. We look forward to taking advantage of C7gn instances and future AWS infrastructure improvements as they become available.”

Qualtrics designs and develops experience management software. “At Qualtrics, our focus is building technology that closes experience gaps for customers, employees, brands, and products. To achieve that, we are developing complex multi-task, multi-modal deep learning models to launch new features, such as text classification, sequence tagging, discourse analysis, key-phrase extraction, topic extraction, clustering, and end-to-end conversation understanding,” said Aaron Colak, head of Core Machine Learning at Qualtrics. “As we utilize these more complex models in more applications, the volume of unstructured data grows, and we need more performant inference-optimized solutions that can meet these demands, such as Inf2 instances, to deliver the best experiences to our customers. We are excited about the new Inf2 instances, because it will not only allow us to achieve higher throughputs, while dramatically cutting latency, but also introduces features like distributed inference and enhanced dynamic input shape support, which will help us scale to meet the deployment needs as we push towards larger, more complex large models.”

Finch Computing is a natural language technology company providing artificial intelligence applications for government, financial services, and data integrator clients. “To meet our customers’ needs for real-time natural language processing, we develop state-of-the-art deep learning models that scale to large production workloads. We have to provide low-latency transactions and achieve high throughputs to process global data feeds. We already migrated many production workloads to Inf1 instances and achieved an 80% reduction in cost over GPUs,” said Franz Weckesser, chief architect at Finch Computing. “Now, we are developing larger, more complex models that enable deeper, more insightful meaning from written text. A lot of our customers need access to these insights in real-time and the performance on Inf2 instances will help us deliver lower latency and higher throughput over Inf1. With the Inf2 performance improvements and new Inf2 features, such as support for dynamic input sizes, we are improving our cost-efficiency, elevating the real-time customer experience, and helping our customers glean new insights from their data.”


 
Question: does this trend toward specialized custom chips mean that in the future our personal and business machines will be thin clients that just access state-of-the-art software, processing power, and memory that are in a constant state of improvement and high utilization? Will these future services be sold and priced at different levels, so individuals and companies can buy computing power, software, memory, and communication speeds on an à la carte basis according to their needs in the specific time frame they are using them? Will this be the future of computing? Any thoughts on other business models would be appreciated. Also appreciated, from an investment standpoint: which companies will win this game, and in what areas?
 
For the most part, cloud computing, as in AWS, is already doing all of what you're asking. While a lot of experts think, or I should say still think, that everything in enterprise IT eventually runs in the cloud, I'm not on that page, and the evidence says otherwise. Cloud computing will certainly get an increasing share of enterprise IT budgets and engineering devops budgets. However, hybrid cloud computing, where some of the hardware is on premises and some is in the cloud, is growing at just under a 20% CAGR, last I looked, and I expect it to grow faster as hybrid cloud software becomes more capable. Hybrid cloud software is, IMO, relatively nascent compared to completely cloud-based software. Some companies want mission-critical data and applications to be local while they still use cloud computing for more mundane applications, and they want a single operational view; this is one reason among many why hybrid cloud computing is growing. VMware, for example, is an important provider in the hybrid cloud, and can make where an application executes transparent to the application developers.

There are really two classes of applications when it comes to cloud versus on-prem computing: applications which are latency-sensitive, and those which are throughput-optimized. By modern latency standards, cloud computing is not a low-latency environment. Video content creation is an often-mentioned example of a latency-sensitive application. On the other hand, some very important workloads, such as data warehousing and many HPC applications, are throughput problems, and these are two big expansion areas for cloud computing. Snowflake's data warehousing software is a good example of a cloud-native IT throughput application. HPC in the cloud is becoming huge, and both Google and Amazon are investing big time, as Daniel's post mentions for AWS.

One issue with cloud computing: putting data into a cloud environment is generally free, but exporting data from the cloud costs money, so for some applications on-prem equipment still has a cost advantage.
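
To make that asymmetry concrete, here is a trivial estimate with an illustrative flat rate (actual egress pricing is tiered and varies by region and provider; the $/GB figure below is an assumption, not a quoted AWS price):

# Rough data-egress cost estimate; the rate is an assumption
# for illustration, not a quoted cloud price.
egress_per_gb = 0.09                  # assumed flat rate, $/GB
tb_exported_per_month = 50
cost = tb_exported_per_month * 1024 * egress_per_gb
print(f"~${cost:,.0f}/month to export {tb_exported_per_month} TB")  # ~$4,608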


A person could easily write a very large book on these subjects. The problem is the book will likely be out of date by the time it's published.
 