In my prior post on NetApp, I discussed how the company’s FlexCache technology can keep distributed design teams in sync. Coordination and collaboration are critical elements of any complex design project. The ability to deliver results quickly while managing massive amounts of data is also critical to success. The storage subsystem for a complex design flow needs to remain fast and efficient as SoC projects scale, and this is not easy. When I heard that NetApp’s FlexGroup volumes were specifically designed for the scale and performance demands of 7 and 5 nm designs, I became quite interested. Are NetApp’s FlexGroup volumes a game changer for EDA workflows?
First, let’s put the technology in perspective. NetApp’s core storage operating system is called ONTAP. You can learn more about ONTAP in my prior post. For the last 20 years, NetApp’s FlexVol® volumes have been the gold standard for EDA workloads. But as semiconductor designs have grown in size and complexity, so has the need for scale-out and scale-up storage performance. FlexGroup volumes were designed specifically to meet the demanding needs of modern EDA workflows and shrinking process nodes. FlexGroup volumes unlock the extreme performance of NetApp’s enterprise-grade storage systems. The result is game-changing design efficiency to meet quality and time-to-market requirements.
I recently caught up with Tony Gaddis, senior director of performance at NetApp, to get some background on FlexGroup volumes. I started my conversation with Tony by exploring what EDA workloads look like. Where are the challenges coming from? Tony provided quite a list:
- EDA workflows strive for the shortest possible runtime and thus should always aim to be CPU bound rather than I/O bound. That means you want to design your workflow to minimize I/O and maximize CPU utilization, so you need high-performance I/O.
- EDA workloads are highly parallel (LSF/Grid), meaning 100s to 1,000s of jobs (CPU cores) are hitting the filer at the same time. The more jobs you can run in parallel, the faster you complete your projects. Your filer needs to be able to scale without running out of available performance.
- EDA workloads have an extremely high file count (millions to billions of small and large files in a single namespace) and can generate as much as 60-80% metadata I/O (file timestamps, whether a file exists, etc.), which consumes available filer performance and often leads to performance bottlenecks.
- And the challenges are mounting. 10, 7 and 5 nm designs are creating an explosion of data, which compounds the problems.
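To make the metadata share in the list above concrete, here is a minimal Python sketch of how such a percentage could be computed from per-operation counts. The operation names are standard NFSv3 procedures, and grouping them as "metadata" follows common usage rather than anything stated by NetApp. The counts are invented for illustration and do not come from a real EDA trace.

```python
# NFSv3 procedures commonly counted as metadata operations.
# This grouping is an assumption made for illustration.
METADATA_OPS = {
    "GETATTR", "SETATTR", "LOOKUP", "ACCESS",
    "CREATE", "REMOVE", "READDIR", "READDIRPLUS",
}


def metadata_share(op_counts):
    """Return the fraction of all operations that are metadata operations."""
    total = sum(op_counts.values())
    if total == 0:
        return 0.0
    meta = sum(n for op, n in op_counts.items() if op in METADATA_OPS)
    return meta / total


# Hypothetical operation counts, not measured data.
sample_counts = {
    "GETATTR": 400,
    "LOOKUP": 200,
    "ACCESS": 100,
    "READ": 200,
    "WRITE": 100,
}

share = metadata_share(sample_counts)
print(f"metadata share: {share:.0%}")
```

In this made-up mix, only 30% of requests move file data. The remaining requests are small, latency-sensitive calls, which is why metadata handling rather than raw bandwidth tends to become the bottleneck.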
Tony explained that an enhancement of NFS file systems is needed to deal with these challenges, and this is what ONTAP FlexGroup volumes deliver.
With FlexGroup volumes, a massive single namespace (up to 20PB and 400 billion files) can easily be provisioned in a matter of seconds. FlexGroup volumes have virtually no capacity or file count constraints outside the physical limits of hardware or the total volume limits of ONTAP – the stated limits of 20PB and 400 billion files are simply a matter of qualification across a 24-node cluster. Tony explained that there is no required maintenance or management overhead (or cost) with a FlexGroup volume. You simply create the FlexGroup volume and mount it as you would a FlexVol. In fact, as of the ONTAP 9.7 release, you can non-disruptively upgrade an existing FlexVol to a FlexGroup volume. This kind of system management ease means design teams will experience superior uptime and faster support. More game-changing benefits.
How does a FlexGroup volume scale up in terms of performance and capacity? For starters, ONTAP allows up to 24 storage controllers for NAS configurations and up to 12 high availability pairs for six nines of data resiliency and availability. When a FlexGroup volume is provisioned, ONTAP automatically writes data across the available storage nodes. The data is accessed as a single mount point, transparent to the NAS clients. All these clients see is a massive, high-performance bucket to store data. A FlexGroup volume offers distinct advantages over the standard FlexVol volume.
With a FlexVol volume, metadata-heavy workloads such as EDA (dominated by operations like CREATE and SETATTR) can become bound to a single CPU thread, which performs serially in ONTAP. In addition, a FlexVol is “owned” by a single node, which means only a single node’s CPU, RAM, network and other resources can be applied to that workload at any given time.
A FlexGroup volume takes advantage of multiple nodes to process I/O in parallel, which provides concurrency benefits to those EDA workloads.
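The contrast between a single owning node and multiple nodes can be illustrated with a toy model. The Python sketch below spreads a batch of file operations across a number of nodes and reports the load on the busiest one. It assigns files with a simple hash, which is an assumption made purely for illustration and is not ONTAP's actual placement logic. The file paths and operation count are hypothetical as well.

```python
import hashlib


def node_for(path, num_nodes):
    """Map a file path to a node index with a stable hash (illustrative only)."""
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def busiest_node_load(paths, num_nodes):
    """Count operations per node and return the load on the busiest node."""
    loads = [0] * num_nodes
    for path in paths:
        loads[node_for(path, num_nodes)] += 1
    return max(loads)


# Hypothetical workload: one metadata operation per file.
paths = [f"/proj/block{i % 50}/run{i}.log" for i in range(20000)]

single_node = busiest_node_load(paths, 1)
for nodes in (1, 4, 8):
    worst = busiest_node_load(paths, nodes)
    # If every node works at the same rate, the busiest node sets the
    # finish time, so single_node / worst approximates the speedup.
    print(f"{nodes} node(s): busiest handles {worst} ops, "
          f"approx. speedup {single_node / worst:.2f}x")
```

With one node, every operation queues behind the same resources. As nodes are added, the busiest node's share shrinks roughly in proportion, which is the concurrency benefit described above. A real system also has to coordinate between nodes, which this model ignores.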
NetApp stands behind this performance boost and backs it up with published SPEC SFS 2014 software build benchmark results. The software build profile is very similar to the EDA benchmark: both are heavy on metadata writes. In these results, ONTAP clusters showed near-linear scaling as more nodes were added and, thanks to the parallelized nature of NetApp ONTAP FlexGroup volumes, were able to push more overall jobs to the cluster than the competition. A 12-node AFF A800 cluster delivered upwards of 40GB/s and 3 million IOPS.
You can check out those officially published results here:
The SPEC SFS benchmark results below show the difference between a 4-node FlexGroup volume and an 8-node FlexGroup volume. The number of EDA jobs that can run in parallel nearly doubles. Results have demonstrated almost linear performance scaling as more nodes are utilized.
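One way to quantify "almost linear" is a scaling efficiency: the observed increase in jobs divided by the increase in node count. The Python sketch below shows the arithmetic. The job counts are placeholders chosen only to show the calculation and are not taken from the published benchmark results.

```python
def scaling_efficiency(base_nodes, base_jobs, scaled_nodes, scaled_jobs):
    """Ratio of observed speedup to ideal speedup; 1.0 means perfectly linear."""
    if min(base_nodes, base_jobs, scaled_nodes, scaled_jobs) <= 0:
        raise ValueError("all inputs must be positive")
    observed = scaled_jobs / base_jobs
    ideal = scaled_nodes / base_nodes
    return observed / ideal


# Placeholder job counts, NOT from the SPEC SFS results.
efficiency = scaling_efficiency(4, 1000, 8, 1900)
print(f"scaling efficiency: {efficiency:.2f}")
```

A value close to 1.0 corresponds to the "nearly doubles" behavior described above when the node count goes from 4 to 8.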
Couple the performance benefits of FlexGroup volumes with NetApp’s latest NVMe-based storage controllers, and customers are seeing upwards of 50% faster jobs by moving from traditional spinning drives with FlexGroup volumes to NVMe with FlexGroup volumes. All-flash (SSD) systems with FlexGroup volumes are the new gold standard for EDA workflows.
So, what does all of this performance improvement from FlexGroup volumes mean for chip designers?
With higher performance and the ability to scale capacity transparently, the most precious resource of an EDA design cycle, more time, is now available to be used in whatever way is most beneficial. For products with fixed release cycles, more time could mean more QA cycles before release, leading to higher initial product quality. For products with shortened time-to-market windows, more time could be used to maintain the same quality and release sooner, or to keep the release schedule and improve quality with more QA cycles before release.
With the increasing complexity and storage requirements of leading-edge designs (3nm storage requirements are expected to be 4X larger than those of 5nm designs), the ability to scale capacity transparently while also improving performance is necessary to design effectively at these process nodes.
All this was quite an eye-opener. Sophisticated approaches to storage management can have a huge impact on the efficiency of EDA workloads, and getting to market faster is what it’s all about. NetApp has an excellent Technical Report on the subject entitled “Electronic Design Automation Best Practices”. This document explains a lot more about FlexGroup volumes and how to deploy them in EDA workloads. You can get a copy of this report here. After perusing some of these resources, you will understand how NetApp’s FlexGroup volumes are a game changer for EDA workflows.