Altair is a broad-based company that delivers critical enabling technology across many disciplines that will be familiar to SemiWiki readers. According to its website, Altair delivers open-architecture solutions for data analytics & AI, computer-aided engineering, and high-performance computing (HPC). You can learn more about Altair projects covered on SemiWiki here. Enterprise-level job scheduling is one example of the critical enabling technology Altair delivers. Recent, significant enhancements to the product are the subject of this post. Read on to learn about the latest updates to Altair Accelerator, the industry’s fastest enterprise job scheduler.
What It Is
Accelerator is a high-throughput, enterprise-grade job scheduler aimed at complex semiconductor design. Its architecture is flexible and supports a variety of infrastructures, from small, dedicated server farms to complex, distributed high-performance cluster environments.
Designers need a tool like this to schedule design tasks quickly and to manage the resources those tasks consume: CPU, memory and EDA licenses. As compute infrastructures become more complex and distributed, there is also a growing need to keep throughput high while controlling overall cost. From a management perspective, there can be many thousands of jobs to schedule and prioritize each day, and operating successfully in such an environment takes a tool with high visibility and low latency.
This year, a native Kafka interface was added to Accelerator to improve visibility into a broad range of batch scheduling information. Information like this is notoriously difficult and costly to capture, so this enhancement is significant. Apache Kafka is an open-source platform for processing multiple sources of streaming data in real time. Kafka is quite popular, with more than 80% of all Fortune 100 companies using it.
Monitoring a high-performance batch system can be difficult. Most approaches slow the system down because the reporting tasks compete with the job-processing tasks for the same resources.
It gets more difficult when there are several consumers of the monitored data. Slowing the refresh rate of the monitoring data can help, but then the information is no longer accurate or real-time.
A system such as Kafka helps by supporting many consumers. Data is published once and many consumers can read the same message, so the batch system carries one extra load rather than one per consumer. For multiple clusters, multiple batch systems can be configured to publish to a single Kafka instance.
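The publish-once, read-many idea can be illustrated with a toy in-memory log. This is only a sketch of the concept in plain Python, not the Kafka API and not Altair's code; the `TopicLog` class, the consumer names and the cluster names are all hypothetical.

```python
from collections import defaultdict


class TopicLog:
    """Toy append-only log: one write per message, any number of readers."""

    def __init__(self):
        self._messages = []               # messages in publish order
        self._offsets = defaultdict(int)  # next unread index per consumer
        self.publish_count = 0            # writes the producer actually paid for

    def publish(self, message):
        """Append a message once, regardless of how many consumers exist."""
        self._messages.append(message)
        self.publish_count += 1

    def poll(self, consumer):
        """Return the messages this consumer has not seen yet."""
        start = self._offsets[consumer]
        self._offsets[consumer] = len(self._messages)
        return self._messages[start:]


log = TopicLog()

# Two hypothetical clusters publish into the same log.
log.publish({"cluster": "farm-a", "running_jobs": 1200})
log.publish({"cluster": "farm-b", "running_jobs": 450})

# Each consumer reads independently; the producer is not involved again.
dashboard = log.poll("dashboard")
alerting = log.poll("alerting")
```

Both consumers receive the same two messages, while the producer side performed only two writes. Adding a third consumer would not change the cost on the publishing side, which is the property the batch system benefits from.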
The frequency of data publishing does require care, even in this setting. How often should the system publish and how is the data extracted from the batch system? It’s important to understand how fast things are changing in the batch system. For example, Altair Accelerator can dispatch several hundred jobs per second, and each dispatch may change the state of 20-30 metrics. That’s a huge volume of data for just some basic measurements.
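To put rough numbers on that volume, the short calculation below multiplies the dispatch rate by the metrics touched per dispatch. The figure of 300 dispatches per second is an assumed stand-in for "several hundred"; the 20 to 30 range comes from the text above.

```python
def metric_updates_per_second(dispatches_per_second, metrics_per_dispatch):
    """Metric state changes generated per second by the dispatch loop."""
    return dispatches_per_second * metrics_per_dispatch


# Assumption: 300 stands in for "several hundred" dispatches per second.
DISPATCHES_PER_SECOND = 300
SECONDS_PER_DAY = 86_400

low = metric_updates_per_second(DISPATCHES_PER_SECOND, 20)
high = metric_updates_per_second(DISPATCHES_PER_SECOND, 30)

print(f"{low:,} to {high:,} metric updates per second")
print(f"{low * SECONDS_PER_DAY:,} to {high * SECONDS_PER_DAY:,} per day")
```

Under these assumptions the raw stream is 6,000 to 9,000 metric changes every second, which is why publishing every individual change is rarely practical.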
With its internal metrics system, Accelerator accumulates data over a short time window of around 10 seconds. While this loses resolution at the level of the individual dispatch loop, the resulting data is often more useful because of the high variance between dispatch loop iterations. Even with such accumulation, there is still the overhead of getting the data out of the inner loop. Altair chose to code the Kafka publisher routines directly into the batch system core to keep that overhead low.
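The accumulate-then-publish pattern might look something like the sketch below. It is not Altair's implementation, which lives inside the scheduler core; the `WindowedMetrics` class, the `jobs_dispatched` metric name and the simulated clock are assumptions made for illustration. The `publish` callable stands in for whatever sends a summary downstream.

```python
class WindowedMetrics:
    """Accumulate per-dispatch metric changes; emit one summary per window."""

    def __init__(self, publish, window_seconds=10.0):
        self._publish = publish        # callable that receives each summary
        self._window = window_seconds  # accumulation window length in seconds
        self._window_start = None      # timestamp of the first event in the window
        self._totals = {}              # running sum per metric name

    def record(self, now, metric, delta):
        """Add one change observed at time `now` (in seconds)."""
        if self._window_start is None:
            self._window_start = now
        elif now - self._window_start >= self._window:
            # The window has elapsed: publish its summary and start a new one.
            self.flush()
            self._window_start = now
        self._totals[metric] = self._totals.get(metric, 0) + delta

    def flush(self):
        """Publish the accumulated totals, if any, and reset them."""
        if self._totals:
            self._publish(
                {"window_start": self._window_start, "totals": dict(self._totals)}
            )
            self._totals.clear()


published = []
metrics = WindowedMetrics(published.append, window_seconds=10.0)

# Simulate 25 seconds of activity, one batch of 300 dispatches per second.
for tick in range(25):
    metrics.record(float(tick), "jobs_dispatched", 300)
metrics.flush()  # emit the final, partial window
```

In this simulation, 25 seconds of activity produce three published summaries instead of one message per recorded change. The trade-off is the one described above: detail within each window is summed away in exchange for far fewer messages leaving the inner loop.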
To Learn More
The results of the enhancements to Altair Accelerator are significant. You can get more information and view sample reports here. You can also see a short demonstration of the new Accelerator dashboard here. There are also great examples of enhanced data streams that are now possible with the new release here. This information will help you learn about the latest updates to Altair Accelerator, the industry’s fastest enterprise job scheduler.