As semiconductor manufacturing pushes toward advanced nodes with tighter feature sizes, the optical proximity correction (OPC) workflow is adopting curvilinear masks to achieve the larger process windows that traditional Manhattan geometries cannot deliver.
Traditional Manhattan masks constrain shapes to vertical and horizontal edges, forcing OPC algorithms to approximate curves using many small straight segments. Curvilinear masks use cubic Bezier splines (smooth mathematical curves) to represent shapes naturally, enabling more precise control around corners and curved features where lithography is most challenging. However, this transition introduces a computational challenge that threatens to slow the entire OPC workflow.
The problem centers on mask rule check (MRC), the validation step that ensures mask designs can be manufactured without defects. For curvilinear masks, MRC can represent a large portion of overall OPC runtime. When teams cannot validate mask manufacturability quickly, they face extended iteration cycles and delayed convergence, creating schedule risk precisely when advanced nodes demand faster time-to-market.
OPC methodologies are transitioning to GPU acceleration to reduce computation time and cost. GPU processing changes the traditional paradigm, in which a mask design is broken into chunks that are distributed across CPU cores: a group of CPU cores is instead assigned to share a GPU. As seen in Figure 1, CPU cores delegate the OPC tasks that benefit from the massive parallelism offered by GPUs, while serial tasks remain local to the CPU. The primary limitation is GPU memory (VRAM), which must hold all relevant data for all tiles being processed. Moving data on and off the GPU adds significant overhead and should be avoided.

Why curvilinear MRC becomes a bottleneck
With Manhattan geometries, calculating the minimum distance between two parallel mask segments is computationally efficient and exact. The answer is simple: the absolute value of the difference between two coordinates. No approximation, no error tolerance, no complexity.
Curvilinear masks don’t share this simplicity. Computing the minimum distance between two arbitrary cubic Bezier curves requires solving a multivariate optimization problem with no efficient closed-form solution. Common approaches rely on iterative approximation, which is both computationally expensive and inherently inexact. As illustrated in Figure 2, the key MRC problem for cubic Beziers is that an exact answer is no longer available. Error tolerance and approximation become core to MRC and must be weighed against runtime when designing an MRC flow at scale.
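The contrast can be made concrete with a small sketch. It is a brute-force sampling estimate, not the method described in this article, and the function names and the two example curves are assumptions chosen for illustration. The Manhattan spacing is a single subtraction, while the Bezier estimate costs (n+1)² point evaluations and can only approach the true minimum from above.

```python
import math

def bezier_point(ctrl, t):
    """Evaluate a cubic Bezier with control points [P0, P1, P2, P3] at t in [0, 1]."""
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = ctrl
    s = 1.0 - t
    b0, b1, b2, b3 = s * s * s, 3 * s * s * t, 3 * s * t * t, t * t * t
    return (b0 * x0 + b1 * x1 + b2 * x2 + b3 * x3,
            b0 * y0 + b1 * y1 + b2 * y2 + b3 * y3)

def manhattan_spacing(y_a, y_b):
    """Exact spacing between two parallel horizontal edges: one subtraction."""
    return abs(y_a - y_b)

def sampled_min_distance(curve_a, curve_b, n):
    """Estimate the minimum distance from an (n+1) x (n+1) grid of sample pairs.

    The estimate never falls below the true minimum, and its error shrinks
    only as n (and therefore runtime) grows.
    """
    pts_a = [bezier_point(curve_a, i / n) for i in range(n + 1)]
    pts_b = [bezier_point(curve_b, j / n) for j in range(n + 1)]
    return min(math.dist(p, q) for p in pts_a for q in pts_b)

# Hypothetical example curves: an arch below a mirrored arch.
ARCH = [(0, 0), (1, 2), (3, 2), (4, 0)]
BOWL = [(0, 5), (1, 3), (3, 3), (4, 5)]

if __name__ == "__main__":
    for n in (5, 25, 200):
        print(n, sampled_min_distance(ARCH, BOWL, n))
```

For these two curves the closest points lie at the midpoints of both curves, so a coarse grid that skips the midpoint overestimates the spacing, which is the kind of tolerance-versus-runtime tradeoff an MRC flow has to budget for.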

Curvature adds further complications, which depend on whether we are comparing Bezier sections within a polygon (internal) or between different polygons (external). These categories involve additional constraints in the form of a normal vector comparison (angle tolerance) and a perimeter check (separation distance).
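As a rough illustration of the angle tolerance constraint, the sketch below compares the directions of two curves at a pair of candidate points. The exact rule definition is not given here, so the sketch assumes the check is on the undirected angle between the tangent lines (equivalently, between the normals); the function names and the tolerance parameter are hypothetical.

```python
import math

def bezier_tangent(ctrl, t):
    """First derivative of a cubic Bezier [P0, P1, P2, P3] at t in [0, 1]."""
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = ctrl
    s = 1.0 - t
    c0, c1, c2 = 3 * s * s, 6 * s * t, 3 * t * t
    return (c0 * (x1 - x0) + c1 * (x2 - x1) + c2 * (x3 - x2),
            c0 * (y1 - y0) + c1 * (y2 - y1) + c2 * (y3 - y2))

def angle_between_curves_deg(ctrl_a, t, ctrl_b, u):
    """Undirected angle in [0, 90] degrees between the tangents at a(t) and b(u).

    Assumption: curve orientation is ignored, so parallel and anti-parallel
    tangents both give 0. Zero-length tangents are not handled.
    """
    ax, ay = bezier_tangent(ctrl_a, t)
    bx, by = bezier_tangent(ctrl_b, u)
    cos_angle = abs(ax * bx + ay * by) / (math.hypot(ax, ay) * math.hypot(bx, by))
    return math.degrees(math.acos(min(1.0, cos_angle)))

def within_angle_tolerance(ctrl_a, t, ctrl_b, u, tol_deg):
    """True when the two curves are parallel at the given points within tol_deg."""
    return angle_between_curves_deg(ctrl_a, t, ctrl_b, u) <= tol_deg
```

Because rotating both tangents by 90 degrees leaves the angle unchanged, comparing tangents and comparing normals are interchangeable under this assumption.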
The recursion problem blocking GPU acceleration
GPUs excel at massively parallel workloads, making them ideal for MRC where each curve pair can be evaluated independently. However, the empirical methods traditionally used for Bezier distance computation rely on recursion, which fundamentally restricts GPU acceleration.
Recursion creates unpredictable memory access patterns and divergent execution paths that undermine the parallel processing model GPUs require. When different threads follow different recursive paths, GPU streaming multiprocessors sit idle waiting for divergent threads to converge, leaving much of the GPU's parallelism unused.
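To show what a recursion-free control flow can look like, here is a minimal sketch of one generic alternative: a grid search that is refined over a fixed number of rounds. It is not the algorithm from the white paper; the function name, the `samples` and `rounds` parameters, and the windowing rule are all assumptions. It is also a heuristic that can settle on a local minimum. The point is only that every curve pair performs the same number of loop iterations, so threads assigned to different pairs would not diverge.

```python
import math

def bezier_point(ctrl, t):
    """Evaluate a cubic Bezier with control points [P0, P1, P2, P3] at t in [0, 1]."""
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = ctrl
    s = 1.0 - t
    b0, b1, b2, b3 = s * s * s, 3 * s * s * t, 3 * s * t * t, t * t * t
    return (b0 * x0 + b1 * x1 + b2 * x2 + b3 * x3,
            b0 * y0 + b1 * y1 + b2 * y2 + b3 * y3)

def refine_min_distance(curve_a, curve_b, samples=8, rounds=6):
    """Fixed-trip-count estimate of the minimum distance between two curves.

    Each round samples a (samples+1) x (samples+1) grid over the current
    parameter window, then shrinks the window around that round's best pair.
    Work per curve pair is always rounds * (samples + 1) ** 2 evaluations.
    """
    lo_t, hi_t, lo_u, hi_u = 0.0, 1.0, 0.0, 1.0
    best = math.inf
    for _ in range(rounds):
        step_t = (hi_t - lo_t) / samples
        step_u = (hi_u - lo_u) / samples
        round_best, best_t, best_u = math.inf, lo_t, lo_u
        for i in range(samples + 1):
            t = lo_t + i * step_t
            p = bezier_point(curve_a, t)
            for j in range(samples + 1):
                u = lo_u + j * step_u
                d = math.dist(p, bezier_point(curve_b, u))
                if d < round_best:
                    round_best, best_t, best_u = d, t, u
        best = min(best, round_best)
        # Narrow the window to one grid step on either side of the round's best pair.
        lo_t, hi_t = max(0.0, best_t - step_t), min(1.0, best_t + step_t)
        lo_u, hi_u = max(0.0, best_u - step_u), min(1.0, best_u + step_u)
    return best
```

In a GPU port of this idea, the two outer parameters would typically be compile-time or launch-time constants, which keeps memory use per thread fixed as well.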
A GPU-native approach to Bezier MRC
We developed a GPU-native MRC solution that eliminates recursion while delivering superior performance and higher accuracy than traditional CPU approaches. The algorithm is specifically designed for the parallel execution model GPUs require, avoiding the recursion problem.
The approach achieves speedups ranging from 14x to 37x compared to CPU baselines, depending on accuracy specification. This performance improvement comes with an order of magnitude higher accuracy. As shown in Figure 3, for external violations, the error distributions are tightly centered around zero, with more than 90 percent of observed errors confined within one database unit (dbu), the minimum resolvable coordinate increment in the layout.

This tight error distribution demonstrates numerical stability matching or exceeding brute-force reference implementations. Internal violation results exhibit a broader error distribution because internal checks are geometrically more complex: they incorporate additional constraints such as angle tolerance and intra-shape separation distance.
The algorithm handles localized angle tolerance, a capability that challenges recursive methods. When validating MRC violations, teams must verify not just the minimum distance between curves but also the angle between them at the closest points. The GPU-native approach performs these coupled evaluations efficiently within the same parallel framework.
Scaling to production with intelligent batching
GPU memory represents the primary constraint when deploying MRC at production scale. Large tiles generate substantial numbers of curve pairs that must be evaluated, and all relevant data must fit in GPU VRAM to avoid expensive data transfer overhead.
We addressed this limitation through a batching mechanism that partitions the workload into smaller subsets processed sequentially. This reduces peak VRAM requirements by an order of magnitude compared to unbatched configurations.
Increasing the number of batches significantly reduces peak GPU memory consumption, and for large tile sizes it does not adversely affect runtime performance. The observed GPU speedup remains essentially unchanged across different batch counts because GPU streaming multiprocessors remain fully utilized, even when only a subset of curve pairs resides in memory at any given time.
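The batching idea can be sketched in a few lines. This is a host-side illustration under stated assumptions, not the production mechanism: `iter_batches`, `check_spacing`, and the `distance_fn` callback are hypothetical names, and the list comprehension stands in for what would be a single GPU kernel launch per batch.

```python
from itertools import islice

def iter_batches(pairs, batch_size):
    """Yield successive lists of at most batch_size items from any iterable."""
    it = iter(pairs)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        yield chunk

def check_spacing(pairs, min_space, batch_size, distance_fn):
    """Evaluate curve pairs batch by batch and collect spacing violations.

    Only one batch is resident at a time, so peak memory scales with
    batch_size rather than with the total number of pairs in the tile.
    Returns (violations, peak number of pairs resident at once).
    """
    violations = []
    peak_resident = 0
    for chunk in iter_batches(pairs, batch_size):
        peak_resident = max(peak_resident, len(chunk))
        # Stand-in for one kernel launch over the whole batch.
        distances = [distance_fn(a, b) for a, b in chunk]
        violations.extend(pair for pair, d in zip(chunk, distances) if d < min_space)
    return violations, peak_resident
```

The result is identical for any batch size; only the peak number of resident pairs changes. As long as each batch is still large enough to occupy the whole GPU, splitting the work this way should cost little in throughput, which is consistent with the behavior reported above for large tiles.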
Implications for curvilinear OPC workflows
This GPU-native MRC flow directly addresses one of the primary computational bottlenecks in curvilinear OPC workflows. By delivering substantial performance improvements and superior accuracy within a scalable framework, it lowers runtime for curvilinear OPC in high-volume production environments.
The algorithmic advantages extend beyond immediate performance metrics. The elimination of recursion and the batching strategies developed here provide a foundation for future GPU-native computational lithography algorithms. As the semiconductor industry continues pushing toward lower k1 process nodes and GPU hardware evolves with increasing VRAM capacities, these architectural considerations will remain relevant for scaling to increasingly complex mask designs.
Teams adopting curvilinear masks no longer face the choice between validation accuracy and iteration speed. The GPU-native approach delivers both, removing a critical barrier to curvilinear OPC deployment at production scale. When MRC validation runs 14x to 37x faster with higher accuracy, teams can iterate more rapidly and converge with greater confidence that their masks will manufacture successfully.
Learn more: Download the white paper “GPU-native Bezier mask rule check for high-volume production” to explore the algorithmic innovations and validation results in detail.