September 17, 2014 by Pawan Fangaria

Designing the Right Architecture Using HLS

by Pawan Fangaria on 09-17-2014 at 9:05 am
Categories: EDA

With the advent of HLS tools, the general notion is that an automated tool can take a design description written in C++/SystemC, optimize it, and deliver perfect RTL. In real life that is not so. Any design description needs a hardware designer’s expertise to choose the right algorithm and architecture so that the design intent is met; the desired RTL architecture must be understood before the design description is written. Effectively, this is hardware design, not software synthesis. So, beyond transforming an abstract hardware description into RTL, the major contribution of an HLS tool is improving QoR (Quality of Results) by tuning the micro-architecture according to HLS constraints and by taking the design from technology-independent to technology-specific. Calypto’s HLS process using Catapult has a dedicated ‘Architecture Refinement’ stage between the ESL Reference Model and the ESL Synthesizable Model.

Consider the above example of a simple filter model, where the ‘multiply and accumulate’ loop can be unrolled for parallelism. The software code uses bit-accurate types (Algorithmic C or SystemC) with proper rounding, known sizes, internal taps, and external coefficients. This software model can be synthesized easily.
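As a rough illustration of such a filter model, here is a minimal C++ sketch of a direct-form FIR with a multiply-and-accumulate loop. It is not the code from the webinar: the function name `fir`, the use of `std::vector`, and the plain integer types (standing in for bit-accurate Algorithmic C or SystemC types) are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical direct-form FIR: out[n] = sum_i coeffs[i] * in[n - i],
// with samples before the start of the stream treated as zero.
// Plain integers stand in for bit-accurate types.
std::vector<std::int64_t> fir(const std::vector<std::int32_t>& in,
                              const std::vector<std::int32_t>& coeffs) {
    assert(!coeffs.empty());
    std::vector<std::int32_t> taps(coeffs.size(), 0);  // internal tap delay line
    std::vector<std::int64_t> out;
    out.reserve(in.size());
    for (std::int32_t sample : in) {
        // Shift the delay line and insert the new sample at tap 0.
        for (std::size_t i = taps.size() - 1; i > 0; --i) {
            taps[i] = taps[i - 1];
        }
        taps[0] = sample;
        // Multiply-and-accumulate loop. Iterations interact only through
        // the accumulator, so this is the loop a tool could unroll.
        std::int64_t acc = 0;
        for (std::size_t i = 0; i < coeffs.size(); ++i) {
            acc += static_cast<std::int64_t>(coeffs[i]) * taps[i];
        }
        out.push_back(acc);
    }
    return out;
}
```

Feeding a unit impulse into this sketch returns the coefficients in order, which is a quick sanity check on the tap ordering.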

Now consider an optimized architecture (reduced area and complexity): the folded 5-tap filter shown in the above picture, in which the number of coefficients is reduced to 3. The decision to share or unroll the summing adders can be made in HLS. As shown in the software model, loop merging in HLS can share the folding adder, a choice that becomes technology dependent.
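Folding relies on coefficient symmetry: taps that share a coefficient are added first and multiplied once. The sketch below assumes a symmetric, odd-length filter and uses a hypothetical function `folded_fir` with plain integer types; it illustrates the idea and is not the model from the webinar.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// `half` holds only the unique coefficients of a symmetric, odd-length
// filter: half = {c0, c1, c2} stands for the 5-tap response {c0, c1, c2, c1, c0}.
std::vector<std::int64_t> folded_fir(const std::vector<std::int32_t>& in,
                                     const std::vector<std::int32_t>& half) {
    assert(!half.empty());
    const std::size_t mid = half.size() - 1;   // index of the centre tap
    const std::size_t n_taps = 2 * mid + 1;    // 5 taps for 3 coefficients
    std::vector<std::int32_t> taps(n_taps, 0);
    std::vector<std::int64_t> out;
    out.reserve(in.size());
    for (std::int32_t sample : in) {
        for (std::size_t i = n_taps - 1; i > 0; --i) {
            taps[i] = taps[i - 1];
        }
        taps[0] = sample;
        // The centre tap has no mirror partner.
        std::int64_t acc = static_cast<std::int64_t>(half[mid]) * taps[mid];
        // Folding adder: sum each mirrored pair, then multiply once.
        for (std::size_t i = 0; i < mid; ++i) {
            const std::int64_t pre_add =
                static_cast<std::int64_t>(taps[i]) + taps[n_taps - 1 - i];
            acc += half[i] * pre_add;
        }
        out.push_back(acc);
    }
    return out;
}
```

For 5 taps this uses 3 multiplications per output instead of 5, at the cost of 2 pre-adders.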

An untimed HLS model is technology and performance neutral. The number of taps and the appropriate level of folding or unrolling are decided from the system clock, sampling frequency, and other design parameters such as throughput. The area saving from folding becomes more pronounced in fully unrolled solutions that process one sample per clock cycle.

Above is an example of a circular-buffer RAM implementation with mutually exclusive read and write, which allows a single-port RAM to be used for tap storage. A circular-buffer RAM may be needed when the number of taps is large.
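The circular-buffer idea can be sketched as follows: the samples stay in place in an array that stands in for the RAM, and only an index moves. The class name `CircularFir` and its members are hypothetical. Each loop iteration performs exactly one array access, so a read and a write never happen at the same time, which is the property a single-port RAM needs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical FIR whose tap storage is a circular buffer (a RAM stand-in).
class CircularFir {
public:
    explicit CircularFir(std::vector<std::int32_t> coeffs)
        : coeffs_(std::move(coeffs)), ram_(coeffs_.size(), 0) {
        assert(!coeffs_.empty());
    }

    std::int64_t step(std::int32_t sample) {
        ram_[head_] = sample;  // the only write for this sample
        std::int64_t acc = 0;
        std::size_t idx = head_;
        for (std::size_t i = 0; i < coeffs_.size(); ++i) {
            // One read per iteration, walking back from the newest sample.
            acc += static_cast<std::int64_t>(coeffs_[i]) * ram_[idx];
            idx = (idx == 0) ? ram_.size() - 1 : idx - 1;
        }
        head_ = (head_ + 1) % ram_.size();  // the next write overwrites the oldest sample
        return acc;
    }

private:
    std::vector<std::int32_t> coeffs_;  // declared before ram_, which is sized from it
    std::vector<std::int32_t> ram_;
    std::size_t head_ = 0;
};
```

Compared with a shift-register delay line, no data moves between samples; the trade-off is that the multiply-and-accumulate runs sequentially over several cycles.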

Decimation is a technique that reduces the sample rate by discarding samples, say 3 out of every 4, so it is wise to avoid the computational overhead of the discarded samples. Polyphase decimation computes the required result in phases to reduce this overhead.
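A minimal sketch of the idea, assuming a decimation factor `M` and a hypothetical function `decimate_polyphase`: the filter is evaluated only at the retained output instants, and the coefficients are visited phase by phase (phase `p` uses `h[p]`, `h[p + M]`, and so on). This is a behavioral illustration, not a hardware-accurate polyphase structure.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Computes y[m] = sum_k h[k] * in[m*M - k] for the retained outputs only.
// Samples before the start of the stream are treated as zero.
std::vector<std::int64_t> decimate_polyphase(const std::vector<std::int32_t>& in,
                                             const std::vector<std::int32_t>& h,
                                             std::size_t M) {
    assert(M > 0);
    std::vector<std::int64_t> out;
    for (std::size_t n = 0; n < in.size(); n += M) {  // skip discarded outputs
        std::int64_t acc = 0;
        for (std::size_t p = 0; p < M; ++p) {         // one sub-filter per phase
            for (std::size_t k = p; k < h.size() && k <= n; k += M) {
                acc += static_cast<std::int64_t>(h[k]) * in[n - k];
            }
        }
        out.push_back(acc);
    }
    return out;
}
```

With `M = 4`, only one in four outputs is computed, so the multiply-and-accumulate work drops by roughly that factor compared with filtering every sample and then discarding results.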

A more complex example comes from image processing. Below is sample code for image windowing: an edge detector.

It is inefficient to read an image 9 times to produce a single output image. Such cases call for a window and line-buffer architecture; a line buffer is a circular-buffer delay line with a write and a read every cycle. In the above example, treating positions 0 through 8 as registers, injecting pixels into position 8, and shifting (with appropriate delay of the inputs) yields the first pixel_out result at position 4. The line buffer can be implemented using a dual-port RAM with one read and one write, a single-port RAM with guaranteed read-before-write behavior, or double-width ping-pong read/write buffering.
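The window and line-buffer scheme can be sketched in C++ as below. Each input pixel is read exactly once; two line buffers hold the previous two rows, and a 3×3 register window shifts by one column per pixel. The function name `window3x3`, the `int` pixel type, and the kernel (8 × centre minus the 8 neighbours, a simple Laplacian-style edge measure) are assumptions for illustration, not the edge detector from the webinar.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Streams a row-major image once and emits one result per interior pixel.
std::vector<int> window3x3(const std::vector<int>& img,
                           std::size_t width, std::size_t height) {
    assert(img.size() == width * height);
    std::vector<int> line1(width, 0);  // row y-1
    std::vector<int> line2(width, 0);  // row y-2
    std::array<std::array<int, 3>, 3> win{};  // win[r][c]: rows y-2..y, cols x-2..x
    std::vector<int> out;
    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            const int px = img[y * width + x];  // the only read of this pixel
            // Shift the window one column to the left.
            for (auto& row : win) {
                row[0] = row[1];
                row[1] = row[2];
            }
            // Inject the new right-hand column from the delay lines.
            win[0][2] = line2[x];
            win[1][2] = line1[x];
            win[2][2] = px;
            // Read-before-write update of the line buffers at address x.
            line2[x] = line1[x];
            line1[x] = px;
            if (y >= 2 && x >= 2) {  // window is fully inside the image
                int sum = 0;
                for (const auto& row : win) {
                    for (int v : row) sum += v;
                }
                out.push_back(9 * win[1][1] - sum);  // 8*centre - neighbours
            }
        }
    }
    return out;
}
```

In this sketch the line buffers are read and then written at the same address in each step, which mirrors the read-before-write requirement mentioned for a single-port RAM.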

To implement appropriate hardware for a single-port RAM, a template can be defined for an SPRAM hardware_window class, with the corresponding SPRAM class constructor and member functions. The RAM access operations are defined so that read and write are mutually exclusive. Similarly, the shifting of window pixels, the injection of data from the delay line, and the updating of the window registers are defined.

The above image shows the synthesis process in Catapult. The RAM array from the SPRAM class instance can be mapped to an SPRAM library. A 3×3 window on a 1920-pixel image width will have a 958-deep, double-width RAM. With 12-bit pixels, two lines to buffer, and double width, the RAM must be 48 bits wide (12 × 2 × 2).

It’s clear from the above examples that the hardware expertise of an RTL designer is valuable when writing the description at a higher level of abstraction, which improves productivity in design exploration and optimization and accelerates verification and validation. For more details about these examples and the actual synthesis process, attend the online webinar (a quick online registration is needed) presented by Stuart Clubb of Calypto. Stuart explained the code in great detail, pointing to specific variables, data, and operations. It’s a must-attend webinar for designers and ESL specialists exploring how to write hardware descriptions for SoCs at the system level.

