July 27, 2015 by Daniel Payne

Designing an IDCT for H.265 using High Level Synthesis

by Daniel Payne on 07-27-2015 at 8:00 pm
Categories: EDA

Math geeks know all about the Inverse Discrete Cosine Transform (IDCT). A popular use is in the hardware architecture of High Efficiency Video Coding (HEVC), also known as H.265, the new video compression standard that is widely used in consumer and industrial video devices. You could hand-code RTL to create an IDCT function, but that takes more lines of code and more precious engineering time than using a higher-level language like C++ or SystemC. The promise of High Level Synthesis (HLS) is that you can code your video algorithms in less time and with less code than RTL, thus getting to market quicker with less engineering effort.

Uday Das from Calypto presented a tutorial at the #52DAC event last month in San Francisco titled “Building an IDCT for H.265 Using Catapult”, so I reviewed the 46 slides and share my impressions in this brief blog. The HEVC specification calls for transform units of four sizes (4×4, 8×8, 16×16 and 32×32) to code the prediction residual. The hardware architecture here uses a row-column decomposition approach that performs a 1-D operation on each row, followed by another 1-D operation on each column.
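To make the row-column idea concrete, here is a minimal floating-point sketch. It is not taken from the tutorial slides: the names `idct_1d` and `idct_2d` are hypothetical, and it uses the textbook orthonormal IDCT rather than the fixed-point integer transform that HEVC specifies. It only shows that a 2-D transform can be built from a 1-D transform applied to every row and then to every column.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Block = std::vector<Vec>;

// Textbook orthonormal 1-D inverse DCT (assumption: floating point,
// not the integer HEVC core transform).
Vec idct_1d(const Vec& coeff) {
    const std::size_t size = coeff.size();
    assert(size > 0);
    const double pi = std::acos(-1.0);
    Vec out(size, 0.0);
    for (std::size_t n = 0; n < size; ++n) {
        for (std::size_t k = 0; k < size; ++k) {
            const double scale = std::sqrt((k == 0 ? 1.0 : 2.0) / size);
            out[n] += scale * coeff[k] *
                      std::cos(pi * (2.0 * n + 1.0) * k / (2.0 * size));
        }
    }
    return out;
}

// Row-column decomposition: 1-D IDCT on each row, then on each column.
Block idct_2d(const Block& in) {
    const std::size_t size = in.size();
    for (const Vec& row : in) {
        assert(row.size() == size);  // square blocks only (4x4, 8x8, ...)
    }

    // Pass 1: transform every row.
    Block tmp(size);
    for (std::size_t r = 0; r < size; ++r) {
        tmp[r] = idct_1d(in[r]);
    }

    // Pass 2: transform every column of the intermediate result.
    Block out(size, Vec(size, 0.0));
    for (std::size_t c = 0; c < size; ++c) {
        Vec column(size);
        for (std::size_t r = 0; r < size; ++r) {
            column[r] = tmp[r][c];
        }
        const Vec result = idct_1d(column);
        for (std::size_t r = 0; r < size; ++r) {
            out[r][c] = result[r];
        }
    }
    return out;
}
```

With this scaling, a 4×4 block whose only non-zero coefficient is a DC value of 4.0 comes back as a flat block of 1.0, which is a handy sanity check for the two passes.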

Algorithm
The IDCT algorithm can be described as a lower-order matrix embedded in a higher-order matrix. In the signal flow graph, the 8-point IDCT A8 is made up of a 4-point 1-D IDCT A4 and an odd matrix M4.
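The embedding can be checked numerically. The sketch below is my own illustration, not code from the slides: it uses the floating-point orthonormal IDCT instead of the HEVC integer coefficients, and the function names (`basis`, `idct_direct`, `mult_odd`, `butterfly`, `idct_even_odd`) are hypothetical, although `mult_odd` and `butterfly` echo the function names used in the tutorial. The even-indexed coefficients go through a half-size IDCT, the odd-indexed ones through an odd matrix, and a butterfly combines the two halves.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Orthonormal IDCT basis value for coefficient k and output sample n.
double basis(std::size_t k, std::size_t n, std::size_t size) {
    const double pi = std::acos(-1.0);
    const double scale = std::sqrt((k == 0 ? 1.0 : 2.0) / size);
    return scale * std::cos(pi * (2.0 * n + 1.0) * k / (2.0 * size));
}

// Reference: full matrix-vector product (size * size multiplies).
Vec idct_direct(const Vec& c) {
    const std::size_t size = c.size();
    Vec x(size, 0.0);
    for (std::size_t n = 0; n < size; ++n) {
        for (std::size_t k = 0; k < size; ++k) {
            x[n] += c[k] * basis(k, n, size);
        }
    }
    return x;
}

// Odd matrix: odd-indexed coefficients times the first half of the
// odd basis rows (the role M4 plays for the 8-point case).
Vec mult_odd(const Vec& odd) {
    const std::size_t half = odd.size();
    Vec o(half, 0.0);
    for (std::size_t n = 0; n < half; ++n) {
        for (std::size_t j = 0; j < half; ++j) {
            o[n] += odd[j] * basis(2 * j + 1, n, 2 * half);
        }
    }
    return o;
}

// Butterfly: even rows are symmetric and odd rows antisymmetric, so the
// second half of the output is the mirrored difference.
Vec butterfly(const Vec& e, const Vec& o) {
    assert(e.size() == o.size());
    const std::size_t half = e.size();
    Vec x(2 * half, 0.0);
    for (std::size_t n = 0; n < half; ++n) {
        x[n] = e[n] + o[n];
        x[2 * half - 1 - n] = e[n] - o[n];
    }
    return x;
}

Vec idct_even_odd(const Vec& c) {
    assert(c.size() >= 2 && c.size() % 2 == 0);
    const std::size_t half = c.size() / 2;
    Vec even(half), odd(half);
    for (std::size_t j = 0; j < half; ++j) {
        even[j] = c[2 * j];
        odd[j] = c[2 * j + 1];
    }
    // Lower-order IDCT on the even coefficients; the 1/sqrt(2) factor
    // converts the half-size orthonormal scaling to the full-size one.
    Vec e = idct_direct(even);
    for (double& v : e) {
        v /= std::sqrt(2.0);
    }
    return butterfly(e, mult_odd(odd));
}
```

For an 8-point input, `idct_even_odd` agrees with `idct_direct` to rounding error while using two 4×4 products plus a butterfly instead of one 8×8 product. That saving is why the decomposition is attractive in hardware.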

Data flow for this algorithm can be designed using two major functions: Butterfly and Mult_odd.

An interface description can then be written in either C or SystemC; the C version is more compact.

A core class can be written once and then re-used for the 4-, 8-, 16- and 32-point versions of the Mult_odd and Butterfly member functions.

The Butterfly function is common for all sizes, and notice that there is no timing information at this level. The HLS tool Catapult will unroll the loop to create hardware for parallel execution.
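As a rough illustration of such a re-usable core class, here is a sketch under my own assumptions rather than the code from the slides. The class name `IdctCore`, the plain `int` data type, and the member name `butterfly` are hypothetical. The point is that the loop has a trip count fixed at compile time and carries no clock or timing information, which is what lets an HLS tool unroll it into parallel adders and subtractors.

```cpp
#include <array>
#include <cstddef>

// Hypothetical core class, parameterized on the transform size.
template <std::size_t N>
class IdctCore {
    static_assert(N >= 2 && N % 2 == 0, "transform size must be even");

public:
    using HalfVec = std::array<int, N / 2>;
    using FullVec = std::array<int, N>;

    // Combine the even and odd partial results into N outputs.
    // Purely untimed C++: each iteration is independent of the others,
    // so the loop is a natural candidate for full unrolling.
    static FullVec butterfly(const HalfVec& even, const HalfVec& odd) {
        FullVec out{};
        for (std::size_t i = 0; i < N / 2; ++i) {
            out[i] = even[i] + odd[i];
            out[N - 1 - i] = even[i] - odd[i];
        }
        return out;
    }
};
```

The same source serves every size: `IdctCore<4>::butterfly`, `IdctCore<8>::butterfly`, and so on instantiate the identical loop body with a different compile-time bound.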

Our functional model of the 1-D IDCT has instances of function calls and some muxes.

To meet the H.265 specification we have to make a parallel implementation and create a 2-D IDCT using some hierarchy.

Using HLS
Designers use the HLS tool Catapult by adding design files, clicking on the hierarchy tab to select the top-level blocks, then clicking on libraries to select a specific technology and RAM models. Next you click on mapping and choose a target clock frequency, then map your data_in and data_out as RAM.

You next select your main loop and see which resources are being used in the design.

To schedule when operations are to occur you click on the schedule tab and work with a Gantt chart. Finally, you are ready to generate RTL code.

Verification
To double-check that the generated RTL code actually does what we had in mind with our algorithm, we need to create a testbench and verification flow. Most of this process is now push-button automated for us.

The transactors convert function calls into pin-level signal activity.

Summary
The tutorial from DAC showed me that C++ and SystemC describe my video hardware more compactly than RTL code does. The Catapult HLS tool is used to control micro-architectural decisions so that I can trade off power, performance and area.

Companies like Google have found that using HLS on their VP9 video compression design was 2X faster than the previous approach of hand-coded RTL, while dramatically reducing the number of lines written. Give the folks at Calypto a call to start discussing how appropriate HLS is for your hardware architecture. You may just find that you can get your next IP or SoC to market in less time with fewer engineers, a nice benefit.
