Part 1: Using HLS for Image Processing (Introduction)

Design space exploration involves trying different combinations of design options to find the best trade-off between speed and resources. Within HLS, software profiling tools can help to identify processing bottlenecks, so that effort can be concentrated where it is likely to achieve the greatest gains. The HLS tools can provide a reasonably accurate estimate of resources without having to synthesise the resulting RTL. In contrast, manual RTL coding will generally require considerable recoding to change both the data and control paths, making design space exploration at the RTL level both time-consuming and error-prone.

For each of the designs generated, simulation or verification is significantly faster within the high-level tools. This is because the verification takes place at a higher level. However, it is still necessary to validate the final design at the RTL level to ensure that the algorithm transformations are correct.

For appropriately structured code, modern HLS tools can generate designs that are as efficient as hand-coded RTL in terms of both resources and processing speed.

Algorithms for Hardware

The key point to remember is that FPGA-based design is hardware design, not software design. It is not as simple as just compiling code for hardware. With most tools, the algorithm must be written in a particular style to enable the synthesis tools to identify and exploit parallelism. This requires restructuring the code. Without such restructuring, the HLS tools can still derive a hardware realisation, but the resulting hardware can be bloated and suffer from poor performance.
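As an illustration of the kind of restructuring meant here, the sketch below shows the same pixel sum written twice in plain C++: once in a typical software style, and once split into independent partial sums that a tool could compute side by side. It is a minimal sketch under stated assumptions, not the method of any particular tool: the names `sum_serial`, `sum_lanes` and `kLanes` are hypothetical, and the actual directives for unrolling or pipelining (mentioned only in comments) differ between HLS tools.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical parallelism factor: how many pixels are handled per iteration.
constexpr std::size_t kLanes = 4;

// Software style: a single accumulator, so every addition depends on the
// result of the previous one and pixels are consumed one at a time.
std::uint32_t sum_serial(const std::uint8_t* px, std::size_t count) {
    std::uint32_t acc = 0;
    for (std::size_t i = 0; i < count; ++i) {
        acc += px[i];
    }
    return acc;
}

// Restructured: kLanes independent accumulators. The inner loop has a fixed
// trip count and no dependency between lanes, so an HLS tool could unroll it
// into kLanes adders working in parallel (via a tool-specific directive).
std::uint32_t sum_lanes(const std::uint8_t* px, std::size_t count) {
    assert(count % kLanes == 0);  // assumption: count is a multiple of kLanes
    std::uint32_t partial[kLanes] = {};
    for (std::size_t i = 0; i < count; i += kLanes) {
        for (std::size_t lane = 0; lane < kLanes; ++lane) {
            partial[lane] += px[i + lane];
        }
    }
    // Combine the partial sums once, after all pixels have been consumed.
    std::uint32_t acc = 0;
    for (std::size_t lane = 0; lane < kLanes; ++lane) {
        acc += partial[lane];
    }
    return acc;
}
```

Both functions return the same result for any input whose length is a multiple of `kLanes`; the difference is only in how much parallelism the structure exposes. Changing `kLanes` is also a simple example of a design space exploration knob: more lanes use more adders but consume more pixels per iteration.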

The best algorithms for hardware realisation are different from those used in software. Generally, software is memory-based, with all data structures stored in a single monolithic memory. Many software algorithms, particularly for image processing, are therefore memory-bound, with their execution speed limited by memory bandwidth. On FPGAs, stream-based processing is efficient for image processing, especially for operations close to the camera (pre-processing) or the display, where the input and output data are naturally streamed. In some cases, algorithms that rely on random data access have to be restructured to enable stream processing.
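The difference between memory-based and stream-based processing can be sketched with a 3x3 window sum. The first function below reads from a full frame buffer with random access, as software typically would. The second receives pixels one at a time in raster order and keeps only two line buffers and a 3x3 window, which is a common way to restructure window filters for streaming. This is an illustrative sketch in plain C++, not code from the article: `kWidth`, `box3x3_frame`, `StreamBox3x3` and `box3x3_stream` are hypothetical names, the image width is assumed fixed at compile time, and border pixels are simply not output.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kWidth = 8;  // hypothetical image width, fixed at compile time

// Memory-based version: random access into a frame buffer; each pixel is
// read up to nine times.
std::vector<std::uint16_t> box3x3_frame(const std::vector<std::uint8_t>& img,
                                        std::size_t height) {
    assert(img.size() == kWidth * height);
    std::vector<std::uint16_t> out;
    for (std::size_t y = 1; y + 1 < height; ++y) {
        for (std::size_t x = 1; x + 1 < kWidth; ++x) {
            std::uint16_t sum = 0;
            for (std::size_t dy = 0; dy < 3; ++dy)
                for (std::size_t dx = 0; dx < 3; ++dx)
                    sum += img[(y - 1 + dy) * kWidth + (x - 1 + dx)];
            out.push_back(sum);
        }
    }
    return out;
}

// Stream-based version: each pixel is seen exactly once, in raster order.
class StreamBox3x3 {
public:
    // Returns true when `out` holds the sum of a complete 3x3 window.
    bool push(std::uint8_t pixel, std::uint16_t& out) {
        const std::uint8_t top = line1_[x_];  // pixel from row y-2, same column
        const std::uint8_t mid = line0_[x_];  // pixel from row y-1, same column
        line1_[x_] = mid;                     // age the line buffers by one row
        line0_[x_] = pixel;
        for (std::size_t r = 0; r < 3; ++r) { // shift the window one column left
            win_[r][0] = win_[r][1];
            win_[r][1] = win_[r][2];
        }
        win_[0][2] = top;                     // insert the new right-hand column
        win_[1][2] = mid;
        win_[2][2] = pixel;
        const bool valid = (y_ >= 2) && (x_ >= 2);  // window lies fully inside the image
        if (valid) {
            std::uint16_t sum = 0;
            for (std::size_t r = 0; r < 3; ++r)
                for (std::size_t c = 0; c < 3; ++c)
                    sum += win_[r][c];
            out = sum;
        }
        if (++x_ == kWidth) {                 // advance the raster position
            x_ = 0;
            ++y_;
        }
        return valid;
    }

private:
    std::array<std::uint8_t, kWidth> line0_{};  // previous row
    std::array<std::uint8_t, kWidth> line1_{};  // row before the previous row
    std::uint8_t win_[3][3] = {};
    std::size_t x_ = 0;
    std::size_t y_ = 0;
};

// Feeds a whole image through the streaming filter, pixel by pixel.
std::vector<std::uint16_t> box3x3_stream(const std::vector<std::uint8_t>& img) {
    StreamBox3x3 filter;
    std::vector<std::uint16_t> out;
    for (std::uint8_t pixel : img) {
        std::uint16_t sum = 0;
        if (filter.push(pixel, sum)) out.push_back(sum);
    }
    return out;
}
```

For the same image, both versions produce the same sequence of window sums. The streaming version needs storage for only two image rows rather than a whole frame, and touches each input pixel once, which is what makes it a natural fit for data arriving directly from a camera.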

Conclusion

While high-level synthesis can provide significant advantages for rapid development and design space exploration, it is no substitute for careful design. It is still essential to consider the hardware being built, and to treat the HLS source code as a hardware description rather than as software.

In conclusion, HLS offers many benefits over conventional RTL implementation for FPGA-based design, but the algorithm must still be restructured so that the HLS tool can identify and exploit the parallelism. Simply using a higher-level language does not remove the need for appropriate hardware design.

Case Study: Our Experience Using HLS for an Image Processing Application

One of our clients in the high-speed sorting OEM industry wanted to enhance its image processing algorithms to meet the needs of the market. The original design used microcontrollers that were only capable of simple thresholding techniques, which restricted the quality of inspection. We implemented an image processing algorithm on an FPGA, exploiting parallelism to achieve more complex image processing within the same cycle time. The algorithms were developed and tested in C++, then converted to HDL using HLS in the Vivado toolchain for porting to the FPGA.

Reference: Article titled ‘The advantages and limitations of high-level synthesis for FPGA based image processing’ written by Donald G. Bailey, School of Engineering and Advanced Technology, Massey University, Palmerston North, New Zealand

Case Study: Embedded Vision Implementation Using FPGA Processor