1. Objectives

This lab introduces a working environment for the Intel HLS (high-level synthesis) tools. The design under study is a deliberately simple example. The main objectives are to understand the workflow, to learn how to explore the results of the HLS synthesis, and to try some first modifications of existing code.

2. The simple_counter design

Download the tar archive containing the design and extract the files into a working directory. After extraction you should have the following files:

File(s)                                Comment
simple_counter.h, simple_counter.cpp   The source code of a simple counter
main.cpp                               The source code of a testbench of the counter
Makefile                               Automation of the different phases of the synthesis and verification

We will now take the time to examine the contents of each file.

2.1. simple_counter.h

This file contains the definition of the interfaces of a component that should behave as a counter.

  • From the point of view of the C++ language, a component is just a kind of function.

  • All the constructs needed for the HLS compiler are defined by the include file HLS/hls.h

The expected counter should be able to generate a new value at each clock cycle. Classically, in languages such as Verilog or VHDL, such a counter would have:

  • a clock input

  • a data output (the value of the counter)

  • an optional reset input to force the counter to 0.

From the point of view of HLS:

  • the clock is implicit : no need for any declaration

  • the reset action is implicit and defined in the C/C++ variable initialisations: no need for any declaration

  • the only value of interest is the current count, which is the data returned by the component

The component returns an unsigned int, which means that the counter is a 32-bit counter.
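As an illustration, the interface described above could be declared as follows. This is a minimal sketch, not the file from the archive: the include guard name, the empty fallback definition of component (used only when HLS/hls.h is not installed) and the two static_assert checks are assumptions added for the example.

```cpp
// Hypothetical sketch of simple_counter.h (not the file from the archive).
#ifndef SIMPLE_COUNTER_H
#define SIMPLE_COUNTER_H

#include <limits>
#include <type_traits>

#if __has_include("HLS/hls.h")
#include "HLS/hls.h"   // provides the component keyword for the HLS compiler
#else
#define component      // assumption: plain function when the HLS headers are absent
#endif

// No clock and no reset appear in the interface: both are implicit in HLS.
component unsigned int simple_counter();

// The return type fixes the width of the counter output.
static_assert(std::is_same<decltype(simple_counter()), unsigned int>::value,
              "the component returns unsigned int");
static_assert(std::numeric_limits<unsigned int>::digits == 32,
              "unsigned int is expected to be 32 bits wide on this platform");

#endif // SIMPLE_COUNTER_H
```

The second static_assert makes the "32-bit counter" claim explicit: it would fail at compile time on a platform where unsigned int has a different width.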

2.2. simple_counter.cpp

This is the core of the code of the counter. The counter code (seen as a function) will be called several times by the testbench part. Each time the counter should add 1 to the previous value of the counter.

In order to keep the value between each evaluation of the counter component, the current count value has to be stored as a static variable.
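The role of the static variable can be sketched as follows. This is a hedged illustration rather than the code from the archive: the variable name count, the fallback definition of component, and the choice of returning the value before the increment (so that the first call yields 0) are assumptions.

```cpp
// Hypothetical sketch of simple_counter.cpp (not the file from the archive).
#if __has_include("HLS/hls.h")
#include "HLS/hls.h"
#else
#define component   // assumption: plain function when the HLS headers are absent
#endif

component unsigned int simple_counter() {
    // static: initialised once (this plays the role of the reset value),
    // then preserved between successive evaluations of the component.
    static unsigned int count = 0;
    // Assumption: the value before the increment is returned.
    return count++;
}
```

Without the static keyword, count would be re-initialised to 0 at every call and the component would always return the same value.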

2.3. main.cpp

The main.cpp code contains the testbench. The principle of the testbench is to:

  • first, launch SIZE evaluations of the component and store the SIZE results in an array

  • second, read the SIZE results and compare them with the expected results.

During the other labs, most testbenches will follow this kind of construction.

The first phase begins with a loop that calls the function ihc_hls_enqueue. This function takes the following arguments:

  • First argument: a pointer to the location where the return value of the component function will be stored

  • Second argument: a pointer to the component function

  • Other arguments: the original arguments of the function (if any)

As a result, SIZE calls to the simple_counter component are prepared.

Then the call to the ihc_hls_component_run_all function triggers the execution of the SIZE queued calls to the simple_counter component.

The end of the program is ordinary code that compares the results with the expected results.
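The two-phase structure of the testbench can be modelled in plain C++, without the Intel headers. The sketch below is only a model of the pattern: enqueue_call and run_all are hypothetical stand-ins for ihc_hls_enqueue and ihc_hls_component_run_all (they do not reproduce the real signatures or implementation, and only handle components without arguments), the value of SIZE is an assumption, and simple_counter is a stand-in that returns the value before the increment.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

constexpr std::size_t SIZE = 16;  // assumption: the real value is set in main.cpp

// Stand-in component (assumption: returns the value before the increment).
unsigned int simple_counter() {
    static unsigned int count = 0;
    return count++;
}

// Calls recorded by enqueue_call and not yet executed.
std::vector<std::function<void()>> pending_calls;

// Hypothetical stand-in for ihc_hls_enqueue: records the call, does not run it.
template <typename T>
void enqueue_call(T* result, T (*func)()) {
    pending_calls.push_back([result, func] { *result = func(); });
}

// Hypothetical stand-in for ihc_hls_component_run_all: runs the queued calls in order.
void run_all() {
    for (auto& call : pending_calls) {
        call();
    }
    pending_calls.clear();
}

// Phase 1: enqueue SIZE calls and run them. Phase 2: check the stored results.
bool run_testbench() {
    unsigned int results[SIZE] = {};
    for (std::size_t i = 0; i < SIZE; ++i) {
        enqueue_call(&results[i], &simple_counter);
    }
    run_all();
    // Each result must be exactly one more than the previous one.
    for (std::size_t i = 1; i < SIZE; ++i) {
        if (results[i] != results[i - 1] + 1) {
            return false;
        }
    }
    return true;
}
```

A main function would typically print PASSED or FAILED depending on the value returned by run_testbench. Separating the enqueue phase from the execution phase is what allows the simulated component to receive back-to-back requests.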

2.4. Test of the C++ code of the component

During this test we just want to check whether the C++ code of the component generates the right result. The code will be compiled with a standard C++ compiler. This generic test does not need any information about the hardware or frequency target.

For the compilation, run the following command:

make test-gpp

You will see a classical C++ compilation command followed by a message explaining how to run the resulting code.

g++ simple_counter.cpp main.cpp  -std=c++17 -I"/comelec/softs/opt/altera/current/hls/include" -L"/comelec/softs/opt/altera/current/hls/host/linux64/lib" -lhls_emul -o test-gpp

You should have a new executable file in your current directory named test-gpp.

Execute the code using the provided command:

make run-test-gpp

You should receive the message PASSED, meaning that the testbench did not detect any error.

Up to now, we have not started any synthesis task, but this step is necessary: it answers the first question a designer must settle. Does the written C++ code generate the right result? It is equivalent to the front-end RTL design phase in Verilog or VHDL: writing RTL code and debugging it until the simulation is correct.

2.5. HLS synthesis phase

We will now use the same code, with the Intel compiler i++ for HLS synthesis. The command call is embedded in the Makefile. We will launch the synthesis and examine the messages.

Run the following command:

make test-fpga

The first message shows the executed command:

i++ simple_counter.cpp main.cpp  -v   -march=5CSEBA6U23I7 -o test-fpga -ghdl=1

We give the compiler information about the specific FPGA of your DE1 board (-march=…), and we ask the synthesizer to prepare the Modelsim simulator to save the generated waveforms down to the first level of hierarchy (option -ghdl=1).

Even for this very small example, the synthesis phase, followed by the generation of the Modelsim testbench, takes more than 1 minute…

Examine the messages, which explain the different phases of the compilation.

A new directory test-fpga.prj has been created in your working directory. This directory contains 4 subdirectories:

  • The components directory contains the Verilog generated for the simple_counter component.

  • The reports directory contains the synthesis reports (as HTML pages).

  • The verification directory is dedicated to the Modelsim simulation of the component and testbench.

  • The quartus directory contains a Quartus project used to synthesise the generated Verilog code.

2.6. Post synthesis simulation phase

You should have a new executable file in your current directory named test-fpga.

Execute the Verilog simulation using the provided command:

make run-test-fpga

You should receive the message PASSED, meaning that the Verilog testbench did not detect any error in the Verilog component.

It’s time now to examine the obtained results.

2.7. Exploration of the HLS synthesis results: the reports

Launch the following command:

make show-syn-results

It should open a browser window containing the reports.

The first opened tab is a summary of the results.

Select the Throughput Analysis/Loop Analysis menu. Then select system in the Loop List. A small table gives valuable information about the generated code:

  • Block Scheduled II: the initiation interval of the block. It should be 1, which means that the counter should, hopefully, be able to increment its value at every cycle…

  • Block Scheduled fMAX: an estimate of the maximum achievable frequency of the block (remember that we have not yet run any Verilog synthesis).

  • Latency: the output of the counter arrives 2 cycles after the increment command.
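To see how the initiation interval and the latency combine, the following sketch uses the usual idealised pipeline model: the first result appears after latency cycles, and each further result follows ii cycles after the previous one. The function name total_cycles and the model itself are assumptions for illustration, not figures produced by the report.

```cpp
#include <cstdint>

// Idealised pipeline model (an assumption, not a formula taken from the report):
// the first result needs `latency` cycles, each further call adds `ii` cycles.
constexpr std::uint64_t total_cycles(std::uint64_t calls,
                                     std::uint64_t ii,
                                     std::uint64_t latency) {
    return calls == 0 ? 0 : latency + (calls - 1) * ii;
}

// With the values expected in the report (II = 1, latency = 2):
static_assert(total_cycles(1, 1, 2) == 2, "a single call pays the full latency");
static_assert(total_cycles(100, 1, 2) == 101, "100 back-to-back calls");
```

With II = 1 the latency is paid only once, so the cost per call tends towards one cycle as the number of calls grows.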

Select the Area Analysis/Area Analysis menu. A small table gives valuable information about the estimated complexity of the component in terms of:

  • ALUT: the number of FPGA lookup tables needed

  • FF: the number of D flip-flops needed

We will later examine the complexity using another metric: the ALM (Adaptive Logic Module). In a Cyclone V FPGA, an ALM contains 2 ALUTs and 4 DFFs.
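Given the ALM composition stated above, the ALUT and FF estimates can be turned into a rough lower bound on the number of ALMs. The sketch below is only a back-of-the-envelope calculation: the function names are hypothetical, and the real Quartus fitter usually needs more ALMs than this bound because ALUTs and registers cannot always be packed perfectly.

```cpp
#include <cstdint>

// Integer division rounded up.
constexpr std::uint32_t ceil_div(std::uint32_t a, std::uint32_t b) {
    return (a + b - 1) / b;
}

// Lower bound on the ALM count, assuming perfect packing of
// 2 ALUTs and 4 flip-flops per ALM (Cyclone V figures quoted above).
constexpr std::uint32_t min_alms(std::uint32_t aluts, std::uint32_t ffs) {
    return ceil_div(aluts, 2) > ceil_div(ffs, 4) ? ceil_div(aluts, 2)
                                                 : ceil_div(ffs, 4);
}

// Hypothetical figures for a bare 32-bit counter: 32 ALUTs and 32 FFs.
static_assert(min_alms(32, 32) == 16, "limited by the ALUTs");
```

The values reported for the generated component will be larger than those of a bare counter, since the HLS interface logic also consumes ALUTs and registers.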

2.8. Exploration of the HLS synthesis Verilog generated code

Remember that we want to generate a simple counter… Examine the contents of the file components/simple_counter/simple_counter_internal.v. It contains the interface declaration of the generated RTL module.

The list of the I/Os is:

  • clock : we now have an explicit clock input

  • resetn : we now have an explicit reset input

  • start : this input signal is used by the initiator to ask the module to compute a new value

  • busy : the target sets this signal to 1 to block any new request from the initiator.

  • done : the target asserts this signal (for one cycle) when the computation is finished.

  • stall : used to block the computation inside the target

  • return_data : a 32-bit vector (consistent with the unsigned int declaration)

HLS-generated modules use a predefined protocol for the initiator/target interaction, whatever the complexity of the generated module. The following text is an excerpt from the Intel HLS reference documentation:

[Figure: component]

Examine the contents of the components/simple_counter/ip/ directory…

HLS synthesis generates a lot of code. If you want to optimize the generated code, the only solution is to modify the initial C/C++ code or the synthesis pragmas. There is no really simple way to modify the generated Verilog code.

2.9. Examination of the simulation results

In the working directory, execute the following command:

make show-waves

A Modelsim window is opened, showing the testbench and the generated module. Select the signals of the simple_counter_inst module and export them to the waveform window.

  • Zoom to the full window. You will see that the testbench generates a resetn event during a few cycles, then asks the counter to count using start.

  • Examine the busy signal: the counter is never busy, because the initiation interval is only 1 cycle.

  • Examine the done signal: it is an exact copy of the start signal. Once again, this is related to the value of the initiation interval.

  • Examine the returndata signal: it has a one-cycle delay relative to the done signal, and is incremented at each request from the initiator.

  • Examine the stall signal: it is always 0, which means that the simple generated testbench does not try to block the target.

Examine the clock period: It has nothing to do with the maximum frequency. This simulation is a purely functional simulation, not correlated with the final frequency of the real implementation.

2.10. Final Verilog synthesis for the target module

We will now perform the Quartus synthesis, in order to have a better estimation of the area and speed of the design.

In the working directory execute the following command:

make test-fpga-quartus

After synthesis, the reports have been updated. You must reload the reports in your browser and examine the summary:

  • In the Clock Frequency Summary you get the maximum frequency obtained by the Quartus fitter.

  • In the Quartus Estimated Resource Utilization Summary you get the number of ALMs and REGs used.

2.11. Conclusion

The pure Verilog equivalent of this component (written with SystemVerilog syntax) is:

module simple_counter(input logic clk,
    input logic resetn,
    input logic enable,
    output logic [31:0] count) ;

// Asynchronous active-low reset; increment when enable is high
always @(posedge clk or negedge resetn)
if (!resetn)
    count <= '0 ;
else if (enable)
    count <= count + 1;

endmodule
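To relate this RTL description back to C++, the behaviour of the count register can be modelled by a function that computes its next value at each rising clock edge. This is a hedged sketch: the function name next_count is hypothetical, and the asynchronous reset is simplified by evaluating it only at clock edges.

```cpp
#include <cstdint>

// Hypothetical software model of the count register: given the current value
// and the inputs sampled at a rising clock edge, return the next value.
// Simplification: the asynchronous reset is only evaluated at clock edges.
constexpr std::uint32_t next_count(std::uint32_t count, bool resetn, bool enable) {
    return !resetn ? 0u
         : enable  ? static_cast<std::uint32_t>(count + 1u)  // wraps modulo 2^32
                   : count;
}

static_assert(next_count(5, true, true) == 6, "enabled: increment");
static_assert(next_count(5, true, false) == 5, "disabled: hold");
static_assert(next_count(5, false, true) == 0, "reset has priority");
```

Calling next_count repeatedly with resetn and enable both high yields 1, 2, 3, … from an initial value of 0, which is the sequence the generated component is expected to produce in far more lines of Verilog.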

As a first conclusion HLS synthesis should not be used for trivial cases…