1. Objectives
We are going to discover a working environment for the HLS (high-level synthesis) tools from Intel. The design studied is a very simple example. The main objectives of this lab are to understand the workflow, to learn how to explore the results of the HLS synthesis, and to try some first modifications of existing code.
2. The simple_counter design
Download the tar archive containing the design and extract it into a working directory. After extraction you should have the following files:
File(s) | Comment |
---|---|
simple_counter.h, simple_counter.cpp | The source code of a simple counter |
main.cpp | The source code of a testbench of the counter |
Makefile | Automation of the different phases of the synthesis and verification |
We will now take the time to examine the contents of each file.
2.1. simple_counter.h
This file contains the definition of the interfaces of a component that should behave as a counter.
- From the point of view of the C++ language, a component is just a kind of function.
- All the constructs needed by the HLS compiler are defined by the include file HLS/hls.h.
The expected counter should be able to generate a new value at each clock cycle. Classically, in languages such as Verilog or VHDL, such a counter would have:
- a clock input
- a data output (the value of the counter)
- an optional reset input to force the counter to 0.
From the point of view of HLS:
- the clock is implicit: no declaration is needed
- the reset action is implicit and defined by the C/C++ variable initialisations: no declaration is needed
- the only value of interest is the current value of the counter, which is the data returned by the component
The data returned by the component is an unsigned int. This means that the counter will be a 32-bit counter.
2.2. simple_counter.cpp
This is the core of the code of the counter. The counter code (seen as a function) will be called several times by the testbench part. Each time the counter should add 1 to the previous value of the counter.
To keep the value between successive evaluations of the counter component, the current count value has to be stored in a static variable.
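As an illustration, here is a minimal sketch of what such a counter could look like. It is not the source provided in the archive: the fallback definition of `component` only exists so that the sketch builds with a plain C++ compiler when the Intel header is not available, and starting at 0 and returning the value before the increment are assumptions. The helper `check_counter` is a hypothetical name used for the usage example.

```cpp
#include <cassert>

// With the Intel HLS tools installed, HLS/hls.h provides the `component`
// keyword. The fallback below is an assumption made so that this sketch
// also builds with a plain C++ compiler.
#if defined(__has_include)
#if __has_include("HLS/hls.h")
#include "HLS/hls.h"
#define SKETCH_HAS_INTEL_HLS 1
#endif
#endif
#ifndef SKETCH_HAS_INTEL_HLS
#define component
#endif

// No clock or reset argument: both are implicit for the HLS compiler.
// The unsigned int return type makes this a 32-bit counter.
component unsigned int simple_counter() {
    // The static variable keeps its value between calls; its initialiser
    // plays the role of the reset value.
    static unsigned int count = 0;
    return count++;  // assumption: return the current value, then add 1
}

// Usage example (hypothetical helper): successive calls return
// consecutive values.
void check_counter() {
    unsigned int first = simple_counter();
    assert(simple_counter() == first + 1);
    assert(simple_counter() == first + 2);
}
```

Because the state lives in a static variable, every call site shares the same counter, which matches the idea of a single hardware register.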
2.3. main.cpp
The main.cpp code contains the testbench. The principle of the testbench is to:
- first, launch SIZE evaluations of the component and store the SIZE results in an array
- second, read the SIZE results and compare them with the expected results.
During the other labs, most testbenches will follow this kind of construction.
The first phase begins with a loop using the function ihc_hls_enqueue. This function has the following arguments:
- First argument: a pointer used to store the return value of the component function
- Second argument: a pointer to the component function
- Other arguments: the original arguments of the function (if any)
As a result, SIZE calls to the simple_counter component are prepared.
Then the call to the ihc_hls_component_run_all function triggers the execution of the SIZE queued calls to the simple_counter component.
The end of the program is ordinary code that compares the results with the expected results.
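The two phases described above can be sketched as follows. This is not the main.cpp of the archive: the value of SIZE, the expected values (0, 1, 2, ...) and the function name `run_testbench` are assumptions. The stand-in versions of `ihc_hls_enqueue` and `ihc_hls_component_run_all` are also assumptions; they are only compiled when HLS/hls.h is not found, and they merely queue the calls in software so that the sketch can be tried with a plain C++ compiler.

```cpp
#include <cassert>
#include <cstdio>
#include <functional>
#include <vector>

#if defined(__has_include)
#if __has_include("HLS/hls.h")
#include "HLS/hls.h"
#define SKETCH_HAS_INTEL_HLS 1
#endif
#endif

#ifndef SKETCH_HAS_INTEL_HLS
// Stand-ins (assumptions, NOT the Intel implementation): they only mimic
// the "queue now, run later" behaviour in software.
#define component
static std::vector<std::function<void()>> pending_calls;

template <typename R, typename... Args>
void ihc_hls_enqueue(R *ret, R (*func)(Args...), Args... args) {
    assert(ret != nullptr && func != nullptr);
    pending_calls.push_back([=]() { *ret = func(args...); });
}

template <typename F>
void ihc_hls_component_run_all(F /*component_function*/) {
    for (auto &call : pending_calls) call();
    pending_calls.clear();
}
#endif

// Counter under test (assumed behaviour: returns 0, 1, 2, ...).
component unsigned int simple_counter() {
    static unsigned int count = 0;
    return count++;
}

constexpr int SIZE = 8;  // hypothetical value

bool run_testbench() {
    unsigned int result[SIZE];

    // Phase 1: prepare SIZE calls, then ask for their execution.
    for (int i = 0; i < SIZE; ++i) {
        ihc_hls_enqueue(&result[i], &simple_counter);
    }
    ihc_hls_component_run_all(simple_counter);

    // Phase 2: compare the stored results with the expected values.
    bool pass = true;
    for (int i = 0; i < SIZE; ++i) {
        if (result[i] != static_cast<unsigned int>(i)) {
            std::printf("ERROR: result[%d] = %u, expected %d\n", i, result[i], i);
            pass = false;
        }
    }
    std::printf("%s\n", pass ? "PASSED" : "FAILED");
    return pass;
}
```

Separating the enqueue phase from the run phase lets the simulated component receive back-to-back requests, which is what makes the initiation interval observable later in the waveforms.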
2.4. Test of the C++ code of the component
In this test we only want to check that the C/C++ code of the component generates the right result. The code is compiled with a standard C++ compiler. This generic test does not need any information about the hardware or the target frequency.
For the compilation, run the following command:
make test-gpp
You will see a standard C++ compilation command followed by a message explaining how to run the resulting code.
g++ simple_counter.cpp main.cpp -std=c++17 -I"/comelec/softs/opt/altera/current/hls/include" -L"/comelec/softs/opt/altera/current/hls/host/linux64/lib" -lhls_emul -o test-gpp
You should have a new executable file named test-gpp in your current directory.
Execute the code using the provided command:
make run-test-gpp
You should receive the message PASSED, meaning that the testbench did not detect any error.
So far we have not started any synthesis task, but this step is needed because it is the first one to complete from a designer's point of view: does the written C++ code generate the right result? It is equivalent to the front-end RTL design phase in Verilog and VHDL: writing RTL code and debugging it until the simulation is correct.
2.5. HLS synthesis phase
We will now use the same code with the Intel compiler i++ for HLS synthesis. The command call is embedded in the Makefile. We will launch the synthesis and examine the messages.
Run the following command:
make test-fpga
The first message shows the executed command:
i++ simple_counter.cpp main.cpp -v -march=5CSEBA6U23I7 -o test-fpga -ghdl=1
We give the compiler information about the specific FPGA of your DE1 board (-march=…), and we ask the synthesizer to prepare the Modelsim simulator to save the generated waveforms down to the first level of hierarchy (option -ghdl=1).
Even for this very small example, the synthesis phase, followed by the generation of the Modelsim testbench, takes more than one minute…
Examine the messages, which explain the different phases of the compilation.
A new directory test-fpga.prj has been created in your working directory. This directory contains 4 subdirectories:
- The components directory contains the Verilog generated for the simple_counter component.
- The reports directory contains the synthesis reports (as HTML pages).
- The verification directory is dedicated to the Modelsim simulation of the component and testbench.
- The quartus directory contains a Quartus project used to synthesise the generated Verilog code.
2.6. Post synthesis simulation phase
You should have a new executable file in your current directory named test-fpga.
Execute the Verilog simulation using the provided command:
make run-test-fpga
You should receive the message PASSED, meaning that the Verilog testbench did not detect any error in the Verilog component.
It is now time to examine the results obtained.
2.7. Exploration of the HLS synthesis results: the reports
Launch the following command:
make show-syn-results
It should open a browser window containing the reports.
The first opened tab is a summary of the results.
Select the Throughput Analysis/Loop Analysis menu. Then select system in the Loop List. A small table gives valuable information about the generated code:
- Block Scheduled II: the initiation interval of the block. It should be 1, which means that the counter should be able to increment its value at each cycle…
- Block Scheduled fMAX: an estimation of the maximum achievable frequency of the block (remember that we have not yet done any Verilog synthesis)
- Latency: the output of the counter arrives 2 cycles after the increment command.
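To see how the initiation interval and the latency combine, here is a back-of-the-envelope model. It assumes an ideal pipeline that never stalls: the first result appears after `latency` cycles and each further invocation adds `ii` cycles. The function names and the figure of 8 invocations are illustrative and do not come from the tool reports.

```cpp
#include <cassert>

// Estimated number of clock cycles needed to obtain `n` results from a
// pipelined component, assuming it never stalls.
constexpr unsigned long total_cycles(unsigned long n, unsigned long ii,
                                     unsigned long latency) {
    return n == 0 ? 0 : latency + (n - 1) * ii;
}

// Usage example (hypothetical helper).
void check_pipeline_model() {
    // II = 1 and latency = 2, with 8 invocations: 2 + 7 * 1 = 9 cycles.
    assert(total_cycles(8, 1, 2) == 9);
    // With II = 2 the same work would need 2 + 7 * 2 = 16 cycles.
    assert(total_cycles(8, 2, 2) == 16);
}
```

For long runs the latency term becomes negligible and the throughput is set by the initiation interval alone.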
Select the Area Analysis/Area Analysis menu. A small table gives valuable information about the estimated complexity of the component in terms of:
- ALUT: number of FPGA lookup tables needed
- FF: number of D flip-flops needed
Later we will examine complexity using another metric: the ALM (Adaptive Logic Module). In a Cyclone V FPGA, an ALM contains 2 ALUTs and 4 DFFs.
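The relation between the two metrics can be illustrated by a rough lower bound, sketched below. It only uses the 2 ALUTs and 4 DFFs per ALM stated above; the function names and the input figures are hypothetical, and the number reported by the Quartus fitter may well be higher because ALMs are rarely packed perfectly.

```cpp
#include <algorithm>
#include <cassert>

constexpr unsigned ALUTS_PER_ALM = 2;  // Cyclone V, as stated in the text
constexpr unsigned FFS_PER_ALM = 4;    // Cyclone V, as stated in the text

// Integer division rounded up.
constexpr unsigned ceil_div(unsigned a, unsigned b) { return (a + b - 1) / b; }

// Lower bound on the number of ALMs needed to hold the given numbers of
// ALUTs and flip-flops (assumes perfect packing, which is optimistic).
unsigned min_alms(unsigned aluts, unsigned ffs) {
    return std::max(ceil_div(aluts, ALUTS_PER_ALM), ceil_div(ffs, FFS_PER_ALM));
}

// Usage example with hypothetical figures (not taken from the reports).
void check_alm_bound() {
    assert(min_alms(33, 40) == 17);  // limited by the ALUTs: ceil(33 / 2)
    assert(min_alms(8, 64) == 16);   // limited by the flip-flops: 64 / 4
}
```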
2.8. Exploration of the HLS synthesis Verilog generated code
Remember that we want to generate a simple counter… Examine the contents of the file components/simple_counter/simple_counter_internal.v. It contains the interface declaration of the generated RTL module.
The list of the I/Os is:
- clock: we now have an explicit clock input
- resetn: we now have an explicit reset input
- start: this input signal is used by the initiator to ask the module to compute a new value
- busy: the target sets this signal to one to block any new request from the initiator
- done: the target sets this signal for one cycle when the computation is finished
- stall: used to block computation inside the target
- return_data: a 32-bit vector (consistent with the unsigned int declaration)
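One way to read these handshake signals is sketched below as a small software model. It assumes the usual ready/valid interpretation: a request is accepted on a cycle where start is high and busy is low, and a result is taken on a cycle where done is high and stall is low. The struct, the function names and the trace are hypothetical and are not derived from the generated Verilog.

```cpp
#include <cassert>
#include <vector>

// One clock cycle of the invocation interface, as seen on the module pins.
struct Cycle {
    bool start;
    bool busy;
    bool done;
    bool stall;
};

// Assumption: a request is accepted when start is high while busy is low.
int count_accepted(const std::vector<Cycle> &trace) {
    int n = 0;
    for (const Cycle &c : trace) {
        if (c.start && !c.busy) ++n;
    }
    return n;
}

// Assumption: a result is taken when done is high while stall is low.
int count_results(const std::vector<Cycle> &trace) {
    int n = 0;
    for (const Cycle &c : trace) {
        if (c.done && !c.stall) ++n;
    }
    return n;
}

// Usage example on a hypothetical five-cycle trace.
void check_handshake_model() {
    std::vector<Cycle> trace = {
        {false, false, false, false},  // idle
        {true, false, false, false},   // request 1 accepted
        {true, false, true, false},    // request 2 accepted, result 1 taken
        {true, true, true, true},      // request refused (busy), result held (stall)
        {false, false, true, false},   // result 2 taken
    };
    assert(count_accepted(trace) == 2);
    assert(count_results(trace) == 2);
}
```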
HLS generated modules use a predefined protocol for the initiator/target interaction whatever the complexity of the generated module. The following text is an excerpt from Intel HLS reference documentation

Examine the contents of the components/simple_counter/ip/ directory…
HLS synthesis generates a lot of code. If you want to optimize the generated code, the only solution is to modify the initial C/C++ code or the synthesis pragmas. There is no really simple way to modify the generated Verilog code.
2.9. Examination of the simulation results
In the working directory, execute the following command:
make show-waves
A Modelsim window opens, showing the testbench and the generated module. Select the signals of the simple_counter_inst module and add them to the waveform window.
- Zoom to the full window. You will see that the testbench asserts resetn for a few cycles, then asks the counter to count using start.
- Examine the busy signal: the counter is never busy. This is due to the initiation interval, which is only 1 cycle.
- Examine the done signal: it is an exact copy of the start signal. Once again this is correlated with the value of the initiation interval.
- Examine the returndata signal: it has a one-cycle delay with respect to the done signal, and is incremented at each request from the initiator.
- Examine the stall signal: it is always 0. This means that the simple generated testbench does not try to block the target.
Examine the clock period: It has nothing to do with the maximum frequency. This simulation is a purely functional simulation, not correlated with the final frequency of the real implementation.
2.10. Final Verilog synthesis for the target module
We will now perform the Quartus synthesis, in order to have a better estimation of the area and speed of the design.
In the working directory execute the following command:
make test-fpga-quartus
After synthesis, the reports have been updated. You must reload the reports in your browser and examine the summary:
- In the Clock Frequency Summary you get the maximum frequency obtained by the Quartus fitter.
- In the Quartus Estimated Resource Utilization Summary you get the number of ALMs and REGs used.
2.11. Conclusion
The pure RTL (SystemVerilog) equivalent of this component is:
module simple_counter(input logic clk,
                      input logic resetn,
                      input logic enable,
                      output logic [31:0] count);
  // Asynchronous active-low reset; the counter increments only when enabled
  always_ff @(posedge clk or negedge resetn)
    if (!resetn)
      count <= '0;
    else if (enable)
      count <= count + 1;
endmodule
As a first conclusion, HLS synthesis should not be used for trivial cases…