HLS Lab 2: An audio filter on the DE1-SoC board.

1. Objectives

We will use the audio capabilities of the DE1-SoC board to implement an audio filter. The DE1-SoC contains a Wolfson WM8731 audio codec connected to the Cyclone V FPGA.

The FPGA will receive the stream of audio samples, execute a low pass filter, and stream out the filtered samples. Using headphones connected to the line out connector of the board, students will be able to verify the behavior of the board.

The objective of the lab is to learn how to use the Avalon Streaming protocol with HLS synthesis for audio data streaming. Furthermore, the NiosII processor included in the design should be able to start and stop the filter by accessing it through the Avalon Memory-Mapped protocol.

The HLS code and the whole project will be built almost from scratch, in six steps…

2. STEP0: The initial archive

Download the tar archive containing some initial files and directories. Extract the files in a work directory. After extraction you should have several directories related to each step of the lab.

3. STEP1: A pure software project

Open a terminal and change to the directory named software_demo. We already prepared some code for you.

If you execute make you get the list of the predefined targets of the Makefile.

Do not compile anything yet!
Connect your headphones to the headphones output of your workstation.

3.1. Playing an original audio sequence

Execute the following command:

make play_original_wave

You should hear a short piano sequence. If not, ask your professor to help you configure your session.

CHECK AND REMEMBER the frequency parameter of the audio sequence, you will use it for other steps.

3.2. Playing a modified audio sequence

Execute now the following command:

make play_noisy_wave

You should hear the same audio sequence with an added 8 kHz continuous tone.

3.3. Creating a FIR low-pass filter

We now want to create a low-pass filter in order to remove the 8 kHz continuous tone.

Open a web page on the following website. This is an online FIR filter generator. In the lower-left part of the screen, choose the necessary parameters for our filter:

Choose a sampling frequency (sampling freq.) EQUAL to the frequency parameter of the reference audio sequence.
Select 256 for the number of taps of the filter (desired #taps)
Select 7000 Hz as the maximum (to) frequency of the passband.
Select 7750 Hz as the minimum (from) frequency of the stopband.
Select the maximum frequency of the stopband to be equal to half the sampling frequency.
When finished hit the DESIGN FILTER button to generate the filter

In the top-left of your web page, select the Impulse Response page to see the 256 taps of the filter.

We will now get the generated source code of the filter

Open the skeletons of files SampleFilter.cpp and SampleFilter.h with your favorite editor.
Select the source code page (in the top-left corner of the current page)
In the left pane select Number format as integer
Choose a Fixed Point precision of taps as 32
Do not click on download source
Copy and paste the code from the main window into the respective files, then save the files.

Keep the web page open, in order to be able to recreate the code in case of errors or for further steps.

3.4. Testing the generated filter

We have prepared main.cc, a test program for the filter. This test program is pure software and has nothing to do with HLS. As you will have to modify this program for HLS, we will take time to examine its contents:

Inclusion of the definitions for a wave library wavelib used to read or write .wav files
Inclusion of the definitions for the generated filter (SampleFilter.h)
Code of the audio filter audio_filter using routines defined in the generated source files.
Code of the main file, with the following steps
- The modified audio sequence is read and stored in a C++ vector named content.
- The content array is divided into 2 arrays used to store the left and right channel of the sequence.
- The audio_filter is called for each sample of the sequence to modify the content of the left and right channels.
- The left and right output channels are merged.
- The end of the code is dedicated to the creation of the output file.
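
The split and merge steps listed above can be sketched as follows. This is a minimal illustration, not the code of the provided test program: the helper names split_channels and merge_channels are hypothetical, and we assume that samples are interleaved (left, right, left, right, ...) in a std::vector<float>. The element type actually used by wavelib may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: split an interleaved stereo buffer (L, R, L, R, ...)
// into separate left and right channel buffers.
std::pair<std::vector<float>, std::vector<float>>
split_channels(const std::vector<float>& content) {
    assert(content.size() % 2 == 0);  // assumption: complete stereo frames only
    std::vector<float> left, right;
    left.reserve(content.size() / 2);
    right.reserve(content.size() / 2);
    for (std::size_t i = 0; i < content.size(); i += 2) {
        left.push_back(content[i]);
        right.push_back(content[i + 1]);
    }
    return {left, right};
}

// Hypothetical helper: merge two channel buffers back into one interleaved buffer.
std::vector<float> merge_channels(const std::vector<float>& left,
                                  const std::vector<float>& right) {
    assert(left.size() == right.size());
    std::vector<float> content;
    content.reserve(left.size() * 2);
    for (std::size_t i = 0; i < left.size(); ++i) {
        content.push_back(left[i]);
        content.push_back(right[i]);
    }
    return content;
}
```

Splitting and then merging an unmodified buffer should give back the original buffer, which is a cheap sanity check before any filtering is inserted between the two steps.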

Take some time to examine the filter part of the code, and the associated code in SampleFilter.h and SampleFilter.cpp

First of all, a dedicated struct type SampleFilter is used to store the temporary values needed by the filter.
Then the filtering itself uses
- SampleFilter_put to send a new sample to the filter
- SampleFilter_get to get the result for this new sample
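
To make the put/get pattern concrete, here is a reduced sketch of a FIR filter built around a circular history buffer. It is not the generated code: the names DemoFilter, demo_filter_put and demo_filter_get, the 4-tap length and the coefficient values are all made up for illustration, and the rescaling of the accumulator that fixed-point code needs is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 4-tap filter for illustration only; the lab filter has 256 taps
// and its coefficients come from the online generator.
constexpr std::size_t kTapCount = 4;
constexpr std::int32_t kTaps[kTapCount] = {1, 2, 3, 4};

struct DemoFilter {
    std::int32_t history[kTapCount] = {0, 0, 0, 0};  // last kTapCount inputs
    std::size_t last_index = 0;                       // next write position
};

// Store a new input sample in the circular history buffer.
void demo_filter_put(DemoFilter* f, std::int32_t input) {
    assert(f != nullptr);
    f->history[f->last_index] = input;
    f->last_index = (f->last_index + 1) % kTapCount;
}

// Compute the output for the most recent input: sum of tap[i] * x[n - i].
std::int64_t demo_filter_get(const DemoFilter* f) {
    assert(f != nullptr);
    std::int64_t acc = 0;
    std::size_t index = f->last_index;
    for (std::size_t i = 0; i < kTapCount; ++i) {
        index = (index != 0) ? index - 1 : kTapCount - 1;  // walk backwards in time
        acc += static_cast<std::int64_t>(f->history[index]) * kTaps[i];
    }
    return acc;
}
```

Each call to demo_filter_put overwrites the oldest sample, and each call to demo_filter_get walks the whole history once. That loop over all taps is the part that will later dominate the hardware schedule.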

To compile the program, execute the following command:

make test

To generate the filtered sequence, execute the following command:

make gen_filtered_wave

To check the generated sequence, execute the following command:

make play_filtered_wave

We now have a software reference that will be used for the hardware design.

4. STEP2: A Reference passthrough project

We will now adapt an existing DE1-SoC project for audio processing.

In your terminal console, change to DE1_SoC_filter directory
Launch quartus and open the DE1_SoC.qpf project using the File/Open project menu.
Check the Hierarchy button and select files to access the list of files of the project
Double-click on the cpu_system.qsys file to launch Platform designer

The cpu_system architecture contains a basic NiosII processor system. We will now add the specific IPs used to access the external audio codec.

4.1. The Audio IP

In the IP catalog, search for the Audio IP. This IP is in the Audio & Video folder of the University Program folder.

Add the audio IP to your system.
In the configuration window, change the Interface Settings to Streaming
Select a Data Format of 24 bits.

The Block diagram shows the different interfaces of the block:

an external interface that will be connected to the audio codec.
Two Avalon Streaming sinks for left and right channel
Two Avalon Streaming sources for left and right channel

The Avalon Streaming interface is a simplified Avalon interface used to transfer continuous streams of data. This interface doesn’t need any addressing scheme and uses a simple handshake between the source and the sink.
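
The handshake can be pictured with a small software model. This is only a sketch under simplifying assumptions: a transfer is taken to happen on every clock cycle where the source asserts valid and the sink asserts ready (the zero ready-latency case), and the names Cycle and transferred_samples are hypothetical. The interface specification remains the reference for the real timing rules.

```cpp
#include <cstddef>
#include <vector>

// One clock cycle as seen on a streaming link (hypothetical model type).
struct Cycle {
    bool valid;  // source has a sample to offer
    bool ready;  // sink can accept a sample
    int data;    // sample value driven by the source
};

// Collect the samples actually transferred: one per cycle where
// valid and ready are both asserted (assumption: ready latency of 0).
std::vector<int> transferred_samples(const std::vector<Cycle>& trace) {
    std::vector<int> out;
    for (const Cycle& c : trace) {
        if (c.valid && c.ready) {
            out.push_back(c.data);
        }
    }
    return out;
}
```

In this model a source that keeps valid high while the sink is not ready simply holds its sample until the first cycle where both signals are high, so no sample is lost or duplicated.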

As we want to generate code using HLS synthesis, we should be able to generate the necessary hardware without knowing the exact behavior of this interface.

However, in case of incorrect behavior, you may refer to the interface definition in the following documentation. The relevant chapter is chapter 5.

We will now configure the audio IP in a passthrough mode.

In the System Contents window:

Connect the clk input of the audio IP to clk of clk_0
Connect the reset input of the audio IP to the clk_reset of clk_0
Connect the avalon_left_channel_source to the avalon_left_channel sink
Connect the avalon_right_channel_source to the avalon_right_channel sink
Export the external_interface conduit and name it audio

4.2. The Audio Clock

The external Audio codec needs a specific clock signal. This clock signal is generated by a specific IP.

In the IP catalog search for an Audio Clock for DE-series Boards in the Clock folder of the University Program

Add the clock IP to the system
Do not change any parameter
Connect the ref_clk and ref_reset signals to system clocks and reset
Export the audio_clk signal to the outside world and name it audio_clk
Connect the reset_source signal to the reset input of the audio IP.

4.3. The Audio configuration

The external Audio codec needs to be configured (the FPGA will send some parameters to the codec chip using the I2C protocol).

Find, in the IP catalog an IP named Audio and Video Config (in the Audio and Video folder of the University Program folder)

Add the Audio and Video Config IP
Change the DE Board parameter to DE1-SoC
Check the Auto Initialize Devices button (we want a static initialization)
Modify the Audio In Path to Line In to ADC
Check that the Bit Length is 24 bits.
Check that the Sampling rate is the same as the sampling rate of the reference audio sequences.
Insert the component
Connect the clk input of the component to the system clock
Connect the reset input to the general reset signal as well as the reset_source of the audio_pll component.
Export the external interface conduit, and name it audio_cfg
Leave the avalon_av_config unconnected.

4.4. Quartus project update

Save the Platform Designer project (menu File/Save)
Generate the HDL code (menu Generate/Generate HDL )

We must now adapt the cpu_system instantiation in the top level design (at the end of the DE1_SoC.sv file).

Edit the DE1_SoC.sv SystemVerilog source.
Find, in the cpu_system folder, the cpu_system_inst.v file and adapt the DE1_SoC.sv file according to this template:
- The real external signal names for audio are defined after the /// AUD /// comments in the interface definition of module DE1_SoC
- The real external signal names for audio_cfg are defined after the /// FPGA /// comments in the interface definition of module DE1_SoC
- Recompile the Quartus project.

4.5. Test on the DE1-SoC board

To program the FPGA:

Change to the control-soft directory and execute the following command:

make fpga-config

Connect the headphones output of your workstation to the line in input of the DE1-SoC board using a cable
Connect the headphones to the line out of the DE1-SoC board.
Play one of the sequences of chapter 3; it should work…
Now close the Quartus tool.

5. STEP3: Adaptation of the reference source to HLS synthesis

Copy the source file main.cpp from the software_demo directory to the HLS directory and edit this new main.cpp file.

5.1. Creation of the audio_filter component

We will use the Algorithmic C Datatypes to define the audio sample values. For this purpose we must include the following code in the header of the file. (just before the code)

#include "HLS/hls.h"
#ifdef __INTELFPGA_COMPILER__
#include "HLS/ac_int.h"
#else
#include "ref/ac_int.h"
#endif

We then define the datatype of the samples received and sent by the audio IP. These samples are 24-bit signed data. According to the documentation, we add the following line:

typedef ac_int<24,true> audio_sample ;
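
To make the 24-bit signed range concrete, here is a plain C++ sketch that does not use ac_int. The helper wrap_to_int24 is hypothetical; it mimics two's-complement wrap-around to 24 bits. As far as we know this matches the default overflow behavior of ac_int, but check the Algorithmic C datatypes documentation before relying on it.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::int32_t kInt24Min = -(1 << 23);     // -8388608
constexpr std::int32_t kInt24Max = (1 << 23) - 1;  //  8388607

// Hypothetical helper: keep the low 24 bits of a value and sign-extend them,
// i.e. two's-complement wrap-around to a signed 24-bit range.
std::int32_t wrap_to_int24(std::int32_t value) {
    std::uint32_t low = static_cast<std::uint32_t>(value) & 0x00FFFFFFu;
    if (low & 0x00800000u) {
        low |= 0xFF000000u;  // replicate the sign bit into the upper byte
    }
    std::int32_t result = static_cast<std::int32_t>(low);
    assert(result >= kInt24Min && result <= kInt24Max);
    return result;
}
```

Values inside the range pass through unchanged, while a value one step above the maximum wraps to the minimum. This is why the scaling applied in the testbench has to keep samples within 24 bits.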

We now want to define a component with inputs and outputs of type audio_sample that follow the Avalon Streaming protocol. For this purpose, we use the templated types stream_in and stream_out (defined by Intel HLS). The stream_in type is used for sink streams and the stream_out type is used for source streams.

Add the following definitions to create the new types audio_stream_in and audio_stream_out:

typedef ihc::stream_in<audio_sample, ihc::bitsPerSymbol<8>> audio_stream_in ;

typedef ihc::stream_out<audio_sample, ihc::bitsPerSymbol<8>> audio_stream_out ;

We are now ready to create the audio filter component.

Modify the declaration of the audio_filter function using the following definition.

component void audio_filter(
     audio_stream_in & audio_left_in,
     audio_stream_in & audio_right_in,
     audio_stream_out & audio_left_out,
     audio_stream_out & audio_right_out) {

In Intel HLS tools, a stream channel may be accessed by classical write and read functions (see Intel HLS reference manual Tables 10 and 12 for details). We will use blocking calls.

In a first step we will try to stick to the code generated by the web site, but we must have in mind that the HLS tools have best results for specific coding styles…

Keep the declaration of the static buffers for the left and right filters
and modify the internal code as follows:

// Get a sample from the left channel input
audio_sample left_sample = audio_left_in.read();
// Send the sample to the filter
SampleFilter_put(&left_filter,left_sample) ;
// Get the filtered sample from the filter
left_sample = SampleFilter_get(&left_filter);
// Send the filtered sample to the left channel output
audio_left_out.write(left_sample);

// Get a sample from the right channel input
audio_sample right_sample = audio_right_in.read();
// Send the sample to the filter
SampleFilter_put(&right_filter,right_sample) ;
// Get the filtered sample from the filter
right_sample = SampleFilter_get(&right_filter);
// Send the filtered sample to the right channel output
audio_right_out.write(right_sample);

5.2. Adaptation of the TestBench (main function)

From the Intel HLS tools point of view, the testbench should not directly call the filter as in the original program. The testbench is organized as a kind of FIFO processing in order to simulate hardware parallelism. Three explicit phases are needed:

Enqueuing all the needed calls to the component.
Executing all the enqueued calls.
Getting all the results.

The functions needed for that purpose are ihc_hls_enqueue_noret and ihc_hls_component_run_all (see HLS Ref table 42).

As the audio samples use streams, we need to declare internal streams in the main function. Add the following declarations for the 4 needed streams at the beginning of the main program:

audio_stream_in to_audio_left_in ;
audio_stream_in to_audio_right_in ;
audio_stream_out from_audio_left_out;
audio_stream_out from_audio_right_out;

We want to send all the audio samples to the defined streams. For that purpose, in the Split Left channel and Right channel phase, remove the definitions of left_content and right_content and modify the loop as follows:

for (auto sample: content) {
    parity = !parity ;
    if(parity) {
        to_audio_left_in.write(sample*65536) ;
    } else {
        to_audio_right_in.write(sample*65536) ;
    }
}

Now replace the call to audio_filter in the filtering loop by enqueue calls. Modify the loop as follows:

for(std::size_t i = 0; i < content.size()/2; ++i) {
    ihc_hls_enqueue_noret(&audio_filter,
        to_audio_left_in,
        to_audio_right_in,
        from_audio_left_out,
        from_audio_right_out
    ) ;
}

Just after this loop add a call to ihc_hls_component_run_all to launch the computations:

ihc_hls_component_run_all(&audio_filter) ;

Then modify the last loop used to construct the output waveform using the output streams from the component:

for(std::size_t i = 0; i < content.size()/2; ++i) {
    float left_sample, right_sample ;
    left_sample = from_audio_left_out.read() ;
    right_sample = from_audio_right_out.read() ;
    content_out.push_back( left_sample/65536.0 ) ;
    content_out.push_back( right_sample/65536.0 ) ;
}
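
The three phases of this testbench (enqueue, run, read back) can be modelled in plain C++ to see the data flow without the HLS runtime. The sketch below is an analogy only: std::queue stands in for the stream types, std::function for the enqueued calls, and demo_component and run_three_phases are hypothetical names. It says nothing about how the Intel runtime actually schedules the calls.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

using SampleStream = std::queue<int>;  // stand-in for a stream, not ihc::stream

// Stand-in component: consumes one sample per call and emits it doubled.
void demo_component(SampleStream& in, SampleStream& out) {
    assert(!in.empty());
    out.push(2 * in.front());
    in.pop();
}

// Run the three phases on a list of input samples and return the outputs.
std::vector<int> run_three_phases(const std::vector<int>& samples) {
    SampleStream to_component, from_component;
    for (int s : samples) to_component.push(s);          // fill the input stream

    std::vector<std::function<void()>> pending;          // phase 1: enqueue calls
    for (std::size_t i = 0; i < samples.size(); ++i) {
        pending.push_back([&] { demo_component(to_component, from_component); });
    }
    for (auto& call : pending) call();                   // phase 2: run all calls

    std::vector<int> results;                            // phase 3: drain outputs
    while (!from_component.empty()) {
        results.push_back(from_component.front());
        from_component.pop();
    }
    return results;
}
```

The point to notice is that the input stream must be completely filled before the calls run, and that the number of enqueued calls must match the number of samples placed in each stream.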

5.3. Functional test of the component

Execute the following command to compile the testbench.

make test-gpp

Correct any remaining errors and then execute the testbench itself:

make gpp_gen_filtered_wave

Play the filtered audio sequence in order to check if the several transformations of the code have not modified the results. We should have added some means to compare the generated files…

make play_gpp_filtered_wave

5.4. HLS Synthesis

We try a first synthesis of the component. Execute the following command:

make test-fpga

Before examination of the reports, we will launch a simulation on a short audio sequence (the post synthesis simulation may be rather long…). Execute the following command:

make fpga_gen_short_filtered_wave

We will now examine the results of the simulation. Execute the following command:

make show-waves

In the QuestaSim window, place all signals of the audio_filter instance on the Wave window.
Force the window to show the full simulation (hit the key F in the Wave window)
In the Wave window, select the signal audio_left_in_data
- change its radix to decimal
- change its format to Analog
Do the same thing for the audio_left_out_data signal

You have a view of the original and filtered signals. At the beginning of the simulation, all data stored in the filter are 0, so it takes time to obtain a stable output waveform.

Zoom between a valid cycle of the audio_left_in_ready signal and a valid cycle of the audio_left_out_valid signal. The component computes the results during this time interval. You see that audio_left_out_data is not stable during this interval. In fact, we see all the intermediate results of the computation.

This time interval is the Initiation Interval of the component. Rather than counting the cycles in the simulation, we should be able to find it in the synthesis reports. Execute the following command:

make show-syn-results

In the browser, select Throughput Analysis/Verification statistics.
Examine the value of II

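The component must accept a new sample at least once per sampling period, so its II must not exceed the number of clock cycles in that period. Taking into account the frequency of the main clock of the Quartus project (50 MHz) and the sampling rate chosen for the audio configuration IP, you should see that the component doesn’t fulfill the constraints of the audio sequence.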
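
The cycle budget can be worked out with a few lines of C++. This is a sketch with hypothetical helper names (cycles_per_sample, meets_audio_rate). The 48 kHz value used in the usage note is only an example; use the sampling rate you noted for the reference audio sequence.

```cpp
#include <cassert>
#include <cstdint>

// Number of whole clock cycles available between two consecutive audio samples.
std::uint64_t cycles_per_sample(std::uint64_t clock_hz, std::uint64_t sample_rate_hz) {
    assert(sample_rate_hz > 0);
    return clock_hz / sample_rate_hz;  // integer division rounds down
}

// True if a component with the given initiation interval keeps up with the stream.
bool meets_audio_rate(std::uint64_t initiation_interval,
                      std::uint64_t clock_hz, std::uint64_t sample_rate_hz) {
    return initiation_interval <= cycles_per_sample(clock_hz, sample_rate_hz);
}
```

For example, with a 50 MHz clock and an assumed 48 kHz sampling rate, cycles_per_sample(50000000, 48000) gives 1041 cycles, so any II above that value means the component cannot keep up.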

5.4.1. Realistic constraints

In the Summary window of the report, find the Compiled Estimated Frequency of your design. It should be far higher than the real 50 MHz clock frequency of the board. We may relax the constraints of the design by using a target frequency equal to the real clock frequency of the board.

In the main.cpp file, add the following attribute, just before the definition of the component:

hls_scheduler_target_fmax_mhz(50)

Recompile the design, relaunch the simulation of the short sequence and check the new results.

You should have a new value of the II compatible with the constraints of the audio sequence. If so, you may skip the following enhancement; if not, try it.

5.4.2. How to have a lower Initiation Interval

FIR filters are based on arithmetic loops. Loop unrolling is a simple way to decrease the II.

The main loop in our filter is in the SampleFilter_get function defined in the SampleFilter.cpp file in the software_demo directory.

The syntax for unrolling a loop is:

#pragma unroll N

Where N is an optional positive integer value. If N is omitted, the loop is fully unrolled. Remember that unrolling loops duplicates hardware resources. You should always use the minimum N necessary.

Choose a value for N
Add the pragma, in the SampleFilter_get function, just before the for loop.
Redo all the synthesis and simulation steps and check that the II is small enough.
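
To see what the pragma asks the tool to do, here is the same multiply-accumulate loop written once in rolled form and once unrolled by hand by a factor of 4. This is a plain C++ illustration with hypothetical function names, not the generated filter code. In the lab you only add the pragma and let the HLS compiler do the transformation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Rolled multiply-accumulate loop: one product per iteration.
std::int64_t mac_rolled(const std::int32_t* x, const std::int32_t* h, std::size_t n) {
    std::int64_t acc = 0;
    for (std::size_t i = 0; i < n; ++i) {
        acc += static_cast<std::int64_t>(x[i]) * h[i];
    }
    return acc;
}

// The same loop unrolled by hand by a factor of 4: four products per iteration.
// Assumption for brevity: n is a multiple of 4 (true for 256 taps).
std::int64_t mac_unrolled_by_4(const std::int32_t* x, const std::int32_t* h, std::size_t n) {
    assert(n % 4 == 0);
    std::int64_t acc = 0;
    for (std::size_t i = 0; i < n; i += 4) {
        acc += static_cast<std::int64_t>(x[i])     * h[i];
        acc += static_cast<std::int64_t>(x[i + 1]) * h[i + 1];
        acc += static_cast<std::int64_t>(x[i + 2]) * h[i + 2];
        acc += static_cast<std::int64_t>(x[i + 3]) * h[i + 3];
    }
    return acc;
}
```

Both functions return the same value. The unrolled form runs a quarter of the iterations, but each iteration needs four multipliers in hardware, which is the resource cost mentioned above.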

6. STEP4: Integration of the audio-filter IP into the Quartus project.

Change to the DE1_SoC_filter directory and launch the following command to add your component to the Platform Designer tool.

make add_filter_component

Launch Quartus, open your project, and open the Platform Designer tool in order to edit the cpu_system.

You should have an HLS directory containing audio_filter in the IP Catalog.

Add the IP audio_filter to your design.
Connect the clock and reset of the filter instance to the main clock and reset signals
Disconnect the Avalon passthrough connections of the audio IP
Connect the filter between the Avalon sink and sources of the audio IP
Export the call Conduit to the outside world, and rename the port call
Leave the return Conduit unconnected
Save the design and generate the HDL code.

Find, in the cpu_system folder, the cpu_system_inst.v file and adapt the DE1_SoC.sv file according to this template. Two new signals should be added.

The valid signal should be connected to the constant 1’b1 (the filter is permanently enabled)
The stall signal should be left unconnected.
Recompile the Quartus project, and redo the tests as in section 4.5

You should now have a running audio filter on your board. Try to play the noisy audio sequence in order to check that the filter is running.

7. STEP5 : Driving the filter from the NiosII cpu

We will now try to drive the filter component from the NiosII processor. This is only for demonstration purposes, so the action will be simple: we want to add a simple control register named bypass. The behavior should be the following:

If bypass equals true, the component should simply copy the input samples to the output.
If bypass equals false, the component should filter the samples.
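
The expected behavior can be written down as a small selection function. This is a sketch in plain C++, with a hypothetical helper name (select_output) and a std::function standing in for the real filter. It also makes one design choice that the lab text does not impose: the filter is fed even while bypassed, so that its history is up to date when bypass is switched off.

```cpp
#include <cassert>
#include <functional>

// Hypothetical helper: choose between the raw and the filtered sample.
// The filter is always fed so that its history stays up to date while bypassed
// (a design choice for this sketch, not a requirement from the lab text).
int select_output(bool bypass, int input, const std::function<int(int)>& filter) {
    assert(filter);  // a callable filter must be supplied
    int filtered = filter(input);
    return bypass ? input : filtered;
}
```

In the component itself the same selection would be applied to each channel, between the read from the input stream and the write to the output stream.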

7.1. Modification of the component

We will add a new boolean input named bypass to the audio_filter component. We must also ask the HLS tool to turn this input into a register that can be accessed through an Avalon-MM bus. For that purpose, add the following line to the component I/O declarations:

    hls_avalon_slave_register_argument bool bypass,

Modify the code of the component so that the generated output depends on the value of the bypass variable.

We should also modify the testbench part (the main program). In the loop used to enqueue samples to the filter, just add the new input. If you choose to select the bypass mode, the code should look like:

    ihc_hls_enqueue_noret(&audio_filter,
        true,
        to_audio_left_in,
        to_audio_right_in,
        from_audio_left_out,
        from_audio_right_out
    ) ;

Test this new version of the code (don’t forget to connect the headphones directly to your workstation). You may test the program using two different values of the bypass variable. The final wave should be filtered… or not.

make test-gpp
make gpp_gen_filtered_wave
make play_gpp_filtered_wave

Synthesize the component, generate the simulation executable, simulate the component and examine the waveforms:

make test-fpga
make fpga_gen_short_filtered_wave
make show-waves

You should see that the testbench has generated some Avalon-MM signals.
If you examine the beginning of the simulation (around time 20 ns), you should see that a value is written to the component (signal avs_cra_write_data).
It should be the bypass value transmitted to the component.
You may verify that by running a simulation using a different value of bypass.

7.2. Update of the hardware system

In the DE1_SoC_filter directory, update the component list:

make add_filter_component

Then edit the cpu-system with platform-designer. The filter component has a new connection avs_cra of type Avalon Memory Mapped Slave

In the column related to the Base addresses of Avalon agents, lock all already defined addresses.
Then connect avs_cra of the filter component to the data_master bus of the NIOS processor
Then regenerate the addresses using the menu System/Assign Base Addresses
Remember the address obtained for your filter.
Save the cpu-system and regenerate HDL code.
In Quartus recompile the design.

7.3. Test of the new hardware.

Reload the FPGA with the new design
Modify the test program (in the control-soft directory), in order to check the bypass register effect.
Try to see if it works…
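
A register write from the processor side can be sketched as below. Everything specific here is a placeholder: set_bypass is a hypothetical helper, and both the base pointer and the word offset of the bypass register must come from your own system (the base address assigned in Platform Designer and the register map generated for the component). On a NiosII with a data cache, the HAL I/O macros such as IOWR_32DIRECT are the usual way to make sure the write reaches the hardware.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Write the bypass flag into a memory-mapped register block.
// `regs` points at the start of the component's register block and
// `word_offset` is the 32-bit word index of the bypass register.
// Both are placeholders here: use the base address reported by
// Platform Designer and the register map generated for the component.
void set_bypass(volatile std::uint32_t* regs, std::size_t word_offset, bool bypass) {
    assert(regs != nullptr);
    regs[word_offset] = bypass ? 1u : 0u;
}
```

On a workstation the function can be exercised against an ordinary array standing in for the register block, which is a convenient way to check the offset arithmetic before running on the board.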

8. STEP6: Filter Optimization :

A ONE PAGE MAX REPORT IS REQUESTED FOR THIS STEP: report due on Sunday December 12.

This step is a free work on filter optimization. Your main objective should be to minimize the hardware resources needed by the filter.

The main questions are

Does HLS generate an optimal number of multipliers, taking into account the constraints ?
Does HLS generate an optimal number of Memory blocks, taking into account the constraints ?
HLS should share coefficient memory between the left and right channel, is it the case ?
From a "theoretical" point of view, what resources are needed to generate two 256-tap filters running at 50 MHz and generating a new sample every X clock cycles ?
Does the coding style of the original filter create bottlenecks for the scheduling ?
Examine carefully the different reports, try to understand them…
Read the documentations:
- reference manual
- best practices

The samples sources described in the "Best Practices" manual are placed in the following directory /comelec/softs/opt/altera/current/hls/examples/tutorials/best_practices

And try some experiments, for example
- Minimisation of memories: fusion of the two left and right loops (rewriting the filter functions with common coefficient access).
- Readability of the code: Rewrite the filter routines using ac_fixed type rather than using explicit shifts of integer data
- Ad-hoc floating point computation: try to use hls_float data types, for example the bfloat19 (19 bits float representation) data type.
- …
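
As a starting point for the "theoretical" resource question above, the minimum number of multipliers can be estimated by simple counting. The sketch below uses a hypothetical helper (min_multipliers) and assumes that each multiplier delivers one product per clock cycle and that multipliers can be perfectly shared. It is a lower bound to compare against the reports, not a prediction of what the tool generates.

```cpp
#include <cassert>
#include <cstdint>

// Lower bound on the number of multipliers needed when `channels` filters of
// `taps` taps each must produce one output every `cycles_per_output` cycles,
// assuming each multiplier performs one product per clock cycle.
std::uint64_t min_multipliers(std::uint64_t channels, std::uint64_t taps,
                              std::uint64_t cycles_per_output) {
    assert(cycles_per_output > 0);
    std::uint64_t products = channels * taps;                       // products per output period
    return (products + cycles_per_output - 1) / cycles_per_output;  // ceiling division
}
```

For two 256-tap filters, 512 products are needed per output period: with X = 256 cycles the bound is 2 multipliers, and with X of 512 cycles or more a single shared multiplier would be enough in principle.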