Embedded AI Student Projects

Jan 6th, 2020 11:51 pm

Student projects on Embedded AI in the past year:

YOLO on a Chip. [Photo]

Differential Scanning Techniques for Detection of Security Vulnerabilities

Jun 1st, 2018 11:51 pm

A technique to detect security bugs in an SoC well in advance of tapeout.

Pros: A simulation-based technique, very similar to assertion-based verification (ABV).

Cons: Not static (unlike lint, for example). A simulation has to be up and running, which is not so easy for big SoCs.

Here is our invited talk on the same topic at DAC 2018:

Through the Coherency Port: The Adventures of Alice

May 3rd, 2018 11:51 pm

This time Alice (remember Alice and Bob from the cryptography lesson?) falls down the coherency port and imagines all the mischievous things she can do in the kingdom of the Linux kernel. Well, Alice is actually an imaginary SoCFPGA trojan.

More details in this talk here at DAC 2018: A Security Vulnerability Analysis of SoCFPGA Architectures

Accelerator Design With OpenCL

Mar 19th, 2018 2:29 pm

The objective of this one-week ATHENS course is to introduce students to the concepts of programming with OpenCL. There is a recent trend in computer architecture towards heterogeneous systems (HSA), where accelerators such as FPGAs and GPUs are integrated on the same die as chip multiprocessors. Compute-intensive tasks are then offloaded to these accelerators. OpenCL (Open Computing Language) is an industry-standard language for parallel programming, adopted by industry leaders such as Intel, Xilinx, and ARM for programming accelerators (e.g., Intel FPGAs, ARM Mali GPUs). After following this course, a student should be able to:

Write basic OpenCL programs (both host program and kernel) for FPGAs.
Write basic OpenCL programs for programming GPUs.
Be familiar with notions of optimization for performance.
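
To make the host/kernel split mentioned in the objectives more concrete, here is a minimal sketch in plain Python rather than OpenCL C. It only emulates the execution model: the "host" prepares the buffers and launches a "kernel" once per work-item over a global range. The names `vadd_kernel` and `enqueue_nd_range` are hypothetical and are not part of the OpenCL API, and on a real device the work-items would run in parallel rather than in a loop.

```python
from typing import Callable, List


def vadd_kernel(gid: int, a: List[float], b: List[float], c: List[float]) -> None:
    """Kernel body: each work-item handles the single element at its global id."""
    c[gid] = a[gid] + b[gid]


def enqueue_nd_range(kernel: Callable[..., None], global_size: int, *buffers) -> None:
    """Hypothetical stand-in for a kernel launch: call the kernel once per work-item.

    A real accelerator runs the work-items in parallel; this loop is sequential.
    """
    for gid in range(global_size):
        kernel(gid, *buffers)


def host_program(n: int = 8) -> List[float]:
    """Host side: create the buffers, launch the kernel, return the result buffer."""
    a = [float(i) for i in range(n)]
    b = [2.0 * i for i in range(n)]
    c = [0.0] * n
    enqueue_nd_range(vadd_kernel, n, a, b, c)
    return c


if __name__ == "__main__":
    print(host_program(4))  # prints [0.0, 3.0, 6.0, 9.0]
```

The point of the split is that the kernel never loops over the data itself: it only sees its own global id, which is what lets a GPU or FPGA run many instances side by side.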

Program:

Day 1: Introduction to the OpenCL API and the host program.

Day 2: Practical work with the ARM Mali OpenCL SDK.

Day 3: Hands-on experience: programming GPUs with ODROID XU4 boards.

Day 4: Practical work with Xilinx SDSoC.

Day 5: Hands-on experience: programming FPGAs with Pynq-Z1 boards.

Prerequisites

Computer Architecture, VLSI, C/C++

Course exam

Students will be marked based on:

Practical work.
Quiz at the end of the course.

Course Material

Traffic Sign Recognition With Convolutional Neural Networks on a SoCFPGA

Sep 6th, 2017 11:51 pm

My student Amnay won first prize in the “Machine Learning” category of the EU Innovate design contest, 2017. Here is the video he prepared for the contest.

And here is Amnay himself (second from left). [Photo]

Multi-Valued Routing in FPGAs

Sep 24th, 2016 6:31 pm

Just as the performance of a processor-based system depends largely on the available memory bandwidth, the performance of a gate array (FPGA) is intertwined with its interconnect speed and density. As a result, 70% of the FPGA area is interconnect switches and buffers. This is not a surprise, because memory and interconnect are two sides of the same coin. While one bit of memory transfers one bit of information from point A in time to point B in time, a one-bit wire transfers one bit of information from point A to point B in space.

Flash memory chips already use multi-valued (MLC, multi-level cell) flash transistors to increase density. In this research we try to increase interconnect density by using multi-valued routing wires, i.e., four voltage levels to encode two bits of information. Things get much more technical from here, as handling 4-valued signals with binary switches is not straightforward. Check out this article to see how we try to implement this with FD-SOI transistors (not possible with ordinary CMOS).
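
As a rough illustration of the density argument, the sketch below packs pairs of bits into 4-level symbols, so that one wire carries what would otherwise need two binary wires. It is only a toy model: the levels are abstract integers 0 to 3 rather than voltages, the binary-weighted mapping is an assumption and not necessarily the scheme used in the article, and the names `encode` and `decode` are made up for the example.

```python
from typing import List

LEVELS = 4         # signal levels per routing wire (from the text)
BITS_PER_WIRE = 2  # log2(LEVELS): bits carried by one 4-level wire


def encode(bits: List[int]) -> List[int]:
    """Pack bits (most significant bit first in each pair) into 4-level symbols."""
    if len(bits) % BITS_PER_WIRE != 0:
        raise ValueError("number of bits must be a multiple of 2")
    if any(b not in (0, 1) for b in bits):
        raise ValueError("bits must be 0 or 1")
    return [2 * bits[i] + bits[i + 1] for i in range(0, len(bits), BITS_PER_WIRE)]


def decode(symbols: List[int]) -> List[int]:
    """Unpack 4-level symbols back into the original bit sequence."""
    bits: List[int] = []
    for s in symbols:
        if not 0 <= s < LEVELS:
            raise ValueError("symbol out of range for a 4-level wire")
        bits.extend([s >> 1, s & 1])
    return bits


if __name__ == "__main__":
    data = [1, 0, 0, 1, 1, 1, 0, 0]
    wires = encode(data)
    print(wires)          # prints [2, 1, 3, 0]: 4 wires instead of 8
    print(decode(wires))  # prints [1, 0, 0, 1, 1, 1, 0, 0]
```

The hard part in hardware is not this mapping but switching and restoring the intermediate levels, which is where the transistor technology comes in.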

Quantum Tic-Tac-Toe: CPU vs. FPGA

Jun 27th, 2016 6:51 pm

Obviously, the FPGA won the game. ;-)

GAGA: A SoCFPGA Cluster for Fun and Profit

May 8th, 2016 6:34 pm

With some help from our students (Zhengyu Xu) and Karim (Karim Ben Kalia), we have put together a heterogeneous computing cluster called GAGA (GPU And Gate Array). Yeah, sorry for the name. While this is not the first cluster of this type, GAGA is a bit different. It is an embedded super-computing cluster, i.e., the whole software stack is rebuilt for the given application, and it runs only one application at a time. In the spirit of embedded computing, everything (Linux kernel, MPI, OpenCL, FPGA hardware) is tuned to a technical context.

Hardware

DE1-SoC boards from Terasic, with a Cyclone V SoCFPGA from Altera (dual-core ARM Cortex-A9 Hard Processor System at 925 MHz, 85K LUT FPGA, 1 GB DDR3 SDRAM).
ODROID-XU board from Hardkernel [3], with an Exynos Octa SoC (quad-core ARM Cortex-A15/A7 and PowerVR/Mali embedded GPUs, 2 GB LPDDR3).
HP ProCurve 2530G PoE+ switch (24 1 Gb ports, 4 SFP ports, switching capacity 56 Gbps, PoE power capability 195 W).

Software

Only the libraries necessary for a given technical context go on board. The whole executable software can be contained within 10 MB, which allows fast booting and wake-up, and leaves more resources dedicated to applications.

Software Stack	Build System
Linux Kernel	Kernel Config
InitRamFS	Buildroot
MPICH	GCC Cross-Compile
FPGA OpenCL Runtime	Altera OpenCL SDK
GPU OpenCL Runtime	PowerVR GPU Compute
Custom Libraries (e.g., BLAS)	GCC Cross-Compile