Student projects on Embedded AI in the past year:
Differential Scanning Techniques for Detection of Security Vulnerabilities
A technique to discern security bugs in a SoC well in advance of tapeout.
Pros: Simulation-based technique, very similar to assertion-based verification (ABV).
Cons: Not static (unlike lint, for example); it needs a simulation up and running, which is not so easy for big SoCs.
Here is our invited talk on the same topic at DAC 2018:
Through the Coherency Port: The Adventures of Alice
This time Alice (remember Alice and Bob from the cryptography lesson?) falls down the coherency port and imagines all the mischievous things she can do in the kingdom of the Linux kernel. Well, Alice is actually an imaginary SoCFPGA trojan.
More details in this talk here at DAC 2018: A Security Vulnerability Analysis of SoCFPGA Architectures
Accelerator Design With OpenCL
The objective of this one-week ATHENS course is to introduce students to the concepts of programming with OpenCL. There is a recent trend in computer architecture towards heterogeneous systems (HSA), where accelerators such as FPGAs and GPUs are integrated on the same die as chip multiprocessors. Compute-intensive tasks are then offloaded to these accelerators. OpenCL (Open Computing Language) is an industry-standard language for parallel programming, adopted by industry leaders such as Intel, Xilinx and ARM for programming accelerators (e.g. Intel FPGAs, ARM Mali GPUs). After following this course, a student should be able to:
Write basic OpenCL programs (both host program and kernel) for FPGAs.
Write basic OpenCL programs for programming GPUs.
Be familiar with notions of optimization for performance.
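To make the host program / kernel split concrete, here is a minimal pure-Python sketch of the execution model. It is not OpenCL code and not the course material: `vector_add_kernel`, `enqueue_nd_range` and `host_program` are hypothetical names chosen for illustration, loosely mirroring the roles of a kernel using `get_global_id` and a host-side launch such as `clEnqueueNDRangeKernel`. A real device would run the work-items in parallel; this toy runs them one after another.

```python
def vector_add_kernel(global_id, a, b, out):
    """Kernel body (hypothetical): each work-item handles exactly one element."""
    out[global_id] = a[global_id] + b[global_id]


def enqueue_nd_range(kernel, global_size, *args):
    """Toy stand-in for a kernel launch: call the kernel once per work-item.

    Assumption: a 1-D index space; work-items run sequentially here,
    whereas an accelerator would execute them in parallel.
    """
    for gid in range(global_size):
        kernel(gid, *args)


def host_program(a, b):
    """Host side: allocate the output buffer, launch the kernel, read back."""
    if len(a) != len(b):
        raise ValueError("input buffers must have the same length")
    out = [0] * len(a)
    enqueue_nd_range(vector_add_kernel, len(a), a, b, out)
    return out


if __name__ == "__main__":
    print(host_program([1, 2, 3], [10, 20, 30]))
```

The point of the split is that the kernel only describes the work of a single work-item, while the host decides how many work-items to launch and which buffers they see.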
Program:
Day 1: Introduction to the OpenCL API and the host program.
Day 2: Practical work with the ARM Mali OpenCL SDK.
Day 3: Hands-on experience: programming GPUs with ODROID XU4 boards.
Day 4: Practical work with Xilinx SDSoC.
Day 5: Hands-on experience: programming FPGAs with Pynq-Z1 boards.
Prerequisites
Computer Architecture, VLSI, C/C++
Course exam
Students will be marked based on:
Practical Work
Quiz at the end of the course.
Course Material
Traffic Sign Recognition With Convolutional Neural Networks on a SoCFPGA
My student Amnay won first prize in the “Machine Learning” category of the EU Innovate design contest, 2017. Here is the video he prepared for the contest.
And here is Amnay himself (2nd from left). Photo
Multi-Valued Routing in FPGAs
Just as the performance of a processor-based system depends largely on the available memory bandwidth, the performance of a gate array (FPGA) is intertwined with its interconnect speed and density. As a result, 70% of the FPGA area is interconnect switches and buffers. This is not a surprise, because memory and interconnect are two sides of the same coin: one bit of memory transfers one bit of information from point A in time to point B in time, while a 1-bit wire transfers one bit of information from point A to point B in space.
Flash memory chips already use multi-valued (MLC) flash transistors to increase density. In this research we try to increase interconnect density by using multi-valued routing wires, i.e. 4 voltage levels to encode 2 bits of information. Things get much more technical from here, as handling 4-valued signals with binary switches is not straightforward. Check out this article to see how we try to implement this with FDSOI transistors (not possible with ordinary CMOS).
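As a rough illustration of what "4 voltage levels to encode 2 bits" means at the signalling level, here is a small Python sketch. It says nothing about the FDSOI circuit itself. The supply voltage `VDD`, the equally spaced `LEVELS` and the nearest-level decoding rule are all assumptions made for the example, not values taken from the article.

```python
# Assumed, illustrative values: a 1.0 V supply split into four equal steps.
VDD = 1.0
LEVELS = [0.0, VDD / 3, 2 * VDD / 3, VDD]


def encode(bits):
    """Map each pair of bits (high bit first) to one of four wire voltages."""
    if len(bits) % 2:
        raise ValueError("need an even number of bits")
    return [LEVELS[(hi << 1) | lo] for hi, lo in zip(bits[0::2], bits[1::2])]


def decode(voltages):
    """Recover the bits by snapping each sample to the nearest nominal level."""
    bits = []
    for v in voltages:
        symbol = min(range(4), key=lambda i: abs(LEVELS[i] - v))
        bits += [symbol >> 1, symbol & 1]
    return bits


if __name__ == "__main__":
    data = [1, 0, 0, 1, 1, 1, 0, 0]
    wire = encode(data)           # 4 wire samples carry 8 bits
    print(wire, decode(wire) == data)
```

Each wire sample carries two bits, so the same number of routing tracks moves twice the data. The price is a noise margin of one third of the binary swing between adjacent levels.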
Tic-Tac-Toe Quantique CPU vs. FPGA
Obviously, the FPGA won the game. ;-)
GAGA: A SoCFPGA Cluster for Fun and Profit
With some help from our student Zhengyu Xu and from Karim Ben Kalia, we have put together a heterogeneous computing cluster called GAGA (GPU And Gate Array). Yeah, sorry for the name. While this is not the first cluster of this type, GAGA is a bit different. It is an embedded supercomputing cluster, i.e. the whole software stack is rebuilt for a given application, and it runs only one application at a time. In the spirit of embedded computing, everything (Linux kernel, MPI, OpenCL, FPGA hardware) is tuned to a technical context.
Hardware
DE1-SoC boards from Terasic, with a Cyclone V SoCFPGA from Altera (dual-core ARM Cortex-A9 Hard Processor System at 925 MHz, 85K LUT FPGA, 1 GB DDR3 SDRAM).
ODROID-XU board from Hardkernel [3], with an Exynos Octa SoC (quad-core ARM Cortex-A15/A7, PowerVR/Mali embedded GPUs, 2 GB LPDDR3).
HP Procurve 2530G PoE+ switch (24 1 Gb ports, 4 SFP ports, switching capacity 56 Gbps, PoE power capability 195 W).
Software
Only the libraries necessary for a given technical context go on board. The whole executable software fits within 10 MB, which allows fast booting and wake-up and leaves more resources for applications.
Software Stack | Build System |
---|---|
Linux Kernel | Kernel Config |
InitRamFS | Buildroot |
MPICH | GCC X-Compile |
FPGA OpenCL Runtime | Altera OpenCL SDK |
GPU OpenCL Runtime | PowerVR GPU Compute |
Custom Libraries, e.g. BLAS | GCC X-Compile |
Calculating a Billion Digits of PI on a FPGA Cluster
Modelling of Digital Circuits With Bayesian Networks
More details in this talk here at ICFPT 2011: Timing Speculation in FPGAs: Probabilistic Inference of Data Dependent Failure Rates