Ubiquitous AI (FPGA and Beyond)
Artificial intelligence (AI) is the next big wave in computing and is transforming our society on a spectacular scale. Computing systems that sense, reason, and act can accelerate solutions to the world’s grand challenges in healthcare, finance, security, energy, science, and other areas. However, today’s computers are not designed to run AI workloads efficiently. In our research, we develop collaborative hardware/software solutions for AI. In particular, we focus on xPU, a class of new computer architectures tailored for AI applications (speech, image, video, vision, etc.). The key differentiator of our research is that, rather than developing specific application-oriented solutions, we pursue fundamental computer architecture innovations that can transform a wide array of computing systems, ranging from edge computing devices (IoT, wearable, mobile, and other embedded systems) to warehouse-scale computers (data centers and the cloud).
AI Cloud Computing
Our goal is to develop custom solutions for efficiently deploying AI in the cloud, including computer vision, natural language processing, and speech recognition. In particular, we are developing a high-performance, energy-efficient hardware deep neural network (DNN) engine running on customizable chips known as field-programmable gate arrays (FPGAs). In our previous project, we demonstrated a high-performance OpenCL-based FPGA accelerator on an Altera Arria 10 GX1150 board and achieved 866 Gop/s floating-point performance at a working frequency of 370 MHz and 1.79 Top/s 16-bit fixed-point performance at 385 MHz. Compared with state-of-the-art OpenCL FPGA implementations of convolutional neural networks (CNNs), our work achieves the highest performance and energy efficiency worldwide.
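To make the floating-point versus 16-bit fixed-point distinction concrete, the following is a minimal Python sketch of fixed-point arithmetic. It is illustrative only and does not reproduce the accelerator described above: the function names and the choice of 8 fractional bits are assumptions made for this example.

```python
# Illustrative sketch of signed 16-bit fixed-point arithmetic.
# Assumption: 8 fractional bits; the names below are hypothetical.

INT16_MIN, INT16_MAX = -32768, 32767


def to_fixed16(x, frac_bits=8):
    """Quantize a float to a signed 16-bit fixed-point integer, saturating on overflow."""
    scaled = round(x * (1 << frac_bits))  # Python rounds ties to even
    return max(INT16_MIN, min(INT16_MAX, scaled))


def from_fixed16(q, frac_bits=8):
    """Convert a fixed-point integer back to a float."""
    return q / (1 << frac_bits)


def fixed_dot(xs, ws, frac_bits=8):
    """Dot product using integer multiply-accumulate on quantized inputs."""
    acc = 0  # wide accumulator (Python integers do not overflow)
    for x, w in zip(xs, ws):
        acc += to_fixed16(x, frac_bits) * to_fixed16(w, frac_bits)
    # Each product carries 2 * frac_bits fractional bits.
    return acc / (1 << (2 * frac_bits))
```

Integer multiply-accumulate units are generally cheaper in hardware than floating-point ones, which is one reason fixed-point datapaths tend to reach higher throughput on the same device, at the cost of quantization error and a limited numeric range.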
AI Edge Computing
“Larger is better” is the ruling maxim in the deep learning (DL) world. Deep layered structures and large model sizes achieve remarkable accuracy because they support the extraction of more complex, high-level features across a wide spectrum of applications, ranging from computer vision to machine translation and natural language processing in a cloud computing environment. However, they also pose severe challenges to the efficient deployment of DNN models on resource-constrained edge computing devices, which have limited processing power and data storage capacity. In this research, our goal is to develop optimization techniques that compress or transform large deep neural network models for edge computing. Toward this goal, we take a holistic and methodical approach that spans theory, algorithms, and architecture. Such an approach will in turn benefit cloud computing as well.
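As one example of what model compression can look like, the sketch below shows magnitude-based weight pruning, a widely used technique. The text above does not say which compression methods are used in this research, so this is offered purely as an illustration; the function name and the flat list representation of weights are assumptions.

```python
# Illustrative sketch of magnitude-based pruning (not necessarily the
# method used in the research described above).

def magnitude_prune(weights, sparsity):
    """Return a copy of `weights` with the smallest-magnitude fraction set to zero.

    `sparsity` is the fraction of weights to remove, between 0.0 and 1.0.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0.0 and 1.0")
    k = int(len(weights) * sparsity)  # number of weights to zero out
    # Indices ordered from smallest to largest magnitude.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:k]:
        pruned[i] = 0.0
    return pruned
```

For instance, pruning `[0.9, -0.05, 0.3, 0.01]` at a sparsity of 0.5 zeroes the two smallest-magnitude entries. A sparse model can then be stored and executed more cheaply, provided the target hardware can exploit the zeros.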
Big Data Systems
Big data comes in diverse types and sizes. Quite often, the data we need to process is connected in nature. For example, a social media application (e.g., Facebook) has entities such as Users, Status, Comments, and Likes that need to be managed and processed as a single logical unit of data. Typically, this type of data is organized as a graph, and running analytics on it requires different approaches (in both software and hardware) from those used in traditional data processing. To address these challenges, we combine the exceptional random-access performance of emerging Hybrid Memory Cube (HMC) technology, which stacks multiple DRAM dies on top of a logic layer, with the flexibility and efficiency of FPGAs. We achieved 166 million traversed edges per second (MTEPS) using the GRAPH500 benchmark on a random graph with a scale of 25 and an edge factor of 16, which significantly outperforms CPUs and other FPGA-based large graph processors.
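The benchmark parameters above can be unpacked with a short Python sketch. In Graph500, a graph of scale S has 2^S vertices and the edge factor is the ratio of edges to vertices; the benchmark kernel is a breadth-first search (BFS). The function names here are hypothetical, and the edge count in `bfs` is a simplification of the benchmark's own counting rules.

```python
# Illustrative sketch: Graph500 problem size and a simple BFS with an
# edge counter. Function names are hypothetical.
from collections import deque


def graph500_size(scale, edge_factor):
    """Return (vertices, edges) for the given Graph500 scale and edge factor."""
    vertices = 1 << scale
    return vertices, edge_factor * vertices


def bfs(adj, source):
    """Breadth-first search over an adjacency dict.

    Returns the parent map and the number of adjacency entries scanned.
    """
    parent = {source: source}
    queue = deque([source])
    scanned = 0
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            scanned += 1
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return parent, scanned


def teps(edges, seconds):
    """Traversed edges per second."""
    return edges / seconds
```

With a scale of 25 and an edge factor of 16, `graph500_size` gives 33,554,432 vertices and 536,870,912 edges. The difficulty for conventional memory systems is that the `adj[u]` lookups touch memory in an essentially random order, which is the access pattern that the HMC-based design targets.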
We are developing intelligent systems (including both hardware and software) to accelerate scientific discovery and computer architecture research. In particular, our system comprises an automated IC testing, characterization, and emulation platform built in-house from equipment donated by many companies. It can perform material-level, device-level, circuit-level, and system-level testing with more flexibility (FPGA-controlled) and accuracy (fully customized) than many state-of-the-art industry testing facilities (memory or SoC testers).
Advanced Computing Concepts: Non-von Neumann Architectures Enabled by Emerging Technology
Reconfigurable Architecture and Programming Models
In this project, we aim to develop a reconfigurable, memory-oriented computing fabric, namely Liquid Silicon (or L-Si), by leveraging the monolithic 3D stacking capability of emerging nonvolatile memory technologies (RRAM, PCM, STT-RAM). Through a series of innovations, L-Si addresses several fundamental limitations of state-of-the-art reconfigurable architectures, including FPGAs and CGRAs, in supporting data-intensive applications (e.g., machine learning and neural networks). For the first time, it fully extends the configuration capabilities of existing reconfigurable architectures, giving users more flexibility in customizing hardware for specific applications, with higher performance and energy efficiency.
We are developing an in-memory processing accelerator inspired by the concept of ternary content-addressable memory (TCAM) and enabled by emerging memory technologies, i.e., PCM and RRAM. In particular, we designed and fabricated a fully functional heterogeneous chip for the first time, providing more than 10x cell area reduction compared with a homogeneous CMOS-based design at the same technology node and setting the record for the highest density to date. The fabricated chip operates reliably at very low voltage (750 mV). This makes it an attractive solution for many data-intensive applications, such as genome matching in bioinformatics and network intrusion detection.
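For readers unfamiliar with TCAM, the following Python sketch models its matching behavior in software. Each stored entry is a pattern over 0, 1, and a "don't care" symbol (written `X` here), and a lookup returns the first entry that matches the search key. The function names and string encoding are assumptions for illustration; a real TCAM compares the key against all entries in parallel in hardware, whereas this model scans them one by one.

```python
# Illustrative software model of ternary content-addressable memory
# (TCAM) matching. Names and the string encoding are hypothetical.

def tcam_match(key, pattern):
    """Return True if the binary string `key` matches the ternary `pattern`.

    `pattern` uses '0', '1', and 'X' (don't care).
    """
    if len(key) != len(pattern):
        return False
    return all(p == "X" or p == k for k, p in zip(key, pattern))


def tcam_lookup(key, table):
    """Return the index of the first matching entry, or None if none match."""
    for index, pattern in enumerate(table):
        if tcam_match(key, pattern):
            return index
    return None
```

With a table of `["10X1", "1XXX", "0000"]`, the key `"1011"` matches entry 0, `"1100"` falls through to entry 1, and `"0001"` matches nothing. The "don't care" symbol is what makes this kind of memory useful for approximate or wildcard searches such as genome matching and intrusion-detection rule lookup.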
Other Areas of Interest
- Cyber Security: Real-time Intrusion Detection
- Near/In-Memory Processing
- Intelligent Storage
- Storage Class Memory