NeuFlow: Embedded Hardware for Real-Time Vision


NeuFlow is a new concept in computer architecture that is well suited to tasks in which the same set of operations is applied to a large number of data items, particularly to a stream of data. Our instantiation of NeuFlow is geared towards the kind of operations that occur in computer vision and image processing systems. In particular, our implementations of the NeuFlow concept can run Convolutional Network algorithms at very high speed.

A NeuFlow architecture can be seen as a grid of processing elements (tiles) that are connected to their neighbors through FIFOs and can be configured to perform a number of operations (e.g. multiplication, addition, scalar function through an interpolated lookup table, etc.). Once the grid is configured, a stream of data can be pumped through it at maximum speed, without any instruction control. It is reminiscent of some dataflow architecture concepts from the 1970s and 1980s.
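The configure-once, then-stream idea can be illustrated in software. The following is only a minimal sketch, not the NeuFlow hardware or its toolchain: Python generators stand in for tiles and the FIFOs between them, and the names `tile` and `configure_grid` are hypothetical, chosen for this illustration.

```python
def tile(op, stream):
    """A configured tile: apply one fixed operation to every item streaming past."""
    for x in stream:
        yield op(x)


def configure_grid(ops):
    """Chain tiles so that each tile's output feeds the next tile's input.

    The generator hand-off plays the role of the FIFO between neighbors.
    """
    def run(stream):
        for op in ops:
            stream = tile(op, stream)
        return stream
    return run


# Configure once: multiply by 2, add 1, then clamp to the range [0, 10].
grid = configure_grid([
    lambda x: x * 2,
    lambda x: x + 1,
    lambda x: max(0, min(10, x)),
])

# Then pump data through; no per-item instruction decoding takes place.
result = list(grid(iter([0, 1, 2, 3, 4, 5, 6])))
print(result)
```

After configuration, the only thing that changes at run time is the data, which is the property the dataflow design exploits.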

For example, a NeuFlow grid can be configured to perform a 2D convolution followed by a non-linear function and a pooling/subsampling step, as required by feature extraction methods such as SIFT, HOG, HMAX, and Convolutional Networks.
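As a reference for what such a configuration computes, here is a small pure-Python sketch of the three stages. It is an illustration under stated assumptions, not code from the NeuFlow project: the non-linearity is assumed to be tanh, the pooling is assumed to be non-overlapping 2x2 max pooling, and the function names are hypothetical.

```python
import math


def conv2d(image, kernel):
    """Valid-mode 2D convolution, written as correlation as is common for ConvNets."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + u][j + v] * kernel[u][v]
                 for u in range(kh) for v in range(kw))
             for j in range(ow)]
            for i in range(oh)]


def nonlinearity(fmap):
    """Point-wise squashing function (tanh is an assumption made for this sketch)."""
    return [[math.tanh(x) for x in row] for row in fmap]


def max_pool(fmap, size=2):
    """Non-overlapping size x size max pooling; any ragged border is dropped."""
    oh, ow = len(fmap) // size, len(fmap[0]) // size
    return [[max(fmap[i * size + u][j * size + v]
                 for u in range(size) for v in range(size))
             for j in range(ow)]
            for i in range(oh)]


# A 5x5 test image and a 2x2 kernel give a 4x4 feature map, pooled down to 2x2.
image = [[float((r * 5 + c) % 7) for c in range(5)] for r in range(5)]
kernel = [[0.0, 0.1], [0.1, 0.0]]
features = max_pool(nonlinearity(conv2d(image, kernel)))
```

Each stage consumes its input in raster order and applies the same operation everywhere, which is why the chain maps naturally onto a streaming grid.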

What Is NeuFlow?

NeuFlow is a kind of dataflow architecture composed of a grid of processing tiles that are connected to their nearest neighbors, as well as to a global data bus. Each tile can be programmed to perform a particular operation with one or two operands, such as a multiplication, addition, division, or non-linear scalar operation (linearly interpolated from a set of control points). Tiles are connected to their neighbors through programmable FIFOs. Once the grid of tiles is set up, streams of data can be pumped through the grid at maximum speed. This eliminates the need for such things as instruction caching and decoding, branch prediction, etc.
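The non-linear scalar operations mentioned above are described as linearly interpolated from a set of control points. A minimal software sketch of that technique follows. The choice of tanh, the range [-4, 4], the 17 evenly spaced control points, and the clamping outside the range are all assumptions made for illustration; the actual hardware parameters are not given here, and `make_interpolated_lut` is a hypothetical name.

```python
import bisect
import math


def make_interpolated_lut(fn, lo, hi, n_points):
    """Build a piecewise-linear approximation of fn from evenly spaced control points."""
    xs = [lo + (hi - lo) * k / (n_points - 1) for k in range(n_points)]
    ys = [fn(x) for x in xs]

    def approx(x):
        # Outside the table, clamp to the end values (an assumption of this sketch).
        if x <= xs[0]:
            return ys[0]
        if x >= xs[-1]:
            return ys[-1]
        # Find the segment k such that xs[k] <= x < xs[k + 1].
        k = bisect.bisect_right(xs, x) - 1
        t = (x - xs[k]) / (xs[k + 1] - xs[k])
        return ys[k] + t * (ys[k + 1] - ys[k])

    return approx


tanh_lut = make_interpolated_lut(math.tanh, -4.0, 4.0, 17)

# Worst-case absolute error against math.tanh on a dense grid over [-5, 5].
max_err = max(abs(tanh_lut(x / 100) - math.tanh(x / 100))
              for x in range(-500, 501))
```

The appeal of this scheme for hardware is that evaluating any scalar function reduces to one table lookup, one multiplication, and one addition, and changing the function only means loading different control points.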

The design is written as a parameterized Verilog program that can be instantiated and targeted for a variety of platforms, including Xilinx's Virtex-4 and Virtex-6.

A full-custom ASIC design derived from the FPGA version is currently being finalized. The projected performance data are given in the table below, and the high-level diagram is shown here for a 65nm IBM CMOS technology.

Publications

145. Clément Farabet, Berin Martini, Polina Akselrod, Selçuk Talay, Yann LeCun and Eugenio Culurciello: Hardware Accelerated Convolutional Neural Networks for Synthetic Vision Systems, Proc. International Symposium on Circuits and Systems (ISCAS'10), IEEE, 2010, \cite{farabet-iscas-10}.

144. Yann LeCun, Koray Kavukcuoglu and Clément Farabet: Convolutional Networks and Applications in Vision, Proc. International Symposium on Circuits and Systems (ISCAS'10), IEEE, 2010, \cite{lecun-iscas-10}.

143. Clément Farabet, Cyril Poulet and Yann LeCun: An FPGA-Based Stream Processor for Embedded Real-Time Vision with Convolutional Networks, Fifth IEEE Workshop on Embedded Computer Vision (ECV'09), IEEE, Kyoto, October 2009, \cite{farabet-ecv-09}.

138. Clément Farabet, Cyril Poulet, Jefferson Y. Han and Yann LeCun: CNP: An FPGA-based Processor for Convolutional Networks, International Conference on Field Programmable Logic and Applications, IEEE, Prague, September 2009, \cite{farabet-fpl-09}.

Press and Links

NeuFlow in the press

Following a press release in September 2010 from our friends at Yale:

Links
