NeuFlow is a new concept in computer architecture that is particularly
well-suited for tasks in which the same set of operations is applied
to a large number of data items, especially to a stream of data.
Our instantiation of NeuFlow is geared towards the kind of operations
that occur in computer vision and image processing systems.
In particular, our implementations of the NeuFlow concept can
run Convolutional Network algorithms at very high speed.
A NeuFlow architecture can be seen as a grid of processing elements
(tiles) that are connected to their neighbors through FIFOs and can be
configured to perform a number of operations (e.g. multiplication,
addition, scalar function through an interpolated lookup table, etc).
Once the grid is configured, a stream of data can be pumped through it
at maximum speed, without any instruction control. It is reminiscent of
some dataflow architecture concepts from the 1970s and 1980s.
For example, a NeuFlow grid can be configured to perform a 2D
convolution followed by a non-linear function and a pooling
(subsampling) step, as is required for feature extraction methods such
as SIFT, HOG, HMAX, and Convolutional Networks.
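The three-step stage described above (2D convolution, non-linear function, pooling) can be sketched in plain Python to make the data shapes concrete. This is only a software illustration, not the NeuFlow implementation: the function names are hypothetical, and the choice of tanh as the non-linearity and of max pooling as the subsampling step are assumptions made for the example.

```python
import math

def conv2d_valid(image, kernel):
    """'Valid' 2D correlation, the convention commonly used in ConvNets."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def apply_pointwise(fmap, fn):
    """Apply a scalar function to every element of a feature map."""
    return [[fn(v) for v in row] for row in fmap]

def max_pool(fmap, size):
    """Non-overlapping size x size max pooling (assumed pooling type)."""
    out_h = len(fmap) // size
    out_w = len(fmap[0]) // size
    return [
        [
            max(fmap[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def feature_extraction_stage(image, kernel, pool_size=2):
    """Convolution, then tanh (assumed non-linearity), then pooling."""
    convolved = conv2d_valid(image, kernel)
    activated = apply_pointwise(convolved, math.tanh)
    return max_pool(activated, pool_size)

# A 5x5 input with a 2x2 kernel gives a 4x4 map, pooled down to 2x2.
image = [[1.0] * 5 for _ in range(5)]
kernel = [[1.0, 0.0], [0.0, 1.0]]
features = feature_extraction_stage(image, kernel)
```

On a NeuFlow grid these three steps would be laid out spatially across tiles and run on a stream, rather than executed one after another on whole arrays as they are here.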
What Is NeuFlow?
NeuFlow is a kind of dataflow architecture composed of a grid of
processing tiles that are connected to their nearest neighbors, as well as
to a global data bus. Each tile can be programmed to perform a
particular operation with one or two operands, such as a
multiplication, addition, division, non-linear scalar operations
(linearly interpolated from a set of control points), etc. Tiles are
connected to their neighbors through programmable FIFOs. Once the grid
of tiles is set up, streams of data can be pumped through the grid at
maximum speed. This eliminates the need for such things as instruction
caching and decoding, branch prediction, etc.
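As a rough software analogy for this configure-then-stream model, the sketch below uses Python generators to stand in for tiles linked by FIFOs. Each tile is fixed to one operation up front, and data then flows through with no per-item instruction decoding. All names here (`make_interpolated_lut`, `binary_tile`, `unary_tile`) are hypothetical, the control points are made up for illustration, and clamping outside the control-point range is an assumption; none of this is taken from the actual Verilog design.

```python
from bisect import bisect_right

def make_interpolated_lut(xs, ys):
    """Build a scalar function linearly interpolated between control points.

    xs must be sorted in ascending order; inputs outside the range are
    clamped to the end values (an assumption for this sketch).
    """
    def f(x):
        if x <= xs[0]:
            return ys[0]
        if x >= xs[-1]:
            return ys[-1]
        k = bisect_right(xs, x) - 1          # segment containing x
        t = (x - xs[k]) / (xs[k + 1] - xs[k])
        return ys[k] + t * (ys[k + 1] - ys[k])
    return f

def binary_tile(op, left, right):
    """Two-operand tile: consumes one item from each input stream per step."""
    for a, b in zip(left, right):
        yield op(a, b)

def unary_tile(fn, stream):
    """One-operand tile: applies a fixed scalar function to each item."""
    for a in stream:
        yield fn(a)

# "Configure" the grid once: a multiplier tile feeding a non-linear tile.
squash = make_interpolated_lut([-2.0, -1.0, 0.0, 1.0, 2.0],
                               [-1.0, -0.75, 0.0, 0.75, 1.0])
pixels = iter([0.5, 1.0, 4.0])
weights = iter([1.0, 2.0, 1.0])
products = binary_tile(lambda a, b: a * b, pixels, weights)

# Then pump the stream through; nothing runs until items are pulled.
outputs = list(unary_tile(squash, products))
```

Generators are lazy, so each item passes through the whole chain before the next one enters, which loosely mirrors data moving from tile to tile through FIFOs.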
The design is written in parameterized Verilog that can be
instantiated and targeted for a variety of platforms, including Xilinx's
Virtex-4 and Virtex-6.
A full-custom ASIC design derived from the FPGA version is currently
being finalized. The projected performance data is given in the table
below, and the high-level diagram is shown here for a 65nm IBM CMOS
technology.
Publications
Clément Farabet, Berin Martini, Polina Akselrod, Selçuk Talay, Yann LeCun and Eugenio Culurciello: Hardware Accelerated Convolutional Neural Networks for Synthetic Vision Systems, Proc. International Symposium on Circuits and Systems (ISCAS'10), IEEE, 2010, \cite{farabet-iscas-10}.
Yann LeCun, Koray Kavukcuoglu and Clément Farabet: Convolutional Networks and Applications in Vision, Proc. International Symposium on Circuits and Systems (ISCAS'10), IEEE, 2010, \cite{lecun-iscas-10}.
Clément Farabet, Cyril Poulet, Jefferson Y. Han and Yann LeCun: CNP: An FPGA-based Processor for Convolutional Networks, International Conference on Field Programmable Logic and Applications, IEEE, Prague, September 2009, \cite{farabet-fpl-09}.