Education
Publications and Reports
 Adaptive
Inhomogeneous PDE Solvers for Complex Geometries
H. Langston, L. Greengard, and D. Zorin
In Preparation, 2012
 
A FreeSpace Adaptive FMMBased PDE Solver in
Three Dimensions
H. Langston, L. Greengard, and D. Zorin
Communications in Applied Mathematics and
Computational Science, vol. 6, no. 1, 2011, pp. 79–122
We present a kernelindependent, adaptive fast
multipole method (FMM) of arbi trary order
accuracy for solving elliptic PDEs in three
dimensions with radiation and periodic boundary
conditions. The algorithm requires only the
ability to evaluate the Green's function for the
governing equation and a representation of the
source distribution (the righthand side) that
can be evaluated at arbitrary points. The
performance is accelerated in three ways. First,
we construct a piecewise polynomial
approximation of the righthand side and compute
farfield expansions in the FMM from the
coefficients of this approximation. Second, we
precompute tables of quadratures to handle the
nearfield interactions on adaptive octree data
structures, keeping the total storage
requirements in check through the exploitation
of symmetries. Third, we employ sharedmemory
parallelization methods and loadbalancing
techniques to accelerate the major algorithmic
loops of the FMM. We present numerical examples
for the Laplace, modified Helmholtz and Stokes equations.
  A
massively parallel adaptive fastmultipole method on
heterogeneous architectures
I. Lashuk, A. Chandramowlishwaran, H. Langston,
T.A. Nguyen, R. S. Sampath, A. Shringarpure,
R. W. Vuduc, L. Ying, D. Zorin, and G. Biros
Proceedings of SC09 ACM/ICEE SCXY Conference Series,
pp. 111, 2009, Best Paper Finalist
We present new scalable algorithms and a new
implementation of our kernelindependent fast
multipole method (Ying et al. ACM/IEEE SC '03),
in which we employ both distributed memory
parallelism (via MPI) and shared
memory/streaming parallelism (via GPU
acceleration) to rapidly evaluate twobody
nonoscillatory potentials. On traditional
CPUonly systems, our implementation scales well
up to 30 billion unknowns on 65K cores
(AMD/CRAYbased Kraken system at NSF/NICS) for
highly nonuniform point distributions. We
achieve scalability to such extreme core counts
by adopting a new approach to scalable MPIbased
tree construction and partitioning, and a new
reduction algorithm for the evaluation
phase. Taken together, these components show
promise for ultrascalable FMM in the petascale
era and beyond.
  Cash:
Distributed Cooperative Buffer Caching
C. Decoro, H. Langston, J. Weinberger
Technical Report, 2004
Modern servers pay a heavy price in block access
time on diskbound workloads when the working set
is greater than the size of the local buffer
cache. We provide a mechanism for cooperating
servers to coordinate and share their local
buffer caches. The coordinated buffer cache can
handle working sets on the order of the
aggregate cache memory, greatly imptoving
performance on diskbound workloads. This
facility is provided with minimal communication
overhead, no penalty for local cache hits, and
without any explicit kernel support.
  A
New Parallel KernelIndependent Fast Multipole
Method
L. Ying, G. Biros, H. Langston, and
D. Zorin
Proceedings of SC03 ACM/ICEE SCXY Conference Series,
pp. 111, 2003, Best Student Paper, Gordon Bell
prize finalist
We present a new adaptive
fast multipole algorithm and its parallel
implementation. The algorithm is
kernelindependent in the sense that the
evaluation of pairwise interactions does not rely
on any analytic expansions, but only utilizes
kernel evaluations. The new method provides the
enabling technology for many important problems in
computational science and engineering. Examples
include viscous flows, fracture mechanics and
screened Coulombic interactions. Our MPIbased
parallel implementation logically separates the
computation and communication phases to avoid
synchronization in the upward and downward
computation passes, and thus allows us to fully
exploit computation and communication
overlapping. We measure isogranular and fixedsize
scalability for a variety of kernels on the
Pittsburgh Supercomputing Center's TCS1
Alphaserver on up to 3000 processors. We have
solved viscous flow problems with up to 2.1
billion unknowns and we have achieved 1.6 Tflops/s
peak performance and 1.13 Tflops/s sustained performance.

Research Projects and Codes
3D KernelIndependent FMMBased FreeSpace and Periodic Volume Solver
This volume code will be posted soon!

Brain Tumor and Heart Image Registration Experiments
Images, movies and results of work with Rahul
Sampath and George Biros. Being processed and will
be posted soon.

Kernel Independent 3D Fast Multipole Method
This is the original KIFMM3d code
of Lexing
Ying, for
which documentation is provided here.

Fluid
Dynamics Visualization
Scientific visualization performed with George
Biros and Lexing Ying,
using their embedded integral solver for the Stokes
equations. Full videos and pictures for some
results as appeared at SC03.

Line Integral
Convolution (LIC) code
Working with Denis Zorin, George Biros and Lexing Ying to
visualize 2d fluid dynamics simulations, specifically Navier Stokes
problems in both steady and unsteadystates.
Resulting images include 2d unsteady cavity problems
for different Reynold's numbers. Code is attached
as well as some movies formed from static images.

Vortex Methods
Code and images/movies for simulating fastmoving
fluids areound abjects in two and three
dimensions. Code currently being processed for
distribution and will be posted soon.

Fast Poisson Solver in 2D
Fast Fourier Transform based Poisson equation
solver in a regular two dimensional domain with
inhomogeneous Dirichlet boundary conditions. All
code in Matlab.

HotShell
A customizable Unix shell, designed to sit on top
of an existing shell with augmented commands in
Perl and Korn shell scripting languages.

FileDerived Motion Machine
A motion machine for animating an articulated
figure, in this case a human figure, derived with
simple cubes. Examples for specific textfile
inputs and JAVA code attached.

MotionCapture Experiments
Simple motion capture and image processing
experiments and projects.

