# Alan Siegel

You are welcome to look through some selected work in computer science, mathematics, and education. The current topics include technical articles plus two closely related publications about mathematics education and the TIMSS Videotape Classroom Study of teaching in Japan, Germany, and the U.S. The technical publications include work in the areas of plane geometry, probability (medians and Hoeffding bounds), the mathematical analysis of closed hashing, and the theory of fast hash functions.

Education

1. Understanding and misunderstanding the Third International Mathematics and Science Study: what is at stake and why K-12 education studies matter, Proceedings of the International Congress of Mathematicians (ICM2006), Volume III: Invited Lectures, M. Sanz-Sole, J. Soria, L.L. Varona, and J. Verdera, Ed., pp. 1599--1630 (2006).
2. Telling Lessons from the TIMSS Videotape. (The paper is here.) This analysis of the TIMSS video tape and the ensuing studies is written to let you decide for yourself what made the teaching so special.  But don't take my word for it; read the paper and do your own thinking.  If you still have doubts, go review the tapes and check out the references.  After all, that's what I did. This paper appeared as Chapter 5 in Testing Student Learning, Evaluating Teaching Effectiveness, Williamson Evers and Herbert Walberg, Ed. pp. 161--193 (2004).

Please note: there are a few links to restore below, and I have to make an alternative page for those of you who do not have an ISO-8859-1 compliant character set loaded into your browsers.  (I also used a few overlines, which might not be in the standard.)

Here is what you should see.  Well, the top row should be Greek characters and a few math characters.

 μ τ λ κ π π Σ Σ Σ √13 ∫  (that is: mu tau lambda kappa pi pi Sigma Sigma Sigma sqrt(13) integral)

I'll eventually post an HTML version of the TIMSS paper above. But for right now, you are on your own.    My apologies. A.S.

Area in the plane

We resolve a number of area optimization questions about polygons and related figures.  Some of the problems have a history that goes back 10, 30 and even 50 years.  The first proof is not new. The paper is intended to give an entertaining and intuitive proof of the isoperimetric inequality in the plane. The idea originates with Lawler.

Simple polygons (i.e., polygons that do not intersect themselves).

Self-intersecting polygons.

Arrangements of segments and the area they encompass.

Open problems.

Performance analyses for closed hashing
in a model that supports real computation

For closed hashing,  performance estimates are normally based on simplified analyses of idealized programs that cannot be implemented.  We show that the same results can be achieved for programmable hash functions that run in constant time---provided certain caching resources are available.

The performance of closed hashing with limited randomness.

The theory of fast hash functions.

Probability

Medians of discrete random variables

1. While tail estimates have received significant attention in the probability, statistics, discrete mathematics, and computer science literature, the same cannot be said for medians of probability distributions, and for good reason.

• First, there are only a few published results in this area, and even they are not at all well known (but should be).
• Second, there seems to be very little mathematical machinery for determining the medians of probability distributions.
• Third (and consequently), median estimates have not been commonly used in the analysis of algorithms, apart from the kinds of analyses frequently used for Quicksort, and the provably good median approximation schemes typified by efficient selection algorithms.
This paper addresses these issues in the following ways.
• First, a beginning framework is presented for establishing median estimates.
It is strong enough to prove, as simple corollaries, the two or three non-trivial median bounds (not so readily identified) in the literature.
• Second, several new median results are presented, which are all, apart from one, derived via this framework.
• Third, applications are offered: median estimates are shown to simplify the analysis of a variety of probabilistic algorithms and processes.
Chernoff-Hoeffding bounds (new material plus handbook-like coverage)
1. Let X be a sum of real-valued random variables with a bounded mean E[X]. The generic Chernoff-Hoeffding estimate for large deviations of X is: Prob{X − E[X] > a} ≤ min_{λ > 0} e^{−λ(a + E[X])}·E[e^{λX}], which applies with a > 0 to random variables with very small tails. At issue is how to use this method to attain sharp and useful estimates. We present a number of Chernoff-Hoeffding bounds for sums of random variables that can have a variety of dependence relationships and that might be heterogeneously distributed.

2. Chernoff-Hoeffding Bounds for Applications with Limited Independence (The paper is here.) with J. P. Schmidt and A. Srinivasan.

It is all in the title: your favorite tail estimates for sums of random variables that only exhibit limited independence, such as those produced by implementations of hash functions for computers.
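As a concrete illustration (my own, not from the papers), the generic estimate in 1. can be evaluated numerically for a sum of independent fair coins. The grid scan over λ below is a simple stand-in for an exact optimization, and the moment generating function formula is the standard one for a binomial sum.

```python
import math

def chernoff_bound(n, p, a, steps=2000):
    """Generic Chernoff-Hoeffding estimate for X ~ Binomial(n, p):
    Prob{X - E[X] > a} <= min over l > 0 of exp(-l(a + E[X])) * E[exp(lX)],
    where E[exp(lX)] = (1 - p + p*e^l)^n for a binomial sum."""
    mean = n * p
    best = 1.0
    for i in range(1, steps + 1):
        l = 5.0 * i / steps  # scan l over (0, 5]
        mgf = (1 - p + p * math.exp(l)) ** n
        best = min(best, math.exp(-l * (a + mean)) * mgf)
    return best

# Deviation of a = 10 above the mean for 100 fair coins.
bound = chernoff_bound(100, 0.5, 10)
```

For these parameters the scan lands near the optimal λ = ln 1.5 and returns roughly 0.133, slightly sharper than the plain Hoeffding estimate e^{−2t²/n} = e^{−2} ≈ 0.135 for the same deviation.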

Detailed summaries

Area in the plane
We resolve some area optimization questions about polygons and related figures.  The first paper presents a known proof of a classical isoperimetric inequality for the circle. The inequality states that among all regions in the plane with a given perimeter, the disk has the greatest area. There is a related result for polygons, which states that among all polygonal regions with a given set of side lengths, those whose vertices lie on a circle have the greatest area. As the ancient Greek geometers knew very well, this fact is readily derived from the isoperimetric inequality for the circle. But as has been noted by the Russian geometer Yaglom, a direct Euclidean proof of this basic fact about polygons has been missing. The second paper below presents a Euclidean construction that establishes this inequality directly. Let P and Q be n-sided polygonal regions with the same side lengths, and suppose that the vertices of Q lie on a circle. The algorithm partitions Q into 2n pieces that are then rearranged to cover all of P (with possible overlaps and other excesses). It follows that the area of Q is at least as large as that of P.

Simple polygons (i.e., polygons that do not intersect themselves).

1. We offer a completely elementary proof of the following basic fact:

Among all regions in the plane with a given perimeter p, the circle has the greatest area.
Direct and to the point: no compactness, no calculus, no errors.
A brief history of the problem is included, along with a discussion about Steiner and why his approach requires a compactness argument.
2. We give a new and substantial generalization of the result in 1. The naive global analysis of 1 is extended to derive the following covering property: Imagine walking along the consecutive edges of a simple but not necessarily convex polygon P and drawing a ray from each vertex. Suppose you circulate about P in a clockwise direction, and pick a new direction for each ray that represents a clockwise rotation of the preceding ray. Suppose the sum of all rotational increments adds up to 360°.
Whenever the rays from two consecutive vertices intersect, let them induce the triangular region defined by the two vertices and the point of intersection.

We show there is a fixed angle α such that if each ray is rotated by α, then the triangular regions induced by the redirected rays cover the interior of P.
This covering implies the standard isoperimetric inequalities in two dimensions, as well as several new inequalities.
The proof is technically elementary, since it does not even use calculus.  Unfortunately, a correct proof is non-trivial, and the applications use a little more math.
Self-intersecting polygons.
1. The following has been the subject of incremental improvements over the course of 50 years.
Take a self-intersecting polygon P, and consider the bounded components of the plane when the polygonal path P is removed (i.e., R² \ P). For each such component, we have, for example:

i. Its area, and the area of its convex hull.
ii. Various notions of how many times the curve P winds around the component.
The problem is to establish as strong a bound as possible, where the smaller side comprises an expression built from the values described in i. and ii. above, and the upper bound is the area of one of the following:
i. The circle with the same perimeter as P.
ii. A polygon that is inscribed in a circle and has sides with the same lengths as P.
iii. A polygon that is simple, convex, and built from a rearrangement of the edges in P, where only translations (and no rotations) are used.
Let P be a non-simple polygon in R2 with segments that are directed by a traversal along its vertices. Let E be the collection of located directed segments of P, but with some segments split into several pieces as necessary.
Let Z be a multiset of oriented simple polygons (cycles) whose edges comprise the same collection of directed located segments as those in E.
For any x in ConvHull(P), let w(x) be the maximum of 1 and the number of cycles in Z that contain x in their convex hulls.
For any oriented simple cycle C in Z, let W₊(C) be the number of cycles in Z that have the same orientation as C and contain C in their convex hulls. Let W₋(C) be the number of cycles in Z that are oriented opposite from C and contain C in their convex hulls.
Let k = 2√2 − 2, and let W(C) = W₊(C) + k·W₋(C).
Let a be the area as described in iii., which is the smallest of the three choices.
We show that:

∫_{x ∈ ConvHull(P)} w²(x) dx + Σ_{C ∈ Z} W(C)·(Area(ConvHull(C)) − Area(C)) < a.

Previous bounds used the multiplier w(x) rather than w²(x) and set W = 0, or set W = 0 and replaced a with p²/(4π), where p = Arclength(P). The former bound is not strong, and the latter is an elementary consequence of the Brunn-Minkowski inequality.
With the exception of k, no subexpression can be increased by a constant factor, and k cannot exceed (√13 − 1)/3 ≈ .869. We also offer an enhanced bound where p²/(4π) replaces a.

Arrangements of segments and the area they encompass.
1. Around 30 years ago, Fejes Tóth posed the following problem.
Let E be an arbitrary arrangement of line segments in the plane. Let A be the sum of the areas of the bounded components of the plane when the edges in E are removed (i.e., of R² \ E). View the segments as having fixed lengths but as relocatable.

Prove that A is maximized precisely when E forms a polygon with the greatest possible area.
The theorem is obviously true; the point of the problem is to devise a sound method to handle the lack of structure imposed by arbitrary arrangements of segments.
2. Some Dido-type Inequalities. (The paper is here.)

In 1989, A. and K. Bezdek posed the following problem:
Let E be an arbitrary arrangement of line segments in the plane. Suppose that E as a pointset is connected. View the segments as having fixed lengths but as relocatable.

Show that the area of the convex hull of E is maximized when E forms a polygonal path that is tightly inscribed in a semicircle whose endpoints align with the endpoints of the path.
We establish the bound, and generalize it as follows.
The requirement that the edge set be pointwise connected is replaced by a much weaker condition. The polygonal path used to get an upper bound for the area is replaced by a different path to get a smaller upper bound. The path is built from a rearrangement of the edges in P, where only translations (and no rotations) are used. It must be convex, and have a total rotational increase of consecutive edge directions that adds up to at most 180°.
Open problems.  These results raise more questions than answers.
1. One obvious set of questions concerns generalizations to higher dimensions.
2. Another:  What is the right value for k? We suspect that it is (√13 − 1)/3.
3. Others are based on the following observation.
The convex hull seems to be ill-suited for defining adjusted notions of area. The objective is to include the area of pockets but not too much more. In the Bezdeks' problem, the dilemma of catching too much area was fixed by the excessive restriction that the collection of segments be pointwise connected.  With the Umbra operation as defined below, the area that should be caught by the convex hull still is, but the connectivity requirement can be completely dropped to get the right generalization. The same issue arises in the Dido problem of Fejes Tóth. Using the convex hull just creates needless obstacles. The problem reappears when formulating sharper bounds for non-simple polygons. So:
Let F be a finite collection of segments located in the plane.
Define Umbra(F) to be the set of points x such that every line through x intersects some segment in F.  This should be good for the Bezdeks.
Define Encloak(F) to be the set of points x such that every line through x intersects at least two segments in F. This should be good for Fejes Tóth.
Why stop? Keep counting. This should be good for self-intersecting polygons, and might even give a way to unite the result with the Fejes Tóth problem.

Performance analyses for closed hashing in a model that supports real computation

In closed hashing, which is also called hashing with open addressing, data keys are inserted into a table T one-by-one.  The hash function p(x, j) is used to compute the table address of key x as follows:  x is inserted into the first vacant table location in the sequence T[p(x, 1)], T[p(x, 2)], T[p(x, 3)], . . . . This form of hashing does not use pointers or auxiliary storage. A search for a key x is achieved by testing the table locations in the same sequence until x is located or a vacant location is found, whence it will follow that x is not in T.  For specificity, let the keys belong to a finite universe U, and let T have the indices 0..n−1. The performance, as a function of the load factor α, is defined to be the expected number of locations that must be examined to insert the next key when the table contains αn keys.
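A minimal sketch of this insertion and search discipline (my illustration, not code from the papers): the `probe` argument stands in for p(x, j), and the toy linear-probing example uses a deterministic start in place of a truly random d(x).

```python
def insert(table, key, probe):
    """Closed hashing: place key at the first vacant slot in its probe sequence."""
    n = len(table)
    for j in range(1, n + 1):
        i = probe(key, j)
        if table[i] is None:
            table[i] = key
            return i
    raise RuntimeError("table is full")

def search(table, key, probe):
    """Probe the same sequence until key or a vacant slot is found."""
    n = len(table)
    for j in range(1, n + 1):
        i = probe(key, j)
        if table[i] == key:
            return i
        if table[i] is None:
            return None  # a vacancy proves the key is absent
    return None

# Toy probe: linear probing p(x, j) = d(x) - j + 1 (mod n) with d(x) = x mod n.
n = 11
table = [None] * n
lin = lambda x, j: (x % n - j + 1) % n
for k in (5, 16, 27):   # all start at slot 5; collisions chain backward
    insert(table, k, lin)
```

After the three insertions, 5 sits in slot 5, 16 in slot 4, and 27 in slot 3, and a search for an absent key stops at the first vacant slot it probes.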

Uniform hashing is an idealized model of hashing that assumes the collection of random variables p(x, 1), p(x, 2), . . . for all x in U is jointly independent and uniformly distributed over [0, n-1].
Linear probing is more realistic. It defines p(x, j) = d(x)-j+1 (mod n), where d(x), for x in U, are a family of independent uniformly distributed random variables over the range [0, n-1].

Double hashing defines p(x, j) = f(x) + (j−1)d(x) (mod n), where the table size n is prime, (f(x), d(x)) is uniformly distributed over [0, n−1] × [1, n−1], and the families f(x) and d(y), for (x, y) in U × U, are jointly independent.
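The two probe sequences above can be sketched as follows (my illustration). Here the idealized per-key random values d(x) and f(x) are simply drawn and memoized, which is exactly the unimplementable full-independence assumption that the papers below address.

```python
import random

n = 13  # prime, as the double-hashing model requires
rng = random.Random(42)

# Idealized per-key random values; a real hash function would have to
# compute these from a bounded number of seeds.
d_lin = {}   # d(x) uniform in [0, n-1] for linear probing
f_dbl = {}   # f(x) uniform in [0, n-1] for double hashing
d_dbl = {}   # d(x) uniform in [1, n-1] for double hashing

def linear_probe(x, j):
    """p(x, j) = d(x) - j + 1 (mod n)."""
    d = d_lin.setdefault(x, rng.randrange(n))
    return (d - j + 1) % n

def double_probe(x, j):
    """p(x, j) = f(x) + (j - 1) d(x) (mod n), with n prime and d(x) != 0."""
    f = f_dbl.setdefault(x, rng.randrange(n))
    d = d_dbl.setdefault(x, rng.randrange(1, n))
    return (f + (j - 1) * d) % n
```

Because n is prime and d(x) is nonzero, both probe sequences visit every table slot exactly once as j runs from 1 to n.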

Tertiary clustering is an idealized model that we also analyze. It was probably invented to formalize a model of hashing that is more realistic than uniform hashing, more efficient than linear probing, and more tractable than double hashing.

Prior work

Uniform hashing is trivial to analyze. Linear probing is much more difficult, and the exact analysis originates with Knuth.  The analysis of double hashing evolved in sporadic spurts over the course of 20 years. The first milestone was the analysis by Guibas and Szemerédi, which showed that its performance is asymptotically equivalent to uniform hashing for load factors below 0.3. Subsequently, Lueker and Molodowitch showed that the equivalence holds for any fixed load below 1.0. Ajtai, Guibas, Komlós and Szemerédi also discovered this fact around the same time.

What's missing from these analyses.

The most serious issue is that all of the hash functions are by definition unprogrammable because of the assumptions about full independence.  Real hash functions are subroutines that are initialized by some number of random seeds that are then used to compute a deterministic function of the seeds and the hash key. Thus, hash functions are really pseudo-random functions whose sole source of randomness comes from the seeds. As far as anyone knows, none of the prior analyses can be adapted to this kind of restricted randomness.

These issues are resolved in the following Part I and Part II papers, plus one more that is listed later:

The basic result is:

Asymptotically, the use of programmable hash functions with limited (but sufficient) numbers of random seeds has the same performance as idealized pure random hash functions. Moreover, our proof techniques even improve and extend some of the results for purely random hash functions.

These works offer the first analyses for closed hashing and sublinear numbers of seeds. Together with the paper discussed next, the results give an affirmative answer to the question:  Can the performance results predicted by the idealized analyses be achieved for programmable hash functions that can be evaluated in constant time? This study is also the only analysis where one proof method covers a number of different hashing schemes. The analysis is all about controlling the exponential blowup of error bounds that occurs from inclusion-exclusion arguments.

The commonality and generality in the proofs are a consequence of using the statistical characteristics of the hash functions as opposed to their specific implementation features. For example, in the case of double hashing, the two random variables p(x, j) and p(x, i) are statistically independent provided i ≠ j.  From this perspective, the performance analysis cannot distinguish among double hashing with limited or full independence and uniform hashing with limited or full independence.  Consequently, we "only" have to show that if the limited independence is large enough, then the performance, for a fixed load factor α < 1, has some very complicated formulation based on inclusion-exclusion semantics that, apart from a large number of error contributions that sum to O(1/n), is the same expression in every case. Since we know that this expression equals 1/(1 − α) + O(1/n) for uniform hashing with full independence, we do not have to evaluate the beastly mess.

A minor consequence is that the error bound for double hashing is driven down to O(1/n), which is sharper than the corresponding bound of Lueker and Molodowitch.

Lastly, the Lueker-Molodowitch proof deserves a special acknowledgment. The work is simply beautiful and worth reading. A thumbnail sketch of their work and how its proof schema influenced this study will be prepared at a later date.

The bad news is that all of these limited independence results use c log n-wise independence. So the next question is:

What computational resources are necessary for a hash function p(x) to exhibit c log n-wise independence and be computable in constant time?  The answer is in:

A family of functions F that map [0, m] → [0, n] is said to be h-wise independent if any h points in [0, m] have an image, for randomly selected f in F, that is uniformly distributed. This paper gives both probabilistic and explicit constructions of n^ε-wise independent functions, for a suitably small positive constant ε, that can be evaluated in constant time in the standard random access model of computation.  As a consequence, many probabilistic algorithms can for the first time be shown to achieve their expected asymptotic performance in a feasible model of computation. While the issue of fast, provably sound hashing has been recognized as important for decades, this work represents the first progress toward solving the problem.

We show that the speed/degrees-of-freedom tradeoff for such provably sound hash functions is actually a tradeoff between the independence h and the cache storage plus precomputation.  Loosely put, any h-wise independent hash function that uses fewer than h operations needs no more than 2h random seeds, which is wonderful. But such a program must also have, for a suitably small but positive constant δ < 1, a cache of z = n^δ pseudorandom precomputed seeds, which can be computed from the 2h random seeds.  The program will use the hash key to locate a few of the cached pseudorandom seeds, which can then be combined to produce the hash value. Of course, the location of the next seed to read can be a function of the seeds already read as well as the key, and our lower bound includes this possibility.
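The following toy sketch shows only the shape of such a cache-based evaluation, not the paper's actual construction: the cache contents, the location rule, and the mixing arithmetic are all invented for illustration.

```python
import random

def make_cached_hash(n_range, z, T, seed=0):
    """Illustrative structure only: a cache of z precomputed pseudorandom
    words; each key adaptively reads T of them and combines the values
    read to form its hash value."""
    rng = random.Random(seed)
    cache = [rng.randrange(n_range) for _ in range(z)]

    def h(key):
        acc = 0
        loc = key % z  # first cache location depends on the key
        for _ in range(T):
            acc = (acc + cache[loc]) % n_range
            # the next location may depend on the key and the seeds read so far
            loc = (loc * 31 + key + acc + 1) % z
        return acc

    return h

h = make_cached_hash(n_range=101, z=64, T=4)
```

Each evaluation costs T cache reads plus constant arithmetic, which is the speed/storage shape the lower bound below constrains; nothing here is claimed to be h-wise independent.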
We offer one lower bound with two interpretations.

Lower bound: Let F be an algorithm that hashes keys from a domain [0,D] into a range [0,R]. Suppose that the algorithm works by adaptively reading T pseudorandom seeds from a cache of z words in [0, R].  Suppose that F is h-wise independent, with a commonly used non-uniformity in the distribution that is bounded by an error parameter m.
Then:

z(z−1)(z−2)⋯(z−T+1) > (h−2)(h−3)⋯(h−T)·|D|·(1 − m/|R|).

Notice how this bound collapses when T goes from h−1 to h. It says that either T is h or the cache size z must be about as large as D^{1/T} words.

There are two additional observations worth making.  First, the bound shows that m does not appear to be an important parameter, and our constructions show that this extra freedom is useless. Second, when D >> R, it shows that the storage costs can be high.  As a consequence, we are obliged to quantify a new kind of relaxation in the statistical characteristics of F.  The resulting error is provably insignificant in terms of its influence on the performance of probabilistic algorithms.  However, this change allows D to be replaced by R in this lower bound, and our probabilistic constructions suggest that the resulting bound for T might be achievable to within a factor of 2.

The difficult part of the algorithm is in determining which cache seeds to read.  In addition to our probabilistic existence arguments, we also give explicit constructions that achieve formally comparable results but which, essentially, increase the cache size by a constant factor and increase the running time by a very large constant factor.

Lastly, we show that the problem of finding the right (i.e., truly efficient) graphs is equivalent to defining expander-like bipartite graphs with n input nodes, n^δ output nodes for δ < 1, and (small) constant input degree.  In addition, the graph should be represented by a program/data set of size n^δ or less, and the program should be able to compute the adjacency list for an input vertex in constant time.  Weaker problems of this type are open. There has been progress on these questions, but at a very slow pace.

In summary, the problem of implementing fast, highly random hash functions is formally solved so that the theoretical model is provably sound and usable.  The prospects for really implementable functions are enhanced by the identification of the right problem to solve.  On the other hand, the problem seems to be very difficult.

Applications are given for closed hashing and for the randomization necessary in a pipelined implementation of Ranade's PRAM emulation scheme on an N × log N butterfly network that uses N log N switches but only one column of N processors and memory modules. The formal performance results are the same as for Ranade's scheme, which requires N log N processors.  In terms of the processor-time characteristics, this implementation gives an optimal emulation of a PRAM.

Medians of discrete random variables
1. Applications of specific median bounds.

1. Computing tail estimates for functions of weakly dependent random variables.
2. The analysis of probabilistic processes.
A fairly direct analysis of the log log n + O(1) time expected performance of interpolation search.
Bounds applicable to double hashing.
Bounds used to prove optimal performance for a pipelined version of Ranade's PRAM emulation scheme.
2. Systematic methods for computing medians of families of random variables.

The approach uses naive analytic symmetrization to replace messy asymptotics with simple global analysis that is sort of a study of shapology. As an elementary example, let F(x) be the cumulative distribution function of a non-negative random variable with mean μ. Then F(x)+F(2μ-x) is symmetric on [0, 2μ], and almost flat. A proof that its average value on [0, 2μ] exceeds 1, along with an analysis of its critical points can show that 2F(μ)>1.
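A quick numerical check of this elementary example (my illustration, using an exponential distribution, which is not necessarily one treated in the paper): for an exponential variable, F(x) + F(2μ − x) is symmetric about μ on [0, 2μ] and averages 1 + e^{−2} there, comfortably above 1.

```python
import math

# Symmetrization check for an exponential variable with mean mu:
# G(x) = F(x) + F(2*mu - x) is symmetric about mu on [0, 2*mu],
# and its average there exceeds 1, consistent with 2*F(mu) > 1.
mu = 1.0
F = lambda x: 1 - math.exp(-x / mu) if x >= 0 else 0.0
G = lambda x: F(x) + F(2 * mu - x)

# Midpoint-rule average of G over [0, 2*mu].
N = 10_000
avg = sum(G((i + 0.5) * 2 * mu / N) for i in range(N)) / N
```

For this distribution the average works out to 1 + e^{−2} ≈ 1.135, and indeed 2F(μ) = 2(1 − 1/e) ≈ 1.26 > 1, so the mean lies below the median's defining threshold, as the symmetrization argument predicts.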
3. Specific median results for new families of random variables. Some of the pure probability results are easier to explain by example than with formal definitions of the underlying distributions.
(In the following results, we always assume that various values are integers. If not, the median will be one of the two integers nearest to the value in question.)
1. First, the main prior result that should be better known: You have 1001 coins that are independent but might not be fair or have identical probabilities of success. You know that the expected number of heads is 37. A weak interpretation of the theorem [due to Jogdeo and Samuels] reads:

When the mean of a sum of Bernoulli trials is an integer, it is also the median.

We use self-contained systematic methods to establish this bound and to prove a number of new estimates that do not follow from the Jogdeo-Samuels results.
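The integer-mean theorem is easy to check numerically on a small instance (my illustration, scaled down to ten coins with probabilities chosen for the example):

```python
def poisson_binomial(probs):
    """Exact distribution of the number of heads for independent,
    possibly non-identical coins, by dynamic programming."""
    dist = [1.0]
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            new[k] += q * (1 - p)      # this coin lands tails
            new[k + 1] += q * p        # this coin lands heads
        dist = new
    return dist

# Ten heterogeneous coins whose success probabilities sum to the integer 5.
probs = [0.5, 0.5, 0.3, 0.7, 0.25, 0.75, 0.4, 0.6, 0.8, 0.2]
dist = poisson_binomial(probs)
cdf5 = sum(dist[:6])    # Prob{X <= 5}
tail5 = sum(dist[5:])   # Prob{X >= 5}
```

Both `cdf5` and `tail5` come out at least 1/2, so the integer mean 5 is also a median, exactly as the theorem asserts.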

1. As before, you have 1001 independent heterogeneous coins and know that the expected number of heads is the integer m. You perform the following experiment (designed to eliminate outliers): Toss all the coins. Accept the answer if the number of heads is within r of m, where r is fixed. Otherwise repeat the experiment until the outcome is in the desired interval.

Then regardless of what r you choose, m is still the median.

2. You have 1001 heterogeneous red coins and 30001 heterogeneous green coins. The expected number of green heads is the integer g, and the expected number of red heads is the integer r. You repeatedly toss the coins until there are exactly g + r heads.

Then the median number of red heads is r.

3. Weighted selection: You have an urn that contains R red balls and G green balls. Each red ball weighs r grams, and each green ball weighs g grams. Let x + y balls be drawn from the urn, where x = R(1 − e^{−rt}) and y = G(1 − e^{−gt}) for some value of t. Suppose that x and y are integers.  The balls are selected one-by-one without replacement.  The probability of choosing a ball at the next round is its fraction of the total weight among the balls that remain unselected.

Then the median number of red balls selected is x.

4.  Jogdeo and Samuels gave a very sharp formulation of their bound, and we offer a corresponding version for the conditioned experiment in 1.  In brief, the J-S bound is this:

Let X be a sum of independent, arbitrarily distributed Bernoulli Trials with mean E[X]=m where m is an integer. Let Prob{X = m} = p.
Then [J-S]: | Prob{X < m} + p/2 - ½| < p/6, and this bound is tight.
This bound is a generalization of a comparable statement for the Poisson distribution, which was conjectured and partially established by Ramanujan. Our methods do not give an independent proof of this result. However, we present (via different methods) an independent result that is comparable in structure and strength. Let X be a sum of independent Bernoulli Trials with mean E[X] = m. Suppose that m is an integer. Let Y be the resulting random variable that has the conditional probability distribution as defined in 1., so that |Y- m| < r for some fixed r.  Let Prob{Y = m} = p.
Then |Prob{Y < m} + p/2 - ½| < p/4, and this bound is tight.