\documentclass[11pt]{article}
%%%%% written in LaTeX %%%
\addtolength{\oddsidemargin}{-1.25cm}
\addtolength{\textwidth}{3cm}
\addtolength{\textheight}{4cm}
\addtolength{\topmargin}{-2cm}
\newcommand{\een}{\end{enumerate}}
\newcommand{\noi}{\noindent}
\newcommand{\ra}{\rightarrow}
\newcommand{\sig}{\sigma}
\newcommand{\eps}{\epsilon}
\newcommand{\del}{\delta}
\newcommand{\Del}{\Delta}
\newcommand{\TREE}{\mbox{TREE}}
\newcommand{\lam}{\lambda}
\newcommand{\ah}{\alpha}
\newcommand{\bqo}{\begin{quote}}
\newcommand{\eqo}{\end{quote}}
\newcommand{\lnai}{\lim_{n\ra\infty}}
\newcommand{\gam}{\gamma}
\newcommand{\ben}{\begin{enumerate}}
\newcommand{\beq}{\begin{equation}}
\newcommand{\eeq}{\end{equation}}
\newcommand{\bec}{\begin{center}}
\newcommand{\ece}{\end{center}}
\newtheorem{conjecture}{Conjecture}
\newtheorem{definition}{Definition}
\newtheorem{define}{Definition}
\newtheorem{theorem}{Theorem}[section]
\newtheorem{claim}{Claim}[theorem]
\newtheorem{prop}[theorem]{Proposition}
\newtheorem{remark}[theorem]{Remark}
\newtheorem{lemma}[theorem]{Lemma}
\newtheorem{corollary}[theorem]{Corollary}
\newtheorem{guess}[theorem]{Conjecture}
\begin{document}
\bec {\Large\bf POTPOURRI}\\ Joel Spencer \ece
Some of our best work never appears in journal form. It is
in notes sent to colleagues and students. Here I have collected
some notes I have made over the years, with some current day
annotation. I have always enjoyed quotations and have sprinkled
some of my favorites through the text.
\section{Maximal Triangle-Free Graphs and Ramsey $R(3,k)$}
{\em Current Day Annotation} These notes were written
in 1995. Since 1961 the best lower bound on $R(3,k)$ had been
$ck^2\ln^{-2}k$. Building on a paper of Erd\H{o}s, Winkler and
Suen I was able to show that $c$ could be made arbitrarily
large. Why didn't I publish? Only a few weeks later Jeong-Han
Kim found that $R(3,k)=\Omega(k^2\ln^{-1}k)$, matching the upper
bound of Ajtai, Koml\'os and Szemer\'edi, so that
$R(3,k)=\Theta(k^2\ln^{-1}k)$. The ideas in these notes,
studying the random greedy triangle-free algorithm, were cited by
Tom Bohman in his analysis of this process and his alternate
proof of Kim's result.
\subsection{Results}
\bqo
Working with Paul Erd\H{o}s was like taking a walk in the hills.
Every time when I thought that we had achieved our goal and
deserved a rest, Paul pointed to the top of another hill and
off we would go.
-- Fan Chung
\eqo
We describe a random dynamic algorithm that creates a graph $G$ on
a vertex set $V=\{1,\ldots,n\}$. The $2$-sets $e\subset V$ are
called {\em pairs}. To each pair $e$ assign,
independently and uniformly, a real $x_e\in [0,n^{1/2}]$. (We
further assume the $x_e$ are distinct; this occurs with probability
one.) We call $x_e$ the birthtime of $e$. Begin at time zero with
$G$ empty. Let time increase. When an edge $e$ is born add it
to $G$ if and only if that does not create a triangle in $G$.
If $e$ is added to $G$ we say $e$ is accepted, otherwise rejected.
Let $G_c$ be $G$ at time $t=c$ and $G^f$ be the final $G$, at
time $t=n^{1/2}$. Let $Z_c,Z^f$ be the number of edges of $G_c,G^f$
respectively. All these are random variables, dependent on the
choices of the $x_e$. We will show:
\\ $\bullet$ For all $L$ there exist $c,n_0$ so that for $n>n_0$
\beq\label{one} E[Z_c] \geq L\frac{n^{3/2}}{2} \eeq
\\ $\bullet$ For all $\eps>0$ there exist $c,n_0$ so that for $n>n_0$
\beq\label{Ram} \Pr[\alpha(G_c)\geq \eps n^{1/2}(\ln n)] < 1 \eeq
In particular, there exists a graph $G=G_c$ which is triangle-free and
has no independent set of size $\eps n^{1/2}(\ln n)$. That is,
the Ramsey Function $R(3,k)>n$ for $k=\eps n^{1/2}(\ln n)$. Reversing,
for all $M>0$ if $k$ is sufficiently large then
\beq\label{newr}
R(3,k) > M \frac{k^2}{\ln^2k}
\eeq
improving Paul Erd\H{o}s's classic 1961 lower bound on $R(3,k)$.
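A quick way to get a feel for the process is to simulate it. The sketch below is an illustrative addition (the helper name is mine, not from the original notes); shuffling the pairs is equivalent to processing them in increasing order of i.i.d.\ uniform birthtimes $x_e$.

```python
import random

def greedy_triangle_free(n, seed=0):
    """Simulate the random greedy triangle-free process on n vertices.

    Processing the pairs in a uniformly random order is equivalent to
    processing them in increasing order of i.i.d. birthtimes x_e."""
    rng = random.Random(seed)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    rng.shuffle(pairs)
    adj = [set() for _ in range(n)]
    for u, v in pairs:
        # the pair {u,v} would close a triangle iff u,v have a common neighbor
        if adj[u].isdisjoint(adj[v]):
            adj[u].add(v)
            adj[v].add(u)
    return adj

adj = greedy_triangle_free(200)
edges = sum(len(a) for a in adj) / 2
print(edges)  # number of edges of the final (maximal triangle-free) graph
```

By construction the final graph is triangle-free, and it is maximal: every rejected pair had a common neighbor at the moment of rejection, and neighborhoods only grow.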
\par Fix a pair $e=\{i,j\}$. We say $e$ survives at time
$c$ if there is no $k\neq i,j$ with $\{i,k\},\{j,k\}\in G_c$. Let $f_n(c)$
be the probability that $e$ survives at time $c$ given $x_e=c$. This
is independent of the particular $e$. In an infinitesimal time range
$c$ to $c+dc$ there is probability $n^{3/2}dc/2$ that some edge $e$ is
born and probability $n^{3/2}f_n(c)dc/2$ that an edge is accepted.
Thus
\beq\label{aa}E[Z_c]=\frac{n^{3/2}}{2}F_n(c) \eeq
where we define
\beq\label{aa1} F_n(c)=\int_0^c f_n(t)dt \eeq
We shall give an explicit function
$f(c)$ so that
\beq\label{two} \lnai f_n(c) = f(c) \eeq
and further the limit is uniform in that for every $C,\eps>0$ there exists $n_0$
so that $|f_n(c)-f(c)|<\eps$ for all $n>n_0$ and all $0\leq c\leq C$.
We'll further show, by explicit integration, that
\beq\label{three} \int_0^{\infty} f(c)dc = \infty \eeq
Let's show that this implies \ref{one}.
Pick $C$ so that $\int_0^C f(c)dc > L+1$.
Pick $n_0$ so that for $n>n_0$ and $0\leq c\leq C$
we have $|f_n(c)-f(c)|<\frac{1}{C}$. Then
\[ F_n(C) = \int_0^C f_n(c)dc \geq \int_0^C f(c)dc - 1 > L \]
and \ref{aa} gives \ref{one}.
\subsection{A Branching Process}\label{branch}
To define $f(c)$ we consider a branching process beginning with a root ``Eve'' with
birthdate $c$. Eve gives birth to ordered twins, with birthdates $x,y$. The set
of ``twinbirthdates'' $(x,y)$ is given by a Poisson distribution with unit
density over $[0,c]\times[0,c]$. That is, for any region
$R\subseteq [0,c]\times[0,c]$ the number of twinbirthdates lying in $R$
has a Poisson distribution with mean the area of $R$, independently over
disjoint regions. Each child with birthdate $x$ in turn gives birth to
ordered twins with twinbirthdates given by a Poisson distribution with
unit density over $[0,x]\times[0,x]$, and so on recursively; as
birthdates strictly decrease down the tree, the process terminates with
probability one. A node {\em survives} if it has no pair of twins both
of which survive, and $f(c)$ is the probability that Eve survives.
\par Now fix $c>0$, condition on
$x_e=c$, and consider $f_n(c)$. Define the {\em relevant history}
of $e$ to be a set $T$ of edges defined as follows. $e\in T$.
If $\{u,l\}\in T$ and $x_{u,v},x_{l,v}< x_{u,l}$ then $\{u,v\},
\{l,v\}\in T$. We can find $T$ by a breadth first search: we
search an edge $\{u,l\}$ already in $T$ by checking whether any $v$
satisfies the condition and, if so, adding those edges to $T$. We
call the relevant history {\em normal} if every time such a $v$
is found it is a vertex that has not yet appeared in any of the
edges of $T$. When the relevant history is normal we give
$T$ a twintree structure, letting $\{u,v\},\{l,v\}$ be twins
of $\{u,l\}$, with $\{u,v\}$ the firstborn if and only if
$u<l$. Let $REP$ denote the set of vertices appearing in the edges
of $T$. Now for each $u\not\in REP$ and each edge
$i=\{top(i),bot(i)\}$ of $T$ let $B_{u,i}$ be the ``bad'' event that
$x_{u,v}<x_i$ for both $v=top(i)$ and $v=bot(i)$. Let $\eps>0$ be arbitrarily small
and let $FIN$ be a finite family of twintrees so that the branching
process yields a $T\in FIN$ with probability at least $1-\frac{\eps}{2}$.
(E.g., $FIN$ could be all twintrees with at most some large number $D$ of
edges.) Now use \ref{lim} to pick $n_0$
so that for $n>n_0$ and each of the finite number of $T\in FIN$
\[ |f_n(T,c)-f(T,c)|< \frac{\eps}{2|FIN|} \]
Then $f_n(c)$ is at least the probability that there is a normal
relevant history with twintree $T\in FIN$ with the root surviving
and that is at least $f(c)-\eps$. Also $1-f_n(c)$ is at least
the probability that there is a normal relevant history with twintree
$T\in FIN$ with the root not surviving and that is at least
$1-f(c)-\eps$. As $\eps$ was arbitrary this yields \ref{two}.
\par The required uniformity over $c\in [0,C]$ for \ref{two} is
easy to check. From \ref{EN} given $\eps>0$ we may pick $FIN$
that works for every $c\in [0,C]$ simultaneously. An examination
of the proof of \ref{lim} gives that the limit is approached
uniformly for $c\in [0,C]$.
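The branching process itself is easy to sample, which gives a numerical handle on $f(c)$. The sketch below is an illustrative addition (the function names are mine): it draws a Poisson number of twin pairs with mean $c^2$, each pair uniform in $[0,c]\times[0,c]$, and recurses.

```python
import math
import random

def poisson(mean, rng):
    # Knuth's multiplication method; fine for the small means used here
    limit = math.exp(-mean)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def survives(birthdate, rng):
    """A node survives iff it has no pair of twins both of which survive."""
    for _ in range(poisson(birthdate ** 2, rng)):
        x, y = rng.uniform(0, birthdate), rng.uniform(0, birthdate)
        if survives(x, rng) and survives(y, rng):
            return False
    return True

rng = random.Random(2024)
trials = 20000
est = sum(survives(1.0, rng) for _ in range(trials)) / trials
print(est)  # Monte Carlo estimate of f(1)
```

The estimate can be compared with the implicit solution derived in the next subsection, under which $f(c)=e^{-F(c)^2}$ with $c=\int_0^{F(c)}e^{t^2}dt$.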
\subsection{A Differential Equation}
\bqo It's a thing that non-mathematicians don't realize. Mathematics is
actually an esthetic subject almost entirely. -- John Conway
\eqo
Here we find $f(c)$ as the solution to a differential equation.
Consider Eve with birthdate $c+\Delta c$. For Eve to survive
she must have no twins both surviving with twinbirthdate
$(x,y)\in [0,c]^2$ nor twins both surviving with twinbirthdate
$(x,y)\in X$ where we set $X=[0,c+\Delta c]^2-[0,c]^2$.
The Poisson nature of Eve's births make these independent
events. Thus
\beq\label{ww} f(c+\Delta c) = f(c)(1-A) \eeq
where $A$ is the probability that Eve does have twins, both surviving,
with twinbirthdate $(x,y)\in X$. We first
bound $0\leq A\leq 2c\Delta c + (\Delta c)^2$, the latter being
an upper bound on the probability Eve has twins with twinbirthdates
in this interval. By itself, this implies that $f$ is continuous
and nonincreasing. Then $f$ is integrable. We define the integral
\beq F(u)=\int_0^u f(t)dt \eeq
Let $Z$ be the number
of Eve's twins with twinbirths in $X$, both surviving. Then $E[Z]$
is simply the integral of $f(x)f(y)$ over $(x,y)\in X$. Splitting
$X$ into three rectangles and using Fubini's Theorem
\[ E[Z]=2[F(c+\Delta c)-F(c)]\cdot F(c)+ [F(c+\Delta c)-F(c)]^2 \]
For $c$ fixed we do asymptotics with $\Delta c\rightarrow 0$. As $f$
is nonincreasing the last term is at most $(f(c)\Delta c)^2=o(\Delta c)$.
By continuity (and the fundamental
theorem of calculus!)
\[ F(c+\Delta c)-F(c) \sim f(c)(\Delta c) \]
so that
\[ E[Z] \sim 2f(c)F(c)(\Delta c) \]
Consider $E[Z]$ as $A$ plus the sum over $i\geq 2$ of $i-1$ times the probability
Eve has $i$ twinbirths in $X$, both twins surviving. Even neglecting the
both surviving requirement this sum is $O((\Delta c)^2)$. Thus
\[ A \sim 2f(c)F(c)(\Delta c) \]
so that \ref{ww} becomes
\[ \frac{f(c+\Delta c)-f(c)}{\Delta c} \sim -2f^2(c)F(c) \]
which becomes (in $F$) the second order differential equation
\beq\label{de}
F''(c) = -2(F'(c))^2F(c) \eeq
At $c=0$ we have the initial conditions
\beq\label{ini} F(0)=0, f(0)=F'(0)=1 \eeq
\par Fortuitously (?!) this differential equation has the
precise implicit solution
\beq\label{sol}
c = \int_0^{F(c)} e^{t^2}dt \eeq
which does indeed have the property that $\lim_{c\rightarrow\infty}
F(c)=\infty$. This gives \ref{three} and therefore \ref{one}.
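One can confirm numerically that the implicit solution really does solve the initial value problem. The sketch below is an added check (my code, not part of the original notes): it integrates $F''=-2(F')^2F$ with a small-step Euler scheme and compares $c$ with $\int_0^{F(c)}e^{t^2}dt$ at $c=1,2,3$.

```python
import math

# Integrate F'' = -2 (F')^2 F with F(0)=0, F'(0)=1 by an Euler scheme.
steps_per_unit = 100000
h = 1.0 / steps_per_unit
F, dF = 0.0, 1.0
checks = []
for step in range(1, 3 * steps_per_unit + 1):
    F += h * dF
    dF += h * (-2.0 * dF * dF * F)
    if step % steps_per_unit == 0:
        c = step * h
        # compare with the implicit solution c = int_0^F exp(t^2) dt,
        # evaluated by a fine midpoint Riemann sum
        m = 20000
        integral = sum(math.exp((F * (i + 0.5) / m) ** 2)
                       for i in range(m)) * F / m
        checks.append(abs(integral - c))
print(checks)  # each residual should be small
```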
\\ {\em Remark and Conjecture.} Let $G^f,Z^f$ be the final $G$ and
its number of edges as defined in our opening paragraph. Note that
while the use of independent $x_e$ proved to be a handy analytic
tool we could equally well have defined $G^f$ as follows. Randomly
order the ${n\choose 2}$ pairs. Begin with $G=\emptyset$.
Add each edge to $G$ if it would not create a triangle. Then $G^f$
is the final value of $G$. What is the usual value of $Z^f$? As
$Z^f\geq Z_c$ we've shown that $E[Z^f]$ grows faster than $n^{3/2}$.
{\em We conjecture that $Z^f=\Theta(n^{3/2}(\ln n)^{1/2})$ almost
always.} We know that for $c$ fixed $E[Z_c]\sim F(c)n^{3/2}/2$.
A simple analysis of \ref{sol} gives that
\beq\label{asymF} F(c) \sim (\ln c)^{1/2} \eeq
asymptotically as $c\rightarrow\infty$. If we ``plug in'' the
final value $c=n^{1/2}$ this would give the conjecture. We
emphasize that this is not a valid argument: the limiting relation
between $f_n(c)$ and $f(c)$ held only for $c$ a constant, albeit an
arbitrarily large one, not for $c$ a function of $n$. We also note
that the results of the next section indicate that, at least to some
extent, $G_c$ can be regarded as the random graph $G(n,p)$ with $p$
chosen so that the two models have the same expected number of edges.
If this applied to $G^f$ and if the expected number of edges in $G^f$
were $n^{3/2}(\ln n)^{1/2}$ then the simple argument of the next
section would give that almost surely $\alpha(G^f)=O(n^{1/2}(\ln n)^{1/2})$,
so that $R(3,k)>n$ with $k=\Theta(n^{1/2}(\ln n)^{1/2})$ or,
reversing variables, $R(3,k)=\Omega(k^2(\ln k)^{-1})$. This
would match the upper bound of Ajtai, Koml\'os and Szemer\'edi.
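The ``simple analysis'' behind (\ref{asymF}), sketched here for completeness: integration by parts gives
\[ \int_0^F e^{t^2}dt = \frac{e^{F^2}}{2F}(1+o(1)) \]
as $F\ra\infty$, so taking logarithms in \ref{sol} gives
$\ln c = F(c)^2 - \ln(2F(c)) + o(1)$ and hence $F(c)\sim (\ln c)^{1/2}$.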
\\ {\em Remark.} We've shown $G_c$ has expected size $F(c)n^{3/2}/2$.
N. Alon has given an intuitive justification for this. {\em Suppose}
$G_c$ behaved like a random graph with $p=F(c)n^{-1/2}$. By time
$c+dc$ an additional $\frac{1}{2}n^{3/2}dc$ pairs are born. The
probability that a pair has a common neighbor in $G(n,p)$
is $(1-p^2)^{n-2}\sim \exp[-F(c)^2]$. Thus it would be reasonable
to expect $\exp[-F(c)^2]\frac{1}{2}n^{3/2}dc$ pairs to be
accepted. This would give $F(c+dc)=F(c)+\exp[-F(c)^2]dc$. Taking
$dc$ infinitesimal this gives a differential equation with solution
\ref{sol}.
\subsection{Ramsey $R(3,k)$}
Our object here is to show \ref{Ram}. For intuitive guidance
in view of \ref{one} let's consider instead of $G_c$
the usual random graph $G\sim G(n,p)$ with $p=Ln^{-1/2}$.
Let $k=\eps n^{1/2}(\ln n)$. There are ${n\choose k}$ potential
independent sets of size $k$, each independent with probability
$(1-p)^{{k\choose 2}}$, so the expected number of independent sets
of size $k$ is at most
\[ {n\choose k}(1-p)^{{k\choose 2}} \leq \left[ne^{-p(k-1)/2}\right]^k
= \left[n^{1-\frac{L\eps}{2}+o(1)}\right]^k \]
which approaches zero whenever $L\eps>2$.
\section{Lower Bounds for Large Deviations}
Let $X$ be a random variable with Laplace Transform
$f(\lam)=E[e^{\lam X}]$ and, for a fixed real $a$, set
$g(\lam)=f(\lam)e^{-\lam a}$, so that the Chernoff Bound gives
$\Pr[X>a]\leq g(\lam)$ for every $\lam\geq 0$. For $u,\eps>0$ one has
the complementary lower bound
\beq\label{onex} \Pr[X>a-u] \geq e^{-\lam u}\left[g(\lam)-e^{-\eps u}[g(\lam+\eps)+
g(\lam-\eps)]\right]
\eeq
In application we select $\lam=\lam_0$ so as to minimize (or nearly
minimize) $g(\lam)$. Then we select $\eps$ fairly small. As $g$
was minimized at (or near) $\lam_0$ we should have $g(\lam\pm\eps)$
fairly close to $g(\lam)$. We select $u$ with, say, $g(\lam\pm\eps)/g(\lam)
< \frac{1}{4}e^{\eps u}$. Now the $g(\lam\pm\eps)$ terms have limited
effect and we would have $\Pr[X>a-u] \geq e^{-\lam u}g(\lam)/2$. Hopefully,
this will be fairly close to the upper bound $g(\lam)$.
Let's try (\ref{onex})
with the standard normal $N$ where we know the Laplace Transform
$f(\lam) = e^{\lam^2/2}$. Let $a$ be large. We set $\lam=\lam_0=a$ and
$g(\lam)=e^{-a^2/2}$, the Chernoff Bound. Here, rather conveniently,
$g(\lam\pm \eps)/g(\lam) = e^{\eps^2/2}$. Thus
\[ \Pr[N\geq a-u] \geq g(\lam)e^{-au}[1-2e^{-\eps u}e^{\eps^2/2}] \]
Suppose we take $\eps=2$ and $u=2$. This gives
\[ \Pr[N\geq a-2] \geq e^{-a^2/2}e^{-2a}[1-2e^{-2}] \]
Note that we have only used the three values $f(a-2),f(a),f(a+2)$ of
the Laplace Transform to derive this bound. This compares to the
upper bound $\Pr[N\geq a-2] \leq e^{-(a-2)^2/2} = \Theta(e^{-a^2/2}e^{2a})$.
So the bounds are off by a factor of $\Theta(e^{4a})$. This is not great
but for $a$ large it does give the correct asymptotics for the logarithm
of the large deviation.
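These numbers are easy to check against the true normal tail. The sketch below is an added check (`normal_tail` is my helper, using the standard identity $\Pr[N\geq t]=\frac{1}{2}\mbox{erfc}(t/\sqrt{2})$):

```python
import math

def normal_tail(t):
    # Pr[N >= t] for a standard normal N
    return 0.5 * math.erfc(t / math.sqrt(2.0))

for a in (4.0, 6.0, 8.0):
    lower = math.exp(-a * a / 2) * math.exp(-2 * a) * (1 - 2 * math.exp(-2))
    upper = math.exp(-((a - 2) ** 2) / 2)   # Chernoff bound at a - 2
    tail = normal_tail(a - 2)
    print(a, lower <= tail <= upper)  # -> True for each a
```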
In many applications one does not have the precise values of the
Laplace Transform $f(\lam)$. Suppose, however, that we have reasonably
good estimates {\em in both directions} on $f(\lam)$. Then (\ref{onex})
will give a lower bound for $\Pr[X>a-u]$ by using an upper bound for
$g(\lam)$ and lower bounds for $g(\lam\pm\eps)$.
The applications work particularly well when there is some parameter
$n$ and the Laplace Transform is exponential in $n$. A standard
example is to take $X=S_n$ (the sum of $n$ random $\pm 1$) and
parametrize $a=n\ah$ for some fixed $\ah\in (0,1)$. The Laplace
Transform
\[ E[e^{\lam S_n}]=E[e^{\lam X_1}]^n = (\cosh(\lam))^n \]
So that \[ g(\lam)=e^{h(\lam)n} \] where we set
$h(\lam)=\ln(\cosh(\lam))-\ah\lam$.
Here there is a $\lam=\lam_0$ (which can be computed explicitly using
Calculus) where $h$ is minimized and the Chernoff Bound gives
\[ \Pr[S_n >n\ah] < e^{nh(\lam)} \]
For the lower bound we set in (\ref{onex}) $u=n\del$ where $\del$
is arbitrarily small. Because (critically) $h$ has its minimum
at $\lam$ we have $h'(\lam)=0$ so for $\eps$ small
\[ h(\lam\pm\eps) \leq h(\lam) + \frac{K}{2}\eps^2 \]
Here we take $K$ so that $|h''(s)|\leq K$ for all $s$ in an interval
$I$ around $\lam$ that contains $[\lam-\eps,\lam+\eps]$.
Then
\[ \ln\left[\frac{e^{-\eps u}g(\lam\pm\eps)}{g(\lam)}\right] \leq
n[-\eps\del + \frac{K}{2}\eps^2] \]
We select $\eps$ positive with $-\eps\del + \frac{K}{2}\eps^2<0$.
Now the terms $e^{-\eps u}g(\lam\pm\eps)$ are exponentially small
compared to $g(\lam)$ and so (\ref{onex}) gives
\[ \Pr[S_n>n(\ah-\del)] \geq e^{-\lam \del n}g(\lam)(1-o(1)) \]
or
\beq \label{twox}
\ln[\Pr[S_n>n(\ah-\del)]] \geq n[h(\lam)-\lam\del] -o(1)
\eeq
From this we would like to deduce:
\[ \lim_{n\ra\infty}\frac{1}{n}\ln[\Pr[S_n>n\ah]] = h(\lam) \]
That is,
the Chernoff bound is, up to a $1+o(1)$ factor in the exponent, correct.
Indeed, this is the case and the following is a fairly general setting
in which one can match the Chernoff Bounds with a logarithmically
asymptotic lower bound.
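For the $\pm 1$ example the matching can already be seen numerically. In this added sketch (my code) the exact tail is summed in log-space via `math.lgamma`, and $\frac{1}{n}\ln\Pr[S_n>n\ah]$ is compared with $h(\lam_0)$ where $\lam_0=\mbox{artanh}(\ah)$:

```python
import math

def log_tail(n, alpha):
    """ln Pr[S_n > n*alpha] for S_n a sum of n independent uniform +-1's.
    With k the number of +1's, S_n = 2k - n, so we sum over k > n(1+alpha)/2."""
    kmin = math.floor(n * (1 + alpha) / 2) + 1
    logs = [math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            - n * math.log(2) for k in range(kmin, n + 1)]
    m = max(logs)
    return m + math.log(sum(math.exp(t - m) for t in logs))

alpha = 0.3
lam0 = math.atanh(alpha)                       # h'(lam) = tanh(lam) - alpha = 0
h0 = math.log(math.cosh(lam0)) - alpha * lam0  # optimal Chernoff exponent
n = 4000
print(log_tail(n, alpha) / n, h0)  # the two numbers nearly agree
```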
Let $Z_n$
be any sequence of random variables. Define
\[ F(\lam) = \lim_{n\ra\infty} \frac{1}{n}\ln E[e^{\lam Z_n}] \]
noting that the limit might not exist. Let $a$ be a real number.
\noindent {\tt Theorem:} Suppose that there exists a
$\lam\geq 0$ and an open interval $I$ containing $\lam$ such that
\\ $\bullet$ $F(s)$ exists and has a first and second derivative for all
$s\in I$.
\\ $\bullet$ $F'(\lam)=a$
\\ $\bullet$ The function $F'$ is strictly increasing
over $I$.
\\ $\bullet$ There exists $K$ such that $|F''(s)|\leq K$ for all $s\in I$.
\\ Then
\[ \lim_{n\ra\infty}\frac{1}{n}\ln[\Pr[Z_n>an]] = F(\lam)-a\lam \]
This may also be written $\Pr[Z_n>an] = e^{n(F(\lam)-a\lam +o(1))}$.
\noindent {\tt Note:} For any $Z$ the function $F(s)=\ln E[e^{sZ}]$ has
$F''(s)\geq 0$ so that $F'$ is increasing. The $F$ in the theorem is
defined by a limit and is required to have $F'$ strictly increasing, but this
does occur in many natural cases.
The upper bound is the Chernoff Bound as
\[ \Pr[Z_n>an] = \Pr[e^{\lam Z_n}>e^{\lam an}] \leq E[e^{\lam Z_n}]
e^{-\lam an} =
e^{n(F(\lam)-a\lam +o(1))} \]
For the lower bound we want to apply the bounds above.
First note that since $F'$ is continuous (as it is differentiable)
and monotone over $I$ it has a continuous inverse $H$ defined over
some interval $J$ containing $a$. Note $H(a)=\lam$.
Let $u$ be a positive real sufficiently small that
$H(a+u)\pm \frac{u}{K} \in I$. All sufficiently small $u$ satisfy
this since $\lim_{u\ra 0}H(a+u)\pm \frac{u}{K} = H(a)=\lam$.
Set $a^*=a+u$
and $\lam^*=H(a^*)$ so that $F'(\lam^*)=a^*$.
We define
\[ g_n(s)=E[e^{sZ_n}]e^{-sa^*} \]
Inequality (\ref{onex}) becomes (noting that $an=a^*n-un$)
\[ \Pr[Z_n>an] \geq e^{-\lam^*a^*n}\left[g_n(\lam^*)-e^{-\eps un}[
g_n(\lam^*+\eps)+g_n(\lam^*-\eps)]\right] \]
We select $\eps = \frac{u}{K}$. Our selection of $u$
assures us that $\lam^*\pm \eps$ belong to $I$. We have
\[ \lim_{n\ra\infty} \frac{1}{n}
\ln\left[\frac{e^{-\eps un}g_n(\lam^*+\eps)}
{g_n(\lam^*)}\right] = -\eps u + F(\lam^*+ \eps)-F(\lam^*)- \eps a^* \]
We have selected $\lam^*$ so that $F'(\lam^*)=a^*$. Since
$|F''(s)|\leq K$ in the interval $I$, Taylor Series bounds give
\[ | F(\lam^*+ \eps)-F(\lam^*)- \eps a^*| \leq \frac{K}{2}\eps^2 \]
Our choice of $\eps$ (chosen to minimize the quadratic though
any sufficiently small $\eps$ would do) gives that
\[ -\eps u + F(\lam^*+ \eps)-F(\lam^*)- \eps a^* \leq -\frac{u^2}{2K} \]
Thus $e^{-\eps un}g_n(\lam^*+\eps)/g_n(\lam^*)$ drops exponentially
quickly. We only use that for $n$ sufficiently large the ratio is
less than $0.25$. The same argument shows that for $n$ sufficiently
large $e^{-\eps un}g_n(\lam^*-\eps)/g_n(\lam^*)< 0.25$. For such
$n$ we then have
\[ \Pr[Z_n>an] \geq \frac{1}{2}e^{-\lam^*a^*n}g_n(\lam^*) \]
This lower bound is $\exp[n(F(\lam^*)-\lam^*a^*+o(1))]$.
Now consider $F(\lam^*)-\lam^*a^*$ as a function of $u$. As
$u\ra 0$, $\lam^*=H(a+u) \ra H(a)=\lam$. As $F$, being differentiable, is
continuous $F(\lam^*)\ra F(\lam)$. Clearly $a^*=a+u\ra a$ and therefore
$\lam^*a^*\ra \lam a$. Let $\eps_1$ be an arbitrary positive real number.
As
\[ F(\lam^*)-\lam^*a^* \ra F(\lam)-\lam a \]
there exists a positive $u$ such that
\[ F(\lam^*)-\lam^*a^* \geq F(\lam)-\lam a - \eps_1 \]
This gives a lower bound on $\Pr[Z_n>an]$ of $\exp[n(F(\lam)-\lam a -
\eps_1 + o(1))]$. As $\eps_1$ is arbitrary we deduce
$\Pr[Z_n > an] \geq \exp[n(F(\lam)-\lam a + o(1))]$, which completes
the argument.
\section{Percolating Thoughts}
\bqo I have no home, the world is my home. -- Paul Erd\H{o}s \eqo
{\em Current Day Annotation:} These notes were sent out on
December 30, 2001. The letter format certainly allows a free
form of expression. Many of the conjectures have been shown
in the past nine years in joint work with Nick Wormald and now
in current work with Mihyun Kang and Will Perkins, and through
work of many others as well.
\noindent Dear Friends,
As the first year of the new millennium comes to a close I invite you
to put aside thoughts of the state of the world. Let's talk math!
At the Pozna\'n meeting last summer
Svante Janson showed me the following intriguing explanation (not proof!)
for why the Erd\H{o}s-R\'enyi explosion, creating the giant component,
occurs at $t\frac{n}{2}$ edges with $t=1$. Let $t$, thought of as time,
be such that $t\frac{n}{2}$ edges have been put into the random graph. For any
$G$ on $n$ vertices let $X=X(G)=\frac{1}{n}\sum_v |C(v)|$ where $C(v)$ is
the component containing $v$. So $X$ is the expected size of the component
containing a uniformly chosen vertex, a natural notion in percolation.
Equivalently, letting $C_i, 1\leq i\leq s$ denote the components,
$X= \frac{1}{n}\sum_{i=1}^s |C_i|^2$. Add a random edge giving $G^+$.
If the edge joins distinct components $C_i$ and $C_j$ then $X$ increases
by $\frac{2|C_i||C_j|}{n}$, and as the two endpoints are uniform one computes
$E[X(G^+)]-X(G)\approx \frac{2X^2}{n}$. Since $\frac{n}{2}dt$ edges are added
in time $dt$ this suggests $X'(t)=X^2(t)$ with $X(0)=1$, whose solution
$X(t)=\frac{1}{1-t}$ explodes precisely at $t=1$.
\par In the Achlioptas process two random edges are presented at each round
and one of them, Carole's choice, is added to the graph. Call a rule for
Carole a {\em Size Algorithm} if her choice depends only on the sizes
$a,b,c,d$ of the components containing the four endpoints of the two edges.
(Two examples: the {\em product rule} selects the edge minimizing the
product of the component sizes of its two endpoints, and the {\em minimin
rule} selects the edge with an endpoint in the smallest of the four
components.) A {\em Bounded Size Algorithm} with bound $K$ is a Size
Algorithm in which all sizes bigger than $K$ are treated alike, so that the
choice depends only on the values of $a,b,c,d$ in $\{1,\ldots,K\}\cup
\{>K\}$ saying which edge to select.
Our first algorithm is of that type with $K=1$: Select the first edge if
$a=b=1$, otherwise the second edge.
\noindent {\tt Conjecture 3} Any Size Algorithm has a critical value $t_0$ such
that at $t_0-\epsilon$ the largest component is $O(\ln n)$ while at
$t_0+\epsilon$ the largest component is $\Omega(n)$.
\\ {\tt Conjecture 4} Further, at $t_0+\epsilon$ the second largest component
is $O(\ln n)$.
Given a Size Algorithm a restriction to $K$ is a Bounded Size Algorithm with
that $K$ that agrees with the Size Algorithm when $a,b,c,d\leq K$. There can
be many such restrictions, as the restriction does not determine what to
do when, for example, $a,b,c\leq K$ and $d>K$.
\noindent {\tt Conjecture 5} For any Size Algorithm and any positive $\delta$
there exists $K_0$ such that all restrictions with $K\geq K_0$ have critical
value within $\delta$ of the critical value of the original algorithm.
Any Bounded Size Algorithm yields, as in the example, differential equations
for the proportion $y_i(t)$ of vertices in components of size $i$ for $1\leq i\leq K$.
These are all nice functions with no critical points and $\lim_{t\rightarrow\infty}
y_i(t)=0$ for all $i$. We can show that the random algorithm will stay close
to those values. Further there is a differential equation for $f(t)$, the
expected size of the component of a random vertex as earlier defined and this
differential equation has a blowup at some value $t^*$.
\noindent {\tt Conjecture 6} The blowup point $t^*$
is the critical point $t_0$ which satisfies Conjecture 3.
\noindent {\em HornBlowing:} We (Nick Wormald and I) have shown that at
$t^*-\epsilon$ the largest component has size $O(\ln^{O(1)}n)$. Thus the
giant component has not yet appeared. We have tried some algorithms,
for example the product and minimin rules described above with $K=128$, and had
the computer find the blowup point $t^*$. We have found algorithms with
$t^*$ as large as $1.78$. Thus, in terms of Achlioptas's original problem,
Carole can stave off the giant component at least until $1.78$ or, in the
other notation, until $0.89n$ edges have been accepted.
Given a rule, say the product rule, we have a number of percolation questions.
Let $t_0$ be the critical value and let $g(i)$ be the proportion of vertices in
components of size $>i$ at the critical value.
\noindent {\tt Question 1} What are the asymptotics of $g(i)$?
In the usual Erd\H{o}s-R\'enyi evolution $t_0=1$ and $g(i)$ is the probability
that the Poisson mean one birth process has size bigger than $i$; the asymptotics
are $g(i)\sim ci^{-1/2}$. Some early computer studies indicate that this is
not the asymptotics for $g(i)$ for the product rule, though we are not even sure
what to conjecture here.
\noindent {\tt Question 2} What is the proper scaling for the critical window?
\\ {\tt Question 3} How large is the largest component inside the critical
window?
For the Erd\H{o}s-R\'enyi evolution the scaling is $ \frac{n}{2}+\lambda n^{2/3}$
when written in terms of the number of edges. When $\lambda\rightarrow\infty$
a dominant component has emerged whose size is much
greater than the second largest component whereas when $\lambda
\rightarrow-\infty$ the largest components are nearly
the same size. For any fixed $\lambda$ the largest components have size $\Theta(n^{2/3})$. Is there a similar scaling $t_0n+\lambda n^{\gamma}$
with the largest components of size $\Theta(n^{\eta})$ for
the product rule? For the minimin rule? Is the value $\gamma$ giving
the size of the scaling window the same for different rules? Can we nicely
describe the random process inside the scaling window? Lots and lots of
questions here. No answers. Yet!
\\ -- Joel
\section{Stirling's Formula}
\bqo If you take a number and double it and double it again and then double
it a few more times, the number gets bigger and bigger and goes higher
and higher and only arithmetic can tell you what the number is when
you quit doubling.
-- from {\em Arithmetic} by Carl Sandburg
\eqo
{\em Current Day Annotation:} Notes for students.
Surely the most beautiful asymptotic formula in all of
mathematics is Stirling's Formula:
\beq \label{stirling} n! \sim n^ne^{-n}\sqrt{2\pi n} \eeq
How do the two most important fundamental constants
of mathematics, $e$ and $\pi$, find their way into
an asymptotic formula for the product of integers?
We give two very different arguments (one will not
show the full formula) that, between them, illustrate
a good number of basic asymptotic methods.
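Before starting, one can watch (\ref{stirling}) take hold numerically. This added sketch (my code) compares logarithms via `math.lgamma`; the gap is the well known first correction term $\frac{1}{12n}$.

```python
import math

for n in (10, 100, 1000, 10000):
    exact = math.lgamma(n + 1)  # ln n!
    stirling = n * math.log(n) - n + 0.5 * math.log(2 * math.pi * n)
    print(n, exact - stirling)  # close to 1/(12n)
```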
\subsection{Asymptotic Estimation of an Integral}
Consider the integral
\beq \label{stirxn} I_n = \int_0^{\infty}x^ne^{-x}dx \eeq
A standard result of Freshman Calculus, done by Integration
by Parts, is that
\beq \label{xx20} I_n = n! \eeq
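(The Integration by Parts step, recorded for completeness: for $n\geq 1$,
\[ I_n = \left[-x^ne^{-x}\right]_0^{\infty} + n\int_0^{\infty}x^{n-1}e^{-x}dx
= nI_{n-1} \]
and $I_0=\int_0^{\infty}e^{-x}dx=1$, so induction gives $I_n=n!$.)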
Our problem now is to estimate the
integral of (\ref{stirxn}).
\bqo $\bullet$ Asymptotically, Integrals are Dominated
by the largest value of the function being integrated. \eqo
Let us set
\beq\label{aa111}
y=y_n(x)=x^ne^{-x}\mbox{ and } z=z_n(x)=\ln y = n\ln x - x \eeq
Setting $z'=nx^{-1}-1=0$ we find that $z(x)$ (and hence $y(x)$) has a
maximum at $x=n$.
Let's compare $y(n)=n^ne^{-n}$ with values of $y(x)$ when $x$ is
``near" $n$. For example, take $x=1.1n$.
\beq\label{aa2} y(1.1n) = (1.1n)^ne^{-1.1n}= y(n)(1.1e^{-0.1})^n \eeq
But $1.1e^{-0.1}=0.9953\cdots$. While this number is close to one it
is a constant less than one and so $y(1.1n)$ is exponentially smaller
than $y(n)$. Values near $1.1n$ will make a negligible contribution to
the integral.
Let's move closer and try $x=n+1$. Now
\beq\label{aa3} y(n+1) = (n+1)^ne^{-n-1}= y(n)(1+\frac{1}{n})^ne^{-1} \eeq
As $(1+\frac{1}{n})^n\sim e$, $y(n+1)\sim y(n)$ and so values near $x=n+1$
do contribute substantially to the integral.
Moving from $x=n$ in the positive direction (the negative is similar) the
function $y=y(x)$ decreases. If we move out $1$ (to $x=n+1$) we do not
yet ``see" the decrease while if we move out $0.1n$ (to $x=1.1n$) we
decrease is so strong that the function has effectively disappeared.
(Yes, $y(1.1n)$ is large in an absolute sense but it is small relative
to $y(n)$.) How do we move out from $x=n$ so that we can effectively
see the decrease in $y=y(x)$? This is a question of {\em scaling}.
\bqo $\bullet$
Scaling is the art of asymptotic integration \eqo
Let's look more carefully at $z(x)$ near $x=n$.
Note that an additive change in $z(x)$ means a multiplicative change
in $y(x)=e^{z(x)}$.
We have
$z'(x)=nx^{-1} - 1=0$ at $x=n$. The second derivative
$z''(x) = -nx^{-2}$ so that $z''(n)=-n^{-1}$. We can
write the first terms of the Taylor Series for $z(x)$
about $x=n$:
\beq\label{aa4}
z(n+\eps) = z(n) - \frac{1}{2n}\eps^2 + \cdots \eeq
This gives us a heuristic explanation for our earlier
calculations.
When $\eps=1$ we have
$\frac{1}{2n}\eps^2\ra 0$ so $z(n+\eps)=z(n)+o(1)$ and
thus $y(n+\eps)\sim y(n)$. When $\eps=0.1n$ the
opposite is indicated as $\frac{1}{2n}\eps^2$ is large.
The middle ground is given when $\eps^2$ is on the
order of $n$, when $\eps$ is on the order of $\sqrt{n}$.
We are thus led to the scaling $\eps=\lam\sqrt{n}$, or
\beq\label{aa5} x = n+\lam\sqrt{n} \eeq
We formally make this substitution in the integral (\ref{stirxn}).
Further we take the factor $y(n)=n^ne^{-n}$ outside the integral
so that now the function has maximal value one. We have scaled
both axes. The scaled function is
\beq\label{aa6} g_n(\lam) = \frac{y(n+\lam\sqrt{n})}{y(n)}
= (1+\lam n^{-1/2})^ne^{-\lam n^{1/2}} \eeq
and we find
(noting that $dx = \sqrt{n}d\lam$)
\beq\label{aa7}
\int_0^{\infty}x^ne^{-x}dx = n^ne^{-n}\sqrt{n}
\int_{-\sqrt{n}}^{+\infty}g_n(\lam)d\lam
\eeq
The Taylor Series with error term gives
\beq\label{aa8} \ln(1+\eps) = \eps - \frac{1}{2}\eps^2 + O(\eps^3) \eeq
as $\eps\ra 0$.
Let $\lam$ be an arbitrary but fixed real number. Then $\lam n^{-1/2}\ra 0$
so that
\beq\label{aa9} n\ln(1+\lam n^{-1/2}) -\lam n^{1/2}= \lam n^{1/2} -\frac{1}{2}\lam^2
+ o(1) -\lam n^{1/2} = -\frac{1}{2}\lam^2 + o(1) \eeq
and
\beq \label{aa10} g_n(\lam) \ra e^{-\lam^2/2} \eeq
That is, when properly scaled, the function $y=x^ne^{-x}$ looks like
the bell shaped curve!
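The convergence (\ref{aa10}) can be watched numerically. This added sketch (my code) computes $g_n$ through logarithms, with `math.log1p` for accuracy when $\lam n^{-1/2}$ is tiny:

```python
import math

def g(n, lam):
    # g_n(lam) = (1 + lam n^{-1/2})^n exp(-lam sqrt(n)), computed in logs
    return math.exp(n * math.log1p(lam / math.sqrt(n)) - lam * math.sqrt(n))

for lam in (-1.0, 0.5, 2.0):
    print(lam, g(10 ** 8, lam), math.exp(-lam * lam / 2))  # nearly equal
```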
Now we would {\em like} to say
\beq \label{aa11} \lim_n
\int_{-\sqrt{n}}^{+\infty}g_n(\lam)d\lam =
\int_{-\infty}^{\infty}e^{-\lam^2/2}d\lam = \sqrt{2\pi} \eeq
Interchanging limits in the integration of a sequence of functions
requires justification. We quote a classic result:
\noi {\bf Arzela's Theorem:} Let $f_n$ be a sequence of Riemann
integrable functions on an interval $[a,b]$. Suppose $f$ is
also Riemann integrable on $[a,b]$ and
$f_n(\lam)\ra f(\lam)$ for each $\lam\in [a,b]$. Suppose further
there is a constant $K$ so that $|f_n(x)|\leq K$ for all $n$ and
all $x\in [a,b]$. Then $\lim_n\int_a^bf_n(x)dx = \int_a^b f(x)dx$.
In our examples, however, the limits of integration are either
infinity or approaching infinity in $n$. We use the following
extension:
\noi {\bf Extended Arzela's Theorem:} Let $f_n$ be a sequence of Riemann
integrable functions on the real line. Suppose
$f_n(\lam)\ra f(\lam)$ for each real $\lam$. Suppose $f$ is Riemann
integrable on the real line and that $\int_{-\infty}^{\infty}f(x)dx$
exists. Suppose further that for all $L$
there is a constant $K$ so that $|f_n(x)|\leq K$ for all $n$ and
all $x\in [-L,+L]$. Suppose further that for all $\eps>0$ there
exists an $L$ and an $n_0$ so that for all $n\geq n_0$
\beq\label{aa12} |\int_L^{+\infty} f_n(x)dx| < \eps \mbox{ and }
|\int_{-\infty}^{-L} f_n(x)dx| < \eps \eeq
Then $\lim_n\int_{-\infty}^{\infty}f_n(x)dx =
\int_{-\infty}^{\infty}f(x)dx$.
In our instance the functions $g_n(\lam)$ have domain $[-\sqrt{n},\infty)$
but we can extend them to the full real line by simply defining
$g_n(\lam)=0$ for $\lam<-\sqrt{n}$. As we have normalized by dividing
$y_n(n+\lam \sqrt{n})$ by its maximal value $y_n(n)$ we have
$g_n(\lam)\leq 1$ for all $n$ and all $\lam$. (Clearly, all functions
are nonnegative as well.)
It remains to bound the ``tail" of the functions $g_n$. Here we can
employ rough upper bounds for the integrals as we just need to show
that they approach zero appropriately. The technical difficulty is
that the estimate of $\ln(1+\eps)$ by $\eps-\frac{1}{2}\eps^2$ is
only valid for $\eps$ small and we require bounds that work for all
$\eps$. The following specific bounds are often useful:
\beq\label{aa13} \ln(1+\eps) \leq \eps - \frac{1}{2}\eps^2
\mbox{ when } -1 < \eps \leq 0 \eeq
\beq\label{aa14} \ln(1+\eps) \leq \eps - \frac{1}{4}\eps^2
\mbox{ when } 0 < \eps \leq 1 \eeq
\beq \label{aa15}
\ln(1+\eps) \leq 0.7\eps \mbox{ when } \eps > 1 \eeq
Applying (\ref{aa13}) to (\ref{aa6}) we see that for $0>\lam>-\sqrt{n}$
\beq\label{aa16}
\ln g_n(\lam)\leq n(\lam n^{-1/2}-\frac{1}{2}\lam^2n^{-1})-\lam\sqrt{n}
= -\frac{1}{2}\lam^2
\eeq
for all $n$ so that
\beq\label{aa17} \int_{-\infty}^{-L}g_n(\lam)d\lam \leq
\int_{-\infty}^{-L}e^{-\lam^2/2}d\lam \eeq
which does go to zero with $L$. That is, on the negative side
the function $g_n(\lam)$ is uniformly bounded by the
bell shaped curve.
The positive side is slightly more complex. We split
$\int_{L}^{\infty}g_n(\lam)d\lam = I_1+ I_2$ with
\beq\label{aa18} I_1 =
\int_{L}^{\sqrt{n}}g_n(\lam)d\lam \mbox{ and }
I_2= \int_{\sqrt{n}}^{\infty} g_n(\lam)d\lam \eeq
From (\ref{aa14},\ref{aa15}) respectively we have
\beq\label{aa19}
I_1 \leq \int_L^{\infty} e^{-\lam^2/4}d\lam \eeq
\beq\label{aa20}
I_2 \leq \int_{\sqrt{n}}^{\infty} e^{-0.3\lam n^{1/2}}d\lam \eeq
Both integrals go uniformly to zero and hence
so does their sum.
Hence the conditions for the extended Arzela's Theorem
are met, (\ref{aa11}) has been justified, $I_n$ has been
asymptotically evaluated and Stirling's
Formula has been proven.
Observe that $I_2$ is actually {\em extremely} small, on
the order of $\exp[-\Theta(n)]$. We employed
a rather crude bound (\ref{aa15}) to bound it. This
embodies two general principles
\bqo $\bullet$ Crude upper bounds can be used for negligible terms
as long as they stay negligible.\eqo
\bqo $\bullet$ Terms that are extremely small often require quite
a bit of work. \eqo
\subsection{Approximating Sums by Integrals}
\bqo I was interviewed in the Israeli Radio for five minutes and I said that more
than 2000 years ago, Euclid proved that there are infinitely many primes.
Immediately the host interrupted me and asked ``Are there still infinitely
many primes?''
-- Noga Alon \eqo
Our object here will be to estimate the logarithm of $n!$
via the formula
\beq\label{ab1} S_n:= \ln(n!) = \sum_{k=1}^n\ln(k) \eeq
The notion is that $S_n$ should be close to the integral of
the function $\ln(x)$ between $x=1$ and $x=n$. We set
\beq \label{ab2}
I_n:= \int_1^n\ln(x)dx = \left[x\ln(x)-x\right]^n_1 = n\ln(n)-n+1 \eeq
Let $T_n$ be the value for the approximation of the integral
$I_n$ via the trapezoidal rule, using step sizes one. So
\beq\label{ab3}
T_n = \frac{1}{2}\ln(1)+\sum_{k=2}^{n-1}\ln(k) + \frac{1}{2}\ln(n)
= S_n - \frac{1}{2}\ln(n) \eeq
Set $E_n=I_n-T_n$, the error when approximating the integral of $\ln(x)$
by the trapezoidal rule. For $1\leq k\leq n-1$ let $\sig_k$ denote
the ``sliver'' of area under the curve $y=\ln(x)$ for $k\leq x\leq k+1$
but over the straight line between $(k,\ln(k))$ and $(k+1,\ln(k+1))$.
(We write $\sig_k$ rather than $S_k$, which already denotes the
partial sums of (\ref{ab1}).)
The curve lies over the straight line as the curve is concave. Then
$E_n= \sum_{k=1}^{n-1}\mu(\sig_k)$ where $\mu$ denotes the area.
The $\mu(\sig_k)$ are all positive so the sequence $E_n$ is an increasing
one. Now we bound $\mu(\sig_k)$ from above. As $\ln(x)$ is concave its
derivative is decreasing and so is always between $\frac{1}{k+1}$ and
$\frac{1}{k}$ in $k\leq x\leq k+1$. Thus
\beq\label{ab4}
\ln(k) + \frac{1}{k+1}(x-k) \leq \ln x \leq \ln(k) + \frac{1}{k}(x-k)
\eeq
Between $x=k$ and $x=k+1$ the curve $y=\ln(x)$ therefore lies below
the straight line $y=\ln(k)+\frac{1}{k}(x-k)$. As
$\ln(k+1)\geq \ln(k) + \frac{1}{k+1}$ (set $x=k+1$ in the left
inequality of (\ref{ab4})) the
trapezoidal line from $(k,\ln(k))$ to $(k+1,\ln(k+1))$ lies over
the straight line $y=\ln(k)+\frac{1}{k+1}(x-k)$. Thus the sliver
$\sig_k$ is contained in the triangle, call it $\Delta_k$,
created by these two lines
cut off at $x=k$ (where they meet) and $x=k+1$. Consider the
line segment from $(k+1,\ln(k)+\frac{1}{k+1})$ to
$(k+1,\ln(k)+\frac{1}{k})$ as the base of $\Delta_k$.
The base has length $\frac{1}{k} -\frac{1}{k+1}$ and the altitude is one.
Thus
\beq\label{ab5} \mu(\sig_k)\leq \mu(\Delta_k) \leq \frac{1}{2}(\frac{1}{k}-
\frac{1}{k+1}) \eeq
We return to the errors $E_n$. We have
\beq\label{ab6} \sum_{k=1}^{\infty}\mu(\sig_k) \leq
\sum_{k=1}^{\infty}\frac{1}{2}(\frac{1}{k}-\frac{1}{k+1}) = \frac{1}{2}
\eeq
Critically, the sum converges. As the $E_n$ form an increasing sequence
there must be a real number $c$, $0\leq c\leq \frac{1}{2}$,
so that $E_n\ra c$ as $n\ra\infty$.
That is, we may write $E_n=c-o(1)$ and $I_n=T_n+c+o(1)$ and so
\beq\label{ab7}
T_n = I_n-c-o(1) = n\ln(n)-n+1-c-o(1) \eeq
\beq\label{ab8}
S_n = T_n+\frac{1}{2}\ln(n) = n\ln(n)-n+\frac{1}{2}\ln(n)+1-c-o(1) \eeq
Taking exponentials of both sides we find
\beq\label{ab9}
n! \sim n^ne^{-n}\sqrt{n}\,e^{1-c} \eeq
where the constant $e^{1-c}$ lies between $\sqrt{e}$ and $e$. This
method does not give the
actual value of the constant which, as we have seen, is $\sqrt{2\pi}$.
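As a sanity check on the derivation (not part of the proof; it assumes Python's `math.lgamma` for $\ln n!$), the errors $E_n$ computed directly do increase toward a limit $c$ in $[0,\frac{1}{2}]$ with $e^{1-c}=\sqrt{2\pi}$:

```python
import math

# Compute E_n = I_n - T_n directly and watch it converge to
# c = 1 - ln(sqrt(2*pi)), consistent with Stirling's constant.
def E(n):
    S = math.lgamma(n + 1)           # S_n = ln(n!)
    I = n * math.log(n) - n + 1      # I_n, integral of ln(x) on [1, n]
    T = S - 0.5 * math.log(n)        # T_n, trapezoidal approximation
    return I - T

c = 1 - math.log(math.sqrt(2 * math.pi))
vals = [E(n) for n in (10, 100, 1000, 10000)]
assert all(a < b for a, b in zip(vals, vals[1:]))  # E_n is increasing
assert all(0 <= v <= 0.5 for v in vals)            # within [0, 1/2]
assert abs(vals[-1] - c) < 1e-4                    # E_n -> c
```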
\section{Notes on Asymptotics}
{\em Current Day Annotation:} Notes for students.
Let's start with the Taylor series
\beq \ln(1-\eps)=-\eps - \frac{\eps^2}{2} - \frac{\eps^3}{3} - \cdots \eeq
valid for $|\eps|<1$ though we will only be interested in $\eps$ small
positive. This is too much information so we cut it down in a variety
of ways:
\beq \ln(1-\eps) \sim -\eps \mbox{ when } \eps=o(1) \eeq
and with error terms
\beq \ln(1-\eps) = -\eps + O(\eps^2) \eeq
\beq \ln(1-\eps) = -\eps - \frac{\eps^2}{2} + O(\eps^3) \eeq
\beq \ln(1-\eps) = -\eps - \frac{\eps^2}{2} - \frac{\eps^3}{3} + O(\eps^4) \eeq
These will suffice for our purposes.
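A quick numerical check that each truncation's remainder has the stated order (the sample values of $\eps$ are arbitrary choices):

```python
import math

# For each cutoff, the remainder should be of the order of the
# first omitted term; we test at eps = 0.01 and eps = 0.001.
for eps in (0.01, 0.001):
    exact = math.log(1 - eps)
    assert abs(exact + eps) < eps ** 2                            # O(eps^2)
    assert abs(exact + eps + eps ** 2 / 2) < eps ** 3             # O(eps^3)
    assert abs(exact + eps + eps ** 2 / 2 + eps ** 3 / 3) < eps ** 4  # O(eps^4)
```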
Now let's examine the asymptotics of ${n\choose k}$ when $n,k\ra\infty$.
We write:
\beq {n\choose k} = \frac{(n)_k}{k!} \sim \frac{n^ke^k}{k^k\sqrt{2\pi k}}\,A
\eeq
where we set
\beq A:= \frac{(n)_k}{n^k} = \prod_{i=0}^{k-1} (1-\frac{i}{n}) \eeq
So if we get $A$ we get the binomial coefficient.
It is more convenient to work with
\beq B:= \ln A =
\sum_{i=0}^{k-1} \ln(1-\frac{i}{n}) \eeq
For $k=o(n)$ we have
\beq B \sim
\sum_{i=0}^{k-1} -\frac{i}{n}\sim -\frac{k^2}{2n} \eeq
and thus we can write
\beq A = e^{-\frac{k^2}{2n}(1+o(1))} \eeq
This does not give the full asymptotics of $A$ as the $1+o(1)$ is
in the exponent. We go further as follows:
\beq B=
\sum_{i=0}^{k-1} -\frac{i}{n}+ O(i^2n^{-2})= -\frac{k^2}{2n}
+ O(k^3n^{-2}) \eeq
So {\em if} $k=o(n^{2/3})$, $B=-\frac{k^2}{2n}+ o(1)$ and we
have the asymptotic formula
\beq A = e^{-\frac{k^2}{2n}}(1+o(1)) \eeq
In particular:
\beq \mbox{ If } k=o(n^{1/2}) \mbox{ then } A\sim 1 \eeq
\beq \mbox{ If } k\sim cn^{1/2} \mbox{ then } A\sim e^{-\frac{c^2}{2}} \eeq
{\em If} $k=o(n^{3/4})$ we go to the next approximation:
\beq B=
\sum_{i=0}^{k-1} -\frac{i}{n}-\frac{i^2}{2n^2} + O(i^3n^{-3})= -\frac{k^2}{2n}
-\frac{k^3}{6n^2} +
O(k^4n^{-3}) \eeq
and the error term is $o(1)$ so that we have the asymptotic formula
\beq A = e^{-\frac{k^2}{2n}}e^{-\frac{k^3}{6n^2}}(1+o(1)) \eeq
In particular (this case will come up a number of times)
\beq \mbox{ If } k\sim cn^{2/3} \mbox{ then } A\sim
e^{-\frac{k^2}{2n}}e^{-\frac{c^3}{6}} \eeq
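The two regimes can be illustrated numerically (the particular $n$ and $c$ below are arbitrary choices made for the test):

```python
import math

# B = ln A = sum of ln(1 - i/n), i = 0..k-1, computed directly.
def log_A(n, k):
    return sum(math.log1p(-i / n) for i in range(k))

n = 10 ** 6

# Regime k ~ c*sqrt(n): A -> exp(-c^2/2)
c = 2.0
k = round(c * math.sqrt(n))
assert abs(log_A(n, k) - (-c * c / 2)) < 0.01

# Regime k ~ c*n^(2/3): A ~ exp(-k^2/(2n)) * exp(-c^3/6)
c = 1.0
k = round(c * n ** (2 / 3))
assert abs(log_A(n, k) - (-k * k / (2 * n) - c ** 3 / 6)) < 0.01
```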
BTW, the inequality
\beq \ln(1-\eps)< -\eps \mbox{ or, equivalently } 1-\eps < e^{-\eps} \eeq
is valid for all $\eps\in (0,1)$ and can be pretty handy.
\section{Summing Over Primes $\leq n$}
\bqo $317$ is a prime, not because we think so, or because our minds are shaped in
one way rather than another, but {\em because it is so}, because mathematical
reality is built that way.
-- G.H. Hardy
\eqo
{\em Current Day Annotation:} Notes for students.
Let $f(x)$ be some reasonable (e.g.: $f(x)=x^{-1}$ or $f(x)=\sqrt{x}$) function
on the positive integers.
Here we see how to asymptotically (as $n\ra\infty$) evaluate
$\sum_{p\leq n}f(p)$,
where the sum is restricted to {\em primes} $p\leq n$. As usual, we
let $\pi(x)$ denote the number of primes $\leq x$. We shall use the
Prime Number Theorem
\[ \pi(x) \sim \frac{x}{\ln x} \]
The key is the following identity:
\[ \sum_{p\leq n}f(p)=\sum_{x=2}^n f(x)(\pi(x)-\pi(x-1)) =
f(n)\pi(n) + \sum_{x=2}^{n-1}\pi(x)(f(x)-f(x+1)) \]
Let's look at the particular (and important) case $\sum_{p\leq n}\frac{1}{p}$.
So $f(x)=\frac{1}{x}$. The first term $f(n)\pi(n)\sim \frac{1}{n}\frac{n}
{\ln n}=\frac{1}{\ln n}= o(1)$, which we shall soon see is negligible. The second term
\[ \sum_{x=2}^{n-1}\pi(x)(f(x)-f(x+1)) \sim \sum_{x=2}^{n-1}\frac{x}{\ln x}
\frac{1}{x(x+1)} \sim \sum_{x=2}^{n-1} \frac{1}{x\ln x} =
\ln\ln n + O(1) \]
Thus
\[ \sum_{p\leq n}\frac{1}{p} = \ln\ln n + O(1) + o(1) = \ln\ln n + O(1) \]
Let's take another example: $\sum_{p\leq n} p^2$, so $f(x)=x^2$. Here the
first term $n^2\pi(n) \sim\frac{n^3}{\ln n}$ which we shall soon see is {\em not}
negligible. The second term
\[ \sum_{x=2}^{n-1}\pi(x)(f(x)-f(x+1)) \sim \sum_{x=2}^{n-1}\frac{x}{\ln x}
(-2x) \sim -\frac{2}{3}\frac{n^3}{\ln n} \]
The last sum is not immediate. Take, say, $A=n\ln^{-1}n$ and split the sum
into $2\leq x < A$ and $A \leq x \leq n-1$. We bound the first part
in absolute value by ignoring the denominator $\ln x$, so it is
$O(\sum_2^A x^2) = O(A^3) = o(n^3\ln^{-1}n)$, while in the range of the second part the
denominator $\ln x \sim \ln n$ so that the sum is asymptotic to
$\ln^{-1}n \sum_A^{n-1}(-2x^2) \sim -\frac{2}{3}n^3\ln^{-1}n$.
Altogether $\sum_{p\leq n} p^2 \sim \frac{1}{3}n^3\ln^{-1}n$.
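Both prime sums can be checked with a small sieve (illustrative only; the bracketing constants in the assertions are empirical observations, including the value $\approx 0.2615$, the Mertens constant, for the $O(1)$ term):

```python
import math

# Sieve of Eratosthenes; returns all primes up to n.
def primes_upto(n):
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytes(len(range(p * p, n + 1, p)))
    return [p for p in range(2, n + 1) if sieve[p]]

n = 10 ** 6
ps = primes_upto(n)

# sum 1/p minus ln(ln n) stays bounded (it converges to ~0.2615).
diff = sum(1 / p for p in ps) - math.log(math.log(n))
assert 0.2 < diff < 0.35

# sum p^2 is close to n^3/(3 ln n); convergence is slow since the
# relative error is O(1/ln n), hence the loose bracket.
ratio = sum(p * p for p in ps) / (n ** 3 / (3 * math.log(n)))
assert 1.0 < ratio < 1.2
```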
This method is actually far more general. Let $S$ be an infinite set
of positive integers and let $\pi_S(n)$ be the number of $y\in S$
with $1\leq y\leq n$. Let $f(n)$ be some reasonable function on the
positive integers.
The general problem is to find the asymptotics (as $n\ra\infty$) of
\[ \sum_{x\leq n, x\in S} f(x) \]
When $1\not\in S$ (otherwise add the term $f(1)$) the precise formula is
\[ \sum_{x\leq n, x\in S} f(x) = f(n)\pi_S(n) +
\sum_{x=2}^{n-1}\pi_S(x)(f(x)-f(x+1)) \]
Frequently, an asymptotic formula for $\pi_S(n)$ and the estimate
$f(x)-f(x+1)$ by $-f'(x)$ will yield the desired asymptotics.
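The identity itself is exact and easy to verify mechanically. Here is a toy check (the set $S$ of squares and $f(x)=1/x$ are arbitrary choices, with $1\not\in S$ as required):

```python
# Verify: sum_{x<=n, x in S} f(x)
#         = f(n)*pi_S(n) + sum_{x=2}^{n-1} pi_S(x)*(f(x) - f(x+1))
n = 50
S = {y * y for y in range(2, 8)}                 # {4, 9, 16, 25, 36, 49}
pi_S = lambda x: sum(1 for y in S if y <= x)     # counting function of S
f = lambda x: 1.0 / x

lhs = sum(f(x) for x in S if x <= n)
rhs = f(n) * pi_S(n) + sum(pi_S(x) * (f(x) - f(x + 1)) for x in range(2, n))
assert abs(lhs - rhs) < 1e-12                    # agree up to rounding
```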
\section{Paul Erd\H{o}s}
\bqo To me, it does not seem unlikely that on some shelf of the universe
there lies a total book. I pray the unknown gods that some man -
even if only one man, and though it have been thousands of years
ago! - may have examined and read it. If honor and wisdom and
happiness are not for me, let them be for others. May heaven
exist, though my place be in hell. Let me be outraged and annihilated,
but may Thy enormous Library be justified, for one instant, in one
being.
-- from The Library of Babel by Jorge Luis Borges
\eqo
{\em Current Day Annotation} Reprinted with the kind permission
of the J\'anos Bolyai Society, this piece originally appeared
in a volume commemorating Paul Erd\H{o}s's eightieth birthday.
Erd\H{o}s was, is, and will be the center of my mathematical
life, as he is for so many others.
\begin{center} {\Large FOR THE CLASS OF $'68$} \end{center}
\begin{quote}
The search for truth is more precious than its possession
{\em Einstein} \end{quote}
Paul's memory for dates always amazes. ``It was in the Journal
of the London Math Society, 1949,'' he'll say, and there it
will be. For one anecdote though I too recall the date, April
1970, as my firstborn had a fetal role. Paul was the principal
lecturer at a meeting of the New York Academy of Sciences. He
and his nonagenarian mother had a suite at a New York hotel.
When my bride MaryAnn and I arrived there was already a good-sized
group of mathematicians hard at work. Paul's mother, diligently
learning her fourth language, English, took MaryAnn into the
other room and I joined the mathematical conversation. Or
rather, conversations, as the ten of us formed three distinct
subgroups in (if memory serves) Number Theory, Set Theory, and
Combinatorics. Three discussions were occurring simultaneously,
conjectures and theorems were flying thick and fast. Paul was
at the apex of this trialogue, leading and contributing to all
groups at once. It was, one well recalls, a heady moment for
a budding young Combinatorialist. We'd been at this for perhaps
half an hour and I confess to having forgotten the ladies in
the other room. But Paul had not. He suddenly turned and called
to his mother in rapid Hungarian. In her conversation with MaryAnn
there was some problem with her English and Paul was explaining
the correct usage. It appears there was yet a fourth simultaneous
conversation.
\par Those were tumultuous times. In my land the Vietnam war
raged: Amerika the villain. The revolution of 1989 arrived
in Eastern Europe but history slipped for a generation, our
generation. French youth had the gall to try to change the
world. ``Do your own thing'' was the admonition that resonated
so powerfully. Resist Authority. Nonconformity was the
supreme virtue. This was the backdrop for our first
collaborations with Uncle Paul. But while others spoke
constantly of it, nonconformity was always Paul's
modus operandi. He had no job; he worked constantly.
He had no home; the world was his home. Possessions were
a nuisance, money a bore. Paul lived, lives, on a web of trust,
travelling ceaselessly from Center to Center spreading his mathematical
pollen. ``Prove and Conjecture!'' was, and is, his constant
refrain.
\par Were we, in those halcyon days, students of Uncle Paul? I
think the word inadequate and inaccurate. Better to say that we
were {\em disciples} of Paul Erd\H{o}s. We (and the list is
long indeed) had energy and talent. Paul, through his actions
and his theorems and his conjectures and every fibre of his
being, showed us the Temple of Mathematics. The Pages of the
Book were there, we had only to open them. Did there, for every
$k,r>0$, exist a graph $G$ which when $r$-edge colored necessarily
yielded a monochromatic $K_k$ and yet had clique number merely $k$
itself? We had no doubts - the answer was either Yes or No.
The answer was in The Book. Pure thought, our thought, would
allow its reading.
\par With maturity we've learned that The Book did not open at random.
Paul was showing us the way. The conjectures were structured, the
Pages were forming Sections and Chapters. Now its custodianship
passes to us. ``Future Directions of $X$ Theory'' are our choice
to make. Can we give to our students the passion that Paul gave
to us? Paul is a unique point; imitation will necessarily fall
short. We can give our $\epsilon$, it is an effort well worth
making.
\par Now Paul: let $A_i\subset\{1,\ldots,n^2/2\}$, $1\leq i\leq m$,
be random $n$-sets. With $m=cn^22^n$, in your Acta Math. Hungarica
paper in 1964, Property $B$ almost surely failed. Suppose instead
$m