Usual coordinates:
Friday, 2:00PM
Room 1314
Warren Weaver Hall
251 Mercer Street
Spring 2016 Schedule
Upcoming talks
Thursday April 28
12:30PM
WWH 905
Gregory Valiant (Stanford)
Title: When your big data seems too small: accurate inferences beyond the empirical distribution
Abstract: We discuss two problems related to the general challenge of making
accurate inferences about a complex distribution, in the regime in
which the amount of data (i.e the sample size) is too small for the
empirical distribution of the samples to be an accurate representation
of the underlying distribution. The first problem we consider is the
following basic learning task: given independent draws from an unknown
distribution over a discrete support, output an approximation of the
distribution that is as accurate as possible in L1 distance (ie total
variation distance). Perhaps surprisingly, it is
often possible to "de-noise" the empirical distribution of the samples
to return an approximation of the true distribution that is
significantly more accurate than the empirical distribution, without
relying on any prior assumptions on the distribution. We present an
instance optimal learning algorithm which optimally performs this
de-noising for every distribution for which such a de-noising is
possible. One curious implication of our techniques is an algorithm
for accurately estimating the number of new domain elements that would
be seen given a new larger sample, of size up to n*log n.
(Extrapolation beyond this sample size is provable information
theoretically impossible, without additional assumptions on the
distribution.) While these results are applicable generally, we
highlight an adaptation of this general approach to some problems in
genomics (e.g. quantifying the number of unobserved protein coding
variants).
The second problem we consider is the task of accurately estimating
the eigenvalues of the covariance matrix of a (high dimensional
real-valued) distribution--the "population spectrum". (These
eigenvalues contain basic information about the distribution,
including the presence or lack of low-dimensional structure in the
distribution and the applicability of many higher-level machine
learning and multivariate statistical tools.) As we show, even in the
regime where the sample size is linear or sublinear in the
dimensionality of the distribution, and hence the eigenvalues and
eigenvectors of the empirical covariance matrix are misleading,
accurate approximations to the true population spectrum are possible.
This talk is based on three papers, which are joint works with Paul
Valiant, with Paul Valiant and James Zou, and with Weihao Kong.
Friday April 29
1:00PM
WWH 905
Ran Raz (Weizmann Institute)
Title: Fast Learning Requires Good Memory: A Time-Space Lower Bound for Parity Learning
Abstract: We prove that any algorithm for learning parities requires either a memory
of quadratic size or an exponential number of samples. This proves a recent conjecture
of Steinhardt, Valiant and Wager and shows that for some learning problems a large
storage space is crucial.
More formally, in the problem of parity learning, an unknown string $x \in \{0,1\}^n$
was chosen uniformly at random. A learner tries to learn $x$ from a stream of samples
$(a_1, b_1), (a_2, b_2)... $, where each $a_t$ is uniformly distributed over $\{0,1\}^n$
and $b_t$ is the inner product of $a_t$ and $x$, modulo 2. We show that any algorithm
for parity learning, that uses less than $n^2/25$ bits of memory, requires an exponential
number of samples.
Previously, there was no non-trivial lower bound on the number
of samples needed, for any learning problem, even if the allowed memory size is $O(n)$
(where $n$ is the space needed to store one sample).
We also give an application of our result in the field of bounded-storage cryptography.
We show an encryption scheme that requires a private key of length $n$, as well as time
complexity of $n$ per encryption/decryption of each bit, and is provenly and unconditionally
secure as long as the attacker uses less than $n^2/25$ memory bits and the scheme is used at
most an exponential number of times. Previous works on bounded-storage cryptography assumed
that the memory size used by the attacker is at most linear in the time needed for encryption/decryption.
Friday May 6
2:00PM
WWH 1314
Michael Saks (Rutgers University)
Title: TBA
Abstract: TBA
Friday May 13
Simons Meeting
Thursday May 19
2:00PM
WWH 1314
Igor Shparlinski (UNSW)
Title: Modular Noisy Polynomial Interpolation and Approximation
Abstract: Motivated by applications to the recently introduced key establishment
protocol HIMMO, we consider the problems of interpolation and approximation
of a polynomial f \in F_p[X], from strings of most significant bits of its evaluation
f(x) (mod p) for several points x. We show that the nature of the domain from
which the points x are chosen plays a crucial roles and analyses the case
when this domain is a short initial interval, which is relevant to HIMMO.
The idea algorithm is based on lattice basis reduction and thus is close to
the algorithm for solving the Hidden Number Problem introduced by Boneh
and Venkatesan in 1996. However its analysis requires several new number
theoretic results on the distribution of polynomials in small boxes which are
of independent interest.
Joint work with Oscar Garcia-Morchon, Ronald Rietman and Ludo Tolhuizen
List of previous talks
Friday February 26
2:00PM
WWH 1314
Siyao Guo (NYU)
Title: Threshold Secret Sharing Requires A Linear Size Alphabet
Abstract: We prove that for every n and 1 < t < n, any t-out-of-n threshold secret
sharing scheme for 1-bit secrets requires share size log(t + 1). Our bound
is tight when t = n − 1 and n is a power of a prime. Combined with a result
of Kilian and Nisan, we obtain a share size lower bound of max{log(n − t +
2), log(t + 1)} ≥ log((n + 3)/2) for every n and 1 < t < n.
Our proof introduces a new semidefinite programming framework for proving
share size lower bounds for general access structures. We show that our
analysis for threshold secret sharing is tight in this framework. In
contrast, we observe that the entropy-based framework of Csirmaz does not
give any non-trivial lower bound for threshold secret sharing of a 1-bit
secret.
Joint work with Andrej Bogdanov and Ilan Komargodski.
Friday March 4
2:00PM
WWH 1314
Pooya Hatami (IAS)
Title: A characterization of functions with vanishing averages over
products of disjoint sets
Abstract: Given \alpha_1,...,\alpha_m\in (0,1), we characterize all integrable
complex functions f on [0,1]^m satisfying \int_{A_1\times ...\times A_m} f
= 0 for any collection of disjoint sets A_1,...,A_m \subseteq [0,1] of
respective (Lebesgue) measures \alpha_1,...,\alpha_m.
We use this characterization to answer a few conjectures from [S. Janson
and V. Sos: More on quasi-random graphs, subgraph counts and graph limits].
joint work with Hamed Hatami and Yaqiao Li
Friday March 25
2:00PM
WWH 1314
Ankit Garg (Princeton)
Title: A deterministic polynomial time algorithm for non-commutative rational identity testing
Abstract: We study the non-commutative rational identity testing problem or the word problem
for the free skew field of non-commutative rational functions. We prove that an existing
algorithm due to Gurvits is actually a deterministic polynomial time algorithm for this
problem (over the field of rationals). Our analysis is simple, providing explicit bounds
on the ``capacity'' measure of completely positive operators, introduced by Gurvits.
This problem has a rich history and has appeared in various forms in a remarkable number
of mathematical and computational areas. Besides non-commutative algebra, it also has various
connections to computational complexity and de-randomization, commutative invariant theory,
quantum information theory and system theory. No special background will be assumed for this talk,
as it can be, the problem, algorithm and analysis can be framed in the language of linear algebra.
This is joint work with Leonid Gurvits, Rafael Oliveira and Avi Wigderson.
Friday April 8
2:00PM
WWH 1314
Noga Ron-Zewi (IAS)
Title: Towards Optimal Deterministic Coding for Interactive Communication
Abstract: We show an efficient, deterministic interactive coding scheme that
simulates any interactive protocol under random errors with nearly optimal parameters.
Specifically, our coding scheme achieves a communication rate of 1 – Õ(\sqrt{epsilon})
and a failure probability of exp(-n/ \log n), where n is the protocol length and each bit
is flipped independently with constant probability epsilon. Prior to our work, all nontrivial
deterministic schemes (either efficient or not) had a rate bounded away from 1.
Furthermore, the best failure probability achievable by an efficient deterministic scheme with
constant rate was only quasi-polynomial, i.e., of the form exp(-polylog(n)) (Braverman, ITCS 2012).
The rate of our scheme is essentially optimal (up to poly-logarithmic factors) by a result of Kol and Raz (STOC 2013).
A central contribution in deriving our coding scheme is a novel code-concatenation scheme,
a notion standard in coding theory that we adapt for the interactive setting. Essential
to our concatenation approach is an explicit, efficiently encodable and decodable linear
tree code of sufficiently good parameters. The fact that our tree code is linear, and in
particular can be made systematic, turns out to play an important role in our concatenation
scheme. We use this tree code as the ``outer code'' in the concatenation scheme.
The necessary deterministic ``inner code'' is achieved by a nontrivial derandomization
of a randomized interactive coding scheme of (Haeupler, STOC 2014).
Joint work with Ran Gelles, Bernhard Haeupler, Gillat Kol and Avi Wigderson
Monday April 18
11:00AM
WWH 505
Visu Makam (University of Michigan)
Title: Rank computation of linear matrices and applications to circuit complexity
Abstract: A matrix whose entries are linear polynomials is called
a linear matrix. Its rank (over the free skew field) can be computed
by substituting large enough generic square matrices for the variables.
However, a good bound on the size of the matrices was not known. We will
show that in fact generic matrices of size (n-1) x (n-1) suffice.
Our bounds have various ramifications to circuits and complexity.
We immediately get a randomized poly-time blackbox algorithm for
non-commutative rational identity testing. In fact, combining our
bounds with some constructions of Ivanyos, Qiao and Subrahmanyam
yields a (non-blackbox) deterministic algorithm. We can also show
that there is no polynomial sized formula for the non-commutative
determinant even if one allows division gates.
Friday April 22
2:00PM
WWH 312
Alex Andoni (Columbia University)
Title: Parallel Algorithms for Geometric Graph Problems
Abstract: Motivated by modern parallel computing models (think MapReduce), we
give a new algorithmic framework for geometric graph problems. Our
framework applies to problems such as the Minimum Spanning Tree (MST)
problem over a set of points in a low-dimensional space, which is
useful for hierarchical agglomerative clustering. Our algorithm
computes a $(1+epsilon)$-approximate MST in a *constant* number of
rounds of communication, while using total space and communication
proportional to the size of the data only.
Our framework turns out to have implications beyond the parallel
models as well. For example, we consider the Earth-Mover Distance
(EMD) problem, for which we obtain a new near-linear time algorithm as
well as the first streaming algorithm (assuming we can pre-sort the
points). Technically, our framework for EMD shows how to effectively
break up a "big Linear Programming problem" into a number of "small
Linear Programming problems", which can be later recombined using a
dynamic programming approach.
Joint work with Aleksandar Nikolov, Krzysztof Onak, and Grigory Yaroslavtsev.