If you would like to give a talk in our seminar, please email me (Oded Regev).

Link to subscribe to the Theory Seminar mailing list.

Theory Seminar

Usual coordinates:
Friday, 2:00PM
Room 1314
Warren Weaver Hall
251 Mercer Street

Calendar

Upcoming talks

Friday, May 11
Time : 2PM
Room 1314, Warren Weaver Hall

Pravesh Kothari (Princeton)

Title: The Sum-of-Squares Approach to Unsupervised Machine Learning

Abstract:

Influential works over the past decade have led to an essentially complete understanding of basic problems in unsupervised machine learning such as compressed sensing and matrix completion. The efficient algorithms that achieve the strongest possible guarantees for such problems are based on convex relaxations such as linear and semi-definite programming.

Such a detailed understanding, however, has not been possible for other fundamental problems such as clustering. This is in part because coming up with, and analyzing the guarantees of, the relevant convex relaxation has relied on sophisticated, creative "leaps" that are problem-specific (such as guessing and analyzing a clever "dual" solution). It is thus not immediately clear how to apply such a blueprint to other settings of interest.

The goal of this talk is to illustrate the "sum-of-squares method": a recently developed approach that gives a natural and intuitive "blueprint" for both coming up with and analyzing the "right" convex relaxation for a number of problems in unsupervised machine learning, including the ones mentioned above. Key to this paradigm is a surprising reduction of efficient algorithm design to finding "simple" proofs of "identifiability" - a proof that uses the given data to _verify_ that a purported solution is indeed the correct one. In recent years, this single algorithmic paradigm has yielded sharp guarantees for a number of basic problems in unsupervised learning.

In this talk, I will explain this approach by means of two examples for which we were able to make substantial progress using the sum-of-squares method: 1) clustering mixtures of spherical Gaussians and 2) estimating moments of distributions in the presence of adversarial outliers.

Based on joint works with Jacob Steinhardt (Stanford) and David Steurer (ETH Zurich).

List of previous talks

Thursday, February 15
Time : 2PM
Room 3-50, Kaufman Management Center

Ola Svensson (EPFL)

Title: A Constant-factor Approximation Algorithm for the Asymmetric Traveling Salesman Problem

Abstract: We give a constant-factor approximation algorithm for the asymmetric traveling salesman problem. Our approximation guarantee is analyzed with respect to the standard LP relaxation, and thus our result confirms the conjectured constant integrality gap of that relaxation.

Our techniques build upon the constant-factor approximation algorithm for the special case of node-weighted metrics. Specifically, we give a generic reduction to structured instances that resemble but are more general than those arising from node-weighted metrics. For those instances, we then solve Local-Connectivity ATSP, a problem known to be equivalent (in terms of constant-factor approximation) to the asymmetric traveling salesman problem.

This is joint work with Jakub Tarnawski and László Végh.

Friday, March 9
Time : 2PM
Room 1314, Warren Weaver Hall

Euiwoong Lee (NYU)

Title: FPT-Approximation Algorithms for k-Cut and k-Treewidth Deletion

Abstract: Approximation algorithms and fixed-parameter tractable (FPT) algorithms have been two major ways to deal with the NP-hardness of combinatorial optimization problems. The notion of FPT approximation can be naturally defined and has been actively studied in recent years. I will talk about my recent results on FPT approximation algorithms.

- k-Cut is the problem of removing the minimum number of edges so that the graph has at least k connected components. We give a (2 - epsilon)-FPT approximation algorithm for some epsilon > 0, improving upon the (non-FPT) 2-approximation.

- Treewidth k-deletion is the problem of removing the minimum number of vertices so that the graph has treewidth at most k. We give an O(log k)-FPT approximation algorithm, which implies the same approximation factor for many other problems. This improves the previous best algorithm, which gave an f(k)-approximation for some (implicit) function f.
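To make the k-Cut objective concrete, here is a small brute-force sketch (my own illustration, exponential-time, and not the FPT algorithm from the talk): it searches for the fewest edges whose removal leaves at least k connected components.

```python
from itertools import combinations

def min_k_cut(n, edges, k):
    """Brute-force minimum k-cut on an n-vertex graph: the fewest edges
    whose removal leaves at least k connected components. Tiny graphs only."""
    def num_components(removed):
        # Union-find (with path halving) over the surviving edges.
        parent = list(range(n))
        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x
        for e in edges:
            if e not in removed:
                a, b = find(e[0]), find(e[1])
                if a != b:
                    parent[a] = b
        return sum(1 for v in range(n) if find(v) == v)
    # Try removal sets in increasing size; the first success is optimal.
    for size in range(len(edges) + 1):
        for removed in combinations(edges, size):
            if num_components(set(removed)) >= k:
                return size
    return None

# Example: two triangles joined by the bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(min_k_cut(6, edges, 2))  # 1: cut the bridge
print(min_k_cut(6, edges, 3))  # 3: the bridge plus two edges of one triangle
```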

Based on joint works with Anupam Gupta, Jason Li, Pasin Manurangsi, Michal Wlodarczyk.

Friday, March 16
Time : 2PM
Room 1314, Warren Weaver Hall

Shay Moran (Princeton University)

Title: On the expressiveness of comparison queries

Abstract: Comparisons are a classical and well-studied algorithmic tool used in a variety of contexts and applications.
I will present three examples that manifest the expressiveness of these queries in information theory, machine learning, and complexity theory.

20 (simple) questions. A basic combinatorial interpretation of Shannon's entropy function is via the "20 questions" game. This cooperative game is played by two players, Alice and Bob: Alice picks a distribution π over the numbers {1,…,n}, and announces it to Bob. She then chooses a number x according to π, and Bob attempts to identify x using as few Yes/No queries as possible, on average. An optimal strategy for the "20 questions" game is given by a Huffman code for π: Bob's questions reveal the codeword for x bit by bit. This strategy finds x using fewer than H(π)+1 questions on average. However, the questions asked by Bob could be arbitrary. We investigate the following question: Are there restricted sets of questions that match the performance of Huffman codes? We show that for every distribution π, Bob has a strategy that uses only questions of the form "x<c?" and "x=c?", and uncovers x using at most H(π)+1 questions on average, matching the performance of Huffman codes in this sense.
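The Huffman baseline mentioned above is easy to check numerically. The sketch below (my own illustration, not code from the talk) computes the expected Huffman codeword length via the classic identity that it equals the sum of the merged weights, and compares it against the entropy H(π):

```python
import heapq
import math

def huffman_expected_length(probs):
    """Expected codeword length of a Huffman code for `probs`.
    Uses the identity: expected length = sum of merged internal weights."""
    heap = list(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        # Merging the two lightest subtrees adds one bit to every leaf
        # below them, i.e. their combined weight to the expected length.
        p = heapq.heappop(heap) + heapq.heappop(heap)
        total += p
        heapq.heappush(heap, p)
    return total

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A dyadic distribution, where the Huffman code is exactly optimal.
pi = [0.5, 0.25, 0.125, 0.125]
L, H = huffman_expected_length(pi), entropy(pi)
print(L, H)  # 1.75 1.75
```

For non-dyadic distributions the guarantee H(π) ≤ L < H(π)+1 still holds, which is the benchmark the restricted "x<c?"/"x=c?" strategy matches.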

Active classification with comparison queries. This part concerns an extension of active learning in which the learning algorithm may ask the annotator to compare the distances of two examples from the boundary of their label-class. For example, in a recommendation-system application (say, for restaurants), the annotator may be asked whether she liked or disliked a specific restaurant (a label query), or which of two restaurants she liked more (a comparison query). We focus on the class of half-spaces, and show that under natural assumptions, such as large margin or bounded bit-description of the input examples, it is possible to reveal all the labels of a sample of size n using approximately O(log n) queries.
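As a toy illustration of the flavor of this result (my own sketch, not the halfspace algorithm from the talk): for the simplest case of a one-dimensional halfspace, i.e. a threshold on the line, plain binary search over the sorted sample already recovers all n labels with O(log n) label queries. The names below are hypothetical.

```python
def reveal_labels(points, label_query):
    """Recover the labels of all points under an (unknown) 1-D threshold
    classifier, using O(log n) calls to `label_query` via binary search."""
    order = sorted(range(len(points)), key=lambda i: points[i])
    # Invariant: the first positive example in sorted order has rank in [lo, hi].
    lo, hi = 0, len(points)
    queries = 0
    while lo < hi:
        mid = (lo + hi) // 2
        queries += 1
        if label_query(points[order[mid]]):  # True = on the positive side
            hi = mid
        else:
            lo = mid + 1
    labels = [None] * len(points)
    for rank, i in enumerate(order):
        labels[i] = rank >= lo  # everything at or above the boundary is positive
    return labels, queries

points = [5, 1, 4, 2, 8, 3]
labels, q = reveal_labels(points, lambda x: x >= 3.7)
print(labels, q)  # [True, False, True, False, True, False] 3
```

In higher dimensions the sample is not linearly ordered by the classifier, which is where comparison queries (distance-to-boundary comparisons) come in.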

Nearly optimal linear decision trees for k-SUM and related problems. We use the above active-classification framework to construct near-optimal linear decision trees for a variety of decision problems in combinatorics and discrete geometry. For example, for any constant k, we construct linear decision trees that solve the k-SUM problem on n elements using O(n log n) linear queries. Moreover, the queries we use are comparison queries, which compare the sums of two k-subsets; when viewed as linear queries, comparison queries are 2k-sparse and have only {−1,+1} coefficients. We give similar constructions for sorting sumsets A+B and for solving the SUBSET-SUM problem, both with an optimal number of queries, up to poly-logarithmic factors.

Based on joint works with Yuval Dagan, Yuval Filmus, Ariel Gabizon, Daniel Kane, Shachar Lovett, and Jiapeng Zhang.

Tuesday, March 27
Time : 11AM
Room 1314, Warren Weaver Hall

Vijay Vazirani (UC Irvine)

Title: Planar Graph Perfect Matching is in NC

Abstract: Is matching in NC, i.e., is there a deterministic fast parallel algorithm for it? This has been an outstanding open question in TCS for over three decades, ever since the discovery of Random NC matching algorithms. Within this question, the case of planar graphs has remained an enigma: On the one hand, counting the number of perfect matchings is far harder than finding one (the former is #P-complete and the latter is in P), and on the other, for planar graphs, counting has long been known to be in NC whereas finding one has resisted a solution!

The case of bipartite planar graphs was solved by Miller and Naor in 1989 via a flow-based algorithm. In 2000, Mahajan and Varadarajan gave an elegant way of using the counting of matchings to find one, hence giving a different NC algorithm.

However, non-bipartite planar graphs still didn't yield, the stumbling block being odd tight cuts. Interestingly enough, these are also a key to the solution: a balanced odd tight cut leads to a straightforward divide-and-conquer NC algorithm. The remaining task is to find such a cut in NC. This requires several algorithmic ideas, such as finding a point in the interior of the minimum-weight face of the perfect matching polytope and uncrossing odd tight cuts.

Paper available at: https://arxiv.org/pdf/1709.07822.pdf
Joint work with Nima Anari.

Friday, May 4
Time : 2PM
Room 1314, Warren Weaver Hall

Sam Hopkins (Cornell University)

Title: Spectral Algorithms and SoS Proofs for Planted Problems

Abstract: High-dimensional statistical inference is a central concern of modern computer science, but theoretical understanding of which inference problems are computationally tractable and which are not lags far behind both classical worst-case algorithms/computational complexity and machine-learning practice. This talk will discuss recent efforts towards a more unified theory of algorithmic statistical inference based on meta-algorithms and the Sum of Squares (SoS) method. In particular, we will discuss the quest for problem- and algorithm-independent criteria for algorithmic intractability in the statistical-inference setting. The technical portion of the talk will describe (and prove, time permitting) a recent theorem along these lines characterizing SoS algorithms for some particularly special inference problems (often called planted problems) in terms of simpler spectral algorithms.

Based primarily on joint work with Pravesh Kothari, Aaron Potechin, Prasad Raghavendra, Tselil Schramm, and David Steurer, from FOCS 2017.