Community Structure in Large Social and Information Networks
Speaker: Michael W. Mahoney, Stanford University
Location: Warren Weaver Hall 1302
Date: November 12, 2010, 11:30 a.m.
Host: Mehryar Mohri
The concept of a community is central to social network analysis, and thus a large body of work has been devoted to identifying community structure. For example, a community may be thought of as a set of web pages on related topics, a set of advertisers in a similar economic market, or more generally as a set of nodes in a network more similar amongst themselves than with the remainder of the network. Motivated by difficulties we experienced at actually finding meaningful communities in very large real-world networks, we have performed a large scale analysis of a wide range of social and information networks. Our main methodology used local spectral methods and involved computing isoperimetric properties of the networks at various size scales---a novel application of ideas from statistics and scientific computation to internet data analysis, which required the development of new algorithmic tools, as well as the reinterpretation of the statistical basis underlying traditional spectral approximation algorithms.
Our empirical results suggest a significantly more refined picture of community structure than has been appreciated previously. Our most striking finding is that in nearly every network dataset we examined, we observe tight but almost trivial communities at very small size scales, and at larger size scales, the best possible communities gradually ``blend in'' with the rest of the network and thus become less ``community-like.'' This behavior is not explained, even at a qualitative level, by any of the commonly-used network generation models. Moreover, this behavior is exactly the opposite of what one would expect based on experience with and intuition from expander graphs, from graphs that are well-embeddable in a low-dimensional structure, and from small social networks that have served as testbeds of community detection algorithms. Possible mechanisms for reproducing our empirical observations will be discussed, as will implications of these findings for clustering, classification, and more general data analysis in modern large social and information networks.
Michael Mahoney is at Stanford University. His research interests focus on theoretical and applied aspects of algorithms for large-scale data problems in scientific and Internet applications. Currently, he is working on geometric network analysis; developing approximate computation and regularization methods for large informatics graphs; and applications to community detection, clustering, and information dynamics in large social and information networks. In the past, he has worked on randomized matrix algorithms and applications in genetics and medical imaging. He has been a faculty member at Yale University and a researcher at Yahoo, and his PhD was is computational statistical mechanics at Yale University.
Refreshments will be offered starting 15 minutes prior to the scheduled start of the talk.