Overview

We are part of the Computer Science Department at New York University. Our goal is to understand important societal and human indicators through the network structure of interactions among humans, physical entities, and digital entities, and to capture the patterns and potential causal relationships in these interactions. Our longer-term vision is to build an automated analytics engine that can learn and infer socio-economic phenomena from highly diverse and noisy data sources such as news, social media, and traffic data.

We are also interested in understanding user behavior in online systems such as crowdsourcing platforms, recommender systems, and review websites. One of our goals is to design efficient algorithms that identify similar users in recommender systems based on their stated preferences for different content. Better similarity estimates support better recommendation and personalization, and therefore a better user experience. We have also designed algorithms to identify trustworthy workers on micro-task crowdsourcing platforms such as Amazon Mechanical Turk, which lets task requesters filter out low-quality responses.
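As a rough illustration of what "similar users based on stated preferences" can mean, the sketch below scores user similarity with cosine similarity over the items two users have both rated. It is a minimal, brute-force baseline under assumed inputs (a dict of per-user rating dicts); the function names `cosine_similarity` and `most_similar` are hypothetical, and this is not the group's algorithm. Efficient methods would be needed to avoid comparing every pair of users at scale.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two users' ratings (dicts: item -> rating),
    computed only over the items both users have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def most_similar(target, ratings, k=2):
    """Return the k (user, score) pairs most similar to `target` (brute force)."""
    scores = [
        (other, cosine_similarity(ratings[target], ratings[other]))
        for other in ratings
        if other != target
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]

# Toy ratings on a 1-5 scale (illustrative data only).
ratings = {
    "alice": {"m1": 5, "m2": 4, "m3": 1},
    "bob":   {"m1": 4, "m2": 5, "m3": 1},
    "carol": {"m1": 1, "m2": 1, "m3": 5},
}
print(most_similar("alice", ratings, k=1))
```

Raw cosine similarity ignores that some users rate everything high; mean-centering each user's ratings first (a Pearson-style variant) is a common adjustment.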

Below is a list of the projects that we are working on currently:

Reputations in Crowdsourcing

We study the problem of aggregating noisy labels from crowd workers to infer the underlying true labels of binary tasks. Unlike most prior work, which has examined this problem under the random worker paradigm, we consider a much broader class of adversarial workers with no specific assumptions on their labeling strategy. Our key contribution is the design of a computationally efficient reputation algorithm to identify and filter out these adversarial workers in crowdsourcing systems. Our algorithm combines optimal semi-matchings with worker penalties based on label disagreements to assign a reputation score to every worker. We provide strong theoretical guarantees for deterministic adversarial strategies, as well as for the extreme case of sophisticated adversaries, for which we analyze the worst-case behavior of our algorithm. Finally, we show that our reputation algorithm can significantly improve the accuracy of existing label aggregation algorithms on real-world crowdsourcing datasets.
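The semi-matching step is not reproduced here. To illustrate only the simpler idea of penalizing label disagreements, the hypothetical sketch below assigns each worker an average penalty: on a task where both labels appear, a worker whose label is shared by c workers receives 1/c, so minority labels cost more, and unanimous tasks cost nothing. The names `disagreement_penalties` and `filter_workers`, the {+1, -1} label encoding, and the fixed threshold are all assumptions for illustration.

```python
from collections import defaultdict

def disagreement_penalties(labels):
    """Average per-task penalty for each worker (higher = less trustworthy).

    labels: dict mapping (worker, task) -> label in {+1, -1}.
    """
    counts = defaultdict(int)  # (task, label) -> number of workers giving it
    for (worker, task), label in labels.items():
        counts[(task, label)] += 1

    total = defaultdict(float)
    seen = defaultdict(int)
    for (worker, task), label in labels.items():
        # Only tasks with conflicting labels contribute a penalty.
        conflicted = counts.get((task, +1), 0) > 0 and counts.get((task, -1), 0) > 0
        if conflicted:
            total[worker] += 1.0 / counts[(task, label)]
        seen[worker] += 1
    return {w: total[w] / seen[w] for w in seen}

def filter_workers(labels, threshold):
    """Keep only labels from workers whose average penalty is <= threshold."""
    penalties = disagreement_penalties(labels)
    return {(w, t): y for (w, t), y in labels.items() if penalties[w] <= threshold}

# Toy data: worker "d" always contradicts the majority.
labels = {
    ("a", "t1"): +1, ("b", "t1"): +1, ("c", "t1"): +1, ("d", "t1"): -1,
    ("a", "t2"): -1, ("b", "t2"): -1, ("c", "t2"): -1, ("d", "t2"): +1,
    ("a", "t3"): +1, ("b", "t3"): +1, ("c", "t3"): -1, ("d", "t3"): -1,
}
print(disagreement_penalties(labels))
print(sorted({w for (w, _) in filter_workers(labels, threshold=0.5)}))
```

The filtered labels could then be passed to any existing aggregation method, such as majority voting.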

Event Analytics from News Data

The goal is to understand important socio-economic indicators from news and other data available on the Web. This involves using text mining techniques to extract events from online news articles, learning networks of real-world events, and showing how such event networks can be used to predict real-world social phenomena such as droughts, price variations, and disease outbreaks. Our longer-term vision is to build an automated analytics engine to learn and infer socio-economic phenomena from highly diverse and noisy data sources on the Web.
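One simple way to picture an "event network" is a directed graph whose edge A → B counts how often an event of type B follows an event of type A within a short time window. The sketch below assumes events have already been extracted from news text as (date, event_type) pairs; the function name `build_event_network`, the 7-day window, and the toy events are hypothetical, and temporal co-occurrence counts like these are not evidence of causation on their own.

```python
from collections import Counter
from datetime import date

def build_event_network(events, window_days=7):
    """Directed, weighted event network from (date, event_type) pairs.

    Edge (A, B) counts how often an event of type B occurs 1..window_days
    days after an event of type A (A != B).
    """
    events = sorted(events)
    edges = Counter()
    for i, (d_a, a) in enumerate(events):
        for d_b, b in events[i + 1:]:
            gap = (d_b - d_a).days
            if gap > window_days:
                break  # events are sorted, so later ones are even further away
            if gap > 0 and b != a:
                edges[(a, b)] += 1
    return edges

# Toy, already-extracted events (illustrative only).
events = [
    (date(2014, 6, 1), "drought"),
    (date(2014, 6, 5), "crop_failure"),
    (date(2014, 6, 9), "price_rise"),
    (date(2014, 7, 1), "drought"),
    (date(2014, 7, 4), "price_rise"),
]
for (src, dst), weight in build_event_network(events).items():
    print(f"{src} -> {dst}: {weight}")
```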

Sentiment Analysis on Financial Articles

Here we analyze the effects of FOMC (Federal Open Market Committee) communications, such as meeting minutes, statements, and press conferences, on interest rates.
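The section does not say which sentiment method is used. As one common baseline, the sketch below computes a dictionary-based "hawkish vs. dovish" tone score for a piece of FOMC text. The word lists and the name `tone_score` are hypothetical placeholders; a real study would rely on a validated lexicon or a trained model.

```python
import re

# Hypothetical word lists (assumption); not a validated financial lexicon.
HAWKISH = {"tighten", "tightening", "inflationary", "raise", "hike", "overheating"}
DOVISH = {"accommodative", "easing", "ease", "cut", "slack", "weakness"}

def tone_score(text):
    """Net tone in [-1, 1]: +1 if all matched words are hawkish, -1 if all dovish,
    0 if no lexicon words appear (or they balance out)."""
    words = re.findall(r"[a-z]+", text.lower())
    hawk = sum(w in HAWKISH for w in words)
    dove = sum(w in DOVISH for w in words)
    if hawk + dove == 0:
        return 0.0
    return (hawk - dove) / (hawk + dove)

print(tone_score("Further tightening may be needed given inflationary pressures."))
print(tone_score("Policy will remain accommodative while slack persists."))
```

Scores like these could then be aligned with interest-rate movements around each release date to study their relationship.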

Satellite Image Analysis to Detect Changing Land Patterns

Changes in agricultural land availability are among the fundamental problems affecting food security in developing regions such as India. We implemented a tool that analyzes satellite images of a region to compute temporal changes in land patterns for categories such as arable land, developed land, and water bodies.
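Once each pixel of two satellite images of the same region has been assigned a land category (the classification step itself is assumed to happen upstream and is not shown), temporal change can be summarized as a transition count between categories. The hypothetical `land_change` function below is a minimal sketch of that comparison on toy 2-D label grids; multiplying pixel counts by the ground area of one pixel would give areas.

```python
from collections import Counter

def land_change(before, after):
    """Compare two per-pixel classification maps of the same region.

    before, after: equally sized 2-D lists of class labels.
    Returns (transitions, net_change): transitions[(c1, c2)] counts pixels
    that were class c1 before and class c2 after; net_change[c] is the net
    gain (positive) or loss (negative) in pixels for class c.
    """
    if len(before) != len(after) or any(
        len(r1) != len(r2) for r1, r2 in zip(before, after)
    ):
        raise ValueError("maps must have the same shape")
    transitions = Counter()
    for row_b, row_a in zip(before, after):
        for c1, c2 in zip(row_b, row_a):
            transitions[(c1, c2)] += 1
    net_change = Counter()
    for (c1, c2), n in transitions.items():
        if c1 != c2:
            net_change[c1] -= n
            net_change[c2] += n
    return transitions, net_change

# Toy 2x3 maps (illustrative only).
before = [["arable", "arable", "water"], ["arable", "developed", "water"]]
after = [["arable", "developed", "water"], ["developed", "developed", "water"]]
transitions, net = land_change(before, after)
print(dict(transitions))
print(dict(net))
```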

Traffic

Our goal is to design mechanisms that detect the state of traffic congestion in and around critical congestion areas, together with simple preventive mechanisms that keep these areas from reaching congestion collapse.
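The section does not name the mechanisms used. For illustration only, the sketch below pairs a crude occupancy-threshold congestion detector with ALINEA, a well-known feedback ramp-metering law that adjusts the on-ramp inflow to hold downstream occupancy near a target: r(k) = r(k-1) + K_R (o_target - o_measured(k)). The function names, the 20% target occupancy, and the rate bounds are assumptions; a gain of about 70 veh/h per percentage point is a value commonly cited for ALINEA.

```python
def congestion_state(occupancy, critical=20.0):
    """Crude detector: 'congested' once occupancy (%) exceeds the critical value."""
    return "congested" if occupancy > critical else "free-flow"

def alinea_rate(prev_rate, occupancy, target=20.0, gain=70.0,
                min_rate=200.0, max_rate=1800.0):
    """One ALINEA update: r(k) = r(k-1) + gain * (target - measured occupancy).

    Rates are in veh/h and occupancies in percent; the new rate is clamped
    to [min_rate, max_rate] (illustrative bounds).
    """
    rate = prev_rate + gain * (target - occupancy)
    return max(min_rate, min(max_rate, rate))

# Simulated downstream occupancy readings, one per control interval.
rate = 900.0
for occ in [15.0, 22.0, 27.0]:
    rate = alinea_rate(rate, occ)
    print(occ, congestion_state(occ), rate)
```

The feedback form means the meter lets more vehicles in while the mainline is below the target occupancy and throttles the ramp as occupancy rises, which is the kind of simple preventive control described above.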

Education

Our education projects focus on collecting and organizing online content to supplement classroom teaching in developing countries, and on generating offline educational portals for regions with poor internet connectivity.

Big Data Group
