We are part of the Computer Science Department at New York University. Our goal is to understand important societal, social, and human indicators from the network characteristics of how humans and physical and digital entities interact, and to capture the patterns and potential causal relationships among these interactions. Our longer-term vision is to build an automated analytics engine that can learn and infer socio-economic phenomena from highly diverse and noisy data sources such as news, social media, and traffic data.
We are also interested in understanding user behavior in online systems such as crowdsourcing platforms, recommender systems, and review websites. One of our goals is to design efficient algorithms that identify similar users in recommender systems based on their stated preferences for different content. This can improve the user experience through better recommendations and personalization. We have also designed algorithms to identify trustworthy workers on micro-task crowdsourcing platforms like Amazon Mechanical Turk, which enables task requesters to filter out low-quality responses.
Below is a list of the projects we are currently working on:
We study the problem of aggregating noisy labels from crowd workers to infer the underlying true labels of binary tasks. Unlike most prior work, which has examined this problem under the random worker paradigm, we consider a much broader class of adversarial workers with no specific assumptions on their labeling strategy. Our key contribution is the design of a computationally efficient reputation algorithm to identify and filter out these adversarial workers in crowdsourcing systems. Our algorithm uses the concept of optimal semi-matchings in conjunction with worker penalties based on label disagreements to assign a reputation score to every worker. We provide strong theoretical guarantees for deterministic adversarial strategies as well as for the extreme case of sophisticated adversaries, where we analyze the worst-case behavior of our algorithm. Finally, we show that our reputation algorithm can significantly improve the accuracy of existing label aggregation algorithms on real-world crowdsourcing datasets.
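To make the idea of penalties based on label disagreements concrete, here is a minimal Python sketch. It is a simplified illustration and not the reputation algorithm described above: it omits the optimal semi-matching step entirely, and the penalty rule (on a task where workers disagree, each worker receives 1 divided by the number of workers who gave the same label) is an assumption chosen for the example. The function name `disagreement_penalties` and the sample data are hypothetical.

```python
from collections import defaultdict


def disagreement_penalties(labels):
    """Compute an average disagreement penalty per worker.

    labels maps (worker, task) -> +1 or -1.
    Assumed rule (illustrative only): on a task with conflicting labels,
    a worker is penalized 1 / (number of workers who gave the same label),
    so workers in a small minority receive larger penalties.
    """
    # Group the votes by task.
    by_task = defaultdict(dict)
    for (worker, task), label in labels.items():
        by_task[task][worker] = label

    total = defaultdict(float)  # summed penalty per worker
    count = defaultdict(int)    # number of tasks each worker labeled
    for votes in by_task.values():
        pos = sum(1 for v in votes.values() if v == 1)
        neg = len(votes) - pos
        conflict = pos > 0 and neg > 0
        for worker, label in votes.items():
            count[worker] += 1
            if conflict:
                total[worker] += 1.0 / (pos if label == 1 else neg)

    # Average over the tasks each worker labeled.
    return {w: total[w] / count[w] for w in count}


# Hypothetical data: worker "c" disagrees with "a" and "b" on every shared task.
labels = {
    ("a", "t1"): 1, ("b", "t1"): 1, ("c", "t1"): -1,
    ("a", "t2"): -1, ("b", "t2"): -1, ("c", "t2"): 1,
    ("a", "t3"): 1, ("b", "t3"): 1,
}
penalties = disagreement_penalties(labels)
most_suspect = max(penalties, key=penalties.get)
```

In this toy data the worker with the highest average penalty is the one who consistently disagrees with the others. A filtering step could then drop workers above a chosen threshold before running a label aggregation algorithm; the threshold would be a further assumption.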
Event Analytics from News Data
The goal is to understand important socio-economic indicators based on news and other data available on the Web. This involves using text mining techniques to extract events from online news articles, learning networks of real-world events, and showing how such event networks can be used to predict real-world social phenomena like drought, price variations, and disease outbreaks. Our longer-term vision is to build an automated analytics engine to learn and infer socio-economic phenomena from highly diverse and noisy data sources from the Web.
Sentiment Analysis on Financial Articles
We analyze the effects of Federal Open Market Committee (FOMC) communications, such as meeting minutes, statements, and press conferences, on interest rates.
Satellite Image Analysis to Detect Changing Land Patterns
Changing patterns in agricultural land availability are among the fundamental problems that affect food security in developing regions like India. We implemented a tool that analyzes satellite images of a region to compute temporal changes in land patterns for categories such as arable land, developed land, and water bodies.
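As a rough illustration of what computing temporal changes per category can look like, the Python sketch below compares two already classified grids of the same region and reports the change in each category's share of the area. It assumes the classification step (from raw satellite pixels to labels) has been done elsewhere, and it is not the tool described above. The function name `land_cover_change` and the sample grids are hypothetical.

```python
from collections import Counter


def land_cover_change(before, after):
    """Return the change in area share per land category.

    before and after are equally sized grids (lists of rows) whose cells
    hold category labels such as "arable", "developed", or "water".
    The result maps each category to (share after) - (share before).
    """
    same_shape = len(before) == len(after) and all(
        len(b) == len(a) for b, a in zip(before, after)
    )
    if not same_shape:
        raise ValueError("grids must have the same shape")

    n_cells = sum(len(row) for row in before)
    if n_cells == 0:
        raise ValueError("grids must not be empty")

    counts_before = Counter(cell for row in before for cell in row)
    counts_after = Counter(cell for row in after for cell in row)
    categories = sorted(set(counts_before) | set(counts_after))
    return {
        cat: (counts_after[cat] - counts_before[cat]) / n_cells
        for cat in categories
    }


# Hypothetical 2x3 region observed at two points in time.
before = [["arable", "arable", "water"],
          ["arable", "developed", "water"]]
after = [["arable", "developed", "water"],
         ["developed", "developed", "water"]]
change = land_cover_change(before, after)
```

Here two of the six cells move from arable to developed, so the arable share falls by one third and the developed share rises by the same amount, while water is unchanged.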
Our goal is to design mechanisms that detect the state of traffic congestion in and around critical congestion areas, and simple preventive mechanisms that keep these areas from reaching congestion collapse.
Our education projects focus on collecting and organizing online content to supplement classroom teaching in developing countries, and on generating offline educational portals for regions with poor internet connectivity.