The availability of large corpora of online articles provides a unique opportunity to understand and scrutinize events happening in the real world. Event extraction from unstructured data such as news corpora can support many knowledge-based systems, including predictive models, risk analysis applications, and new policy-making tools. We have developed a novel framework to define and extract events from a large news corpus. This new definition enables news events to be used in a variety of applications, such as predicting food price fluctuations and disease outbreaks.

News Event Model

The News Event Model is a new way of defining and extracting events from online news articles. It is generic and can capture kinds of events that existing frameworks cannot. To cover the events in a news corpus completely, the representation scheme for events needs to be flexible and generic. Our model assumes that every news article is about a single event, and from this assumption it automatically tries to learn the types of events from a large corpus of news articles. Intuitively, the model defines an event as a combination of a central event that the article describes and some subsidiary events that are associated with the central event and are also described in the article. Based on the occurrences of such events across the entire corpus, the model learns the different event types and their most likely subsidiary events. In addition, the model tries to attach the most probable locations and times to an event: although many events do not depend on time and location, it associates the most likely set of locations and times based on past occurrences.

Event Phenomenon Graph

The event phenomenon graph is the precursor to building event-driven predictive models. It connects a response variable, such as a food price or a disease outbreak, to the news events and tries to find the subset of events that show some association with that variable. This subset of news events is then used to build the predictive model for the response variable. Building this graph is a unique task that connects unstructured news data with structured data on an observed phenomenon.
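One simple way to picture this selection step is sketched below: each event type has a daily count series, and an edge to the response variable is kept only when the lagged correlation is strong enough. The use of Pearson correlation, the `lag` and `threshold` parameters, and the function names are all assumptions for illustration; the text above does not specify which association measure the graph actually uses.

```python
from math import sqrt
from typing import Dict, List


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def build_event_phenomenon_graph(
    event_series: Dict[str, List[float]],
    response: List[float],
    lag: int = 1,
    threshold: float = 0.5,
) -> Dict[str, float]:
    """Return {event name: correlation} for events associated with the response.

    Event counts on day t are compared with the response on day t + lag.
    All series are assumed to have the same length as `response`.
    """
    n = len(response) - lag
    edges: Dict[str, float] = {}
    for name, counts in event_series.items():
        r = pearson(counts[:n], response[lag:])
        if abs(r) >= threshold:
            edges[name] = r
    return edges


# Toy data: the price rises the day after each "flood" report.
events = {
    "flood": [0, 1, 0, 1, 0, 1],
    "election": [1, 1, 1, 1, 1, 1],
}
price = [10, 10, 12, 10, 12, 10]
graph = build_event_phenomenon_graph(events, price, lag=1, threshold=0.5)
```

In this toy example only the "flood" series survives the threshold, so it would be the only event passed on as a feature to the predictive model.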

Event-driven Prediction

Many socio-economic variables and indicators are sensitive to real-world events. However, most models used to understand their variability and forecast their values do not take this into account. One major application of the News Event Model is building predictive models for a variety of external variables using news events as features.
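To make "news events as features" concrete, here is a deliberately small sketch: a one-feature least-squares fit of the next-day change in a variable on the number of relevant events reported today, compared against a naive persistence forecast. The model form, the function names, and the toy numbers are assumptions for illustration only; this is not the model used in the experiments, and the baseline here is persistence, not ARIMA.

```python
from typing import List


def fit_event_effect(event_counts: List[float], values: List[float]) -> float:
    """Least-squares slope b for: values[t+1] - values[t] ~= b * event_counts[t]."""
    deltas = [values[t + 1] - values[t] for t in range(len(values) - 1)]
    xs = event_counts[: len(deltas)]
    denom = sum(x * x for x in xs)
    if denom == 0:
        return 0.0  # no events observed, so no effect can be estimated
    return sum(x * d for x, d in zip(xs, deltas)) / denom


def forecast_next(value_today: float, events_today: float, b: float) -> float:
    """One-step-ahead forecast using today's value and today's event count."""
    return value_today + b * events_today


def mean_abs_error(preds: List[float], actuals: List[float]) -> float:
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(p - a) for p, a in zip(preds, actuals)) / len(actuals)


# Toy series: the value jumps the day after events are reported.
event_counts = [0, 2, 0, 1, 0]
values = [10, 10, 14, 14, 16, 16]

b = fit_event_effect(event_counts, values)
event_preds = [forecast_next(v, e, b) for v, e in zip(values[:-1], event_counts)]
naive_preds = values[:-1]  # persistence: tomorrow equals today

event_mae = mean_abs_error(event_preds, values[1:])
naive_mae = mean_abs_error(naive_preds, values[1:])
```

On this toy series the event-aware forecast has a lower in-sample error than persistence, which is the kind of comparison an event-driven model is meant to win; a real evaluation would use held-out data and a stronger baseline.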

Food Price Prediction

We used a corpus of seven years of news articles from India and the daily prices of 15 different crops across India. We used the News Event Model to extract events from the news and built a separate event phenomenon graph for each crop, which we then used to build event-driven predictive models to forecast food prices. Our experiments show that the event-driven predictive models perform better than traditional models such as ARIMA.

Disease Outbreak Prediction

Our event model has been successfully applied in predicting disease outbreaks in India. We used a similar news corpus and targeted four different diseases and showed that news events can be used effectively to understand outbreaks of certain diseases.


  • Sunandan Chakraborty, Lakshminarayanan Subramanian (2016). "Extracting Signals from News Streams for Disease Outbreak Prediction", GlobalSIP 2016, Washington DC, USA
  • Sunandan Chakraborty, Ashwin Venkataraman, Srikanth Jagabathula, Lakshminarayanan Subramanian (2016). "Predicting Socio-Economic Indicators using News Events", KDD 2016, San Francisco, USA