PANDEMIC GAME
A DISSERTATION
SUBMITTED TO THE DEPARTMENT OF COMPUTER SCIENCE
AND THE COMMITTEE ON GRADUATE STUDIES
OF NEW YORK UNIVERSITY
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE DEGREE OF
MASTER OF SCIENCE
Harshal Patil
May 2010

© Copyright by Harshal Patil 2010
All Rights Reserved

I certify that I have read this dissertation and that, in my opinion, it
is fully adequate in scope and quality as a dissertation for the degree
of Master of Science.
(Prof. Dennis Shasha)
Principal Adviser
I certify that I have read this dissertation and that, in my opinion, it
is fully adequate in scope and quality as a dissertation for the degree
of Master of Science.
(Prof. Alberto Lerner)
Approved for the University Committee on Graduate Studies.

Preface
This thesis describes a game developed in JavaScript whose goal it is to teach the
usage of concepts like correlation, bootstrapping and confidence intervals in real life
scenarios. The game player’s objective is to determine causes of the pandemic in the
shortest time possible, given the symptoms shown by subjects and partial information
about subjects’ exposure to possible causes.

Acknowledgments
I would like to thank my research adviser, Professor Dennis Shasha, for inspiring this
work and especially for his guidance and patience along the way.

Contents
Preface
Acknowledgments
1 Introduction
2 Background
  2.1 Statistics
  2.2 Games Based Learning (GBL)
  2.3 Pandemic Game Idea
3 Pandemic
  3.1 Objective
  3.2 Overview
  3.3 Key Concepts
    3.3.1 Correlation
    3.3.2 Bootstrap
    3.3.3 Confidence Interval
4 Architecture and Design
  4.1 Overview
  4.2 Modules
    4.2.1 Scores Module
    4.2.2 Configuration Module
    4.2.3 Statistics Module
    4.2.4 Main Module
    4.2.5 Correlation Module
5 User Interface
  5.1 Initial Configuration
  5.2 Actual Game
6 Implementation Details
7 User Tests
8 Conclusion
  8.1 Evaluation
  8.2 Related Work
  8.3 Future Work
Bibliography

List of Figures
3.1 Calculating Correlation
4.1 Use Case Diagram/Entity relationship diagram
4.2 Configuration module
4.3 Statistics module
4.4 Main module
4.5 Correlation module
5.1 Initial Configuration Screen
5.2 Game Play Screen
6.1 Parameter Display
6.2 Entries After Generating Questions
6.3 Correlation Entry
6.4 Bootstrapping
8.1 Related Games : Woods
8.2 A Game State in Woods

Chapter 1
Introduction
Many students find statistics unintuitive, and so it gradually becomes uninteresting
to them. To make the subject fun, this game teaches it using the well-established
technique of ‘Game-Based Learning’.
In this game, the player’s goal is to discover the cause or causes of a pandemic in
the fewest game days possible by asking questions (these could be lab tests in fact)
of individuals who may or may not suffer symptoms. Initially little is known about
which conceivable cause each individual may have been exposed to. Thus, players
must use educated guesses and strategically ask questions to do well.
The game board presents single-cause and two-cause correlations as well as confidence
intervals to the player. Better players find such data useful.
This thesis gives an overview of games to teach statistics and then presents the
implementation of our Pandemic game and the user tests we have conducted.

Chapter 2
Background
2.1
Statistics
In today’s world, we are faced with many situations where statistics can be applied.
For example, the methods of statistics can be used in explaining the group behaviour
of organisms, marketing research, and other large-scale patterns. A good example is
how scientists infer the behaviour of groups of animals. Scientists can record data
from a group of elephants and determine that a certain percentage of elephant herds
will defend themselves from predators while the rest may run away. This kind of data
can help scientists predict elephant lifestyle and culture.
Several factors contribute to the unpopularity of statistics: it is often one of the
few quantitative courses required for social science majors, who may be less interested
in maths subjects to begin with; it’s not always well-taught; the logic of hypothesis
testing is not terribly intuitive.
2.2
Games Based Learning (GBL)
Game-based learning (GBL) aims to teach subject matter through games.
Generally, learning games embed the subject matter in game play. The theory
is that players retain subject material better when they apply it.
Game-based learning has become popular because of the imaginative element
involved, which keeps players immersed in the game world. Today’s games try to bring
imagination into the real world: they create a real-world scenario and give the player
the power to change it to their liking, or let the player create favourite imaginary
characters and step into their roles. These games motivate players and allow them to
develop an intuition for consequences.
2.3
Pandemic Game Idea
There are many pandemics around the world, including HIV, cholera, influenza, and
plagues. Disease categories are Germ, Plague, Endemic, Epidemic and Pandemic.
Finding the cause or causes of a pandemic can help find cures and can help restrict
the harmful effects of the pandemic to the population already exposed to it. Many
people are interested in pandemics, either because they fear being exposed to them
or because they enjoy the thrill of helping others.

Chapter 3
Pandemic
3.1
Objective
The player is supposed to make decisions based on questions answered by a set
of “subjects.” At the beginning, each subject has answered zero or more questions.
The player can click on a question mark to “ask” a question. The result is either
Yes (y) or No (n). The player can then guess which potential factor or factors have
caused the pandemic. After the player guesses a factor/cause, the game simulates the
passing of a day, the time needed to eliminate that cause. The correctness of the
guess can be determined only after that day has passed.
3.2
Overview
Initially, the player is given a screen to adjust the game parameters. The parameters
include the following: actual causes of the pandemic, delay in showing symptoms,
probability of a cause not showing any symptom, number of people surveyed and
percentage of the data revealed. Once all parameters are selected, the game calculates
the correlation, bootstrap and confidence interval of each cause with the symptom.
These statistics help the player decide whether a given factor might contribute to the
pandemic. The less data is revealed, the less reliable these statistics are.
3.3
Key Concepts
3.3.1
Correlation
Correlation is used to determine how well one variable can predict another (i.e.,
whether a linear relationship exists between the two). The correlation coefficient
always lies between −1 and 1. Values close to 1 indicate a strong positive association,
meaning that as X (a potential cause in the Pandemic game) increases we expect Y
(whether symptoms show in the Pandemic game) to increase (positively sloping line).
Values close to −1 indicate a strong negative association, meaning that as X increases
we expect Y to decrease (negatively sloping line). A value of zero indicates that there
is no linear relationship between the variables, that is, that knowing X does not help
you predict Y with a straight line. The correlation coefficient, r, is computed by
comparing the observed covariance (a measure of how much X and Y vary together) to
the maximum possible positive
covariance of X and Y:
\[ r = \frac{XY_{SP}}{\sqrt{X_{SS}\, Y_{SS}}} \tag{3.1} \]
where $XY_{SP}$ is the sum of products, $X_{SS}$ is the sum of squares for X, and
$Y_{SS}$ is the sum of squares for Y:
\[ XY_{SP} = \sum_{i=1}^{N} (x_i - \bar{X})(y_i - \bar{Y}) \tag{3.2} \]
\[ X_{SS} = \sum_{i=1}^{N} (x_i - \bar{X})^2 \tag{3.3} \]
\[ Y_{SS} = \sum_{i=1}^{N} (y_i - \bar{Y})^2 \tag{3.4} \]
Let us look closely at $XY_{SP}$, the numerator in our equation. First, notice that
neither $(x_i - \bar{X})$ nor $(y_i - \bar{Y})$ is squared. So both could be positive,
both could be negative, or one could be positive and one could be negative. This means
the product of these can be either negative or positive. Now let us think about a few
scenarios.
Suppose there is a positive relationship between X and Y, so in general as X
increases Y increases. That means that while we are looking at X’s below the mean of
all X’s, we expect the corresponding Y’s to be below the mean of all Y’s. So we would
get a negative times a negative, resulting in a positive number. We would also expect
that when we examine X’s above the mean of all X’s, we would see Y’s that are also
above the mean of all Y’s. So we would get a positive times a positive, resulting in
yet another positive number. In the end we should get a sum of mostly positive
numbers. If you go through the same steps for a hypothetical negative relationship you
would end up summing mostly negative numbers, resulting in a relatively large negative
number. The numerator gives us our positive or negative association. If there is no
relationship between X and Y, we expect to get some negative products and some
positive products, which largely cancel each other out in the sum. In the denominator,
$(x_i - \bar{X})$ and $(y_i - \bar{Y})$ are each squared before being summed, so every
term is non-negative. This means the denominator is always positive (as long as
neither X nor Y is constant), and the deviations cannot cancel each other out when they
are summed. If X and Y have a strong relationship, the absolute value of the observed
covariance will be close to the maximum possible positive covariance, yielding an
answer close to 1 or −1.
Figure 3.1: Calculating Correlation
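Equations (3.1) to (3.4) translate almost line by line into code. The following is a minimal sketch, not the game's own implementation; the function name `pearson` and the argument names `xs` and `ys` are assumptions made for illustration.

```javascript
// Pearson correlation following equations (3.1)-(3.4).
// `xs` and `ys` are equal-length numeric arrays (hypothetical names).
function pearson(xs, ys) {
  if (xs.length !== ys.length || xs.length === 0) {
    throw new Error('xs and ys must be non-empty and of equal length');
  }
  const n = xs.length;
  const meanX = xs.reduce((a, b) => a + b, 0) / n;
  const meanY = ys.reduce((a, b) => a + b, 0) / n;

  let xySP = 0; // sum of products, equation (3.2)
  let xSS = 0;  // sum of squares for X, equation (3.3)
  let ySS = 0;  // sum of squares for Y, equation (3.4)
  for (let i = 0; i < n; i++) {
    const dx = xs[i] - meanX;
    const dy = ys[i] - meanY;
    xySP += dx * dy;
    xSS += dx * dx;
    ySS += dy * dy;
  }

  const denominator = Math.sqrt(xSS * ySS);
  // r is undefined when either column is constant; NaN signals that case.
  return denominator === 0 ? NaN : xySP / denominator;
}
```

Returning NaN for a constant column is a design choice of this sketch: with very few revealed answers a column can easily be all ‘y’ or all ‘n’, and the caller then has to decide how to display an undefined correlation.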
3.3.2
Bootstrap
The main idea behind the bootstrap is that we create new samples of the same size as
the original by choosing values from the original sample uniformly at random and with
replacement. Let’s break down the phrase. Uniformly at random means each new
sample element is chosen from the original sample in such a way that every original
sample element has the same chance of being picked. With replacement means that
even though an original sample element has been picked, its chance of getting picked
again remains the same. Simply put, in forming a new sample (called a bootstrap
sample), we choose uniformly at random from the original sample and may choose some
elements twice or more and some elements not at all.
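The resampling step described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the game's code: the name `bootstrapSample`, the optional `size` argument, and the injectable `rng` argument (which defaults to `Math.random`) are all introduced here for the example.

```javascript
// Draw one bootstrap sample from `original`, choosing each element
// uniformly at random and with replacement.
// `size` defaults to the original sample size, as in the definition above.
// `rng` is any function returning a uniform number in [0, 1); passing a
// fixed function makes the sketch repeatable.
function bootstrapSample(original, size = original.length, rng = Math.random) {
  if (original.length === 0) {
    throw new Error('original sample must be non-empty');
  }
  const sample = [];
  for (let i = 0; i < size; i++) {
    // Every index has the same chance on every draw, so an element that
    // was already picked can be picked again.
    const index = Math.floor(rng() * original.length);
    sample.push(original[index]);
  }
  return sample;
}
```

The optional `size` argument covers the case described in Chapter 6, where 5 pairs are drawn from only 2 revealed pairs.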
3.3.3
Confidence Interval
The confidence interval of an imperfectly repeatable measurement is defined by the
range of values the measurement is likely to take. In re-sampling statistics as in
traditional statistics, this range is commonly defined as the middle 90% (or sometimes
95%) of the possible values. The set of possible values will be based on repeated
random samples of some sort, i.e., a bootstrap.
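Taking the “middle 90%” of a set of resampled values amounts to sorting them and discarding an equal share from each tail. The sketch below assumes the values are plain numbers; the name `percentileInterval` and its `level` parameter are hypothetical.

```javascript
// Percentile confidence interval: sort the values and keep the middle
// `level` fraction (0.90 by default; 0.95 is the other common choice).
function percentileInterval(values, level = 0.90) {
  if (values.length === 0) throw new Error('values must be non-empty');
  if (!(level > 0 && level <= 1)) throw new Error('level must be in (0, 1]');

  // Copy before sorting so the caller's array is left untouched.
  const sorted = values.slice().sort((a, b) => a - b);
  const n = sorted.length;

  // Number of values to discard from each tail, clamped so that at least
  // one value always remains.
  let drop = Math.round((n * (1 - level)) / 2);
  drop = Math.min(drop, Math.floor((n - 1) / 2));

  return { low: sorted[drop], high: sorted[n - 1 - drop] };
}
```

With 1000 resampled values and `level` set to 0.95, this discards the 25 smallest and the 25 largest values and reports the range of the remaining 950.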

Chapter 4
Architecture and Design
4.1
Overview
The game is divided into 5 modules: Scores, Configuration, Statistics, Main, and
Correlation. All of these modules are controlled by the Main module. For the most
part, the Main module makes changes in the other modules so that they display the
current state of the game.
4.2
Modules
4.2.1
Scores Module
This module is responsible for fetching the top 5 scores from the database. Its
functionality is independent of the Main module: when a new game starts, control is
passed to ’Scores’, and after it executes, the ’Main’ module regains control.
4.2.2
Configuration Module
This module displays the initial configuration of the game selected by the player.
This decides the course of the game, the difficulty of the game, and the uncertainty
of the data revealed.
4.2.3
Statistics Module
This module calculates the correlation, bootstrap and confidence interval based on the
revealed data (already answered questions) and the symptoms of the corresponding
subjects. Correlation is calculated by treating ’y’ as ’+1’ and ’n’ as ’-1’, taking
the simple dot product of the two columns, and dividing by the number of entries
compared (see the worked example in Chapter 6). The confidence interval is computed
from the bootstrap (§3.3.2) of the answered questions and the symptoms of the
corresponding subjects.
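The ±1 encoding makes the per-column statistic very small to compute. The sketch below is an illustration, not the module's code: the names `encode` and `columnCorrelation` are hypothetical, and dividing the dot product by the number of answered rows is an assumption taken from the arithmetic of the worked example in Chapter 6.

```javascript
// Encode an answer as a number: 'y' -> +1, 'n' -> -1.
// Anything else (such as '?') counts as unanswered and yields null.
function encode(answer) {
  if (answer === 'y') return 1;
  if (answer === 'n') return -1;
  return null;
}

// Dot product of a cause column with the symptoms column over the rows in
// which both cells are answered, divided by the number of such rows
// (assumed normalization). Returns NaN when no row is usable.
function columnCorrelation(causeColumn, symptomColumn) {
  let sum = 0;
  let count = 0;
  for (let i = 0; i < causeColumn.length; i++) {
    const cause = encode(causeColumn[i]);
    const symptom = encode(symptomColumn[i]);
    if (cause === null || symptom === null) continue; // skip '?' rows
    sum += cause * symptom; // +1 on agreement, -1 on disagreement
    count++;
  }
  return count === 0 ? NaN : sum / count;
}
```

Under this encoding each answered row contributes +1 when the cause and the symptom agree and −1 when they disagree, so the result is simply the fraction of agreements minus the fraction of disagreements.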
4.2.4
Main Module
This module simulates a survey with randomly generated data. It also makes sure
that the randomness involved in the logic is consistent with the game’s configuration
parameter. For example, if there is a 0 day delay and a real cause has a Y, then the
symptom will also say Yes. During play, the player can either request that certain
questions be answered (five per day) or can guess a cause. After a guess of a cause,
one day elapses to eliminate that cause completely.
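The time accounting described above (five questions per day, one day per guess) can be sketched as a small clock object. This is a hypothetical illustration: the names `createClock`, `askQuestion`, `guessCause`, `daysElapsed` and `time` do not come from the game's code, and the sketch assumes that a guess always adds exactly one full day.

```javascript
// Game clock counted in fifths of a day, so that five questions add up to
// exactly one day without floating-point drift.
const QUESTIONS_PER_DAY = 5; // from the text: five questions per day

function createClock() {
  let fifths = 0;
  return {
    // Asking one question costs 1/5 of a day.
    askQuestion() { fifths += 1; },
    // Guessing a cause costs one full day (assumption of this sketch).
    guessCause() { fifths += QUESTIONS_PER_DAY; },
    // Number of whole days that have fully elapsed.
    daysElapsed() { return Math.floor(fifths / QUESTIONS_PER_DAY); },
    // Total elapsed time in days, including the current partial day.
    time() { return fifths / QUESTIONS_PER_DAY; }
  };
}
```

Counting in integer fifths rather than adding 0.2 repeatedly keeps the “has a day passed?” check exact, which matters when delayed symptoms are revealed at day boundaries.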
4.2.5
Correlation Module
This module shows, for every pair of cause columns, the correlation with the symptoms
obtained after taking the ’OR’ of the currently available data in the two columns.
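A pairwise table of this kind can be sketched as follows. The function names (`orAnswers`, `pairCorrelation`, `correlationTable`) and the sample data are hypothetical, and the sketch assumes that a row is used only when both cause cells are answered and that the symptoms column never contains ‘?’.

```javascript
// OR of two answers; null when either cell is still unanswered.
function orAnswers(a, b) {
  const known = (v) => v === 'y' || v === 'n';
  if (!known(a) || !known(b)) return null;
  return (a === 'y' || b === 'y') ? 'y' : 'n';
}

// Score for one pair of cause columns: +1 for every usable row whose OR
// matches the symptom, -1 for every mismatch, averaged over usable rows.
function pairCorrelation(colA, colB, symptoms) {
  let sum = 0;
  let count = 0;
  for (let i = 0; i < symptoms.length; i++) {
    const combined = orAnswers(colA[i], colB[i]);
    if (combined === null) continue;
    sum += combined === symptoms[i] ? 1 : -1;
    count++;
  }
  return count === 0 ? NaN : sum / count;
}

// Square table keyed by cause name: table[a][b] pairs cause a with cause b.
function correlationTable(causes, symptoms) {
  const names = Object.keys(causes);
  const table = {};
  for (const a of names) {
    table[a] = {};
    for (const b of names) {
      table[a][b] = pairCorrelation(causes[a], causes[b], symptoms);
    }
  }
  return table;
}

// Hypothetical data: two mismatches, one match, one row skipped for its '?'.
const causes = {
  'Tap Water': ['y', 'n', 'y', 'n'],
  'Rodents':   ['n', 'y', 'n', '?']
};
const symptoms = ['n', 'n', 'y', 'n'];
const table = correlationTable(causes, symptoms);
console.log(table['Tap Water']['Rodents'].toFixed(2)); // prints "-0.33"
```

On the diagonal, a cause is ORed with itself, so the entry reduces to that single cause's match score against the symptoms.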

Figure 4.1: Use Case Diagram/Entity relationship diagram

Figure 4.2: Configuration module
Figure 4.3: Statistics module

Figure 4.4: Main module
Figure 4.5: Correlation module

Chapter 5
User Interface
The user interface is divided into 2 parts: the first is the initial configuration
and the second is the actual game.
5.1
Initial Configuration
There are 5 sliders in this part, each corresponding to an initial parameter. Each
slider appears together with a text box, which shows the current value indicated by
the slider. The player can also type a value into the text box instead of moving the
slider to the desired value. Once all the values have been selected, the actual game
scenario is loaded.
Figure 5.1: Initial Configuration Screen
5.2
Actual Game
Each module is assigned its own HTML frame. The Main module resides in the central
frame. Each frame consists of an HTML table. The table in ‘Main’ shows the
incomplete survey from which the player is trying to guess the cause. The part of the
survey that is known to the player has either ‘y’ or ‘n’ entries, and the remaining
part has a ‘?’. The player may choose to click on any ‘?’ at the cost of consuming
1/5 of a day. After each click, the player learns whether that ‘?’ was a ‘y’ or an ‘n’.
Once an entry is revealed, all the statistics related to that column are updated as
well, which helps the player’s decision process.
The headings of all the columns are buttons labelled with the column names. To
guess a cause, the player clicks on a button, and a pop-up appears telling whether
the day used in eliminating that cause was useful or was wasted.

Figure 5.2: Game Play Screen

Chapter 6
Implementation Details
This game is implemented in JavaScript.
JavaScript is an object-oriented scripting language that implements the ECMAScript
standard. Despite its name, it is not derived from Java; it runs in the client-side
web browser. Because it runs locally, it allows a responsive player interface and
dynamic web pages.
We now look at some functions of the actual code.
• populateConfig
All the configuration values (e.g., number of causes, days until symptoms show)
that the player chose in the initial screen are put into the ’Config’ module.
Those parameter values are displayed continuously during the whole game.
Figure 6.1: Parameter Display
• generateQuestions
This function creates the main “survey” table, each of whose entries is a question
mark, a y, or an n. The generateQuestions function chooses random causes from all
possible causes and sets the symptoms column to be equal to the OR of these chosen
causes. It also generates, for each subject, the exposure data for all causes as of
the time the subject starts showing the symptoms. Finally, it chooses random
positions, other than in the symptoms column, in which to put the desired percentage
of ‘?’.
• displayTable
The Main module described in Chapter 4 is handled in this function. The
displayTable function displays the main playing matrix (the “survey table”).
Then it calls displayStat for the Statistics module.
An on-click event is associated with each ‘?’ and is attached to the updateStat
function. If only one cause is involved, this function displays a correlation row
instead of a two-dimensional correlation table. If there is more than one cause,
it passes control to the Correlation module in order to display the
correlation table.

Figure 6.2: Entries After Generating Questions
• showCorrelationTable
This generates a table in the ’correlation’ module to show the correlation of any
two causes (ORed together) with the symptoms, based only on the entries in
the cause columns that are y or n. The 2 dimensional table is a square matrix
consisting of all possible causes along the rows and columns.
In the game state above, let’s focus on the correlation of ‘Rodents’ and ‘Tap Water’.
For subject ‘Jecob’, a ‘y’ in ‘Tap Water’ and an ‘n’ in ‘Rodents’ should effectively
produce a ‘y’ in symptoms, but this does not match the ‘symptoms shown’. Similarly,
there is a mismatch for subject ‘Emma’. But for subject ‘Emily’, we have a match.
Thus, based on the given entries, 1 match (1 × 1) and 2 mismatches (2 × −1) in the 3
entries of ‘Tap Water’ and ‘Rodents’ produce a correlation of −0.33:
\[ \frac{(1 \times 1) + (2 \times -1)}{3} = -0.33 \]
Figure 6.3: Correlation Entry
• displayStat
– Bootstrap
Bootstrapping is the process of estimating something about the whole population
from a very small portion of that population. There are many kinds
of bootstrapping, but for the purposes of this game, we use sampling with
replacement.
Let’s take the example of tap water in the game scenario of Figure 6.4.
Now there are 2 pairs which show the relationship between Tap Water and
symptom. Those four data points (two per pair) give us the “measured
correlation.” But we need to infer the relationship when all 5 pairs
are revealed (i.e., all questions about Tap Water are answered). Thus we
choose 5 pairs ‘with replacement’ from these 2 available pairs and find the
correlation of this new column. We repeat this procedure a fixed number
of times for every column (in our case, 1000 times), and hence we get
1000 correlations.
Figure 6.4: Bootstrapping
– Confidence Interval (involves merge sort)
We use the 1000 correlations found in bootstrapping to figure out the range
of correlations that are statistically consistent with this data. We sort all
these 1000 values and take the middle 95% of the values. The measured
correlation will normally lie inside this interval. The narrower the confidence
interval, the better the chances that the measured correlation is close
to the true correlation.
• updateStat
Whenever the player clicks on any question mark in the survey table, this
function is called in the background. It displays the answer and
recalculates both the single and (when appropriate) pairwise correlations.
The function also increases the time value and checks whether a day has passed
or not. If yes, it changes any of the symptoms values that may now be revealed
(when the delay factor is greater than 0). Once these symptoms have changed,
it calculates all the statistics again.
• guess
This function is called when the player clicks on the name of a cause. It checks
whether the cause chosen by the player is in the ’correctcause’ array. If yes, it
makes that button green, else that button becomes red. In any case, if the game
still needs to be played (not all causes have been found), it updates symptom
values if necessary to reflect the day that has passed. When symptom values
are updated, this function also updates correlations.
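The survey generation described under generateQuestions can be sketched as below. This is a simplified illustration, not the game's code: the names `makeRng` and `generateSurvey` are hypothetical, each exposure is drawn as ‘y’ or ‘n’ with equal probability (an assumption), the cause name ‘Other’ in the usage lines is a placeholder, and the delay and the probability of a cause not showing symptoms are left out.

```javascript
// Small seeded generator (a linear congruential generator) so that the
// sketch is repeatable; Math.random could be passed in its place.
function makeRng(seed) {
  let state = seed >>> 0;
  return function () {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296; // uniform in [0, 1)
  };
}

function generateSurvey(causeNames, numRealCauses, numSubjects, hiddenFraction, rng) {
  // Choose the real causes with a partial Fisher-Yates shuffle.
  const shuffled = causeNames.slice();
  for (let i = 0; i < numRealCauses; i++) {
    const j = i + Math.floor(rng() * (shuffled.length - i));
    [shuffled[i], shuffled[j]] = [shuffled[j], shuffled[i]];
  }
  const realCauses = shuffled.slice(0, numRealCauses);

  // Fill in every subject's exposures, then derive the symptoms column
  // as the OR of the real causes.
  const truth = [];
  const symptoms = [];
  for (let s = 0; s < numSubjects; s++) {
    const row = {};
    for (const name of causeNames) row[name] = rng() < 0.5 ? 'y' : 'n';
    truth.push(row);
    symptoms.push(realCauses.some((name) => row[name] === 'y') ? 'y' : 'n');
  }

  // Hide the requested fraction of cause cells behind '?'.
  // The symptoms column is never hidden.
  const cells = [];
  for (let s = 0; s < numSubjects; s++) {
    for (const name of causeNames) cells.push([s, name]);
  }
  for (let i = cells.length - 1; i > 0; i--) {
    const j = Math.floor(rng() * (i + 1));
    [cells[i], cells[j]] = [cells[j], cells[i]];
  }
  const visible = truth.map((row) => Object.assign({}, row));
  const hideCount = Math.round(hiddenFraction * cells.length);
  for (let i = 0; i < hideCount; i++) {
    const [s, name] = cells[i];
    visible[s][name] = '?';
  }
  return { realCauses, truth, visible, symptoms };
}

// Usage: 3 candidate causes, 2 real ones, 5 subjects, 40% of cells hidden.
const survey = generateSurvey(['Tap Water', 'Rodents', 'Other'], 2, 5, 0.4, makeRng(42));
```

Keeping the full `truth` table alongside the `visible` one means that revealing a ‘?’ later is a simple lookup, and the seeded generator makes a given game state reproducible for testing.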

Chapter 7
User Tests
We conducted user tests with 18 users. 13 of them had a technical background such as
Computer Science; four worked outside of the mathematical sciences. 2 of the 13
were statisticians or mathematicians.
Each user played 5 games of increasing difficulty. The first level had only
one cause and 0% of ‘?’ with 0 delay. This was to give an idea of what actually needs
to be done in the game. Only about 5 of the 18 users were able to understand what
needed to be done and how to go about finding the cause.
The second level again had one cause with 0 delay, but this time 70% of the entries
in the survey table had question marks. By this time everybody knew what needed to
be done. This level introduced the concept of asking ‘questions’. 16 out of 18 took
a guess without even asking a question, and almost all of them again guessed
correctly. Those who did not get it right learned that the correlation of a few data
points might be misleading.
In the third level, we introduced the delay factor and taught users how it might
affect the statistics. We had to explain the delay concept here because 50% of the
users thought that a subject might be exposed to a cause within this delay.
In the fourth level, we set the delay to 0 again but had 2 causes to find. Users got
the idea of ‘OR’ing the causes, but only 2 of them actually got the strategy of using
the ‘n’s in symptoms. We gave the others some hints.
In the fifth level, there were 2 causes to find with a delay of 1. Users could play
this level without guidance and reported enjoying the game a lot. 5 out of 18 guessed
a cause without any strategy. Another 12 tried to rule out cause pairs based on
the column ‘OR’.
Their reactions and thinking processes were recorded as mp3 audio.
After playing, each of them was asked the following questions:
• How did you decide the cause?
• At what point of the day do you generally guess a cause?
• At what value of statistics shown do you decide the cause?
• How do you use the confidence interval?
• When you are given 2 causes, do you use the bottom table?
• Do you use p-value?
• What improvements can be done?
• Feedback- what do you think about the game?
Our goal was to test what learning had taken place through playing the game, in
particular whether a user could understand the concept of p-value or confidence
interval. Many users did not use either the p-value or the confidence interval. They
did, however, use the correlation. We have since removed the p-value, because it
seemed so confusing in the context of a column with few answers.
One user asked us to change the game so that the p-value and confidence interval
would play a larger role. One user suggested longer buttons for question marks, while
another suggested adding a mouse animation when hovering over a question mark.
Nobody liked the apparently theoretical nature of the help file. Suggestions were
to add snapshots and videos to explain what’s going on.
Users had suggestions regarding the graphical interface as well. Many did not like
the idea of frames and having to scroll up and down every now and then to see a
certain entry. They also suggested showing a pop-up when the mouse hovers over
certain places.
Some suggested putting a bound on how badly a user can do, e.g., not allowing
a user to take more than 4 days to find the answer.
Some also suggested that we display a timer that starts ticking as soon as the game
starts. That way, even if a user does not ask any questions, there will be a sense of
urgency.
Also, after testing, we sensed a need to tell people which combination of parameters
is easiest. Thus, along with the ‘start game’ button, it would be better to have
buttons like ‘Level 0’, ‘Level 1’ and so on, at least up to ‘Level 5’.

Chapter 8
Conclusion
8.1
Evaluation
The game aims to teach statistics, and our user tests suggest that it does so
reasonably well: people use the displayed statistics, chiefly the correlations, to
play the game and make useful guesses based on them.
8.2
Related Work
There are games with simple animation that have been developed to explain
statistical concepts. For example, the University of Reading has developed a
very simple animated Excel-based game to explain concepts such as statistics computed
from sampling, like ratios and variation. They have also developed a few other games
that teach the concepts of estimation, sampling and the meaning of standard error.
Figure 8.1 shows one such example, called Woods. In this example, every
plot has a certain number of small and big trees. We are supposed to choose a few
plots which will have the same ratio of small to big trees as all the plots combined.
Figure 8.1: Related Games : Woods
The objective of the game is to estimate the total number of trees in the forest
and the proportion of large trees in the forest using the concept of stratification
(i.e., the way in which different groups can be formed over a population in order to
estimate it). Similarly, Pandemic tries to teach the concepts of correlation and
bootstrapping to help users make a guess about the cause.
There are similar games teaching different statistical concepts. For example, the
game Tomato helps players understand the issues involved in experimental design,
while the game Mice teaches the design of multi-stage surveys.

Figure 8.2: A Game State in Woods
8.3
Future Work
This work takes statistical games in a new direction. It generates its
own data and then lets the user play around with that data. It tries to make the
concepts of correlation and bootstrapping active.
Of course, any concept can be taken and put into a real-life situation to make
it easier to understand.
As for this specific game, the user interface has to be made more intuitive
and more fun.

Bibliography
[1] Dennis Shasha and Manda Wilson, NYU.
Statistics is easy!, 2008.
[2] Games for Learning.
Wikipedia.
[3] Statistical Services Centre at the University of Reading, England.
http://www.reading.ac.uk/ssc/software/games/stat games.html
[4] Jonathan Jay Koehler, UT Austin.
http://www.mccombs.utexas.edu/faculty/jonathan.koehler/links/statistics.asp,
2010.
[5] A. M. Garcia, UCSD.
http://math.ucsd.edu/ anistat/