V22.0436 - Prof. Grishman

Lecture 27:  Multiprocessors, cont'd

Discuss network topologies and their properties.

A more specialized parallel programming model:  MapReduce

map step (applied independently to each key-value input pair):
    map (in_key, in_value) -> list(out_key, intermediate_value)
reduce step (applied independently to each out_key):
    reduce (out_key, list(intermediate_value)) -> list(out_value)
The reduce step combines all intermediate values for a particular key, producing a list of output values (most often, a single value).
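
The two signatures above can be tied together by a small sequential driver. This is only a sketch for illustration: the names map_reduce, parity_map, and sum_reduce are hypothetical, and a real implementation would distribute the work rather than run it in one loop.

```python
from collections import defaultdict


def map_reduce(inputs, map_fn, reduce_fn):
    """Apply map_fn to each (in_key, in_value) pair, group by out_key, then reduce."""
    # Map step: each input pair is handled independently of the others.
    groups = defaultdict(list)
    for in_key, in_value in inputs:
        for out_key, intermediate_value in map_fn(in_key, in_value):
            groups[out_key].append(intermediate_value)
    # Reduce step: each out_key is handled independently of the others.
    return {out_key: reduce_fn(out_key, values) for out_key, values in groups.items()}


# Hypothetical example: sum the odd and the even numbers across several batches.
def parity_map(in_key, in_value):
    # in_key: batch name (unused); in_value: list of integers
    return [("even" if n % 2 == 0 else "odd", n) for n in in_value]


def sum_reduce(out_key, intermediate_values):
    # Returns a list of output values; here, a single sum.
    return [sum(intermediate_values)]


result = map_reduce([("batch1", [1, 2, 3]), ("batch2", [4, 5])], parity_map, sum_reduce)
print(result)
```

Note that the driver knows nothing about the problem being solved; all problem-specific logic lives in the two functions passed in.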

Example: Count word occurrences

map(String input_key, String input_value):
    // input_key: document name
    // input_value: document contents
    for each word w in input_value:
        EmitIntermediate(w, "1");


reduce(String output_key, Iterator intermediate_values):
    // output_key: a word
    // intermediate_values: a list of counts
    int result = 0;
    for each v in intermediate_values:
        result += ParseInt(v);
    Emit(AsString(result));
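
The word-count pseudocode can be tried out in Python roughly as follows. This is a sketch under assumptions: words are split on whitespace, and the names wc_map, wc_reduce, and word_count are hypothetical. The grouping loop in word_count stands in for the work a MapReduce implementation would do between the two steps.

```python
from collections import defaultdict


def wc_map(input_key, input_value):
    # input_key: document name (unused); input_value: document contents
    return [(w, "1") for w in input_value.split()]


def wc_reduce(output_key, intermediate_values):
    # output_key: a word; intermediate_values: a list of counts (as strings)
    result = 0
    for v in intermediate_values:
        result += int(v)
    return [str(result)]


def word_count(documents):
    # Group the intermediate values by word, then reduce each group.
    groups = defaultdict(list)
    for name, contents in documents.items():
        for word, count in wc_map(name, contents):
            groups[word].append(count)
    return {word: wc_reduce(word, counts) for word, counts in groups.items()}


docs = {"d1": "the cat saw the dog", "d2": "the dog ran"}
counts = word_count(docs)
print(counts["the"], counts["dog"], counts["cat"])
```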

Example: Produce inverted index

map(String input_key, String input_value):
    // input_key: document name
    // input_value: document contents
    for each word w in input_value:
        EmitIntermediate(w, input_key);


reduce(String output_key, Iterator intermediate_values):
    // output_key: a word
    // intermediate_values: a list of document names (possibly with duplicates)
    // output: a sorted list of document names, duplicates removed
    Emit(uniq(sort(intermediate_values)));
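
A Python sketch of the inverted-index example, again assuming whitespace-separated words. The names index_map, index_reduce, and inverted_index are hypothetical, and sorted(set(...)) plays the role of uniq(sort(...)).

```python
from collections import defaultdict


def index_map(input_key, input_value):
    # input_key: document name; input_value: document contents
    return [(w, input_key) for w in input_value.split()]


def index_reduce(output_key, intermediate_values):
    # output_key: a word; intermediate_values: document names, possibly repeated
    return sorted(set(intermediate_values))


def inverted_index(documents):
    # Group document names by word, then reduce each group.
    groups = defaultdict(list)
    for name, contents in documents.items():
        for word, doc in index_map(name, contents):
            groups[word].append(doc)
    return {word: index_reduce(word, names) for word, names in groups.items()}


docs = {"d1": "the cat saw the dog", "d2": "the dog ran"}
index = inverted_index(docs)
print(index["the"], index["ran"])
```

The word "the" occurs twice in d1, so the duplicate removal in the reduce step matters: d1 is listed only once for that word.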

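The MapReduce implementation takes care of parallelization: map calls are independent across input pairs and reduce calls are independent across keys, so the programmer writes only the two functions.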
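
To illustrate where the parallelism comes from, here is a sketch of a driver that hands map and reduce calls to a pool of workers. It is an assumption-laden toy: the name parallel_map_reduce is hypothetical, and a thread pool on one machine stands in for many worker machines (Python threads will not actually speed up CPU-bound work). The map and reduce functions themselves contain no parallel code.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def parallel_map_reduce(inputs, map_fn, reduce_fn, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Map tasks: one per input pair, run concurrently by the pool.
        mapped = list(pool.map(lambda kv: map_fn(kv[0], kv[1]), inputs))
        # Grouping by key is done by the driver, between the two steps.
        groups = defaultdict(list)
        for pairs in mapped:
            for out_key, value in pairs:
                groups[out_key].append(value)
        # Reduce tasks: one per key, run concurrently by the pool.
        keys = list(groups)
        reduced = list(pool.map(lambda k: reduce_fn(k, groups[k]), keys))
    return dict(zip(keys, reduced))


# Word count, written exactly as it would be for a sequential driver.
def count_map(input_key, input_value):
    return [(w, "1") for w in input_value.split()]


def count_reduce(output_key, intermediate_values):
    return [str(sum(int(v) for v in intermediate_values))]


docs = [("d1", "the cat saw the dog"), ("d2", "the dog ran")]
totals = parallel_map_reduce(docs, count_map, count_reduce)
print(totals)
```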