Review of "Case-Based Reasoning: A Textbook" by Michael Richter and Rosina Weber, Springer 2013. xviii+546 pages Review #CR142575, Computing Reviews, August 1, 2014
Learning from experience --- using knowledge of past problems and their solutions to find a solution to the problem at hand --- is one of the central forms of both natural and machine learning. Most machine learning algorithms divide this into two parts: first, a general rule, formula, or method is abstracted offline from the corpus of past cases, and then this rule can be applied to new examples as they arise. An alternative approach, however, cuts out the middleman and works directly from the past examples to the current case. Confronted with a new problem, the reasoning mechanism searches through the corpus for previous cases that were similar, and then adapts the solution that worked there as necessary. No general rule is ever derived or applied. Intuitively, this idea is very attractive; it seems both plausible as a cognitive mechanism and like an idea that should lend itself to an efficient and effective implementation.
Two versions of this general idea have been developed in the computer science literature. The k-nearest-neighbors (kNN) algorithm is a precisely defined version: a ``case'' is a feature vector, ``similarity'' is the distance between vectors under a metric, the ``solution'' is an additional feature to be predicted. The number k is a fixed parameter of the program. Presented with a new instance, one retrieves the k closest cases in the corpus of cases, and these ``vote'' on the solution. No solution adaptation is carried out.
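To make this concrete, here is a minimal sketch of kNN in Python (my own illustration, not drawn from the book); the toy two-dimensional corpus, the Euclidean metric, and the choice k=3 are arbitrary:

    from collections import Counter

    def knn_classify(corpus, labels, query, k=3):
        # Classify `query` by majority vote among its k nearest cases.
        # corpus: list of feature vectors; labels: parallel list of classes.
        def dist2(a, b):
            # Squared Euclidean distance plays the role of the metric.
            return sum((x - y) ** 2 for x, y in zip(a, b))
        # Retrieve the k closest cases and let them vote; no adaptation is done.
        nearest = sorted(range(len(corpus)), key=lambda i: dist2(corpus[i], query))[:k]
        votes = Counter(labels[i] for i in nearest)
        return votes.most_common(1)[0][0]

    # Two toy clusters, labeled 'a' and 'b'.
    corpus = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (5, 6)]
    labels = ['a', 'a', 'a', 'b', 'b', 'b']
    print(knn_classify(corpus, labels, (0.5, 0.5)))   # prints 'a'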
The other method is case-based reasoning (CBR). In CBR, the representations of cases and solutions have a much richer structure, and the measure of similarity and the method of solution adaptation reflect deep domain knowledge. The first studies of this technique were by Janet Kolodner and Kristian Hammond in the 1980s; Kolodner published a general survey in 1993.
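Schematically, the contrast with kNN looks something like the following sketch (again my own, not the book's; the travel-planning domain, the case fields, the similarity weights, and the adaptation rule are all invented for illustration): the most similar structured case is retrieved under a domain-specific similarity measure, and its solution is then adapted rather than voted on.

    # A toy CBR loop: retrieve the most similar structured case, then adapt
    # its solution to the new problem. Domain, fields, and weights are invented.
    cases = [
        {"problem": {"destination": "beach", "party": 2, "budget": 1500},
         "solution": {"hotel": "Seaside Inn", "nights": 7}},
        {"problem": {"destination": "mountains", "party": 4, "budget": 2500},
         "solution": {"hotel": "Alpine Lodge", "nights": 5}},
    ]

    def similarity(p, q):
        # Domain-specific similarity: destination match matters most,
        # then closeness in party size and budget.
        s = 3.0 if p["destination"] == q["destination"] else 0.0
        s += 1.0 / (1 + abs(p["party"] - q["party"]))
        s += 1.0 / (1 + abs(p["budget"] - q["budget"]) / 500)
        return s

    def solve(new_problem):
        best = max(cases, key=lambda c: similarity(c["problem"], new_problem))
        solution = dict(best["solution"])
        # Adaptation step: scale the length of stay to the new budget.
        ratio = new_problem["budget"] / best["problem"]["budget"]
        solution["nights"] = max(1, round(solution["nights"] * ratio))
        return solution

    print(solve({"destination": "beach", "party": 3, "budget": 1000}))
    # prints {'hotel': 'Seaside Inn', 'nights': 5}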
Both kNN and CBR are a little out of the mainstream of AI and machine learning; for instance, the current edition of Russell and Norvig contains only 4 pages on kNN and only a passing reference to CBR. Nonetheless, I was aware of some of the progress over the years in kNN. On the theoretical side, more efficient algorithms for retrieving the k nearest neighbors (the computationally hard part of the process), such as locality-sensitive hashing (sketched below), have been developed by Piotr Indyk and others. On the application side, Torralba, Fergus, and Freeman, in a tour de force, demonstrated that kNN could be efficiently and effectively applied to a corpus of 80 million images, each 32 x 32 pixels, to do image classification. By contrast, I had no idea of the state of the art in CBR since Kolodner's book 20 years ago, so, when I heard there was a new textbook, I eagerly awaited the chance to bring myself up to date.
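The idea behind locality-sensitive hashing can be conveyed in a few lines. The sketch below is my own (the random-hyperplane variant, not taken from any particular paper): each vector is hashed to the signs of its projections onto random directions, so that vectors pointing in similar directions tend to land in the same bucket, and only that bucket needs to be scanned for candidate neighbors.

    import random
    from collections import defaultdict

    def make_hash(dim, n_bits, rng):
        # Each bit is the sign of a projection onto a random hyperplane.
        planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]
        def h(v):
            return tuple(1 if sum(p * x for p, x in zip(plane, v)) >= 0 else 0
                         for plane in planes)
        return h

    rng = random.Random(0)
    h = make_hash(dim=3, n_bits=4, rng=rng)

    # Index the corpus into buckets; vectors with similar directions tend to collide.
    corpus = [(1.0, 0.1, 0.0), (0.9, 0.2, 0.1), (0.0, 1.0, 0.9)]
    buckets = defaultdict(list)
    for i, v in enumerate(corpus):
        buckets[h(v)].append(i)

    # At query time only the matching bucket is scanned for candidate neighbors.
    # (A single table can miss; real systems use several independent tables.)
    print(buckets[h((1.0, 0.0, 0.05))])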
546 pages and several hours of reading later, I am, regrettably, not much the wiser. Richter and Weber's book is extremely poorly written; it veers from the obvious to the vague to the meaningless. Occasionally it drops into highly technical detail without providing nearly enough background, e.g., the complicated information-theoretic formulas and obscure theorems in the chapter on probability.
The exercises are similarly flawed. Most of them are so vague that, as a student, I would have no idea what they are asking, and, as an instructor, I would have no idea how to grade them, e.g. (p. 52), "Define a recommender system for students who want to study in your area. Formulate typical queries the students may have." A few require technical knowledge far beyond, and unrelated to, the content of the book, e.g. (p. 186), "Prove that the average number of Voronoi edges per Voronoi polygon (in two dimensions) is at most 6." In a class on computational geometry this would be a reasonable problem (the proof, using Euler's formula, is short, though a little tricky; a sketch follows); in this course, it is completely out of place.
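(For the curious, the argument runs roughly as follows; the sketch is mine, not the book's. For a Voronoi diagram of $n \ge 3$ sites with $n_v$ vertices and $n_e$ edges, add one vertex ``at infinity'' and attach all unbounded edges to it; the result is a connected planar graph with $n_v + 1$ vertices, $n_e$ edges, and $n$ faces, one per cell. Euler's formula gives $(n_v + 1) - n_e + n = 2$. Every vertex has degree at least 3, so $2 n_e \ge 3(n_v + 1) = 3(n_e - n + 2)$, hence $n_e \le 3n - 6$. Each edge borders exactly two cells, so the edges counted over all cells number $2 n_e \le 6n - 12$, and the average per cell is less than 6.)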
The later chapters on "complex knowledge sources" are short surveys of various AI topics, with little clear discussion of the relation between these theories and CBR. For instance, chapter 17 on "Textual CBR" is mostly the same kind of material one would find in an IR textbook such as Manning, Raghavan, and Schutze, presented much more superficially and much less intelligibly. The discussion of the connection to CBR is short and mostly vague; the main specific information is a list of similarity measures from the IR literature, inadequately described or explained (one representative example is sketched below). Basic questions remain unanswered: What does CBR contribute to the problem of text understanding that is missed by more mainstream approaches? What understanding problems can be solved using CBR? What kind of corpus of texts can be used? What adaptation techniques are useful, if any? The chapters on image recognition and speech recognition are similar.
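To give a sense of what such measures look like (an example of my own choosing, not taken from the chapter), here is the standard cosine similarity over raw term-frequency vectors; a real IR system would add tf-idf weighting and proper tokenization:

    import math
    from collections import Counter

    def cosine_similarity(doc_a, doc_b):
        # Cosine similarity between two texts, over raw term-frequency vectors.
        a, b = Counter(doc_a.lower().split()), Counter(doc_b.lower().split())
        dot = sum(a[t] * b[t] for t in a)
        norm = (math.sqrt(sum(v * v for v in a.values()))
                * math.sqrt(sum(v * v for v in b.values())))
        return dot / norm if norm else 0.0

    print(cosine_similarity("the cat sat on the mat", "the cat lay on the mat"))
    # prints 0.875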
The discussions of specific applications are frustratingly vague. For example, section 4.6.1 discusses spam filtering. The claim is made, "CBR is recommended for spam filtering classification because of how it compares to its alternatives." The reader naturally wants to know how CBR does compare to its alternatives. Has an industrial-strength CBR spam filter been built, or even a toy one? How large a corpus has it been used on? What features and what techniques does it use? How does it compare, in accuracy or in any other respect, to the alternatives? These key questions are not discussed, nor is a reference given.
I still do not know whether CBR is, at this point, a viable technique, a promising line of research, or essentially moribund. So I do not know whether there is any point in teaching a course on the topic at all. But certainly, if one were to teach a course on CBR, this textbook would not be very helpful.