The 1R Learning Algorithm

The 1R learning algorithms is the simplest rule-based classification learning algorithm for discrete attributes. Given a table T of labelled instances, and a classification attribute C, the 1R algorithms returns a rule that predicts C on the basis of a single predictive attributed A in T; that is, it returns a rule of the form
  
  case (X.A) {
    V1 : predict X.C == W1;
    V2 : predict X.C == W2;
    ...
    Vk : predict X.C == Wk;
  }
where V1 ... Vk range over the possible values of X.A, and W1 ... Wk are possible values of X.C. The Wi need not all be different and need not cover all possible values of X.C.

The algorithm can be written as follows:

Select(T : table; A : attribute; V : value) return table;
  { return({ X | X in T and X.A=V })

MostCommonValue(C,A : attribute; V : value; T : table) return value;
  { return(the most common value of X.C in select(T,A,V) }

/* BestRule(C,A,T) returns the rule that best predicts X.C from X.A 
   based on the instances in table T */

BestRule(C,A : attribute; T : table) return rule;
 { let {V1 ...Vk} be the set of values for A in T;
   return("case(X.A) {
             _V1_ : predict X.C == _MostCommonValue(C,A,V1,T)_; 
             ...
             _Vk_ : predict X.C == _MostCommonValue(C,A,Vk,T)_;
            }"
 }

(The underscore notation about is anti-quoting; the actual value is 
substituted into a quoted string.  For instance, if Z=4 then
"Z + _Z_ == _2*Z_" denotes the string "Z + 4 = 8". The values of Z and 2*Z
are substituted for _Z_ and _2*Z_)

Accuracy(R : rule; T: table) return number;
  return(the fraction of X in T where X.C = apply(R,X.A))

/* 1R(C,T) returns a rule for predicting X.C based on table T. */
1R(C : attribute; T : table) return rule;
{ BestAcc = 0;
  for (A in Attributes(T), A != C) {
      R = BestRule(C,A,T);
      Acc = Accuracy(R,T);
      if (Acc > BestAcc) {
        BestR = R;
        BestAcc = Acc;
      } }
  return(BestR)
}











Example

Let T be the following table:
Office Party State Vote
House Dem NY Yes
House Dem NJ No
House Dem CT Yes
House Rep NJ No
House Rep CT Yes
Senate Dem NY No
Senate Dem NJ No
Senate Rep NY No
Senate Rep NJ Yes
Senate Rep CT Yes

Then BestRule(Vote,Office,T) is "case (X.Office) { House: predict (X.Vote==Yes); Senate: predict(X.Vote==No); }" with an accuracy of 6/10.

BestRule(Vote,Party,T) is "case (X.Party) { Dem: predict (X.Vote==No); Rep: predict(X.Vote==Yes); }" with an accuracy of 6/10.

BestRule(Vote,State,T) is
"case (X.State) { NY: predict (X.Vote==No); NJ: predict (X.Vote==No); CT: predict(X.Vote==Yes) }"
with an accuracy of 8/10.

The 1R algorithm therefore returns the rule
"case (X.State) { NY: predict (X.Vote==No); NJ: predict (X.Vote==No); CT: predict(X.Vote==Yes) }"