The 1R Learning Algorithm

The 1R learning algorithms is the simplest rule-based classification learning algorithm for discrete attributes. Given a table T of labelled instances, and a classification attribute C, the 1R algorithms returns a rule that predicts C on the basis of a single predictive attributed A in T; that is, it returns a rule of the form

  
  case (X.A) {
    V1 : predict X.C == W1;
    V2 : predict X.C == W2;
    ...
    Vk : predict X.C == Wk;
  }

where V1 ... Vk range over the possible values of X.A, and W1 ... Wk are possible values of X.C. The Wi need not all be different and need not cover all possible values of X.C.

The algorithm can be written as follows:

Select(T : table; A : attribute; V : value) return table;
  { return({ X | X in T and X.A=V })

MostCommonValue(C,A : attribute; V : value; T : table) return value;
  { return(the most common value of X.C in select(T,A,V) }

/* BestRule(C,A,T) returns the rule that best predicts X.C from X.A 
   based on the instances in table T */

BestRule(C,A : attribute; T : table) return rule;
 { let {V1 ...Vk} be the set of values for A in T;
   return("case(X.A) {
             _V1_ : predict X.C == _MostCommonValue(C,A,V1,T)_; 
             ...
             _Vk_ : predict X.C == _MostCommonValue(C,A,Vk,T)_;
            }"
 }

(The underscore notation about is anti-quoting; the actual value is 
substituted into a quoted string.  For instance, if Z=4 then
"Z + _Z_ == _2*Z_" denotes the string "Z + 4 = 8". The values of Z and 2*Z
are substituted for _Z_ and _2*Z_)

Accuracy(R : rule; T: table) return number;
  return(the fraction of X in T where X.C = apply(R,X.A))

/* 1R(C,T) returns a rule for predicting X.C based on table T. */
1R(C : attribute; T : table) return rule;
{ BestAcc = 0;
  for (A in Attributes(T), A != C) {
      R = BestRule(C,A,T);
      Acc = Accuracy(R,T);
      if (Acc > BestAcc) {
        BestR = R;
        BestAcc = Acc;
      } }
  return(BestR)
}

Example

Let T be the following table:

Office	Party	State	Vote
House	Dem	NY	Yes
House	Dem	NJ	No
House	Dem	CT	Yes
House	Rep	NJ	No
House	Rep	CT	Yes
Senate	Dem	NY	No
Senate	Dem	NJ	No
Senate	Rep	NY	No
Senate	Rep	NJ	Yes
Senate	Rep	CT	Yes

Then BestRule(Vote,Office,T) is "case (X.Office) { House: predict (X.Vote==Yes); Senate: predict(X.Vote==No); }" with an accuracy of 6/10.

BestRule(Vote,Party,T) is "case (X.Party) { Dem: predict (X.Vote==No); Rep: predict(X.Vote==Yes); }" with an accuracy of 6/10.

BestRule(Vote,State,T) is
"case (X.State) { NY: predict (X.Vote==No); NJ: predict (X.Vote==No); CT: predict(X.Vote==Yes) }"
with an accuracy of 8/10.

The 1R algorithm therefore returns the rule
"case (X.State) { NY: predict (X.Vote==No); NJ: predict (X.Vote==No); CT: predict(X.Vote==Yes) }"