

Round Robin Learning as an Ensemble Technique

Ensemble techniques have received considerable attention in the recent machine learning literature (Bauer and Kohavi, 1999; Dietterich, 2000a, 1997; Opitz and Maclin, 1999). The idea of obtaining a diverse set of classifiers for a single learning problem and voting or averaging their predictions is both simple and powerful, and the resulting accuracy gains often have a sound theoretical foundation (Breiman, 1996; Freund and Schapire, 1997). Averaging the predictions of these classifiers helps to reduce the variance and often increases the reliability of the predictions. There are several techniques for obtaining a diverse set of classifiers. The most common is to diversify the training sets by subsampling, as in bagging (Breiman, 1996) and boosting (Freund and Schapire, 1997). Other techniques include using different feature subsets (Bay, 1999), exploiting the randomness of the base algorithms (Kolen and Pollack, 1991), possibly by artificially randomizing their behavior (Dietterich, 2000b), or using multiple representations of the domain objects, for example information originating from different hyperlinks pointing to a web page (Fürnkranz, 2001a; Fürnkranz, 1999a). Finally, classifier diversity can be ensured by modifying the output labels, i.e., by transforming the learning task into a collection of related learning tasks that use the same input examples but different assignments of the class labels. Error-correcting output codes are the most prominent example of this type of ensemble method (Dietterich and Bakiri, 1995).
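To make the distinction concrete, the following Python sketch (ours, not from the paper) contrasts the two diversification strategies at the ends of this spectrum: bootstrap resampling of the training set, as in bagging, and relabeling the outputs via a column of a code matrix, as in error-correcting output codes. The example data and the code-matrix column are hypothetical.

    import random

    def bootstrap_sample(examples):
        # Bagging-style diversification: draw |examples| instances
        # with replacement from the original training set.
        return [random.choice(examples) for _ in range(len(examples))]

    def relabel_with_code(examples, code_column):
        # ECOC-style diversification: same inputs, but each original
        # class is mapped to a binary label by one column of the
        # output-code matrix.
        return [(x, code_column[y]) for (x, y) in examples]

    # Hypothetical three-class problem and one code-matrix column.
    examples = [([0.1, 0.2], 0), ([0.9, 0.8], 1), ([0.5, 0.5], 2)]
    code_column = {0: 0, 1: 1, 2: 1}
    print(bootstrap_sample(examples))
    print(relabel_with_code(examples, code_column))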

Clearly, round robin classification may also be interpreted as a member of this last group, and its performance gain may be seen in this context. Obviously, the final prediction is made by exploiting the redundancy provided by multiple models, each of them constructed from a subset of the original data. However, contrary to subsampling approaches like bagging and boosting, these datasets are constructed deterministically. In this respect, pairwise classification shares more similarities with error-correcting output codes, but differs from them through the fixed procedure for setting up the new binary problems and through the fact that each of the new problems is smaller than the original problem. In particular, the latter fact may often cause the sub-problems in pairwise classification to be conceptually simpler than the original problem (as illustrated in Figure 1).
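The decomposition itself is easy to state in code. The following Python sketch is a minimal illustration, not the paper's implementation: it constructs one binary training set per pair of classes and combines the pairwise predictions by unweighted voting; train_binary stands for an arbitrary base learner.

    from collections import Counter
    from itertools import combinations

    def round_robin_train(examples, classes, train_binary):
        # One deterministic binary problem per unordered pair of
        # classes; each uses only the examples of those two classes
        # and is therefore smaller than the original problem.
        models = {}
        for i, j in combinations(classes, 2):
            subset = [(x, y) for (x, y) in examples if y in (i, j)]
            models[(i, j)] = train_binary(subset, i, j)
        return models

    def round_robin_predict(models, x):
        # Each pairwise model votes for one of its two classes;
        # the class with the most votes wins (ties broken arbitrarily).
        votes = Counter(model(x) for model in models.values())
        return votes.most_common(1)[0][0]

For a problem with c classes, this procedure yields c(c-1)/2 binary learning tasks.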

In previous work (Fürnkranz, 2001b), we observed that the improvements in accuracy obtained by R3 over RIPPER were quite similar to those obtained by C5.0-BOOST over C5.0 on the same problems. Round robin binarization seemed to work whenever boosting worked, and vice versa. The correlation coefficient $r^2$ of the error ratios of C5.0-BOOST/C5.0 and R3/RIPPER on the 20 datasets was about 0.618, which is in the same range as the correlation coefficients reported for bagging and boosting (Opitz and Maclin, 1999). We interpreted this as weak evidence that the performance gains of round robin learning may be comparable to those of other ensemble methods and that it could be used as a general method for improving a learner's performance on multi-class problems. We will investigate this question further in this section, focusing in particular on a comparison of round robin learning with boosting (Section 6.1) and bagging (Section 6.2), and on the potential of combining it with these techniques.
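For readers who wish to reproduce such a statistic, the following Python sketch computes the squared correlation of two vectors of per-dataset error ratios; the four numbers are placeholders, not the 20 values underlying the reported 0.618.

    import numpy as np

    # Placeholder error ratios (one entry per dataset); the paper's
    # statistic was computed from 20 such pairs.
    boost_ratio = np.array([0.50, 0.79, 0.27, 0.68])  # C5.0-BOOST / C5.0
    rr_ratio    = np.array([0.77, 0.71, 0.55, 0.95])  # R3 / RIPPER

    r = np.corrcoef(boost_ratio, rr_ratio)[0, 1]
    print(f"r^2 = {r**2:.3f}")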


Table: Boosting: A comparison between round robin binarization and boosting, both with C5.0 as the base learner. The first column shows the error rate (in percent) of C5.0; the next three column pairs show the results of round robin learning, boosting, and the combination of both, each with C5.0 as the base learner. For each of these, we give the absolute error rate and its ratio to the error rate of the base learner C5.0. The last line shows the geometric average of these ratios.
dataset             C5.0    round robin     boosting        both
                   error    error  ratio    error  ratio    error  ratio
abalone            78.48    75.08  0.957    77.88  0.992    74.67  0.951
car                 7.58     5.84  0.771     3.82  0.504     1.85  0.244
glass              35.05    24.77  0.707    27.57  0.787    22.90  0.653
image               3.20     2.90  0.905     1.60  0.500     1.73  0.541
lr spectrometer    51.22    51.79  1.011    46.70  0.912    51.98  1.015
optical             9.20     5.04  0.547     2.46  0.267     2.54  0.277
page-blocks         3.09     2.98  0.964     2.58  0.834     2.78  0.899
sat                13.82    13.16  0.953     9.32  0.675     9.00  0.651
solar flares (c)   15.77    15.69  0.995    16.41  1.041    16.70  1.059
solar flares (m)    4.90     4.90  1.000     5.90  1.206     5.83  1.191
soybean             9.66     6.73  0.697     6.59  0.682     6.44  0.667
thyroid (hyper)     1.11     1.14  1.024     1.03  0.929     1.33  1.190
thyroid (hypo)      0.58     0.69  1.182     0.32  0.545     0.53  0.909
thyroid (repl.)     0.72     0.74  1.037     0.90  1.259     0.90  1.259
vehicle            26.24    29.20  1.113    24.11  0.919    23.17  0.883
vowel              21.72    19.49  0.898     8.89  0.409    14.75  0.679
yeast              43.26    40.63  0.939    41.85  0.967    40.77  0.942
average                            0.909           0.735           0.757
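The geometric average in the last line is the n-th root of the product of the n ratios, which weights improvements and degradations symmetrically. A minimal Python sketch, using the round robin column of the table above:

    import math

    # Round robin error ratios from the table above (one per dataset).
    ratios = [0.957, 0.771, 0.707, 0.905, 1.011, 0.547, 0.964, 0.953,
              0.995, 1.000, 0.697, 1.024, 1.182, 1.037, 1.113, 0.898,
              0.939]
    geo_mean = math.exp(sum(math.log(r) for r in ratios) / len(ratios))
    print(round(geo_mean, 3))  # 0.909, matching the table's last line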




Subsections

    Boosting
    Bagging