These publications are provided on the
LIVE website for research purposes ONLY. No part of these documents
may be distributed for commercial purposes.
Indexes for three-class classification performance assessment – An empirical comparison
M.P. Sampat, A.C. Patel, Y. Wang, S. Gupta, W. Kan, A.C. Bovik and M.K. Markey
IEEE Transactions on Information Technology in Biomedicine, Special Issue on Computational Intelligence in Medical Systems
Abstract
Assessment of classifier performance is critical for fair
comparison of methods, including considering alternative models
or parameters during system design. The assessment must not only
provide meaningful data on the classifier efficacy, but it must do
so in a concise and clear manner. For two-class classification problems,
receiver operating characteristic analysis provides a clear
and concise assessment methodology for reporting performance
and comparing competing systems. However, many other important
biomedical questions cannot be posed as “two-class” classification
tasks and more than two classes are often necessary. While
several methods have been proposed for assessing the performance
of classifiers for such multiclass problems, none has been widely
accepted. The purpose of this paper is to critically review methods
that have been proposed for assessing multiclass classifiers. A
number of these methods provide a classifier performance index
called the volume under surface (VUS). Empirical comparisons
are carried out using four three-class case studies, in which three popular
classification techniques are evaluated with these methods.
Since the same classifier was assessed using multiple performance
indexes, it is possible to gain insight into the relative strengths
and weaknesses of the measures. We conclude that: 1) the method
proposed by Scurfield provides the most detailed description of
classifier performance and insight about the sources of error in a
given classification task and 2) the methods proposed by He and
Nakas also have great practical utility as they provide both the
VUS and an estimate of the variance of the VUS. These estimates
can be used to statistically compare two classification algorithms.
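
As an illustration of what a VUS index measures, the following is a minimal sketch of one common nonparametric estimate for an ordered three-class problem with a single scalar score per case. It assumes the classes are ordered so that class 1 scores should be lowest and class 3 scores highest. The function name `empirical_vus` and the toy scores are hypothetical, and this sketch is not necessarily the exact estimator used by any of the methods compared in the paper. Under these assumptions the estimate is the fraction of triples, one case drawn from each class, whose scores fall in the correct order; a perfect classifier gives 1, and chance-level ordering gives 1/6.

```python
from itertools import product


def empirical_vus(scores_1, scores_2, scores_3):
    """Fraction of (a, b, c) triples, one score per class, with a < b < c.

    Assumption: a single scalar score whose expected ordering is
    class 1 < class 2 < class 3. Ties count as incorrect in this sketch.
    """
    total = len(scores_1) * len(scores_2) * len(scores_3)
    if total == 0:
        raise ValueError("each class needs at least one score")
    # Enumerate every triple; fine for small samples, cubic cost in general.
    correct = sum(
        1 for a, b, c in product(scores_1, scores_2, scores_3) if a < b < c
    )
    return correct / total


# Hypothetical toy scores, for illustration only.
perfect = empirical_vus([0.1, 0.2], [0.4, 0.5], [0.8, 0.9])
overlap = empirical_vus([0.1, 0.6], [0.4, 0.5], [0.3, 0.9])
```

With well-separated scores every triple is correctly ordered, while overlapping scores lower the estimate. Comparing two classifiers statistically, as the abstract describes, additionally requires a variance estimate for the VUS, which this sketch does not compute.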