Comparison of different methods for multiclass classification problems
The project is divided into three sections. Section 1 discusses the One-vs-All and One-vs-One methods,
which are techniques for transforming a multiclass classification problem into multiple binary
classification problems. Section 2 outlines multinomial logistic regression and discusses why it is not
sufficient in many cases. Section 3 applies different multiclass classification methods to the Letter
Image Recognition dataset, compares model performance, and provides some insights on each method.
All programming in this project is done using Python's scikit-learn library.
Transform a multiclass classification problem into multiple binary classification problems
One-vs-All method
- Trains a single binary classifier for each class by treating training samples in that class as
positive and all remaining training samples as negative (see the sketch after this list)
- Usually runs into problems when the training data has an unbalanced or skewed class label distribution.
This can be mitigated with common data-balancing techniques such as over-sampling the minority class
or under-sampling the majority class.
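A minimal sketch of the One-vs-All approach using scikit-learn's OneVsRestClassifier; the toy data
generated here is hypothetical and stands in for real training data:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.multiclass import OneVsRestClassifier

    # Hypothetical 3-class toy problem
    X, y = make_classification(n_samples=300, n_features=8, n_informative=5,
                               n_classes=3, random_state=0)

    # One binary logistic regression per class: that class vs. all the rest
    ova = OneVsRestClassifier(LogisticRegression(max_iter=1000))
    ova.fit(X, y)
    print(len(ova.estimators_))  # K binary classifiers, one per class (here 3)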
One-vs-One method
- Also called the All-Pairs or All-vs-All method
- Trains one binary classifier for every pair of classes, K(K-1)/2 classifiers in total for K classes,
and combines their predictions by majority voting (see the sketch after this list)
- Usually much less sensitive to the problem of unbalanced class distribution, since each binary
classifier is built on only a pair of classes.
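A minimal sketch of the One-vs-One approach using scikit-learn's OneVsOneClassifier, again on
hypothetical toy data (SVC already applies a One-vs-One scheme internally for multiclass problems;
the explicit wrapper here is for illustration):

    from sklearn.datasets import make_classification
    from sklearn.multiclass import OneVsOneClassifier
    from sklearn.svm import SVC

    # Hypothetical 3-class toy problem
    X, y = make_classification(n_samples=300, n_features=8, n_informative=5,
                               n_classes=3, random_state=0)

    # One binary SVM per pair of classes: K*(K-1)/2 classifiers in total
    ovo = OneVsOneClassifier(SVC(kernel="rbf", gamma="scale"))
    ovo.fit(X, y)
    print(len(ovo.estimators_))  # 3 classes -> 3 pairwise classifiers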
Extend binary classification techniques to multiclass classification problems
Statistical Method - Multinomial logistic regression
- Uses one of the K classes as the base or reference class and sets up K-1 independent binary logistic
regression models by comparing each of the other K-1 classes against the reference class
- Allows both prediction and inference to be made easily, which is important for almost all
statistical methods.
- Linear classifier (the decision boundary between any two classes is linear), so it performs well
only when the classes are approximately linearly separable; the predicted value is a linear function
of the inputs (here, the predicted log-odds are a linear function of the inputs)
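With class K as the reference, the model assumes log(P(Y=k|x) / P(Y=K|x)) = β_k0 + β_k'x for
k = 1, ..., K-1. A minimal sketch with scikit-learn, which fits the equivalent softmax
(cross-entropy) formulation rather than K-1 literally separate binary fits; the toy data is
hypothetical:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=300, n_features=8, n_informative=5,
                               n_classes=3, random_state=0)

    # Recent scikit-learn fits the multinomial (softmax) model by default for
    # multiclass targets; older versions took multi_class="multinomial" explicitly.
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X, y)
    print(clf.coef_.shape)  # one row of linear coefficients per class: (3, 8)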
Machine Learning Algorithms - Non-linear classifiers
- k-nearest neighbors (KNN)
- naive Bayes
- decision trees (random forest)
- neural networks (multilayer perceptron (MLP))
- support vector machines (SVM) (see the comparison sketch after this list)
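A hedged sketch that fits each of the classifiers listed above on the same hypothetical toy data;
the default hyperparameters used here are illustrative, not the project's tuned settings:

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.naive_bayes import GaussianNB
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.neural_network import MLPClassifier
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=500, n_features=8, n_informative=5,
                               n_classes=3, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    models = {
        "KNN": KNeighborsClassifier(),
        "naive Bayes": GaussianNB(),
        "random forest": RandomForestClassifier(random_state=0),
        "MLP": MLPClassifier(max_iter=2000, random_state=0),
        "SVM": SVC(),
    }
    for name, model in models.items():
        acc = model.fit(X_tr, y_tr).score(X_te, y_te)  # held-out accuracy
        print(f"{name}: {acc:.3f}")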
Letter Image Recognition dataset
The data contains 20000 rectangular pixel images, where each image (observation) is classified as one of
the 26 capital letters in the English alphabet (making this a 26-class classification problem).
Each observation has 16 attributes or features, where each feature is either a statistical moment or an
edge count that has already been scaled to integer values in the range 0 to 15 (a loading sketch follows).
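A sketch of loading the data with pandas; the URL and column layout below follow the UCI Machine
Learning Repository's letter-recognition entry, and the feature names f1..f16 are hypothetical
labels, not part of the original file:

    import pandas as pd

    URL = ("https://archive.ics.uci.edu/ml/machine-learning-databases/"
           "letter-recognition/letter-recognition.data")
    cols = ["letter"] + [f"f{i}" for i in range(1, 17)]  # class label + 16 features
    df = pd.read_csv(URL, header=None, names=cols)

    X = df.drop(columns="letter")  # 16 integer features in 0..15
    y = df["letter"]               # 26 class labels: 'A'..'Z'
    print(df.shape)                # expected: (20000, 17)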
Exploratory Data Analysis (EDA)
- Class label distribution
- Feature distributions
- Correlation of features
- Data characteristics (a minimal EDA sketch follows this list)
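A minimal EDA sketch covering the items above, assuming the DataFrame df from the loading sketch:

    # Class label distribution: number of observations per letter
    print(df["letter"].value_counts())

    # Feature distributions: per-feature summary statistics
    print(df.drop(columns="letter").describe())

    # Correlation of features: pairwise Pearson correlation matrix
    corr = df.drop(columns="letter").corr()
    print(corr.round(2))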
Summary of model performance for all multiclass classification techniques
- The following table summarizes the predictive power of each model used in the data analysis
(from most predictive to least predictive).
- In conclusion, for this Letter Image Recognition dataset, the best model (in terms of predictive
power) is the One-vs-One method with an SVM classifier (see the sketch below).
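A hedged sketch of fitting and scoring the winning setup on a held-out split, assuming X and y from
the loading sketch above; the hyperparameters are illustrative, not the project's tuned values:

    from sklearn.model_selection import train_test_split
    from sklearn.multiclass import OneVsOneClassifier
    from sklearn.svm import SVC

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
    clf = OneVsOneClassifier(SVC(kernel="rbf", gamma="scale"))
    print(clf.fit(X_tr, y_tr).score(X_te, y_te))  # held-out accuracy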
Last updated on Jan 1, 2019