Kernel selection in SVM for high-dimensional & sparse binary data


One of the biggest challenges in using support vector machines (SVMs) for non-linear classification problems is selecting an appropriate kernel class. The goal of this project is to compare the generalization bounds of the SVM classifier under different kernel classes in the case where all features are binary. A generalization bound measures a hypothesis class's predictive performance in the PAC learning framework: with high probability, we select a function that has low generalization error.


Experiment

Mushroom data: the response variable is binary (edible or poisonous); the features are all categorical and are encoded as binary variables using one-hot encoding (112 features in total), with a 70%/30% train-test split.
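A minimal sketch of this setup in scikit-learn (not the project's actual code; the UCI download URL and the handling of the dataset's "?" missing values are assumptions, so the exact feature count may differ slightly from the 112 reported):

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load the UCI mushroom dataset; column 0 is the label (e = edible, p = poisonous).
url = ("https://archive.ics.uci.edu/ml/machine-learning-databases/"
       "mushroom/agaricus-lepiota.data")
df = pd.read_csv(url, header=None)
y = (df[0] == "p").astype(int)           # 1 = poisonous, 0 = edible
X = pd.get_dummies(df.drop(columns=0))   # one-hot encode all categorical features

# 70%/30% train-test split, as in the experiment.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Fit an SVM from each kernel class on the same split and compare test accuracy.
for kernel in ["linear", "poly", "rbf"]:
    clf = SVC(kernel=kernel).fit(X_train, y_train)
    print(kernel, accuracy_score(y_test, clf.predict(X_test)))
```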


Theoretical analysis

We use tools from statistical learning theory, in particular generalization bounds for kernel hypothesis classes, to gain insight into how the choice of kernel class affects predictive performance when all features are binary.
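For concreteness, here is one standard margin-based bound for kernel classes, in the style of Mohri, Rostamizadeh, and Talwalkar's Foundations of Machine Learning (this illustration is not part of the original analysis). For the class $H_\Lambda = \{x \mapsto \langle w, \Phi(x) \rangle : \|w\| \le \Lambda\}$ with kernel $K$, margin parameter $\gamma > 0$, and an i.i.d. sample of size $m$, with probability at least $1-\delta$ every $f \in H_\Lambda$ satisfies

$$
R(f) \;\le\; \widehat{R}_{\gamma}(f)
\;+\; \frac{2\Lambda}{\gamma}\sqrt{\frac{\sup_{x} K(x,x)}{m}}
\;+\; \sqrt{\frac{\log(1/\delta)}{2m}},
$$

where $R(f)$ is the true risk and $\widehat{R}_{\gamma}(f)$ is the empirical margin loss. Binary features make the kernel-dependent term easy to evaluate: for a linear kernel on one-hot encoded data, for instance, $K(x,x) = \|x\|^2$ is just the number of active features, so different kernel classes can be compared through this term directly.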


Summary of analysis results

In practice it is very difficult to test the performance of many kernel classes with a fine grid search when the data set is large. Using the results above, one can first identify a rough hyper-parameter space and then choose an appropriate kernel class, as sketched below. This saves considerable time compared to first obtaining the optimal hypothesis from each kernel class and then selecting the best kernel.
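A hedged sketch of that workflow (the grid values are hypothetical, and `X_train`, `y_train` are reused from the experiment sketch above): run a coarse search over kernel classes first, then refine only within the winner.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Coarse, logarithmically spaced grids: the goal is to identify a promising
# kernel class cheaply, not to locate the exact optimum.
coarse_grid = [
    {"kernel": ["linear"], "C": [0.1, 1, 10]},
    {"kernel": ["rbf"],    "C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]},
    {"kernel": ["poly"],   "C": [0.1, 1, 10], "degree": [2, 3]},
]
search = GridSearchCV(SVC(), coarse_grid, cv=3, n_jobs=-1)
search.fit(X_train, y_train)
print(search.best_params_)

# A fine grid search can then be restricted to the winning kernel class alone.
```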


Last updated on Nov 1, 2019