Multi-label classification

In machine learning, multi-label classification and the strongly related problem of multi-output classification are variants of the classification problem where multiple target labels may be assigned to each instance. Multi-label classification should not be confused with multiclass classification, which is the problem of categorizing instances into exactly one of more than two classes; in the multi-label setting, there is no constraint on how many labels an instance may receive. Formally, multi-label learning can be phrased as the problem of finding a model that maps inputs x to binary label vectors y, rather than to scalar outputs as in the ordinary classification problem.

There are two main methods for tackling the multi-label classification problem: problem transformation methods and algorithm adaptation methods. Problem transformation methods transform the multi-label problem into a set of single-label (typically binary) classification problems. Algorithm adaptation methods adapt existing single-label algorithms to directly perform multi-label classification.

Problem transformation
Several problem transformation methods exist for multi-label classification; the baseline approach amounts to training one binary classifier per label, similar to the one-vs.-all (OvA, also one-vs.-rest, OvR) method for multiclass classification. Given an unseen sample, the combined model then predicts all labels for which the respective binary classifiers predict a positive result. (This method has also been called the "binary relevance" method.)
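A minimal sketch of binary relevance using scikit-learn, which the document mentions later: `OneVsRestClassifier` fits one independent binary classifier per label when given a binary indicator matrix as the target. The dataset and base classifier below are illustrative choices, not prescribed by the text.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Toy multi-label data: y is an (n_samples, n_labels) 0/1 indicator matrix.
X, y = make_multilabel_classification(n_samples=100, n_classes=4,
                                      random_state=0)

# Binary relevance: one logistic-regression classifier per label.
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

# Prediction for one sample: a row of 0/1 indicators, one per label.
pred = clf.predict(X[:1])
print(pred.shape)  # (1, 4)
```

Each of the four classifiers votes independently; the sample receives every label whose classifier outputs a positive result, which is exactly the combination rule described above.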

Various other transformations exist: the label combinations (LC) transformation treats every distinct combination of labels as a single class, turning the multi-label problem into one multiclass problem. Other transformation methods include RAkEL and classifier chains. Algorithm adaptation methods have also been developed, such as ML-kNN, a multi-label variant of the k-nearest neighbors lazy classifier. A more detailed description of the most well-known methods for multi-label classification, together with an extensive empirical evaluation, can be found in the literature.

Evaluation
Evaluation metrics for multi-label classification are inherently different from those used in multi-class (or binary) classification, because each prediction is a set of labels rather than a single label. The following metrics are typically used:
 * Hamming loss: the fraction of wrongly predicted labels out of the total number of labels. This is a loss function, so the optimal value is zero.
 * Label-based accuracy: accuracy computed for each label separately, then averaged over labels.
 * Exact match (subset accuracy): the most strict metric, indicating the percentage of samples that have all their labels classified correctly.

Implementations and datasets
Java implementations of multi-label algorithms are available in the Mulan and Meka software packages, both based on Weka.

The scikit-learn Python package implements some multi-label algorithms and metrics.

A list of commonly used multi-label datasets is available at the Mulan website.