In the multi-label setting, an instance can be associated with multiple labels simultaneously.

In the active learning literature, there are two ways to query labels for multi-label datasets:

1. Query all labels of an instance.

2. Query an instance-label pair at a time.

The first case is implemented in the same way as the single-label setting.

For the second case, alipy provides a number of supporting tools.

The rest of this page introduces the tools for the multi-label setting in alipy.

We first give a definition of a multi-label index:

Each index is a tuple with one or two elements. The first element is the index of the instance, and the optional second element gives the index or indexes of the labels. To query all labels of an instance, use a one-element tuple: (example_index, ). To query specific labels, use two elements: (example_index, [label_indexes]).

Some examples of valid multi-label indexes include:

```
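queried_index = (1, [3,4]) # query the 4th and 5th labels of the 2nd instance
queried_index = (1, [3]) # query the 4th label of the 2nd instance
queried_index = (1, 3) # same query, with a bare label index
queried_index = (1, (3)) # same query: (3) is just the int 3 in Python
queried_index = (1, (3,4)) # a tuple of label indexes also works
queried_index = (1, ) # query all labels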
```
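To make the accepted forms concrete, the following is a minimal sketch of how such an index could be normalized into one canonical shape. The helper `normalize_index` is hypothetical and written only for illustration; it is not part of alipy and does not describe how alipy parses indexes internally.

```python
def normalize_index(index):
    """Return (instance, labels): labels is a tuple of ints, or None for 'all labels'.

    Hypothetical helper for illustration only; not part of alipy.
    """
    if not isinstance(index, tuple) or len(index) not in (1, 2):
        raise ValueError("a multi-label index must be a tuple with 1 or 2 elements")
    instance = index[0]
    if len(index) == 1:
        return instance, None  # one element: query every label of this instance
    labels = index[1]
    if isinstance(labels, int):
        labels = (labels,)  # covers both (1, 3) and (1, (3))
    return instance, tuple(labels)


print(normalize_index((1, [3, 4])))  # (1, (3, 4))
print(normalize_index((1, (3))))     # (1, (3,))
print(normalize_index((1,)))         # (1, None)
```

Under this convention, every two-element form listed above reduces to an instance index plus a tuple of label indexes.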

To split a multi-label dataset, use the `alipy.data_manipulate.split_multi_label` function. It splits the dataset into a training set and a test set, and further divides the training set into a small fully labeled set and a large unlabeled pool.

Note that the returned indexes of the labeled and unlabeled sets are multi-label indexes as defined above.

```
from alipy.data_manipulate import split_multi_label

mult_y = [[1, 1, 1], [0, 1, 1], [0, 1, 0]]  # 3 instances with 3 labels
train_idx, test_idx, label_idx, unlabel_idx = split_multi_label(
    y=mult_y, split_count=1, all_class=False,
    test_ratio=0.3, initial_label_rate=0.5,
    saving_path=None
)
```

The values of train_idx, test_idx, label_idx and unlabel_idx are, in that order:

```
[array([0, 1])]
[array([2])]
[[(0,)]]
[[(1,)]]
```
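As a quick sanity check on a split of this shape, the sketch below verifies the properties described above: the training and test sets do not overlap, and the labeled and unlabeled indexes together cover exactly the training instances. The arrays from the output above are written as plain lists, and `check_split` is a hypothetical helper, not an alipy function. It assumes the initial split, where every index refers to a whole instance.

```python
# Values copied from the output above (one split, so we take element 0 of each).
train_idx = [[0, 1]]
test_idx = [[2]]
label_idx = [[(0,)]]
unlabel_idx = [[(1,)]]


def check_split(train, test, labeled, unlabeled):
    """Hypothetical consistency check for one initial split; not part of alipy."""
    labeled_instances = {index[0] for index in labeled}
    unlabeled_instances = {index[0] for index in unlabeled}
    return (
        set(train).isdisjoint(test)
        and labeled_instances.isdisjoint(unlabeled_instances)
        and labeled_instances | unlabeled_instances == set(train)
    )


print(check_split(train_idx[0], test_idx[0], label_idx[0], unlabel_idx[0]))  # True
```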

ALiPy provides another IndexCollection class for the multi-label setting. Its interface is largely the same as that of the IndexCollection in the single-label setting, with additional functions to support multi-label data:

1. Accepting different types of indexes.

2. Accepting a mask matrix.

3. Providing retrieval methods.

A full introduction to `MultiLabelIndexCollection` would take up a lot of space, so we refer users to the introduction to MultiLabelIndexCollection page for more details.
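To make the mask-matrix idea concrete, here is a minimal, dependency-free sketch that turns a list of multi-label indexes into a 0/1 matrix with one row per instance and one column per label. The function `indexes_to_mask` is hypothetical and is not the `MultiLabelIndexCollection` API. It only illustrates the correspondence between the two representations.

```python
def indexes_to_mask(indexes, n_instances, n_labels):
    """Build a 0/1 mask from multi-label indexes (hypothetical helper, not alipy)."""
    mask = [[0] * n_labels for _ in range(n_instances)]
    for index in indexes:
        instance = index[0]
        if len(index) == 1:
            labels = range(n_labels)   # whole instance
        elif isinstance(index[1], int):
            labels = [index[1]]        # single instance-label pair
        else:
            labels = index[1]          # several labels of one instance
        for label in labels:
            mask[instance][label] = 1
    return mask


mask = indexes_to_mask([(0,), (1, [0, 2])], n_instances=3, n_labels=3)
for row in mask:
    print(row)
# [1, 1, 1]
# [1, 0, 1]
# [0, 0, 0]
```

A mask of this kind is convenient when the labeled entries need to be selected from a label matrix in one step.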

ALiPy provides `alipy.oracle.OracleQueryMultiLabel` for instance-label pair querying. This class is initialized in the same way as `Oracle`.

```
from alipy.oracle import OracleQueryMultiLabel
oracle = OracleQueryMultiLabel(labels=mult_y)
```

When querying, you need to provide a single valid *multi-label index* as defined above, or a list of them.

```
label, cost = oracle.query_by_index((1, 2)) # query the 3rd label of 2nd instance
labels, cost = oracle.query_by_index([(1, 2), (0, 1)])
```
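The idea behind index-based querying can be illustrated with a toy stand-in. The class `ToyMultiLabelOracle` below is hypothetical and is not alipy's `OracleQueryMultiLabel`: its return format and its cost rule (one unit per revealed label) are assumptions made for this sketch only.

```python
class ToyMultiLabelOracle:
    """Hypothetical stand-in for illustration; not alipy's OracleQueryMultiLabel."""

    def __init__(self, labels):
        self._labels = [list(row) for row in labels]

    def query_by_index(self, index):
        """Return (revealed_labels, cost) for one multi-label index."""
        row = self._labels[index[0]]
        if len(index) == 1:
            answer = list(row)                           # all labels of the instance
        elif isinstance(index[1], int):
            answer = [row[index[1]]]                     # one instance-label pair
        else:
            answer = [row[label] for label in index[1]]  # several labels
        return answer, len(answer)  # assumed cost: one unit per revealed label


mult_y = [[1, 1, 1], [0, 1, 1], [0, 1, 0]]
oracle = ToyMultiLabelOracle(mult_y)
print(oracle.query_by_index((1, 2)))  # ([1], 1)
print(oracle.query_by_index((1,)))    # ([0, 1, 1], 3)
```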

The available multi-label metrics in alipy are `accuracy_score`, `hamming_loss`, `one_error`, `coverage_error`, `label_ranking_loss`, `average_precision_score`, `label_ranking_average_precision_score` and `micro_auc_score`.

You can use them by importing them from the `metrics` module:

```
from alipy.metrics import hamming_loss
hl = hamming_loss(y_true=[[0, 1, 0]], y_pred=[[1, 1, 0]])
```
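For reference, the standard definition of Hamming loss is the fraction of instance-label entries that are predicted incorrectly. The sketch below computes it by hand for the same inputs; `hamming_loss_sketch` is a hypothetical name, and the assumption is that alipy's `hamming_loss` follows this standard definition.

```python
def hamming_loss_sketch(y_true, y_pred):
    """Fraction of label entries where prediction and ground truth differ."""
    total = 0
    wrong = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            total += 1
            wrong += int(t != p)
    return wrong / total


# One of the three labels is wrong, so the loss is 1/3.
print(hamming_loss_sketch([[0, 1, 0]], [[1, 1, 0]]))  # 0.3333333333333333
```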

ALiPy provides several existing algorithms for experimental comparison:

AUDI (ICDM 2013) : Select an instance-label pair based on uncertainty and diversity.

QUIRE (TPAMI 2014) : Select an instance-label pair based on the informativeness and representativeness.

MMC (KDD 2009) : Select an instance and query all of its labels, based on maximum loss reduction with maximal confidence.

Adaptive (IJCAI 2013) : Select an instance and query all of its labels, based on max-margin uncertainty and label cardinality inconsistency.

Random : Select instances or instance-label pairs randomly.

These methods are used in largely the same way as in the normal setting. Note that the returned value is a list of multi-label indexes as defined above.

```
import copy
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import OneHotEncoder
from alipy.query_strategy.multi_label import *
from alipy.index.multi_label_tools import get_Xy_in_multilabel
from alipy import ToolBox

X, y = load_iris(return_X_y=True)
mlb = OneHotEncoder()
mult_y = mlb.fit_transform(y.reshape((-1,1)))
mult_y = np.asarray(mult_y.todense())
mult_y[mult_y == 0] = -1

alibox = ToolBox(X=X, y=mult_y, query_type='PartLabels')
alibox.split_AL(test_ratio=0.2, initial_label_rate=0.05, all_class=False)


def main_loop(alibox, round, strategy):
    train_idx, test_idx, label_ind, unlab_ind = alibox.get_split(round)
    # Get intermediate results saver for one fold experiment
    saver = alibox.get_stateio(round)
    # base model
    model = LabelRankingModel()

    while len(label_ind) <= 120:
        # query and update
        select_labs = strategy.select(label_ind, unlab_ind)
        # use cost to record the amount of queried instance-label pairs
        if len(select_labs[0]) == 1:
            cost = mult_y.shape[1]
        else:
            cost = len(select_labs)
        label_ind.update(select_labs)
        unlab_ind.difference_update(select_labs)

        # train/test
        X_tr, y_tr, _ = get_Xy_in_multilabel(label_ind, X=X, y=mult_y)
        model.fit(X=X_tr, y=y_tr)
        pres, pred = model.predict(X[test_idx])
        perf = alibox.calc_performance_metric(y_true=mult_y[test_idx], y_pred=pred, performance_metric='hamming_loss')

        # save
        st = alibox.State(select_index=select_labs, performance=perf, cost=cost)
        saver.add_state(st)

    return copy.deepcopy(saver)


audi_result = []
quire_result = []
random_result = []
mmc_result = []
adaptive_result = []

for round in range(5):
    # init strategies
    audi = QueryMultiLabelAUDI(X, mult_y)
    quire = QueryMultiLabelQUIRE(X, mult_y)
    mmc = QueryMultiLabelMMC(X, mult_y)
    adaptive = QueryMultiLabelAdaptive(X, mult_y)
    random = QueryMultiLabelRandom()

    audi_result.append(main_loop(alibox, round, strategy=audi))
    quire_result.append(main_loop(alibox, round, strategy=quire))
    mmc_result.append(main_loop(alibox, round, strategy=mmc))
    adaptive_result.append(main_loop(alibox, round, strategy=adaptive))
    random_result.append(main_loop(alibox, round, strategy=random))

analyser = alibox.get_experiment_analyser(x_axis='cost')
analyser.add_method(method_name='AUDI', method_results=audi_result)
analyser.add_method(method_name='QUIRE', method_results=quire_result)
analyser.add_method(method_name='RANDOM', method_results=random_result)
analyser.add_method(method_name='MMC', method_results=mmc_result)
analyser.add_method(method_name='Adaptive', method_results=adaptive_result)
analyser.plot_learning_curves()
```
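The cost bookkeeping in the loop above is what makes strategies that query whole instances comparable with strategies that query single instance-label pairs on the same `cost` axis. The sketch below isolates that branch in a small standalone function; `query_cost` is a hypothetical name introduced here, and the logic simply mirrors the `if`/`else` in `main_loop`.

```python
def query_cost(select_labs, n_labels):
    """Mirror of the cost branch in main_loop (hypothetical standalone helper)."""
    if len(select_labs[0]) == 1:
        # A one-element index such as (7,) asks for every label of the instance.
        return n_labels
    # Otherwise each selected index counts as one instance-label pair.
    return len(select_labs)


print(query_cost([(7,)], n_labels=3))    # 3
print(query_cost([(7, 1)], n_labels=3))  # 1
```

With three labels per instance, as in the one-hot encoded iris data, a whole-instance query is therefore charged three times as much as a single pair.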

Copyright © 2018, alipy developers (BSD 3 License).