Query type in active learning

Most existing active learning approaches query instances for their class labels. However, human experts can often provide richer information than a single label. In some applications, a specialized query type, which determines what information to request for a selected instance, may be more effective than querying labels directly.

ALiPy implements AURO, the method from "Multi-Label Active Learning: Query Type Matters" (IJCAI'15). In the multi-label setting, AURO queries the relevance ordering of two selected labels of an instance, i.e., it asks the oracle which of the two labels is more relevant to that instance.
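To make this query concrete, the sketch below answers one pairwise relevance query from a known label vector, assuming the 1 (relevant) / -1 (irrelevant) encoding used in the experiment example. The function `answer_relevance_query` is hypothetical and not part of ALiPy; it only illustrates what information the oracle returns.

```python
# Toy oracle for a pairwise relevance query.
# Assumed encoding (as in the experiment example): 1 = relevant, -1 = irrelevant.
# `answer_relevance_query` is a hypothetical helper, not an ALiPy function.

def answer_relevance_query(y_row, label1, label2):
    """Return the more relevant of two labels for one instance.

    Returns None when both labels are irrelevant, since the answer then
    carries no ordering information.
    """
    y1, y2 = y_row[label1], y_row[label2]
    if y1 == -1 and y2 == -1:
        return None
    # Ties between two relevant labels go to label1,
    # mirroring the `y1 >= y2` branch of the experiment example.
    return label1 if y1 >= y2 else label2


# One instance whose labels are [relevant, irrelevant, relevant]
y_row = [1, -1, 1]
print(answer_relevance_query(y_row, 0, 1))           # 0
print(answer_relevance_query(y_row, 1, 2))           # 2
print(answer_relevance_query([-1, -1, 1], 0, 1))     # None
```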

Because this direction has so far received relatively little attention, AURO is currently the only query-type strategy implemented. More strategies will be added as new methods are proposed.

Below, we first introduce the usage of AURO and then present an example of a query type experiment.

Query type strategies

ALiPy implements the AURO method (IJCAI 2015) for multi-label classification. At each iteration, the method selects one instance and two of its labels, and queries which of the two labels is more relevant.

The usage of this method is essentially the same as that of the multi-label strategies. The one difference is that select() returns three values: the index of the selected instance and the indexes of the two labels whose relative relevance is to be queried.
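The returned triple corresponds to two (instance, label) entries, and after the oracle answers, both are moved from the unlabeled pool to the labeled pool. The sketch below shows this bookkeeping with plain Python sets standing in for ALiPy's multi-label index collections; the selected values are made up for illustration.

```python
# Bookkeeping for one AURO query result.
# Plain sets stand in for ALiPy's multi-label index collections (an assumption
# for illustration); the selected indexes below are made-up values.

n_instances, n_labels = 3, 3

# Initially every (instance, label) entry is unlabeled
unlab_ind = {(i, j) for i in range(n_instances) for j in range(n_labels)}
label_ind = set()

# Suppose select() returned instance 1 with labels 0 and 2
select_ins, select_y1, select_y2 = 1, 0, 2
queried = [(select_ins, select_y1), (select_ins, select_y2)]

# Move both queried entries from the unlabeled pool to the labeled pool
label_ind.update(queried)
unlab_ind.difference_update(queried)

print(sorted(label_ind))  # [(1, 0), (1, 2)]
print(len(unlab_ind))     # 7
```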

For more detailed usage, please refer to the API page and the following example code.

Query type experiment example

import copy
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import OneHotEncoder
from alipy.query_strategy.query_type import QueryTypeAURO
from alipy.query_strategy.multi_label import LabelRankingModel
from alipy.index.multi_label_tools import get_Xy_in_multilabel
from alipy import ToolBox

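# Load iris and turn its 3 classes into a one-hot label matrix,
# encoded as 1 (relevant) / -1 (irrelevant)
X, y = load_iris(return_X_y=True)
mlb = OneHotEncoder()
mult_y = mlb.fit_transform(y.reshape((-1,1)))
mult_y = np.asarray(mult_y.todense())
mult_y[mult_y == 0] = -1

# 'PartLabels': labels are queried per (instance, label) entry
alibox = ToolBox(X=X, y=mult_y, query_type='PartLabels')
alibox.split_AL(test_ratio=0.2, initial_label_rate=0.05, all_class=False)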

# query type strategy
AURO_results = []

for round_idx in range(10):

    train_idx, test_idx, label_ind, unlab_ind = alibox.get_split(round_idx)
    # Get intermediate results saver for one fold experiment
    saver = alibox.get_stateio(round_idx)
    # Copy of the label matrix; queried entries are overwritten with the oracle's answers
    query_y = mult_y.copy()
    AURO_strategy = QueryTypeAURO(X=X, y=mult_y)
    # base model
    model = LabelRankingModel()

    for i in range(100):

        # Select an instance and two of its labels to query
        select_ins, select_y1, select_y2 = AURO_strategy.select(label_ind, unlab_ind, query_y)

        # Simulate the oracle with the ground-truth labels:
        # both irrelevant -> both -1; otherwise the more relevant label
        # gets 1 and the other gets 0.5
        y1 = mult_y[select_ins, select_y1]
        y2 = mult_y[select_ins, select_y2]
        if y1 == -1.0 and y2 == -1.0:
            query_y[select_ins, select_y1] = -1
            query_y[select_ins, select_y2] = -1
        elif y1 >= y2:
            query_y[select_ins, select_y1] = 1
            query_y[select_ins, select_y2] = 0.5
        else:
            query_y[select_ins, select_y1] = 0.5
            query_y[select_ins, select_y2] = 1

        # Move the two queried (instance, label) entries to the labeled set
        label_ind.update([(select_ins, select_y1), (select_ins, select_y2)])
        unlab_ind.difference_update([(select_ins, select_y1), (select_ins, select_y2)])

        # Retrain and evaluate every 3 queries
        if i % 3 == 0:
            # train/test
            X_tr, y_tr, _ = get_Xy_in_multilabel(label_ind, X=X, y=query_y)
            model.fit(X=X_tr, y=y_tr)
            pres, pred = model.predict(X[test_idx])

            perf = alibox.calc_performance_metric(y_true=mult_y[test_idx], y_pred=pred, performance_metric='hamming_loss')

            # save
            st = alibox.State(select_index=[(select_ins, select_y1), (select_ins, select_y2)], performance=perf)
            saver.add_state(st)


    AURO_results.append(copy.copy(saver))

analyser = alibox.get_experiment_analyser()
analyser.add_method(method_name='AURO', method_results=AURO_results)
analyser.plot_learning_curves()
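The experiment evaluates the model with Hamming loss. As a reference point, the sketch below computes the standard Hamming loss (the fraction of instance-label entries predicted incorrectly) on labels in the 1/-1 encoding. It is a plain-Python illustration, and it is assumed, not confirmed, to match what `calc_performance_metric(..., performance_metric='hamming_loss')` computes.

```python
# Standard Hamming loss for multi-label predictions (plain-Python sketch).

def hamming_loss(y_true, y_pred):
    """Fraction of (instance, label) entries where the prediction differs from the truth."""
    n_entries = 0
    n_wrong = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            n_entries += 1
            if t != p:
                n_wrong += 1
    return n_wrong / n_entries


# Labels encoded as 1 (relevant) / -1 (irrelevant)
y_true = [[1, -1, -1], [-1, 1, -1]]
y_pred = [[1, 1, -1], [-1, 1, -1]]
print(hamming_loss(y_true, y_pred))  # 1 wrong entry out of 6
```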

Copyright © 2018, alipy developers (BSD 3 License).