alipy.query_strategy.multi_label.LabelRankingModel
The Cost-Effective Active Learning from Diverse Labelers (CEAL) method [1] assumes
that different oracles have different expertise: even a very noisy oracle
may perform well on some kinds of examples. The cost of a labeler is proportional
to its overall labeling quality, so it is necessary to query the right oracle
for the selected instance.
The method selects an instance-labeler pair (x, a) and queries the label of x
from a, where both the instance and the labeler are chosen according to an
evaluation function Q(x, a).
The instance is selected according to its uncertainty. The oracle is selected
according to its performance on the nearest neighbors of the selected instance.
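The pair selection described above can be sketched in plain Python. This is a simplified illustration, not the scoring function defined in [1] or used by the library: the helper `select_pair`, its input structures, and the stand-in score `uncertainty(x) * quality(a, x) / cost(a)` are all assumptions made for the example.

```python
def select_pair(uncertainty, quality, cost):
    """Return the (instance, oracle) pair with the highest stand-in score.

    uncertainty: dict mapping instance index -> model uncertainty on it
    quality:     dict mapping oracle name -> dict of instance index ->
                 estimated labeling quality of that oracle near the instance
    cost:        dict mapping oracle name -> cost of one query (must be > 0)
    """
    best_pair, best_score = None, float("-inf")
    for x, u in uncertainty.items():
        for a, q_a in quality.items():
            # Hypothetical Q(x, a): favor uncertain instances and oracles that
            # are reliable near x, and penalize expensive oracles.
            score = u * q_a[x] / cost[a]
            if score > best_score:
                best_pair, best_score = (x, a), score
    return best_pair


# Toy numbers, invented for illustration only.
uncertainty = {3: 0.9, 7: 0.4}
quality = {
    "expert": {3: 0.95, 7: 0.95},
    "novice": {3: 0.90, 7: 0.50},
}
cost = {"expert": 4.0, "novice": 1.0}

pair = select_pair(uncertainty, quality, cost)
print(pair)
```

With these toy numbers the cheap oracle is chosen for instance 3, because it is nearly as reliable as the expert in that region at a quarter of the cost.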
References
----------
[1] Sheng-Jun Huang, Jia-Lve Chen, Xin Mu, Zhi-Hua Zhou. 2017.
Cost-Effective Active Learning from Diverse Labelers. In The
Proceedings of the 26th International Joint Conference
on Artificial Intelligence (IJCAI-17), 1879-1885.
Methods
init
__init__(self, X, y, oracles, initial_labeled_indexes)
Parameters:
X: 2D array
    Feature matrix of the whole dataset. It is stored as a reference, so no additional memory is used.
y: array-like
    Label matrix of the whole dataset. It is stored as a reference, so no additional memory is used.
oracles: {list, alipy.oracle.Oracles}
    An alipy.oracle.Oracles object that contains all the
    available oracles, or a list of oracles.
    Each oracle should be an alipy.oracle.Oracle object.
initial_labeled_indexes: {list, np.ndarray, IndexCollection}
    The indexes of the initially labeled samples. They are used to initialize the score of each oracle.
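One plausible way to turn the initially labeled samples into a starting score for each oracle is to measure how often each oracle agrees with the known labels. The sketch below illustrates that idea only. The helper `initial_oracle_scores` and the `oracle_answers` structure are hypothetical, and the library may compute its scores differently.

```python
def initial_oracle_scores(y, oracle_answers, initial_labeled_indexes):
    """Fraction of initially labeled samples that each oracle labels correctly.

    y:                       ground-truth labels for the whole dataset
    oracle_answers:          dict mapping oracle name -> labels that oracle
                             would give for every sample (hypothetical structure)
    initial_labeled_indexes: indexes whose true labels are already known
    """
    idx = list(initial_labeled_indexes)
    if not idx:
        raise ValueError("at least one initially labeled sample is required")
    scores = {}
    for name, answers in oracle_answers.items():
        correct = sum(1 for i in idx if answers[i] == y[i])
        scores[name] = correct / len(idx)
    return scores


y = [0, 1, 1, 0, 1, 0]
oracle_answers = {
    "expert": [0, 1, 1, 0, 1, 0],  # always right on this toy data
    "novice": [0, 0, 1, 1, 1, 0],  # wrong on samples 1 and 3
}
scores = initial_oracle_scores(y, oracle_answers, initial_labeled_indexes=[0, 1, 2, 3])
print(scores)
```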
select
select(self, label_index, unlabel_index, model=None, **kwargs)
Select an instance and an oracle to query. Returns the index of the selected instance and the selected oracle.
Parameters:
label_index: {list, np.ndarray, IndexCollection}
    The indexes of labeled samples.
unlabel_index: {list, np.ndarray, IndexCollection}
    The indexes of unlabeled samples.
model: object, optional (default=None)
    Current classification model. It should provide a 'predict_proba' method for probabilistic output.
    If not provided, sklearn's LogisticRegression with default parameters is used.
n_neighbors: int, optional (default=10)
    The number of neighbors of the selected instance used
    to evaluate the oracles. Passed as a keyword argument through **kwargs.
Returns:
selected_instance: int
    The index of the selected instance.
selected_oracle: int or str
    The selected oracle.
    If the oracles were given as a list, the index of the oracle in that list is returned.
    If an Oracles object was given, the name of the oracle is returned.
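To illustrate the role of `n_neighbors`, the sketch below scores each oracle only on the labeled samples closest to the selected instance. It rests on several assumptions: Euclidean distance, the helper name `local_oracle_accuracy`, and the `oracle_answers` structure are chosen for the example and are not taken from the library.

```python
import math


def local_oracle_accuracy(X, y, oracle_answers, label_index, selected, n_neighbors=10):
    """Accuracy of each oracle on the labeled neighbors nearest to `selected`."""
    # Never ask for more neighbors than there are labeled samples.
    k = min(n_neighbors, len(label_index))
    if k == 0:
        raise ValueError("label_index must not be empty")
    # Sort labeled samples by Euclidean distance to the selected instance.
    neighbors = sorted(label_index, key=lambda i: math.dist(X[i], X[selected]))[:k]
    return {
        name: sum(answers[i] == y[i] for i in neighbors) / k
        for name, answers in oracle_answers.items()
    }


X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.2], [5.0, 5.0], [5.1, 5.0], [0.05, 0.05]]
y = [0, 0, 0, 1, 1, 0]
oracle_answers = {
    "a": [0, 0, 0, 0, 0, 0],  # reliable near the origin, wrong far away
    "b": [1, 0, 1, 1, 1, 0],  # reliable far away, weak near the origin
}
acc = local_oracle_accuracy(X, y, oracle_answers, label_index=[0, 1, 2, 3, 4],
                            selected=5, n_neighbors=3)
print(acc)
```

Oracle "b" is right on 3 of the 5 labeled samples overall, more than "a" with 3 of 5 only by tie, yet around sample 5 oracle "a" is clearly the better choice; a smaller `n_neighbors` makes the evaluation more local, a larger one moves it toward the overall accuracy.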
select_by_prediction_mat
select_by_prediction_mat(self, label_index, unlabel_index, predict, **kwargs)
Select an instance and an oracle to query, using a precomputed prediction matrix. Returns the index of the selected instance and the selected oracle.
Parameters:
label_index: {list, np.ndarray, IndexCollection}
    The indexes of labeled samples.
unlabel_index: {list, np.ndarray, IndexCollection}
    The indexes of unlabeled samples.
predict: 2D array, shape [n_samples, n_classes]
    The probabilistic prediction matrix for the unlabeled set.
n_neighbors: int, optional (default=10)
    The number of neighbors of the selected instance used
    to evaluate the oracles. Passed as a keyword argument through **kwargs.
Returns:
selected_instance: int
    The index of the selected instance.
selected_oracle: int or str
    The selected oracle.
    If the oracles were given as a list, the index of the oracle in that list is returned.
    If an Oracles object was given, the name of the oracle is returned.
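When predictions are computed outside the query strategy, the instance-selection step reduces to scoring the rows of the prediction matrix. The sketch below picks the row with the highest entropy and maps it back to a dataset index. Entropy as the uncertainty measure, the assumption that row i of `predict` corresponds to the i-th entry of `unlabel_index`, and the helper name `most_uncertain_index` are all choices made for this example.

```python
import math


def most_uncertain_index(unlabel_index, predict):
    """Return the dataset index whose predicted distribution has the highest entropy."""
    if len(unlabel_index) != len(predict):
        raise ValueError("predict must have one row per unlabeled index")

    def entropy(row):
        # Skip zero probabilities: p * log(p) tends to 0 as p -> 0.
        return -sum(p * math.log(p) for p in row if p > 0)

    best_row = max(range(len(predict)), key=lambda r: entropy(predict[r]))
    # Translate the row position back into an index of the whole dataset.
    return unlabel_index[best_row]


unlabel_index = [12, 40, 57]
predict = [
    [0.95, 0.05],  # confident
    [0.55, 0.45],  # close to a coin flip
    [0.80, 0.20],
]
chosen = most_uncertain_index(unlabel_index, predict)
print(chosen)
```

The returned value is an index into the whole dataset (here 40), not a row number of the prediction matrix, which matches how `selected_instance` is described above.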