alipy.query_strategy.multi_label.LabelRankingModel

The Cost-Effective Active Learning from Diverse Labelers (CEAL) method assumes that different oracles have different expertise. Even a very noisy oracle may perform well on some kinds of examples. The cost of a labeler is proportional to its overall labeling quality, so each selected instance should be queried from the oracle best suited to it.

This method selects an instance-labeler pair (x, a) and queries the label of x from a, where the selection of both the instance and the labeler is based on an evaluation function Q(x, a).

The instance is selected according to its uncertainty. The oracle is selected according to its performance on the nearest neighbors of the selected instance. The cost of each oracle is proportional to its overall labeling quality.
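The selection rule described above can be illustrated with a small NumPy sketch. This is not ALiPy's implementation or the exact Q(x, a) from the paper: the least-confidence uncertainty measure, the Euclidean search for neighbors among labeled instances, the quality-divided-by-cost score, and the names `pick_pair`, `oracle_correct` and `costs` are all assumptions made for illustration.

```python
import numpy as np


def pick_pair(X, proba, unlabeled, labeled, oracle_correct, costs, n_neighbors=3):
    """Pick an (instance, oracle) pair (illustrative sketch only).

    proba:          [len(unlabeled), n_classes] predictions for the unlabeled rows
    oracle_correct: [n_oracles, len(labeled)], 1.0 where an oracle labeled
                    labeled[j] correctly (hypothetical bookkeeping)
    costs:          [n_oracles] positive cost per query (hypothetical)
    """
    unlabeled = np.asarray(unlabeled)
    labeled = np.asarray(labeled)

    # Instance: least-confidence uncertainty, 1 - max class probability.
    uncertainty = 1.0 - proba.max(axis=1)
    x = unlabeled[np.argmax(uncertainty)]

    # Nearest labeled neighbors of x (Euclidean distance, an assumption).
    dist = np.linalg.norm(X[labeled] - X[x], axis=1)
    nn = np.argsort(dist)[:n_neighbors]

    # Oracle: accuracy on those neighbors, traded off against cost.
    local_quality = oracle_correct[:, nn].mean(axis=1)
    score = local_quality / np.asarray(costs, dtype=float)
    return int(x), int(np.argmax(score))


# Toy data: two clusters; oracle 0 is reliable near 0, oracle 1 near 5.
X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
labeled = [0, 1, 3, 4]
unlabeled = [2, 5]
proba = np.array([[0.55, 0.45], [0.9, 0.1]])
oracle_correct = np.array([[1, 1, 0, 0], [0, 0, 1, 1]], dtype=float)
costs = [1.0, 1.0]

instance, oracle = pick_pair(X, proba, unlabeled, labeled,
                             oracle_correct, costs, n_neighbors=2)
```

In this toy setup the less confident prediction belongs to instance 2, whose labeled neighbors were all answered correctly by oracle 0, so that pair is chosen.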

References

----------

[1] Sheng-Jun Huang, Jia-Lve Chen, Xin Mu, Zhi-Hua Zhou. 2017. Cost-Effective Active Learning from Diverse Labelers. In The Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI-17), 1879-1885.

Methods

init

__init__(self, X, y, oracles, initial_labeled_indexes)

Parameters:

X: 2D array, optional (default=None): Feature matrix of the whole dataset. It is a reference which will not use additional memory.
y: array-like, optional (default=None): Label matrix of the whole dataset. It is a reference which will not use additional memory.
oracles: {list, alipy.oracle.Oracles}: An alipy.oracle.Oracles object that contains all the available oracles, or a list of oracles. Each oracle should be an alipy.oracle.Oracle object.
initial_labeled_indexes: {list, np.ndarray, IndexCollection}: The indexes of initially labeled samples. Used for initializing the scores of each oracle.
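The page does not say how the initially labeled samples are turned into oracle scores. One plausible reading, sketched below purely as an assumption, is that each oracle's answers on the initially labeled indexes are compared with the known labels, and the resulting accuracy serves as its starting score. The function `initial_oracle_scores` and the `oracle_answers` dictionary are hypothetical and are not part of ALiPy.

```python
import numpy as np


def initial_oracle_scores(y, oracle_answers, initial_labeled_indexes):
    """Accuracy of each oracle on the initially labeled samples (sketch)."""
    idx = np.asarray(initial_labeled_indexes)
    y = np.asarray(y)
    return {
        name: float(np.mean(np.asarray(answers)[idx] == y[idx]))
        for name, answers in oracle_answers.items()
    }


# Toy ground truth and two hypothetical oracles.
y = np.array([0, 1, 1, 0, 1])
oracle_answers = {
    "careful": np.array([0, 1, 1, 0, 1]),
    "noisy": np.array([1, 1, 0, 0, 1]),
}
scores = initial_oracle_scores(y, oracle_answers, [0, 1, 2, 3])
```

Only the samples listed in the initial index collection contribute, so an oracle's answer on sample 4 does not affect its starting score here.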

select

select(self, label_index, unlabel_index, model=None, **kwargs)

Query from oracles. Return the index of selected instance and oracle.

Parameters:

label_index: {list, np.ndarray, IndexCollection}: The indexes of labeled samples.
unlabel_index: {list, np.ndarray, IndexCollection}: The indexes of unlabeled samples.
model: object, optional (default=None): Current classification model. It should provide a 'predict_proba' method for probabilistic output. If not provided, sklearn's LogisticRegression with default parameters is used.
n_neighbors: int, optional (default=10): Number of neighbors of the selected instance used to evaluate the oracles.

Returns:

selected_instance: int: The index of selected instance.
selected_oracle: int or str: The selected oracle. If the oracles were given as a list, the index of the oracle is returned. If an Oracles object was given, the oracle name is returned.

select_by_prediction_mat

select_by_prediction_mat(self, label_index, unlabel_index, predict, **kwargs)

Query from oracles. Return the index of selected instance and oracle.

Parameters:

label_index: {list, np.ndarray, IndexCollection}: The indexes of labeled samples.
unlabel_index: {list, np.ndarray, IndexCollection}: The indexes of unlabeled samples.
predict: 2D array, shape [n_samples, n_classes]: The probabilistic prediction matrix for the unlabeled set.
n_neighbors: int, optional (default=10): Number of neighbors of the selected instance used to evaluate the oracles.

Returns:

selected_instance: int: The index of selected instance.
selected_oracle: int or str: The selected oracle. If the oracles were given as a list, the index of the oracle is returned. If an Oracles object was given, the oracle name is returned.
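
Because `predict` covers only the unlabeled set, its rows have to be mapped back to dataset indexes. The sketch below shows that bookkeeping under two assumptions that the page does not state: that row i of `predict` corresponds to the i-th entry of `unlabel_index`, and that uncertainty is measured by Shannon entropy. The helper `most_uncertain` is hypothetical and is not an ALiPy function.

```python
import numpy as np


def most_uncertain(unlabel_index, predict):
    """Return the dataset index whose prediction row has the highest entropy."""
    unlabel_index = np.asarray(unlabel_index)
    predict = np.asarray(predict, dtype=float)
    if predict.ndim != 2 or predict.shape[0] != len(unlabel_index):
        raise ValueError("predict must have one row per unlabeled index")

    # Shannon entropy per row; clipping avoids log(0).
    p = np.clip(predict, 1e-12, 1.0)
    entropy = -(p * np.log(p)).sum(axis=1)

    # Map the row position back to the index in the whole dataset.
    return int(unlabel_index[np.argmax(entropy)])


unlabel_index = [7, 12, 30]
predict = [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]]
chosen = most_uncertain(unlabel_index, predict)
```

The shape check catches the common mistake of passing predictions for the whole dataset rather than for the unlabeled subset.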