alipy.query_strategy.multi_label.QueryNoisyOraclesIEthresh
IEthresh selects a batch of oracles to label the selected instance.
It scores each oracle according to the difference between that oracle's
labeling results and the majority-vote results.
At each iteration, the batch of oracles whose scores meet a threshold is selected,
so an oracle with a higher score is more likely to be selected.
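The majority-vote comparison described above can be sketched in plain Python. This is a minimal illustration, not alipy's implementation: the helper names `majority_vote` and `agreement_rewards`, the oracle names, and the 0/1 reward are all assumptions made for the example.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most common label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def agreement_rewards(oracle_labels):
    """Map each oracle name to 1 if it agrees with the majority vote, else 0."""
    consensus = majority_vote(list(oracle_labels.values()))
    return {name: int(label == consensus) for name, label in oracle_labels.items()}


# Labels returned by three hypothetical oracles for one queried instance.
answers = {"oracle_a": 1, "oracle_b": 1, "oracle_c": 0}
rewards = agreement_rewards(answers)
```

Accumulating such rewards over iterations gives each oracle a history from which a score can be estimated.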
References
----------
[1] Donmez P., Carbonell J. G., Schneider J. Efficiently learning the accuracy of labeling
sources for selective sampling. In: ACM SIGKDD International Conference on
Knowledge Discovery & Data Mining. ACM, 2009.
Methods
init
__init__(self, X, y, oracles, initial_labeled_indexes, **kwargs)
Parameters:
X: 2D array, optional (default=None)
    Feature matrix of the whole dataset. It is a reference, so no additional memory is used.
y: array-like, optional (default=None)
    Label matrix of the whole dataset. It is a reference, so no additional memory is used.
oracles: {list, alipy.oracle.Oracles}
    An alipy.oracle.Oracles object that contains all the
    available oracles, or a list of oracles.
    Each oracle should be an alipy.oracle.Oracle object.
initial_labeled_indexes: {list, np.ndarray, IndexCollection}
    The indexes of the initially labeled samples. Used for initializing the score of each oracle.
epsilon: float, optional (default=0.1)
    The threshold factor that determines how many oracles will be selected:
    S_t = {a | UI(a) >= epsilon * max UI(a)}
    where UI(a) is the score of oracle a.
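The role of epsilon in the rule S_t = {a | UI(a) >= epsilon * max UI(a)} can be sketched as follows. This is an illustration under stated assumptions, not alipy's code: `upper_interval` approximates the score as the mean reward plus a confidence margin, and `critical_value=2.0` is a stand-in constant (the referenced paper uses a Student-t critical value instead). The function names and reward histories are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def upper_interval(rewards, critical_value=2.0):
    """Mean reward plus a confidence margin (critical_value is an assumed constant)."""
    n = len(rewards)
    spread = stdev(rewards) if n > 1 else 0.0
    return mean(rewards) + critical_value * spread / sqrt(n)


def select_oracles(scores, epsilon=0.1):
    """Return the oracles whose score is at least epsilon times the best score."""
    threshold = epsilon * max(scores.values())
    return [name for name, score in scores.items() if score >= threshold]


# Hypothetical 0/1 agreement histories for three oracles.
history = {
    "oracle_a": [1, 1, 1, 1],
    "oracle_b": [1, 0, 1, 1],
    "oracle_c": [0, 0, 0, 0],
}
scores = {name: upper_interval(r) for name, r in history.items()}
selected = select_oracles(scores, epsilon=0.5)
```

A larger epsilon raises the threshold and shrinks the selected batch; a smaller epsilon admits more oracles. Note that in this sketch an oracle with an inconsistent history can score above a perfectly consistent one, because the margin rewards uncertainty.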
select
select(self, label_index, unlabel_index, model=None, **kwargs)
Query from oracles. Returns the index of the selected instance and the selected oracle.
Parameters:
label_index: {list, np.ndarray, IndexCollection}
    The indexes of labeled samples.
unlabel_index: {list, np.ndarray, IndexCollection}
    The indexes of unlabeled samples.
model: object, optional (default=None)
    Current classification model. It should have a 'predict_proba' method for probabilistic output.
    If not provided, sklearn's LogisticRegression with default parameters will be used.
n_neighbors: int, optional (default=10)
    How many neighbors of the selected instance will be used
    to evaluate the oracles.
Returns:
selected_instance: int
    The index of the selected instance.
selected_oracle: int or str
    The selected oracle.
    If the oracles were given as a list, the index of the oracle is returned.
    If an Oracles object was given, the oracle name is returned.
select_by_prediction_mat
select_by_prediction_mat(self, label_index, unlabel_index, predict, **kwargs)
Query from oracles. Returns the index of the selected instance and the selected oracle.
Parameters:
label_index: {list, np.ndarray, IndexCollection}
    The indexes of labeled samples.
unlabel_index: {list, np.ndarray, IndexCollection}
    The indexes of unlabeled samples.
predict: 2D array, shape [n_samples, n_classes]
    The probabilistic prediction matrix for the unlabeled set.
n_neighbors: int, optional (default=10)
    How many neighbors of the selected instance will be used
    to evaluate the oracles.
Returns:
selected_instance: int
    The index of the selected instance.
selected_oracle: int or str
    The selected oracle.
    If the oracles were given as a list, the index of the oracle is returned.
    If an Oracles object was given, the oracle name is returned.
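To show how a prediction matrix of shape [n_samples, n_classes] can drive instance selection, here is a minimal sketch. It assumes a least-confident criterion (choose the sample whose highest class probability is lowest) and assumes that the rows of `predict` are aligned with `unlabel_index`. This page does not state which criterion the method actually applies, so treat both as assumptions; `least_confident_index` is a hypothetical helper.

```python
def least_confident_index(unlabel_index, predict):
    """Return the unlabeled index whose top class probability is lowest."""
    top_probs = [max(row) for row in predict]
    position = min(range(len(top_probs)), key=top_probs.__getitem__)
    return unlabel_index[position]


# Three unlabeled samples with hypothetical two-class probabilities.
unlabel_index = [12, 47, 83]
predict = [
    [0.90, 0.10],
    [0.55, 0.45],
    [0.70, 0.30],
]
selected_instance = least_confident_index(unlabel_index, predict)
```

The returned value is an index into the whole dataset, not a row position in `predict`, which matches the `selected_instance: int` return value documented above.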
select_by_given_instance
select_by_given_instance(self, selected_instance)
Select the oracles to query, given the index of the selected instance.
Parameters:
selected_instance: int
    The index of the selected instance. Should be a member of the unlabeled set.
Returns:
selected_oracles: list
    The selected oracles for querying.