alipy.query_strategy.multi_label.QueryNoisyOraclesIEthresh

IEthresh selects a batch of oracles to label the selected instance. Each oracle is scored by the difference between its labeling results and the majority-vote results.

At each iteration, the oracles whose scores reach a threshold are selected as a batch. An oracle with a higher score is more likely to be selected.
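The majority-vote comparison described above can be sketched in plain Python. This is a minimal illustration, not alipy's implementation: the helper names `majority_vote` and `agreement_rewards` are hypothetical, and the 0/1 reward for agreeing with the consensus is an assumption about how the "difference" is measured.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most common label among the oracles' answers."""
    # most_common sorts by count; on a tie the first-seen label wins.
    return Counter(labels).most_common(1)[0][0]


def agreement_rewards(oracle_labels):
    """Map each oracle name to 1 if it agrees with the majority vote, else 0.

    Assumption: agreement is scored as a 0/1 reward (hypothetical helper).
    """
    consensus = majority_vote(list(oracle_labels.values()))
    return {name: int(label == consensus) for name, label in oracle_labels.items()}


# Three oracles label the same instance; two of them agree.
answers = {"oracle_a": 1, "oracle_b": 1, "oracle_c": 0}
rewards = agreement_rewards(answers)
```

Accumulating these rewards over iterations gives each oracle a history from which a score can be estimated.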

References
----------

[1] Donmez P, Carbonell J G, Schneider J. Efficiently learning the accuracy of labeling sources for selective sampling. In: ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, 2009.

Methods

init

__init__(self, X, y, oracles, initial_labeled_indexes, **kwargs)
Parameters:
X: 2D array, optional (default=None)
Feature matrix of the whole dataset. It is a reference which will not use additional memory.
y: array-like, optional (default=None)
Label matrix of the whole dataset. It is a reference which will not use additional memory.
oracles: {list, alipy.oracle.Oracles}
An alipy.oracle.Oracles object that contains all the
available oracles, or a list of oracles.
Each oracle should be an alipy.oracle.Oracle object.
initial_labeled_indexes: {list, np.ndarray, IndexCollection}
The indexes of initially labeled samples. Used for initializing the scores of each oracle.
epsilon: float, optional (default=0.1)
Threshold ratio that determines how many oracles are selected.
An oracle a is selected when its score UI(a) is at least epsilon times the highest score:
S_t = {a | UI(a) >= epsilon * max UI(a)}
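The threshold rule S_t can be sketched as follows. This is an illustrative sketch under stated assumptions, not the library's code: it assumes UI(a) is the upper bound of an interval estimate, computed as the mean reward plus a Student-t margin, and the names `upper_interval`, `select_oracles`, and the `t_crit` parameter are hypothetical.

```python
import math
import statistics


def upper_interval(rewards, t_crit):
    """Mean reward plus a t-based margin: m + t_crit * s / sqrt(n).

    Assumption: t_crit is the Student-t critical value for n - 1 degrees
    of freedom, supplied by the caller (hypothetical parameter).
    """
    n = len(rewards)
    if n < 2:
        # Assumption: with too little history, stay optimistic so the
        # oracle still gets explored.
        return 1.0
    return statistics.mean(rewards) + t_crit * statistics.stdev(rewards) / math.sqrt(n)


def select_oracles(ui, epsilon=0.1):
    """Return the oracles a with UI(a) >= epsilon * max UI(a)."""
    threshold = epsilon * max(ui.values())
    return [name for name, value in ui.items() if value >= threshold]


# Reward histories: 1 = agreed with the majority vote, 0 = disagreed.
histories = {
    "oracle_a": [1, 1, 1, 1],
    "oracle_b": [1, 0, 1, 0],
    "oracle_c": [0, 0, 0, 0],
}
# 3.182 is the two-sided 95% t critical value for 3 degrees of freedom.
ui = {name: upper_interval(h, t_crit=3.182) for name, h in histories.items()}
selected = select_oracles(ui, epsilon=0.5)
```

A larger epsilon keeps only oracles close to the best one; a smaller epsilon admits more oracles into the batch.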

select

select(self, label_index, unlabel_index, model=None, **kwargs)

Select an instance and an oracle to query. Returns the index of the selected instance and the selected oracle.

Parameters:
label_index: {list, np.ndarray, IndexCollection}
The indexes of labeled samples.
unlabel_index: {list, np.ndarray, IndexCollection}
The indexes of unlabeled samples.
model: object, optional (default=None)
Current classification model. It should provide a 'predict_proba' method for probabilistic output.
If not provided, sklearn's LogisticRegression with default parameters will be used.
n_neighbors: int, optional (default=10)
How many neighbors of the selected instance will be used
to evaluate the oracles.
Returns:
selected_instance: int
The index of the selected instance.
selected_oracle: int or str
The selected oracle.
If a list of oracles was given, the index of the oracle is returned.
If an Oracles object was given, the oracle name is returned.

select_by_prediction_mat

select_by_prediction_mat(self, label_index, unlabel_index, predict, **kwargs)

Select an instance and an oracle to query, using a precomputed prediction matrix. Returns the index of the selected instance and the selected oracle.

Parameters:
label_index: {list, np.ndarray, IndexCollection}
The indexes of labeled samples.
unlabel_index: {list, np.ndarray, IndexCollection}
The indexes of unlabeled samples.
predict: 2D array, shape [n_samples, n_classes]
The probabilistic prediction matrix for the unlabeled set.
n_neighbors: int, optional (default=10)
How many neighbors of the selected instance will be used
to evaluate the oracles.
Returns:
selected_instance: int
The index of the selected instance.
selected_oracle: int or str
The selected oracle.
If a list of oracles was given, the index of the oracle is returned.
If an Oracles object was given, the oracle name is returned.
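To illustrate how a prediction matrix relates to the unlabeled indexes, here is a small sketch. It assumes that row i of `predict` corresponds to `unlabel_index[i]`, and it uses a least-confidence criterion purely as an example; this page does not state which criterion the method applies. The function name `least_confident_index` is hypothetical.

```python
def least_confident_index(unlabel_index, predict):
    """Return the dataset index whose top class probability is lowest.

    Assumption: row i of predict belongs to unlabel_index[i], and least
    confidence is used only as an illustrative selection criterion.
    """
    if len(unlabel_index) != len(predict):
        raise ValueError("predict must have one row per unlabeled index")
    # Find the row whose most probable class has the smallest probability.
    row = min(range(len(predict)), key=lambda i: max(predict[i]))
    return unlabel_index[row]


unlabel_index = [4, 7, 9]
predict = [
    [0.90, 0.10],  # confident prediction for sample 4
    [0.55, 0.45],  # least confident prediction, sample 7
    [0.20, 0.80],  # confident prediction for sample 9
]
chosen = least_confident_index(unlabel_index, predict)
```

The returned value is an index into the whole dataset, not a row number of the matrix, which matches the `selected_instance` return described above.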

select_by_given_instance

select_by_given_instance(self, selected_instance)

Select the oracles to query for a given instance, identified by its index.

Parameters:
selected_instance: int
The index of the selected sample. Should be a member of the unlabeled set.
Returns:
selected_oracles: list
The selected oracles for querying.

Copyright © 2018, alipy developers (BSD 3 License).